Why Extract Invoice Data Programmatically?

Every business deals with invoices. Whether you process 50 or 50,000 per month, manually entering invoice data into spreadsheets or accounting systems is slow, error-prone, and expensive. Industry estimates put the cost of manual invoice processing at $15-25 per document when you account for labor, error correction, and payment delays.

An invoice extraction API solves this by reading the document, identifying key fields (invoice number, vendor, line items, totals), and returning clean JSON that you can feed directly into your database or accounting system. For native PDFs the entire process typically takes under 2 seconds per document; scanned documents that need OCR take a few seconds longer.

What You Will Need

An API key from ParseFlow (free tier: 100 docs/month)
An invoice file (PDF, JPG, or PNG)
A tool to make HTTP requests (cURL, Python, Node.js, or any language with HTTP support)

Step 1: Get Your API Key

Sign up at parseflow.dev/login to create your account. Your API key will appear in the Dashboard. Keys follow the format dm_live_... for production and dm_test_... for sandbox.

Step 2: Upload and Extract with cURL

The simplest way to test the API is with a cURL command:

curl -X POST https://parseflow.dev/api/v1/extract \
  -H "X-API-Key: dm_live_your_api_key" \
  -F "file=@invoice.pdf" \
  -F "document_type=invoice"

The document_type parameter is optional. If omitted, ParseFlow auto-detects the document type. Setting it explicitly can improve accuracy for known document types.

Step 3: Understand the JSON Response

The API returns structured JSON with all extracted fields and metadata:

{
  "id": "ext_7f3a2b1c-4d5e-6f78-9a0b-cdef12345678",
  "status": "completed",
  "documentType": "invoice",
  "confidence": 0.94,
  "data": {
    "invoiceNumber": "INV-2026-0142",
    "invoiceDate": "2026-03-15",
    "dueDate": "2026-04-14",
    "vendor": {
      "name": "Acme Corporation",
      "email": "billing@acme.com"
    },
    "customer": {
      "name": "Widget Inc."
    },
    "lineItems": [
      {
        "description": "API Integration Service",
        "quantity": 1,
        "unitPrice": 2500.00,
        "amount": 2500.00
      },
      {
        "description": "Cloud Hosting (Annual)",
        "quantity": 1,
        "unitPrice": 5000.00,
        "amount": 5000.00
      }
    ],
    "subtotal": 7500.00,
    "taxRate": 8.25,
    "taxAmount": 618.75,
    "total": 8118.75,
    "currency": "USD"
  },
  "processingTimeMs": 342
}

Key fields to note: confidence is a 0-1 score indicating extraction reliability. Above 0.90 is excellent. The processingTimeMs shows how long extraction took.
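
Monetary fields arrive as JSON numbers, so a default parser turns them into binary floats. If you plan to do arithmetic on them, one option is to parse them as decimals instead. The sketch below assumes the response shape shown above and uses an abbreviated copy of it; parse_extraction is a hypothetical helper name, not part of any ParseFlow SDK. It also assumes taxRate is a percentage, which is what the sample figures imply.

```python
import json
from decimal import Decimal

# Abbreviated copy of the sample response shown above.
SAMPLE_RESPONSE = """
{
  "status": "completed",
  "confidence": 0.94,
  "data": {
    "invoiceNumber": "INV-2026-0142",
    "subtotal": 7500.00,
    "taxRate": 8.25,
    "taxAmount": 618.75,
    "total": 8118.75,
    "currency": "USD"
  }
}
"""


def parse_extraction(raw_json):
    """Parse a response body into a dict, reading decimals exactly.

    Hypothetical helper: parse_float=Decimal makes json.loads build
    Decimal objects from the literal digits instead of floats. This
    applies to every non-integer number, including confidence.
    """
    return json.loads(raw_json, parse_float=Decimal)


result = parse_extraction(SAMPLE_RESPONSE)
data = result["data"]

# Assumption: taxRate is a percentage (8.25 means 8.25%), so divide by 100.
expected_tax = data["subtotal"] * data["taxRate"] / 100

print("Tax matches rate:", expected_tax == data["taxAmount"])
print("Total adds up:", data["subtotal"] + data["taxAmount"] == data["total"])
```

With requests, the same parser can be applied to response.text instead of calling response.json().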

Step 4: Python Integration

Here is a Python example with basic error handling that prints the key fields and the confidence score:

import requests

API_KEY = "dm_live_your_api_key"

def extract_invoice(file_path):
    """Extract data from an invoice PDF or image."""
    with open(file_path, "rb") as f:
        response = requests.post(
            "https://parseflow.dev/api/v1/extract",
            headers={"X-API-Key": API_KEY},
            files={"file": f},
            data={"document_type": "invoice"},
            timeout=30
        )

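    if response.status_code != 200:
        # Assumes error bodies are JSON with a "code" field; fall back to raw text if not.
        try:
            code = response.json().get("code", "unknown")
        except ValueError:
            code = response.text
        raise Exception(f"API error {response.status_code}: {code}")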

    result = response.json()
    data = result["data"]

    print(f"Invoice: {data['invoiceNumber']}")
    print(f"Vendor: {data['vendor']['name']}")
    print(f"Total: {data['currency']} {data['total']}")
    print(f"Confidence: {result['confidence']:.0%}")

    return result

# Usage
result = extract_invoice("path/to/invoice.pdf")
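
The function above prints the confidence score but does not act on it. A minimal sketch of confidence-based routing follows, using the 0.90 and 0.70 thresholds suggested under Best Practices. The function name route_extraction and the three queue names are hypothetical; substitute whatever your own pipeline uses.

```python
def route_extraction(result, auto_threshold=0.90, review_threshold=0.70):
    """Return the name of the queue an extraction should go to.

    The queue names are illustrative placeholders, not ParseFlow values.
    """
    confidence = result["confidence"]
    if confidence > auto_threshold:
        return "auto_process"   # above 0.90: accept without review
    if confidence >= review_threshold:
        return "review"         # 0.70 to 0.90: flag for a human check
    return "manual_entry"       # below 0.70: queue for manual entry


# Usage with a dict shaped like the API response
sample = {"confidence": 0.94, "data": {"invoiceNumber": "INV-2026-0142"}}
print(route_extraction(sample))  # auto_process
```

Keeping the thresholds as parameters makes it easy to tighten them for high-value invoices without touching the routing logic.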

Step 5: JavaScript / Node.js Integration

The same workflow in Node.js using axios:

const FormData = require('form-data');
const fs = require('fs');
const axios = require('axios');

async function extractInvoice(filePath) {
  const form = new FormData();
  form.append('file', fs.createReadStream(filePath));
  form.append('document_type', 'invoice');

  const { data } = await axios.post(
    'https://parseflow.dev/api/v1/extract',
    form,
    {
      headers: {
        'X-API-Key': 'dm_live_your_api_key',
        ...form.getHeaders()
      },
      timeout: 30000
    }
  );

  console.log('Invoice:', data.data.invoiceNumber);
  console.log('Total:', data.data.currency, data.data.total);
  // Round so floating-point artifacts (e.g. 56.99999999999999) never reach the output
  console.log('Confidence:', Math.round(data.confidence * 100) + '%');

  return data;
}

extractInvoice('invoice.pdf').catch(console.error);

Handling Different Invoice Formats

Not all invoices are created equal. ParseFlow handles several scenarios automatically:

Native PDFs with embedded text are processed fastest (under 500ms) with the highest accuracy (95%+). No OCR needed.
Scanned PDFs and images require OCR processing, which adds 1-3 seconds but still achieves 85-93% accuracy for standard layouts.
Multi-page invoices are handled automatically. Line items spanning multiple pages are merged into a single array.
International invoices with different date formats, currencies, and tax terminology (VAT, GST, IVA) are detected and normalized automatically.

Best Practices

Use confidence scores to route extractions. Auto-process above 0.90, flag for review from 0.70 to 0.90, and queue for manual entry below 0.70.
Validate totals programmatically. Check that line item amounts sum to the subtotal, and subtotal plus tax equals the total.
Process in batches when you have multiple invoices. The batch endpoint accepts up to 50 files in a single request on the Pro plan.
Store extraction IDs. You can retrieve results for 90 days using the GET /api/v1/documents/:id endpoint.
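
The totals check described above can be sketched as follows. It assumes the field names from the sample response (lineItems, amount, subtotal, taxAmount, total); validate_totals and the one-cent tolerance are illustrative choices, not ParseFlow features.

```python
from decimal import Decimal


def validate_totals(data, tolerance=Decimal("0.01")):
    """Return a list of problems found; an empty list means the totals agree."""

    def money(value):
        # Convert via str() so a float such as 618.75 keeps its printed digits.
        return Decimal(str(value))

    problems = []
    line_sum = sum(
        (money(item["amount"]) for item in data["lineItems"]),
        Decimal("0"),
    )
    subtotal = money(data["subtotal"])
    tax = money(data["taxAmount"])
    total = money(data["total"])

    # Check 1: line item amounts should sum to the subtotal.
    if abs(line_sum - subtotal) > tolerance:
        problems.append(f"line items sum to {line_sum}, subtotal is {subtotal}")

    # Check 2: subtotal plus tax should equal the total.
    if abs(subtotal + tax - total) > tolerance:
        problems.append(f"subtotal + tax is {subtotal + tax}, total is {total}")

    return problems


# Figures taken from the sample response
invoice = {
    "lineItems": [{"amount": 2500.00}, {"amount": 5000.00}],
    "subtotal": 7500.00,
    "taxAmount": 618.75,
    "total": 8118.75,
}
print(validate_totals(invoice))  # []
```

An extraction that fails either check is a good candidate for the review queue even when its confidence score is high.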

Next Steps

Now that you know how to extract data from invoices, here are some ways to go further:

Try the interactive playground to test with your own invoices — no code required.
Read the full API documentation for advanced features like webhooks and custom extraction templates.
Explore supported formats to see all document types and file formats we handle.
Check out our guide on automating invoice processing at scale for production deployment best practices.
Just getting started? See how to use a free invoice parsing API before you upgrade.
Need only the file conversion? Follow our PDF invoice to JSON API walkthrough.
Processing store and restaurant receipts too? The same workflow applies to receipt parsing — merchant, items, totals, and payment method returned as JSON.

Start extracting invoices today
