ParseFlow
Tutorial | March 25, 2026 | 7 min read

How to Extract Data from Invoices with an API

A complete, step-by-step guide to turning invoice PDFs and images into structured JSON data using a REST API. Includes working code examples in cURL, Python, and JavaScript.

Why Extract Invoice Data Programmatically?

Every business deals with invoices. Whether you process 50 or 50,000 per month, manually entering invoice data into spreadsheets or accounting systems is slow, error-prone, and expensive. Industry estimates put the cost of manual invoice processing at $15-25 per document when you account for labor, error correction, and payment delays.

An invoice extraction API solves this by reading the document, identifying key fields (invoice number, vendor, line items, totals), and returning clean JSON that you can feed directly into your database or accounting system. Native PDFs are typically processed in under a second, and scanned documents that need OCR take a few seconds.

What You Will Need

  • An API key from ParseFlow (free tier: 100 docs/month)
  • An invoice file (PDF, JPG, or PNG)
  • A tool to make HTTP requests (cURL, Python, Node.js, or any language with HTTP support)

Step 1: Get Your API Key

Sign up at parseflow.dev/login to create your account. Your API key will appear in the Dashboard. Keys follow the format dm_live_... for production and dm_test_... for sandbox.

Step 2: Upload and Extract with cURL

The simplest way to test the API is with a cURL command:

curl -X POST https://parseflow.dev/api/v1/extract \
  -H "X-API-Key: dm_live_your_api_key" \
  -F "file=@invoice.pdf" \
  -F "document_type=invoice"

The document_type parameter is optional. If omitted, ParseFlow auto-detects the document type. Setting it explicitly can improve accuracy for known document types.

Step 3: Understand the JSON Response

The API returns structured JSON with all extracted fields and metadata:

{
  "id": "ext_7f3a2b1c-4d5e-6f78-9a0b-cdef12345678",
  "status": "completed",
  "documentType": "invoice",
  "confidence": 0.94,
  "data": {
    "invoiceNumber": "INV-2026-0142",
    "invoiceDate": "2026-03-15",
    "dueDate": "2026-04-14",
    "vendor": {
      "name": "Acme Corporation",
      "email": "billing@acme.com"
    },
    "customer": {
      "name": "Widget Inc."
    },
    "lineItems": [
      {
        "description": "API Integration Service",
        "quantity": 1,
        "unitPrice": 2500.00,
        "amount": 2500.00
      },
      {
        "description": "Cloud Hosting (Annual)",
        "quantity": 1,
        "unitPrice": 5000.00,
        "amount": 5000.00
      }
    ],
    "subtotal": 7500.00,
    "taxRate": 8.25,
    "taxAmount": 618.75,
    "total": 8118.75,
    "currency": "USD"
  },
  "processingTimeMs": 342
}

Key fields to note: confidence is a score from 0 to 1 indicating extraction reliability, and anything above 0.90 is excellent. processingTimeMs shows how long extraction took, in milliseconds.
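To feed a response like the one above into a spreadsheet or database table, it often helps to flatten it to one row per line item. The following is a minimal sketch, not part of ParseFlow's API: the function names flatten_line_items and rows_to_csv and the column names are hypothetical, and the code assumes the response has the shape shown in the sample.

```python
import csv
import io


def flatten_line_items(result):
    """Return one flat dict per line item, repeating invoice-level fields.

    Assumes `result` has the shape of the sample response above;
    the function and column names are illustrative, not part of the API.
    """
    data = result["data"]
    rows = []
    for item in data.get("lineItems", []):
        rows.append({
            "extraction_id": result["id"],
            "invoice_number": data["invoiceNumber"],
            "vendor": data["vendor"]["name"],
            "currency": data["currency"],
            "description": item["description"],
            "quantity": item["quantity"],
            "unit_price": item["unitPrice"],
            "amount": item["amount"],
        })
    return rows


def rows_to_csv(rows):
    """Serialize the flat rows to CSV text (header row included)."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Trimmed copy of the sample response from this step
sample = {
    "id": "ext_7f3a2b1c-4d5e-6f78-9a0b-cdef12345678",
    "data": {
        "invoiceNumber": "INV-2026-0142",
        "vendor": {"name": "Acme Corporation"},
        "currency": "USD",
        "lineItems": [
            {"description": "API Integration Service", "quantity": 1,
             "unitPrice": 2500.00, "amount": 2500.00},
            {"description": "Cloud Hosting (Annual)", "quantity": 1,
             "unitPrice": 5000.00, "amount": 5000.00},
        ],
    },
}

rows = flatten_line_items(sample)
print(rows_to_csv(rows))
```

Repeating the invoice number and extraction ID on every row keeps each line item traceable to its source document after import.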

Step 4: Python Integration

Here is a complete Python example with basic error handling that prints the key fields and the confidence score:

import requests

API_KEY = "dm_live_your_api_key"

def extract_invoice(file_path):
    """Extract data from an invoice PDF or image."""
    with open(file_path, "rb") as f:
        response = requests.post(
            "https://parseflow.dev/api/v1/extract",
            headers={"X-API-Key": API_KEY},
            files={"file": f},
            data={"document_type": "invoice"},
            timeout=30
        )

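    if response.status_code != 200:
        # The error body may not be JSON (e.g. a gateway error page),
        # so fall back to the raw text if parsing fails.
        try:
            detail = response.json().get("code", response.text)
        except ValueError:
            detail = response.text
        raise RuntimeError(f"API error {response.status_code}: {detail}")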

    result = response.json()
    data = result["data"]

    print(f"Invoice: {data['invoiceNumber']}")
    print(f"Vendor: {data['vendor']['name']}")
    print(f"Total: {data['currency']} {data['total']}")
    print(f"Confidence: {result['confidence']:.0%}")

    return result

# Usage
result = extract_invoice("path/to/invoice.pdf")

Step 5: JavaScript / Node.js Integration

The same workflow in Node.js using axios:

const FormData = require('form-data');
const fs = require('fs');
const axios = require('axios');

async function extractInvoice(filePath) {
  const form = new FormData();
  form.append('file', fs.createReadStream(filePath));
  form.append('document_type', 'invoice');

  const { data } = await axios.post(
    'https://parseflow.dev/api/v1/extract',
    form,
    {
      headers: {
        'X-API-Key': 'dm_live_your_api_key',
        ...form.getHeaders()
      },
      timeout: 30000
    }
  );

  console.log('Invoice:', data.data.invoiceNumber);
  console.log('Total:', data.data.currency, data.data.total);
  // Round to avoid floating-point artifacts such as 56.99999999999999%
  console.log('Confidence:', Math.round(data.confidence * 100) + '%');

  return data;
}

extractInvoice('invoice.pdf').catch(console.error);

Handling Different Invoice Formats

Not all invoices are created equal. ParseFlow handles several scenarios automatically:

  • Native PDFs with embedded text are processed fastest (under 500ms) with the highest accuracy (95%+). No OCR needed.
  • Scanned PDFs and images require OCR processing, which adds 1-3 seconds but still achieves 85-93% accuracy for standard layouts.
  • Multi-page invoices are handled automatically. Line items spanning multiple pages are merged into a single array.
  • International invoices with different date formats, currencies, and tax terminology (VAT, GST, IVA) are detected and normalized automatically.

Best Practices

  1. Use confidence scores to route extractions. Auto-process above 0.90, flag for review between 0.70 and 0.90, and queue for manual entry below 0.70.
  2. Validate totals programmatically. Check that line item amounts sum to the subtotal, and subtotal plus tax equals the total.
  3. Process in batches when you have multiple invoices. The batch endpoint accepts up to 50 files in a single request on the Pro plan.
  4. Store extraction IDs. You can retrieve results for 90 days using the GET /api/v1/documents/:id endpoint.
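
The first two practices can be sketched in a few lines of Python. This is an illustrative sketch rather than ParseFlow's own logic: the function names route_extraction and validate_totals, the returned labels, and the one-cent tolerance are assumptions, and the response shape is taken from the sample in Step 3. Sending a score of exactly 0.90 to review is also a choice made here, since the guidance above does not pin down the boundary.

```python
from decimal import Decimal


def route_extraction(confidence):
    """Map a 0-1 confidence score to a handling queue (labels are hypothetical)."""
    if confidence > 0.90:
        return "auto_process"
    if confidence >= 0.70:
        return "review"
    return "manual_entry"


def validate_totals(data, tolerance=Decimal("0.01")):
    """Return a list of problems found in the extracted totals (empty if none)."""
    def dec(value):
        # Convert via str so floats like 618.75 do not pick up binary noise
        return Decimal(str(value))

    problems = []
    line_sum = sum((dec(item["amount"]) for item in data["lineItems"]), Decimal("0"))
    subtotal = dec(data["subtotal"])
    tax = dec(data["taxAmount"])
    total = dec(data["total"])

    if abs(line_sum - subtotal) > tolerance:
        problems.append(f"line items sum to {line_sum}, subtotal is {subtotal}")
    if abs(subtotal + tax - total) > tolerance:
        problems.append(f"subtotal + tax is {subtotal + tax}, total is {total}")
    return problems


# Totals from the sample response in Step 3
sample_data = {
    "lineItems": [{"amount": 2500.00}, {"amount": 5000.00}],
    "subtotal": 7500.00,
    "taxAmount": 618.75,
    "total": 8118.75,
}

print(route_extraction(0.94))
print(validate_totals(sample_data))
```

Comparing with Decimal and a small tolerance avoids false alarms from floating-point rounding. An extraction that fails validation can be sent to review even when its confidence score is high.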

Next Steps

Now that you know how to extract data from invoices, here are some ways to go further:

Start extracting invoices today

Get your free API key and process up to 100 invoices per month at no cost.
