ParseFlow
Tutorial · June 22, 2026 · 8 min read

PDF Invoice to JSON API: Convert Invoices to Structured Data

How to turn a PDF invoice into clean, structured JSON with one API call — covering field mapping, scanned-document OCR, and production code in cURL, Python, and JavaScript.

Why Convert a PDF Invoice to JSON?

Invoices arrive as PDFs, but software needs structured data. A PDF invoice to JSON API closes that gap: instead of copying numbers by hand or maintaining brittle regex against every vendor template, you POST the file and receive a predictable JSON object you can store, validate, and push into your accounting system. The JSON schema stays the same regardless of how the original invoice was laid out, which is what makes the integration durable.

This guide walks through a complete conversion: getting a key, sending a PDF, reading the JSON, and handling the messy real-world cases like scanned documents and low-confidence results.

What You Need

  • An API key from ParseFlow (free tier: 100 documents/month, no card)
  • A PDF invoice — either a native digital PDF or a scan
  • Any HTTP client: cURL, Python, Node.js, or your language of choice

Step 1: Get an API Key

Sign in at parseflow.dev/login and copy your key from the Dashboard. Keys use the format dm_live_... for production and dm_test_... for the sandbox. The free tier is active immediately, so you can convert your first PDF right away.
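Because the two key types differ only by prefix, a client can check which environment it is about to call before sending anything. The sketch below is a minimal example that relies only on the dm_live_ and dm_test_ prefixes described above. The helper names are hypothetical and are not part of any ParseFlow SDK.

```python
def key_environment(api_key: str) -> str:
    """Map a ParseFlow key to its environment using the documented prefixes."""
    # Prefixes from the article: dm_live_ is production, dm_test_ is the sandbox.
    if api_key.startswith("dm_live_"):
        return "production"
    if api_key.startswith("dm_test_"):
        return "sandbox"
    raise ValueError("unrecognized key format: expected dm_live_... or dm_test_...")


def require_sandbox_key(api_key: str) -> str:
    """Hypothetical guard for test suites: refuse to run with a production key."""
    if key_environment(api_key) != "sandbox":
        raise RuntimeError("tests must use a dm_test_ key")
    return api_key
```

Calling require_sandbox_key at the top of a test suite means a misconfigured CI job fails immediately instead of quietly sending documents to production.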

Step 2: Send a PDF Invoice with cURL

One request converts the PDF to JSON. Point the file field at your invoice and hint the document type so the parser knows what to expect:

curl -X POST https://parseflow.dev/api/v1/extract \
  -H "X-API-Key: dm_live_your_api_key" \
  -F "file=@invoice.pdf" \
  -F "document_type=invoice"

The document_type hint is optional — leave it off and the API auto-detects the document. Setting it to invoice improves field accuracy when you already know what you are sending.
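Under the hood, each cURL -F flag adds one part to a multipart/form-data body. The following dependency-free sketch builds the same request with Python's standard library without sending it, and omits the document_type part entirely when no hint is given. The function name is hypothetical. Labeling the file part application/pdf is an assumption on our part, as is the restriction to plain ASCII filenames without quotes.

```python
import urllib.request
import uuid
from typing import Optional

# Endpoint from the article; this sketch only builds the request.
ENDPOINT = "https://parseflow.dev/api/v1/extract"


def build_extract_request(api_key: str, filename: str, file_bytes: bytes,
                          document_type: Optional[str] = "invoice") -> urllib.request.Request:
    """Build, but do not send, a multipart POST like the cURL example."""
    boundary = uuid.uuid4().hex  # random separator, unlikely to occur in the file bytes
    # File part, the equivalent of cURL's -F "file=@invoice.pdf".
    # Assumes a simple filename with no quotes or non-ASCII characters.
    body = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode() + file_bytes + b"\r\n"
    # Optional hint; leaving it out lets the API auto-detect the document type.
    if document_type is not None:
        body += (
            f"--{boundary}\r\n"
            'Content-Disposition: form-data; name="document_type"\r\n\r\n'
            f"{document_type}\r\n"
        ).encode()
    body += f"--{boundary}--\r\n".encode()
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        method="POST",
        headers={
            "X-API-Key": api_key,
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

To send it, pass the request to urllib.request.urlopen(req, timeout=30) and parse the response with json.load. Note that urlopen raises urllib.error.HTTPError for non-2xx responses.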

Step 3: Read the JSON Response

The endpoint returns the converted invoice as a structured JSON object, with a confidence score that tells you how much to trust the result:

{
  "id": "ext_7f3a2b1c-4d5e-6f78-9a0b-cdef12345678",
  "status": "completed",
  "documentType": "invoice",
  "confidence": 0.94,
  "data": {
    "invoiceNumber": "INV-2026-0142",
    "invoiceDate": "2026-06-15",
    "dueDate": "2026-07-15",
    "vendor": { "name": "Acme Corporation", "email": "billing@acme.com" },
    "customer": { "name": "Widget Inc." },
    "lineItems": [
      { "description": "API Integration", "quantity": 1, "unitPrice": 2500.00, "amount": 2500.00 }
    ],
    "subtotal": 2500.00,
    "taxRate": 8.25,
    "taxAmount": 206.25,
    "total": 2706.25,
    "currency": "USD"
  },
  "processingTimeMs": 318
}

Field names are identical for every vendor, so code that reads data.invoiceNumber or data.total works regardless of the original invoice's layout. The top-level confidence value (0 to 1) is your routing signal: auto-process high-confidence results and queue anything below your threshold for a quick human check.
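The routing rule described above fits in a few lines. The sketch below assumes the response shape shown in Step 3. It treats any status other than "completed" as needing review, which is an assumption since the sample shows only that one value. The 0.70 default simply mirrors the Python example in Step 4, so tune it against your own review data.

```python
from typing import Any, Mapping


def route_extraction(result: Mapping[str, Any], threshold: float = 0.70) -> str:
    """Return "auto" for results safe to process, "review" for a human queue."""
    # Only "completed" appears in the sample; anything else goes to a person.
    if result.get("status") != "completed":
        return "review"
    confidence = result.get("confidence")
    # A missing or non-numeric score is treated as low confidence.
    if not isinstance(confidence, (int, float)) or confidence < threshold:
        return "review"
    return "auto"
```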

Step 4: Convert PDFs to JSON in Python

A complete example that saves the JSON and flags low-confidence results for review:

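import json
import os
import requests

# PARSEFLOW_API_KEY is an illustrative variable name; avoid hard-coding live keys.
API_KEY = os.environ.get("PARSEFLOW_API_KEY", "dm_live_your_api_key")
API_URL = "https://parseflow.dev/api/v1/extract"
REVIEW_THRESHOLD = 0.70  # results below this go to a human

def pdf_invoice_to_json(file_path, out_path="invoice.json"):
    """Convert a PDF invoice into structured JSON and save the data block."""
    with open(file_path, "rb") as f:
        response = requests.post(
            API_URL,
            headers={"X-API-Key": API_KEY},
            files={"file": f},
            data={"document_type": "invoice"},
            timeout=30,
        )

    if response.status_code != 200:
        # Error bodies carry a "code" field; fall back if the body is not JSON.
        try:
            code = response.json().get("code", "unknown")
        except ValueError:
            code = "unknown"
        raise RuntimeError(f"API error {response.status_code}: {code}")

    result = response.json()

    if result["confidence"] < REVIEW_THRESHOLD:
        print(f"Low confidence ({result['confidence']:.2f}): flag for manual review")

    with open(out_path, "w") as out:
        json.dump(result["data"], out, indent=2)

    return result

if __name__ == "__main__":
    pdf_invoice_to_json("invoice.pdf")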
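Because the keys never change between vendors, exporting to a format your accounting system can import requires no per-vendor logic. The sketch below is one possible approach. It assumes a CSV import with one row per line item, and the column layout and function name are our own, not ParseFlow's. Pass it the result["data"] block returned by the function above.

```python
import csv
import io
from typing import Any, Mapping

# Our own column layout for a flat export; keys mirror the sample response.
COLUMNS = ["invoiceNumber", "invoiceDate", "dueDate", "vendor", "customer",
           "description", "quantity", "unitPrice", "amount", "currency"]


def invoice_to_csv(data: Mapping[str, Any]) -> str:
    """Flatten one invoice's data block into CSV text, one row per line item."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    # Header-level fields are repeated on every line-item row.
    shared = {
        "invoiceNumber": data.get("invoiceNumber", ""),
        "invoiceDate": data.get("invoiceDate", ""),
        "dueDate": data.get("dueDate", ""),
        "vendor": (data.get("vendor") or {}).get("name", ""),
        "customer": (data.get("customer") or {}).get("name", ""),
        "currency": data.get("currency", ""),
    }
    for item in data.get("lineItems") or []:
        writer.writerow({
            **shared,
            "description": item.get("description", ""),
            "quantity": item.get("quantity", ""),
            "unitPrice": item.get("unitPrice", ""),
            "amount": item.get("amount", ""),
        })
    return buffer.getvalue()
```

Amounts arrive as floats when parsed with the default JSON decoder. If your ledger needs exact decimals, json.loads accepts parse_float=decimal.Decimal.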

Step 5: Convert PDFs to JSON in JavaScript / Node.js

The same conversion in Node.js, using the axios and form-data packages:

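const FormData = require('form-data');
const fs = require('fs');
const axios = require('axios');

// PARSEFLOW_API_KEY is an illustrative variable name; avoid hard-coding live keys.
const API_KEY = process.env.PARSEFLOW_API_KEY || 'dm_live_your_api_key';

async function pdfInvoiceToJson(filePath) {
  // Build the same multipart body as the cURL example.
  const form = new FormData();
  form.append('file', fs.createReadStream(filePath));
  form.append('document_type', 'invoice');

  // axios rejects on non-2xx responses, so API errors reach .catch() below.
  const { data } = await axios.post(
    'https://parseflow.dev/api/v1/extract',
    form,
    {
      headers: { 'X-API-Key': API_KEY, ...form.getHeaders() },
      timeout: 30000,
    }
  );

  if (data.confidence < 0.70) {
    console.warn('Low confidence: flag for manual review');
  }

  fs.writeFileSync('invoice.json', JSON.stringify(data.data, null, 2));
  console.log('Saved JSON for invoice', data.data.invoiceNumber);
  return data;
}

pdfInvoiceToJson('invoice.pdf').catch(console.error);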

Handling Scanned PDF Invoices

Not every PDF carries embedded text. When you send a scan or a photographed invoice, the same endpoint acts as an invoice OCR API: it recognizes the text first, then extracts fields into the identical JSON shape. Native digital PDFs are the fastest to process and the most accurate, at 95% field accuracy or higher; clean scans of standard layouts typically reach 85–93%. International invoices that use VAT, GST, or IVA terminology or non-US date formats are normalized for you, so the JSON stays consistent.
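Photographed invoices often arrive as JPG or PNG files rather than PDFs. Before uploading, a client can check the file's leading "magic" bytes to choose the right part Content-Type and reject unsupported formats. The minimal sketch below accepts the formats listed in the FAQ (PDF, JPG, PNG), and its helper names are hypothetical. It cannot tell a scanned PDF from a native one. That distinction is handled by the API.

```python
from pathlib import Path

# Leading bytes for the formats the article says the endpoint accepts.
SIGNATURES = {
    b"%PDF-": "application/pdf",
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"\xff\xd8\xff": "image/jpeg",
}


def sniff_content_type(header: bytes) -> str:
    """Return a MIME type for a file's first bytes, or raise if unsupported."""
    for magic, mime in SIGNATURES.items():
        if header.startswith(magic):
            return mime
    raise ValueError("unsupported file: expected a PDF, PNG, or JPEG")


def sniff_file(path: str) -> str:
    """Read only the first 8 bytes of a file and classify it."""
    with Path(path).open("rb") as f:
        return sniff_content_type(f.read(8))
```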

Validating the JSON Before You Store It

Treat the response as data to verify, not gospel. A few inexpensive checks catch the most common extraction errors before anything reaches your database:

  • Confirm subtotal + taxAmount equals total within a small tolerance.
  • Require a non-empty invoiceNumber and a parseable invoiceDate.
  • Route any extraction below your confidence threshold to a human queue.

Because the schema is stable, you can write these validators once and reuse them across every vendor’s invoices.
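The three checks above can be written as a single function that returns a list of problems, where an empty list means the invoice passed. This is a minimal sketch. It assumes dates arrive in ISO 8601 form (YYYY-MM-DD), as in the sample response. The one-cent tolerance and 0.70 threshold are illustrative defaults, and the function name is hypothetical.

```python
import math
from datetime import date
from typing import Any, Mapping


def validate_invoice(result: Mapping[str, Any], threshold: float = 0.70,
                     tolerance: float = 0.01) -> list[str]:
    """Run the three checks on a full API response and list any problems."""
    data = result.get("data") or {}
    problems = []

    # 1. subtotal + taxAmount should equal total within a small tolerance.
    try:
        expected = float(data["subtotal"]) + float(data["taxAmount"])
        if not math.isclose(expected, float(data["total"]), abs_tol=tolerance):
            problems.append(f"totals mismatch: {expected:.2f} != {data['total']}")
    except (KeyError, TypeError, ValueError):
        problems.append("missing or non-numeric subtotal/taxAmount/total")

    # 2. invoiceNumber must be non-empty and invoiceDate must parse as a date.
    if not str(data.get("invoiceNumber") or "").strip():
        problems.append("missing invoiceNumber")
    try:
        date.fromisoformat(str(data.get("invoiceDate")))
    except ValueError:
        problems.append("unparseable invoiceDate")

    # 3. Low-confidence extractions belong in the human review queue.
    if float(result.get("confidence") or 0.0) < threshold:
        problems.append("confidence below threshold")

    return problems
```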

Frequently Asked Questions

How do I convert a PDF invoice to JSON with an API?

POST the PDF to one endpoint with your API key. The API identifies fields like invoice number, vendor, line items, and totals, then returns a JSON object plus a confidence score. Native PDFs are read directly; scans go through OCR first.

What fields does a PDF invoice to JSON API return?

Invoice number, invoice and due dates, vendor and customer details, line items with quantity and unit price, subtotal, tax rate, tax amount, grand total, and currency — plus an overall confidence score between 0 and 1.

Can the API parse scanned PDF invoices?

Yes. Scanned PDFs, JPGs, and PNGs are processed with OCR automatically and typically reach 85–93% field accuracy on standard layouts. Native digital PDFs skip OCR and reach 95%+.

Is there a free way to test a PDF-to-JSON invoice API?

Yes. The free tier covers 100 documents per month with no credit card, and the playground lets you drop a PDF and see the JSON output without writing code.

Next Steps

Convert your first PDF invoice to JSON

Get a free API key and turn up to 100 invoices per month into structured JSON at no cost.