AI-powered invoice and receipt data extractor. Upload PDFs or images, extract structured data with Ollama, then keep the results in local JSON and CSV files for ongoing use.
Part of the Aibys Document Intelligence series.
- Upload multiple PDF, JPG, PNG, or WEBP files in one batch
- PDF pages are converted to images and extracted page by page
- AI extracts invoice data such as vendor, customer, line items, totals, dates, and payment method
- Structured result view with saved extraction history
- Raw JSON output for each record
- Persistent local storage using plain files, no SQL database
- New CSV rows are appended to the same `data/invoices.csv` file across sessions and days
- Download the saved CSV or JSON database from the UI
- Fully local when using a local Ollama vision model
- Responsive UI with no frontend build step
- Upload one or many invoices or receipts.
- FastAPI validates each file and stores the original upload in `uploads/`.
- PDF files are rendered into page images with PyMuPDF.
- Each image is sent to Ollama with a structured extraction prompt.
- Extracted records are saved into `data/invoices.json`.
- Flattened rows are appended to `data/invoices.csv`.
- The UI renders the latest batch and the saved history.
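The per-page extraction step above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code. It assumes PyMuPDF's `Page.get_pixmap(dpi=...)` for rendering and Ollama's `/api/generate` endpoint with base64-encoded `images`, `format: "json"` and `stream: false`. The prompt wording and the helper names (`render_pdf_pages`, `build_generate_payload`, `extract_page`) are hypothetical.

```python
import base64
import json
import urllib.request

# Hypothetical prompt; the app's real extraction prompt may differ.
EXTRACTION_PROMPT = (
    "Extract the invoice or receipt in this image as JSON with keys "
    "invoice_number, date, vendor, customer, line_items and summary. "
    "Use null for missing values."
)


def render_pdf_pages(pdf_path: str, dpi: int = 150) -> list[bytes]:
    """Render every PDF page to PNG bytes (requires PyMuPDF)."""
    import fitz  # PyMuPDF; imported lazily so the other helpers work without it

    pages = []
    with fitz.open(pdf_path) as doc:
        for page in doc:
            pix = page.get_pixmap(dpi=dpi)
            pages.append(pix.tobytes("png"))
    return pages


def build_generate_payload(image_bytes: bytes, model: str,
                           prompt: str = EXTRACTION_PROMPT) -> dict:
    """Build an Ollama /api/generate request body for one page image."""
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "format": "json",  # ask Ollama to constrain the reply to valid JSON
        "stream": False,   # return one response object instead of chunks
    }


def extract_page(image_bytes: bytes, model: str,
                 base_url: str = "http://localhost:11434") -> dict:
    """Send one page to Ollama and parse the JSON it returns."""
    payload = build_generate_payload(image_bytes, model)
    req = urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=300) as resp:
        body = json.load(resp)
    # With format="json", the model's answer is a JSON string in "response".
    return json.loads(body["response"])
```

For a multi-page PDF, `extract_page` would be called once for each element returned by `render_pdf_pages(path)`, matching the page-by-page behaviour described above.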
- Python 3.10+
- Ollama running locally
- A vision-capable model, for example:

```bash
ollama pull gemma4:31b-cloud
```

```bash
git clone https://github.com/Arlchoose-code/aibys-invoice-extractor.git
cd aibys-invoice-extractor
pip install -r requirements.txt
uvicorn main:app --reload
```

Open http://localhost:8000 in your browser.
Set environment variables as needed:

```
OLLAMA_URL=http://localhost:11434
OLLAMA_MODEL=gemma4:31b-cloud
```

Other possible vision models: llava, moondream, minicpm-v.
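A minimal sketch of how these settings could be read, assuming plain `os.environ` lookups with the defaults listed above. The function names are hypothetical, and probing Ollama's `GET /api/tags` endpoint (which lists locally available models) is only one plausible way a `/health` check might work; the app's actual check may differ.

```python
import json
import os
import urllib.request

DEFAULT_OLLAMA_URL = "http://localhost:11434"
DEFAULT_OLLAMA_MODEL = "gemma4:31b-cloud"


def load_ollama_settings(env=None) -> tuple[str, str]:
    """Return (base_url, model), falling back to the documented defaults."""
    env = os.environ if env is None else env
    url = env.get("OLLAMA_URL", DEFAULT_OLLAMA_URL).rstrip("/")
    model = env.get("OLLAMA_MODEL", DEFAULT_OLLAMA_MODEL)
    return url, model


def ollama_has_model(base_url: str, model: str, timeout: float = 5.0) -> bool:
    """Hypothetical health probe: True if /api/tags answers and lists the model."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            tags = json.load(resp)
    except (OSError, ValueError):  # URLError and timeouts are OSErrors; bad JSON is a ValueError
        return False
    # Ollama reports names with a tag, e.g. "llava:latest"; accept a bare name too.
    wanted = {model} if ":" in model else {model, f"{model}:latest"}
    return any(m.get("name") in wanted for m in tags.get("models", []))
```

For a one-off run in a POSIX shell, the variables can also be set inline, for example `OLLAMA_MODEL=llava uvicorn main:app --reload`.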
The app creates these files and folders automatically:

| Path | Purpose |
|---|---|
| `uploads/` | Original uploaded files |
| `data/invoices.json` | Full structured extraction history |
| `data/invoices.csv` | Append-only CSV export for spreadsheet use |
This project intentionally uses JSON and CSV files instead of SQL so it stays simple and portable.
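The append-only file storage described above can be approximated with the standard library alone. This is a sketch under assumptions, not the app's code: the helper names are hypothetical, the JSON file is assumed to hold a single top-level list of records, and the CSV header is written only when the file is new or empty so later sessions keep appending to the same file.

```python
import csv
import json
from pathlib import Path


def append_json_record(json_path: Path, record: dict) -> None:
    """Append one record to a JSON file that holds a list of records."""
    json_path.parent.mkdir(parents=True, exist_ok=True)
    records = json.loads(json_path.read_text(encoding="utf-8")) if json_path.exists() else []
    records.append(record)
    # Write to a sibling temp file, then swap it in, so a crash mid-write
    # does not leave a truncated JSON file behind.
    tmp_path = json_path.with_name(json_path.name + ".tmp")
    tmp_path.write_text(json.dumps(records, indent=2, ensure_ascii=False), encoding="utf-8")
    tmp_path.replace(json_path)


def append_csv_row(csv_path: Path, row: dict, fieldnames: list[str]) -> None:
    """Append one flat row; write the header only when the file is new or empty."""
    csv_path.parent.mkdir(parents=True, exist_ok=True)
    write_header = not csv_path.exists() or csv_path.stat().st_size == 0
    with csv_path.open("a", newline="", encoding="utf-8") as f:
        # Unknown keys are ignored and missing keys become empty cells.
        writer = csv.DictWriter(f, fieldnames=fieldnames, extrasaction="ignore")
        if write_header:
            writer.writeheader()
        writer.writerow(row)
```

A fixed `fieldnames` list keeps the CSV columns stable across days. Concurrent writers would still need a lock, which this sketch does not provide.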
| Method | Endpoint | Description |
|---|---|---|
| GET | `/` | Web UI |
| GET | `/health` | Ollama connection status |
| POST | `/extract` | Extract one or more files sent in the multipart field `files` |
| GET | `/records` | Return saved JSON records |
| GET | `/export-csv` | Download saved CSV |
| POST | `/export-csv` | Export posted extraction data as CSV |
| GET | `/export-json` | Download saved JSON database |
- Backend: FastAPI + Python
- AI: Ollama
- PDF/Image Processing: PyMuPDF
- Storage: JSON + CSV files
- Frontend: Vanilla HTML/CSS/JS
| Category | Fields |
|---|---|
| Document Info | Invoice number, date, due date, payment method |
| Vendor | Name, address, phone, email, website |
| Customer | Name, address, phone, email |
| Line Items | Description, quantity, unit price, total |
| Summary | Subtotal, tax, discount, shipping, total, currency |
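The categories above form a nested record, and writing it to a single CSV row requires some flattening. One possible approach is sketched here with hypothetical key names and sample values (the app's real JSON keys may differ): nested objects become dotted columns such as `vendor.name`, and the variable-length line items are kept as one JSON-encoded column.

```python
import json


def flatten_record(record: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into dotted keys; lists are stored as JSON strings."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten_record(value, prefix=f"{name}."))
        elif isinstance(value, list):
            flat[name] = json.dumps(value, ensure_ascii=False)
        else:
            flat[name] = "" if value is None else value
    return flat


# Hypothetical example record touching each category in the table.
sample = {
    "invoice_number": "INV-001",
    "date": "2024-05-01",
    "payment_method": "Bank transfer",
    "vendor": {"name": "Acme Supplies", "email": None},
    "customer": {"name": "Jane Doe"},
    "line_items": [
        {"description": "Paper", "quantity": 2, "unit_price": 5.0, "total": 10.0}
    ],
    "summary": {"subtotal": 10.0, "tax": 1.0, "total": 11.0, "currency": "USD"},
}

row = flatten_record(sample)
```

Keeping line items in one column gives every invoice the same set of columns; writing one CSV row per line item is the usual alternative when spreadsheet filtering by item matters more.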
| Repo | Description |
|---|---|
| Aibys Invoice Extractor | Extract invoice and receipt data |
| Aibys Legal Analyzer | Highlight risky clauses in contracts |
| Aibys Medical Explainer | Explain medical reports in plain language |
| Aibys Research Summarizer | Summarize academic papers |
Syahril Haryono - github.com/Arlchoose-code
MIT License