🚀 PaperPilot — Research Intelligence for Scientific Papers

PaperPilot is an open-source research intelligence system that automatically extracts datasets, metrics, figures, tables, claims, and reproducibility signals from academic papers.

It combines rule-based NLP pipelines with LLM-assisted extraction, producing structured, explainable, and confidence-labeled outputs — designed for researchers, students, and data scientists who want to understand papers faster and more reliably.

✨ Key Features

📄 PDF Analysis
- Extract sections, claims, figures, tables, and experimental evidence from research papers
📊 Dataset Discovery
- Identify primary and secondary datasets used in the paper
- Confidence scoring and role labeling (training, evaluation, reference)
📈 Metrics & Results Extraction
- Detect reported metrics (accuracy, AP, F1, etc.)
- Link metrics to experiments and datasets (when possible)
🧪 Reproducibility Signals
- Dataset availability
- Metric definitions
- Baseline comparisons
- Code release detection
- Tabular results presence
🤖 Hybrid Intelligence
- Deterministic rule-based extraction for reliability
- LLM-assisted reasoning for ambiguous or implicit information
📦 Structured Outputs
- Machine-readable JSON outputs
- Export-friendly for downstream tools (Dash, notebooks, pipelines)
🖥️ Interactive UI
- Upload PDFs
- Inspect extracted datasets, figures, claims, and plots
- Ready for human-in-the-loop validation
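
To make the "confidence-labeled" and "role labeling" features above concrete, here is a minimal sketch of what one structured output record could look like. The `DatasetMention` class, its field names, and the example values are assumptions made for illustration; they are not taken from PaperPilot's actual output schema.

```python
import json
from dataclasses import dataclass, asdict
from typing import List

# Hypothetical schema: names are illustrative, not PaperPilot's real format.
ROLES = {"training", "evaluation", "reference"}


@dataclass
class DatasetMention:
    name: str
    role: str          # one of ROLES
    confidence: float  # 0.0 to 1.0
    source: str        # "rule" or "llm": which extractor produced the mention
    evidence: str      # sentence in which the mention was found

    def __post_init__(self) -> None:
        # Reject records that would make the JSON output ambiguous.
        if self.role not in ROLES:
            raise ValueError(f"unknown role: {self.role!r}")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be in [0, 1]")


def to_json(mentions: List[DatasetMention]) -> str:
    """Serialize mentions as JSON, highest confidence first."""
    ordered = sorted(mentions, key=lambda m: m.confidence, reverse=True)
    return json.dumps({"datasets": [asdict(m) for m in ordered]}, indent=2)


# Invented example values, for illustration only.
mentions = [
    DatasetMention("ImageNet", "reference", 0.55, "llm", "Unlike ImageNet pretraining, ..."),
    DatasetMention("COCO", "evaluation", 0.92, "rule", "We evaluate on COCO val2017."),
]
report = to_json(mentions)
print(report)
```

Keeping the `source` and `evidence` fields next to each score is one way to keep the output explainable: a reviewer can see which extractor produced a label and which sentence supports it.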

🧠 Why PaperPilot?

Reading research papers is slow and error-prone. The answers to important questions are often scattered across sections, tables, and figures:

- Which dataset was actually used?
- Which metric matters?
- Is this result reproducible?
- What evidence supports the claim?

PaperPilot turns papers into structured evidence.

🏗️ Architecture Overview

PDF Upload
↓
Document Parsing & Sectioning
↓
Rule-Based NLP Extraction
↓
LLM-Assisted Reasoning
↓
Confidence & Reproducibility Scoring
↓
Structured Outputs + UI Visualization
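
The flow above can be read as a chain of stages that each enrich a shared document state. The sketch below shows that idea in miniature. All function names and the toy logic inside each stage are assumptions for illustration; the real `pipeline.py` may be organized differently.

```python
from typing import Callable, Dict, List

# Each stage takes the shared document state and returns it with new keys added.
Stage = Callable[[Dict], Dict]


def parse(doc: Dict) -> Dict:
    # Stand-in for PDF parsing: split raw text into sections on blank lines.
    doc["sections"] = [s.strip() for s in doc["text"].split("\n\n") if s.strip()]
    return doc


def rule_extract(doc: Dict) -> Dict:
    # Stand-in for rule-based extraction: keep sections mentioning a keyword.
    doc["candidates"] = [s for s in doc["sections"] if "dataset" in s.lower()]
    return doc


def llm_reason(doc: Dict) -> Dict:
    # Placeholder: a real stage might call an LLM only for ambiguous candidates.
    doc["resolved"] = list(doc["candidates"])
    return doc


def score(doc: Dict) -> Dict:
    # Toy confidence: fraction of sections that produced a resolved candidate.
    total = len(doc["sections"])
    doc["confidence"] = len(doc["resolved"]) / total if total else 0.0
    return doc


def run_pipeline(text: str, stages: List[Stage]) -> Dict:
    """Run the stages in order, passing the document state along."""
    doc: Dict = {"text": text}
    for stage in stages:
        doc = stage(doc)
    return doc


result = run_pipeline(
    "We train on a public dataset.\n\nResults are shown in Table 2.",
    [parse, rule_extract, llm_reason, score],
)
print(result["confidence"])
```

Because the stages share one signature, the LLM stage can be dropped from the list or swapped out without touching the others, which fits the "optional / pluggable" role LLMs have in the tech stack.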

🛠️ Tech Stack

Python
Streamlit (UI)
Rule-based NLP (regex, heuristics)
LLMs (optional / pluggable)
PDF parsing (PyMuPDF / PDFMiner)
Data visualization (matplotlib / plotly)
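
As an illustration of the "Rule-based NLP (regex, heuristics)" entry, the following sketch pulls reported metric values out of a sentence with a single regular expression. The pattern, the `extract_metrics` function, and the sample sentence are hypothetical; they cover only the three metric names mentioned in this README and are not PaperPilot's actual rules.

```python
import re
from typing import List, Tuple

# Illustrative pattern: metric name, a short gap, then a number with optional "%".
METRIC_PATTERN = re.compile(
    r"\b(accuracy|F1|AP)\b"      # metric names mentioned in this README
    r"[^0-9\n]{0,20}?"           # short gap such as " of " or " score of "
    r"(\d+(?:\.\d+)?)\s*(%?)",   # numeric value and optional percent sign
    re.IGNORECASE,
)


def extract_metrics(text: str) -> List[Tuple[str, float, bool]]:
    """Return (metric, value, is_percent) triples found in the text."""
    return [
        (m.group(1), float(m.group(2)), m.group(3) == "%")
        for m in METRIC_PATTERN.finditer(text)
    ]


# Invented sentence, for illustration only.
sample = "Our model reaches an accuracy of 91.3% and an F1 score of 0.88."
print(extract_metrics(sample))
```

A pattern like this is deterministic and easy to audit, but it misses metrics reported only in tables or phrased indirectly, which is the kind of gap the LLM-assisted stage is meant to cover.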

📂 Project Structure

paperpilot/
├── core/
│   ├── parser.py            # PDF parsing & section splitting
│   ├── pipeline.py          # End-to-end extraction pipeline
│   ├── datasets.py          # Dataset detection & matching
│   ├── metrics.py           # Metric extraction logic
│   ├── figures.py           # Figure & table detection
│   └── reproducibility.py
├── frontend/
│   └── app.py               # Streamlit UI
├── examples/
├── outputs/
└── README.md
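
The tree does not describe `reproducibility.py`. As a rough sketch of how the five reproducibility signals listed under Key Features could be detected with simple heuristics, consider the following. Every pattern, function name, and the sample text (including the placeholder URL) is an assumption for illustration, not the module's actual contents.

```python
import re
from typing import Dict

# Hypothetical heuristics, one per signal listed under Key Features.
SIGNAL_PATTERNS: Dict[str, re.Pattern] = {
    "dataset_availability": re.compile(r"publicly available|available (?:at|from)", re.I),
    "metric_definitions": re.compile(r"\b(?:is|are) defined as\b|we (?:define|measure)", re.I),
    "baseline_comparisons": re.compile(r"\bbaselines?\b|compared (?:to|with)", re.I),
    "code_release": re.compile(r"https?://(?:www\.)?(?:github|gitlab)\.com/\S+", re.I),
    "tabular_results": re.compile(r"\bTable\s+\d+", re.I),
}


def reproducibility_signals(text: str) -> Dict[str, bool]:
    """Flag each signal as present if its pattern matches anywhere in the text."""
    return {name: bool(p.search(text)) for name, p in SIGNAL_PATTERNS.items()}


def reproducibility_score(signals: Dict[str, bool]) -> float:
    """Unweighted fraction of signals that were detected."""
    return sum(signals.values()) / len(signals)


# Invented text with a placeholder URL, for illustration only.
text = (
    "Code is released at https://github.com/example/repo. "
    "We compare with two baselines; results are in Table 3."
)
signals = reproducibility_signals(text)
print(signals, reproducibility_score(signals))
```

An unweighted fraction is the simplest possible score. A real system would likely weight signals differently and report the matched evidence alongside each flag.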

🚀 Getting Started

1. Clone the repository

git clone https://github.com/yourusername/paperpilot.git
cd paperpilot

2. Install dependencies

pip install -r requirements.txt

3. Run the app

streamlit run frontend/app.py
