Commit 5c26f07

Add hallucination detection and refactor bib editing
1 parent edd6c49 commit 5c26f07

21 files changed

Lines changed: 900 additions & 597 deletions

.claude-plugin/marketplace.json

Lines changed: 1 addition & 1 deletion
@@ -13,7 +13,7 @@
 "name": "bibtools",
 "source": "./",
 "description": "A bibliography toolkit for LaTeX",
-"version": "1.3.0",
+"version": "1.4.0",
 "keywords": ["bibtex", "bibliography", "latex", "overleaf", "academic", "reference", "citation"],
 "category": "academic",
 "license": "MIT"

.claude-plugin/plugin.json

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 {
 "name": "bibtools",
 "description": "A bibliography toolkit for LaTeX",
-"version": "1.3.0",
+"version": "1.4.0",
 "author": {
 "name": "Yunguan Fu"
 },

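Both manifests above move from 1.3.0 to 1.4.0 in this commit, and CLAUDE.md lists `tests/test_version.py` as a "version sync check". That test's contents are not part of this diff, so the following is only a minimal sketch of how such a check could work. The helper names `collect_versions` and `versions_in_sync` are hypothetical, and so is the assumption that every `"version"` key in either manifest refers to the plugin version.

```python
import json


def collect_versions(node):
    """Return every string stored under a "version" key, at any nesting depth."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "version" and isinstance(value, str):
                found.append(value)
            else:
                found.extend(collect_versions(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(collect_versions(item))
    return found


def versions_in_sync(*json_texts):
    """True when all manifests together declare exactly one distinct version."""
    versions = set()
    for text in json_texts:
        versions.update(collect_versions(json.loads(text)))
    return len(versions) == 1


# Hypothetical manifest contents; the real files contain more fields.
marketplace = '{"plugins": [{"name": "bibtools", "version": "1.4.0"}]}'
plugin = '{"name": "bibtools", "version": "1.4.0"}'
print(versions_in_sync(marketplace, plugin))
```

Searching recursively avoids hard-coding where the version sits inside `marketplace.json`, whose full structure is not visible in the hunk above.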
.github/workflows/lint.yml

Lines changed: 11 additions & 1 deletion
@@ -1,4 +1,4 @@
-name: Lint
+name: CI

 on:
   push:
@@ -14,3 +14,13 @@ jobs:
         with:
           python-version: "3.12"
       - uses: pre-commit/action@v3.0.1
+
+  test:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - uses: actions/setup-python@v5
+        with:
+          python-version: "3.12"
+      - uses: astral-sh/setup-uv@v5
+      - run: uv run pytest tests/ -v

CLAUDE.md

Lines changed: 2 additions & 3 deletions
@@ -16,9 +16,8 @@ bibtools/
 │ ├── compare.py ← field-level comparison
 │ ├── crossref.py ← CrossRef API client
 │ ├── duplicates.py ← duplicate detector
-│ └── fmt.py ← output format validator
+│ └── edit.py ← programmatic .bib editor
 ├── tests/
-│ ├── conftest.py ← pytest path setup
 │ ├── test_version.py ← version sync check
 │ ├── run_bibtidy_tests.sh ← end-to-end test runner
 │ └── bibtidy/
@@ -29,7 +28,7 @@ bibtools/
 │ ├── test_compare.py ← unit tests for compare.py
 │ ├── test_crossref.py ← unit tests for crossref.py
 │ ├── test_duplicates.py ← unit tests for duplicates.py
-│ ├── test_fmt.py ← unit tests for fmt.py
+│ ├── test_edit.py ← unit tests for edit.py
 │ └── test_validate.py ← unit tests for validate.py
 ├── pyproject.toml ← project config and pytest settings
 ├── CLAUDE.md

README.md

Lines changed: 92 additions & 23 deletions
@@ -2,7 +2,7 @@

 A bibliography toolkit for LaTeX, built as a [Claude Code](https://docs.anthropic.com/en/docs/claude-code) plugin.

-- **[bibtidy](#bibtidy)** — Cross-check BibTeX entries against Google Scholar, CrossRef, and conference/journal sites. Upgrades arXiv/bioRxiv preprints to published versions (even when the title changed upon publication), corrects metadata (authors, pages, venues), and flags semantic duplicates (e.g. a preprint and its published version cited separately).
+**[bibtidy](#bibtidy)** — Cross-check BibTeX entries against Google Scholar, CrossRef, and conference/journal sites. Upgrades arXiv/bioRxiv preprints to published versions (even when the title changed upon publication), corrects metadata (authors, pages, venues), and flags semantic duplicates (e.g. a preprint and its published version cited separately).

 ![bibtidy demo](docs/bibtidy_demo.gif)

@@ -32,15 +32,74 @@ Reload plugins:
 /bibtidy refs.bib
 ```

-bibtidy verifies each entry against [Google Scholar](https://scholar.google.com/) and [CrossRef](https://search.crossref.org/), fixes errors, and upgrades stale preprints to published versions. Every change includes the original entry commented out above so you can compare or revert, plus a `% bibtidy: source` URL for verification. If CrossRef has a match for an entry that bibtidy changes, it also adds `% bibtidy: crossref <URL>` so you can see exactly which CrossRef record was available. We recommend using git to track changes. If using [Overleaf](https://www.overleaf.com/), this can be done with [git sync](https://docs.overleaf.com/integrations-and-add-ons/git-integration-and-github-synchronization). To remove bibtidy comments after review, ask Claude: "remove all bibtidy comments from refs.bib".
+bibtidy verifies each entry against [Google Scholar](https://scholar.google.com/) and [CrossRef](https://search.crossref.org/), fixes errors, and upgrades stale preprints to published versions. Every change includes the original entry commented out above so you can compare or revert, plus a `% bibtidy:` URL for verification. We recommend using git to track changes. If using [Overleaf](https://www.overleaf.com/), this can be done with [git sync](https://docs.overleaf.com/integrations-and-add-ons/git-integration-and-github-synchronization). To remove bibtidy comments after review, ask Claude: "remove all bibtidy comments from refs.bib".

 Note that bibtidy assumes standard brace-style BibTeX like `@article{...}`. Parenthesized forms like `@article(...)` are not supported; convert them to brace style first.


 ### Examples

 <details>
-<summary><b>Example 1</b>: Google Scholar adds editors as co-authors (<a href="https://scholar.google.co.uk/scholar?hl=en&as_sdt=0%2C5&q=Estimation+of+non-normalized+statistical+models+by+score+matching&btnG=">source</a>)</summary>
+<summary><b>Example 1</b>: Hallucinated reference flagged and commented out (<a href="https://openreview.net/forum?id=75SJoY9gTN">source</a>)</summary>
+
+Before:
+```bibtex
+@article{wang2021identity,
+title={On the identity of the representation learned by pre-trained language models},
+author={Wang, Zijie J and Choi, Yuhao and Wei, Dongyeop},
+journal={arXiv preprint arXiv:2109.01819},
+year={2021}
+}
+```
+
+After:
+```bibtex
+% bibtidy: NOT FOUND — no matching paper on CrossRef or web search; verify this reference exists
+% @article{wang2021identity,
+% title={On the identity of the representation learned by pre-trained language models},
+% author={Wang, Zijie J and Choi, Yuhao and Wei, Dongyeop},
+% journal={arXiv preprint arXiv:2109.01819},
+% year={2021}
+% }
+```
+
+</details>
+
+<details>
+<summary><b>Example 2</b>: Hallucinated metadata corrected (<a href="https://openreview.net/forum?id=HSi4VetQLj">source</a>)</summary>
+
+Before:
+```bibtex
+@inproceedings{aichberger2025semantically,
+title={Semantically Diverse Language Generation},
+author={Aichberger, Franz and Chen, Lily and Smith, John},
+booktitle={International Conference on Learning Representations},
+year={2025}
+}
+```
+
+After:
+```bibtex
+% @inproceedings{aichberger2025semantically,
+% title={Semantically Diverse Language Generation},
+% author={Aichberger, Franz and Chen, Lily and Smith, John},
+% booktitle={International Conference on Learning Representations},
+% year={2025}
+% }
+% bibtidy: https://openreview.net/forum?id=HSi4VetQLj
+% bibtidy: corrected title and authors
+@inproceedings{aichberger2025semantically,
+title={Improving Uncertainty Estimation through Semantically Diverse Language Generation},
+author={Aichberger, Lukas and Schweighofer, Kajetan and Ielanskyi, Mykyta and Hochreiter, Sepp},
+booktitle={International Conference on Learning Representations},
+year={2025}
+}
+```
+
+</details>
+
+<details>
+<summary><b>Example 3</b>: Google Scholar adds editors as co-authors (<a href="https://scholar.google.co.uk/scholar?hl=en&as_sdt=0%2C5&q=Estimation+of+non-normalized+statistical+models+by+score+matching&btnG=">source</a>)</summary>

 Before:
 ```bibtex
@@ -64,7 +123,7 @@ After:
 % number={4},
 % year={2005}
 % }
-% bibtidy: source https://jmlr.org/papers/v6/hyvarinen05a.html
+% bibtidy: https://jmlr.org/papers/v6/hyvarinen05a.html
 % bibtidy: removed "Dayan, Peter" — journal editor, not co-author; number 4 → 24
 @article{hyvarinen2005estimation,
 title={Estimation of non-normalized statistical models by score matching},
@@ -79,7 +138,7 @@ After:
 </details>

 <details>
-<summary><b>Example 2</b>: arXiv preprint upgraded to published version (<a href="https://scholar.google.co.uk/scholar?hl=en&as_sdt=0%2C5&q=Flow+matching+for+generative+modeling&btnG=">source</a>)</summary>
+<summary><b>Example 4</b>: arXiv preprint upgraded to published version (<a href="https://scholar.google.co.uk/scholar?hl=en&as_sdt=0%2C5&q=Flow+matching+for+generative+modeling&btnG=">source</a>)</summary>

 Before:
 ```bibtex
@@ -99,7 +158,7 @@ After:
 % journal={arXiv preprint arXiv:2210.02747},
 % year={2022}
 % }
-% bibtidy: source https://openreview.net/forum?id=PqvMRDCJT9t
+% bibtidy: https://openreview.net/forum?id=PqvMRDCJT9t
 % bibtidy: published at ICLR 2023 (was arXiv preprint)
 @inproceedings{lipman2022flow,
 title={Flow matching for generative modeling},
@@ -112,7 +171,7 @@ After:
 </details>

 <details>
-<summary><b>Example 3</b>: arXiv preprint upgraded to published version with title change</summary>
+<summary><b>Example 5</b>: arXiv preprint upgraded to published version with title change</summary>

 Before:
 ```bibtex
@@ -132,8 +191,7 @@ After:
 % journal={arXiv preprint arXiv:2211.03364},
 % year={2022}
 % }
-% bibtidy: source https://doi.org/10.1038/s41598-023-34341-2
-% bibtidy: crossref https://doi.org/10.1038/s41598-023-34341-2
+% bibtidy: https://doi.org/10.1038/s41598-023-34341-2
 % bibtidy: updated from arXiv to published version (Scientific Reports 2023), title updated
 @article{khader2022medical,
 title={Denoising Diffusion Probabilistic Models for 3D Medical Image Generation},
@@ -147,7 +205,7 @@ After:
 </details>

 <details>
-<summary><b>Example 4</b>: Wrong page numbers corrected via CrossRef (<a href="https://scholar.google.co.uk/scholar?hl=en&as_sdt=0%2C5&q=Segmenter%3A+Transformer+for+semantic+segmentation&btnG=">source</a>)</summary>
+<summary><b>Example 6</b>: Wrong page numbers corrected via CrossRef (<a href="https://scholar.google.co.uk/scholar?hl=en&as_sdt=0%2C5&q=Segmenter%3A+Transformer+for+semantic+segmentation&btnG=">source</a>)</summary>

 Before:
 ```bibtex
@@ -169,8 +227,7 @@ After:
 % pages={7262--7272},
 % year={2021}
 % }
-% bibtidy: source https://doi.org/10.1109/iccv48922.2021.00717
-% bibtidy: crossref https://doi.org/10.1109/iccv48922.2021.00717
+% bibtidy: https://doi.org/10.1109/iccv48922.2021.00717
 % bibtidy: corrected page range 7262--7272 → 7242--7252
 @inproceedings{strudel2021segmenter,
 title={Segmenter: Transformer for semantic segmentation},
@@ -184,7 +241,7 @@ After:
 </details>

 <details>
-<summary><b>Example 5</b>: bioRxiv preprint duplicated with published version</summary>
+<summary><b>Example 7</b>: bioRxiv preprint duplicated with published version</summary>

 Before:
 ```bibtex
@@ -235,25 +292,37 @@ After:

 ## FAQ

-**How can I trust the output?**
-
-You shouldn't — and that's by design. The point of bibtidy is to surface potential hallucinations and errors in your bibliography. For every changed entry, bibtidy includes a `% bibtidy: source` URL so you can verify the correction yourself. Entries marked unchanged are very likely correct, but not guaranteed. Always check the provided links before accepting changes.
+### General

-**Why does bibtidy flag so many page number errors?**
+**Do I need Claude Code?**

-Google Scholar extracts metadata by scraping PDFs rather than querying publisher databases, so page numbers are frequently incorrect. Even official sources can disagree — for example, the same CVPR 2020 paper "Momentum Contrast for Unsupervised Visual Representation Learning" has pages 9729--9738 on [CVF Open Access](https://openaccess.thecvf.com/content_CVPR_2020/html/He_Momentum_Contrast_for_Unsupervised_Visual_Representation_Learning_CVPR_2020_paper.html) but pages 9726--9735 on [IEEE Xplore](https://ieeexplore.ieee.org/document/9157636), because IEEE re-paginates when compiling the full proceedings volume. bibtidy uses CrossRef as the authoritative source for page numbers. CrossRef gets metadata directly from publishers via DOI registration, so for IEEE/CVF conferences it returns the IEEE Xplore pagination (9726--9735 in the example above). When sources conflict, bibtidy applies the DOI-linked version and flags the entry with `% bibtidy: REVIEW` so you can verify.
+Yes. bibtools is currently a Claude Code plugin only. If there's demand to support other platforms (e.g. Codex), we'll consider adding it.

 **Why a Claude Code plugin instead of a Python package?**

-The core challenge is reliable access to bibliographic data:
+Building on Claude Code keeps the codebase small: the plugin reuses existing search and editing capabilities rather than reimplementing HTTP clients, parsers, and retry logic.

-- **bibtidy** needs to search Google Scholar, CrossRef, and conference/journal sites. Google Scholar has no official API and bans scrapers; Semantic Scholar's public API (1,000 req/s) is shared globally so availability is unpredictable. Claude Code's built-in web search sidesteps both problems no API keys, no shared rate limits. Citation metadata (title, authors, venue, year) is almost never behind a paywall, so Claude can simply visit the publisher page and read the correct information.
+bibtidy needs to search Google Scholar, CrossRef, and conference/journal sites. Google Scholar has no official API and bans scrapers; the rate limit on Semantic Scholar's public API (1,000 req/s) is shared globally, so availability is unpredictable. Claude Code's built-in web search sidesteps both problems: no API keys and no shared rate limits. Citation metadata (title, authors, venue, year) is almost never behind a paywall, so Claude can simply visit the publisher page and read the correct information.

-Building on Claude Code also keeps the codebase small — the plugin reuses existing search and editing capabilities rather than reimplementing HTTP clients, parsers, and retry logic.
+### bibtidy

-**Do I need Claude Code?**
+**How can I trust bibtidy's output?**
+
+You shouldn't, and that's by design. The point of bibtidy is to surface potential hallucinations and errors in your bibliography. For every changed entry, bibtidy includes a `% bibtidy:` URL so you can verify the correction yourself. Entries marked unchanged are very likely correct, but not guaranteed. Always check the provided links before accepting changes.
+
+**How does bibtidy compare to other tools?**
+
+[CiteAudit](https://arxiv.org/abs/2602.23452) verifies bibliographic metadata but is a closed system. bibtidy is fully open-source, transparent (every change includes the original entry commented out and a source URL so you can verify exactly what changed and why), and it fixes issues (wrong authors, stale preprints, incorrect pages) directly in your .bib file rather than just flagging them.
+
+[refchecker](https://github.com/markrussinovich/refchecker) verifies references against Semantic Scholar, OpenAlex, and CrossRef, and uses LLM-powered web search to flag fabricated references. It reports problems but does not auto-fix them. bibtidy applies corrections in place so you review a diff, not a report. bibtidy also upgrades stale arXiv/bioRxiv preprints to their published versions (even when the title changed on publication), and requires no setup beyond installing the plugin.
+
+[bibtex-tidy](https://github.com/FlamingTempura/bibtex-tidy) reformats and deduplicates .bib files but does not verify metadata against external sources. bibtidy checks correctness, not just formatting.
+
+[arxiv-latex-cleaner](https://github.com/google-research/arxiv-latex-cleaner) is a file cleanup tool for arXiv submissions (removing comments, resizing figures, etc.); it does not verify or correct any bibliographic metadata.
+
+**Why does bibtidy flag so many page number errors?**

-Yes. bibtidy is currently a Claude Code plugin only. If there's demand to support other platforms (e.g. Codex), we'll consider adding it.
+Google Scholar extracts metadata by scraping PDFs rather than querying publisher databases, so page numbers are frequently incorrect. Even official sources can disagree. For example, the CVPR 2020 paper "Momentum Contrast for Unsupervised Visual Representation Learning" has pages 9729--9738 on [CVF Open Access](https://openaccess.thecvf.com/content_CVPR_2020/html/He_Momentum_Contrast_for_Unsupervised_Visual_Representation_Learning_CVPR_2020_paper.html) but pages 9726--9735 on [IEEE Xplore](https://ieeexplore.ieee.org/document/9157636), because IEEE re-paginates when compiling the full proceedings volume. bibtidy uses CrossRef as the authoritative source for page numbers. CrossRef gets metadata directly from publishers via DOI registration, so for IEEE/CVF conferences it returns the IEEE Xplore pagination (9726--9735 in the example above). When sources conflict, bibtidy applies the DOI-linked version and flags the entry with `% bibtidy: REVIEW` so you can verify.

 ## License

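The README changes in this commit describe a comment convention: the original entry is commented out line by line with `%`, and one or more `% bibtidy:` notes accompany it. The README suggests asking Claude to remove these comments after review. As an illustration only, the same convention can be sketched in plain Python. The function names `comment_out_entry` and `strip_bibtidy_comments` are hypothetical and are not taken from the repository's `edit.py`, whose contents are not shown here.

```python
import re

# Patterns for the comment convention shown in the README examples.
BIBTIDY_NOTE = re.compile(r"^\s*%\s*bibtidy:")
COMMENTED_ENTRY_START = re.compile(r"^\s*%\s*@\w+\s*\{")
COMMENTED_ENTRY_END = re.compile(r"^\s*%\s*\}\s*$")


def comment_out_entry(entry: str, note: str) -> str:
    """Prefix each line of a BibTeX entry with '% ' and put a bibtidy note on top."""
    commented = [f"% {line}" for line in entry.strip().splitlines()]
    return "\n".join([f"% bibtidy: {note}", *commented])


def strip_bibtidy_comments(text: str) -> str:
    """Drop '% bibtidy:' notes and commented-out original entries."""
    kept = []
    inside_commented_entry = False
    for line in text.splitlines():
        if inside_commented_entry:
            # Skip lines until the commented closing brace of the old entry.
            if COMMENTED_ENTRY_END.match(line):
                inside_commented_entry = False
            continue
        if COMMENTED_ENTRY_START.match(line):
            # A commented entry written on a single line ends with '}' already.
            inside_commented_entry = not line.rstrip().endswith("}")
            continue
        if BIBTIDY_NOTE.match(line):
            continue
        kept.append(line)
    return "\n".join(kept)


flagged = comment_out_entry("@article{b,\n  year={2021}\n}", "NOT FOUND")
print(flagged)
```

Note that stripping also removes entries flagged `NOT FOUND`, because they exist only in commented-out form; review those before cleaning up. The sketch assumes the brace-style entries the README requires and does not handle a field value that itself spans a line consisting of a lone `}`.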