A bibliography toolkit for LaTeX, built as a [Claude Code](https://docs.anthropic.com/en/docs/claude-code) plugin.
**[bibtidy](#bibtidy)** — Cross-check BibTeX entries against Google Scholar, CrossRef, and conference/journal sites. Upgrades arXiv/bioRxiv preprints to published versions (even when the title changed upon publication), corrects metadata (authors, pages, venues), and flags semantic duplicates (e.g. a preprint and its published version cited separately).
/bibtidy refs.bib
```
bibtidy verifies each entry against [Google Scholar](https://scholar.google.com/) and [CrossRef](https://search.crossref.org/), fixes errors, and upgrades stale preprints to published versions. Every change includes the original entry commented out above it, so you can compare or revert, plus a `% bibtidy:` comment with a source URL for verification. We recommend using git to track changes; if you use [Overleaf](https://www.overleaf.com/), you can do this with [git sync](https://docs.overleaf.com/integrations-and-add-ons/git-integration-and-github-synchronization). To remove bibtidy comments after review, ask Claude: "remove all bibtidy comments from refs.bib".
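
If you would rather clean up locally, here is a minimal Python sketch (not part of bibtidy) that drops the annotation lines. It assumes every annotation starts with `% bibtidy:` at the beginning of a line, as in the examples below; the function name `strip_bibtidy_annotations` is hypothetical. Commented-out original entries are deliberately left in place so you can delete them yourself once you have reviewed the changes.

```python
import re

# Assumption: every bibtidy annotation line begins with "% bibtidy:",
# optionally preceded by whitespace (e.g. "% bibtidy: source https://...").
ANNOTATION = re.compile(r"^\s*%\s*bibtidy:")


def strip_bibtidy_annotations(text: str) -> str:
    """Return text with bibtidy annotation lines removed.

    All other lines, including commented-out original entries, are kept
    unchanged, and original line endings are preserved.
    """
    kept = [
        line
        for line in text.splitlines(keepends=True)
        if not ANNOTATION.match(line)
    ]
    return "".join(kept)


if __name__ == "__main__":
    sample = (
        "% bibtidy: source https://example.org/paper\n"
        "@article{key,\n"
        "  title={A title},\n"
        "}\n"
    )
    print(strip_bibtidy_annotations(sample), end="")
```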
Note that bibtidy assumes standard brace-style BibTeX like `@article{...}`. Parenthesized forms like `@article(...)` are not supported; convert them to brace style first.
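
One way to do the conversion is a small script. The following is a minimal Python sketch, not part of bibtidy; `parens_to_braces` is a hypothetical helper. It assumes `@type(` only appears where an entry starts, and it rewrites only each entry's outer delimiters, ignoring parentheses inside braced or quoted field values.

```python
import re
from typing import Optional

# Start of a parenthesized entry such as "@article(" or "@misc (".
# Assumption: "@word(" only ever appears at the start of an entry.
ENTRY_OPEN = re.compile(r"@([A-Za-z]+)\s*\(")


def _find_closing_paren(text: str, start: int) -> Optional[int]:
    """Index of the ')' that closes an entry whose '(' ends at `start`.

    Parentheses inside {...} or "..." field values are ignored, and
    balanced bare parentheses are skipped by depth counting.
    """
    brace_depth = 0
    paren_depth = 0
    in_quote = False
    for i in range(start, len(text)):
        c = text[i]
        if c == "{":
            brace_depth += 1
        elif c == "}":
            brace_depth -= 1
        elif brace_depth == 0 and c == '"':
            in_quote = not in_quote
        elif brace_depth == 0 and not in_quote:
            if c == "(":
                paren_depth += 1
            elif c == ")":
                if paren_depth == 0:
                    return i
                paren_depth -= 1
    return None  # unbalanced entry


def parens_to_braces(text: str) -> str:
    """Rewrite @type(...) entries as @type{...}; brace entries are untouched."""
    out = []
    pos = 0  # everything before pos has already been copied to out
    for m in ENTRY_OPEN.finditer(text):
        if m.start() < pos:
            continue  # match lies inside an entry we already converted
        close = _find_closing_paren(text, m.end())
        if close is None:
            continue  # leave malformed entries as they are
        out.append(text[pos:m.start()])
        out.append("@" + m.group(1) + "{")
        out.append(text[m.end():close])
        out.append("}")
        pos = close + 1
    out.append(text[pos:])
    return "".join(out)
```

For example, `@article(key, title = {A :)})` becomes `@article{key, title = {A :)}}`: the unbalanced `)` inside the braced title is left alone because parentheses are only counted outside braces and quotes.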
### Examples
<details>
<summary><b>Example 1</b>: Hallucinated reference flagged and commented out (<a href="https://openreview.net/forum?id=75SJoY9gTN">source</a>)</summary>

Before:
```bibtex
@article{wang2021identity,
title={On the identity of the representation learned by pre-trained language models},
author={Wang, Zijie J and Choi, Yuhao and Wei, Dongyeop},
journal={arXiv preprint arXiv:2109.01819},
year={2021}
}
```

After:
```bibtex
% bibtidy: NOT FOUND — no matching paper on CrossRef or web search; verify this reference exists
% @article{wang2021identity,
% title={On the identity of the representation learned by pre-trained language models},
% author={Wang, Zijie J and Choi, Yuhao and Wei, Dongyeop},
title={Improving Uncertainty Estimation through Semantically Diverse Language Generation},
author={Aichberger, Lukas and Schweighofer, Kajetan and Ielanskyi, Mykyta and Hochreiter, Sepp},
booktitle={International Conference on Learning Representations},
year={2025}
}
```

</details>

<details>
<summary><b>Example 3</b>: Google Scholar adds editors as co-authors (<a href="https://scholar.google.co.uk/scholar?hl=en&as_sdt=0%2C5&q=Estimation+of+non-normalized+statistical+models+by+score+matching&btnG=">source</a>)</summary>
% bibtidy: removed "Dayan, Peter" — journal editor, not co-author; number 4 → 24
@article{hyvarinen2005estimation,
title={Estimation of non-normalized statistical models by score matching},
</details>

<details>
<summary><b>Example 4</b>: arXiv preprint upgraded to published version (<a href="https://scholar.google.co.uk/scholar?hl=en&as_sdt=0%2C5&q=Flow+matching+for+generative+modeling&btnG=">source</a>)</summary>
% bibtidy: corrected page range 7262--7272 → 7242--7252
@inproceedings{strudel2021segmenter,
title={Segmenter: Transformer for semantic segmentation},
</details>

<details>
<summary><b>Example 7</b>: bioRxiv preprint duplicated with published version</summary>
Before:
```bibtex
## FAQ
### General
**Do I need Claude Code?**
Yes. bibtools is currently a Claude Code plugin only. If there's demand to support other platforms (e.g. Codex), we'll consider adding it.
**Why a Claude Code plugin instead of a Python package?**
Building on Claude Code keeps the codebase small: the plugin reuses existing search and editing capabilities rather than reimplementing HTTP clients, parsers, and retry logic.
bibtidy needs to search Google Scholar, CrossRef, and conference/journal sites. Google Scholar has no official API and bans scrapers; Semantic Scholar's public API (1,000 req/s) is shared globally, so availability is unpredictable. Claude Code's built-in web search sidesteps both problems: no API keys and no shared rate limits. Citation metadata (title, authors, venue, year) is almost never behind a paywall, so Claude can simply visit the publisher page and read the correct information.
### bibtidy
**How can I trust bibtidy's output?**
You shouldn't, and that's by design. The point of bibtidy is to surface potential hallucinations and errors in your bibliography. For every changed entry, bibtidy includes a `% bibtidy:` comment with a source URL so you can verify the correction yourself. Entries left unchanged are very likely correct, but this is not guaranteed. Always check the provided links before accepting changes.
**How does bibtidy compare to other tools?**
[CiteAudit](https://arxiv.org/abs/2602.23452) verifies bibliographic metadata but is a closed system. bibtidy is fully open-source, transparent (every change includes the original entry commented out and a source URL so you can verify exactly what changed and why), and it fixes issues (wrong authors, stale preprints, incorrect pages) directly in your .bib file rather than just flagging them.
[refchecker](https://github.com/markrussinovich/refchecker) verifies references against Semantic Scholar, OpenAlex, and CrossRef, and uses LLM-powered web search to flag fabricated references. It reports problems but does not auto-fix them. bibtidy applies corrections in place so you review a diff, not a report. bibtidy also upgrades stale arXiv/bioRxiv preprints to their published versions (even when the title changed on publication), and requires no setup beyond installing the plugin.
[bibtex-tidy](https://github.com/FlamingTempura/bibtex-tidy) reformats and deduplicates .bib files but does not verify metadata against external sources. bibtidy checks correctness, not just formatting.
[arxiv-latex-cleaner](https://github.com/google-research/arxiv-latex-cleaner) is a file cleanup tool for arXiv submissions (removing comments, resizing figures, etc.); it does not verify or correct any bibliographic metadata.
**Why does bibtidy flag so many page number errors?**
Google Scholar extracts metadata by scraping PDFs rather than querying publisher databases, so its page numbers are frequently incorrect. Even official sources can disagree. For example, the same CVPR 2020 paper "Momentum Contrast for Unsupervised Visual Representation Learning" has pages 9729--9738 on [CVF Open Access](https://openaccess.thecvf.com/content_CVPR_2020/html/He_Momentum_Contrast_for_Unsupervised_Visual_Representation_Learning_CVPR_2020_paper.html) but pages 9726--9735 on [IEEE Xplore](https://ieeexplore.ieee.org/document/9157636), because IEEE re-paginates when compiling the full proceedings volume. bibtidy uses CrossRef as the authoritative source for page numbers. CrossRef gets metadata directly from publishers via DOI registration, so for IEEE/CVF conferences it returns the IEEE Xplore pagination (9726--9735 in the example above). When sources conflict, bibtidy applies the DOI-linked version and flags the entry with `% bibtidy: REVIEW` so you can verify it.
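
The page-number policy can be sketched in a few lines of Python. This is an illustration under stated assumptions, not bibtidy's actual code: `crossref_pages`, `normalize_pages` and `reconcile_pages` are hypothetical helpers, the lookup reads the `page` field of a work record from the CrossRef REST API by DOI, and the wording of the generated REVIEW comment is made up for the example.

```python
import json
import re
import urllib.parse
import urllib.request
from typing import Dict, Optional, Tuple


def crossref_pages(doi: str) -> Optional[str]:
    """Look up the 'page' field of a DOI's CrossRef record (needs network)."""
    url = "https://api.crossref.org/works/" + urllib.parse.quote(doi)
    with urllib.request.urlopen(url, timeout=10) as resp:
        record = json.load(resp)
    return record["message"].get("page")


def normalize_pages(pages: str) -> str:
    """Write a page range BibTeX-style, e.g. '9726-9735' -> '9726--9735'."""
    # Split on runs of hyphens, en dashes or em dashes, with optional spaces.
    parts = re.split(r"\s*(?:-+|\u2013|\u2014)\s*", pages.strip())
    return "--".join(p for p in parts if p)


def reconcile_pages(
    crossref: Optional[str], others: Dict[str, str]
) -> Tuple[Optional[str], Optional[str]]:
    """Prefer CrossRef (DOI-linked) pages; flag for review on disagreement.

    Returns (pages_to_apply, review_comment). Without a CrossRef value
    there is nothing authoritative to apply, so (None, None) is returned.
    """
    if crossref is None:
        return None, None
    chosen = normalize_pages(crossref)
    disagreeing = sorted(
        name for name, p in others.items() if normalize_pages(p) != chosen
    )
    if not disagreeing:
        return chosen, None
    return chosen, "% bibtidy: REVIEW pages differ from " + ", ".join(disagreeing)
```

With the page ranges quoted above, `reconcile_pages("9726-9735", {"CVF Open Access": "9729--9738"})` keeps the DOI-linked `9726--9735` and returns a REVIEW comment naming CVF Open Access, while agreeing sources produce no comment. Keeping the network call separate from the comparison makes the policy easy to test offline.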