gix-blame: Incremental#2457
Conversation
21e316d to
b65f15b
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 21e316d86c
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "Codex (@codex) review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "Codex (@codex) address that feedback".
52dd978 to
e6b8998
Compare
Christoph Rüßler (cruessler)
left a comment
There was a problem hiding this comment.
Thanks a lot for the effort, that’s much appreciated as making blame incremental was on our TODO list as well! I had a first look at the changes and left 2 comments (though this is far from an in-depth review).
The `git commit-graph write` command also supports writing a separate section on the cache file that contains information about the paths changed between a commit and its first parent. This information can be used to significantly speed up the performance of some traversal operations, such as `git log -- <PATH>` and `git blame`. This commit teaches the git-commitgraph crate in gitoxide how to parse and access this information. We've only implemented support for reading v2 of this cache, because v1 is deprecated in Git as it can return bad results in some corner cases. The implementation is 100% compatible with Git itself; it uses the exact same version of murmur3 that Git is using, including the seed hashes.
Implement a gix_blame::incremental API that yelds the blame entries as they're discovered, similarly to Git's `git blame --incremental`. The implementation simply takes the original gix_blame::file and replaces the Vec of blame entries with a generic BlameSink trait. The original gix_blame::file is now implemented as a wrapper for gix_blame::incremental, by implementing the BlameSink trait on Vec<BlameEntry> and sorting + coalescing the entries before returning.
Use the new changed-path bloom filters from the commit graph to greatly speed up blame our implementation. Whenever we find a rejection on the bloom filter for the current path, we skip it altogether and pass the blame without diffing the trees.
Implement the log_file method in gitoxide-core, which allows performing path-delimited log commands. With the new changed paths bloom filter, it is not possible to perform this operation very efficiently.
Change `process_changes` to take `&[Change]` instead of `Vec<Change>`, eliminating the `changes.clone()` heap allocation at every call site. Replace the O(H×C) restart-from-beginning approach with a cursor that advances through the changes list across hunks. Non-suspect hunks are now skipped immediately. When the rare case of overlapping suspect ranges is detected (from merge blame convergence), the cursor safely resets to maintain correctness.
Compare the performance of the implementation with and without the
commit graph cache.
gix-blame::incremental/without-commit-graph
time: [14.852 s 14.895 s 14.944 s]
change: [+0.2968% +0.7623% +1.2529%] (p = 0.00 < 0.05)
Change within noise threshold.
gix-blame::incremental/with-commit-graph
time: [287.55 ms 290.30 ms 292.85 ms]
change: [−3.1181% −1.6720% −0.4502%] (p = 0.11 > 0.05)
No change in performance detected.
Signed-off-by: Vicent Marti <vmg@strn.cat>
The BlameSink type now returns a std::ops::ControlFlow value that can be used to interrupt the blame early. Signed-off-by: Vicent Marti <vmg@strn.cat>
e6b8998 to
a85c1fe
Compare
|
Thanks for the review Christoph Rüßler (@cruessler)! I did the two changes; interrupting the incremental blame is a little bit involved and makes the code harder to follow so I've kept it in a separate commit. Let me know what else would you like to see changed. |
|
Anything else I could do to move this PR forward, Sebastian Thiel (@Byron)? I think it'd a nice addition upstream and we would very much prefer not to carry a Gitoxide fork. Thank you so much in advance! |
|
Hi, I'd also like to get this merged, and it seems there is more and more code coming up and never enough time to review. These days that's all I am doing 😅. Codex (@codex) review |
There was a problem hiding this comment.
Pull request overview
This PR adds an incremental blame API to gix-blame (akin to git blame --incremental) and introduces commit-graph changed-path Bloom filter support in gix-commitgraph to dramatically speed up blame and path-focused history traversal.
Changes:
- Add
gix_blame::incremental()with aBlameSinkstreaming interface, and reimplementgix_blame::file()as a collecting wrapper. - Add commit-graph changed-path Bloom filter parsing and query APIs (plus compatibility handling for split commit-graphs).
- Extend tests/fixtures and add a Criterion benchmark for incremental blame.
Reviewed changes
Copilot reviewed 22 out of 23 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| gix-commitgraph/tests/fixtures/split_chain_top_without_bloom.sh | Fixture for split graphs where the top layer lacks Bloom chunks |
| gix-commitgraph/tests/fixtures/split_chain_changed_paths_mismatch.sh | Fixture for split graphs with mismatched Bloom settings across layers |
| gix-commitgraph/tests/fixtures/changed_paths_v2.sh | Fixture that writes changed-path Bloom filters v2 |
| gix-commitgraph/tests/access/mod.rs | Tests for Bloom settings reading + disabling when chunks/offsets are invalid |
| gix-commitgraph/src/lib.rs | Adds Bloom settings type, bloom module export, and shared from_be_u32() |
| gix-commitgraph/src/init.rs | Disables incompatible Bloom layers when composing multi-file graphs |
| gix-commitgraph/src/file/mod.rs | Adds chunk IDs/constants for Bloom data/index |
| gix-commitgraph/src/file/init.rs | Parses BDAT/BIDX chunks and enables Bloom querying when valid |
| gix-commitgraph/src/file/commit.rs | Exposes commit-level Bloom query helpers (maybe_contains_*) |
| gix-commitgraph/src/file/access.rs | Adds accessors and a method to clear Bloom filter state |
| gix-commitgraph/src/bloom.rs | Implements Bloom hashing and query logic + graph-level helpers |
| gix-commitgraph/src/access.rs | Exposes top-compatible Bloom settings at the Graph level |
| gix-commitgraph/Cargo.toml | Adds murmur3 dependency for Bloom hashing |
| gix-blame/tests/blame.rs | Tests that Bloom prefilter keeps blame output identical + incremental sink behavior |
| gix-blame/src/types.rs | Adds BlameSink, IncrementalOutcome, and Bloom-related statistics counters |
| gix-blame/src/lib.rs | Exports new incremental API and types |
| gix-blame/src/file/tests.rs | Updates tests for process_changes() signature change |
| gix-blame/src/file/mod.rs | Moves coalesce_blame_entries() here and optimizes process_changes() iteration |
| gix-blame/src/file/function.rs | Implements incremental(), adds Bloom prefilter usage, and keeps file() as wrapper |
| gix-blame/benches/incremental_blame.rs | Adds Criterion benchmark for incremental blame with/without commit-graph |
| gix-blame/Cargo.toml | Adds benchmark target + dev-deps for Criterion and gix |
| gitoxide-core/src/repository/log.rs | Implements path-filtered log using commit-graph Bloom prefilter |
| Cargo.lock | Locks new dependencies (murmur3, Criterion, dev-only gix) |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a85c1fe07e
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "Codex (@codex) review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "Codex (@codex) address that feedback".
|
Vicent Martí (@vmg) I’m really sorry I haven’t responded earlier, I had a few very busy weeks at work, so was lacking review capacity. Sebastian Thiel (@Byron) Things have gotten quieter this week, so I can prioritize reviewing the blame-related parts this weekend if you want an extra pair of eyes. |
|
No worries! I know you're all busy! I'm gonna go through all the AI comments first (I can see a couple legit ones) and then I'll hand it over so you guys can do human review. Thanks again. |
|
Thanks everyone! Yes, I will be happy to wait for your review Christoph Rüßler (@cruessler), and you can hand it over to me even with things that are still to be addressed, as long as you want to go forward with the changes. Then I will take it towards merging, and will also be looking forward to seeing more contributions and improvements from Vicent Martí (@vmg) 🙏. |
I’ve gone through the changes related to incrementally emitting hunks at a high level, and they look very promising to me! Looking forward to you handing over after you’ve gone through AI’s comments! One thing I was wondering: in |
1e220a3 to
3b7bff2
Compare
|
Just went through the AI review comments, they were all basically the same (good) point: since we're not supporting the deprecated v1 format, we should explicitly ensure that we don't load the bloom filter cache when it's v1. Let me know what else needs changing! |
3b7bff2 to
08a1ca9
Compare
Signed-off-by: Vicent Marti <vmg@strn.cat>
08a1ca9 to
5b94604
Compare
Hiiiiii Sebastian Thiel (@Byron)! Thanks for all your work on the library!
I've been playing around with the new blame APIs that Christoph Rüßler (@cruessler) developed. The existing
gix_blame::filewas not fitting the use case we needed at Cursor, so I took a stab at implementing an equivalent togit blame --incremental. The changes were quite minimal because I just leftgix_blame::fileas a thin wrapper overgix_blame::incremental.I then tried benchmarking the
incrementalAPI against Git itself and the numbers were not good at all. After some review, I noticed that thegix-commitgraphcrate just didn't support the changed-paths bloom filter cache from Git, so I took a stab at implementing those too.The results are very good. These are for
tools/clang/spanify/Spanifier.cppin the Chromium repository, which is a very very hairy file:Since all these changes are quite related, I'm putting them up here in a single PR. Every commit is self contained and explains the changes on the commit message so if you'd like me to split this into smaller PRs just let me know.
Thanks!