feat!: add hash_kind to TreeRefIter and Data#2497
feat!: add hash_kind to TreeRefIter and Data#2497Christoph Rüßler (cruessler) wants to merge 16 commits intoGitoxideLabs:mainfrom
hash_kind to TreeRefIter and Data#2497Conversation
Sebastian Thiel (Byron)
left a comment
There was a problem hiding this comment.
Thanks a lot! I think it's great to see that overall, passing the hash kind through isn't crazy. I also left a few comments which I hope can lead to a small direction change.
gix-object/src/object/mod.rs
Outdated
| impl<'a> ObjectRef<'a> { | ||
| /// Deserialize an object from a loose serialisation | ||
| pub fn from_loose(data: &'a [u8]) -> Result<ObjectRef<'a>, LooseDecodeError> { | ||
| pub fn from_loose(data: &'a [u8], hash_len: usize) -> Result<ObjectRef<'a>, LooseDecodeError> { |
There was a problem hiding this comment.
It's very uncommon to take a hash_len I think, and would expect this to be gix_hash::Kind as this is what it boils down to.
Let's do that instead unless there is a very good reason to use a hash_len.
There was a problem hiding this comment.
That makes way more sense, in particular as TreeRefIter and Data don’t grow in size that way.
(The main reason I initially picked hash_len was that it felt more targeted and precise, but in hindsight I agree that gix_hash::Kind is a much better way to carry that information.)
| #[derive(Debug)] | ||
| pub struct State { | ||
| pub hash_len: usize, | ||
| } | ||
|
|
||
| pub type Stream<'is> = Stateful<&'is [u8], State>; | ||
|
|
||
| pub fn tree<'a, E: ParserError<&'a [u8]>>(stream: &mut Stream<'a>) -> ModalResult<TreeRef<'a>, E> { |
There was a problem hiding this comment.
I'd love it if object_hash: gix_hash::Kind could be a parameter here with everything else unchanged.
Probably this was first attempted and this is what it needs to be for Winnow, but… honestly, maybe the slow version can even be removed in favor of fast_decode. I wonder if anything speaks against that.
There was a problem hiding this comment.
As you suspected, that’s what I tried first. :-) But winnow places certain constraints on what it accepts as a parsing function. Just looking at tree locally, I don’t see why it shouldn’t be possible to rewrite it not to need winnow anymore. I haven’t tried yet, but could, either in this PR or in a follow-up.
gix-object/src/lib.rs
Outdated
| /// TODO: | ||
| /// Document. | ||
| hash_len: usize, |
There was a problem hiding this comment.
To my mind it's fine to use hash_len if it stays private (after all, that's what the iter really wants). One can even then consider keeping Kind instead as the function to get its len is const for sure and thus has no chance of being worse than the whole usize.
There was a problem hiding this comment.
Changed to hash_kind. (Aside: this is called object_hash in other places, I started wondering whether it should also be called that here.)
Using hash_kind instead of hash_len also has the advantage that it can be packed and doesn’t grow the size of either TreeRefIter or Data.
gix-object/src/lib.rs
Outdated
| /// TODO: | ||
| /// Document. | ||
| pub hash_len: usize, |
There was a problem hiding this comment.
Definitely gix_hash::Kind here.
There was a problem hiding this comment.
Changed.
hash_len to TreeRefIter and Datahash_kind to TreeRefIter and Data
|
Sebastian Thiel (@Byron) I made some changes based on your feedback. The PR contains a lot of small commits now, feel free to shuffle them around as you see fit! The fuzzer seems to have found something, though at first sight, I wasn’t sure what to make of that finding. I’ll try figuring out what this is about, but maybe you’re more familiar with it and can quickly trace it back to one of my changes. I added one |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7383f0210e
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "Codex (@codex) review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "Codex (@codex) address that feedback".
| /// Parse our tree data and return the parse tree for direct access to its entries. | ||
| pub fn decode(&self) -> Result<gix_object::TreeRef<'_>, gix_object::decode::Error> { | ||
| gix_object::TreeRef::from_bytes(&self.data) | ||
| gix_object::TreeRef::from_bytes(&self.data, self.repo.object_hash()) |
There was a problem hiding this comment.
Decode tree bytes with the tree id hash kind
Tree::decode() now passes self.repo.object_hash() into TreeRef::from_bytes, but the rest of this type (e.g. find_entry()/iter()) uses self.id.kind(). This inconsistency can misdecode or reject valid tree data when a Tree is constructed via Tree::from_data() with an id kind that differs from the repository default (an allowed public API path), because tree entry parsing depends on hash length. Using self.id.kind() here keeps behavior consistent and tied to the actual object id being decoded.
Useful? React with 👍 / 👎.
This PR is an attempt at making the hash length in
gix_object::tree::ref_iter::decodedepend on the decoded tree’s hash’s size. Previously, it was hardcoded to 20 which is only correct for SHA1, but not for SHA256.There were quite a few follow-up changes as
TreeRef::from_bytesas well asTreeRefIter::from_bytesnow takehash_kindas an argument. Also, sinceTreeRefcan be instantiated inData::decode,Datanow also has to carryhash_kind. Another option that I didn’t explore would have been to addhash_kindas an argument toData::decode.Update 2026-04-04: I changed the initial addition of
hash_lentohash_kind. Tests are now passing, with the exception of the fuzzer having found an issue.