feat(component): Add deterministic component fingerprints#47
feat(component): Add deterministic component fingerprints#47dmcilvaney wants to merge 9 commits intomicrosoft:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
Adds a new internal fingerprinting utility intended to compute deterministic “component fingerprints” from resolved component configuration plus additional build context.
Changes:
- Introduces
internal/fingerprintpackage withComputeIdentityto produce a SHA256-based fingerprint and an inputs breakdown. - Adds a comprehensive test suite covering many config/input variations.
- Exports
OpenProjectRepoin synthetic history code and addshashstructure/v2dependency.
Reviewed changes
Copilot reviewed 5 out of 6 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| internal/fingerprint/fingerprint.go | New fingerprint computation logic (config hashing + overlay file hashing + SHA256 combiner). |
| internal/fingerprint/fingerprint_test.go | New tests asserting determinism and expected sensitivity/insensitivity to various inputs. |
| internal/fingerprint/doc.go | Package documentation for the new fingerprint module. |
| internal/app/azldev/core/sources/synthistory.go | Renames openProjectRepo to exported OpenProjectRepo and updates call site. |
| go.mod | Adds github.com/mitchellh/hashstructure/v2 dependency. |
| go.sum | Adds checksums for the new dependency. |
a7d028e to
3fb8df7
Compare
3fb8df7 to
759c099
Compare
0140c52 to
94e19ff
Compare
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
reubeno
left a comment
There was a problem hiding this comment.
Apart from the individual comments, my main question... what's your thoughts on how we can ensure that this algorithm stays in sync with the component and overlay definitions as the latter evolve?
internal/fingerprint/fingerprint.go
Outdated
| OverlayFileHashes map[string]string `json:"overlayFileHashes,omitempty"` | ||
| // AffectsCommitCount is the number of "Affects: <component>" commits in the project repo. | ||
| AffectsCommitCount int `json:"affectsCommitCount"` | ||
| // Distro is the effective distro name. |
There was a problem hiding this comment.
Why does this need to be tracked?
internal/fingerprint/fingerprint.go
Outdated
| func ComputeIdentity( | ||
| fs opctx.FS, | ||
| component projectconfig.ComponentConfig, | ||
| distroRef projectconfig.DistroReference, |
There was a problem hiding this comment.
Is this the distro that the component is being built for or the distro that the component's spec may have come from? Can we document and clarify?
There was a problem hiding this comment.
It should be the effective distro for the component, ie if one is set, pick that, otherwise the global default.
There was a problem hiding this comment.
That said, we can probably drop it to just ReleaseVer, that might be the only bit we really care about.
internal/fingerprint/fingerprint.go
Outdated
| hashes := make(map[string]string) | ||
|
|
||
| for idx, overlay := range overlays { | ||
| if overlay.Source == "" { |
There was a problem hiding this comment.
Since this logic needs to know specific properties of different overlay types -- is there a way for this logic to live closer to the implementation of the overlay types? It otherwise may not occur to someone (AI or human) to keep this in sync.
There was a problem hiding this comment.
Sure, I'll have a look
| // Overlay file hashes in sorted key order for determinism. | ||
| if len(inputs.OverlayFileHashes) > 0 { | ||
| keys := make([]string, 0, len(inputs.OverlayFileHashes)) | ||
| for key := range inputs.OverlayFileHashes { | ||
| keys = append(keys, key) | ||
| } | ||
|
|
||
| sort.Strings(keys) | ||
|
|
||
| for _, key := range keys { | ||
| writeField(hasher, "overlay:"+key, inputs.OverlayFileHashes[key]) | ||
| } | ||
| } |
There was a problem hiding this comment.
combineInputs sorts overlay indices as strings ("10" comes before "2"), so once there are >= 11 overlays the hash-combination order won’t match overlay order. Consider either writing overlay hashes in slice/index order directly, or sorting by integer index to keep ordering intuitive and stable across multi-digit indices.
There was a problem hiding this comment.
I don't think this matters? The fingerprint is an opaque value, so long as its stable and reproducible the exact ordering seems unimportant.
No description provided.