Configuration-aware hashing under --useCquery (parity with target-determinator) #359

@tinder-maxwellelliott

Background

Comparing bazel-diff against bazel-contrib/target-determinator (TD) surfaces one concrete correctness gap in our --useCquery mode: we drop the per-target configuration when hashing. Changes that affect only the configuration (a platform swap, --config=..., a cfg = "exec" transition flip, a select() branch chosen via --define) can therefore leave hashes unchanged even though the configured graph diverged.

Where we lose the configuration

  • cli/src/main/kotlin/com/bazel_diff/hash/RuleHasher.kt — at the cquery branch, configuredRuleInputList is flattened to bare labels via .map { it.label }; the per-edge configuration checksum carried in the proto is discarded.
  • cli/src/main/kotlin/com/bazel_diff/bazel/BazelQueryService.kt — we do pass --transitions=lite to cquery, but the transition data on the proto isn't consumed downstream.
  • cli/src/main/kotlin/com/bazel_diff/hash/TargetDigest.kt / TargetHash.kt — the digest is keyed by label only, so multiple configurations of the same label collide into one entry.
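
To make the collision in the last bullet concrete, here is a minimal sketch. ConfiguredTarget, keyByLabel and keyByLabelAndConfig are hypothetical names for illustration only; they are not the actual TargetDigest/TargetHash types.

```kotlin
// Hypothetical stand-in for one configured target as reported by cquery.
data class ConfiguredTarget(
    val label: String,
    val configChecksum: String,
    val hash: String
)

// Label-only keying (the shape described above): a later configuration of the
// same label silently overwrites the earlier one.
fun keyByLabel(targets: List<ConfiguredTarget>): Map<String, String> =
    targets.associate { it.label to it.hash }

// Configuration-aware keying: one entry per (label, configuration checksum) pair.
fun keyByLabelAndConfig(
    targets: List<ConfiguredTarget>
): Map<Pair<String, String>, String> =
    targets.associate { (it.label to it.configChecksum) to it.hash }
```

With the same label present under a target configuration and an exec configuration, keyByLabel keeps a single entry while keyByLabelAndConfig keeps both.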

What TD does

  • Each target identity is (label, Configuration); the configuration's checksum is mixed into the rule hash.
  • Each dep edge's own configuration (from ConfiguredRuleInput, Bazel 7+) is mixed in, so transitions are preserved through the transitive walk.
  • bazel config --output=json --dump_all enumerates all configurations and diffs them across revisions to detect global-config drift.
  • A small fallback handles the case where ConfiguredRuleInput.configurationChecksum is empty (a known Bazel quirk) by inheriting the depending target's configuration.

References in TD (Go): pkg/hash_cache.go::hashRule, pkg/configurations.go.
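
A rough Kotlin sketch of the same idea, to anchor discussion. This is not a port of TD's code and not the current RuleHasher: DepEdge, ConfiguredRule, effectiveDepConfig and hashConfiguredRule are hypothetical names, and the length-prefixing, SHA-256 choice and dep ordering are assumptions of the sketch.

```kotlin
import java.security.MessageDigest

// Hypothetical model of a dependency edge: the dep's label plus the
// configuration checksum attached to that edge (may be empty).
data class DepEdge(val label: String, val configChecksum: String)

// Hypothetical model of a configured rule.
data class ConfiguredRule(
    val label: String,
    val configChecksum: String,
    val attributeDigest: String,
    val deps: List<DepEdge>
)

// Fallback: an empty per-edge checksum inherits the depending target's configuration.
fun effectiveDepConfig(parentConfig: String, edge: DepEdge): String =
    edge.configChecksum.ifEmpty { parentConfig }

// Mixes the rule's own configuration and each dep edge's configuration into the hash.
// depHash is supplied by the caller and resolves the hash of a (label, config) pair.
fun hashConfiguredRule(
    rule: ConfiguredRule,
    depHash: (label: String, configChecksum: String) -> String
): String {
    val md = MessageDigest.getInstance("SHA-256")
    fun mix(s: String) {
        // Length-prefix each field so adjacent fields cannot run together.
        val bytes = s.toByteArray(Charsets.UTF_8)
        md.update("${bytes.size}:".toByteArray(Charsets.UTF_8))
        md.update(bytes)
    }
    mix(rule.label)
    mix(rule.configChecksum)
    mix(rule.attributeDigest)
    // Sorting the edges is an assumption of this sketch, to keep the hash order-independent.
    val ordered = rule.deps.sortedWith(
        compareBy<DepEdge>({ it.label }, { it.configChecksum })
    )
    for (edge in ordered) {
        val cfg = effectiveDepConfig(rule.configChecksum, edge)
        mix(edge.label)
        mix(cfg)
        mix(depHash(edge.label, cfg))
    }
    return md.digest().joinToString("") { "%02x".format(it) }
}
```

Two rules that differ only in configChecksum now hash differently, and an edge with an empty checksum hashes the same as an edge that names the parent's configuration explicitly.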

Why it matters

Today, a CI run that flips --platforms between two revisions (or toggles a --define consumed by a select()) produces an empty impacted set under cquery mode. The configured graph changed; our hashes did not. Users in that situation are advised to fall back to running everything, which defeats the purpose of bazel-diff.

What this would touch (rough scope)

  • TargetDigest/TargetHash schema — key on (label, configurationChecksum) (and emit one entry per pair).
  • RuleHasher — mix configuration.checksum into the rule hash and recurse through (depLabel, depConfigChecksum) instead of depLabel alone.
  • BuildGraphHasher — propagate the per-target configuration through the recursion and the on-disk JSON.
  • BazelQueryService — consume the per-edge configuration from cquery's transition output; add a bazel config enumeration step for the global config-set diff.
  • Hash-JSON wire format and DeserialiseHashesInteractor — backwards-compat plumbing for old JSON without configurations.
  • Tests covering: platform swap, --define-driven select() branch flip, cfg = "exec" flip on a rule attribute.
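
For the backwards-compat item, one possible shape is sketched below. Everything in it is an assumption for discussion: TargetKey, fromLegacy and fromConfigured are hypothetical names, the nested label -> (configChecksum -> hash) layout is not a decided wire format, and the inputs are taken as already-parsed maps rather than raw JSON.

```kotlin
// Hypothetical key for the proposed schema; a null checksum marks an entry
// loaded from legacy, label-only data.
data class TargetKey(val label: String, val configChecksum: String?)

// Legacy shape: label -> hash.
fun fromLegacy(entries: Map<String, String>): Map<TargetKey, String> =
    entries.mapKeys { (label, _) -> TargetKey(label, null) }

// Proposed shape (an assumption of this sketch): label -> (configChecksum -> hash).
fun fromConfigured(
    entries: Map<String, Map<String, String>>
): Map<TargetKey, String> =
    entries.flatMap { (label, byConfig) ->
        byConfig.map { (checksum, hash) -> TargetKey(label, checksum) to hash }
    }.toMap()
```

Loading both formats into the same key type would let the comparison step decide explicitly how to treat entries with a null checksum, rather than failing on old JSON.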

This is a meaningful design change with format implications, so flagging as an issue rather than a PR.

Out of scope for this issue

  • WORKSPACE-mode / non-cquery mode (no configured graph available; nothing to do).
  • Bzlmod handling — we're already ahead of TD here (BazelQueryService.queryBzlmodRepos + BuildGraphHasher.createSeedForBzlFiles).

cc'ing for visibility — happy to discuss the API shape before anyone starts work.
