feat(cli): improve evaluation reasoning output using existing fields#6417
Conversation
evankanderson
left a comment
There was a problem hiding this comment.
I don't think you need any of the changes in internal/engine/eval/ and pkg/engine/v1 for this change -- we already wire up output, status, details, remediation_details and alert.details. We don't necessarily surface all of these in the CLI yet, but I'd like to see some more design detail about how explain would differ from the existing information collected (and how it would be used) before making rule engine changes.
@evankanderson Thanks, that’s very helpful. I see your point about the existing fields (output, status, details, remediation_details, alert.details) already carrying structured information from evaluation. My intention with “explain” was to surface evaluation reasoning more explicitly (e.g., why a rule passed/failed, intermediate checks like allow/violations), rather than just the final outcome or message. That said, I agree the current PR doesn’t clearly differentiate this from existing fields, and it likely needs a clearer design before making engine-level changes. I’ll put together a short design proposal outlining:
I can share that here for feedback before continuing implementation. |
|
@evankanderson Following up with a quick design proposal: Goal“Explain” is intended to expose evaluation reasoning, not just final results. This includes:
Current StateWe already have:
These represent results, but not necessarily the reasoning behind them. Proposed DirectionInstead of introducing a new engine-level field, “explain” can be implemented as an extension of the existing evaluation model.
This approach keeps the change incremental:
UsageExplain would be a debugging/visibility feature: Used to:
Happy to refine this based on what aligns best with the current model. |
I'm trying to understand how this part works. The rest mostly makes sense, but how does the The rest of what you're proposing mostly makes sense, except for one detail:
Rule evaluation takes place independently of the CLI ever asking for a report on what happened. In many cases (success and failure) the main user interaction that takes place happens outside of Minder -- for example, #6184 happened entirely in the GitHub UI, and I never looked at the rule status after loading it. (Now, I'd tested it elsewhere before enabling it, but we can't know who's debugging rules and who is just consuming them in the current architecture.) |
|
@evankanderson thanks for calling this out, this helps clarify the gap. You're right that evaluators already compute and expose the core reasoning signals (allow/violations, message, output), and there isn't really additional hidden state to capture at execution time. Given that, I think my earlier framing of “explain mode” as something that changes rule execution isn’t the right direction. Instead, it seems the real gap is:
So rather than adding evaluator-level changes, a better approach would be:
I’ll pivot this PR in that direction and remove the evaluator changes. For example, I can:
Let me know if that direction aligns better. |
There is one tool that we don't currently support (but I've considered adding) -- Rego has a built-in
I think it's a really interesting idea -- unfortunately, it doesn't quite match the model (background execution on event, rather than triggered by users) and primitives (Rego, JQ evaluators) we currently have.
FWIW, I wouldn't describe this as "reasoning data" -- this is basically intermediate state. If you're looking for equivalent / conceptually-similar notions, tools like Rookout and Lightrun offer stopless debugging -- there are also approaches that save enough state to allow "time-travel" or backwards jumps in execution. I'm not sure to what extent our current tools support that, though.
Yes, this is a big gap! And writing documentation seems to be harder than writing code. 😁 Our current docs are somewhat organized, but I think a better section focused on writing and debugging rules would probably be helpful. I generally like the https://diataxis.fr/ model for writing documentation; if you haven't written much documentation before, a brief read through that site may be helpful.
In particular, the CLI currently doesn't make much use of EvalResultService and prefers to use ProfileService with some extra flags. I'd be open to a new top-level
Yes, and documenting patterns for "what goes where". We have a bunch of rules that stuff things into
Yes, I think being able to zoom in on a particular rule + entity and get a full readout of what happened without resorting to
👍
That sounds like a great plan -- and will give you something concrete to poke at. If you find rules that don't seem to behave correctly, feel free to bring them to the |
|
@evankanderson Thanks, this is super helpful I really appreciate the detailed guidance. I’ll update this PR to remove the evaluator/engine changes and pivot it toward:
I’ll start by picking an example rule from minder-rules-and-profiles and building from there. Will push an update shortly. |
0caff63 to
ca0ee91
Compare
|
@evankanderson Thanks for the guidance — I’ve updated the PR to align with your feedback.
This now only builds on existing fields (details, guidance, remediation, alerts, output) without changing evaluation behavior. Let me know if this direction looks good or if you'd like any further adjustments. |
evankanderson
left a comment
There was a problem hiding this comment.
Can you include some screenshots / extracts of how this renders? I'm concerned that information density and formatting may be difficult to pull off in the table view.
|
@evankanderson Good point .. I extracted the actual rendering from the reasoning helper used by the CLI. Here’s the real output produced:
This is exactly what gets rendered inside the “Reasoning” column, so multi-line values appear stacked within that cell. To keep it readable, I:
One small thing I noticed is that the output value is currently wrapped in quotes (coming from the structpb value formatting). I can clean that up in a follow-up if needed. I don’t currently have a local backend running to capture a full end-to-end CLI screenshot, but this output is directly produced by the same helper used in table rendering. Happy to adjust formatting further (e.g., truncation or alternative layouts) if this feels too dense in table view. |
evankanderson
left a comment
There was a problem hiding this comment.
I suspect we want to introduce a new command to get all the detail information about a specific rule evaluation on a selected entity -- getting all the details for every rule evaluation in a great big list can be overwhelming when you either have e.g. 20 ruletypes or 20 repositories you're managing.
|
@evankanderson Thanks I’ve updated the PR based on your feedback.
This keeps the table readable while still surfacing useful context. I agree that a separate command for detailed inspection would be a better place for full output, and I’d be happy to explore that in a follow-up. Let me know if this looks better. |
|
@evankanderson Thanks I’ve incorporated the feedback:
Let me know if this looks good or if you'd like any further tweaks. |
|
@evankanderson hey can u take a look on this again ?? I have improved the pr and also made requested changes and tell me wheather it looks good or need further improvements ?? |
|
@evankanderson Thanks !!! I’ve added golden-output tests for To support this, I introduced test-friendly RPC injection using This behavior only applies when a client is explicitly injected in tests, so normal CLI execution remains unchanged. I used lightweight stubs for the artifact and profile clients to keep the tests self-contained, but I’m happy to switch to generated mocks if that’s preferred. Let me know if you’d like additional commands covered or adjustments to the injection approach. |
- add reasoning column to CLI output - aggregate details, guidance, remediation, alerts, and output - centralize reasoning formatting helpers - improve readability with bullet formatting - remove previous explain-mode implementation
… RPC injection - add golden-output tests for artifact get (table/json/yaml + error) - add fixtures and goldens for CLI output validation - inject RPC clients via context for test isolation - short-circuit gRPC setup when injected clients are present to avoid login/network in tests
… pattern - switch artifact get to use cli.GetCLIClient for both artifact and profile clients - remove custom injection logic and align with upstream CLI patterns - add golden-output tests for artifact get (table/json/yaml + error case) - use gomock-based clients with realistic evaluation data - update goldens to reflect current CLI formatting - regenerate profile/status goldens after rebase to match latest output
53e3705 to
25edf52
Compare
|
@evankanderson
Let me know if anything else needs adjustment. |
|
I also updated the artifact get goldens (the failure was due to JSON formatting differences only — spacing changes, not data changes). Locally everything is passing:
However, CI is still failing on the artifact tests, and I’m not able to reproduce the failure locally. Could you help point out what’s different in CI or if there’s something I might be missing (e.g., environment-specific formatting or test expectations)? Happy to adjust once I understand the discrepancy. |
|
The test is claiming an extra space after the ":" in the JSON: I'll see if I can figure out why that might be. |
|
From protojson's
And they mean it: https://github.com/protocolbuffers/protobuf-go/blob/master/internal/encoding/json/encode.go#L272 Unfortunately, |
|
Alternatively, we can write a nasty note (meaning replacing each JSON case with a comment referencing a bug in the Minder repo that the JSON output is not stable) and not test the JSON output, which seems fairly safe. |
|
I’ve removed the JSON golden tests for All CLI tests are passing locally: Let me know if anything else needs adjustment. |
evankanderson
left a comment
There was a problem hiding this comment.
Just a couple comments -- mostly on cleaning up the json golden files, but I'm also wondering about our current priority of messages, and a few other cleanups.
- removed remaining JSON golden tests and related test cases - switched to fixture-backed mocks for artifact and profile tests - updated reasoning formatting order (Alert → Remediation → Details → Guidance) - improved remediation formatting to include message + URL - removed unused BestDetail helper - added test coverage for updated table rendering
|
@evankanderson Thanks I’ve addressed the feedback.
All CLI tests are passing: Let me know if anything else needs adjustment. |
|
I tried reproducing the race locally with |
|
The EEA test looks like a flake to me. Re-running |

Summary
This PR improves how evaluation results are surfaced in the CLI by making existing evaluation data easier to understand.
Previously, CLI output primarily focused on pass/fail status, while additional context (details, guidance, remediation details, alerts, and structured output) was harder to interpret. This change aggregates and presents that information in a clearer, reasoning-first format.
Key Changes
FormatEvaluationReasoningRuleDisplayNameDesign Rationale
This change builds on the existing evaluation model instead of introducing new engine-level concepts.
CLI Changes
Example improvement:
Before:
Rule: branch_protection_enabled → FAIL
After:
Rule: branch_protection_enabled
Result: FAIL
Reasoning:
- branch protection is disabled
- Guidance: enable branch protection
- Output:
...
This makes it easier to understand why a rule passed or failed without needing to inspect raw JSON/YAML output.
Testing
go test ./cmd/cli/app/profile/... ./cmd/cli/app/artifact ./cmd/cli/app/history
Notes
This PR focuses on CLI improvements only. It does not introduce any new evaluation fields or modify rule execution.
Future improvements could include:
results/evalCLI command for deeper inspectionFixes #6416