feat(cli): improve evaluation reasoning output using existing fields by sachin9058 · Pull Request #6417 · mindersec/minder

sachin9058 · 2026-04-24T09:18:48Z

Summary

This PR improves how evaluation results are surfaced in the CLI by making existing evaluation data easier to understand.

Previously, CLI output primarily focused on pass/fail status, while additional context (details, guidance, remediation details, alerts, and structured output) was harder to interpret. This change aggregates and presents that information in a clearer, reasoning-first format.

Key Changes

Added a "Reasoning" column to evaluation-related CLI outputs
Aggregated existing fields into a readable format:
- details
- guidance
- remediation_details
- alert.details and alert.url
- output (structured evaluation output)
Introduced shared helpers for consistent rendering:
- FormatEvaluationReasoning
- RuleDisplayName
Removed duplication by centralizing reasoning logic across CLI commands
Improved readability with bullet-formatted output

Design Rationale

This change builds on the existing evaluation model instead of introducing new engine-level concepts.

No changes to evaluation logic or engine behavior
Reuses already available evaluation data
Focuses on improving visibility and usability in CLI
Keeps changes incremental and aligned with current architecture

CLI Changes

Example improvement:

Before:
Rule: branch_protection_enabled → FAIL

After:
Rule: branch_protection_enabled
Result: FAIL
Reasoning:
- branch protection is disabled
- Guidance: enable branch protection
- Output:
...

This makes it easier to understand why a rule passed or failed without needing to inspect raw JSON/YAML output.

Testing

Ran:
go test ./cmd/cli/app/profile/... ./cmd/cli/app/artifact ./cmd/cli/app/history
Verified CLI output formatting for:
- profile status
- artifact evaluation status
Confirmed no changes to evaluation engine behavior

Notes

This PR focuses on CLI improvements only. It does not introduce any new evaluation fields or modify rule execution.

Future improvements could include:

a dedicated results / eval CLI command for deeper inspection
documentation improvements for rule debugging
additional formatting options (e.g., CSV/HTML export)

Fixes #6416

coveralls · 2026-04-24T09:29:50Z

Coverage is 60.56% — sachin9058:feat/explain-mode into mindersec:main. No base build found for mindersec:main.

evankanderson

I don't think you need any of the changes in internal/engine/eval/ and pkg/engine/v1 for this change -- we already wire up output, status, details, remediation_details and alert.details. We don't necessarily surface all of these in the CLI yet, but I'd like to see some more design detail about how explain would differ from the existing information collected (and how it would be used) before making rule engine changes.

sachin9058 · 2026-04-24T16:52:08Z

I don't think you need any of the changes in internal/engine/eval/ and pkg/engine/v1 for this change -- we already wire up output, status, details, remediation_details and alert.details. We don't necessarily surface all of these in the CLI yet, but I'd like to see some more design detail about how explain would differ from the existing information collected (and how it would be used) before making rule engine changes.

@evankanderson Thanks, that’s very helpful.

I see your point about the existing fields (output, status, details, remediation_details, alert.details) already carrying structured information from evaluation.

My intention with “explain” was to surface evaluation reasoning more explicitly (e.g., why a rule passed/failed, intermediate checks like allow/violations), rather than just the final outcome or message.

That said, I agree the current PR doesn’t clearly differentiate this from existing fields, and it likely needs a clearer design before making engine-level changes.

I’ll put together a short design proposal outlining:

what “explain” represents conceptually
how it differs from existing fields like output and details
whether it can reuse existing structures instead of introducing new ones
how it would be consumed (CLI / API)

I can share that here for feedback before continuing implementation.

sachin9058 · 2026-04-24T16:58:32Z

@evankanderson Following up with a quick design proposal:

Goal

“Explain” is intended to expose evaluation reasoning, not just final results.

This includes:

allow/deny decisions
constraint violations
intermediate evaluation checks

Current State

We already have:

output: user-facing result (message / structured output)
details, remediation_details, alert.details: structured metadata

These represent results, but not necessarily the reasoning behind them.

Proposed Direction

Instead of introducing a new engine-level field, “explain” can be implemented as an extension of the existing evaluation model.

Reuse existing structures
- Leverage output and/or details to carry additional reasoning data
- Avoid adding a new top-level field unless a clear gap is identified
- Keep compatibility with existing APIs and storage
Evaluator-level enrichment
- When explain mode is enabled, evaluators (e.g., deny-by-default, constraints) can include additional context such as:
  - evaluation decisions (e.g., allowed: true/false)
  - matched conditions or violations
  - key inputs used in the decision
- When explain is disabled, behavior remains unchanged (no extra data populated)
Minimal engine involvement
- The engine only needs to propagate an “explain enabled” signal (e.g., via context)
- No structural changes to evaluation result types are required initially
CLI and API consumption
- CLI can surface this additional data when --explain is used
- Initially, this can reuse existing output rendering paths
- Future iterations can improve formatting or explicitly surface structured reasoning

This approach keeps the change incremental:

avoids introducing new concepts prematurely
builds on existing fields
allows gradual refinement based on usage and feedback

Usage

Explain would be a debugging/visibility feature:

minder <command> --explain

Used to:

understand why a rule passed/failed
debug policies
inspect evaluator behavior

Happy to refine this based on what aligns best with the current model.

evankanderson · 2026-04-24T17:54:07Z

Evaluator-level enrichment

When explain mode is enabled, evaluators (e.g., deny-by-default, constraints) can include additional context such as:

evaluation decisions (e.g., allowed: true/false)

matched conditions or violations

key inputs used in the decision

When explain is disabled, behavior remains unchanged (no extra data populated)

I'm trying to understand how this part works. The rest mostly makes sense, but how does the explain context setting change the rule execution (if at all)? What does it record that isn't previously recorded? It might be easiest to pick an example rule from the existing rules to use as an example.

The rest of what you're proposing mostly makes sense, except for one detail:

When explain mode is used ...

CLI can surface this additional data when --explain is used

Rule evaluation takes place independently of the CLI ever asking for a report on what happened. In many cases (success and failure) the main user interaction that takes place happens outside of Minder -- for example, #6184 happened entirely in the GitHub UI, and I never looked at the rule status after loading it. (Now, I'd tested it elsewhere before enabling it, but we can't know who's debugging rules and who is just consuming them in the current architecture.)

sachin9058 · 2026-04-24T18:14:48Z

@evankanderson thanks for calling this out, this helps clarify the gap.

You're right that evaluators already compute and expose the core reasoning signals (allow/violations, message, output), and there isn't really additional hidden state to capture at execution time.

Given that, I think my earlier framing of “explain mode” as something that changes rule execution isn’t the right direction.

Instead, it seems the real gap is:

how this existing reasoning data is surfaced and understood
especially across CLI, RPC responses, and documentation

So rather than adding evaluator-level changes, a better approach would be:

improving how existing fields (output, details, EvalResultOutput) are exposed
making it clearer how to interpret allow/violations/message/output in practice
potentially improving CLI output to present this in a more “explain-like” way

I’ll pivot this PR in that direction and remove the evaluator changes.

For example, I can:

take an existing rule from minder-rules-and-profiles
show how its allow/violations/message/output map to what a user sees
and improve CLI output / documentation around that

Let me know if that direction aligns better.

evankanderson · 2026-04-24T18:37:01Z

@evankanderson thanks for calling this out, this helps clarify the gap.

You're right that evaluators already compute and expose the core reasoning signals (allow/violations, message, output), and there isn't really additional hidden state to capture at execution time.

There is one tool that we don't currently support (but I've considered adding) -- Rego has a built-in print function that we don't wire up or collect today, except in the mindev command. We could store that alongside the existing structured output to assist with debugging. Note that it would be always-on, rather than a "mode" that you could use for one invocation.

Given that, I think my earlier framing of “explain mode” as something that changes rule execution isn’t the right direction.

I think it's a really interesting idea -- unfortunately, it doesn't quite match the model (background execution on event, rather than triggered by users) and primitives (Rego, JQ evaluators) we currently have.

Instead, it seems the real gap is:

how this existing reasoning data is surfaced and understood

FWIW, I wouldn't describe this as "reasoning data" -- this is basically intermediate state. If you're looking for equivalent / conceptually-similar notions, tools like Rookout and Lightrun offer stopless debugging -- there are also approaches that save enough state to allow "time-travel" or backwards jumps in execution. I'm not sure to what extent our current tools support that, though.

especially across CLI, RPC responses, and documentation

Yes, this is a big gap! And writing documentation seems to be harder than writing code. 😁

Our current docs are somewhat organized, but I think a better section focused on writing and debugging rules would probably be helpful. I generally like the https://diataxis.fr/ model for writing documentation; if you haven't written much documentation before, a brief read through that site may be helpful.

So rather than adding evaluator-level changes, a better approach would be:

improving how existing fields (output, details, EvalResultOutput) are exposed

In particular, the CLI currently doesn't make much use of EvalResultService and prefers to use ProfileService with some extra flags. I'd be open to a new top-level results (or some other useful word, like eval) command for viewing and analyzing results. CSV export might also be a useful feature there -- or HTML with an embedded webserver, though I'm also working on a full HTML UI, so that may not be as helpful.

making it clearer how to interpret allow/violations/message/output in practice

Yes, and documenting patterns for "what goes where". We have a bunch of rules that stuff things into violations in order to have nice messages where there might be better ways to do it, or ways to use message in a more consistent and conventional way.

potentially improving CLI output to present this in a more “explain-like” way

Yes, I think being able to zoom in on a particular rule + entity and get a full readout of what happened without resorting to -o yaml and scrolling around could be handy.

I’ll pivot this PR in that direction and remove the evaluator changes.

👍

For example, I can:

take an existing rule from minder-rules-and-profiles

show how its allow/violations/message/output map to what a user sees

and improve CLI output / documentation around that

That sounds like a great plan -- and will give you something concrete to poke at. If you find rules that don't seem to behave correctly, feel free to bring them to the #minder OpenSSF Slack channel with questions and I'll be happy to help out.

sachin9058 · 2026-04-24T18:54:55Z

@evankanderson Thanks, this is super helpful I really appreciate the detailed guidance.

I’ll update this PR to remove the evaluator/engine changes and pivot it toward:

improving how existing evaluation data is surfaced
exploring a more focused CLI experience (e.g., a results/eval command or improved output)
and documenting how allow/violations/message/output map to user-visible results

I’ll start by picking an example rule from minder-rules-and-profiles and building from there.

Will push an update shortly.

sachin9058 · 2026-04-24T19:25:33Z

@evankanderson Thanks for the guidance — I’ve updated the PR to align with your feedback.

removed the earlier engine-level changes
focused on improving how existing evaluation data is surfaced in the CLI
centralized reasoning formatting and applied it consistently

This now only builds on existing fields (details, guidance, remediation, alerts, output) without changing evaluation behavior.

Let me know if this direction looks good or if you'd like any further adjustments.

evankanderson

Can you include some screenshots / extracts of how this renders? I'm concerned that information density and formatting may be difficult to pull off in the table view.

sachin9058 · 2026-04-25T05:51:06Z

@evankanderson Good point .. I extracted the actual rendering from the reasoning helper used by the CLI.

Here’s the real output produced:

This is exactly what gets rendered inside the “Reasoning” column, so multi-line values appear stacked within that cell.

To keep it readable, I:

use bullet prefixes
group related fields (details, guidance, remediation, alerts, output)
indent structured output

One small thing I noticed is that the output value is currently wrapped in quotes (coming from the structpb value formatting). I can clean that up in a follow-up if needed.
This is exactly what gets rendered inside the “Reasoning” column, so multi-line values appear stacked within that cell.

I don’t currently have a local backend running to capture a full end-to-end CLI screenshot, but this output is directly produced by the same helper used in table rendering.

Happy to adjust formatting further (e.g., truncation or alternative layouts) if this feels too dense in table view.

evankanderson

I suspect we want to introduce a new command to get all the detail information about a specific rule evaluation on a selected entity -- getting all the details for every rule evaluation in a great big list can be overwhelming when you either have e.g. 20 ruletypes or 20 repositories you're managing.

sachin9058 · 2026-04-25T08:38:05Z

@evankanderson Thanks I’ve updated the PR based on your feedback.

kept the table output compact by reducing details to a single-line summary
reverted naming back to “Details”
applied the suggested improvements (util usage, cmp.Or, rule naming order)
removed duplication in history output

This keeps the table readable while still surfacing useful context.

I agree that a separate command for detailed inspection would be a better place for full output, and I’d be happy to explore that in a follow-up.

Let me know if this looks better.

sachin9058 · 2026-04-25T21:19:20Z

@evankanderson Thanks I’ve incorporated the feedback:

switched column naming back to “Details”
fixed rule name duplication via RuleDisplayName
removed the pass-through function
added remediation URL alongside alert URL
simplified the table to a compact summary (no YAML/output in table)

Let me know if this looks good or if you'd like any further tweaks.

sachin9058 · 2026-04-27T22:19:33Z

@evankanderson hey can u take a look on this again ?? I have improved the pr and also made requested changes and tell me wheather it looks good or need further improvements ??

evankanderson

I think it would be helpful to add the golden-output type tests like #6418 or #6419 for artifact get or profile status list before merging this change -- the earlier comments have a screenshot or two, but being able to evaluate the output changes in a diff feels useful.

sachin9058 · 2026-04-30T09:01:57Z

@evankanderson Thanks !!!

I’ve added golden-output tests for artifact get in this PR following the same pattern as #6418/#6419, so the CLI output changes can now be evaluated directly via diffs.

To support this, I introduced test-friendly RPC injection using cli.WithRPCClient. Previously, the CLI wrapper would always initialize a real gRPC connection, which caused tests to trigger login/network flows. I added a small context-based check so that when an injected client is present, the command uses it instead of dialing.

This behavior only applies when a client is explicitly injected in tests, so normal CLI execution remains unchanged.

I used lightweight stubs for the artifact and profile clients to keep the tests self-contained, but I’m happy to switch to generated mocks if that’s preferred.

Let me know if you’d like additional commands covered or adjustments to the injection approach.

- add reasoning column to CLI output - aggregate details, guidance, remediation, alerts, and output - centralize reasoning formatting helpers - improve readability with bullet formatting - remove previous explain-mode implementation

… feedback

… RPC injection - add golden-output tests for artifact get (table/json/yaml + error) - add fixtures and goldens for CLI output validation - inject RPC clients via context for test isolation - short-circuit gRPC setup when injected clients are present to avoid login/network in tests

… pattern - switch artifact get to use cli.GetCLIClient for both artifact and profile clients - remove custom injection logic and align with upstream CLI patterns - add golden-output tests for artifact get (table/json/yaml + error case) - use gomock-based clients with realistic evaluation data - update goldens to reflect current CLI formatting - regenerate profile/status goldens after rebase to match latest output

sachin9058 · 2026-04-30T18:40:11Z

@evankanderson
Thanks I'’ve rebased this PR on the latest main and aligned it with the pattern introduced in #6420.

removed the custom injection logic and restored standard CLI behavior
updated artifact get to use cli.GetCLIClient for both artifact and profile clients
added golden-output tests for artifact get using gomock with realistic evaluation data
regenerated profile/status goldens to match the current CLI output after the rebase

Let me know if anything else needs adjustment.

sachin9058 · 2026-04-30T19:23:15Z

@evankanderson

I also updated the artifact get goldens (the failure was due to JSON formatting differences only — spacing changes, not data changes).

Locally everything is passing:

go test ./cmd/cli/app/artifact
go test ./cmd/cli/...

However, CI is still failing on the artifact tests, and I’m not able to reproduce the failure locally.

Could you help point out what’s different in CI or if there’s something I might be missing (e.g., environment-specific formatting or test expectations)?

Happy to adjust once I understand the discrepancy.

evankanderson · 2026-04-30T20:24:45Z

The test is claiming an extra space after the ":" in the JSON:

          	            	Diff:
          	            	--- Expected
          	            	+++ Actual
          	            	@@ -1,10 +1,10 @@
          	            	 {
          	            	-  "artifact": {
          	            	-    "artifactPk": "111",
          	            	-    "owner": "owner-1",
          	            	-    "name": "artifact-1",
          	            	-    "type": "image",
          	            	-    "visibility": "public",
          	            	-    "repository": "org/repo",
          	            	-    "createdAt": "2024-01-02T15:04:05Z"
          	            	+  "artifact":  {
          	            	+    "artifactPk":  "111",
          	            	+    "owner":  "owner-1",
          	            	+    "name":  "artifact-1",
          	            	+    "type":  "image",
          	            	+    "visibility":  "public",
          	            	+    "repository":  "org/repo",
          	            	+    "createdAt":  "2024-01-02T15:04:05Z"
          	            	   }

I'll see if I can figure out why that might be.

evankanderson · 2026-04-30T20:49:32Z

From protojson's MarshalOptions.Marshal:

Do not depend on the output being stable. Its output will change across different builds of your program, even when using the same version of the protobuf module.

And they mean it: https://github.com/protocolbuffers/protobuf-go/blob/master/internal/encoding/json/encode.go#L272

Unfortunately, detrand.Disable() is in the internal package, so we may need to do something to strip out their extra whitespace (possibly replace all runs of spaces with a single space for JSON).

evankanderson · 2026-04-30T20:50:16Z

Alternatively, we can write a nasty note (meaning replacing each JSON case with a comment referencing a bug in the Minder repo that the JSON output is not stable) and not test the JSON output, which seems fairly safe.

…protojson output

sachin9058 · 2026-04-30T21:13:32Z

@evankanderson

I’ve removed the JSON golden tests for artifact get since the protojson output isn’t stable across environments, and kept the table and YAML output tests. I also added a note in the test explaining this decision.

All CLI tests are passing locally:

Let me know if anything else needs adjustment.

evankanderson

Just a couple comments -- mostly on cleaning up the json golden files, but I'm also wondering about our current priority of messages, and a few other cleanups.

- removed remaining JSON golden tests and related test cases - switched to fixture-backed mocks for artifact and profile tests - updated reasoning formatting order (Alert → Remediation → Details → Guidance) - improved remediation formatting to include message + URL - removed unused BestDetail helper - added test coverage for updated table rendering

sachin9058 · 2026-05-01T00:07:40Z

@evankanderson Thanks I’ve addressed the feedback.

removed remaining JSON golden tests and test cases
switched to fixture-backed mocks for consistency
updated reasoning formatting (Alert → Remediation → Details → Guidance) with multiline remediation support
removed unused helpers and added test coverage for the updated formatting

All CLI tests are passing:

Let me know if anything else needs adjustment.

sachin9058 · 2026-05-01T00:27:18Z

I tried reproducing the race locally with go test -race ./..., and everything passes (including the internal/eea tests).
This might be a timing-sensitive or flaky race condition that’s showing up in CI but not locally. Since these failures are in the eventing layer and not directly related to the CLI changes in this PR.
Let me know if you’d like me to dig deeper or if this should be tracked separately.

evankanderson · 2026-05-01T00:35:51Z

The EEA test looks like a flake to me. Re-running

sachin9058 requested a review from a team as a code owner April 24, 2026 09:18

evankanderson reviewed Apr 24, 2026

View reviewed changes

sachin9058 force-pushed the feat/explain-mode branch from 0caff63 to ca0ee91 Compare April 24, 2026 19:20

sachin9058 changed the title ~~feat(engine): add MVP explain mode for evaluation results~~ feat(cli): improve evaluation reasoning output using existing fields Apr 24, 2026

evankanderson reviewed Apr 24, 2026

View reviewed changes

evankanderson reviewed Apr 25, 2026

View reviewed changes

sachin9058 requested a review from evankanderson April 28, 2026 12:15

evankanderson reviewed Apr 29, 2026

View reviewed changes

Comment thread cmd/cli/app/profile/table_render.go

evankanderson mentioned this pull request Apr 30, 2026

added test files for profile and profile/status command #6420

Merged

evankanderson reviewed Apr 30, 2026

View reviewed changes

Comment thread cmd/cli/app/artifact/artifact_get.go Outdated

Comment thread cmd/cli/app/artifact/artifact_get_test.go Outdated

Comment thread cmd/cli/app/artifact/artifact_get_test.go Outdated

Comment thread internal/util/cli/cli.go Outdated

Comment thread internal/util/cli/context.go Outdated

sachin9058 added 6 commits April 30, 2026 23:42

refactor(cli): simplify evaluation details rendering and apply review…

3ec7c8c

… feedback

chore: tidy go modules

51ace98

chore: trigger CI rerun

c84e306

refactor(cli): address review feedback and simplify details rendering

32803af

sachin9058 force-pushed the feat/explain-mode branch from 53e3705 to 25edf52 Compare April 30, 2026 18:38

chore: trigger CI

2108f49

sachin9058 requested a review from evankanderson April 30, 2026 19:22

evankanderson mentioned this pull request Apr 30, 2026

CLI : added unit test files for project and project role #6418

Open

test(cli): remove JSON golden tests for artifact get due to unstable …

f99a75f

…protojson output

sachin9058 mentioned this pull request Apr 30, 2026

Unstable JSON output in CLI tests due to protojson formatting #6430

Open

evankanderson reviewed Apr 30, 2026

View reviewed changes

sachin9058 requested a review from evankanderson May 1, 2026 00:07

evankanderson approved these changes May 1, 2026

View reviewed changes

evankanderson merged commit 5e95a7e into mindersec:main May 1, 2026
36 of 37 checks passed

sachin9058 deleted the feat/explain-mode branch May 1, 2026 06:38

DharunMR mentioned this pull request May 1, 2026

test(cli): support JSON stream decoding in golden tests to fix protojson instability #6433

Open

Conversation

sachin9058 commented Apr 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Key Changes

Design Rationale

CLI Changes

Testing

Notes

Uh oh!

coveralls commented Apr 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

evankanderson left a comment

Choose a reason for hiding this comment

Uh oh!

sachin9058 commented Apr 24, 2026

Uh oh!

sachin9058 commented Apr 24, 2026

Goal

Current State

Proposed Direction

Usage

Uh oh!

evankanderson commented Apr 24, 2026

Uh oh!

sachin9058 commented Apr 24, 2026

Uh oh!

evankanderson commented Apr 24, 2026

Uh oh!

sachin9058 commented Apr 24, 2026

Uh oh!

sachin9058 commented Apr 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

evankanderson left a comment

Choose a reason for hiding this comment

Uh oh!

sachin9058 commented Apr 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

evankanderson left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

sachin9058 commented Apr 25, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

sachin9058 commented Apr 25, 2026

Uh oh!

sachin9058 commented Apr 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

evankanderson left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

sachin9058 commented Apr 30, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

sachin9058 commented Apr 30, 2026

Uh oh!

sachin9058 commented Apr 30, 2026

Uh oh!

evankanderson commented Apr 30, 2026

Uh oh!

evankanderson commented Apr 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

sachin9058 commented Apr 24, 2026 •

edited

Loading

coveralls commented Apr 24, 2026 •

edited

Loading

sachin9058 commented Apr 24, 2026 •

edited

Loading

sachin9058 commented Apr 25, 2026 •

edited

Loading

sachin9058 commented Apr 27, 2026 •

edited

Loading

evankanderson commented Apr 30, 2026 •

edited

Loading

evankanderson commented Apr 30, 2026 •

edited

Loading