Skip to content

feat(cli): improve evaluation reasoning output using existing fields#6417

Merged
evankanderson merged 10 commits into
mindersec:mainfrom
sachin9058:feat/explain-mode
May 1, 2026
Merged

feat(cli): improve evaluation reasoning output using existing fields#6417
evankanderson merged 10 commits into
mindersec:mainfrom
sachin9058:feat/explain-mode

Conversation

@sachin9058
Copy link
Copy Markdown
Contributor

@sachin9058 sachin9058 commented Apr 24, 2026

Summary

This PR improves how evaluation results are surfaced in the CLI by making existing evaluation data easier to understand.

Previously, CLI output primarily focused on pass/fail status, while additional context (details, guidance, remediation details, alerts, and structured output) was harder to interpret. This change aggregates and presents that information in a clearer, reasoning-first format.

Key Changes

  • Added a "Reasoning" column to evaluation-related CLI outputs
  • Aggregated existing fields into a readable format:
    • details
    • guidance
    • remediation_details
    • alert.details and alert.url
    • output (structured evaluation output)
  • Introduced shared helpers for consistent rendering:
    • FormatEvaluationReasoning
    • RuleDisplayName
  • Removed duplication by centralizing reasoning logic across CLI commands
  • Improved readability with bullet-formatted output

Design Rationale

This change builds on the existing evaluation model instead of introducing new engine-level concepts.

  • No changes to evaluation logic or engine behavior
  • Reuses already available evaluation data
  • Focuses on improving visibility and usability in CLI
  • Keeps changes incremental and aligned with current architecture

CLI Changes

Example improvement:

Before:
Rule: branch_protection_enabled → FAIL

After:
Rule: branch_protection_enabled
Result: FAIL
Reasoning:
- branch protection is disabled
- Guidance: enable branch protection
- Output:
...

This makes it easier to understand why a rule passed or failed without needing to inspect raw JSON/YAML output.

Testing

  • Ran:
    go test ./cmd/cli/app/profile/... ./cmd/cli/app/artifact ./cmd/cli/app/history
  • Verified CLI output formatting for:
    • profile status
    • artifact evaluation status
  • Confirmed no changes to evaluation engine behavior

Notes

This PR focuses on CLI improvements only. It does not introduce any new evaluation fields or modify rule execution.

Future improvements could include:

  • a dedicated results / eval CLI command for deeper inspection
  • documentation improvements for rule debugging
  • additional formatting options (e.g., CSV/HTML export)

Fixes #6416

@sachin9058 sachin9058 requested a review from a team as a code owner April 24, 2026 09:18
@coveralls
Copy link
Copy Markdown

coveralls commented Apr 24, 2026

Coverage Status

Coverage is 60.56%sachin9058:feat/explain-mode into mindersec:main. No base build found for mindersec:main.

Copy link
Copy Markdown
Member

@evankanderson evankanderson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you need any of the changes in internal/engine/eval/ and pkg/engine/v1 for this change -- we already wire up output, status, details, remediation_details and alert.details. We don't necessarily surface all of these in the CLI yet, but I'd like to see some more design detail about how explain would differ from the existing information collected (and how it would be used) before making rule engine changes.

@sachin9058
Copy link
Copy Markdown
Contributor Author

I don't think you need any of the changes in internal/engine/eval/ and pkg/engine/v1 for this change -- we already wire up output, status, details, remediation_details and alert.details. We don't necessarily surface all of these in the CLI yet, but I'd like to see some more design detail about how explain would differ from the existing information collected (and how it would be used) before making rule engine changes.

@evankanderson Thanks, that’s very helpful.

I see your point about the existing fields (output, status, details, remediation_details, alert.details) already carrying structured information from evaluation.

My intention with “explain” was to surface evaluation reasoning more explicitly (e.g., why a rule passed/failed, intermediate checks like allow/violations), rather than just the final outcome or message.

That said, I agree the current PR doesn’t clearly differentiate this from existing fields, and it likely needs a clearer design before making engine-level changes.

I’ll put together a short design proposal outlining:

  • what “explain” represents conceptually
  • how it differs from existing fields like output and details
  • whether it can reuse existing structures instead of introducing new ones
  • how it would be consumed (CLI / API)

I can share that here for feedback before continuing implementation.

@sachin9058
Copy link
Copy Markdown
Contributor Author

@evankanderson Following up with a quick design proposal:

Goal

“Explain” is intended to expose evaluation reasoning, not just final results.

This includes:

  • allow/deny decisions
  • constraint violations
  • intermediate evaluation checks

Current State

We already have:

  • output: user-facing result (message / structured output)
  • details, remediation_details, alert.details: structured metadata

These represent results, but not necessarily the reasoning behind them.

Proposed Direction

Instead of introducing a new engine-level field, “explain” can be implemented as an extension of the existing evaluation model.

  1. Reuse existing structures

    • Leverage output and/or details to carry additional reasoning data
    • Avoid adding a new top-level field unless a clear gap is identified
    • Keep compatibility with existing APIs and storage
  2. Evaluator-level enrichment

    • When explain mode is enabled, evaluators (e.g., deny-by-default, constraints) can include additional context such as:
      • evaluation decisions (e.g., allowed: true/false)
      • matched conditions or violations
      • key inputs used in the decision
    • When explain is disabled, behavior remains unchanged (no extra data populated)
  3. Minimal engine involvement

    • The engine only needs to propagate an “explain enabled” signal (e.g., via context)
    • No structural changes to evaluation result types are required initially
  4. CLI and API consumption

    • CLI can surface this additional data when --explain is used
    • Initially, this can reuse existing output rendering paths
    • Future iterations can improve formatting or explicitly surface structured reasoning

This approach keeps the change incremental:

  • avoids introducing new concepts prematurely
  • builds on existing fields
  • allows gradual refinement based on usage and feedback

Usage

Explain would be a debugging/visibility feature:

minder <command> --explain

Used to:

  • understand why a rule passed/failed
  • debug policies
  • inspect evaluator behavior

Happy to refine this based on what aligns best with the current model.

@evankanderson
Copy link
Copy Markdown
Member

  1. Evaluator-level enrichment

    • When explain mode is enabled, evaluators (e.g., deny-by-default, constraints) can include additional context such as:

      • evaluation decisions (e.g., allowed: true/false)
      • matched conditions or violations
      • key inputs used in the decision
    • When explain is disabled, behavior remains unchanged (no extra data populated)

I'm trying to understand how this part works. The rest mostly makes sense, but how does the explain context setting change the rule execution (if at all)? What does it record that isn't previously recorded? It might be easiest to pick an example rule from the existing rules to use as an example.

The rest of what you're proposing mostly makes sense, except for one detail:

  1. When explain mode is used ...
  2. CLI can surface this additional data when --explain is used

Rule evaluation takes place independently of the CLI ever asking for a report on what happened. In many cases (success and failure) the main user interaction that takes place happens outside of Minder -- for example, #6184 happened entirely in the GitHub UI, and I never looked at the rule status after loading it. (Now, I'd tested it elsewhere before enabling it, but we can't know who's debugging rules and who is just consuming them in the current architecture.)

@sachin9058
Copy link
Copy Markdown
Contributor Author

@evankanderson thanks for calling this out, this helps clarify the gap.

You're right that evaluators already compute and expose the core reasoning signals (allow/violations, message, output), and there isn't really additional hidden state to capture at execution time.

Given that, I think my earlier framing of “explain mode” as something that changes rule execution isn’t the right direction.

Instead, it seems the real gap is:

  • how this existing reasoning data is surfaced and understood
  • especially across CLI, RPC responses, and documentation

So rather than adding evaluator-level changes, a better approach would be:

  • improving how existing fields (output, details, EvalResultOutput) are exposed
  • making it clearer how to interpret allow/violations/message/output in practice
  • potentially improving CLI output to present this in a more “explain-like” way

I’ll pivot this PR in that direction and remove the evaluator changes.

For example, I can:

  • take an existing rule from minder-rules-and-profiles
  • show how its allow/violations/message/output map to what a user sees
  • and improve CLI output / documentation around that

Let me know if that direction aligns better.

@evankanderson
Copy link
Copy Markdown
Member

@evankanderson thanks for calling this out, this helps clarify the gap.

You're right that evaluators already compute and expose the core reasoning signals (allow/violations, message, output), and there isn't really additional hidden state to capture at execution time.

There is one tool that we don't currently support (but I've considered adding) -- Rego has a built-in print function that we don't wire up or collect today, except in the mindev command. We could store that alongside the existing structured output to assist with debugging. Note that it would be always-on, rather than a "mode" that you could use for one invocation.

Given that, I think my earlier framing of “explain mode” as something that changes rule execution isn’t the right direction.

I think it's a really interesting idea -- unfortunately, it doesn't quite match the model (background execution on event, rather than triggered by users) and primitives (Rego, JQ evaluators) we currently have.

Instead, it seems the real gap is:

  • how this existing reasoning data is surfaced and understood

FWIW, I wouldn't describe this as "reasoning data" -- this is basically intermediate state. If you're looking for equivalent / conceptually-similar notions, tools like Rookout and Lightrun offer stopless debugging -- there are also approaches that save enough state to allow "time-travel" or backwards jumps in execution. I'm not sure to what extent our current tools support that, though.

  • especially across CLI, RPC responses, and documentation

Yes, this is a big gap! And writing documentation seems to be harder than writing code. 😁

Our current docs are somewhat organized, but I think a better section focused on writing and debugging rules would probably be helpful. I generally like the https://diataxis.fr/ model for writing documentation; if you haven't written much documentation before, a brief read through that site may be helpful.

So rather than adding evaluator-level changes, a better approach would be:

  • improving how existing fields (output, details, EvalResultOutput) are exposed

In particular, the CLI currently doesn't make much use of EvalResultService and prefers to use ProfileService with some extra flags. I'd be open to a new top-level results (or some other useful word, like eval) command for viewing and analyzing results. CSV export might also be a useful feature there -- or HTML with an embedded webserver, though I'm also working on a full HTML UI, so that may not be as helpful.

  • making it clearer how to interpret allow/violations/message/output in practice

Yes, and documenting patterns for "what goes where". We have a bunch of rules that stuff things into violations in order to have nice messages where there might be better ways to do it, or ways to use message in a more consistent and conventional way.

  • potentially improving CLI output to present this in a more “explain-like” way

Yes, I think being able to zoom in on a particular rule + entity and get a full readout of what happened without resorting to -o yaml and scrolling around could be handy.

I’ll pivot this PR in that direction and remove the evaluator changes.

👍

For example, I can:

  • take an existing rule from minder-rules-and-profiles
  • show how its allow/violations/message/output map to what a user sees
  • and improve CLI output / documentation around that

That sounds like a great plan -- and will give you something concrete to poke at. If you find rules that don't seem to behave correctly, feel free to bring them to the #minder OpenSSF Slack channel with questions and I'll be happy to help out.

@sachin9058
Copy link
Copy Markdown
Contributor Author

@evankanderson Thanks, this is super helpful I really appreciate the detailed guidance.

I’ll update this PR to remove the evaluator/engine changes and pivot it toward:

  • improving how existing evaluation data is surfaced
  • exploring a more focused CLI experience (e.g., a results/eval command or improved output)
  • and documenting how allow/violations/message/output map to user-visible results

I’ll start by picking an example rule from minder-rules-and-profiles and building from there.

Will push an update shortly.

@sachin9058 sachin9058 changed the title feat(engine): add MVP explain mode for evaluation results feat(cli): improve evaluation reasoning output using existing fields Apr 24, 2026
@sachin9058
Copy link
Copy Markdown
Contributor Author

sachin9058 commented Apr 24, 2026

@evankanderson Thanks for the guidance — I’ve updated the PR to align with your feedback.

  • removed the earlier engine-level changes
  • focused on improving how existing evaluation data is surfaced in the CLI
  • centralized reasoning formatting and applied it consistently

This now only builds on existing fields (details, guidance, remediation, alerts, output) without changing evaluation behavior.

Let me know if this direction looks good or if you'd like any further adjustments.

Copy link
Copy Markdown
Member

@evankanderson evankanderson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you include some screenshots / extracts of how this renders? I'm concerned that information density and formatting may be difficult to pull off in the table view.

@sachin9058
Copy link
Copy Markdown
Contributor Author

sachin9058 commented Apr 25, 2026

@evankanderson Good point .. I extracted the actual rendering from the reasoning helper used by the CLI.

Here’s the real output produced:

Screenshot from 2026-04-25 11-13-59

This is exactly what gets rendered inside the “Reasoning” column, so multi-line values appear stacked within that cell.

To keep it readable, I:

  • use bullet prefixes
  • group related fields (details, guidance, remediation, alerts, output)
  • indent structured output

One small thing I noticed is that the output value is currently wrapped in quotes (coming from the structpb value formatting). I can clean that up in a follow-up if needed.
This is exactly what gets rendered inside the “Reasoning” column, so multi-line values appear stacked within that cell.

I don’t currently have a local backend running to capture a full end-to-end CLI screenshot, but this output is directly produced by the same helper used in table rendering.

Happy to adjust formatting further (e.g., truncation or alternative layouts) if this feels too dense in table view.

Copy link
Copy Markdown
Member

@evankanderson evankanderson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect we want to introduce a new command to get all the detail information about a specific rule evaluation on a selected entity -- getting all the details for every rule evaluation in a great big list can be overwhelming when you either have e.g. 20 ruletypes or 20 repositories you're managing.

Comment thread cmd/cli/app/profile/table_render.go Outdated
Comment thread cmd/cli/app/profile/table_render.go Outdated
Comment thread cmd/cli/app/profile/table_render.go Outdated
Comment thread cmd/cli/app/profile/table_render.go
Comment thread cmd/cli/app/profile/table_render.go Outdated
Comment thread cmd/cli/app/history/history_list.go Outdated
@sachin9058
Copy link
Copy Markdown
Contributor Author

@evankanderson Thanks I’ve updated the PR based on your feedback.

  • kept the table output compact by reducing details to a single-line summary
  • reverted naming back to “Details”
  • applied the suggested improvements (util usage, cmp.Or, rule naming order)
  • removed duplication in history output

This keeps the table readable while still surfacing useful context.

I agree that a separate command for detailed inspection would be a better place for full output, and I’d be happy to explore that in a follow-up.

Let me know if this looks better.

Comment thread cmd/cli/app/artifact/artifact_get.go Outdated
Comment thread cmd/cli/app/profile/table_render.go Outdated
Comment thread cmd/cli/app/profile/table_render.go Outdated
Comment thread cmd/cli/app/profile/table_render.go Outdated
Comment thread cmd/cli/app/profile/table_render.go Outdated
@sachin9058
Copy link
Copy Markdown
Contributor Author

@evankanderson Thanks I’ve incorporated the feedback:

  • switched column naming back to “Details”
  • fixed rule name duplication via RuleDisplayName
  • removed the pass-through function
  • added remediation URL alongside alert URL
  • simplified the table to a compact summary (no YAML/output in table)

Let me know if this looks good or if you'd like any further tweaks.

@sachin9058
Copy link
Copy Markdown
Contributor Author

sachin9058 commented Apr 27, 2026

@evankanderson hey can u take a look on this again ?? I have improved the pr and also made requested changes and tell me wheather it looks good or need further improvements ??

Copy link
Copy Markdown
Member

@evankanderson evankanderson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be helpful to add the golden-output type tests like #6418 or #6419 for artifact get or profile status list before merging this change -- the earlier comments have a screenshot or two, but being able to evaluate the output changes in a diff feels useful.

Comment thread cmd/cli/app/profile/table_render.go
@sachin9058
Copy link
Copy Markdown
Contributor Author

@evankanderson Thanks !!!

I’ve added golden-output tests for artifact get in this PR following the same pattern as #6418/#6419, so the CLI output changes can now be evaluated directly via diffs.

To support this, I introduced test-friendly RPC injection using cli.WithRPCClient. Previously, the CLI wrapper would always initialize a real gRPC connection, which caused tests to trigger login/network flows. I added a small context-based check so that when an injected client is present, the command uses it instead of dialing.

This behavior only applies when a client is explicitly injected in tests, so normal CLI execution remains unchanged.

I used lightweight stubs for the artifact and profile clients to keep the tests self-contained, but I’m happy to switch to generated mocks if that’s preferred.

Let me know if you’d like additional commands covered or adjustments to the injection approach.

Comment thread cmd/cli/app/artifact/artifact_get.go Outdated
Comment thread cmd/cli/app/artifact/artifact_get_test.go Outdated
Comment thread cmd/cli/app/artifact/artifact_get_test.go Outdated
Comment thread internal/util/cli/cli.go Outdated
Comment thread internal/util/cli/context.go Outdated
- add reasoning column to CLI output
- aggregate details, guidance, remediation, alerts, and output
- centralize reasoning formatting helpers
- improve readability with bullet formatting
- remove previous explain-mode implementation
… RPC injection

- add golden-output tests for artifact get (table/json/yaml + error)
- add fixtures and goldens for CLI output validation
- inject RPC clients via context for test isolation
- short-circuit gRPC setup when injected clients are present to avoid login/network in tests
… pattern

- switch artifact get to use cli.GetCLIClient for both artifact and profile clients
- remove custom injection logic and align with upstream CLI patterns
- add golden-output tests for artifact get (table/json/yaml + error case)
- use gomock-based clients with realistic evaluation data
- update goldens to reflect current CLI formatting
- regenerate profile/status goldens after rebase to match latest output
@sachin9058
Copy link
Copy Markdown
Contributor Author

@evankanderson
Thanks I'’ve rebased this PR on the latest main and aligned it with the pattern introduced in #6420.

  • removed the custom injection logic and restored standard CLI behavior
  • updated artifact get to use cli.GetCLIClient for both artifact and profile clients
  • added golden-output tests for artifact get using gomock with realistic evaluation data
  • regenerated profile/status goldens to match the current CLI output after the rebase

Let me know if anything else needs adjustment.

@sachin9058
Copy link
Copy Markdown
Contributor Author

@evankanderson

I also updated the artifact get goldens (the failure was due to JSON formatting differences only — spacing changes, not data changes).

Locally everything is passing:

  • go test ./cmd/cli/app/artifact
  • go test ./cmd/cli/...

However, CI is still failing on the artifact tests, and I’m not able to reproduce the failure locally.

Could you help point out what’s different in CI or if there’s something I might be missing (e.g., environment-specific formatting or test expectations)?

Happy to adjust once I understand the discrepancy.

@evankanderson
Copy link
Copy Markdown
Member

The test is claiming an extra space after the ":" in the JSON:

          	            	Diff:
          	            	--- Expected
          	            	+++ Actual
          	            	@@ -1,10 +1,10 @@
          	            	 {
          	            	-  "artifact": {
          	            	-    "artifactPk": "111",
          	            	-    "owner": "owner-1",
          	            	-    "name": "artifact-1",
          	            	-    "type": "image",
          	            	-    "visibility": "public",
          	            	-    "repository": "org/repo",
          	            	-    "createdAt": "2024-01-02T15:04:05Z"
          	            	+  "artifact":  {
          	            	+    "artifactPk":  "111",
          	            	+    "owner":  "owner-1",
          	            	+    "name":  "artifact-1",
          	            	+    "type":  "image",
          	            	+    "visibility":  "public",
          	            	+    "repository":  "org/repo",
          	            	+    "createdAt":  "2024-01-02T15:04:05Z"
          	            	   }

I'll see if I can figure out why that might be.

@evankanderson
Copy link
Copy Markdown
Member

evankanderson commented Apr 30, 2026

From protojson's MarshalOptions.Marshal:

Do not depend on the output being stable. Its output will change across different builds of your program, even when using the same version of the protobuf module.

And they mean it: https://github.com/protocolbuffers/protobuf-go/blob/master/internal/encoding/json/encode.go#L272

Unfortunately, detrand.Disable() is in the internal package, so we may need to do something to strip out their extra whitespace (possibly replace all runs of spaces with a single space for JSON).

@evankanderson
Copy link
Copy Markdown
Member

evankanderson commented Apr 30, 2026

Alternatively, we can write a nasty note (meaning replacing each JSON case with a comment referencing a bug in the Minder repo that the JSON output is not stable) and not test the JSON output, which seems fairly safe.

@sachin9058
Copy link
Copy Markdown
Contributor Author

@evankanderson

I’ve removed the JSON golden tests for artifact get since the protojson output isn’t stable across environments, and kept the table and YAML output tests. I also added a note in the test explaining this decision.

All CLI tests are passing locally:

Let me know if anything else needs adjustment.

Copy link
Copy Markdown
Member

@evankanderson evankanderson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a couple comments -- mostly on cleaning up the json golden files, but I'm also wondering about our current priority of messages, and a few other cleanups.

Comment thread cmd/cli/app/profile/status/testdata/status_get.json.golden Outdated
Comment thread cmd/cli/app/artifact/fixture/mock_artifact_get.json
Comment thread cmd/cli/app/artifact/testdata/artifact_get.yaml.golden Outdated
Comment thread cmd/cli/app/profile/status/testdata/status_list_table_detailed.txt.golden Outdated
Comment thread cmd/cli/app/profile/table_render.go
Comment thread cmd/cli/app/profile/table_render.go Outdated
- removed remaining JSON golden tests and related test cases
- switched to fixture-backed mocks for artifact and profile tests
- updated reasoning formatting order (Alert → Remediation → Details → Guidance)
- improved remediation formatting to include message + URL
- removed unused BestDetail helper
- added test coverage for updated table rendering
@sachin9058
Copy link
Copy Markdown
Contributor Author

@evankanderson Thanks I’ve addressed the feedback.

  • removed remaining JSON golden tests and test cases
  • switched to fixture-backed mocks for consistency
  • updated reasoning formatting (Alert → Remediation → Details → Guidance) with multiline remediation support
  • removed unused helpers and added test coverage for the updated formatting

All CLI tests are passing:

Let me know if anything else needs adjustment.

@sachin9058 sachin9058 requested a review from evankanderson May 1, 2026 00:07
@sachin9058
Copy link
Copy Markdown
Contributor Author

I tried reproducing the race locally with go test -race ./..., and everything passes (including the internal/eea tests).
This might be a timing-sensitive or flaky race condition that’s showing up in CI but not locally. Since these failures are in the eventing layer and not directly related to the CLI changes in this PR.
Let me know if you’d like me to dig deeper or if this should be tracked separately.

@evankanderson
Copy link
Copy Markdown
Member

The EEA test looks like a flake to me. Re-running

@evankanderson evankanderson merged commit 5e95a7e into mindersec:main May 1, 2026
36 of 37 checks passed
@sachin9058 sachin9058 deleted the feat/explain-mode branch May 1, 2026 06:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Improve visibility and debugging of evaluation results in CLI and documentation

3 participants