Member

@mmabrouk mmabrouk commented Dec 30, 2025

Summary

  • Adds a new JSON Multi-Field Match evaluator for comparing multiple JSON fields between expected and actual outputs
  • Intended for entity extraction validation, where you need to check whether specific fields (e.g., name, email, address.city) match the ground truth
  • Each configured field becomes a separate output score (0 or 1), plus an aggregate_score giving the fraction of matching fields (0 to 1)
  • Deprecates the existing JSON Field Match evaluator in favor of this more powerful version

Features

  • Multi-field comparison: Configure any number of JSON field paths to compare
  • Nested field support: Use dot notation for nested paths (e.g., user.address.city)
  • Array index support: Access array elements with numeric indices (e.g., items.0.name)
  • Per-field scores: Each field produces a 0/1 score for granular analysis
  • Aggregate score: Overall fraction of matching fields (0-1)
  • Auto-detection: Automatically detects available fields from testcase data
  • Tag-based UI: Intuitive add/remove interface for field management
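
A minimal sketch of the scoring described in the list above, assuming exact equality per field and dot-notation paths with numeric array indices. The names `get_by_dot_path` and `multi_field_match` are hypothetical and are not taken from the PR; the actual handler (`json_multi_field_match_v0`) may differ in how it treats missing fields or type mismatches.

```python
from typing import Any, Dict, List

# Sentinel so that a path that does not exist is distinguishable from a real None value.
_MISSING = object()


def get_by_dot_path(obj: Any, path: str) -> Any:
    """Walk a dot-notation path such as 'user.address.city' or 'items.0.name'."""
    current = obj
    for part in path.split("."):
        if isinstance(current, dict) and part in current:
            current = current[part]
        elif isinstance(current, list) and part.isdecimal() and int(part) < len(current):
            current = current[int(part)]
        else:
            return _MISSING
    return current


def multi_field_match(
    expected: Dict[str, Any], actual: Dict[str, Any], fields: List[str]
) -> Dict[str, float]:
    """Return one 0/1 score per field plus an aggregate_score in [0, 1]."""
    scores: Dict[str, float] = {}
    for field in fields:
        want = get_by_dot_path(expected, field)
        got = get_by_dot_path(actual, field)
        # Assumption: a field missing from the expected data never counts as a match.
        scores[field] = 1.0 if want is not _MISSING and want == got else 0.0
    aggregate = sum(scores.values()) / len(fields) if fields else 0.0
    return {**scores, "aggregate_score": aggregate}


expected = {"name": "Ada", "role": "admin", "address": {"city": "London"}, "items": [{"name": "a"}]}
actual = {"name": "Ada", "role": "viewer", "address": {"city": "London"}, "items": [{"name": "a"}]}
result = multi_field_match(expected, actual, ["name", "role", "address.city", "items.0.name"])
print(result)
```

In this example three of the four configured fields match, so `aggregate_score` is 0.75 while `role` scores 0.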

Changes

Backend (SDK)

  • Added json_multi_field_match_v0 handler with nested value extraction
  • Added interface schema with dynamic output support
  • Added configuration for the new evaluator

Backend (API)

  • Added evaluator service implementation
  • Added dynamic schema building for metrics extraction
  • Added evaluator resource definition with UI settings

Frontend

  • Added FieldsTagsEditor component (Tailwind CSS, follows AGENTS.md guidelines)
  • Added extractJsonPaths helper for JSON field detection
  • Integrated new fields_tags_editor form field type

Test plan

  • Create a new JSON Multi-Field Match evaluator
  • Configure fields manually and verify they appear as tags
  • Select a testcase with JSON in the ground truth column and verify auto-detection works
  • Use "Detect from testcase" button to replace fields
  • Run evaluation and verify per-field scores and aggregate_score in results
  • Test nested field paths like user.address.city
  • Verify the deprecated JSON Field Match evaluator is hidden but existing configs work

🤖 Generated with Claude Code

mmabrouk and others added 9 commits December 22, 2025 20:48
…te_score instead of score and enhance field comparison descriptions
Fixes naming inconsistency between API service and SDK interface schema.
The interface defines `aggregate_score` as the required output field,
so the service must use the same name to pass schema validation.

Also applies ruff auto-cleanup for unused imports.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>

vercel bot commented Dec 30, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

Project | Deployment | Review | Updated (UTC)
agenta-documentation | Ready | Preview, Comment | Jan 5, 2026 8:52pm

@dosubot dosubot bot added the size:XL label (This PR changes 500-999 lines, ignoring generated files.) Dec 30, 2025
Member Author

Demo

CleanShot.2025-12-30.at.21.07.17.mp4

oss: Optional[bool] = False
requires_llm_api_keys: Optional[bool] = False
tags: List[str]
archived: Optional[bool] = False
Member Author

this was forgotten

"name": "JSON Field Match",
"key": "field_match_test",
"direct_use": False,
"archived": True, # Deprecated - use json_multi_field_match instead
Member Author

Old JSON evaluators will continue to work and will still be editable in the UI, but users can no longer create new ones.

Member Author

@jp-agenta let's not remove the old evaluators by mistake. A lot of users rely on them, and deprecating them completely is more pain than just keeping them indefinitely, imo.

"settings_template": {
"fields": {
"label": "Fields to Compare",
"type": "fields_tags_editor", # Custom type - tag-based add/remove editor
Member Author

this tells the UI how to show the playground

return {"outputs": {"success": result}}


def get_nested_value(obj: Any, path: str) -> Any:
Member Author

maybe in the future:

  • This could become more general (allowing the use of json paths)
  • This is likely to be a common function

Member Author

@ardaerzin the code as is mixes business logic and UI (the tag addition / removal component).

For now, I don't see us using this component somewhere else. I expect we would have something like this when we start using tags everywhere. It would make sense then to refactor it to disconnect both.

Let me know what you think. Is this too hacky, or good enough for the use case?

Contributor

good call

* @param prefix - Current path prefix (used for recursion)
* @returns Array of dot-notation paths to all leaf values
*/
export const extractJsonPaths = (obj: unknown, prefix = ""): string[] => {
Member Author

@ardaerzin this feels like something we have already / would have, no?

Contributor

We already have safeJson5Parse, but no path utils like the ones you created here. In testsets these were handled via columns. I'll think about adding this to the loadable API I'm currently planning.
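
For readers following the thread, the leaf-path idea behind `extractJsonPaths` can be sketched as below. This is a Python rendering for illustration only, not the TypeScript helper from the PR; the function name `extract_json_paths` is hypothetical, and descending into arrays with numeric indices is an assumption about the intended behavior.

```python
from typing import Any, List


def extract_json_paths(obj: Any, prefix: str = "") -> List[str]:
    """Collect dot-notation paths to every leaf value of a parsed JSON object."""
    if isinstance(obj, dict):
        items = list(obj.items())
    elif isinstance(obj, list):
        # Assumption: array elements are addressed by their numeric index.
        items = [(str(index), value) for index, value in enumerate(obj)]
    else:
        # A scalar is a leaf; a bare top-level scalar has no path to report.
        return [prefix] if prefix else []

    paths: List[str] = []
    for key, value in items:
        child = f"{prefix}.{key}" if prefix else str(key)
        paths.extend(extract_json_paths(value, child))
    return paths


sample = {"name": "Ada", "address": {"city": "London"}, "items": [{"name": "a"}]}
detected = extract_json_paths(sample)
print(detected)
```

Empty objects and arrays contribute no paths in this sketch, since they contain no leaf values.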

…path formats

Updated the get_nested_value function to utilize resolve_any() for improved path resolution, allowing support for dot notation, JSON Path, and JSON Pointer formats. Cleaned up imports and ensured consistent error handling for path resolution failures.
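
To illustrate what supporting several path formats can look like, here is a rough sketch that normalizes dot notation, JSON Pointer (RFC 6901), and a small JSON Path subset into one token list before walking the object. It is not the SDK's `resolve_any()`, whose signature and behavior are not shown in this conversation; `to_tokens` and `resolve` are hypothetical names, and quoted bracket keys, wildcards, and filters are deliberately not handled.

```python
import re
from typing import Any, List


def to_tokens(path: str) -> List[str]:
    """Split a path in one of three formats into plain key/index tokens."""
    if path.startswith("/"):
        # JSON Pointer: '/items/0/name'; '~1' decodes to '/', then '~0' to '~'.
        return [t.replace("~1", "/").replace("~0", "~") for t in path[1:].split("/")]
    if path.startswith("$"):
        # Simple JSON Path subset: '$.items[0].name'.
        return re.findall(r"[^.\[\]]+", path[1:])
    # Plain dot notation: 'items.0.name'.
    return path.split(".")


def resolve(obj: Any, path: str) -> Any:
    """Walk obj along path, raising LookupError if any step cannot be resolved."""
    current = obj
    for token in to_tokens(path):
        try:
            if isinstance(current, dict):
                current = current[token]
            elif isinstance(current, list):
                current = current[int(token)]
            else:
                raise KeyError(token)
        except (KeyError, IndexError, ValueError) as exc:
            raise LookupError(f"cannot resolve {path!r}") from exc
    return current


doc = {"user": {"address": {"city": "Berlin"}}, "items": [{"name": "a"}], "a/b": 1}
print(resolve(doc, "user.address.city"))
print(resolve(doc, "/items/0/name"))
print(resolve(doc, "$.items[0].name"))
```

Raising a single exception type for every failure mode keeps the caller's error handling uniform, which is one way to read the "consistent error handling" mentioned in the commit message.
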
…-match-evaluator

# Conflicts:
#	api/oss/src/services/evaluators_service.py
#	sdk/agenta/sdk/workflows/handlers.py
… reactivity

- Updated FieldsTagsEditor component to utilize Form.useWatch instead of form.getFieldValue for monitoring changes to correct_answer_key.
- This change enhances reactivity and ensures the component responds appropriately to form updates.

@bekossy bekossy changed the base branch from main to release/v0.74.0 January 6, 2026 11:12
@bekossy bekossy merged commit 66b9a1d into release/v0.74.0 Jan 6, 2026
9 checks passed
Labels

Backend, feature, Frontend, size:XL (This PR changes 500-999 lines, ignoring generated files.)

5 participants