feat(evaluators): add JSON Multi-Field Match evaluator for entity extraction validation #3331

mmabrouk · 2025-12-30T20:05:29Z

Summary

Adds a new JSON Multi-Field Match evaluator for comparing multiple JSON fields between expected and actual outputs
Perfect for entity extraction validation where you need to check if specific fields (name, email, address.city) match the ground truth
Each configured field becomes a separate output score (0 or 1), plus an aggregate_score giving the fraction of matching fields (0 to 1)
Deprecates the existing JSON Field Match evaluator in favor of this more powerful version

Features

Multi-field comparison: Configure any number of JSON field paths to compare
Nested field support: Use dot notation for nested paths (e.g., user.address.city)
Array index support: Access array elements with numeric indices (e.g., items.0.name)
Per-field scores: Each field produces a 0/1 score for granular analysis
Aggregate score: Overall fraction of matching fields, from 0 to 1
Auto-detection: Automatically detects available fields from testcase data
Tag-based UI: Intuitive add/remove interface for field management
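
As an illustration of the scoring described above, here is a minimal Python sketch. It is not the code from this PR: the names `get_by_dot_path` and `multi_field_match` are hypothetical, and it assumes exact equality is the match criterion and that a path missing from the expected data scores 0.

```python
from typing import Any, Dict, List

# Sentinel so that a stored None can be told apart from "path not found".
_MISSING = object()


def get_by_dot_path(obj: Any, path: str) -> Any:
    """Walk a dot-notation path; numeric segments index into lists."""
    current = obj
    for segment in path.split("."):
        if isinstance(current, dict) and segment in current:
            current = current[segment]
        elif (
            isinstance(current, list)
            and segment.isdecimal()
            and int(segment) < len(current)
        ):
            current = current[int(segment)]
        else:
            return _MISSING
    return current


def multi_field_match(
    expected: Dict[str, Any], actual: Dict[str, Any], fields: List[str]
) -> Dict[str, float]:
    """Return one 0/1 score per field plus an aggregate_score in [0, 1]."""
    scores: Dict[str, float] = {}
    for field in fields:
        want = get_by_dot_path(expected, field)
        got = get_by_dot_path(actual, field)
        # Assumption: a field only matches if it exists in the expected data.
        scores[field] = 1.0 if want is not _MISSING and want == got else 0.0
    aggregate = sum(scores.values()) / len(scores) if scores else 0.0
    return {**scores, "aggregate_score": aggregate}


expected = {"name": "Ada", "address": {"city": "London"}, "items": [{"name": "pen"}]}
actual = {"name": "Ada", "address": {"city": "Leeds"}, "items": [{"name": "pen"}]}
result = multi_field_match(expected, actual, ["name", "address.city", "items.0.name"])
```

Here `result` holds a score of 1.0 for `name` and `items.0.name`, 0.0 for `address.city`, and an `aggregate_score` of two thirds.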

Changes

Backend (SDK)

Added json_multi_field_match_v0 handler with nested value extraction
Added interface schema with dynamic output support
Added configuration for the new evaluator

Backend (API)

Added evaluator service implementation
Added dynamic schema building for metrics extraction
Added evaluator resource definition with UI settings

Frontend

Added FieldsTagsEditor component (Tailwind CSS, follows AGENTS.md guidelines)
Added extractJsonPaths helper for JSON field detection
Integrated new fields_tags_editor form field type

Test plan

Create a new JSON Multi-Field Match evaluator
Configure fields manually and verify they appear as tags
Select a testcase with JSON in the ground truth column and verify auto-detection works
Use "Detect from testcase" button to replace fields
Run evaluation and verify per-field scores and aggregate_score in results
Test nested field paths like user.address.city
Verify the deprecated JSON Field Match evaluator is hidden but existing configs work

🤖 Generated with Claude Code

Fixes naming inconsistency between API service and SDK interface schema. The interface defines `aggregate_score` as the required output field, so the service must use the same name to pass schema validation. Also applies ruff auto-cleanup for unused imports.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>

vercel · 2025-12-30T20:05:33Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Review	Updated (UTC)
agenta-documentation	Ready	Preview, Comment	Jan 5, 2026 8:52pm

mmabrouk · 2025-12-30T20:11:32Z

Demo

CleanShot.2025-12-30.at.21.07.17.mp4

mmabrouk · 2025-12-30T20:13:32Z

api/oss/src/models/api/evaluation_model.py

    oss: Optional[bool] = False
    requires_llm_api_keys: Optional[bool] = False
    tags: List[str]
+    archived: Optional[bool] = False


this field was forgotten on the model, adding it here

mmabrouk · 2025-12-30T20:14:15Z

api/oss/src/resources/evaluators/evaluators.py

        "name": "JSON Field Match",
        "key": "field_match_test",
        "direct_use": False,
+        "archived": True,  # Deprecated - use json_multi_field_match instead


old json evaluators will continue working and will still be editable in the UI, but no user can create new ones.

@jp-agenta let's not remove the old evaluators by mistake. A lot of users rely on them, and removing them completely is more pain than just keeping them indefinitely imo
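
A rough sketch of the intended behaviour, assuming the `archived` flag is checked wherever the list of evaluators offered for creation is built. The PR excerpt does not show where that filtering happens, so both function names below are hypothetical.

```python
from typing import Any, Dict, List, Optional

# Trimmed-down stand-ins for the evaluator resource definitions.
EVALUATORS: List[Dict[str, Any]] = [
    {"name": "JSON Field Match", "key": "field_match_test", "archived": True},
    {"name": "JSON Multi-Field Match", "key": "json_multi_field_match"},
]


def list_creatable_evaluators(evaluators: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Evaluators offered when creating a new config: archived ones are hidden."""
    return [e for e in evaluators if not e.get("archived", False)]


def find_evaluator(evaluators: List[Dict[str, Any]], key: str) -> Optional[Dict[str, Any]]:
    """Lookup used by existing configs: archived evaluators still resolve."""
    return next((e for e in evaluators if e["key"] == key), None)


creatable_keys = [e["key"] for e in list_creatable_evaluators(EVALUATORS)]
legacy = find_evaluator(EVALUATORS, "field_match_test")
```

With this split, `creatable_keys` contains only `json_multi_field_match`, while `legacy` still resolves for configs that already reference the old evaluator.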

mmabrouk · 2025-12-30T20:15:16Z

api/oss/src/resources/evaluators/evaluators.py

+        "settings_template": {
+            "fields": {
+                "label": "Fields to Compare",
+                "type": "fields_tags_editor",  # Custom type - tag-based add/remove editor


this tells the UI how to show the playground

mmabrouk · 2025-12-30T20:16:10Z

api/oss/src/services/evaluators_service.py

    return {"outputs": {"success": result}}


+def get_nested_value(obj: Any, path: str) -> Any:


maybe in the future:

This could become more general (allowing the use of json paths)

This is likely to be a common function
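
To make the "more general" idea concrete, here is a sketch that accepts either dot notation or an RFC 6901 JSON Pointer. It is an illustration only: `resolve_json_pointer` and `resolve_path` are hypothetical names, not functions from this PR or the SDK, and JSONPath is left out because it would need a dedicated parser.

```python
from typing import Any


def resolve_json_pointer(doc: Any, pointer: str) -> Any:
    """Resolve an RFC 6901 JSON Pointer such as "/user/address/city"."""
    if pointer == "":
        return doc  # the empty pointer refers to the whole document
    if not pointer.startswith("/"):
        raise ValueError(f"not a JSON Pointer: {pointer!r}")
    current = doc
    for raw in pointer[1:].split("/"):
        # Unescape "~1" to "/" first, then "~0" to "~", as the RFC requires.
        token = raw.replace("~1", "/").replace("~0", "~")
        if isinstance(current, dict):
            current = current[token]  # KeyError if the key is absent
        elif isinstance(current, list):
            if not token.isdecimal():
                raise KeyError(token)
            current = current[int(token)]  # IndexError if out of range
        else:
            raise KeyError(token)
    return current


def resolve_path(doc: Any, path: str) -> Any:
    """Hypothetical dispatcher: a leading "/" means JSON Pointer, else dot notation."""
    if path.startswith("/"):
        return resolve_json_pointer(doc, path)
    # Escape each dot segment so keys containing "~" or "/" survive the conversion.
    pointer = "".join(
        "/" + segment.replace("~", "~0").replace("/", "~1")
        for segment in path.split(".")
    )
    return resolve_json_pointer(doc, pointer)


doc = {"user": {"address": {"city": "Paris"}}, "items": [{"name": "pen"}]}
city = resolve_path(doc, "user.address.city")
first_item = resolve_path(doc, "/items/0/name")
```

Raising on a missing path, rather than returning `None`, lets a caller tell an absent field from one whose value is null. Whether the evaluator should treat those two cases differently is a design decision this sketch leaves open.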

mmabrouk · 2025-12-30T20:19:26Z

...nts/pages/evaluations/autoEvaluation/EvaluatorsModal/ConfigureEvaluator/FieldsTagsEditor.tsx

@ardaerzin as written, the code mixes business logic with UI (the tag add/remove component).

For now, I don't see us using this component anywhere else. I expect we will have something like it once we start using tags everywhere, and it would make sense to refactor then and separate the two.

Let me know what you think. Is this too hacky, or good enough for the use case?

mmabrouk · 2025-12-30T20:20:55Z

web/oss/src/lib/helpers/extractJsonPaths.ts

+ * @param prefix - Current path prefix (used for recursion)
+ * @returns Array of dot-notation paths to all leaf values
+ */
+export const extractJsonPaths = (obj: unknown, prefix = ""): string[] => {


@ardaerzin this feels like something we have already / would have, no?

…path formats Updated the get_nested_value function to utilize resolve_any() for improved path resolution, allowing support for dot notation, JSON Path, and JSON Pointer formats. Cleaned up imports and ensured consistent error handling for path resolution failures.

… reactivity - Updated FieldsTagsEditor component to utilize Form.useWatch instead of form.getFieldValue for monitoring changes to correct_answer_key. - This change enhances reactivity and ensures the component responds appropriately to form updates.

ardaerzin · 2026-01-06T10:05:01Z

...nts/pages/evaluations/autoEvaluation/EvaluatorsModal/ConfigureEvaluator/FieldsTagsEditor.tsx

ardaerzin · 2026-01-06T10:12:08Z

web/oss/src/lib/helpers/extractJsonPaths.ts

+ * @param prefix - Current path prefix (used for recursion)
+ * @returns Array of dot-notation paths to all leaf values
+ */
+export const extractJsonPaths = (obj: unknown, prefix = ""): string[] => {


we already have safeJson5Parse, but no path utils like the ones you created here. In testsets these were handled via columns. I'll think about adding this to the loadable API I'm currently planning
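
For readers following along on the backend side, a Python analogue of the leaf-path extraction discussed here might look like the sketch below. It mirrors only the documented contract of `extractJsonPaths` (dot-notation paths to all leaf values, with a `prefix` used for recursion). Descending into lists with numeric index segments and skipping empty containers are assumptions, since the hunk does not show the function body.

```python
from typing import Any, List


def extract_json_paths(obj: Any, prefix: str = "") -> List[str]:
    """Collect dot-notation paths to every leaf value in a parsed JSON document."""
    if isinstance(obj, dict):
        children = [(str(key), value) for key, value in obj.items()]
    elif isinstance(obj, list):
        # Assumption: list elements are addressed by index, e.g. "items.0.name".
        children = [(str(index), value) for index, value in enumerate(obj)]
    else:
        # A leaf: report the path that led here (nothing for a bare top-level scalar).
        return [prefix] if prefix else []

    paths: List[str] = []
    for key, value in children:
        child_prefix = f"{prefix}.{key}" if prefix else key
        paths.extend(extract_json_paths(value, child_prefix))
    # Note: empty dicts and lists contribute no paths in this sketch.
    return paths


ground_truth = {
    "name": "Ada",
    "address": {"city": "London"},
    "items": [{"name": "pen"}],
}
detected = extract_json_paths(ground_truth)
```

Here `detected` is `["name", "address.city", "items.0.name"]`, which is the kind of list the "Detect from testcase" button could use to pre-fill the field tags.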

mmabrouk and others added 9 commits December 22, 2025 20:48

feat(api): add archived field to LegacyEvaluator model

2520a19

feat(api): introduce JSON Multi-Field Match evaluator and deprecate JSON Field Match

fb13c91

feat(api): implement JSON Multi-Field Match evaluator with dynamic scoring and update configurations

70f1cb6

feat(api): add json_multi_field_match evaluator and corresponding icon mapping

7c32857

refactor(api): update json_multi_field_match evaluator to use aggregate_score instead of score and enhance field comparison descriptions

64f5c14

feat(frontend): add FieldsTagsEditor component for managing JSON field paths in evaluations

4a3065e

refactor(frontend): simplify FieldsTagsEditor component by removing unused styles and optimizing class names

d47fc55

refactor(api): clean up unused imports and improve parameter checks in workflow handlers

0df3ddc

dosubot bot added the size:XL label (this PR changes 500-999 lines, ignoring generated files) Dec 30, 2025

vercel bot deployed to Preview December 30, 2025 20:05 View deployment

dosubot bot added Backend feature Frontend labels Dec 30, 2025

Merge branch 'main' into feat/json-multi-field-match-evaluator

020a7e4

mmabrouk commented Dec 30, 2025

vercel bot deployed to Preview December 30, 2025 20:13 View deployment

mmabrouk commented Dec 30, 2025

mmabrouk requested review from ardaerzin, jp-agenta and junaway December 30, 2025 20:21

junaway approved these changes Dec 31, 2025

vercel bot deployed to Preview December 31, 2025 14:10 View deployment

mmabrouk mentioned this pull request Dec 31, 2025

docs(evaluators): add JSON Multi-Field Match documentation and changelog #3337

Open

Merge remote-tracking branch 'origin/main' into feat/json-multi-field-match-evaluator

e0c2564

# Conflicts:
# api/oss/src/services/evaluators_service.py
# sdk/agenta/sdk/workflows/handlers.py

vercel bot deployed to Preview January 5, 2026 20:41 View deployment

vercel bot deployed to Preview January 5, 2026 20:52 View deployment

ardaerzin approved these changes Jan 6, 2026

bekossy changed the base branch from main to release/v0.74.0 January 6, 2026 11:12

bekossy merged commit 66b9a1d into release/v0.74.0 Jan 6, 2026
9 checks passed

5 participants

mmabrouk commented Dec 30, 2025 • edited by bekossy

vercel bot commented Dec 30, 2025 • edited