feat: add deterministic evaluators for output and trajectory checks by afarntrog · Pull Request #3 · afarntrog/evals

afarntrog · 2026-03-10T13:30:35Z

Add a new evaluator subpackage containing rule-based evaluators that don't require LLM calls:

- **Output evaluators**: check actual_output against expected values or custom logic
- **Trajectory evaluator**: verifies that specific tools were invoked during agent execution

Includes comprehensive unit tests for all new evaluators and exports them from the top-level evaluators package.
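
For readers unfamiliar with the idea, here is a minimal sketch of what rule-based evaluators of this kind might look like. The class names (`EvalResult`, `ExactMatchEvaluator`, `ToolCalledEvaluator`), the `evaluate` signature, and the list-of-tool-names trajectory format are all assumptions made for illustration. They are not taken from this PR's diff.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class EvalResult:
    """Hypothetical result type: a pass/fail flag plus a short reason."""
    passed: bool
    reason: str


class ExactMatchEvaluator:
    """Hypothetical output evaluator: passes when actual_output equals the expected value."""

    def __init__(self, expected_output: str) -> None:
        self.expected_output = expected_output

    def evaluate(self, actual_output: str) -> EvalResult:
        if actual_output == self.expected_output:
            return EvalResult(True, "output matches expected value")
        return EvalResult(
            False, f"expected {self.expected_output!r}, got {actual_output!r}"
        )


class ToolCalledEvaluator:
    """Hypothetical trajectory evaluator: passes when every required tool was invoked."""

    def __init__(self, required_tools: Sequence[str]) -> None:
        self.required_tools = list(required_tools)

    def evaluate(self, trajectory: Sequence[str]) -> EvalResult:
        # Assumption: the trajectory is a plain sequence of invoked tool names.
        called = set(trajectory)
        missing = [tool for tool in self.required_tools if tool not in called]
        if missing:
            return EvalResult(False, f"missing tool calls: {missing}")
        return EvalResult(True, "all required tools were called")
```

Both checks are pure functions of their inputs, so the same input always yields the same verdict and no model call is involved.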

Description

Related Issues

Documentation PR

Type of Change

Bug fix
New feature
Breaking change
Documentation update
Other (please describe):

Testing

How have you tested the change? Verify that the changes do not break functionality or introduce warnings in consuming repositories: agents-docs, agents-tools, agents-cli

I ran hatch run prepare

Checklist

I have read the CONTRIBUTING document
I have added any necessary tests that prove my fix is effective or my feature works
I have updated the documentation accordingly
I have added an appropriate example to the documentation to outline the feature, or no new docs are needed
My changes generate no new warnings
Any dependent changes have been merged and published

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Remove the Custom evaluator class and all associated exports and tests. The Custom evaluator allowed users to pass arbitrary callback functions, but this functionality can be achieved by subclassing the base Evaluator directly, making the Custom wrapper unnecessary.
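
The reasoning in this commit message can be sketched as follows. The `Evaluator` base class, its `evaluate` method, and the `Custom` wrapper shown here are hypothetical stand-ins, since the actual interfaces are not visible on this page. The point is only that a small subclass can cover the same ground as a callback wrapper.

```python
from abc import ABC, abstractmethod
from typing import Callable


class Evaluator(ABC):
    """Hypothetical stand-in for the package's base evaluator class."""

    @abstractmethod
    def evaluate(self, actual_output: str) -> bool:
        """Return True when the output passes the check."""


class Custom(Evaluator):
    """Hypothetical reconstruction of a callback wrapper like the one removed."""

    def __init__(self, fn: Callable[[str], bool]) -> None:
        self.fn = fn

    def evaluate(self, actual_output: str) -> bool:
        return self.fn(actual_output)


class MaxLength(Evaluator):
    """The same kind of check written as a direct subclass, with no wrapper."""

    def __init__(self, limit: int) -> None:
        self.limit = limit

    def evaluate(self, actual_output: str) -> bool:
        return len(actual_output) <= self.limit


# Both forms give the same verdict for the same input.
wrapped = Custom(lambda text: len(text) <= 5)
direct = MaxLength(5)
```

Under these assumptions the subclass also carries a name and explicit parameters, which a bare callback does not.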

afarntrog had a problem deploying to auto-approve March 10, 2026 13:30 — with GitHub Actions Failure

afarntrog had a problem deploying to auto-approve March 10, 2026 19:05 — with GitHub Actions Failure

afarntrog had a problem deploying to auto-approve March 10, 2026 19:16 — with GitHub Actions Failure

afarntrog closed this Mar 31, 2026

afarntrog wants to merge 3 commits into main from deterministic_evals
