feat: add deterministic evaluators for output and trajectory checks #3

Closed
afarntrog wants to merge 3 commits into main from deterministic_evals

Conversation

@afarntrog
Owner

Add a new evaluator subpackage containing rule-based evaluators that don't require LLM calls:

  • Output evaluators: check actual_output against expected values or custom logic
  • Trajectory evaluator: verifies that specific tools were invoked during agent execution

Includes comprehensive unit tests for all new evaluators and exports them from the top-level evaluators package.
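
To make the two categories above concrete, here is a minimal sketch of what rule-based checks of this kind could look like. It is not the code from this PR: `EvalCase`, `equals_expected`, `contains_text`, and `tools_were_called` are hypothetical names chosen for illustration, and the only field name taken from the description is `actual_output`.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class EvalCase:
    """Hypothetical record of one agent run; not the package's real type."""
    actual_output: str
    expected_output: Optional[str] = None
    tool_calls: List[str] = field(default_factory=list)  # tool names, in call order


def equals_expected(case: EvalCase) -> bool:
    """Output check: actual_output must match expected_output exactly."""
    return case.actual_output == case.expected_output


def contains_text(case: EvalCase, needle: str) -> bool:
    """Output check: actual_output must contain the given substring."""
    return needle in case.actual_output


def tools_were_called(case: EvalCase, required: List[str]) -> bool:
    """Trajectory check: every required tool appears somewhere in the run."""
    return set(required).issubset(case.tool_calls)


case = EvalCase(
    actual_output="The total is 42.",
    expected_output="The total is 42.",
    tool_calls=["search", "calculator"],
)
print(equals_expected(case), contains_text(case, "42"), tools_were_called(case, ["calculator"]))
```

Because none of these checks call a model, they return the same result on every run, which is what makes them suitable for fast, repeatable test suites.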

Description

Related Issues

Documentation PR

Type of Change

Bug fix
New feature
Breaking change
Documentation update
Other (please describe):

Testing

How have you tested the change? Verify that the changes do not break functionality or introduce warnings in consuming repositories: agents-docs, agents-tools, agents-cli

  • I ran hatch run prepare

Checklist

  • I have read the CONTRIBUTING document
  • I have added any necessary tests that prove my fix is effective or my feature works
  • I have updated the documentation accordingly
  • I have added an appropriate example to the documentation to outline the feature, or no new docs are needed
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Remove the Custom evaluator class and all associated exports and tests.
The Custom evaluator allowed users to pass arbitrary callback functions,
but this functionality can be achieved by subclassing the base Evaluator
directly, making the Custom wrapper unnecessary.
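
As a rough illustration of that reasoning, the sketch below shows custom logic written as a direct subclass rather than passed as a callback. The `Evaluator` base class here is a hypothetical stand-in defined locally; the real base class in this package may use a different method name and signature.

```python
import json
from abc import ABC, abstractmethod


class Evaluator(ABC):
    """Hypothetical stand-in for the package's base evaluator class."""

    @abstractmethod
    def evaluate(self, actual_output: str) -> bool:
        """Return True if the output passes the check."""


class IsValidJson(Evaluator):
    """Custom check expressed as a subclass instead of a callback wrapper."""

    def evaluate(self, actual_output: str) -> bool:
        try:
            json.loads(actual_output)
        except ValueError:  # json.JSONDecodeError is a subclass of ValueError
            return False
        return True


checker = IsValidJson()
print(checker.evaluate('{"status": "ok"}'), checker.evaluate("not json"))
```

A subclass keeps the custom logic named, testable, and discoverable in the same way as the built-in evaluators, which is the trade-off the commit message points to.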