2 changes: 1 addition & 1 deletion multimodal/omni-tars/omni-agent/src/index.ts
@@ -199,7 +199,7 @@ export default class OmniTARSAgent extends ComposableAgent {
'https://images.unsplash.com/photo-1493225457124-a3eb161ffa5f?w=400&h=300&fit=crop&crop=center',
},
],
workspace: {
z``: {
navItems: [
{
title: 'Code Server',
208 changes: 206 additions & 2 deletions multimodal/tarko/agent-snapshot/README.md
@@ -1,12 +1,216 @@
# @tarko/agent-snapshot

A snapshot-based testing framework for agents built on `@tarko/agent`. It makes agent tests deterministic by capturing agent interactions (LLM requests and responses, tool calls, and event streams) and replaying them.

## Features

- **Snapshot Generation**: Capture real agent interactions for test fixtures
- **Deterministic Replay**: Mock LLM responses using captured snapshots
- **Comprehensive Verification**: Validate LLM requests, event streams, and tool calls
- **Flexible Configuration**: Customize normalization and verification settings
- **CLI Support**: Command-line tools for snapshot management
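
The record-and-replay idea behind the first three features can be sketched in a few lines. This is a conceptual illustration only, not the package's implementation: `RecordedLoop`, `ReplayModel`, and the plain-string request/response shapes are hypothetical stand-ins for whatever the real snapshot stores.

```typescript
// Hypothetical shape: one captured request/response pair per agent loop.
interface RecordedLoop {
  request: string; // what the agent sent to the model during generation
  response: string; // what the model answered during generation
}

// Serves recorded responses in order and checks each incoming request
// against the one captured for the same loop.
class ReplayModel {
  private readonly recorded: RecordedLoop[];
  private loop = 0;

  constructor(recorded: RecordedLoop[]) {
    this.recorded = recorded;
  }

  complete(request: string): string {
    const entry = this.recorded[this.loop];
    if (entry === undefined) {
      throw new Error(`No recorded response for loop ${this.loop + 1}`);
    }
    if (entry.request !== request) {
      throw new Error(`Request mismatch in loop ${this.loop + 1}`);
    }
    this.loop += 1;
    return entry.response;
  }

  // Number of loops that have been replayed so far.
  completedLoops(): number {
    return this.loop;
  }
}

const model = new ReplayModel([{ request: 'Hello', response: 'Hi! How can I help?' }]);
const reply = model.complete('Hello'); // served from the recording, no network call
```

Because every response comes from the recording, a replayed run is deterministic; a request that differs from the recorded one surfaces as an error rather than as a silently different answer.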

## Installation

```bash
npm install @tarko/agent-snapshot
```

## Quick Start

### Basic Usage

```typescript
import { Agent } from '@tarko/agent';
import { AgentSnapshot } from '@tarko/agent-snapshot';

// Create your agent
const agent = new Agent(/* your config */);

// Create snapshot instance
const snapshot = new AgentSnapshot(agent, {
  snapshotPath: './fixtures/my-test-case',
  snapshotName: 'example-test'
});

// Generate snapshot (runs with real LLM)
await snapshot.generate("Hello, how can you help me?");

// Replay test (uses mocked responses)
const result = await snapshot.replay("Hello, how can you help me?");
```

### Advanced Configuration

```typescript
const snapshot = new AgentSnapshot(agent, {
  snapshotPath: './fixtures/complex-test',
  updateSnapshots: false,
  normalizerConfig: {
    fieldsToNormalize: [
      { pattern: /timestamp/i, replacement: '<<TIMESTAMP>>' },
      { pattern: 'id', replacement: '<<ID>>' }
    ],
    fieldsToIgnore: ['debug_info']
  },
  verification: {
    verifyLLMRequests: true,
    verifyEventStreams: true,
    verifyToolCalls: true
  }
});
```

## API Reference

### AgentSnapshot

The main class for managing agent snapshots.

#### Constructor

```typescript
new AgentSnapshot(agent: Agent, options: AgentSnapshotOptions)
```

#### Methods

- `generate(runOptions: AgentRunOptions): Promise<SnapshotGenerationResult>`
- `replay(runOptions: AgentRunOptions, config?: TestRunConfig): Promise<SnapshotRunResult>`
- `getAgent(): Agent`
- `getCurrentLoop(): number`

### AgentSnapshotRunner

Utility class for managing multiple test cases.

```typescript
import { AgentSnapshotRunner } from '@tarko/agent-snapshot';

const runner = new AgentSnapshotRunner([
  {
    name: 'basic-chat',
    path: './test-cases/basic-chat.ts',
    snapshotPath: './fixtures/basic-chat'
  }
]);

// Generate all snapshots
await runner.generateAll();

// Run all tests
await runner.replayAll();
```

## Configuration Options

### AgentSnapshotOptions

```typescript
interface AgentSnapshotOptions {
  snapshotPath: string; // Directory for snapshots
  snapshotName?: string; // Test case name
  updateSnapshots?: boolean; // Update mode flag
  normalizerConfig?: AgentNormalizerConfig;
  verification?: {
    verifyLLMRequests?: boolean;
    verifyEventStreams?: boolean;
    verifyToolCalls?: boolean;
  };
}
```

### Normalizer Configuration

The normalizer helps create stable snapshots by replacing dynamic values:

```typescript
interface AgentNormalizerConfig {
  fieldsToNormalize?: Array<{
    pattern: string | RegExp;
    replacement?: any;
    deep?: boolean;
  }>;
  fieldsToIgnore?: (string | RegExp)[];
  customNormalizers?: Array<{
    pattern: string | RegExp;
    normalizer: (value: any, path: string) => any;
  }>;
}
```
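
To make the role of `fieldsToNormalize` and `fieldsToIgnore` concrete, here is a minimal sketch of how such a config could be applied to a JSON-like object. It is not the package's implementation: `SketchNormalizerConfig`, `matchesKey`, `normalizeSnapshot`, and the `'<<NORMALIZED>>'` fallback are hypothetical, and it assumes that a string pattern matches a key name exactly while a RegExp is tested against the key. `deep` and `customNormalizers` are left out.

```typescript
type KeyPattern = string | RegExp;

// Hypothetical, simplified subset of the config shape shown above.
interface SketchNormalizerConfig {
  fieldsToNormalize?: Array<{ pattern: KeyPattern; replacement?: unknown }>;
  fieldsToIgnore?: KeyPattern[];
}

// Assumption: a string pattern matches a key exactly; a RegExp is tested against the key.
function matchesKey(pattern: KeyPattern, key: string): boolean {
  return typeof pattern === 'string' ? pattern === key : pattern.test(key);
}

// Recursively walks a JSON-like value, dropping ignored keys and
// replacing the values of keys that match a normalization rule.
function normalizeSnapshot(value: unknown, config: SketchNormalizerConfig): unknown {
  if (Array.isArray(value)) {
    return value.map((item) => normalizeSnapshot(item, config));
  }
  if (value === null || typeof value !== 'object') {
    return value;
  }
  const result: Record<string, unknown> = {};
  for (const [key, child] of Object.entries(value as Record<string, unknown>)) {
    if ((config.fieldsToIgnore ?? []).some((p) => matchesKey(p, key))) {
      continue; // ignored fields never reach the snapshot
    }
    const rule = (config.fieldsToNormalize ?? []).find((r) => matchesKey(r.pattern, key));
    result[key] = rule
      ? (rule.replacement ?? '<<NORMALIZED>>')
      : normalizeSnapshot(child, config);
  }
  return result;
}

const normalized = normalizeSnapshot(
  { id: 'abc123', createdTimestamp: 1700000000, debug_info: 'x', content: 'Hello' },
  {
    fieldsToNormalize: [
      { pattern: /timestamp/i, replacement: '<<TIMESTAMP>>' },
      { pattern: 'id', replacement: '<<ID>>' },
    ],
    fieldsToIgnore: ['debug_info'],
  },
);
```

Under these assumptions, two runs that differ only in IDs and timestamps produce identical normalized output, which is what keeps snapshot comparisons stable.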

## Snapshot Structure

Generated snapshots follow this directory structure:

```
fixtures/
└── test-case-name/
    ├── initial/
    │   └── event-stream.jsonl
    ├── loop-1/
    │   ├── llm-request.jsonl
    │   ├── llm-response.jsonl
    │   ├── event-stream.jsonl
    │   └── tool-calls.jsonl
    ├── loop-2/
    │   └── ...
    └── event-stream.jsonl
```
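
When inspecting these files by hand, a small reader can help. The sketch below assumes each file holds one JSON value per line, as the `.jsonl` extension suggests; the README does not confirm the exact on-disk format. `parseJsonl` and the sample event types are hypothetical.

```typescript
// Parses JSON Lines text: one JSON value per non-empty line.
function parseJsonl(text: string): unknown[] {
  return text
    .split(/\r?\n/)
    .filter((line) => line.trim().length > 0)
    .map((line, index) => {
      try {
        return JSON.parse(line) as unknown;
      } catch (error) {
        // index counts non-empty lines only, starting at 0
        throw new Error(`Invalid JSON on non-empty line ${index + 1}: ${(error as Error).message}`);
      }
    });
}

// Sample content with made-up event types, standing in for an event-stream.jsonl file.
const events = parseJsonl('{"type":"user_message"}\n{"type":"assistant_message"}\n');
```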

## CLI Usage

```bash
# Generate snapshots
npx agent-snapshot generate my-test-case

# Run tests
npx agent-snapshot replay my-test-case

# Update snapshots
npx agent-snapshot replay my-test-case --updateSnapshot
```

## Best Practices

1. **Stable Test Data**: Use the normalizer to handle dynamic values like timestamps and IDs
2. **Focused Tests**: Create separate snapshots for different scenarios
3. **Version Control**: Commit snapshots to ensure consistent test behavior
4. **Update Mode**: Use `--updateSnapshot` carefully and review changes
5. **Verification Settings**: Adjust verification options based on test requirements

## Troubleshooting

### Common Issues

- **Snapshot Mismatch**: Check normalizer configuration for dynamic fields
- **Missing Snapshots**: Ensure snapshots are generated before running tests
- **Loop Count Errors**: Verify agent behavior consistency between runs

### Debug Tips

- Enable detailed logging by setting appropriate log levels
- Use `.actual.jsonl` files to compare expected vs actual data
- Review snapshot directory structure for completeness
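
For the expected-versus-actual comparison, a quick way to locate the first diverging record is sketched below. It works on already-parsed records, so it makes no assumption about file names or locations; `findFirstDifference` and the sample tool-call records are hypothetical.

```typescript
// Returns the index of the first record that differs, or -1 if both lists match.
function findFirstDifference(expected: unknown[], actual: unknown[]): number {
  const length = Math.max(expected.length, actual.length);
  for (let i = 0; i < length; i++) {
    // JSON.stringify is sensitive to key order, which is enough for a quick check.
    // A missing record stringifies to undefined and therefore counts as a difference.
    if (JSON.stringify(expected[i]) !== JSON.stringify(actual[i])) {
      return i;
    }
  }
  return -1;
}

// Made-up records standing in for parsed expected and actual snapshot data.
const expectedRecords = [{ tool: 'search', args: { q: 'weather' } }];
const actualRecords = [{ tool: 'search', args: { q: 'Weather' } }];
const mismatchIndex = findFirstDifference(expectedRecords, actualRecords);
```

If the first difference is in a field such as a timestamp or an ID, that usually points to a missing normalizer rule rather than a real behavior change.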

## Integration with Testing Frameworks

### Vitest Example

```typescript
import { describe, it, expect } from 'vitest';
import { Agent } from '@tarko/agent';
import { AgentSnapshot } from '@tarko/agent-snapshot';

describe('Agent Tests', () => {
  it('should handle basic conversation', async () => {
    // Create your agent
    const agent = new Agent(/* your config */);
    const snapshot = new AgentSnapshot(agent, {
      snapshotPath: './fixtures/basic-conversation'
    });

    // Replay uses the mocked responses stored in the snapshot
    const result = await snapshot.replay("Hello world");
    expect(result.meta.loopCount).toBe(1);
  });
});
```

## License

Apache-2.0