
Commit ca791bf

abrichr and claude authored
feat: migrate evaluation infrastructure from openadapt-ml (#29)
* feat: migrate evaluation infrastructure from openadapt-ml

Move all evaluation infrastructure (~13,000 lines) from openadapt-ml/benchmarks/ to openadapt-evals so openadapt-ml can focus on pure ML (schemas, training, inference, model adapters).

Migrated modules:
- benchmarks/vm_cli.py: Full VM/pool CLI with 50+ commands (8,503 lines)
- infrastructure/azure_vm.py: AzureVMManager with SDK + CLI fallback
- infrastructure/pool.py: PoolManager for multi-VM orchestration
- infrastructure/resource_tracker.py: Azure cost tracking
- benchmarks/pool_viewer.py: Pool results HTML viewer
- benchmarks/trace_export.py: Training data export (keeps openadapt_ml.schema dep)
- waa_deploy/: Docker agent deployment files

Also adds:
- config.py: Pydantic-settings config for Azure credentials
- pydantic-settings + azure-mgmt-* dependencies
- 4 test files migrated from openadapt-ml

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: correct DOCKERFILE_PATH and stale debug path in vm_cli

- DOCKERFILE_PATH: use parent.parent to reach waa_deploy/ from benchmarks/
- cmd_tail_output: update hardcoded task dir from openadapt-ml to openadapt-evals

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
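
The `DOCKERFILE_PATH` fix described in the commit message can be pictured with a small `pathlib` sketch. The directory names come from the commit message. The `Dockerfile` file name and the `dockerfile_path_for` helper are assumptions made for illustration, not code taken from `vm_cli.py`.

```python
from pathlib import Path


def dockerfile_path_for(module_file: str) -> Path:
    """Resolve waa_deploy/ relative to a module that lives in benchmarks/.

    Hypothetical helper: the "Dockerfile" file name is an assumption.
    """
    module = Path(module_file)
    # .parent is benchmarks/; .parent.parent is the package directory
    # that contains waa_deploy/ as a sibling of benchmarks/.
    return module.parent.parent / "waa_deploy" / "Dockerfile"


# Example with a path shaped like the one in this commit
path = dockerfile_path_for("openadapt_evals/benchmarks/vm_cli.py")
print(path.as_posix())
```

A single `.parent` would stop at `benchmarks/`, so the lookup would miss `waa_deploy/`, which sits one level higher.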
1 parent 9132540 commit ca791bf

16 files changed

Lines changed: 13083 additions & 39 deletions

openadapt_evals/benchmarks/pool_viewer.py

Lines changed: 683 additions & 0 deletions
Large diffs are not rendered by default.

openadapt_evals/benchmarks/trace_export.py

Lines changed: 621 additions & 0 deletions
Large diffs are not rendered by default.

openadapt_evals/benchmarks/vm_cli.py

Lines changed: 8323 additions & 0 deletions
Large diffs are not rendered by default.

openadapt_evals/config.py

Lines changed: 51 additions & 0 deletions
@@ -0,0 +1,51 @@
+from __future__ import annotations
+
+from pydantic_settings import BaseSettings
+
+
+class Settings(BaseSettings):
+    """Application settings loaded from environment variables or .env file.
+
+    Priority order for configuration values:
+    1. Environment variables
+    2. .env file
+    3. Default values (None for API keys)
+    """
+
+    # VLM API Keys
+    anthropic_api_key: str | None = None
+    openai_api_key: str | None = None
+    google_api_key: str | None = None
+
+    # Azure credentials (for WAA benchmark on Azure)
+    # These are used by DefaultAzureCredential for Service Principal auth
+    azure_client_id: str | None = None
+    azure_client_secret: str | None = None
+    azure_tenant_id: str | None = None
+
+    # Azure ML workspace config
+    azure_subscription_id: str | None = None
+    azure_ml_resource_group: str | None = None
+    azure_ml_workspace_name: str | None = None
+
+    # Azure resource group for VM operations (used by benchmarks CLI)
+    azure_resource_group: str = "openadapt-agents"
+
+    # Azure VM settings (optional overrides)
+    azure_vm_size: str = "Standard_D2_v3"
+    azure_docker_image: str = "docker.io/windowsarena/winarena:latest"
+
+    # Azure Storage for async inference queue
+    azure_storage_connection_string: str | None = None
+    azure_inference_queue_name: str = "inference-jobs"
+    azure_checkpoints_container: str = "checkpoints"
+    azure_comparisons_container: str = "comparisons"
+
+    model_config = {
+        "env_file": ".env",
+        "env_file_encoding": "utf-8",
+        "extra": "ignore",  # ignore extra env vars
+    }
+
+
+settings = Settings()
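
The priority order stated in the `Settings` docstring (environment variables, then the `.env` file, then defaults) can be sketched with the standard library alone. This only illustrates the lookup order and is not how pydantic-settings is implemented. `resolve_setting`, the upper-case name mapping, and the sample values are all hypothetical.

```python
from __future__ import annotations

from typing import Mapping


def resolve_setting(
    name: str,
    environ: Mapping[str, str],
    dotenv: Mapping[str, str],
    default: str | None = None,
) -> str | None:
    """Return the first value found: environment, then .env, then default."""
    key = name.upper()  # assumption: field names map to upper-case variable names
    if key in environ:
        return environ[key]
    if key in dotenv:
        return dotenv[key]
    return default


# Hypothetical inputs standing in for os.environ and a parsed .env file
environ = {"AZURE_VM_SIZE": "Standard_D4_v3"}
dotenv = {"AZURE_VM_SIZE": "Standard_D8_v3", "AZURE_RESOURCE_GROUP": "my-group"}

vm_size = resolve_setting("azure_vm_size", environ, dotenv, "Standard_D2_v3")
resource_group = resolve_setting("azure_resource_group", environ, dotenv, "openadapt-agents")
api_key = resolve_setting("anthropic_api_key", environ, dotenv)
print(vm_size, resource_group, api_key)
```

Here the environment value wins for the VM size, the `.env` value wins for the resource group, and the API key falls back to its `None` default, matching the three tiers in the docstring.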
Lines changed: 16 additions & 9 deletions
@@ -1,32 +1,39 @@
 """Infrastructure components for VM management and monitoring.
 
 This module provides:
+- AzureVMManager: Azure VM lifecycle management (SDK + CLI fallback)
+- PoolManager: Multi-VM pool orchestration
 - VMMonitor: Azure VM status monitoring
 - AzureOpsTracker: Azure operation logging
 - SSHTunnelManager: SSH tunnel management for VNC/API access
 
 Example:
     ```python
-    from openadapt_evals.infrastructure import VMMonitor, SSHTunnelManager
+    from openadapt_evals.infrastructure import AzureVMManager, PoolManager
 
-    # Monitor VM status
-    monitor = VMMonitor()
-    status = monitor.get_status()
+    # Manage VMs
+    vm = AzureVMManager()
+    ip = vm.get_vm_ip("waa-eval-vm")
 
-    # Manage SSH tunnels
-    tunnel_manager = SSHTunnelManager()
-    tunnel_manager.start_tunnels_for_vm("172.171.112.41", "azureuser")
+    # Create and manage pools
+    pool = PoolManager()
+    pool.create(workers=3)
     ```
 """
 
-from openadapt_evals.infrastructure.vm_monitor import VMMonitor, VMConfig
 from openadapt_evals.infrastructure.azure_ops_tracker import AzureOpsTracker
+from openadapt_evals.infrastructure.azure_vm import AzureVMManager
+from openadapt_evals.infrastructure.pool import PoolManager, PoolRunResult
 from openadapt_evals.infrastructure.ssh_tunnel import SSHTunnelManager, get_tunnel_manager
+from openadapt_evals.infrastructure.vm_monitor import VMMonitor, VMConfig
 
 __all__ = [
+    "AzureOpsTracker",
+    "AzureVMManager",
+    "PoolManager",
+    "PoolRunResult",
     "VMMonitor",
     "VMConfig",
-    "AzureOpsTracker",
     "SSHTunnelManager",
     "get_tunnel_manager",
 ]
