| title | SQL Analyst OpenEnv |
|---|---|
| emoji | 📊 |
| colorFrom | blue |
| colorTo | green |
| sdk | docker |
| pinned | false |
A real-world OpenEnv environment where an AI agent must write correct SQL queries against a live e-commerce database to answer business analytics questions.
Live Demo: https://p-karthik-mohan-sql-analyst-env.hf.space/docs
-This environment simulates the daily work of a data analyst at an e-commerce company. The AI agent receives a natural language business question, explores the database schema, and must produce a correct SQL query to answer it.
Unlike toy environments, this is a task that real analysts perform every day — making it a meaningful benchmark for AI reasoning and code generation.
| Metric | Score |
|---|---|
| Tasks solved | 6 / 8 |
| Average reward | 0.985 / 1.000 |
| Model used | llama-3.1-8b-instant (Groq) |
| Time to complete | Under 3 minutes |
Eight tasks of increasing difficulty, each graded by an automated SQL result comparator.
"How many completed orders were placed in 2024?"
- Requires: COUNT, WHERE, date filtering
- Expected output: single row — total_orders
"Find the top 5 customers by total revenue from completed orders."
- Requires: JOIN, GROUP BY, SUM, ORDER BY, LIMIT
- Expected output: 5 rows — first_name, last_name, total_revenue
"Rank product categories by total revenue using a window function."
- Requires: CTE, JOIN, GROUP BY, RANK() OVER (...)
- Expected output: all categories — category, total_revenue, revenue_rank
"Find average product price per category, only for categories with more than 2 products."
- Requires: GROUP BY, AVG, HAVING
- Expected output: category, avg_price
"Find customers who ordered from both Electronics and Clothing categories."
- Requires: INTERSECT, multiple JOINs
- Expected output: customer_id, first_name
"Calculate month-over-month revenue growth % for 2024."
- Requires: LAG() window function, CTE, STRFTIME
- Expected output: month, total_revenue, prev_revenue, growth_pct
"For each city, find the best-selling product by quantity."
- Requires: RANK() OVER (PARTITION BY city ...), CTE
- Expected output: city, product_name, total_quantity
"Find customers who spent more in H2 2024 than H1 2024."
- Requires: CASE WHEN, conditional SUM, HAVING
- Expected output: customer_id, first_name, last_name, h1_revenue, h2_revenue
Base URL: https://p-karthik-mohan-sql-analyst-env.hf.space
Load a task. Always call this first.
Request:
{"task_id": 1}Response:
{
"observation": {
"task_id": 1,
"difficulty": "easy",
"task_description": "Find the total number of completed orders...",
"schema": "Tables:\n customers (...)\n products (...)\n orders (...)",
"hint": "Use COUNT with WHERE filters on status and order_date"
},
"info": {"message": "Task 1 loaded."}
}Submit a SQL query. Returns reward 0.0-1.0 and feedback.
Request:
{"action": "SELECT COUNT(*) AS total_orders FROM orders WHERE status = 'completed'"}Response:
{
"observation": {
"result_preview": [{"total_orders": 312}],
"reward_breakdown": {
"column_score": 0.30,
"row_score": 0.30,
"value_score": 0.40,
"total_reward": 1.0
}
},
"reward": 1.0,
"done": true
}Get current session — task info, all attempts, best score.
Returns {"status": "ok", "db_exists": true} when server is ready.
| Field | Type | Description |
|---|---|---|
| task_description | string | Natural language business question |
| schema | string | All table names, columns, types, row counts |
| hint | string | Guidance on which SQL constructs to use |
| result_preview | array | First 5 rows of the agent query result |
| result_row_count | integer | Total rows returned by agent query |
| reward_breakdown | object | Sub-scores for columns, rows, values |
| Property | Value |
|---|---|
| Type | string |
| Format | Valid SQLite SELECT or WITH statement |
| Restrictions | No INSERT, UPDATE, DELETE, DROP |
Example actions:
-- Easy
SELECT COUNT(*) AS total_orders FROM orders WHERE status = 'completed'
-- Medium
SELECT c.first_name, c.last_name, SUM(o.total_amount) AS total_revenue
FROM orders o JOIN customers c ON o.customer_id = c.customer_id
WHERE o.status = 'completed'
GROUP BY o.customer_id ORDER BY total_revenue DESC LIMIT 5
-- Expert
WITH monthly AS (
SELECT STRFTIME('%Y-%m', order_date) AS month, SUM(total_amount) AS total_revenue
FROM orders WHERE status = 'completed' AND order_date LIKE '2024%'
GROUP BY month
)
SELECT month, total_revenue,
LAG(total_revenue) OVER (ORDER BY month) AS prev_revenue,
ROUND((total_revenue - LAG(total_revenue) OVER (ORDER BY month))
/ LAG(total_revenue) OVER (ORDER BY month) * 100, 2) AS growth_pct
FROM monthly ORDER BY month ASCPartial credit score from 0.0 to 1.0 with three components. Even an imperfect query receives meaningful feedback — not just pass/fail.
| Component | Weight | How it is measured |
|---|---|---|
| Column names | 0.30 | Fraction of expected column names present |
| Row count | 0.30 | Ratio of returned rows vs expected rows |
| Cell values | 0.40 | Fraction of cells matching expected values |
A realistic e-commerce dataset with 600 orders, 100 customers, 30 products.
customers (customer_id, first_name, last_name, email, city, signup_date) — 100 rows
products (product_id, product_name, category, price, stock) — 30 rows
orders (order_id, customer_id, product_id, quantity, total_amount, order_date, status) — 600 rows
All orders are dated in 2024. Status values: completed, pending, cancelled. 10 product categories: Electronics, Clothing, Books, Home & Garden, Sports, Beauty, Toys, Food & Grocery, Automotive, Music.
pip install openai requests python-dotenv
export API_BASE_URL=https://api.groq.com/openai/v1
export MODEL_NAME=llama-3.1-8b-instant
export HF_TOKEN=your_groq_api_key_here
python inference.pyExpected output:
SQL Analyst OpenEnv — Baseline Inference Agent
Model : llama-3.1-8b-instant
Tasks solved : 6 / 8
Average reward : 0.985 / 1.000
Results saved to results.json
git clone https://huggingface.co/spaces/P-Karthik-Mohan/sql-analyst-env
cd sql-analyst-env
pip install -r requirements.txt
python seed.py
uvicorn main:app --host 0.0.0.0 --port 7860docker build -t sql-analyst-env .
docker run -p 7860:7860 sql-analyst-env| Resource | Requirement |
|---|---|
| CPU | 1-2 vCPU |
| RAM | 8GB |
| GPU | Not required |
| Disk | ~50MB |
| Inference time | Under 5 minutes for all 8 tasks |
sql-analyst-env/
├── main.py # FastAPI server — /reset, /step, /state endpoints
├── seed.py # Database seeder — creates ecommerce.db
├── inference.py # Baseline AI agent using OpenAI-compatible client
├── openenv.yaml # OpenEnv specification file
├── Dockerfile # Container for Hugging Face Spaces
├── requirements.txt # Python dependencies
└── data/
└── ecommerce.db # SQLite database (auto-created by seed.py)
- FastAPI — REST API framework
- SQLite — embedded database, no setup required
- OpenAI Python client — compatible with Groq, Together, OpenAI
- Docker — containerized deployment
- Hugging Face Spaces — hosting