📊 SQL Analyst OpenEnv

title	SQL Analyst OpenEnv
emoji	📊
colorFrom	blue
colorTo	green
sdk	docker
pinned	false

📊 SQL Analyst OpenEnv

A real-world OpenEnv environment where an AI agent must write correct SQL queries against a live e-commerce database to answer business analytics questions.

Live Demo: https://p-karthik-mohan-sql-analyst-env.hf.space/docs

What Is This?

-This environment simulates the daily work of a data analyst at an e-commerce company. The AI agent receives a natural language business question, explores the database schema, and must produce a correct SQL query to answer it.

Unlike toy environments, this is a task that real analysts perform every day — making it a meaningful benchmark for AI reasoning and code generation.

Baseline Agent Results

Metric	Score
Tasks solved	6 / 8
Average reward	0.985 / 1.000
Model used	llama-3.1-8b-instant (Groq)
Time to complete	Under 3 minutes

Tasks

Eight tasks of increasing difficulty, each graded by an automated SQL result comparator.

Task 1 — Easy

"How many completed orders were placed in 2024?"

Requires: COUNT, WHERE, date filtering
Expected output: single row — total_orders

Task 2 — Medium

"Find the top 5 customers by total revenue from completed orders."

Requires: JOIN, GROUP BY, SUM, ORDER BY, LIMIT
Expected output: 5 rows — first_name, last_name, total_revenue

Task 3 — Hard

"Rank product categories by total revenue using a window function."

Requires: CTE, JOIN, GROUP BY, RANK() OVER (...)
Expected output: all categories — category, total_revenue, revenue_rank

Task 4 — Medium

"Find average product price per category, only for categories with more than 2 products."

Requires: GROUP BY, AVG, HAVING
Expected output: category, avg_price

Task 5 — Hard

"Find customers who ordered from both Electronics and Clothing categories."

Requires: INTERSECT, multiple JOINs
Expected output: customer_id, first_name

Task 6 — Expert

"Calculate month-over-month revenue growth % for 2024."

Requires: LAG() window function, CTE, STRFTIME
Expected output: month, total_revenue, prev_revenue, growth_pct

Task 7 — Expert

"For each city, find the best-selling product by quantity."

Requires: RANK() OVER (PARTITION BY city ...), CTE
Expected output: city, product_name, total_quantity

Task 8 — Expert

"Find customers who spent more in H2 2024 than H1 2024."

Requires: CASE WHEN, conditional SUM, HAVING
Expected output: customer_id, first_name, last_name, h1_revenue, h2_revenue

API Endpoints

Base URL: https://p-karthik-mohan-sql-analyst-env.hf.space

POST /reset

Load a task. Always call this first.

Request:

{"task_id": 1}

Response:

{
  "observation": {
    "task_id": 1,
    "difficulty": "easy",
    "task_description": "Find the total number of completed orders...",
    "schema": "Tables:\n  customers (...)\n  products (...)\n  orders (...)",
    "hint": "Use COUNT with WHERE filters on status and order_date"
  },
  "info": {"message": "Task 1 loaded."}
}

POST /step

Submit a SQL query. Returns reward 0.0-1.0 and feedback.

Request:

{"action": "SELECT COUNT(*) AS total_orders FROM orders WHERE status = 'completed'"}

Response:

{
  "observation": {
    "result_preview": [{"total_orders": 312}],
    "reward_breakdown": {
      "column_score": 0.30,
      "row_score": 0.30,
      "value_score": 0.40,
      "total_reward": 1.0
    }
  },
  "reward": 1.0,
  "done": true
}

GET /state

Get current session — task info, all attempts, best score.

GET /health

Returns {"status": "ok", "db_exists": true} when server is ready.

Observation Space

Field	Type	Description
task_description	string	Natural language business question
schema	string	All table names, columns, types, row counts
hint	string	Guidance on which SQL constructs to use
result_preview	array	First 5 rows of the agent query result
result_row_count	integer	Total rows returned by agent query
reward_breakdown	object	Sub-scores for columns, rows, values

Action Space

Property	Value
Type	string
Format	Valid SQLite SELECT or WITH statement
Restrictions	No INSERT, UPDATE, DELETE, DROP

Example actions:

-- Easy
SELECT COUNT(*) AS total_orders FROM orders WHERE status = 'completed'

-- Medium
SELECT c.first_name, c.last_name, SUM(o.total_amount) AS total_revenue
FROM orders o JOIN customers c ON o.customer_id = c.customer_id
WHERE o.status = 'completed'
GROUP BY o.customer_id ORDER BY total_revenue DESC LIMIT 5

-- Expert
WITH monthly AS (
  SELECT STRFTIME('%Y-%m', order_date) AS month, SUM(total_amount) AS total_revenue
  FROM orders WHERE status = 'completed' AND order_date LIKE '2024%'
  GROUP BY month
)
SELECT month, total_revenue,
  LAG(total_revenue) OVER (ORDER BY month) AS prev_revenue,
  ROUND((total_revenue - LAG(total_revenue) OVER (ORDER BY month))
    / LAG(total_revenue) OVER (ORDER BY month) * 100, 2) AS growth_pct
FROM monthly ORDER BY month ASC

Reward Function

Partial credit score from 0.0 to 1.0 with three components. Even an imperfect query receives meaningful feedback — not just pass/fail.

Component	Weight	How it is measured
Column names	0.30	Fraction of expected column names present
Row count	0.30	Ratio of returned rows vs expected rows
Cell values	0.40	Fraction of cells matching expected values

Database Schema

A realistic e-commerce dataset with 600 orders, 100 customers, 30 products.

customers  (customer_id, first_name, last_name, email, city, signup_date)  — 100 rows
products   (product_id, product_name, category, price, stock)              — 30 rows
orders     (order_id, customer_id, product_id, quantity, total_amount, order_date, status) — 600 rows

All orders are dated in 2024. Status values: completed, pending, cancelled. 10 product categories: Electronics, Clothing, Books, Home & Garden, Sports, Beauty, Toys, Food & Grocery, Automotive, Music.

Running the Baseline Agent

pip install openai requests python-dotenv

export API_BASE_URL=https://api.groq.com/openai/v1
export MODEL_NAME=llama-3.1-8b-instant
export HF_TOKEN=your_groq_api_key_here

python inference.py

Expected output:

SQL Analyst OpenEnv — Baseline Inference Agent
Model      : llama-3.1-8b-instant
Tasks solved   : 6 / 8
Average reward : 0.985 / 1.000
Results saved to results.json

Local Setup

git clone https://huggingface.co/spaces/P-Karthik-Mohan/sql-analyst-env
cd sql-analyst-env
pip install -r requirements.txt
python seed.py
uvicorn main:app --host 0.0.0.0 --port 7860

Docker

docker build -t sql-analyst-env .
docker run -p 7860:7860 sql-analyst-env

Hardware Requirements

Resource	Requirement
CPU	1-2 vCPU
RAM	8GB
GPU	Not required
Disk	~50MB
Inference time	Under 5 minutes for all 8 tasks

Project Structure

sql-analyst-env/
├── main.py          # FastAPI server — /reset, /step, /state endpoints
├── seed.py          # Database seeder — creates ecommerce.db
├── inference.py     # Baseline AI agent using OpenAI-compatible client
├── openenv.yaml     # OpenEnv specification file
├── Dockerfile       # Container for Hugging Face Spaces
├── requirements.txt # Python dependencies
└── data/
    └── ecommerce.db # SQLite database (auto-created by seed.py)

Built With

FastAPI — REST API framework
SQLite — embedded database, no setup required
OpenAI Python client — compatible with Groq, Together, OpenAI
Docker — containerized deployment
Hugging Face Spaces — hosting

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

📊 SQL Analyst OpenEnv

What Is This?

Baseline Agent Results

Tasks

Task 1 — Easy

Task 2 — Medium

Task 3 — Hard

Task 4 — Medium

Task 5 — Hard

Task 6 — Expert

Task 7 — Expert

Task 8 — Expert

API Endpoints

POST /reset

POST /step

GET /state

GET /health

Observation Space

Action Space

Reward Function

Database Schema

Running the Baseline Agent

Local Setup

Docker

Hardware Requirements

Project Structure

Built With

About

Uh oh!

Releases

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 36 Commits
data		data
server		server
.gitignore		.gitignore
Dockerfile		Dockerfile
README.md		README.md
inference.py		inference.py
main.py		main.py
openenv.yaml		openenv.yaml
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt
results.json		results.json
seed.py		seed.py
uv.lock		uv.lock

Folders and files

Latest commit

History

Repository files navigation

📊 SQL Analyst OpenEnv

What Is This?

Baseline Agent Results

Tasks

Task 1 — Easy

Task 2 — Medium

Task 3 — Hard

Task 4 — Medium

Task 5 — Hard

Task 6 — Expert

Task 7 — Expert

Task 8 — Expert

API Endpoints

POST /reset

POST /step

GET /state

GET /health

Observation Space

Action Space

Reward Function

Database Schema

Running the Baseline Agent

Local Setup

Docker

Hardware Requirements

Project Structure

Built With

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Contributors

Uh oh!

Languages