Build production-ready Text-to-SQL system with LangChain + FastAPI + React #1

Merged: techwithprateek merged 9 commits into main from copilot/build-text-to-sql-system on Apr 11, 2026

Copilot AI commented Apr 10, 2026

  • Project scaffolding (.gitignore, .env.example, requirements.txt)
  • model/ — database.py (engine factory), schema.py (star schema)
  • agent/ — semantic_layer.py, build_index.py, retriever.py, sql_chain.py, hitl_guard.py, few_shot_examples.yaml
  • api/ — main.py, routes/query.py, routes/schema.py, routes/health.py
  • data/ — seed.py (Olist CSV → star schema loader)
  • frontend/ — React 18 + Vite TypeScript app with ChatWindow, SqlDisplay, ResultsTable, SchemaExplorer, ApprovalModal
  • infra/ — setup.sh, configure_env.sh, install_app.sh, texttosql.service, nginx.conf, verify.sh, TROUBLESHOOTING.md, README_DEPLOY.md
  • Code review fixes: setup.sh inline-comment syntax, curl|bash → download-then-exec, CORS via ALLOWED_ORIGINS env var, fetchmany→fetchall with SQL LIMIT injection
  • Security: upgrade langchain-community 0.2.1 → 0.3.27 (XXE + pickle deserialization CVEs), langchain 0.2.1 → 0.3.28, langchain-openai 0.1.8 → 0.3.35
  • model/database.py: cache engine + sessionmaker as module-level singletons; replace StaticPool with NullPool for file-based SQLite
  • agent/sql_chain.py: use with get_session() as session: in _log_query
  • model/schema.py: add unique=True on DimReviews.order_id to enforce one-to-one relationship correctness
  • model/schema.py: change Float → Numeric(12, 2) for the order_total_usd and freight_value_usd currency columns
  • agent/sql_chain.py: enforce SELECT/WITH-only allowlist in _execute_sql; reject multi-statement payloads
  • api/routes/health.py: replace per-request models.list() with key-presence check + 60-second TTL cache
  • infra/nginx.conf: replace unconditional Connection "upgrade" with a map $http_upgrade $connection_upgrade so keep-alive is preserved for normal HTTP requests
  • infra/nginx.conf: remove location = /api/health exact-match block; use map $request_uri $loggable + access_log ... if=$loggable so the health endpoint inherits all proxy headers/timeouts from the /api/ location
  • infra/configure_env.sh: use printf '%q' to shell-escape all values written to .env, preventing breakage on source when values contain spaces, $, #, or other shell-sensitive characters
  • infra/verify.sh: add --connect-timeout 2 --max-time 5 to the EC2 metadata curl so the script fails fast on non-EC2 hosts or when the metadata service is blocked
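
The SELECT/WITH-only allowlist and the SQL LIMIT injection listed above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual `_execute_sql`: the names `prepare_read_only_sql`, `UnsafeSqlError`, and the `max_rows` default of 500 are hypothetical, and the checks are deliberately conservative (any semicolon inside the statement is rejected, even one inside a string literal).

```python
import re

# Hypothetical names throughout; the PR's real _execute_sql is not shown in this thread.
_ALLOWED_START = re.compile(r"^\s*(SELECT|WITH)\b", re.IGNORECASE)
_HAS_LIMIT = re.compile(r"\bLIMIT\s+\d+", re.IGNORECASE)


class UnsafeSqlError(ValueError):
    """Raised when generated SQL falls outside the read-only allowlist."""


def prepare_read_only_sql(sql: str, max_rows: int = 500) -> str:
    """Validate LLM-generated SQL and make sure it carries a row limit."""
    statement = sql.strip()

    # Tolerate a single trailing semicolon, then treat any remaining
    # semicolon as a multi-statement payload.
    if statement.endswith(";"):
        statement = statement[:-1].rstrip()
    if not statement:
        raise UnsafeSqlError("empty statement")
    if ";" in statement:
        raise UnsafeSqlError("multiple statements are not allowed")

    # Only statements that begin with SELECT or WITH pass the allowlist.
    if not _ALLOWED_START.match(statement):
        raise UnsafeSqlError("only SELECT or WITH queries are allowed")

    # Append a LIMIT when none is present so fetchall() stays bounded.
    # Assumption: a LIMIT anywhere in the text (even in a subquery) counts.
    if not _HAS_LIMIT.search(statement):
        statement = f"{statement} LIMIT {max_rows}"
    return statement


if __name__ == "__main__":
    print(prepare_read_only_sql("SELECT order_id FROM fact_orders;"))
```

A prefix check like this is only a first line of defense: on databases that allow data-modifying CTEs, a statement starting with WITH can still write, so a read-only database role or connection remains the stronger control.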

Copilot AI and others added 3 commits April 10, 2026 20:05
- Root: .gitignore, .env.example, requirements.txt (pinned)
- model/: SQLAlchemy database factory (database.py) + star schema (schema.py)
- agent/: semantic layer, RAG build_index + retriever, HITL guard,
          few-shot YAML examples, LCEL sql_chain pipeline
- api/: FastAPI app with CORS + global error handler; routes for
        /api/query, /api/approve, /api/schema, /api/health
- data/: seed.py loads Olist CSVs into star schema SQLite tables
- frontend/: React 18 + Vite + TypeScript with dark terminal UI;
             ChatWindow, SqlDisplay, ResultsTable, SchemaExplorer,
             ApprovalModal components
- infra/: setup.sh, configure_env.sh, install_app.sh, systemd service,
          nginx.conf, verify.sh, TROUBLESHOOTING.md, README_DEPLOY.md

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Co-authored-by: nerdjerry <7092764+nerdjerry@users.noreply.github.com>
- model/schema.py: use lambda for datetime.utcnow() defaults so each row
  gets its own insertion timestamp instead of the class-definition time
- api/main.py: set allow_credentials=False when allow_origins=["*"] to
  avoid the CORS security risk of wildcard + credentials combination
- agent/sql_chain.py: cache ChatOpenAI instance at module level via _get_llm()
  to reuse HTTP connection pool across requests, reducing latency
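
The timestamp-default fix in this commit comes down to passing a callable rather than a value. The sketch below reproduces the effect with the standard library only: `resolve_default` is a hypothetical stand-in for how an ORM resolves a column default, and `datetime.now(timezone.utc)` is used in place of `datetime.utcnow()`, which is deprecated as of Python 3.12.

```python
import time
from datetime import datetime, timezone


def resolve_default(default):
    """Hypothetical stand-in for an ORM: call the default if it is callable."""
    return default() if callable(default) else default


# Evaluated once, when the module (or class body) is executed.
FROZEN_DEFAULT = datetime.now(timezone.utc)


def fresh_default() -> datetime:
    """Evaluated every time a row is created."""
    return datetime.now(timezone.utc)


first_frozen = resolve_default(FROZEN_DEFAULT)
first_fresh = resolve_default(fresh_default)
time.sleep(0.01)  # simulate time passing between two inserts
second_frozen = resolve_default(FROZEN_DEFAULT)
second_fresh = resolve_default(fresh_default)

if __name__ == "__main__":
    print("frozen rows share a timestamp:", first_frozen == second_frozen)
    print("fresh rows differ:", first_fresh != second_fresh)
```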

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Co-authored-by: nerdjerry <7092764+nerdjerry@users.noreply.github.com>

Copilot AI left a comment


Pull request overview

Introduces a full, deployable Text-to-SQL application stack (data model + seeding, LangChain RAG agent, FastAPI backend, React frontend, and EC2/nginx/systemd deployment tooling).

Changes:

  • Adds star-schema SQLAlchemy models plus Olist CSV seeding script.
  • Implements LangChain LCEL pipeline with ChromaDB-backed schema retrieval and a HITL guard.
  • Adds FastAPI API routes (query/approve/schema/health) and a React UI, plus infra scripts/configs for EC2 deployment.

Reviewed changes

Copilot reviewed 35 out of 42 changed files in this pull request and generated 15 comments.

| File | Description |
| --- | --- |
| `requirements.txt` | Pins backend dependencies (FastAPI, SQLAlchemy, LangChain, ChromaDB, OpenAI). |
| `model/schema.py` | Defines star schema tables + query_log ORM model. |
| `model/database.py` | Adds engine/session factory for SQLite/Postgres. |
| `model/__init__.py` | Package marker for model module. |
| `data/seed.py` | Loads Olist CSVs into the star schema (optional Kaggle download). |
| `data/raw/.gitkeep` | Ensures raw data directory exists in git. |
| `agent/semantic_layer.py` | Provides semantic schema dictionary used for prompting and indexing. |
| `agent/build_index.py` | Builds/persists ChromaDB embeddings for schema RAG. |
| `agent/retriever.py` | Retrieves relevant schema snippets from ChromaDB at query time. |
| `agent/sql_chain.py` | LCEL pipeline: question → SQL → HITL check → execute → log. |
| `agent/hitl_guard.py` | Regex-based guard to flag potentially dangerous SQL for approval. |
| `agent/few_shot_examples.yaml` | Few-shot Q→SQL examples injected into prompts. |
| `agent/__init__.py` | Package marker for agent module. |
| `api/main.py` | FastAPI app setup, CORS config, global exception handler, router inclusion. |
| `api/routes/query.py` | /api/query and /api/approve endpoints and response models. |
| `api/routes/schema.py` | /api/schema endpoint serving semantic schema. |
| `api/routes/health.py` | /api/health readiness/liveness checks (DB/Chroma/OpenAI). |
| `api/routes/__init__.py` | Package marker for routes module. |
| `api/__init__.py` | Package marker for api module. |
| `frontend/package.json` | Frontend dependencies/scripts for React + Vite + TS. |
| `frontend/vite.config.ts` | Dev server proxy configuration for /api → backend. |
| `frontend/tsconfig.json` | TypeScript compiler configuration. |
| `frontend/index.html` | HTML entrypoint including font imports. |
| `frontend/src/main.tsx` | React root bootstrap. |
| `frontend/src/index.css` | Global styling + theme variables. |
| `frontend/src/api.ts` | Axios client + typed API wrappers for backend endpoints. |
| `frontend/src/App.tsx` | Main layout wiring schema explorer, chat window, approval modal. |
| `frontend/src/components/ChatWindow.tsx` | Chat UI, query submission, message rendering. |
| `frontend/src/components/SqlDisplay.tsx` | SQL syntax highlighting + copy-to-clipboard + approval banner. |
| `frontend/src/components/ResultsTable.tsx` | Sortable results table rendering. |
| `frontend/src/components/SchemaExplorer.tsx` | Sidebar schema browser with expand/collapse and active-table highlighting. |
| `frontend/src/components/ApprovalModal.tsx` | “CONFIRM”-based human approval UI for flagged SQL. |
| `infra/setup.sh` | Provisioning script for Ubuntu (Python 3.11, Node 20, nginx). |
| `infra/configure_env.sh` | Writes .env with secrets/settings and locks permissions. |
| `infra/install_app.sh` | Installs Python deps, creates schema, seeds data, builds index + frontend. |
| `infra/texttosql.service` | systemd unit for running gunicorn/uvicorn workers. |
| `infra/nginx.conf` | nginx reverse proxy + static hosting configuration. |
| `infra/verify.sh` | Post-deploy verification script (systemd/nginx/health checks). |
| `infra/README_DEPLOY.md` | EC2 deployment walkthrough. |
| `infra/TROUBLESHOOTING.md` | Troubleshooting guide for common operational failures. |
| `.gitignore` | Ignores env/db/venv/chroma_store/dist/node_modules artifacts. |
| `.env.example` | Example environment variables including ALLOWED_ORIGINS. |
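
For the health endpoint, the PR description mentions replacing a per-request `models.list()` call with a key-presence check behind a 60-second TTL cache. One way that could look is sketched below. The class name `TtlCachedCheck`, the helper `openai_key_present`, and the environment variable name `OPENAI_API_KEY` are assumptions for illustration, and the sketch is not thread-safe.

```python
import os
import time
from typing import Callable, Optional


class TtlCachedCheck:
    """Cache the result of a cheap boolean check for a fixed number of seconds."""

    def __init__(
        self,
        check: Callable[[], bool],
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._check = check
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests can control time
        self._value: Optional[bool] = None
        self._expires_at = 0.0

    def get(self) -> bool:
        now = self._clock()
        # Re-run the check only on first use or after the TTL has elapsed.
        if self._value is None or now >= self._expires_at:
            self._value = self._check()
            self._expires_at = now + self._ttl
        return self._value


def openai_key_present() -> bool:
    """Assumed check: the key counts as configured if the variable is non-empty."""
    return bool(os.environ.get("OPENAI_API_KEY", "").strip())


# Module-level instance a health route could consult on each request.
openai_status = TtlCachedCheck(openai_key_present, ttl_seconds=60.0)

if __name__ == "__main__":
    print({"openai": "configured" if openai_status.get() else "missing"})
```

A presence check confirms only that a key is configured, not that it is valid, which is the trade-off for keeping health probes from calling the OpenAI API on every request.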


Comment thread model/database.py Outdated
Comment thread model/schema.py
Comment thread model/schema.py
Comment thread agent/retriever.py Outdated
Comment thread agent/sql_chain.py
Comment thread infra/configure_env.sh Outdated
Comment thread frontend/src/components/SqlDisplay.tsx
Comment thread model/database.py Outdated
Comment thread infra/verify.sh
Comment thread infra/nginx.conf
techwithprateek and others added 2 commits April 11, 2026 08:22
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…→ NullPool for file SQLite

Agent-Logs-Url: https://github.com/nerdjerry/text-to-sql/sessions/39759ee2-4d70-4c95-bb8f-7b8ca7fa6378

Co-authored-by: nerdjerry <7092764+nerdjerry@users.noreply.github.com>
…sql, cache OpenAI health check

Agent-Logs-Url: https://github.com/nerdjerry/text-to-sql/sessions/4860fb21-30ae-43d9-803b-b13b4dd1ec28

Co-authored-by: nerdjerry <7092764+nerdjerry@users.noreply.github.com>
…nfigure_env.sh shell escaping

Agent-Logs-Url: https://github.com/nerdjerry/text-to-sql/sessions/ecc1783b-e871-4175-af27-1fa21dbdcdfd

Co-authored-by: nerdjerry <7092764+nerdjerry@users.noreply.github.com>
@techwithprateek techwithprateek marked this pull request as ready for review April 11, 2026 03:03
@techwithprateek techwithprateek merged commit 662b3d4 into main Apr 11, 2026
1 of 2 checks passed
Copilot stopped work on behalf of techwithprateek due to an error April 11, 2026 03:04
3 participants