InfraSquad

The Autonomous Cloud Architecture and Security Team.

InfraSquad is a multi-agent system that takes natural-language infrastructure requirements, debates cloud architecture, writes Terraform code, runs an automated security scan via an MCP server, and generates a visual architecture diagram. A single text prompt yields an architecture diagram, deployable Terraform, and a security report within minutes.

Architecture

stateDiagram-v2
    [*] --> UserRequest
    UserRequest --> InputValidation: Submit prompt

    InputValidation --> Conversation: Follow-up in active thread
    Conversation --> [*]: Return contextual explanation

    InputValidation --> FallbackResponse: Off-topic / invalid request
    FallbackResponse --> [*]: Return safe capability message

    InputValidation --> ProductArchitect: New infrastructure request
    ProductArchitect --> DevOpsEngineer: Generate architecture plan
    DevOpsEngineer --> OutputValidation: Generate Terraform code

    OutputValidation --> DevOpsEngineer: Invalid output + retries left
    OutputValidation --> SecurityAuditor: Valid output OR retry cap reached

    SecurityAuditor --> DevOpsEngineer: Findings + retries left
    SecurityAuditor --> Visualizer: Passed OR retry cap reached

    Visualizer --> [*]: Render Mermaid diagram + code

The workflow is a LangGraph state machine with conditional routing. New infrastructure requests run the full generation pipeline, follow-up requests route to the conversation node for contextual answers, and off-topic requests route to a safe fallback response. Output-validation and security failures loop back to the DevOps Engineer, each with a retry cap; once the cap is reached the pipeline proceeds to the next stage rather than looping indefinitely. MAX_REMEDIATION_CYCLES (default 3) sets the cap for security-fix loops.
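The routing in the diagram can be sketched as plain functions over the shared state. This is a minimal illustration, not the project's code: the field names `request_kind`, `security_findings`, and `remediation_cycles`, the node labels, and the use of one shared counter for both loops are assumptions. Only `hcl_validation_errors` and the cap of 3 come from this README.

```python
from typing import TypedDict

MAX_REMEDIATION_CYCLES = 3  # mirrors the documented default


class AgentState(TypedDict, total=False):
    # Hypothetical subset of fields; the real AgentState lives in graph/state.py.
    request_kind: str  # assumed values: "new_request", "follow_up", "off_topic"
    hcl_validation_errors: list[str]
    security_findings: list[str]
    remediation_cycles: int


def route_after_input_validation(state: AgentState) -> str:
    """Pick the first node after the input guardrail."""
    kind = state.get("request_kind", "off_topic")
    if kind == "follow_up":
        return "conversation"
    if kind == "new_request":
        return "product_architect"
    return "fallback_response"


def _retries_left(state: AgentState) -> bool:
    # Assumption: a single counter is shared by both remediation loops.
    return state.get("remediation_cycles", 0) < MAX_REMEDIATION_CYCLES


def route_after_output_validation(state: AgentState) -> str:
    """Loop back on invalid HCL while retries remain, otherwise move on."""
    if state.get("hcl_validation_errors") and _retries_left(state):
        return "devops_engineer"
    return "security_auditor"


def route_after_security_audit(state: AgentState) -> str:
    """Loop back on findings while retries remain, otherwise render the diagram."""
    if state.get("security_findings") and _retries_left(state):
        return "devops_engineer"
    return "visualizer"
```

In LangGraph, functions of this shape are typically registered with `StateGraph.add_conditional_edges`, which maps the returned string to the next node.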

Agent Roles

Agent	Responsibility
Product Architect	Analyzes user requirements and produces a high-level AWS architecture plan
DevOps Engineer	Translates the plan into valid Terraform HCL; remediates security findings
Security Auditor	Scans Terraform via MCP (tfsec/checkov) or falls back to LLM review
Visualizer	Generates a Mermaid.js architecture diagram via MCP rendering

MCP Tools

Tool	Function
`run_tfsec_scan`	Saves Terraform to a temp file, runs tfsec or checkov, returns the JSON report
`generate_architecture_diagram`	Renders Mermaid.js source to a PNG via mmdc
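As a rough sketch of the scan step described in the table, the function below writes the Terraform to a temporary directory, runs the scanner, and parses its JSON output. The function name `run_scan`, its parameters, the returned dictionary shape, and the exact command-line flags are assumptions for illustration; the real tool is in infrasquad/mcp/tools/tfsec.py and may differ.

```python
import json
import shutil
import subprocess
import tempfile
from pathlib import Path


def run_scan(terraform_code: str, scanner: str = "tfsec", timeout_s: int = 60) -> dict:
    """Write HCL to a temp directory, run the scanner, and return its JSON report.

    Returns {"status": "unavailable", ...} when the scanner is missing, times
    out, or emits unparseable output, so the caller can fall back to LLM review.
    """
    binary = shutil.which(scanner)
    if binary is None:
        return {"status": "unavailable", "reason": f"{scanner} not found on PATH"}

    with tempfile.TemporaryDirectory() as workdir:
        (Path(workdir) / "main.tf").write_text(terraform_code, encoding="utf-8")
        # Assumed invocations; verify the flags against the installed version.
        if scanner == "checkov":
            cmd = [binary, "-d", workdir, "-o", "json"]
        else:
            cmd = [binary, workdir, "--format", "json"]
        try:
            proc = subprocess.run(
                cmd, capture_output=True, text=True, timeout=timeout_s
            )
        except (subprocess.TimeoutExpired, OSError) as exc:
            return {"status": "unavailable", "reason": str(exc)}

    # Scanners commonly exit non-zero when they report findings, so parse
    # stdout instead of relying on the return code.
    try:
        report = json.loads(proc.stdout)
    except json.JSONDecodeError:
        return {"status": "unavailable", "reason": "scanner output was not valid JSON"}
    return {"status": "ok", "report": report}
```

Returning a status value instead of raising keeps the fallback decision with the caller, which matches the MCP fallback behavior described under Guardrails.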

Guardrails

Check	Description
Input validation	Keyword classifier rejects non-infrastructure requests with a polite fallback
Output validation	Verifies generated code contains `provider` and `resource` blocks with balanced braces
IAM safety	Regex blocker prevents `AdministratorAccess` policies and wildcard `"Action": "*"`
MCP fallback	If the security scanner crashes or times out, the system falls back to LLM review
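The output-validation and IAM-safety rows above can be illustrated with a few regular expressions and a brace counter. This is a simplified sketch under assumptions: the function names, error messages, and patterns are hypothetical, and the brace check ignores braces that appear inside string literals or comments.

```python
import re

# Match `provider "aws" {` and `resource "type" "name" {` block headers.
_PROVIDER_RE = re.compile(r'^\s*provider\s+"[^"]+"\s*\{', re.MULTILINE)
_RESOURCE_RE = re.compile(r'^\s*resource\s+"[^"]+"\s+"[^"]+"\s*\{', re.MULTILINE)

# Hypothetical deny patterns for the two cases named in the table.
# The second one covers JSON ("Action": "*") and HCL (Action = "*" or ["*"]).
_IAM_DENY = {
    "AdministratorAccess policy": re.compile(r"AdministratorAccess"),
    'wildcard "Action": "*"': re.compile(r'"?Action"?\s*[:=]\s*(\[\s*)?"\*"'),
}


def validate_hcl(code: str) -> list[str]:
    """Return a list of structural problems; an empty list means the code passed."""
    errors = []
    if not _PROVIDER_RE.search(code):
        errors.append("missing provider block")
    if not _RESOURCE_RE.search(code):
        errors.append("missing resource block")

    depth = 0
    for ch in code:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:  # a closing brace with no matching opener
                break
    if depth != 0:
        errors.append("unbalanced braces")
    return errors


def find_iam_violations(code: str) -> list[str]:
    """Return the labels of every deny pattern found in the generated code."""
    return [label for label, pattern in _IAM_DENY.items() if pattern.search(code)]
```

Because both checks are deterministic string operations, they can run on every generation attempt without calling the LLM.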

Project Structure

infrasquad/
|
|-- infrasquad/                     # Core Python package
|   |-- __init__.py
|   |-- config.py                   # Settings via pydantic-settings (.env)
|   |-- llm.py                      # Shared LLM client + schema-retry helper
|   |-- prompts.py                  # System prompts for all agents
|   |
|   |-- agents/                     # Agent modules
|   |   |-- architect.py            # Product Architect
|   |   |-- devops.py               # DevOps Engineer (HCL generation + guardrail-aware remediation)
|   |   |-- security.py             # Security Auditor (MCP tfsec + LLM fallback)
|   |   |-- visualizer.py           # Diagram Visualizer
|   |
|   |-- graph/                      # LangGraph workflow
|   |   |-- state.py                # AgentState TypedDict (incl. hcl_validation_errors)
|   |   |-- nodes.py                # Node functions; security_node merges guardrail findings
|   |   |-- edges.py                # Conditional routing logic
|   |   |-- workflow.py             # Graph construction and compilation
|   |
|   |-- guardrails/                 # Input/output validation and safety
|   |   |-- input_validation.py     # Infrastructure intent classifier
|   |   |-- output_validation.py    # HCL syntax + human-readable security guardrails
|   |   |-- safety.py               # IAM policy blockers
|   |
|   |-- mcp/                        # Model Context Protocol client/server
|       |-- client.py               # MCP tool invocation helper
|       |-- server.py               # FastMCP server with tool registration
|       |-- tools/
|           |-- tfsec.py            # tfsec/checkov scanner
|           |-- diagram.py          # Mermaid-to-PNG renderer
|
|-- ui/
|   |-- gradio/
|       |-- app.py                  # Entry point: initialises graph and launches UI
|       |-- interface.py            # Gradio Blocks layout and component wiring
|       |-- handlers.py             # Submit event handler (streaming, progress, remediation display)
|       |-- formatters.py           # Markdown renderers for security report and final summary
|       |-- tracker.py              # Pipeline phase tracker HTML renderer
|       |-- styles.css              # Custom stylesheet (glassmorphism, tooltips, tabs)
|
|-- tests/
|   |-- conftest.py                 # Shared fixtures
|   |-- test_guardrails.py          # Guardrail unit tests
|   |-- test_graph.py               # Edge routing tests
|   |-- test_agents_runtime.py      # Agent integration tests
|   |-- test_agent_schema_retry.py  # LLM schema-retry logic tests
|
|-- output/                         # Generated diagrams and checkpoint DB
|-- app.py                          # Entry point: python app.py
|-- pyproject.toml                  # Project metadata and tool config
|-- requirements.txt                # pip-compatible dependencies
|-- .env.example                    # Template for environment variables
|-- .gitignore

Quickstart

Prerequisites

Python 3.12+
uv (recommended) or pip
An OpenRouter API key (free tier works)
Optional: tfsec or checkov for automated security scanning
Optional: mmdc (mermaid-cli) for diagram rendering

1. Clone and install

git clone https://github.com/<your-org>/infrasquad.git
cd infrasquad

# With uv (recommended)
uv venv && uv sync

# Or with pip
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

2. Set your API key

cp .env.example .env

Edit .env and add your OpenRouter API key:

OPENROUTER_API_KEY=sk-or-...

3. Launch the UI

python app.py                # local only at http://127.0.0.1:7860
python app.py --share        # public Gradio link for live demos
python app.py --port 8080    # custom port

4. Run the MCP server (standalone)

python -m infrasquad.mcp.server

5. Run tests

# Install dev dependencies
pip install -e ".[dev]"

# Run the test suite
pytest -v

Configuration

All settings are managed via environment variables (loaded from .env) through pydantic-settings. See infrasquad/config.py for defaults.

Variable	Default	Description
`OPENROUTER_API_KEY`	(required)	Your OpenRouter API key
`LLM_MODEL`	`qwen/qwen-2.5-72b-instruct`	Model identifier on OpenRouter
`LLM_BASE_URL`	`https://openrouter.ai/api/v1`	LLM API base URL
`TEMPERATURE`	`0.2`	Sampling temperature for generation
`MAX_REMEDIATION_CYCLES`	`3`	Max security-fix loops before proceeding

To use a different LLM provider (OpenAI, Anthropic, local Ollama), change LLM_BASE_URL and LLM_MODEL accordingly. The system uses langchain-openai, which works with any endpoint that exposes an OpenAI-compatible API.
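To make the variable-to-setting mapping concrete, here is a standard-library stand-in for the settings loader. It is not the project's implementation: the project uses pydantic-settings, which also reads .env, whereas this sketch only reads a mapping such as `os.environ`. The class and function names are hypothetical; the defaults are copied from the table above.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    # Defaults mirror the Configuration table.
    openrouter_api_key: str
    llm_model: str = "qwen/qwen-2.5-72b-instruct"
    llm_base_url: str = "https://openrouter.ai/api/v1"
    temperature: float = 0.2
    max_remediation_cycles: int = 3


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment-style variables, applying defaults."""
    api_key = env.get("OPENROUTER_API_KEY", "").strip()
    if not api_key:
        raise ValueError("OPENROUTER_API_KEY is required")
    return Settings(
        openrouter_api_key=api_key,
        llm_model=env.get("LLM_MODEL", Settings.llm_model),
        llm_base_url=env.get("LLM_BASE_URL", Settings.llm_base_url),
        # Environment values are strings, so numeric settings are converted.
        temperature=float(env.get("TEMPERATURE", str(Settings.temperature))),
        max_remediation_cycles=int(
            env.get("MAX_REMEDIATION_CYCLES", str(Settings.max_remediation_cycles))
        ),
    )
```

Pointing the loader at another OpenAI-compatible endpoint is then a matter of overriding two variables, for example `LLM_BASE_URL=http://localhost:11434/v1` for a local Ollama server together with an `LLM_MODEL` value that the server actually hosts.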

Optional External Tools

tfsec (security scanning):

# macOS
brew install tfsec

# or via Go
go install github.com/aquasecurity/tfsec/cmd/tfsec@latest

checkov (alternative scanner):

pip install checkov

mmdc (Mermaid diagram rendering):

npm install -g @mermaid-js/mermaid-cli

If these tools are not installed, the system degrades gracefully: security scanning falls back to LLM-based review, and diagrams are saved as raw Mermaid source.
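The diagram side of this degradation could look roughly like the sketch below: always save the Mermaid source, then try `mmdc` and return the PNG only if rendering succeeds. The function name, parameters, and file names are assumptions for illustration; the project's renderer is in infrasquad/mcp/tools/diagram.py.

```python
import shutil
import subprocess
from pathlib import Path


def render_diagram(
    mermaid_source: str,
    output_dir: Path,
    name: str = "architecture",
    renderer: str = "mmdc",
    timeout_s: int = 60,
) -> Path:
    """Render Mermaid source to PNG, or keep the raw source if rendering fails."""
    output_dir.mkdir(parents=True, exist_ok=True)
    source_path = output_dir / f"{name}.mmd"
    source_path.write_text(mermaid_source, encoding="utf-8")

    binary = shutil.which(renderer)
    if binary is None:
        return source_path  # degraded mode: raw Mermaid source only

    png_path = output_dir / f"{name}.png"
    try:
        subprocess.run(
            [binary, "-i", str(source_path), "-o", str(png_path)],
            check=True,
            capture_output=True,
            timeout=timeout_s,
        )
    except (subprocess.CalledProcessError, subprocess.TimeoutExpired, OSError):
        return source_path
    return png_path if png_path.exists() else source_path
```

The caller can inspect the suffix of the returned path to decide whether to display an image or the Mermaid text.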

Design Decisions

Decision	Rationale
LangGraph over CrewAI/AutoGen	LangGraph's state-machine model handles cyclic workflows (security remediation loops) natively with conditional edges and shared typed state
Typed state via TypedDict	Gives every agent a clear contract for what it reads and writes, catching integration bugs early
MCP for external tools	Keeps security scanning and diagram rendering as decoupled services; agents call tools through a protocol, not direct imports
Keyword classifier for input	Fast, deterministic first pass that avoids burning LLM tokens on obviously off-topic requests
Regex IAM blocker	Hardcoded safety net that runs independently of any LLM output, preventing dangerous IAM policies regardless of model behavior
Graceful fallbacks	Every external dependency (tfsec, checkov, mmdc) has a fallback path so the system never hard-crashes
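The keyword-classifier decision in the table above can be illustrated with a small set-intersection check. The keyword list, threshold, and function name are hypothetical; the project's actual vocabulary lives in infrasquad/guardrails/input_validation.py and is not shown in this README.

```python
import re

# Illustrative vocabulary only; not the project's real keyword list.
INFRA_KEYWORDS = frozenset({
    "aws", "terraform", "infrastructure", "deploy", "vpc", "subnet",
    "ec2", "s3", "rds", "lambda", "iam", "kubernetes", "cluster",
    "bucket", "database", "server", "balancer",
})


def is_infrastructure_request(prompt: str, min_hits: int = 1) -> bool:
    """Return True when the prompt mentions at least `min_hits` known keywords."""
    tokens = set(re.findall(r"[a-z0-9]+", prompt.lower()))
    return len(tokens & INFRA_KEYWORDS) >= min_hits
```

A check like this costs no LLM tokens and returns the same answer for the same prompt. The trade-off is that an infrastructure request phrased without any listed keyword is rejected, so the vocabulary needs to be broad.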

Squad Contributions

Name	Role
Amit	LangGraph engineer: state schema, routing logic, graph compilation
Ayesha, Amit and Elijah	Tools and prompts: MCP server, tfsec/diagram tools, agent system prompts
Joel and Amit	Guardrails and security: input/output validation, IAM safety checks, fallback handling
Stella and Amit	UI/UX: Gradio interface, chat and output panels, styling
Adetayo and Stella	QA and integration: tests, PR reviews, README, demo preparation

Tech Stack

Component	Tool	Version
Agent framework	LangGraph	`1.1.3`
LLM client	LangChain / langchain-openai	`1.2.13` / `1.1.12`
LLM gateway	OpenRouter	-
Default model	openai/gpt-4o-mini	-
MCP server	FastMCP (mcp Python SDK)	`1.26.0`
Diagram rendering	Mermaid.js via mmdc	`11.12.0`
UI	Gradio	`6.10.0`
Configuration	pydantic-settings	`2.13.1`
Data validation	Pydantic	`2.12.5`
LLMOps / Tracing	LangSmith	-
Unit testing	pytest	`9.0.2`
Package manager	uv	`0.10.9`
Runtime	Python	`3.12+`
