
feat(02-samples): add SRE incident response multi-agent sample #228

Open
ajha8 wants to merge 4 commits into strands-agents:main from ajha8:feat/sre-incident-response-agent


ajha8 commented Mar 1, 2026

PR: feat(02-samples): Add SRE Incident Response multi-agent sample

Summary

This is a proactive contribution adding a missing SRE/DevOps use case. No existing issue tracks this gap.
This PR adds a new sample to 02-samples/ demonstrating a multi-agent SRE (Site Reliability Engineering) incident response workflow built with the Strands Agents SDK.

Why this sample?

After reviewing the existing samples, there is no example that covers:

  • Operations / SRE use cases (vs. finance, restaurant, JIRA, audit tools)
  • Multi-agent supervisor pattern applied to real-time incident detection
  • AWS ↔ Kubernetes bridge (CloudWatch alarms → kubectl/Helm remediation)
  • Red Hat / OpenShift compatibility (kubectl tools work with oc too)

This fills a gap in the current samples and is relevant to DevOps/SRE engineers
who run workloads on AWS with Kubernetes or OpenShift.

What this adds

02-samples/19-sre-incident-response-agent/
├── sre_agent.py          # Main agent (4 agents + 8 tools)
├── test_sre_agent.py     # Pytest unit tests (mocked AWS, 12 tests)
├── requirements.txt
├── .env.example
└── README.md

Strands SDK concepts demonstrated

| Concept | How |
| --- | --- |
| @tool decorator | 8 tools: CloudWatch, Logs, kubectl, Helm, Slack |
| Multi-agent supervisor | supervisor_agent delegates to 3 specialist sub-agents |
| BedrockModel | Configurable model provider |
| agents=[...] parameter | Demonstrates Strands native multi-agent orchestration |
| Dry-run safety | All destructive actions gated by DRY_RUN=true |
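
The dry-run gate in the last row can be pictured with a minimal, framework-free sketch. It is not the code from sre_agent.py: the helper names `dry_run_enabled` and `run_or_describe`, the function signature, and the choice to treat anything other than an explicit "false" as dry-run are assumptions made for illustration. In the sample itself a function like this would presumably carry the Strands `@tool` decorator.

```python
import os
import subprocess
from typing import List


def dry_run_enabled() -> bool:
    # Assumption: dry-run stays ON unless DRY_RUN is explicitly set to "false".
    return os.environ.get("DRY_RUN", "true").strip().lower() != "false"


def run_or_describe(command: List[str]) -> str:
    """Execute a destructive command, or only describe it when dry-run is on."""
    if dry_run_enabled():
        return "[DRY RUN] would execute: " + " ".join(command)
    completed = subprocess.run(command, capture_output=True, text=True, check=False)
    return completed.stdout or completed.stderr


def kubectl_rollout_restart(deployment: str, namespace: str = "default") -> str:
    """Trigger a rolling restart of a deployment (hypothetical signature)."""
    return run_or_describe(
        ["kubectl", "rollout", "restart", f"deployment/{deployment}", "-n", namespace]
    )
```

Defaulting to dry-run means a missing or misspelled environment variable fails safe: the tool reports the command it would have run instead of touching the cluster.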

Agent architecture

supervisor_agent (Incident Commander)
    ├── cloudwatch_agent   → list_active_alarms, get_metric_statistics, fetch_log_events
    ├── rca_agent          → reasoning-only, no tools (pure LLM analysis)
    └── remediation_agent  → kubectl_get, kubectl_rollout_restart, helm_rollback, helm_scale
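
To make the delegation shape above concrete, here is a plain-Python sketch that does not use the Strands SDK. The agent and tool names come from the tree; the `Specialist` class, the `run_incident` function, the stub handlers, and the fixed detect, analyze, remediate order are all assumptions for illustration, not the PR's implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Specialist:
    """Stand-in for a sub-agent: a name, its tool names, and a handler."""
    name: str
    tools: List[str]
    handle: Callable[[str], str]


# Assumed order in which the supervisor consults its specialists.
DELEGATION_ORDER = ["cloudwatch_agent", "rca_agent", "remediation_agent"]


def run_incident(specialists: Dict[str, Specialist], incident: str) -> List[str]:
    """Pass the incident through each specialist, feeding output forward."""
    transcript: List[str] = []
    context = incident
    for name in DELEGATION_ORDER:
        context = specialists[name].handle(context)
        transcript.append(f"{name}: {context}")
    return transcript


specialists = {
    "cloudwatch_agent": Specialist(
        "cloudwatch_agent",
        ["list_active_alarms", "get_metric_statistics", "fetch_log_events"],
        lambda text: f"evidence for [{text}]",
    ),
    # Reasoning-only: no tools, mirrors the tree above.
    "rca_agent": Specialist("rca_agent", [], lambda text: f"root cause of [{text}]"),
    "remediation_agent": Specialist(
        "remediation_agent",
        ["kubectl_get", "kubectl_rollout_restart", "helm_rollback", "helm_scale"],
        lambda text: f"proposed fix for [{text}]",
    ),
}

for line in run_incident(specialists, "HighCPU alarm"):
    print(line)
```

Each stub handler simply wraps its input, so the printed transcript shows how context accumulates as it moves from detection to analysis to remediation.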

Testing

pip install pytest pytest-mock
pytest test_sre_agent.py -v

All 12 tests pass without AWS credentials (mocked boto3).
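
As an illustration of how a test can run without AWS credentials, the sketch below injects a mocked CloudWatch client instead of creating a real one. The tool name `list_active_alarms` appears in this PR, but its signature, its return shape, and the test itself are assumptions here; the actual tests in test_sre_agent.py may patch boto3 differently (for example via pytest-mock).

```python
from typing import Any, Dict, List
from unittest.mock import MagicMock


def list_active_alarms(cloudwatch_client: Any) -> List[Dict[str, str]]:
    """Return name and reason for every alarm currently in the ALARM state."""
    response = cloudwatch_client.describe_alarms(StateValue="ALARM")
    return [
        {"name": alarm["AlarmName"], "reason": alarm.get("StateReason", "")}
        for alarm in response.get("MetricAlarms", [])
    ]


def test_list_active_alarms_returns_names() -> None:
    # The MagicMock stands in for boto3.client("cloudwatch"); no network call is made.
    fake_client = MagicMock()
    fake_client.describe_alarms.return_value = {
        "MetricAlarms": [{"AlarmName": "HighCPU", "StateReason": "Threshold crossed"}]
    }

    alarms = list_active_alarms(fake_client)

    assert alarms == [{"name": "HighCPU", "reason": "Threshold crossed"}]
    fake_client.describe_alarms.assert_called_once_with(StateValue="ALARM")
```

Passing the client in as a parameter keeps the function testable with any stand-in object, at the cost of a slightly longer call site in production code.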

Checklist

  • Sample runs end-to-end with DRY_RUN=true (no AWS credentials needed for remediation)
  • All @tool docstrings are clear and LLM-friendly
  • README.md includes prerequisites, setup, usage, IAM policy, and extension ideas
  • .env.example provided
  • requirements.txt provided
  • Unit tests provided and passing
  • No hardcoded credentials
  • Security note about dry-run mode included in README

Related

  • Bridges AWS open source (Strands Agents, CloudWatch) with Red Hat/Kubernetes tooling
  • Works with OpenShift by swapping kubectl for oc in the remediation tools
  • Designed to be extended with PagerDuty, GitHub Issues, or custom runbooks
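
One way the kubectl-to-oc swap could be made configurable is sketched below. The `KUBE_CLI` environment variable and both helper functions are hypothetical and are not part of this PR, which describes the swap as an edit to the remediation tools.

```python
import os
from typing import List

ALLOWED_CLIS = ("kubectl", "oc")


def cluster_cli() -> str:
    """Pick the cluster CLI binary; KUBE_CLI is a hypothetical setting."""
    cli = os.environ.get("KUBE_CLI", "kubectl")
    if cli not in ALLOWED_CLIS:
        # Refuse arbitrary binaries so the variable cannot redirect execution.
        raise ValueError(f"KUBE_CLI must be one of {ALLOWED_CLIS}, got {cli!r}")
    return cli


def build_get_command(resource: str, namespace: str) -> List[str]:
    """Build a read-only 'get' command for whichever CLI is selected."""
    return [cluster_cli(), "get", resource, "-n", namespace]
```

With `KUBE_CLI=oc` exported, `build_get_command("pods", "prod")` yields an `oc get pods -n prod` argument list, while the default remains kubectl.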

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.


ajha8 commented Mar 3, 2026

@manoj-selvakumar5 @yonib05 could you please initiate the workflow and review the changes?


github-actions bot commented Mar 6, 2026

Latest scan for commit: 25fa18e | Updated: 2026-03-06 01:45:20 UTC

✅ Security Scan Report (PR Files Only)

Scanned Files

  • 02-samples/19-sre-incident-response-agent/.env.example
  • 02-samples/19-sre-incident-response-agent/README.md
  • 02-samples/19-sre-incident-response-agent/requirements.txt
  • 02-samples/19-sre-incident-response-agent/sre_agent.py
  • 02-samples/19-sre-incident-response-agent/test_sre_agent.py
  • 02-samples/README.md

Security Scan Results

| Critical | High | Medium | Low | Info |
| --- | --- | --- | --- | --- |
| 0 | 0 | 0 | 0 | 0 |

Threshold: High

No security issues detected in your changes. Great job!

This scan only covers files changed in this PR.


ajha8 commented Mar 9, 2026

@mvangara10 could you please review my changes?
