feat(02-samples): add SRE incident response multi-agent sample by ajha8 · Pull Request #228 · strands-agents/samples

ajha8 · 2026-03-01T20:33:24Z

# PR: feat(02-samples): Add SRE Incident Response multi-agent sample

Summary

This PR adds a new sample to 02-samples/ demonstrating a multi-agent SRE
(Site Reliability Engineering) incident response workflow built with the
Strands Agents SDK. It is a proactive contribution covering an SRE/DevOps
use case that the samples currently lack; no existing issue tracks this gap.

Why this sample?

A review of the existing samples found no example that covers:

Operations / SRE use cases (vs. finance, restaurant, JIRA, audit tools)
Multi-agent supervisor pattern applied to real-time incident detection
AWS ↔ Kubernetes bridge (CloudWatch alarms → kubectl/Helm remediation)
Red Hat / OpenShift compatibility (kubectl tools work with oc too)

This sample fills that gap and is relevant to DevOps/SRE engineers
who run workloads on AWS with Kubernetes or OpenShift.

What this adds

02-samples/19-sre-incident-response-agent/
├── sre_agent.py          # Main agent (4 agents + 8 tools)
├── test_sre_agent.py     # Pytest unit tests (mocked AWS, 12 tests)
├── requirements.txt
├── .env.example
└── README.md

Strands SDK concepts demonstrated

| Concept | How |
| --- | --- |
| `@tool` decorator | 8 tools: CloudWatch, Logs, kubectl, Helm, Slack |
| Multi-agent supervisor | `supervisor_agent` delegates to 3 specialist sub-agents |
| `BedrockModel` | Configurable model provider |
| `agents=[...]` parameter | Demonstrates Strands native multi-agent orchestration |
| Dry-run safety | All destructive actions gated by `DRY_RUN=true` |
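
As an illustration of the dry-run gate in the last row, here is a minimal sketch using only the standard library. It is not the code from `sre_agent.py`: the helper `dry_run_enabled`, the default of treating an unset `DRY_RUN` as true, and the exact return strings are assumptions. In the sample itself the function would presumably also carry the Strands `@tool` decorator, which is omitted here so the snippet runs without the SDK.

```python
import os
import shlex
import subprocess


def dry_run_enabled() -> bool:
    """Return True when DRY_RUN is set to a truthy value.

    Assumption: an unset variable is treated as dry-run, so the safe
    behaviour is the default.
    """
    return os.environ.get("DRY_RUN", "true").strip().lower() in {"1", "true", "yes"}


def kubectl_rollout_restart(deployment: str, namespace: str = "default") -> str:
    """Restart a deployment, or only describe the command in dry-run mode."""
    cmd = ["kubectl", "rollout", "restart", f"deployment/{deployment}", "-n", namespace]
    if dry_run_enabled():
        # Nothing is executed; the caller (or the LLM) only sees the plan.
        return "[DRY RUN] would run: " + shlex.join(cmd)
    # Destructive path: reached only when DRY_RUN is explicitly disabled.
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return result.stdout or result.stderr


if __name__ == "__main__":
    os.environ["DRY_RUN"] = "true"
    print(kubectl_rollout_restart("checkout", "prod"))
```

Returning the planned command as a string, rather than raising or returning nothing, lets the calling agent report what it would have done.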

Agent architecture

supervisor_agent (Incident Commander)
    ├── cloudwatch_agent   → list_active_alarms, get_metric_statistics, fetch_log_events
    ├── rca_agent          → reasoning-only, no tools (pure LLM analysis)
    └── remediation_agent  → kubectl_get, kubectl_rollout_restart, helm_rollback, helm_scale
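
The tree above can also be read as a lookup table from sub-agent to tools. The sketch below restates it as plain Python data so the tool ownership can be checked programmatically; the names `AGENT_TOOLS` and `agent_for_tool` are hypothetical and not taken from `sre_agent.py`. Only the seven tools named in the tree are included.

```python
from typing import Optional

# Sub-agent -> tool names, copied from the architecture tree.
AGENT_TOOLS = {
    "cloudwatch_agent": [
        "list_active_alarms",
        "get_metric_statistics",
        "fetch_log_events",
    ],
    "rca_agent": [],  # reasoning-only, no tools
    "remediation_agent": [
        "kubectl_get",
        "kubectl_rollout_restart",
        "helm_rollback",
        "helm_scale",
    ],
}


def agent_for_tool(tool_name: str) -> Optional[str]:
    """Return the sub-agent that owns a tool, or None if no sub-agent lists it."""
    for agent_name, tools in AGENT_TOOLS.items():
        if tool_name in tools:
            return agent_name
    return None


if __name__ == "__main__":
    print(agent_for_tool("helm_rollback"))
    print(sum(len(tools) for tools in AGENT_TOOLS.values()))
```

Keeping every cluster-changing tool on `remediation_agent` means the detection and analysis agents have no way to modify the cluster.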

Testing

pip install pytest pytest-mock
pytest test_sre_agent.py -v

All 12 tests pass without AWS credentials (mocked boto3).
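
To show how a test can pass without AWS credentials, here is a hedged sketch of one such test. It is not copied from `test_sre_agent.py`: the function body and the choice to pass the client as a parameter are assumptions made so the example runs with the standard library alone. `describe_alarms(StateValue="ALARM")` is the boto3 CloudWatch call being stood in for by the mock.

```python
from unittest.mock import MagicMock


def list_active_alarms(cloudwatch_client) -> list:
    """Return the names of alarms currently in the ALARM state.

    Assumption: the client is injected, which makes it easy to replace
    with a mock in tests.
    """
    response = cloudwatch_client.describe_alarms(StateValue="ALARM")
    return [alarm["AlarmName"] for alarm in response.get("MetricAlarms", [])]


def test_list_active_alarms_uses_mocked_client():
    # MagicMock stands in for boto3.client("cloudwatch"); no network call is made.
    fake_client = MagicMock()
    fake_client.describe_alarms.return_value = {
        "MetricAlarms": [{"AlarmName": "HighCPU"}, {"AlarmName": "PodCrashLoop"}]
    }

    assert list_active_alarms(fake_client) == ["HighCPU", "PodCrashLoop"]
    fake_client.describe_alarms.assert_called_once_with(StateValue="ALARM")


if __name__ == "__main__":
    test_list_active_alarms_uses_mocked_client()
    print("ok")
```

If the tool creates its own client internally, the same idea applies with `unittest.mock.patch` (or the `mocker` fixture from pytest-mock) replacing `boto3.client`.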

Checklist

Sample runs end-to-end with DRY_RUN=true (no AWS credentials needed for remediation)
All @tool docstrings are clear and LLM-friendly
README.md includes prerequisites, setup, usage, IAM policy, and extension ideas
.env.example provided
requirements.txt provided
Unit tests provided and passing
No hardcoded credentials
Security note about dry-run mode included in README

github-actions bot · Mar 6, 2026

✅ Security Scan Report (PR Files Only)

Scanned Files

02-samples/19-sre-incident-response-agent/.env.example
02-samples/19-sre-incident-response-agent/README.md
02-samples/19-sre-incident-response-agent/requirements.txt
02-samples/19-sre-incident-response-agent/sre_agent.py
02-samples/19-sre-incident-response-agent/test_sre_agent.py
02-samples/README.md

Security Scan Results

| Critical | High | Medium | Low | Info |
| --- | --- | --- | --- | --- |
| 0 | 0 | 0 | 0 | 0 |

Threshold: High

No security issues detected in your changes. Great job!

This scan only covers files changed in this PR.

ajha8 · 2026-03-09T22:14:58Z

@mvangara10 could you please review my changes?

Ayush Jha added 4 commits March 1, 2026 12:12

feat(02-samples): add SRE incident response multi-agent sample

00b87c9

feat(02-samples): remove unwanted cat/eof commands in the example file

eebd7fc

feat(02-samples): minor fixes in comments

3283989

feat(02-samples): fixed test issues

25fa18e

ajha8 wants to merge 4 commits into strands-agents:main from ajha8:feat/sre-incident-response-agent
