224 changes: 224 additions & 0 deletions docs/QUICK_RUN_GUIDE.md
@@ -0,0 +1,224 @@
# Quick Run Guide: aorta-report Pipelines

This guide demonstrates how to use the `aorta-report` CLI to analyze PyTorch profiler traces.

---

## 1. Input Directory Structures

### GEMM Sweep Directory (`gemm-sweep/`)

Used for analyzing GEMM kernel variance across multiple thread/channel configurations.

```
experiments/2026-01-10/gemm-sweep/
├── 256thread/
│   ├── nccl_28channels/
│   │   └── torch_profiler/
│   │       ├── rank0/trace/pt.trace.json
│   │       ├── rank1/trace/pt.trace.json
│   │       └── ... (rank2-7)
│   └── nccl_56channels/
│       └── torch_profiler/
│           └── rank*/trace/pt.trace.json
├── 512thread/
│   ├── nccl_28channels/
│   │   └── torch_profiler/rank*/...
│   └── nccl_56channels/
│       └── torch_profiler/rank*/...
└── tracelens_analysis/  # Generated by TraceLens
    ├── 256thread/individual_reports/
    └── 512thread/individual_reports/
```
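
Before running the pipeline, it can help to confirm that a sweep directory matches this layout. The following is a minimal sketch and not part of `aorta-report`: the helper name `find_gemm_traces` is hypothetical, and it assumes the `<N>thread/nccl_<M>channels/torch_profiler/rank<R>/trace/pt.trace.json` layout shown above.

```python
from collections import defaultdict
from pathlib import Path


def find_gemm_traces(sweep_dir):
    """Group trace files by (thread_dir, channel_dir).

    Hypothetical helper (not part of aorta-report); assumes the layout
    documented above.
    """
    traces = defaultdict(list)
    pattern = "*thread/nccl_*channels/torch_profiler/rank*/trace/pt.trace.json"
    for path in sorted(Path(sweep_dir).glob(pattern)):
        # Path ends with:
        # <thread>/<channel>/torch_profiler/<rank>/trace/pt.trace.json
        thread_dir, channel_dir = path.parts[-6], path.parts[-5]
        traces[(thread_dir, channel_dir)].append(path)
    return dict(traces)
```

A configuration that is missing from the result, or that has fewer trace files than expected ranks, is worth checking before starting a long analysis run.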

### RCCL Warp Speed Directory (`rccl-warp-speed/`)

Used for comparing baseline vs test configurations (A/B comparison).

```
experiments/2026-01-10/rccl-warp-speed/
├── 32cu_512threads/  # Baseline configuration
│   ├── torch_profiler/
│   │   ├── rank0/*.json
│   │   ├── rank1/*.json
│   │   └── ... (rank2-7)
│   └── tracelens_analysis/  # Generated by TraceLens
│       ├── individual_reports/
│       └── collective_reports/
├── 37cu_384threads/  # Test configuration
│   ├── torch_profiler/rank*/...
│   └── tracelens_analysis/...
└── 56cu_256threads/  # Another configuration
    └── ...
```
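
An A/B comparison is only meaningful when both configurations contain traces for the same ranks. As a rough sketch under that assumption, the hypothetical helpers below (not part of `aorta-report`) list the `rank*` directories that hold at least one `*.json` file and report ranks present in only one configuration.

```python
from pathlib import Path


def ranks_with_traces(config_dir):
    """Return sorted rank directory names containing at least one *.json file.

    Hypothetical helper; searches recursively so that both rank0/*.json and
    rank0/trace/pt.trace.json are accepted.
    """
    profiler_dir = Path(config_dir) / "torch_profiler"
    return sorted(
        rank_dir.name
        for rank_dir in profiler_dir.glob("rank*")
        if rank_dir.is_dir() and any(rank_dir.rglob("*.json"))
    )


def rank_mismatch(baseline_dir, test_dir):
    """Return rank names that appear in only one of the two configurations."""
    baseline = set(ranks_with_traces(baseline_dir))
    test = set(ranks_with_traces(test_dir))
    return sorted(baseline ^ test)
```

An empty list from `rank_mismatch` means both directories cover the same ranks.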

---

## 2. Pipeline Commands

### GEMM Variance Analysis Pipeline

Analyzes GEMM kernel time variance across thread/channel configurations.

```bash
aorta-report pipeline gemm \
--sweep-dir ./experiments/2026-01-10/gemm-sweep/ \
-o ./comparison_gemm_1/
```

**Options:**
- `--sweep-dir` - Path to sweep directory with thread/channel subdirectories
- `-o, --output` - Output directory for results
- `--skip-tracelens` - Skip TraceLens analysis if reports already exist
- `--top-k` - Number of top GEMM kernels to extract (default: 5)
- `-t, --threads` - Thread configs to analyze (default: 256, 512)
- `-c, --channels` - Channel configs to analyze (default: 28, 42, 56, 70)
- `--no-plots` - Skip plot generation
- `--no-html` - Skip HTML report generation

**Example with options:**
```bash
aorta-report pipeline gemm \
--sweep-dir ./experiments/2026-01-10/gemm-sweep/ \
-o ./comparison_gemm/ \
--skip-tracelens \
--top-k 10 \
-t 256 -t 512 \
-c 28 -c 56
```
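
When the command is driven from a script, the repeated `-t` and `-c` flags are easy to get wrong. The sketch below assembles the argument list from Python values; `build_gemm_command` is a hypothetical name, and it uses only flags listed in the options above.

```python
def build_gemm_command(sweep_dir, output_dir, threads=(), channels=(),
                       top_k=None, skip_tracelens=False):
    """Assemble the argv list for `aorta-report pipeline gemm`.

    Hypothetical helper. -t and -c are repeated once per value, mirroring
    the example above.
    """
    cmd = ["aorta-report", "pipeline", "gemm",
           "--sweep-dir", str(sweep_dir), "-o", str(output_dir)]
    if skip_tracelens:
        cmd.append("--skip-tracelens")
    if top_k is not None:
        cmd += ["--top-k", str(top_k)]
    for value in threads:
        cmd += ["-t", str(value)]
    for value in channels:
        cmd += ["-c", str(value)]
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)`; passing a list rather than a single string avoids shell-quoting problems with paths.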

---

### Summary Comparison Pipeline

Compares two configurations (baseline vs test) with comprehensive analysis.

```bash
aorta-report pipeline summary \
--baseline ./experiments/2026-01-10/rccl-warp-speed/32cu_512threads/ \
--test ./experiments/2026-01-10/rccl-warp-speed/37cu_384threads/ \
--baseline-label 32c_512t \
--test-label 37c_384t \
--output ./comparison_rccl/
```

**Options:**
- `--baseline` - Path to baseline trace directory
- `--test` - Path to test trace directory
- `--baseline-label` - Label for baseline in reports
- `--test-label` - Label for test in reports
- `--output` - Output directory for results
- `--skip-tracelens` - Skip TraceLens analysis if reports already exist
- `--gpu-timeline/--no-gpu-timeline` - Include or skip the GPU timeline comparison
- `--collective/--no-collective` - Include or skip the collective/NCCL comparison

**Example with options:**
```bash
aorta-report pipeline summary \
--baseline ./experiments/2026-01-10/rccl-warp-speed/32cu_512threads/ \
--test ./experiments/2026-01-10/rccl-warp-speed/56cu_256threads/ \
--baseline-label baseline_32cu \
--test-label test_56cu \
--output ./comparison_output/ \
--skip-tracelens
```
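
To compare one baseline against several test configurations, the same command can be generated once per test directory. This is a sketch only: `build_summary_commands` is a hypothetical helper, and using the directory names as labels and as the output subdirectory name is an assumption made for illustration.

```python
from pathlib import Path


def build_summary_commands(baseline_dir, test_dirs, output_root):
    """Build one `aorta-report pipeline summary` argv list per test directory.

    Hypothetical helper. Labels and output folder names are derived from the
    directory names (an illustrative choice, not a tool requirement).
    """
    baseline = Path(baseline_dir)
    commands = []
    for test in map(Path, test_dirs):
        output = Path(output_root) / f"{baseline.name}_vs_{test.name}"
        commands.append([
            "aorta-report", "pipeline", "summary",
            "--baseline", str(baseline),
            "--test", str(test),
            "--baseline-label", baseline.name,
            "--test-label", test.name,
            "--output", str(output),
        ])
    return commands
```

Each entry can be run with `subprocess.run(cmd, check=True)`, which produces one output directory per baseline/test pair.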

---

## 3. Output Directory Structures

### GEMM Pipeline Output (`comparison_gemm_1/`)

```
comparison_gemm_1/
├── top5_gemm_kernels_time_variance.csv  # Raw GEMM variance data
├── top5_gemm_kernels_time_variance_with_timestamps.csv  # Enhanced with timestamps
├── plots/
│   ├── variance_by_threads_boxplot.png  # Variance by thread config
│   ├── variance_by_channels_boxplot.png  # Variance by channel config
│   ├── variance_by_ranks_boxplot.png  # Variance by rank
│   ├── variance_thread_channel_interaction.png  # Thread × Channel interaction
│   └── variance_violin_combined.png  # Combined violin plot
└── gemm_variance_report.html  # Self-contained HTML report
```

**Key outputs:**
- **CSV files**: Raw data for further analysis
- **Boxplots**: Identify which configs have the highest variance
- **HTML report**: Share with team (includes all plots embedded)
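
For further analysis of the CSV files, a reasonable first step is to inspect the header and the row count. The sketch below deliberately assumes nothing about the column names, which this guide does not list; `summarize_csv` is a hypothetical helper that uses only the standard library.

```python
import csv


def summarize_csv(csv_path):
    """Return (column_names, row_count) without assuming a particular schema.

    Hypothetical helper for a first look at the pipeline's CSV output.
    """
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        row_count = sum(1 for _ in reader)
        return list(reader.fieldnames or []), row_count
```

Calling it on `comparison_gemm_1/top5_gemm_kernels_time_variance.csv` shows which columns are available before any plotting code is written against them.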

---

### Summary Pipeline Output (`comparison_rccl/`)

```
comparison_rccl/
├── gpu_timeline_comparison.xlsx  # GPU timeline comparison
├── gpu_timeline_combined.xlsx  # Combined timeline data
├── collective_comparison.xlsx  # NCCL collective comparison
├── collective_combined.xlsx  # Combined collective data
├── final_analysis_report.xlsx  # Comprehensive analysis
├── plots/
│   ├── abs_time_comparison.png  # Absolute time comparison
│   ├── computation_time_by_rank.png  # Computation time per rank
│   ├── idle_time_by_rank.png  # Idle time per rank
│   ├── total_time_by_rank.png  # Total time per rank
│   ├── total_comm_time_by_rank.png  # Communication time per rank
│   ├── gpu_time_heatmap.png  # GPU time heatmap
│   ├── gpu_time_change_percentage_summary_by_rank.png  # % change summary
│   ├── improvement_chart.png  # Overall improvement chart
│   ├── NCCL_Algorithm_Bandwidth_comparison.png  # NCCL bandwidth comparison
│   ├── NCCL_Bus_Bandwidth_comparison.png  # Bus bandwidth comparison
│   ├── NCCL_Communication_Latency_comparison.png  # Latency comparison
│   ├── NCCL_Total_Communication_Latency_comparison.png
│   └── NCCL_Performance_Percentage_Change_comparison.png
└── performance_analysis_report.html  # Self-contained HTML report
```

**Key outputs:**
- **Excel files**: Detailed data for spreadsheet analysis
- **Plots**: Visual comparisons between baseline and test
- **HTML report**: Share comprehensive results with team
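
After a run, a quick check that the top-level files from the tree above exist can catch a partially failed pipeline. This is a sketch: `missing_outputs` is a hypothetical helper, the file names are copied from the tree (plots omitted for brevity), and it is an assumption that every file is produced on every run. Runs that use `--no-gpu-timeline` or `--no-collective` may legitimately omit some of them.

```python
from pathlib import Path

# File names copied from the output tree above (plots omitted for brevity).
EXPECTED_SUMMARY_OUTPUTS = [
    "gpu_timeline_comparison.xlsx",
    "gpu_timeline_combined.xlsx",
    "collective_comparison.xlsx",
    "collective_combined.xlsx",
    "final_analysis_report.xlsx",
    "performance_analysis_report.html",
]


def missing_outputs(output_dir, expected=EXPECTED_SUMMARY_OUTPUTS):
    """Return the expected file names that are absent from output_dir.

    Hypothetical helper; the expected list is an assumption based on the
    tree shown in this guide.
    """
    out = Path(output_dir)
    return [name for name in expected if not (out / name).is_file()]
```

An empty result means all listed files are present in the output directory.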

---

## 4. Quick Start Examples

### Analyze a new sweep directory
```bash
# Full pipeline (runs TraceLens + GEMM analysis)
aorta-report pipeline gemm --sweep-dir /path/to/sweep -o ./output/

# If TraceLens was already run
aorta-report pipeline gemm --sweep-dir /path/to/sweep -o ./output/ --skip-tracelens
```

### Compare two configurations
```bash
# Full comparison (runs TraceLens + comparison)
aorta-report pipeline summary \
--baseline /path/to/baseline \
--test /path/to/test \
--baseline-label "Baseline" \
--test-label "Test" \
--output ./comparison/
```

### Run only TraceLens analysis
```bash
# Single configuration
aorta-report analyze single /path/to/traces

# Sweep directory (multiple configs)
aorta-report analyze sweep /path/to/sweep
```

---

## 5. Tips

1. **First run**: Let the pipeline run TraceLens (do not pass `--skip-tracelens`)
2. **Subsequent runs**: Pass `--skip-tracelens` to reuse the existing TraceLens reports and save time
3. **Large datasets**: Pass `--no-plots --no-html` (listed above for the GEMM pipeline) to skip plot and HTML generation
4. **Custom analysis**: Use the CSV/Excel outputs as input for custom visualization
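
Tips 1 and 2 can be automated in a wrapper script. The sketch below is a heuristic, not the pipeline's own logic: it assumes the `tracelens_analysis/individual_reports/` layout from section 1, and both function names are hypothetical. How `aorta-report` itself decides that reports "already exist" is not specified in this guide.

```python
from pathlib import Path


def has_tracelens_reports(config_dir):
    """True if <config_dir>/tracelens_analysis/individual_reports/ is non-empty.

    Heuristic based on the directory layout in section 1 (an assumption).
    """
    reports = Path(config_dir) / "tracelens_analysis" / "individual_reports"
    return reports.is_dir() and any(reports.iterdir())


def skip_flag(baseline_dir, test_dir):
    """Return ['--skip-tracelens'] only when both configurations have reports."""
    if has_tracelens_reports(baseline_dir) and has_tracelens_reports(test_dir):
        return ["--skip-tracelens"]
    return []
```

The returned list can be appended to the argument list of an `aorta-report pipeline summary` invocation, so the flag is added only when reports exist for both sides.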

5 changes: 4 additions & 1 deletion scripts/gemm_analysis/run_tracelens_analysis.sh
@@ -264,7 +264,10 @@ else
     # trace file in the rank folder to the canonical `trace/pt.trace.json` path.
     # This will satisfy TraceLens's requirement of only one `*` being present in the trace pattern
     # while also avoiding FileNotFoundErrors due to different filenames.
-    find $TRACE_DIR/rank* -name "*.json" -exec sh -c 'mkdir -p "$(dirname "$0")/trace" && mv "$0" "$(dirname "$0")/trace/pt.trace.json"' {} \;
+    # OLD (not idempotent - running twice creates trace/trace/pt.trace.json):
+    # find $TRACE_DIR/rank* -name "*.json" -exec sh -c 'mkdir -p "$(dirname "$0")/trace" && mv "$0" "$(dirname "$0")/trace/pt.trace.json"' {} \;
+    # NEW: -not -path "*/trace/*" ensures this is idempotent (safe to run multiple times)
+    find $TRACE_DIR/rank* -name "*.json" -not -path "*/trace/*" -exec sh -c 'mkdir -p "$(dirname "$0")/trace" && mv "$0" "$(dirname "$0")/trace/pt.trace.json"' {} \;
 
     TraceLens_generate_multi_rank_collective_report_pytorch \
         --trace_pattern "$TRACE_DIR/rank*/trace/pt.trace.json" \
3 changes: 2 additions & 1 deletion src/aorta/report/analysis/__init__.py
@@ -3,12 +3,13 @@
 from .tracelens_wrapper import TraceLensWrapper
 from .analyze_gemm import analyze_gemm_reports
 from .analyze_single import analyze_single_config
-from .analyze_sweep import analyze_sweep_config
+from .analyze_sweep import analyze_sweep_config, discover_and_run_tracelens
 
 __all__ = [
     "TraceLensWrapper",
     "analyze_gemm_reports",
     "analyze_single_config",
     "analyze_sweep_config",
+    "discover_and_run_tracelens",
 ]

109 changes: 57 additions & 52 deletions src/aorta/report/analysis/analyze_gemm.py
@@ -38,12 +38,48 @@ def extract_name_from_kernel_info(kernel_info_str: str) -> Optional[str]:
     return None
 
 
-def column_letter_to_index(letter: str) -> int:
-    """Convert Excel column letter to 0-based index."""
-    index = 0
-    for i, char in enumerate(reversed(letter.upper())):
-        index += (ord(char) - ord("A") + 1) * (26**i)
-    return index - 1
+def find_column_indices(
+    header_row: List[Any],
+    required_columns: Dict[str, str],
+) -> Dict[str, int]:
+    """
+    Find column indices by matching column names in header row.
+
+    Args:
+        header_row: List of column header values
+        required_columns: Dict mapping logical names to expected column names
+            e.g., {"kernel_info": "kernel_details__summarize_kernel_stats"}
+
+    Returns:
+        Dict mapping logical names to column indices (0-based)
+
+    Raises:
+        ValueError: If any required column is not found
+    """
+    # Create a mapping of column name -> index
+    header_map = {}
+    for idx, col_name in enumerate(header_row):
+        if col_name is not None:
+            header_map[str(col_name)] = idx
+
+    # Find indices for required columns
+    column_indices = {}
+    missing_columns = []
+
+    for logical_name, expected_name in required_columns.items():
+        if expected_name in header_map:
+            column_indices[logical_name] = header_map[expected_name]
+        else:
+            missing_columns.append(f"'{expected_name}' (for {logical_name})")
+
+    if missing_columns:
+        available = list(header_map.keys())[:20]  # Show first 20 columns
+        raise ValueError(
+            f"Required columns not found: {', '.join(missing_columns)}\n"
+            f"Available columns (first 20): {available}"
+        )
+
+    return column_indices
 
 
 def process_excel_file(
@@ -66,6 +102,13 @@ def process_excel_file(
     Returns:
         List of dictionaries containing kernel data
     """
+    # Define required columns by their expected names
+    REQUIRED_COLUMNS = {
+        "kernel_info": "kernel_details__summarize_kernel_stats",
+        "time_min": "Kernel Time (µs)_min",
+        "time_max": "Kernel Time (µs)_max",
+    }
+
     try:
         # Open the workbook
         wb = openpyxl.load_workbook(file_path, read_only=True, data_only=True)
@@ -77,62 +120,24 @@ def process_excel_file(
 
         sheet = wb["GEMM"]
 
-        # Expected column positions (0-based indices)
-        col_kernel_info = column_letter_to_index("X")  # Column X
-        col_time_min = column_letter_to_index("AG")  # Column AG
-        col_time_max = column_letter_to_index("AH")  # Column AH
-
-        # Read header row to validate column names
         rows_data = []
         header_row = None
+        col_indices = None
 
         for i, row in enumerate(sheet.iter_rows(values_only=True)):
             if i == 0:
-                # This is the header - validate column names match expectations
+                # Parse header row and find column indices dynamically
                 header_row = list(row)
-
-                # Expected column names (match what TraceLens generates)
-                expected_x = "kernel_details__summarize_kernel_stats"
-                expected_ag = "Kernel Time (µs)_min"
-                expected_ah = "Kernel Time (µs)_max"
-
-                # Validate each expected column
-                errors = []
-
-                if col_kernel_info < len(header_row):
-                    header_x = str(header_row[col_kernel_info]) if header_row[col_kernel_info] else ""
-                    if header_x != expected_x:
-                        errors.append(f"Column X: expected '{expected_x}', found '{header_x}'")
-                else:
-                    errors.append(f"Column X: not found (only {len(header_row)} columns)")
-
-                if col_time_min < len(header_row):
-                    header_ag = str(header_row[col_time_min]) if header_row[col_time_min] else ""
-                    if header_ag != expected_ag:
-                        errors.append(f"Column AG: expected '{expected_ag}', found '{header_ag}'")
-                else:
-                    errors.append(f"Column AG: not found (only {len(header_row)} columns)")
-
-                if col_time_max < len(header_row):
-                    header_ah = str(header_row[col_time_max]) if header_row[col_time_max] else ""
-                    if header_ah != expected_ah:
-                        errors.append(f"Column AH: expected '{expected_ah}', found '{header_ah}'")
-                else:
-                    errors.append(f"Column AH: not found (only {len(header_row)} columns)")
-
-                if errors:
-                    raise ValueError(
-                        f"Column validation failed in {file_path}:\n " + "\n ".join(errors)
-                    )
-
+                col_indices = find_column_indices(header_row, REQUIRED_COLUMNS)
                 continue
 
-            if row is None or len(row) <= max(col_kernel_info, col_time_min, col_time_max):
+            if row is None or col_indices is None:
                 continue
 
-            kernel_info = row[col_kernel_info] if col_kernel_info < len(row) else None
-            kernel_time_min = row[col_time_min] if col_time_min < len(row) else None
-            kernel_time_max = row[col_time_max] if col_time_max < len(row) else None
+            # Extract values using dynamically found indices
+            kernel_info = row[col_indices["kernel_info"]] if col_indices["kernel_info"] < len(row) else None
+            kernel_time_min = row[col_indices["time_min"]] if col_indices["time_min"] < len(row) else None
+            kernel_time_max = row[col_indices["time_max"]] if col_indices["time_max"] < len(row) else None
 
             # Extract kernel name
             kernel_name = extract_name_from_kernel_info(kernel_info)