@pablogsal pablogsal commented Dec 7, 2025

When profiling Python code, knowing which source line is hot often isn't enough: a single line can contain multiple operations with very different performance characteristics. For example, result = obj.attr + expensive_call() has an attribute load, a function call, and an addition, but traditional line-level profiling cannot tell them apart. This becomes especially important with adaptive specialization, where understanding whether the interpreter has specialized an instruction (e.g., LOAD_ATTR → LOAD_ATTR_INSTANCE_VALUE) provides crucial optimization insights that were previously only visible through manual dis inspection.

This PR threads bytecode information through the existing sampling profiler infrastructure by extending RemoteUnwinder to optionally extract the executing opcode and its precise source span (including column offsets) from the remote process's frame state. The column-level location data flows through to the visualization layers, where the heatmap can highlight the exact expression within a line, the gecko format emits interval markers for Firefox Profiler's timeline, and the live TUI displays real-time instruction breakdowns. The feature is opt-in via --opcodes to avoid the overhead when not needed, and the opcode utilities handle the mapping between specialized instruction variants and their base forms so users can see both what's actually executing and what it was specialized from.

screenrecording-2025-12-02_21-12-36.mp4
screenrecording-2025-12-01_01-18-26.mp4

Introduces LocationInfo struct sequence with end_lineno, col_offset, and
end_col_offset fields. Adds opcodes parameter to RemoteUnwinder that
extracts the currently executing opcode alongside its source span.

Refactors linetable parsing to correctly accumulate line numbers
separately from output values, fixing edge cases in computed_line.

New opcode_utils.py maps opcode numbers to names and detects specialized
variants using opcode module metadata. Adds normalize_location() and
extract_lineno() helpers to collector base for uniform location handling.

CLI gains --opcodes flag, validated against compatible formats (gecko,
flamegraph, heatmap, live).

Stores per-node opcode counts in the tree structure. Exports opcode
mapping (names and deopt relationships) in JSON so the JS renderer can
show instruction names and distinguish specialized variants.

Tracks opcode state transitions per thread and emits interval markers
when the executing opcode changes. Markers include opcode name, line,
column, and duration. Adds Opcodes category to marker schema.
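
The per-thread transition tracking described above could look roughly like the following. This is a minimal sketch under assumed names (Marker, OpcodeIntervalTracker, observe, flush are all hypothetical), not the gecko collector from the PR: each sample either extends the interval currently open for its thread or closes it and opens a new one.

```python
from dataclasses import dataclass


@dataclass
class Marker:
    tid: int
    opname: str
    lineno: int
    col_offset: int
    start: float
    end: float

    @property
    def duration(self):
        return self.end - self.start


class OpcodeIntervalTracker:
    """Hypothetical tracker: one open interval per thread."""

    def __init__(self):
        self._current = {}  # tid -> (opname, lineno, col_offset, start_time)
        self.markers = []

    def observe(self, tid, opname, lineno, col_offset, now):
        key = (opname, lineno, col_offset)
        state = self._current.get(tid)
        if state is not None and state[:3] == key:
            return  # same instruction as the previous sample: interval continues
        if state is not None:
            self._close(tid, state, now)
        self._current[tid] = (*key, now)

    def _close(self, tid, state, now):
        opname, lineno, col_offset, start = state
        self.markers.append(Marker(tid, opname, lineno, col_offset, start, now))

    def flush(self, now):
        # Close every interval still open at the end of profiling.
        for tid, state in self._current.items():
            self._close(tid, state, now)
        self._current.clear()


tracker = OpcodeIntervalTracker()
tracker.observe(1, "LOAD_ATTR", 10, 4, now=0.0)
tracker.observe(1, "LOAD_ATTR", 10, 4, now=1.0)
tracker.observe(1, "CALL", 10, 15, now=2.0)
tracker.flush(now=3.0)
for marker in tracker.markers:
    print(marker.opname, marker.duration)
```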

Expandable panel per hot line shows instruction-level sample breakdown
with opcode names and specialization percentage. Converts call graph
data structures from lists to sets for O(1) deduplication.

New widget displays instruction-level stats for selected function when
--opcodes is enabled. Navigation via j/k keys with scroll support.
Adds per-thread opcode tracking. Updates pstats collector for new frame
format.

Frame location is now a 4-tuple (lineno, end_lineno, col_offset,
end_col_offset). MockFrameInfo wraps locations in LocationInfo struct.
Updates assertions throughout and adds opcode_utils coverage.
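
For readers unfamiliar with the new frame format, here is a guess at what uniform location handling might involve. The field order follows the 4-tuple above, but the NamedTuple is only a stand-in for the real struct sequence, and the behaviour of normalize_location() and extract_lineno() shown here (accepting None, a bare line number, or a 4-tuple) is an assumption, not the PR's implementation.

```python
from typing import NamedTuple, Optional


class LocationInfo(NamedTuple):
    """Stand-in for the struct sequence named in the PR."""
    lineno: Optional[int]
    end_lineno: Optional[int]
    col_offset: Optional[int]
    end_col_offset: Optional[int]


def normalize_location(location):
    """Assumed behaviour: coerce None, an int, or a 4-tuple to LocationInfo."""
    if location is None:
        return LocationInfo(None, None, None, None)
    if isinstance(location, int):
        # Legacy shape: only a line number, no column information.
        return LocationInfo(location, location, None, None)
    lineno, end_lineno, col_offset, end_col_offset = location
    return LocationInfo(lineno, end_lineno, col_offset, end_col_offset)


def extract_lineno(location):
    """Assumed behaviour: return just the starting line, or None."""
    return normalize_location(location).lineno


print(normalize_location((12, 12, 4, 20)))
print(extract_lineno(7))
print(extract_lineno(None))
```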
@pablogsal pablogsal requested review from ivonastojanovic and savannahostrowski and removed request for ivonastojanovic December 7, 2025 23:42

@pablogsal pablogsal marked this pull request as ready for review December 8, 2025 19:22

@savannahostrowski savannahostrowski left a comment

First pass on most of the files (mainly the JS/CSS)! I ended up playing around with this quite a bit (flamegraph, heatmap, live mode, etc) and it looks awesome. You're on a roll 💫 !

A couple of comments here and then I also opened pablogsal#110 to move a bunch of the CSS into proper classes instead of inlining all the styles, which fixes some dark mode theming bugs (figured that'd be simpler than commenting in a bunch of places)!


const opcodeLines = sortedOpcodes.map(([opcode, count]) => {
    const opcodeInfo = getOpcodeInfo(parseInt(opcode, 10));
    const pct = ((count / totalOpcodeSamples) * 100).toFixed(1);

This isn't used right now. I assume we do want to include it somewhere in the UI 😄

const normalizedIntensity = (intensity - 0.3) / 0.7;
// Warm orange-red with increasing opacity for hotter spans
const alpha = 0.25 + normalizedIntensity * 0.35; // 0.25 to 0.6
return `rgba(255, 100, 50, ${alpha})`;

Maybe we can read from a CSS variable instead so we don't have to hardcode this/can reuse colors we already have defined?

    return `rgba(255, 100, 50, ${alpha})`;
} else if (intensity > 0) {
    // Cold spans: very subtle gray, almost invisible
    return `rgba(150, 150, 150, 0.1)`;

Same here?


elif ch == curses.KEY_LEFT or ch == curses.KEY_UP:
# Navigate to previous thread in PER_THREAD mode, or switch from ALL to PER_THREAD
elif ch == ord("j") or ch == ord("J"):

Unless I'm missing something, I think we can deduplicate a lot of logic with some helpers for up/down movement, right?
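
One possible shape for such helpers, as a sketch only: move_selection and scroll_to_show are hypothetical names, and nothing here is taken from the live TUI code. The idea is that every up/down key handler would call the same clamped movement function with a delta of -1 or +1, then adjust the scroll offset.

```python
def move_selection(index, delta, count):
    """Move a selection by delta, clamped to the range [0, count - 1]."""
    if count <= 0:
        return 0
    return max(0, min(count - 1, index + delta))


def scroll_to_show(selected, offset, visible_rows):
    """Return a scroll offset that keeps the selected row on screen."""
    if selected < offset:
        return selected
    if selected >= offset + visible_rows:
        return selected - visible_rows + 1
    return offset


# Example: pressing "down" three times in a 5-row list with 2 visible rows.
selected, offset = 0, 0
for _ in range(3):
    selected = move_selection(selected, +1, count=5)
    offset = scroll_to_show(selected, offset, visible_rows=2)
print(selected, offset)
```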
