Skip to content

feat(bloat): per-symbol forward refs + bidirectional graph + dual-rank sub-table#470

Merged
zackees merged 1 commit into
mainfrom
feat/bloat-forward-refs
Jun 7, 2026
Merged

feat(bloat): per-symbol forward refs + bidirectional graph + dual-rank sub-table#470
zackees merged 1 commit into
mainfrom
feat/bloat-forward-refs

Conversation

@zackees
Copy link
Copy Markdown
Member

@zackees zackees commented Jun 7, 2026

Motivation

When an AI optimization pass sees a Top-25 symbol like `ClocklessIdf5`, it should be able to tell what `ClocklessIdf5` actually calls — not what its TU siblings call. The original `--cref`-only data is TU-level: it surfaced `fl::sort` as a callee of `ClocklessIdf5` (because a sibling in the same TU called `fl::sort`), which led the AI to pick the wrong optimization target.

What changed

Per-symbol forward edges via `objdump -d`

New `fbuild_core::symbol_analysis::callgraph` module parses the textual disassembly of the linked ELF and extracts `caller → callee` edges. Same source `cargo install` / `cargo bloat` use, no dependency on `.o` files (which fbuild deletes post-link), and we fall back gracefully when objdump is absent.

Architectures covered: ARM Thumb (`bl`), Xtensa (`call4`/`call8`), RISC-V (`jal`/`tail`), AVR (`call`/`rcall`/`jmp`). Filters ARM `$a`/`$t`/`$d` mapping markers, PLT shims, hex-only annotations, and self-refs.

`FineGrainedSymbol` grows `references_to: Vec` (mangled callee names). `serde(default)` so older `report.json` files keep parsing.

Bidirectional graph

`BackrefGraph` gains a `Direction` enum (`Backward` default, `Forward`, `Bidirectional`) and a per-edge `EdgeDirection`. The `.dot` renderer styles backward edges with the historical solid arrow and forward edges with a distinct dashed blue arrow labelled `calls` so a bidirectional graph stays readable. Default config remains backward-only — zero behaviour change for any pre-#471 caller.

Markdown: "Top callees (dual ranking)" sub-table

The per-symbol section in the embedded markdown report now embeds a side-by-side ranking — heaviest callees (by flash bytes) and most-shared callees (by `referenced_by` length) — plus an `(… and N more)` bucket row. This is the data layout the user asked for so an AI can answer both "what's the biggest thing this calls?" and "what's the most-shared thing this calls?" at a glance.

Analyzer wiring

`AnalyzeConfig` gains `objdump_path: Option<&Path>`. The CLI's `ToolPaths` resolver reads it from `build_info.json` (already populated by #428) or derives it from `nm` using the GCC cross-tool naming convention. When neither resolves, the analyzer skips the objdump pass and `references_to` stays empty — backref graphs are unaffected.

Caveats (documented in code)

  • TU-level for cref, per-symbol for objdump. Forward edges are per-symbol; backward edges remain TU-granular because `--cref` itself is.
  • LTO hides edges. Inlined helpers don't appear in disassembly; listed edges are real but missing edges are common.
  • Indirect calls dropped. `blx r3` / `jalr` on a register have no symbol annotation; the parser silently skips them.

Test plan

  • `soldr cargo check` workspace: clean
  • `soldr cargo clippy -p fbuild-core -p fbuild-build -p fbuild-cli --all-targets -- -D warnings`: clean
  • `soldr cargo test -p fbuild-core --lib`: 166 passed
  • `soldr cargo test -p fbuild-build --lib`: 636 passed
  • `soldr cargo test -p fbuild-cli --bin fbuild`: 30 passed

23 new tests across the 3 crates:

  • Disassembly parser × 14: ARM Thumb, Xtensa, RISC-V, AVR; mapping symbols / PLT / hex / self-ref / dedup / sibling exclusion / mangled-name preservation / invert round-trip / empty input.
  • Forward graph traversal × 5: callee node creation, sibling-exclusion regression, fan-out overflow, edge styling in dot, default-direction unchanged.
  • Bidirectional × 1: walks both sides from one root.
  • Dual-rank sub-table × 1: `rank_callees_dual` separates size + popularity; "other" bucket count.
  • Markdown integration × 1: per-symbol section embeds the dual-rank sub-table when `references_to` is populated.

🤖 Generated with Claude Code

Summary by CodeRabbit

Release Notes

  • New Features

    • Added forward call-graph analysis: symbols now display their direct callees and call counts
    • Enabled bidirectional call graphs alongside existing back-reference analysis
    • Introduced dual-ranked "Top callees" tables showing callees ranked by size and popularity
  • Enhanced Reporting

    • Symbol reports now include "References (calls)" metrics for better visibility into outgoing call relationships
    • Graphviz visualizations now support forward edges and bidirectional traversal modes

…k sub-table

The motivating scenario (asked by FastLED user):

  When AI sees a Top-25 symbol like `ClocklessIdf5` it should be able
  to tell what `ClocklessIdf5` actually calls — not what its TU
  siblings call. cref-inversion is TU-level and mis-attributes
  sibling callees (e.g. `fl::sort` showed up as a callee of
  `ClocklessIdf5` when the cref-inverted view ran).

Approach: per-symbol forward references via `objdump -d` on the
linked ELF. Same source `cargo install` / `cargo bloat` use, no
extra build-graph dependency on `.o` files (which fbuild deletes
post-link), and the analyzer falls back gracefully when objdump is
absent.

Changes
-------

- New `fbuild_core::symbol_analysis::callgraph` module: textual
  `objdump -d` disassembly parser. Handles ARM Thumb (`bl`),
  Xtensa (`call4`/`call8`), RISC-V (`jal`/`tail`), and AVR
  (`call`/`rcall`/`jmp`). Filters ARM `$a`/`$t`/`$d` mapping markers,
  PLT shims, hex-only annotations, self-refs. 14 unit tests cover
  every flavour + edge cases.

- `FineGrainedSymbol` grows `references_to: Vec<String>` (mangled
  callee names). `serde(default)` so older `report.json` files keep
  parsing.

- `BackrefGraph` gains a `Direction` enum (`Backward` default,
  `Forward`, `Bidirectional`) and a per-edge `EdgeDirection` so the
  `.dot` renderer styles backward (solid) vs forward (dashed, blue,
  labelled `calls`) distinctly. Forward walking is per-symbol via
  `references_to`; the existing backward walk over `referenced_by`
  is unchanged so every pre-#471 caller sees zero behaviour change.

- Markdown report's per-symbol section now uses the bidirectional
  graph (callers ← root → callees) and embeds a "Top callees (dual
  ranking)" sub-table next to the existing back-ref table. Side-by-
  side columns: heaviest callees (by flash bytes) + most-shared
  callees (by `referenced_by` length), with an `(… and N more)`
  bucket row matching the user's ask.

- `AnalyzeConfig` gains `objdump_path: Option<&Path>`. The CLI's
  `ToolPaths` resolver reads `objdump_path` from `build_info.json`
  (already populated by #428) or derives it from `nm` via the GCC
  cross-tool naming convention. When neither resolves, the analyzer
  skips the objdump pass and `references_to` stays empty — backref
  graphs are unaffected.

- Section heading renamed from "Top N back-reference graphs" to
  "Top N symbol graphs" to reflect the bidirectional content.

Implementation caveats called out in code comments:

- **TU-level for cref**, **per-symbol for objdump.** Forward edges
  are the per-symbol view the user wanted; backward edges remain
  TU-granular because `--cref` itself is.
- **LTO hides edges.** Inlined helpers don't appear in disassembly,
  so listed edges are real but missing edges are common.
- **Indirect calls dropped.** `blx r3` / `jalr` on a register have
  no symbol annotation; the parser silently skips them.

23 new tests across the 3 crates (disassembly parser × 14, forward
graph traversal × 6, dual-rank sub-table × 1, sibling-exclusion
regression × 1, bidirectional dot styling × 1).

Closes the AI-misattributing-`fl::sort` reproducer.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jun 7, 2026

Review Change Stack

📝 Walkthrough

Walkthrough

This PR adds optional forward call-graph extraction via objdump, extends symbol analysis to support bidirectional graph traversal, and updates markdown reporting to expose dual-ranked callees. The disassembly parser filters spurious targets while maintaining backward compatibility through a Backward-direction default.

Changes

Forward call-graph and bidirectional analysis

Layer / File(s) Summary
Disassembly parser and callgraph data model
crates/fbuild-core/src/symbol_analysis/mod.rs, crates/fbuild-core/src/symbol_analysis/callgraph.rs, crates/fbuild-core/src/symbol_analysis/tests.rs
A new callgraph module parses objdump -d output to extract forward call edges: extracts function headers, scans instruction annotations for call targets, filters PLT trampolines, ARM markers, and absolute-address refs, deduplicates edges per (caller, callee) pair, and provides invert() for reverse mapping. FineGrainedSymbol gains a references_to: Vec<String> field populated during symbol rollup. Tests validate multi-architecture parsing, filtering behavior, C++ mangling, and edge inversion.
Graph direction and traversal
crates/fbuild-core/src/symbol_analysis/graph.rs
GraphConfig gains a direction: Direction field (Backward/Forward/Bidirectional, defaulting to Backward for compatibility). NodeKind adds a Callee variant; GraphEdge becomes directional via EdgeDirection and new backward() / forward() constructors. BackrefGraph::build_with_index conditionally runs backward BFS or forward DFS depending on direction, emitting correspondingly-typed edges. A new walk_forward DFS traverses callees with depth bounding and fan-out capping. rank_callees_dual ranks callees by both size and popularity, returning top-N lists plus an "other" count. Visualization via .dot styles forward edges as dashed blue calls labels while backward edges remain solid. Tests cover forward-only, bidirectional, fan-out overflow, edge styling, and backward-only (default) behavior.
CLI tool resolution and objdump integration
crates/fbuild-cli/src/cli/symbols_cmd.rs, crates/fbuild-cli/src/cli/graph_cmd.rs
ToolPaths struct extended with an optional objdump field. resolve_tool_paths_public now returns a third tuple element (optional objdump path). Tool resolution prefers objdump_path from build_info, otherwise derives it from the selected nm via prefix replacement (similar to c++filt derivation), returning None if the derived tool doesn't exist. fbuild symbols and fbuild bloat graph both resolve and wire objdump_path into AnalyzeConfig. Graph config now explicitly sets direction: Default::default() (backward-only) pending a future direction flag.
Symbol analyzer objdump execution and markdown reporting
crates/fbuild-build/src/symbol_analyzer.rs
AnalyzeConfig gains optional objdump_path. After building the symbol map, run_objdump_and_attribute conditionally executes objdump, parses disassembly, matches callees against both mangled and demangled symbol names, and populates each symbol's references_to. Markdown reporting now includes a "References (calls)" count per top symbol, switches embedded graphs to bidirectional walker configuration, and renders a new emit_dual_callees_subtable showing top callees ranked by size and popularity, with an "(… and N more) callees" row when overflow occurs. All markdown tests updated to construct symbols with references_to field.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 A bunny's ode to call-graph glee,
Forward edges dance in trees so free,
Objdump whispers who calls whom with care,
Bidirectional bonds now floating in air.
From disassembly walks a callee bright,
The symbol graph shines with dual-ranked light!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly and specifically describes the three main changes: per-symbol forward references, bidirectional graph support, and dual-rank sub-table feature.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/bloat-forward-refs

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
crates/fbuild-core/src/symbol_analysis/graph.rs (1)

272-275: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fix the broken intra-doc link.

Line 272 uses [`build`], which rustdoc cannot resolve from this scope, and the docs job is already failing on it. Point it at [`Self::build`] or [`BackrefGraph::build`] instead.

As per coding guidelines, all warnings are denied in CI.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/fbuild-core/src/symbol_analysis/graph.rs` around lines 272 - 275, The
intra-doc link [`build`] in the doc comment for build_with_index is unresolved;
update it to a fully-qualified link such as [`Self::build`] or
[`BackrefGraph::build`] so rustdoc can resolve it (i.e., change the reference in
the doc comment on build_with_index to point to Self::build or
BackrefGraph::build).

Sources: Coding guidelines, Pipeline failures

crates/fbuild-cli/src/cli/graph_cmd.rs (1)

53-69: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Avoid the extra objdump pass until this command can render forward edges.

Lines 53-69 now thread objdump_path into AnalyzeConfig, but parse_graph_config() still forces Direction::Backward, so fbuild bloat graph never reads references_to. On large ELFs this turns a cheap backref query into a full-disassembly scan with no output change. Either keep objdump_path: None on this path for now, or add the direction flag in the same PR so the extra subprocess is actually used.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/fbuild-cli/src/cli/graph_cmd.rs` around lines 53 - 69, The
AnalyzeConfig being constructed threads a real objdump_path (from
resolve_tool_paths_public) into AnalyzeConfig even though parse_graph_config()
forces Direction::Backward and the command doesn't use forward edges, causing an
unnecessary full disassembly; change the objdump_path passed into AnalyzeConfig
here to None (i.e., set objdump_path: None in the AnalyzeConfig initializer) so
fbuild bloat graph does not trigger the extra objdump pass until you either add
a direction flag or otherwise enable forward-edge rendering in
parse_graph_config; keep resolve_tool_paths_public and the other paths as-is and
only alter the objdump_path field for now.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/fbuild-build/src/symbol_analyzer.rs`:
- Around line 367-379: The loop is incorrectly assigning parsed call edges to
any symbol with matching names (sym.mangled/sym.demangled), which copies
function call edges onto data/map-derived rows; fix by only assigning edges when
the target symbol is an actual code/function symbol (e.g., check a field like
sym.kind or sym.is_function()/is_executable() or that its section is the
canonical text/code row) or when the symbol is the canonical nm function row,
and skip assigning to data-only or map-derived rows; update the conditional
inside the loop around sym.references_to = callees.clone() to require that
code/function predicate before cloning, keeping use of parse_disasm/edges and
the mangled/demangled checks unchanged otherwise.

In `@crates/fbuild-cli/src/cli/symbols_cmd.rs`:
- Around line 295-300: The current objdump selection prefers build_info's
objdump_path (bi_objdump) even when the user-specified nm override was used;
change the logic so we first try deriving objdump from the resolved/selected nm
(using derive_sibling_tool(&nm, "objdump")) when the nm came from the CLI
override, and only fall back to bi_objdump when that derivation fails or when
both nm and bi_objdump originate from the same build_info source; update the
assignment to objdump to prefer derive_sibling_tool(&nm, "objdump") for
user-supplied nm, otherwise use bi_objdump as before.

In `@crates/fbuild-core/src/symbol_analysis/callgraph.rs`:
- Around line 99-106: The code is treating any angle-bracket symbol in an
instruction as a call target; update call-edge creation in the loop that uses
current, parse_call_target and is_real_call_target to first check the
instruction mnemonic against a whitelist of real call/jump mnemonics for the
target ISA(s) (e.g., call/bl/blr/jmp/ret variants) and only then parse/accept
the angle-bracket symbol as a callee; also fix the helper that currently returns
the last angle-bracket annotation (used around lines 155-168) to similarly
verify the mnemonic before returning a symbolized target. Add a regression test
that feeds a symbolized non-call instruction (e.g., an ARM/ARM64 "ldr <sym>" or
x86 data-load) and assert that no edge is emitted in references_to.

In `@crates/fbuild-core/src/symbol_analysis/graph.rs`:
- Around line 853-915: In rank_callees_dual the code includes unresolved callees
(sym == None) in resolved and thus they can appear in size_top/pop_top; filter
them out before computing the top-N: split the initial resolved vector into
known (where CalleeRanked.sym.is_some()) and unresolved (sym.is_none()),
sort/compute by_size and by_popularity only from the known set (use
by_size/by_popularity on the filtered known list), build size_top and pop_top
from those sorted known entries, and compute other_count as unresolved.len()
plus the count of known entries not present in top_mangled; update uses of
resolved, by_size, by_popularity, size_top, pop_top, and top_mangled
accordingly.

---

Outside diff comments:
In `@crates/fbuild-cli/src/cli/graph_cmd.rs`:
- Around line 53-69: The AnalyzeConfig being constructed threads a real
objdump_path (from resolve_tool_paths_public) into AnalyzeConfig even though
parse_graph_config() forces Direction::Backward and the command doesn't use
forward edges, causing an unnecessary full disassembly; change the objdump_path
passed into AnalyzeConfig here to None (i.e., set objdump_path: None in the
AnalyzeConfig initializer) so fbuild bloat graph does not trigger the extra
objdump pass until you either add a direction flag or otherwise enable
forward-edge rendering in parse_graph_config; keep resolve_tool_paths_public and
the other paths as-is and only alter the objdump_path field for now.

In `@crates/fbuild-core/src/symbol_analysis/graph.rs`:
- Around line 272-275: The intra-doc link [`build`] in the doc comment for
build_with_index is unresolved; update it to a fully-qualified link such as
[`Self::build`] or [`BackrefGraph::build`] so rustdoc can resolve it (i.e.,
change the reference in the doc comment on build_with_index to point to
Self::build or BackrefGraph::build).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: c1a84201-3825-426f-949e-abd60a01daa0

📥 Commits

Reviewing files that changed from the base of the PR and between c675ae1 and 8a609e7.

📒 Files selected for processing (7)
  • crates/fbuild-build/src/symbol_analyzer.rs
  • crates/fbuild-cli/src/cli/graph_cmd.rs
  • crates/fbuild-cli/src/cli/symbols_cmd.rs
  • crates/fbuild-core/src/symbol_analysis/callgraph.rs
  • crates/fbuild-core/src/symbol_analysis/graph.rs
  • crates/fbuild-core/src/symbol_analysis/mod.rs
  • crates/fbuild-core/src/symbol_analysis/tests.rs

Comment on lines +367 to +379
let edges = parse_disasm(&result.stdout);
let mut total = 0usize;
for sym in &mut map.symbols {
if let Some(callees) = edges.get(&sym.mangled) {
sym.references_to = callees.clone();
total += callees.len();
} else if let Some(callees) = edges.get(&sym.demangled) {
// Some toolchains demangle in-place when emitting the
// disassembly, so the function header uses the demangled
// name. Match against either.
sym.references_to = callees.clone();
total += callees.len();
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't copy function call edges onto map-derived data rows.

This loop matches purely on sym.mangled / sym.demangled, so any synthetic row that shares the owner's name with a real function also inherits that function's references_to. For example, a map-derived .rodata.foo.str1.1 row and the real foo text symbol will both get the same callees, which makes data-only rows render bogus forward graphs and duplicates the per-symbol call counts. Restrict attribution to executable/code symbols (or the canonical nm function row) before assigning references_to.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/fbuild-build/src/symbol_analyzer.rs` around lines 367 - 379, The loop
is incorrectly assigning parsed call edges to any symbol with matching names
(sym.mangled/sym.demangled), which copies function call edges onto
data/map-derived rows; fix by only assigning edges when the target symbol is an
actual code/function symbol (e.g., check a field like sym.kind or
sym.is_function()/is_executable() or that its section is the canonical text/code
row) or when the symbol is the canonical nm function row, and skip assigning to
data-only or map-derived rows; update the conditional inside the loop around
sym.references_to = callees.clone() to require that code/function predicate
before cloning, keeping use of parse_disasm/edges and the mangled/demangled
checks unchanged otherwise.

Comment on lines +295 to +300
// objdump: prefer build_info, else derive from nm using the
// GCC cross-tool prefix (`<prefix>-nm` → `<prefix>-objdump`).
// Same prefix-replacement strategy `derive_cppfilt_path`
// uses; inlined here to keep symbol_analyzer's public surface
// minimal — objdump derivation isn't useful outside this CLI.
let objdump = bi_objdump.or_else(|| derive_sibling_tool(&nm, "objdump"));
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Keep objdump paired with the selected nm.

Line 300 still prefers build_info's objdump_path even when --nm overrides build_info's nm_path. That can silently combine one toolchain's nm with another toolchain's objdump, so forward-edge extraction either fails or disassembles with the wrong ISA. Derive objdump from the resolved nm first whenever the user supplied --nm, and only fall back to build_info.objdump_path when both tools come from the same source.

Based on docs/symbols.md, explicit tool overrides are supposed to win over build-info-discovered paths.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/fbuild-cli/src/cli/symbols_cmd.rs` around lines 295 - 300, The current
objdump selection prefers build_info's objdump_path (bi_objdump) even when the
user-specified nm override was used; change the logic so we first try deriving
objdump from the resolved/selected nm (using derive_sibling_tool(&nm,
"objdump")) when the nm came from the CLI override, and only fall back to
bi_objdump when that derivation fails or when both nm and bi_objdump originate
from the same build_info source; update the assignment to objdump to prefer
derive_sibling_tool(&nm, "objdump") for user-supplied nm, otherwise use
bi_objdump as before.

Comment on lines +99 to +106
// 4. Instruction line: hex offset followed by mnemonic + args.
let Some(caller) = current.as_deref() else {
continue;
};
let Some(callee) = parse_call_target(line) else {
continue;
};
if !is_real_call_target(&callee) || callee == caller {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Only treat real call/jump mnemonics as forward edges.

Lines 99-106 currently accept any instruction line whose tail contains <...>, and Lines 155-168 just return the last angle-bracket annotation without checking the mnemonic. objdump uses the same annotation for symbolized loads/data references too, so this will invent callees from non-call instructions and pollute references_to.

Please gate edge creation behind a mnemonic whitelist for the supported ISAs and add a regression test for a symbolized non-call line (for example an ldr/address-load form) that must not emit an edge.

Also applies to: 155-168

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/fbuild-core/src/symbol_analysis/callgraph.rs` around lines 99 - 106,
The code is treating any angle-bracket symbol in an instruction as a call
target; update call-edge creation in the loop that uses current,
parse_call_target and is_real_call_target to first check the instruction
mnemonic against a whitelist of real call/jump mnemonics for the target ISA(s)
(e.g., call/bl/blr/jmp/ret variants) and only then parse/accept the
angle-bracket symbol as a callee; also fix the helper that currently returns the
last angle-bracket annotation (used around lines 155-168) to similarly verify
the mnemonic before returning a symbolized target. Add a regression test that
feeds a symbolized non-call instruction (e.g., an ARM/ARM64 "ldr <sym>" or x86
data-load) and assert that no edge is emitted in references_to.

Comment on lines +853 to +915
/// Callees that don't resolve to a row in `map` (e.g. weak refs not
/// pulled in) get `size=0` and contribute to neither axis.
#[must_use]
pub fn rank_callees_dual<'a>(
map: &'a FineGrainedSymbolMap,
caller: &super::FineGrainedSymbol,
top_n: usize,
) -> (Vec<CalleeRanked<'a>>, Vec<CalleeRanked<'a>>, usize) {
// Resolve every callee once.
let mut resolved: Vec<CalleeRanked<'a>> = caller
.references_to
.iter()
.map(|m| {
let s = map
.symbols
.iter()
.find(|s| s.mangled == *m || s.demangled == *m);
CalleeRanked {
mangled: m.clone(),
demangled: s.map(|s| s.demangled.clone()).unwrap_or_else(|| m.clone()),
size: s.map(|s| s.size).unwrap_or(0),
callers_count: s.map(|s| s.referenced_by.len()).unwrap_or(0),
sym: s,
}
})
.collect();

// Sort copies by each axis. We clone the small structs because
// they're cheap (4 fields + a `&FineGrainedSymbol`).
let mut by_size = resolved.clone();
by_size.sort_by(|a, b| {
b.size
.cmp(&a.size)
.then_with(|| a.demangled.cmp(&b.demangled))
});
let mut by_popularity = resolved.clone();
by_popularity.sort_by(|a, b| {
b.callers_count
.cmp(&a.callers_count)
.then_with(|| b.size.cmp(&a.size))
.then_with(|| a.demangled.cmp(&b.demangled))
});

let size_top = by_size.iter().take(top_n).cloned().collect::<Vec<_>>();
let pop_top = by_popularity
.iter()
.take(top_n)
.cloned()
.collect::<Vec<_>>();

// "Other" is the unique callees NOT present in either top-N
// bucket.
let top_mangled: BTreeSet<&str> = size_top
.iter()
.chain(pop_top.iter())
.map(|c| c.mangled.as_str())
.collect();
let other_count = resolved
.drain(..)
.filter(|c| !top_mangled.contains(c.mangled.as_str()))
.count();

(size_top, pop_top, other_count)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Exclude unresolved callees from the dual-rank top-N lists.

The doc comment says unresolved callees "contribute to neither axis", but this implementation still keeps them in resolved, sorts them with size=0 / callers_count=0, and can surface them in by_size or by_popularity whenever top_n exceeds the number of resolved symbols. That makes the markdown table advertise unknown 0-byte entries as top callees instead of leaving the slot empty or counting them only in other_count.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/fbuild-core/src/symbol_analysis/graph.rs` around lines 853 - 915, In
rank_callees_dual the code includes unresolved callees (sym == None) in resolved
and thus they can appear in size_top/pop_top; filter them out before computing
the top-N: split the initial resolved vector into known (where
CalleeRanked.sym.is_some()) and unresolved (sym.is_none()), sort/compute by_size
and by_popularity only from the known set (use by_size/by_popularity on the
filtered known list), build size_top and pop_top from those sorted known
entries, and compute other_count as unresolved.len() plus the count of known
entries not present in top_mangled; update uses of resolved, by_size,
by_popularity, size_top, pop_top, and top_mangled accordingly.

@zackees zackees merged commit 00216ac into main Jun 7, 2026
83 of 87 checks passed
zackees added a commit that referenced this pull request Jun 7, 2026
Addresses CodeRabbit review on #471 and the broken-intra-doc-link
build failure on the Documentation CI job:

1. The docs claimed the test pins four rows from the FastLED #2473
   audit but the test only had three. Add the missing
   `PLM_AUDIO_SYNTHESIS_WINDOW` row (2048 B tombstone from
   `libFastLED.a(third_party+.cpp.o)`, per the issue's reproduction
   transcript). Docs and test now match: four tombstones + one live
   row, parser drops all four tombstones.

2. `graph.rs:272` had `[\`build\`]` in a doc comment, which rustdoc's
   strict intra-doc-link resolver (with `RUSTDOCFLAGS=-D warnings`)
   couldn't resolve from inside an impl block. Disambiguate to
   `[\`Self::build\`]`.

LOC-gate failures (graph.rs 1744 / symbol_analyzer.rs 1609) on the
same run are pre-existing from #470 and are tracked at #473
(file-split follow-up); folding that into this PR would balloon the
diff. The expected LPC845 by-design failure is unchanged (per #456
3.7).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
zackees added a commit that referenced this pull request Jun 7, 2026
Addresses CodeRabbit review on #471 and the broken-intra-doc-link
build failure on the Documentation CI job:

1. The docs claimed the test pins four rows from the FastLED #2473
   audit but the test only had three. Add the missing
   `PLM_AUDIO_SYNTHESIS_WINDOW` row (2048 B tombstone from
   `libFastLED.a(third_party+.cpp.o)`, per the issue's reproduction
   transcript). Docs and test now match: four tombstones + one live
   row, parser drops all four tombstones.

2. `graph.rs:272` had `[\`build\`]` in a doc comment, which rustdoc's
   strict intra-doc-link resolver (with `RUSTDOCFLAGS=-D warnings`)
   couldn't resolve from inside an impl block. Disambiguate to
   `[\`Self::build\`]`.

LOC-gate failures (graph.rs 1744 / symbol_analyzer.rs 1609) on the
same run are pre-existing from #470 and are tracked at #473
(file-split follow-up); folding that into this PR would balloon the
diff. The expected LPC845 by-design failure is unchanged (per #456
3.7).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

1 participant