
DavidClawson/ripcord


ripcord

Opaque firmware binary in, queryable fact database out — one command.

ripcord is a research pipeline for reverse engineering embedded firmware. It takes a binary with no symbols, no source, and an undocumented hardware peripheral, and expands it into a structured warehouse of facts — functions, call graph, MMIO access patterns, decompiled C, behavioral traces — that deterministic analyzers, formal methods, and LLM agents can all query without ever re-reading the raw bytes.

Pull the ripcord on a parachute and a carefully packed structure tumbles out and inflates into something functional. Same operation, applied to firmware.

The driving target is the FNIRSI 2C53T oscilloscope (AT32F403A MCU + an opaque Gowin FPGA). The hardest, most valuable part of that firmware is the FPGA acquisition path — timing-critical code talking to a chip with no public documentation and no source. The only way to know what the FPGA does is to watch the MCU talk to it. ripcord is built to capture that conversation and turn it into an execution-verified protocol spec.


The idea in one picture

   firmware.bin
        │
        ▼
┌───────────────────┐   deterministic, runs in minutes, no human judgment
│  IDENTIFY         │   ISA · load address · chip family  (scripts/identify.py)
├───────────────────┤
│  EXTRACT (Ghidra) │   functions · calls · blocks · xrefs · strings
│                   │   pcode · decompiled C            (PyGhidra headless)
├───────────────────┤
│  RECOVER          │   vector tables, func-ptr dispatch, veneers, registrars
│                   │   → closes the call-graph reachability gap
├───────────────────┤
│  CLASSIFY         │   SVD-resolved peripheral register access · fingerprint
│                   │   match library code across compilers
├───────────────────┤
│  TRACE (Renode)   │   boot the binary, capture MMIO transcript = ground truth
└─────────┬─────────┘
          ▼
   ┌─────────────────────────────────────────┐
   │   THE WAREHOUSE                          │   per-target Parquet tables,
   │   build/<target>/tables/*.parquet        │   queried with DuckDB.
   │   (no database file — Parquet is truth)  │   THIS is the artifact.
   └─────────┬───────────────────────────────┘
             │
     ┌───────┴────────┬─────────────────┬──────────────────┐
     ▼                ▼                 ▼                  ▼
  scripts/query   LLM agent swarm   Unicorn / Renode    Claude Code
  (SQL / DuckDB)  (bulk labeling)   (VERIFY by          (skills + CLI:
                                     execution)          drives it all)

Two principles do the heavy lifting:

  1. Execution is the verification oracle — not the compiler. A claim about what a function does is confirmed by running it (Unicorn) or tracing it (Renode) and diffing register/memory/MMIO deltas against the original. Compilers catch type errors; execution catches logic errors. No claim becomes canonical until execution backs it. This is the part most RE tooling skips (see Related work).

  2. The database is the artifact — not clean source code. The deliverable is a queryable set of facts about the binary. Rendered C, if it exists at all, is a late-stage view over the database, never the goal. (Why: notes/goal-and-approach.md.)
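Principle 1 reduces to a comparison step once both runs have been captured. The sketch below is illustrative and is not ripcord's implementation. It assumes each run, such as the original function versus a candidate re-implementation under Unicorn, has already been reduced to a hypothetical `Snapshot` holding final registers, written memory, and the ordered MMIO events. It then reports every delta that disagrees.

```python
from dataclasses import dataclass, field


@dataclass
class Snapshot:
    """Hypothetical end-of-run state captured from an emulator or trace."""
    registers: dict[str, int]
    memory: dict[int, int]  # address -> byte value, for every byte written
    mmio: list[tuple[str, int, int]] = field(default_factory=list)  # (op, addr, value)


def diff_snapshots(reference: Snapshot, candidate: Snapshot) -> list[str]:
    """Return human-readable mismatches; an empty list means the runs agree."""
    problems = []
    for reg in sorted(reference.registers.keys() | candidate.registers.keys()):
        a, b = reference.registers.get(reg), candidate.registers.get(reg)
        if a != b:
            problems.append(f"register {reg}: {a} != {b}")
    for addr in sorted(reference.memory.keys() | candidate.memory.keys()):
        a, b = reference.memory.get(addr), candidate.memory.get(addr)
        if a != b:
            problems.append(f"memory 0x{addr:08x}: {a} != {b}")
    # Order matters for MMIO: the same writes in a different order are a different protocol.
    if reference.mmio != candidate.mmio:
        problems.append(f"mmio transcript differs: {reference.mmio} != {candidate.mmio}")
    return problems
```

A claim about the function would only be promoted when `diff_snapshots` returns an empty list across a set of inputs. A type-correct candidate that writes the wrong byte to a peripheral still fails here, which is the logic-error class a compiler cannot catch.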

LLM budget is spent only on the residue deterministic tools can't resolve. Everything mechanical — Ghidra extraction, library identification, call recovery, trace capture — runs unattended in minutes.


Quick start

# Identify ISA / load address / chip before committing to a full run
scripts/identify.py firmware.bin

# One command: identify → extract → ingest → recover calls → classify → summarize
scripts/ripcord.py firmware.elf                                   # ELF: flags inferred
scripts/ripcord.py firmware.bin --chip AT32F403A --base-addr 0x08004000  # raw binary

# Ask questions over the warehouse + decompiled C + an LLM
scripts/analyze --target stock_v120 "what writes to USART2_DR?"

# Full bottom-up comprehension: smoke-test every function, name them,
# decompose monsters, synthesize subsystem → architecture narratives
uv run python scripts/agents/deep_analysis.py --target stock_v120

# Render a self-contained HTML report
scripts/render/report.py stock_v120

# (optional) Expose the warehouse over MCP for a client without shell access.
# The primary path is Claude Code running the tools + skills above directly.
uv run python scripts/mcp_server.py --build-dir ./build

See SETUP.md for toolchain prerequisites (Ghidra 11.2+ with PyGhidra, Python 3.11+, uv, Snakemake, DuckDB; optionally Renode and a cross-toolchain to build the test corpus).
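For a raw `.bin`, the `--base-addr 0x08004000` flag in Quick start matters because absolute pointers inside the image only make sense once the load address is known. The following is a common heuristic for Cortex-M images, shown as an illustrative sketch and not necessarily what `scripts/identify.py` does. It assumes the image begins with the vector table. Word 0 is the initial stack pointer, which should point into the SRAM region. Later entries are handler addresses with the Thumb bit set. Every handler must fall inside `[base, base + len(image))`, which bounds the possible base addresses. The 4 KiB alignment is an assumption.

```python
import struct


def vector_table_hints(image: bytes, n_vectors: int = 16, align: int = 0x1000) -> dict:
    """Heuristic load-address hints from a Cortex-M vector table at offset 0."""
    words = struct.unpack_from(f"<{n_vectors}I", image, 0)
    initial_sp, reset = words[0], words[1]
    # Handlers carry bit 0 set (Thumb); zero entries are unused or reserved slots.
    thumb_targets = [w & ~1 for w in words[1:] if w & 1]
    candidates: list[int] = []
    if thumb_targets:
        lo = max(thumb_targets) - len(image) + 1   # base can't be lower than this
        hi = min(thumb_targets)                    # or higher than this
        first = max(0, -(-lo // align) * align)    # round lo up to the alignment
        candidates = list(range(first, hi + 1, align))
    return {
        "initial_sp": initial_sp,
        "sp_in_sram": 0x20000000 <= initial_sp < 0x40000000,  # ARMv7-M SRAM region
        "reset": reset,
        "reset_is_thumb": bool(reset & 1),
        "base_candidates": candidates,
    }
```

When the window admits several aligned bases, the heuristic only narrows the search. A fuller tool would score each candidate, for example by how many in-image pointers it makes valid.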


Why the harness is the point

Most "LLM + Ghidra" tools feed a single decompiled function to a model and ask "what does this do?" — a fragment with no surrounding context. That starves the model exactly where embedded RE is hardest.

ripcord inverts it. The deterministic pipeline builds a rich, queryable context first. Then Claude Code drives, running the CLI tools and skills (.claude/skills/) directly to pull precisely the tables, decompiled bodies, peripheral maps, and execution traces it needs. It works iteratively, reasoning about the binary as a whole and building new tools mid-task when a target demands them.

Reusable procedures harden into skills (firmware-bringup, execution-verify). Execution-verified conclusions land in the contract ledger (scripts/contracts/ledger.py), which is the durable product.

The single-shot API paths (scripts/analyze, the agent swarm) stay for cheap, scoped, measurable sub-tasks, such as fingerprint matching and bulk function labeling, where a fragment genuinely is enough. An MCP server remains as optional interop for a client that can't run the shell. It isn't the primary surface, because ripcord's data is local Parquet the driver already reads directly. Comprehension lives in the harness, not the access protocol.


What's in the warehouse

A snakemake --cores 4 --resources ghidra=1 run produces typed Parquet tables per target under build/<target>/tables/. Agent and validation stages add more. Highlights:

| table | grain |
|---|---|
| functions | one row per Ghidra-discovered function (incl. body_hash) |
| calls | call sites |
| xrefs | non-call references (reads, writes, jumps, data) |
| basic_blocks | one row per CodeBlock, with containing function |
| strings | defined strings in loaded memory |
| decompiled | Ghidra decompiled pseudo-C, one row per function |
| pcode_features | per-function P-Code opcode histogram + sequence hash |
| recovered_calls | recovered indirect call edges (vector table, func ptr, …) |
| peripheral_xrefs | SVD-resolved peripheral register accesses |
| mmio_events | MemoryIORead/Write events from a Renode trace, joinable by PC |
| unicorn_smoke | per-function executability (catches code-vs-data misdecode) |
| ground_truth_functions | nm -S symbols, the regression signal |

All tables are auto-discovered as DuckDB views by scripts/query. The notes/queries/ directory holds committed SQL that doubles as executable documentation and regression tests.
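"Joinable by PC" means an MMIO event can be attributed to the function whose address range contains the program counter that issued it. The sketch below shows that range join in SQL. It uses Python's built-in `sqlite3` only as a self-contained stand-in for DuckDB. The column names (`name`, `entry`, `size`, `pc`, `op`, `addr`, `value`) and rows are assumptions for illustration, not ripcord's actual schema.

```python
import sqlite3

# Hypothetical, simplified schemas; the real Parquet tables carry more columns.
con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE functions (name TEXT, entry INTEGER, size INTEGER);
    CREATE TABLE mmio_events (pc INTEGER, op TEXT, addr INTEGER, value INTEGER);
    """
)
con.executemany(
    "INSERT INTO functions VALUES (?, ?, ?)",
    [("FUN_08004100", 0x08004100, 0x40), ("FUN_08004200", 0x08004200, 0x80)],
)
con.executemany(
    "INSERT INTO mmio_events VALUES (?, ?, ?, ?)",
    [
        (0x08004112, "write", 0x40004404, 0x41),
        (0x08004230, "read", 0x40004400, 0x80),
        (0x08009000, "write", 0x40013000, 0x01),  # PC outside any known function
    ],
)

# Range join: attribute each event to the function whose [entry, entry + size) holds its PC.
# LEFT JOIN keeps unattributed events, which are themselves a coverage signal.
rows = con.execute(
    """
    SELECT e.pc, f.name, e.op, e.addr
    FROM mmio_events AS e
    LEFT JOIN functions AS f
      ON e.pc >= f.entry AND e.pc < f.entry + f.size
    ORDER BY e.pc
    """
).fetchall()
for pc, name, op, addr in rows:
    print(f"0x{pc:08x} {name or '<unknown>'} {op} 0x{addr:08x}")
```

In DuckDB the same SELECT could run directly over the Parquet files through its `read_parquet` table function. Joining through `basic_blocks` instead of a contiguous `[entry, entry + size)` range would also handle functions whose bodies are split.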


Current state (2026-05)

Phase 0 complete; Phase 1 library-ID validated end-to-end including blind recovery on a stripped binary; Phase 3 agent swarm validated end-to-end. Renode trace capture and Datalog (Souffle) derivations are wired into the Snakemake DAG. Deep hierarchical analysis, context enrichment, and Unicorn execution-validation are built on top.

Fifteen targets across four build ecosystems live in the warehouse: 5 Raspberry Pi Pico (Cortex-M0+), 2 Zephyr (Cortex-M3), 1 stripped blind-recovery target, 3 AT32F403A reference builds (GCC + LLVM, the cross-compiler corpus), and 4 stock FNIRSI 2C53T firmware versions (V1.0.3–V1.2.0), which are the primary target and its own differential ground truth.

A few empirical results that fell out (full list and provenance in CLAUDE.md → "Key empirical findings"):

  • Blind recovery on a stripped binary: 86.6% recall, 94.9% precision — 171/197 functions re-identified with zero symbols.
  • Computed-call recovery closes the reachability gap from 70% unreachable to 12% via five recovery mechanisms at ~95% blended precision.
  • Constant-based fingerprinting: 100% precision, cross-compiler.
  • Execution catches what static analysis can't — the Unicorn smoke test flags Ghidra decoding data as code, the #1 failure mode for raw imports.
  • The FNIRSI V1.0.3→V1.0.7 transition was a full architectural rewrite of the FPGA acquisition path (USART2-only → DMA/SPI3), confirmed by byte-identical FreeRTOS port code against a GCC reference build.
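The "reachability gap" in the computed-call result is the share of functions that cannot be reached from any entry point by following known call edges. A minimal sketch of that measurement follows. It assumes a plain edge list rather than ripcord's tables, and the toy function names are hypothetical. It shows why recovered indirect edges matter: a handler reached only through the vector table or a function-pointer table looks dead until that edge is added. In Datalog the same closure is two rules, `reachable(f) :- entry(f).` and `reachable(g) :- reachable(f), call(f, g).`; here it is computed iteratively.

```python
from collections import deque


def reachable(entries: set[str], edges: list[tuple[str, str]]) -> set[str]:
    """Breadth-first closure over caller -> callee edges."""
    succ: dict[str, list[str]] = {}
    for caller, callee in edges:
        succ.setdefault(caller, []).append(callee)
    seen = set(entries)
    queue = deque(entries)
    while queue:
        fn = queue.popleft()
        for nxt in succ.get(fn, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def unreachable_fraction(functions: set[str], entries: set[str],
                         edges: list[tuple[str, str]]) -> float:
    return 1 - len(reachable(entries, edges) & functions) / len(functions)


# Toy program: the timer ISR is referenced only from the vector table, and main
# dispatches through a function-pointer table that direct-call analysis misses.
functions = {"Reset_Handler", "main", "TIM2_IRQHandler", "cmd_start", "cmd_stop"}
direct_calls = [("Reset_Handler", "main")]
recovered = [
    ("<vector_table>", "TIM2_IRQHandler"),  # vector-table edge
    ("main", "cmd_start"),                  # function-pointer dispatch edges
    ("main", "cmd_stop"),
]
entries = {"Reset_Handler", "<vector_table>"}  # pseudo-node stands in for the hardware

before = unreachable_fraction(functions, entries, direct_calls)
after = unreachable_fraction(functions, entries, direct_calls + recovered)
print(f"unreachable: {before:.0%} -> {after:.0%}")  # prints: unreachable: 60% -> 0%
```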

Related work

ripcord's individual ingredients all exist in the wild; the combination — a structured fact warehouse plus an execution-as-verification oracle plus a skills-driven Claude Code harness with a provenance-tracked contract ledger, pointed at comprehending an opaque binary — is the part I haven't found assembled elsewhere. Honest positioning:

  • LLM + disassembler tools (Gepetto, G-3PO, aiDAPal, DeGPT) mostly send a decompiled snippet to a model and write back a rename/comment. ripcord builds queryable context first, so the model never reasons from a context-free fragment.
  • Persistent structured state is no longer a differentiator. GhidrAssist (open source, a SQLite+graph knowledge DB with a 5-level hierarchy) and Binary Ninja Sidekick (commercial, with provenance and a background validation agent) both build it. Where ripcord differs is that their validation is static (re-analysis and cross-reference queries), whereas ripcord gates every canonical claim on execution.
  • MCP-over-a-disassembler is table stakes — and not where ripcord's value is. GhidraMCP (9k+ stars) and IDA Pro MCP are mature; they expose live tool calls over a protocol. ripcord keeps an MCP surface only as optional interop — the driver (Claude Code) reads the local warehouse directly via the CLI, so the access protocol is incidental. What's behind the surface — a warehouse of execution-verified facts and the skills that produce it — is the interesting part.
  • Binary-analysis-as-a-database predates ripcord: ddisasm/GTIRB (which shares ripcord's Souffle/Datalog layer) and CodeQL both take that approach. ripcord uses the technique; it didn't invent it.
  • Firmware rehosting (PRETENDER, P2IM, DICE, Fuzzware) already infers MMIO peripheral models from traces — but the deliverable is "enough model to fuzz," not a legible, falsifiable MCU↔peripheral protocol spec. Same input class, different output. ripcord aims at the legible boundary contract those tools leave on the table.
  • Matched-source decomp (decomp.me, the N64/PSX projects) verifies by byte-identical recompilation — a stricter oracle than ripcord's behavioral execution diff, but aimed at perfect source recovery, which ripcord explicitly is not trying to produce.
  • Closest precedent to the core thesis: Patrick Hulin's SimTower reimplementation put an LLM in a closed loop against a Unicorn emulator as ground truth — the same execution-as-oracle idea, as a one-off project rather than a general pipeline.

Where to go deeper


Scope, honesty, and the FPGA caveat

ripcord is deliberately generic. The oscilloscope firmware is the proving ground, not a license to hard-code 2C53T specifics into the core pipeline: target knowledge lives in notes/ and in queries, never in the extractors.

The FPGA timing code has no external ground truth. ripcord tags every claim with a provenance level and never presents inferred FPGA behavior as established fact. Two examples of that discipline: an internal dispatch/selector code is not treated as a wire-level hardware transaction, and a value the firmware wrote counts as observed, while a reply invented by a stub stays unverified until a hardware trace confirms it. That discipline is the whole reason the execution oracle exists.
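The provenance rule can be made mechanical: each claim carries a level, new evidence can only raise it, and only execution-backed levels count as canonical. The sketch below is hypothetical. The level names, the `Claim` record, and the `tag_trace_value`/`upgrade` helpers are illustrative assumptions, not the schema of `scripts/contracts/ledger.py`. It encodes the two cases from the paragraph above. A firmware-issued value seen in a trace is observed. A stub-invented reply stays unverified, and re-running the emulator cannot lift it because the emulator replays the same stub. Only hardware evidence can.

```python
from dataclasses import dataclass, replace
from enum import IntEnum


class Provenance(IntEnum):
    """Hypothetical evidence levels, weakest first."""
    INFERRED = 0            # static analysis or LLM reasoning only
    UNVERIFIED = 1          # e.g. a reply invented by a peripheral stub
    OBSERVED = 2            # a value the firmware itself wrote, seen in an execution trace
    EXECUTION_VERIFIED = 3  # behavior reproduced and diffed under emulation
    HARDWARE_CONFIRMED = 4  # matched against a trace captured on real hardware


@dataclass(frozen=True)
class Claim:
    subject: str
    statement: str
    provenance: Provenance


def tag_trace_value(subject: str, statement: str, origin: str) -> Claim:
    """Tag a value from an emulator trace by who produced it."""
    levels = {"firmware_write": Provenance.OBSERVED, "stub_reply": Provenance.UNVERIFIED}
    if origin not in levels:
        raise ValueError(f"unknown origin: {origin!r}")
    return Claim(subject, statement, levels[origin])


def upgrade(claim: Claim, evidence: Provenance) -> Claim:
    """Raise a claim's level, never lower it. Stub-invented values need hardware evidence."""
    if claim.provenance == Provenance.UNVERIFIED and evidence < Provenance.HARDWARE_CONFIRMED:
        return claim
    return replace(claim, provenance=max(claim.provenance, evidence))


def is_canonical(claim: Claim) -> bool:
    # Nothing below execution-backed evidence is presented as established fact.
    return claim.provenance >= Provenance.OBSERVED
```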


License

MIT. Firmware binaries analyzed by the pipeline are not included in this repository; their licensing belongs to their original authors. The test corpus is built from open SDKs (Pico SDK, Zephyr, the AT32 SDK) or supplied by the user.

About

Firmware reverse-engineering pipeline: opaque binary in, queryable fact warehouse out. Ghidra + DuckDB + execution-verified analysis, driven by Claude Code skills. Primary target: FNIRSI 2C53T oscilloscope (AT32F403A + opaque Gowin FPGA).
