A declarative, reproducible generator of Java garbage-collection logs.
From a typed YAML scenario describing a (JVM, GC algorithm, application regime) tuple, GC-Forge runs a parameterised
workload on a real Java Virtual Machine, captures the unified
-Xlog:gc* output, and emits a hash-anchored manifest sufficient
to reproduce the run on another host.
GC-Forge is the counterpart of GC-Insight: where Insight
analyses GC logs, Forge produces them on demand with documented
ground truth. The contract surface between the two projects is
documented in
doc/concepts/traceability.md.
GC-Forge is not a benchmarking tool. It does not measure application throughput or latency; it exhibits the GC behaviour of a workload, and only that. Fidelity is established by construction (real Temurin JVM, parameterised harness), not by a statistical model.
Version 0.2.1 — extended MVP release.
| Item | Coverage |
|---|---|
| Application regimes | 7 (steady-state, burst, humongous, slow-leak, cache-churn, mixed-GC pathological, microservice stop-and-go) |
| GC collectors | 6 (G1, ZGC generational and non-generational, Parallel, Shenandoah, Serial, Epsilon) |
| JVM distributions | Eclipse Temurin 17 and 21 |
| Shipped presets | 21 (embedded in the binary) |
| CLI subcommands | lint, run, validate, batch, presets, selftest, variance-check |
| Wire formats | gc-forge/scenario.v1, gc-forge/run-manifest.v1, gc-forge/matrix.v1 (JSON Schema draft 2020-12) |
| Runner | Docker (native runner planned for V1) |
| Licence | MIT |
A summary of the rationale behind these choices is given in
doc/concepts/overview.md.
A single gc-forge run invocation produces two files: the raw
GC log (verbatim Temurin output) and a manifest that documents
how it was produced.
gc-forge run presets/steady-shenandoah-baseline.yaml \
--image gc-forge-runner:dev-jdk21 \
--embedded-harness /opt/gc-forge/harness.jar

The log, a verbatim -Xlog:gc* capture, byte-identical to
what the same JVM would write in production:
[2026-04-27T06:59:21.901+0000][1][7][info][gc] Using Shenandoah
[2026-04-27T06:59:21.901+0000][1][7][info][gc,init] Heap Min Capacity: 2G
[2026-04-27T06:59:21.901+0000][1][7][info][gc,init] Heap Initial Capacity: 2G
[2026-04-27T06:59:21.901+0000][1][7][info][gc,init] Heap Max Capacity: 2G
[2026-04-27T06:59:23.184+0000][1][9][info][gc ] GC(0) Concurrent reset
[2026-04-27T06:59:23.187+0000][1][9][info][gc ] GC(0) Pause Init Mark 0.512ms
[2026-04-27T06:59:23.193+0000][1][9][info][gc ] GC(0) Concurrent marking
[2026-04-27T06:59:23.231+0000][1][9][info][gc ] GC(0) Pause Final Mark 0.398ms
…
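For readers who want to consume such a log programmatically, the following is a minimal Python sketch that extracts pause durations from lines shaped like the sample above. It is not part of GC-Forge or GC-Insight. The `Pause` type and `parse_pause` function are hypothetical names, and reading the second and third bracketed fields as process id and thread id is an assumption based on the sample.

```python
import re
from typing import NamedTuple, Optional

# Decorations as they appear in the sample above:
# [time][pid][tid][level][tags] message
# (pid/tid is an assumed reading of the two numeric fields).
LINE_RE = re.compile(
    r"^\[(?P<time>[^\]]+)\]\[(?P<pid>\d+)\]\[(?P<tid>\d+)\]"
    r"\[(?P<level>\w+)\s*\]\[(?P<tags>[^\]]+)\]\s*(?P<message>.*)$"
)

# A pause message ends with a duration in milliseconds,
# for example "GC(0) Pause Init Mark 0.512ms".
PAUSE_RE = re.compile(
    r"^GC\((?P<gc_id>\d+)\) (?P<phase>Pause .+?) (?P<ms>\d+(?:\.\d+)?)ms$"
)


class Pause(NamedTuple):
    gc_id: int
    phase: str
    millis: float


def parse_pause(line: str) -> Optional[Pause]:
    """Return the pause described by a log line, or None if it is not a pause."""
    outer = LINE_RE.match(line.strip())
    if outer is None:
        return None
    inner = PAUSE_RE.match(outer["message"].strip())
    if inner is None:
        return None
    return Pause(int(inner["gc_id"]), inner["phase"], float(inner["ms"]))
```

Real logs contain many more message shapes than the excerpt shows, so a production parser would need a broader grammar; this sketch only covers the `Pause … <n>ms` form.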
The manifest — a YAML identity card sufficient to reproduce the run elsewhere:
apiVersion: gc-forge/run-manifest.v1
kind: RunManifest
run:
id: 019dc617-5cd3-7272-9ddc-48b2c9b96e31
started_at: 2026-04-27T06:59:21.842Z
ended_at: 2026-04-27T07:00:51.108Z
duration_actual: 90s
exit_status: { kind: success }
host: { os: linux, arch: aarch64, cpu_count: 16, container: docker:gc-forge-runner:dev-jdk21 }
scenario:
source_path: presets/steady-shenandoah-baseline.yaml
source_sha256: bf39b4e9272cb0bde6e390b28e472f4270d3071b7c3c007b94d133150e6f4476
resolved: { … fully merged scenario after extends/overrides … }
jvm:
vendor: temurin
version: 21.0.8+9
flags: [-Xms2g, -Xmx2g, -XX:+UseShenandoahGC, …]
reproducibility:
seed: 0xC0FFEE
workload_jar_sha256: 6e9c3cdf4f9c…
gc_forge_version: 0.2.1+93f707e
output:
log_path: out/steady-shenandoah-baseline-c0ffee.log
log_sha256: a2dca41f7c5e…
log_size_bytes: 220487
expected_phenomena: [young_gc_steady]
expected_invariants: [{ rule: "full_count == 0", threshold: 0 }, …]
validation:
status: passed
results: [ … per-rule report … ]

Every flag, hash, and host attribute is recorded; two operators running the same preset with the same Docker image obtain manifests that differ only by run UUID and wall-clock timestamps.
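Because the manifest records `log_sha256` and `log_size_bytes`, a log can also be checked against its manifest outside GC-Forge. The sketch below is an independent illustration in Python, not a description of what `gc-forge validate` does internally. The function names are hypothetical, and the expected values are passed in by the caller rather than parsed from the YAML.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large logs need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def log_matches_manifest(log_path: Path, expected_sha256: str, expected_size: int) -> bool:
    """Compare a log file with the size and hash recorded in its manifest."""
    # The size check is cheap and catches truncated copies before hashing.
    if log_path.stat().st_size != expected_size:
        return False
    return sha256_of_file(log_path) == expected_sha256.lower()
```

A caller would read `output.log_sha256` and `output.log_size_bytes` from the manifest with a YAML parser of their choice and pass them in unchanged.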
# Build the CLI and the runner image once.
make build
make docker-image
# Lint, then run a shipped preset.
gc-forge lint presets/steady-g1-baseline.yaml
gc-forge run presets/steady-g1-baseline.yaml \
--image gc-forge-runner:dev-jdk21 \
--embedded-harness /opt/gc-forge/harness.jar
# Re-check the produced log against its manifest.
gc-forge validate out/steady-g1-baseline-c0ffee.log \
--manifest out/steady-g1-baseline-c0ffee.manifest.yaml

A guided walk-through of the same path, with troubleshooting
notes, is in
doc/user/getting-started.md.
GC-Forge is designed for four audiences.
| You are… | You get… |
|---|---|
| Building a GC-log analyser | A reproducible corpus with machine-checkable ground truth replacing hand-curated fixtures. |
| Demonstrating GC behaviour to a customer | A 21-preset catalogue covering every regime that comes up in a sales conversation. |
| Writing an article or training material | A cited, reproducible source for every log embedded in your content. |
| Building an ML dataset of labelled traces | The gc-forge batch matrix runner produces logs in bulk, each labelled by the regime that produced it. |
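To make the bulk-generation case concrete: a sweep over collectors and seeds is a cartesian product, one cell per combination. The Python sketch below illustrates that expansion only. It is not GC-Forge's implementation, and `expand_matrix` and the `seed` key are hypothetical names chosen for the example.

```python
from itertools import product
from typing import Any, Dict, Iterator, List


def expand_matrix(axes: Dict[str, List[Any]], seeds: List[int]) -> Iterator[Dict[str, Any]]:
    """Yield one cell per combination of axis values and seed."""
    keys = list(axes)
    for values in product(*(axes[key] for key in keys)):
        for seed in seeds:
            cell = dict(zip(keys, values))
            cell["seed"] = seed  # hypothetical key name for the per-cell seed
            yield cell


cells = list(expand_matrix(
    {"spec.gc.algorithm": ["G1", "ZGC", "Parallel", "Shenandoah"]},
    seeds=[1, 2, 3],
))
```

With four collectors and three seeds, the expansion yields twelve cells, each of which would correspond to one run and one labelled log.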
A few commands worth knowing, beyond the quickstart:
Run all 21 presets and check their invariants (full pass under three minutes by default):
gc-forge selftest \
--image gc-forge-runner:dev-jdk21 \
--embedded-harness /opt/gc-forge/harness.jar

Sweep a preset across collectors and seeds (cartesian product, with a CSV index of every cell):
# matrix.yaml
apiVersion: gc-forge/matrix.v1
kind: Matrix
spec:
base: presets/steady-g1-baseline.yaml
axes:
spec.gc.algorithm: [G1, ZGC, Parallel, Shenandoah]
seeds: [1, 2, 3]

gc-forge batch matrix.yaml --out-dir out/sweep \
--image gc-forge-runner:dev-jdk21 \
--embedded-harness /opt/gc-forge/harness.jar

Quantify inter-run variance (exits non-zero if budgets are exceeded):
gc-forge variance-check steady-g1-baseline --runs 5 \
--image gc-forge-runner:dev-jdk21 \
--embedded-harness /opt/gc-forge/harness.jar

Tweak a preset on the fly without copying the YAML:
gc-forge run presets/humongous-g1-classic.yaml \
--override 'spec.gc.options.heap.max=4g' \
--override 'spec.regime.parameters.humongous_ratio=0.7' \
--image gc-forge-runner:dev-jdk21 \
--embedded-harness /opt/gc-forge/harness.jar

For 0.2.1 the supported install path is from source. Prebuilt Linux/macOS binaries (x86_64 and aarch64) and a Homebrew tap are scheduled for a later release.
git clone https://github.com/be-next/GC-Forge.git
cd GC-Forge
make build # cargo build --release + Maven shade of the harness
make docker-image # eclipse-temurin:21-jdk-jammy + harness embedded

make build produces target/release/gc-forge. Add it to your
$PATH or invoke it directly. To target Temurin 17 instead of
21, build the alternate image with make docker-image JDK_MAJOR=17 and pass --image gc-forge-runner:dev-jdk17.
Prerequisites: a POSIX shell, Docker Engine or Docker Desktop, Rust 1.94+, Maven 3.9+, JDK 17+.
The documentation is organised by audience.
| Document | Purpose |
|---|---|
| getting-started.md | Install, run a first preset, read the manifest, troubleshoot. |
| regimes.md | Seven regimes, parameters, expected signatures, shipped presets. |
| scenario-reference.md | YAML schema, extends, --override syntax. |
| cli-reference.md | Every subcommand, flags, defaults, exit codes. |
| Document | Purpose |
|---|---|
| doc/architecture.md | Single-page system architecture. |
| doc/concepts/overview.md | Design goals and non-goals. |
| doc/concepts/traceability.md | Phenomenon × preset × analyser-detector matrix. |
| doc/process/orchestration.md | Development process, role rotation, DoD gate. |
| CONTRIBUTING.md | Contribution conventions. |
| CHANGELOG.md | Release history. |
These framing documents capture the structural decisions and remain authoritative for future evolution. Day-to-day use of GC-Forge does not require reading them.
- doc/specs/SPEC-FUNCTIONAL.md: functional specification.
- doc/specs/SPEC-TECHNICAL.md: technical specification.
- doc/specs/ROADMAP.md: phased delivery plan.
- doc/specs/BACKLOG.md: epics and user stories.
- doc/specs/RISKS.md: risk register and open points.
GC-Forge consists of a Rust workspace that compiles to a single
gc-forge binary, and a Java workload harness packaged as a
shaded JAR. The two artefacts communicate exclusively through
the JVM command line: GC-Forge produces an argv for java,
the JVM emits a unified -Xlog:gc* log to a file, and GC-Forge
reads the file back. There is no embedded protocol.
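The argv-only contract described above can be pictured with a short Python sketch. It is an illustration under stated assumptions, not the Rust runner's actual code: `build_java_argv` and `harness_args` are hypothetical, and the exact `-Xlog` decorations GC-Forge requests are not shown in this README, so only the basic `-Xlog:gc*:file=<path>` form is used.

```python
from typing import List


def build_java_argv(
    jvm_flags: List[str],
    log_path: str,
    harness_jar: str,
    harness_args: List[str],
) -> List[str]:
    """Assemble the full command line; nothing else is passed to the JVM."""
    return [
        "java",
        *jvm_flags,
        f"-Xlog:gc*:file={log_path}",  # unified GC logging written to a file
        "-jar",
        harness_jar,
        *harness_args,  # hypothetical: workload parameters for the harness
    ]


argv = build_java_argv(
    ["-Xms2g", "-Xmx2g", "-XX:+UseShenandoahGC"],
    "out/run.log",
    "/opt/gc-forge/harness.jar",
    [],
)
```

The resulting list could be handed to `subprocess.run(argv, check=True)`; once the process exits, the log file at `log_path` is the only output read back.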
The Rust workspace is split into six crates with an acyclic dependency graph:
scenario ← regimes ← runner ← validate ← presets ← cli
A full description, including the runner subsystem and the wire
formats, is given in doc/architecture.md.
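As a small illustration of what "acyclic" buys, the crate chain above can be fed to a topological sort, which yields a valid build order and raises an error if a cycle is introduced. The Python sketch below reads `A ← B` as "B depends on A" and treats the chain as the complete edge set; both are assumptions, since the real workspace may declare additional edges.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set

# crate -> crates it depends on, read from the chain above
# (assumed to be the full edge set; the real workspace may have more edges).
DEPENDENCIES: Dict[str, Set[str]] = {
    "scenario": set(),
    "regimes": {"scenario"},
    "runner": {"regimes"},
    "validate": {"runner"},
    "presets": {"validate"},
    "cli": {"presets"},
}


def build_order(dependencies: Dict[str, Set[str]]) -> List[str]:
    """Return crates in an order where every dependency precedes its dependants."""
    # static_order() raises graphlib.CycleError if the graph contains a cycle.
    return list(TopologicalSorter(dependencies).static_order())
```

Adding a back-edge such as `scenario` depending on `cli` would make `build_order` raise `CycleError`, which is the property the acyclic layout rules out.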
If GC-Forge contributes to a published work, please cite it as:
Ramette, J. (2026). GC-Forge: a declarative generator of Java garbage-collection logs (Version 0.2.1). MIT licence. Available at https://github.com/be-next/GC-Forge.
A versioned BibTeX entry will be added once a DOI is assigned.
Contributions are welcome under the project's MIT licence; see
CONTRIBUTING.md for the conventions used
by the project, including the documentation language policy
(English throughout), the Definition-of-Done gate, and the
branching model.
Bug reports are tracked in GitHub Issues. A bug report is most useful when accompanied by the relevant section of the manifest produced by the failing run.
MIT.