
ChipFlow/vajax


VAJAX: GPU-Accelerated Analog Circuit Simulator


An open-source GPU-accelerated analog circuit simulator. Run your existing Verilog-A models — including production PDK models like PSP103 — on GPUs for dramatic speedups on large circuits.

Why VAJAX?

  • Use your existing models — Verilog-A models are compiled directly to GPU code via OpenVAF, no manual porting required
  • GPU acceleration where it matters — On GPU, large circuits (1000+ nodes) run faster than C++ simulators on CPU: 2.9x faster than VACASK on the ~5,000-node c6288 benchmark
  • Drop-in analysis — DC, transient, AC, noise, transfer function, corner sweeps, and harmonic balance
  • Open source — Apache-2.0 licensed, no license servers

Coming from Spectre or ngspice? See the migration guide.

Help Us Improve

Are you an analog designer evaluating VAJAX? We'd love your feedback! Send us your circuits and simulation traces (from Spectre, ngspice, or any SPICE simulator) and we'll add them to our validation test suite. Open an issue or email us.

Current Status

VAJAX is in active development. All VACASK benchmark circuits are passing.

Documentation & benchmark results →

Validation: Three-Way Comparison

VAJAX results are validated against VACASK (a C++ reference circuit simulator) and ngspice (open-source SPICE simulator). All simulators use identical netlists and device models (PSP103 MOSFETs via OSDI).

RC Low-Pass Filter

Simple RC circuit demonstrating basic transient behavior. VAJAX matches VACASK and ngspice exactly.

[Figure: RC low-pass filter, VAJAX vs VACASK vs ngspice]

PSP103 Ring Oscillator

7-stage ring oscillator with production PSP103 MOSFET models. Shows excellent agreement in oscillation frequency and waveform shape.

[Figure: PSP103 ring oscillator, VAJAX vs VACASK vs ngspice]

C6288 16-bit Multiplier

Large-scale benchmark with ~86,000 transistors (~5,000 nodes). Uses sparse solver for memory efficiency. Demonstrates VAJAX scaling to production-sized circuits.

[Figure: C6288 16-bit multiplier, VAJAX vs VACASK vs ngspice]

Mul64 64-bit Multiplier

Our largest benchmark: ~266,000 PSP103 MOSFETs with ~666,000 unknowns. Requires sparse solver and 24GB+ GPU VRAM. The fact that a JAX-based simulator can handle a circuit of this scale — with production MOSFET models — is a significant milestone.

Generate comparison plots:

uv run scripts/plot_three_way_comparison.py --benchmark ring --output-dir docs/images
uv run scripts/plot_three_way_comparison.py --benchmark c6288 --output-dir docs/images --skip-ngspice

Performance

VAJAX is designed for GPU acceleration of large circuits. The table below shows per-step timing against VACASK (C++ reference simulator) on CI runners.

CPU Performance (vs VACASK)

Benchmark Nodes Steps JAX (ms/step) VACASK (ms/step) Ratio (JAX/VACASK) RMS Error
rc 4 1M 0.012 0.002 6.6x 0.00%
graetz 6 1M 0.020 0.004 5.4x 0.00%
mul 8 500k 0.041 0.004 10.9x 0.00%
ring 47 20k 0.511 0.109 4.7x -
c6288 ~5000 1k 88.060 76.390 1.2x 2.01%
mul64 ~133k 15 8324.949 timeout - -

GPU Performance

Benchmark Nodes JAX GPU (ms/step) JAX CPU (ms/step) GPU Speedup (vs JAX CPU) vs VACASK CPU
mul64 ~133k 648.00 8324.95 12.8x VACASK timeout
c6288 ~5000 19.81 88.06 4.4x 2.9x faster
ring 47 1.49 0.51 0.3x below threshold
rc 4 0.24 0.01 0.05x below threshold

GPU results for circuits below ~500 nodes are shown for completeness but are not meaningful performance comparisons — GPU kernel launch overhead dominates when the per-step computation is tiny. The GPU auto-threshold (gpu_threshold=500 nodes) prevents this in normal usage.
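The auto-threshold can be pictured as a simple dispatch rule. The sketch below is hypothetical: `select_backend` is not part of the VAJAX API, and whether the real check compares with `>=` or `>` against `gpu_threshold` is an assumption.

```python
def select_backend(num_nodes: int, gpu_available: bool, gpu_threshold: int = 500) -> str:
    """Hypothetical dispatch rule (not the VAJAX API): use the GPU only when the
    circuit is large enough for parallelism to outweigh kernel launch overhead."""
    if gpu_available and num_nodes >= gpu_threshold:
        return "gpu"
    return "cpu"


# Node counts taken from the benchmark tables above.
for name, nodes in [("rc", 4), ("ring", 47), ("c6288", 5000), ("mul64", 133_000)]:
    print(f"{name:>6}: {select_backend(nodes, gpu_available=True)}")
```

With this rule, rc and ring stay on CPU while c6288 and mul64 go to the GPU, matching the rows of the GPU table where a speedup is reported.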

Performance Characteristics

Where VAJAX excels: Large circuits (1000+ nodes) on GPU, where matrix operations dominate and GPU parallelism pays off. On GPU, VAJAX runs the c6288 benchmark (16-bit multiplier, ~5000 nodes) 2.9x faster than VACASK on CPU. On the mul64 benchmark (~266k transistors, ~133k nodes), VAJAX on GPU is 12.8x faster than VAJAX on CPU, while VACASK times out entirely.

Where VACASK is faster: Small circuits on CPU. VAJAX carries a per-step fixed overhead of ~5-12 microseconds from:

  • Adaptive timestep machinery: LTE estimation, voltage prediction, and variable-step BDF2 coefficient computation run every step regardless of circuit size.
  • Functional array updates: JAX requires jnp.where for conditional updates inside lax.while_loop, which evaluates both branches. VACASK uses C++ runtime branching that skips unused work.
  • Vmap batching: Device evaluation is vectorized with jax.vmap for GPU parallelism, but this adds overhead when evaluating only 2-4 device instances.
  • COO matrix assembly: Jacobian construction from coordinate format adds indirection that VACASK avoids with direct matrix stamping.
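To make the first item concrete: with a variable timestep, the BDF2 weights depend on the ratio of the current and previous steps, so they must be recomputed every step. The sketch below uses the textbook variable-step BDF2 derivative formula; `bdf2_coefficients` is a hypothetical helper, and VAJAX's actual integration code may organise this differently.

```python
def bdf2_coefficients(h_n: float, h_prev: float) -> tuple[float, float, float]:
    """Hypothetical helper: return (a0, a1, a2) such that the time derivative at
    t_n is approximated by a0*x_n + a1*x_{n-1} + a2*x_{n-2}.

    h_n    = t_n - t_{n-1}      (current step)
    h_prev = t_{n-1} - t_{n-2}  (previous step)
    """
    omega = h_n / h_prev  # step ratio; changes whenever the step size adapts
    a0 = (1 + 2 * omega) / ((1 + omega) * h_n)
    a1 = -(1 + omega) / h_n
    a2 = omega**2 / ((1 + omega) * h_n)
    return a0, a1, a2


h = 1e-9
a0, a1, a2 = bdf2_coefficients(h, h)
print(a0 * h, a1 * h, a2 * h)  # constant step: approximately 1.5, -2.0, 0.5
```

For a constant step this reduces to the familiar fixed-step BDF2 weights (3/2, -2, 1/2)/h, and the formula differentiates quadratics exactly for any step ratio.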

This overhead is negligible for large circuits (c6288: 0.01% of step time) but dominates for small ones (rc: ~80% of step time). See docs/performance_analysis.md for the full analysis.
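The two percentages follow from the CPU table with a quick back-of-the-envelope calculation. Treating the rc gap to VACASK as VAJAX's fixed overhead is a simplifying assumption.

```python
# Per-step times in milliseconds, copied from the CPU performance table above.
rc_vajax_ms, rc_vacask_ms = 0.012, 0.002
c6288_vajax_ms = 88.060

# rc: treat the ~10 us gap to VACASK as VAJAX's fixed per-step overhead.
rc_overhead_share = (rc_vajax_ms - rc_vacask_ms) / rc_vajax_ms

# c6288: even the upper end of the ~5-12 us overhead range is negligible.
c6288_overhead_share = 0.012 / c6288_vajax_ms

print(f"rc:    {rc_overhead_share:.0%} of step time")
print(f"c6288: {c6288_overhead_share:.3%} of step time")
```

This gives roughly 83% for rc and about 0.014% for c6288, in line with the figures quoted above.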

Quick Start

Requires Python 3.11+ (3.11, 3.12, 3.13, and 3.14 are supported).

Install from PyPI

pip install vajax

# macOS alternative: brew install vajax

# Run a simulation
vajax circuit.sim

Install from Source (for development)

Requires uv.

git clone https://github.com/ChipFlow/vajax.git
cd vajax
uv sync

# Run tests
uv sync --extra test
JAX_PLATFORMS=cpu uv run pytest tests/ -v

# Run a benchmark
JAX_PLATFORMS=cpu uv run vajax benchmark ring

Installation Options

# With CUDA 12 support (Linux)
pip install "vajax[cuda12]"

# From source with CUDA 12
uv sync --extra cuda12

# From source with SAX integration
uv sync --extra sax

Command-Line Interface

VAJAX provides an ngspice-style CLI:

# Run simulation on a circuit file
vajax circuit.sim

# Specify output file and format
vajax circuit.sim -o results.raw
vajax circuit.sim -o results.csv --format csv

# Override analysis parameters
vajax circuit.sim --tran 1n 100u
vajax circuit.sim --ac dec 100 1k 1G

# Run benchmarks
vajax benchmark ring --profile

# System info
vajax info

See docs/cli_reference.md for full documentation.
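Arguments such as `1n`, `100u`, `1k` and `1G` use SPICE scale suffixes. The helper below is a hypothetical sketch of how such values decode, assuming the CLI follows standard SPICE conventions (note that `m` is milli and `meg` is mega); it is not part of VAJAX.

```python
import re

# Standard SPICE scale suffixes (case-insensitive; "m" is milli, "meg" is mega).
_SCALE = {
    "t": 1e12, "g": 1e9, "meg": 1e6, "k": 1e3,
    "m": 1e-3, "u": 1e-6, "n": 1e-9, "p": 1e-12, "f": 1e-15,
}
_NUMBER = re.compile(
    r"^([+-]?(?:\d+\.?\d*|\.\d+)(?:e[+-]?\d+)?)(meg|[tgkmunpf])?", re.IGNORECASE
)


def parse_spice_value(text: str) -> float:
    """Hypothetical helper: convert a SPICE-style number such as '100u' to a float.
    Trailing unit letters (e.g. the 's' in '100us') are ignored, as in SPICE."""
    match = _NUMBER.match(text.strip())
    if match is None:
        raise ValueError(f"not a SPICE number: {text!r}")
    mantissa, suffix = match.groups()
    scale = _SCALE[suffix.lower()] if suffix else 1.0
    return float(mantissa) * scale


# The values used in the CLI examples above.
print([parse_spice_value(s) for s in ("1n", "100u", "1k", "1G")])
```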

Example: Transient Simulation

from vajax import CircuitEngine

# Load and parse a VACASK circuit file
engine = CircuitEngine("path/to/circuit.sim")
engine.parse()

# Prepare and run transient analysis
engine.prepare(t_stop=1e-6, dt=1e-9)
result = engine.run_transient()

# Access results
print(f"Simulated {len(result.times)} time points")
for node_name, voltages in result.voltages.items():
    print(f"  {node_name}: {voltages[-1]:.3f}V (final)")

Architecture Overview

vajax/
├── analysis/             # Circuit solvers and analysis engines
│   ├── engine.py        # CircuitEngine - main simulation API
│   ├── solver.py        # Newton-Raphson with lax.while_loop
│   ├── transient/       # Transient analysis (scan/loop strategies)
│   ├── ac.py            # AC small-signal analysis
│   ├── noise.py         # Noise analysis
│   ├── hb.py            # Harmonic balance
│   ├── xfer.py          # Transfer function (DCINC, DCXF, ACXF)
│   ├── corners.py       # PVT corner analysis
│   ├── homotopy.py      # Convergence aids (GMIN, source stepping)
│   └── sparse.py        # JAX sparse matrix operations (BCOO/BCSR)
│
├── devices/              # Device models
│   ├── vsource.py       # Voltage/current source waveforms
│   └── verilog_a.py     # OpenVAF Verilog-A wrapper
│
├── netlist/              # Circuit representation
│   ├── parser.py        # VACASK netlist parser
│   └── circuit.py       # Circuit data structures
│
└── benchmarks/           # Benchmark infrastructure
    ├── registry.py      # Auto-discovery of benchmarks
    └── runner.py        # VACASK benchmark runner
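The Newton-Raphson loop in `solver.py` is built on `lax.while_loop`, which repeatedly applies a body function while a condition holds. The sketch below mimics that structure in plain Python on a hypothetical one-node circuit (5 V source, 1 kOhm resistor, diode to ground). The `while_loop` stand-in, the circuit values and the 0.1 V step clamp are all illustrative assumptions, not VAJAX code.

```python
import math


def while_loop(cond_fun, body_fun, init_val):
    """Pure-Python stand-in with the same semantics as jax.lax.while_loop."""
    val = init_val
    while cond_fun(val):
        val = body_fun(val)
    return val


# Hypothetical circuit: 5 V source -> 1 kOhm resistor -> diode to ground.
V_SRC, R, I_S, V_T = 5.0, 1e3, 1e-14, 0.02585


def residual(v):
    # KCL at the diode node: current in through R minus current out through the diode.
    return (V_SRC - v) / R - I_S * (math.exp(v / V_T) - 1.0)


def jacobian(v):
    # d(residual)/dv, hand-written here; for Verilog-A models OpenVAF supplies these.
    return -1.0 / R - (I_S / V_T) * math.exp(v / V_T)


def newton_body(state):
    v, _, it = state
    dv = -residual(v) / jacobian(v)
    dv = max(-0.1, min(0.1, dv))  # crude step limiting to avoid exp() overshoot
    return v + dv, dv, it + 1


def newton_cond(state):
    _, dv, it = state
    return abs(dv) > 1e-12 and it < 50


v_diode, last_dv, iterations = while_loop(newton_cond, newton_body, (0.6, 1.0, 0))
print(f"diode voltage = {v_diode:.4f} V after {iterations} iterations")
```

Carrying the whole iteration state in one tuple is what lets the real loop run under `lax.while_loop` without returning to Python between iterations.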

Key Design Principles

  1. Verilog-A to JAX compilation: Device models are compiled from Verilog-A source to JAX functions via OpenVAF's IR, which supplies both the residuals and their Jacobian entries, so no hand-written derivatives are needed
  2. Vectorized evaluation: Devices grouped by type and evaluated in parallel with jax.vmap
  3. GPU-first hot path: Simulation loops use lax.while_loop and lax.scan to stay on-device
  4. Sparse scalability: Auto-switches to sparse matrices for large circuits
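A key property of COO (coordinate) assembly is that every device appends its own (row, col, value) triplets, and duplicate positions are summed when the matrix is converted. The sketch below shows this with hypothetical helpers and a three-resistor example in plain Python; VAJAX itself uses JAX BCOO/BCSR structures (see `sparse.py`), which are not shown here.

```python
from collections import defaultdict


def stamp_conductance(rows, cols, vals, n1, n2, g):
    """Hypothetical helper: append the COO entries of a conductance g between
    nodes n1 and n2. Node -1 is ground; its row and column are dropped."""
    for a, b, sign in ((n1, n1, 1), (n2, n2, 1), (n1, n2, -1), (n2, n1, -1)):
        if a >= 0 and b >= 0:
            rows.append(a)
            cols.append(b)
            vals.append(sign * g)


def coo_to_dense(rows, cols, vals, n):
    """Sum duplicate (row, col) entries, as a COO-to-dense/CSR conversion does."""
    acc = defaultdict(float)
    for r, c, v in zip(rows, cols, vals):
        acc[(r, c)] += v
    return [[acc.get((r, c), 0.0) for c in range(n)] for r in range(n)]


# Three conductances: node0-ground (1 kOhm), node0-node1 (2 kOhm), node1-ground (1 kOhm).
rows, cols, vals = [], [], []
stamp_conductance(rows, cols, vals, 0, -1, 1e-3)
stamp_conductance(rows, cols, vals, 0, 1, 0.5e-3)
stamp_conductance(rows, cols, vals, 1, -1, 1e-3)
G = coo_to_dense(rows, cols, vals, 2)
print(G)
```

The triplet lists have a fixed length for a given netlist, which suits JAX's static-shape requirements; the cost is the extra indirection mentioned in the performance section.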

For architecture details, see the developer guide.

Device Model Interface

All devices are compiled from Verilog-A sources using OpenVAF. Device models are batched and evaluated in parallel using jax.vmap for GPU efficiency.

# Devices are loaded from Verilog-A via OpenVAF
# Example from a .sim netlist file:
load "resistor.va"      # SPICE resistor model
load "capacitor.va"     # SPICE capacitor model
load "psp103.va"        # PSP103 MOSFET model

model r resistor
model c capacitor
model nmos psp103va
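Batching works by grouping instances of the same model and evaluating each group with a single vectorized call. The sketch below illustrates the grouping step in plain Python, with a list comprehension standing in for `jax.vmap`; the instance list, parameter names and `resistor_currents` function are hypothetical.

```python
from collections import defaultdict

# Hypothetical netlist instances: (instance name, model, parameters).
instances = [
    ("r1", "resistor", {"r": 1e3}),
    ("c1", "capacitor", {"c": 1e-12}),
    ("r2", "resistor", {"r": 2e3}),
    ("r3", "resistor", {"r": 4.7e3}),
]


def group_by_model(instances):
    """Group instances by model so each model is evaluated once, as a batch."""
    groups = defaultdict(list)
    for name, model, params in instances:
        groups[model].append((name, params))
    return dict(groups)


def resistor_currents(voltages, resistances):
    # One call evaluates every resistor instance. With JAX arrays this would be
    # jax.vmap(lambda v, r: v / r)(voltages, resistances).
    return [v / r for v, r in zip(voltages, resistances)]


groups = group_by_model(instances)
res = groups["resistor"]
currents = resistor_currents([1.0] * len(res), [p["r"] for _, p in res])
print([name for name, _ in res], currents)
```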

Supported Devices

Device Source Description
Resistor resistor.va SPICE resistor with temperature coefficients
Capacitor capacitor.va Ideal capacitor
Diode diode.va SPICE diode model
VSource Built-in DC, pulse, sine, PWL voltage sources
ISource Built-in DC, pulse current sources
PSP103 psp103.va Production MOSFET model (OpenVAF)
Any VA OpenVAF Any Verilog-A model compiled to JAX

Analysis Types

All analyses are accessed through CircuitEngine:

from vajax import CircuitEngine

engine = CircuitEngine("circuit.sim")
engine.parse()

Transient Analysis

# Prepare and run transient simulation
engine.prepare(
    t_stop=1e-6,      # Stop time
    dt=1e-9,          # Time step
    use_sparse=True,  # Use sparse solver for large circuits
)
result = engine.run_transient()

# Access results
times = result.times           # Array of time points
voltages = result.voltages     # Dict of node_name -> voltage array

DC Sweep

# Sweep V1 from 0 to 1V
dc_result = engine.run_dc_sweep(
    source="v1",       # Source to sweep
    start=0.0,         # Start value (V or A)
    stop=1.0,          # Stop value
    points=101,        # Number of points
)

# Access results
sweep_values = dc_result.sweep_values   # Array of swept values
voltages = dc_result.voltages           # Dict of node_name -> voltage array
currents = dc_result.currents           # Dict of source_name -> current array

AC Analysis

# Small-signal frequency response
ac_result = engine.run_ac(
    freq_start=1e3,   # Start frequency (Hz)
    freq_stop=1e9,    # Stop frequency (Hz)
    points=100,       # Points per decade (for mode='dec')
    mode='dec',       # 'dec', 'lin', 'oct', or 'list'
)
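With `mode='dec'`, the points are log-spaced with a fixed count per decade. The sketch below follows the usual SPICE `.ac dec` convention; whether VAJAX includes both endpoints in exactly this way is an assumption, and `decade_frequencies` is a hypothetical helper.

```python
import math


def decade_frequencies(freq_start: float, freq_stop: float, points_per_decade: int) -> list[float]:
    """Hypothetical helper: log-spaced grid with a fixed number of points per decade,
    from freq_start to the last grid point that does not exceed freq_stop."""
    decades = math.log10(freq_stop / freq_start)
    n_steps = math.floor(decades * points_per_decade + 1e-9)  # tolerate float round-off
    return [freq_start * 10 ** (k / points_per_decade) for k in range(n_steps + 1)]


freqs = decade_frequencies(1e3, 1e9, 100)
print(len(freqs), freqs[0], freqs[-1])  # 601 points spanning 1 kHz to 1 GHz
```

Six decades at 100 points per decade give 600 intervals, so 601 frequencies including both endpoints.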

Noise Analysis

# Compute noise across frequency
noise_result = engine.run_noise(
    freq_start=1e3,
    freq_stop=1e9,
    input_source="vin",
    out="vout",
)

Corner Analysis (PVT Sweep)

from vajax.analysis.corners import create_pvt_corners

# Create PVT corners (3x3x3 = 27 combinations)
corners = create_pvt_corners(
    processes=['FF', 'TT', 'SS'],
    voltages=[0.9, 1.0, 1.1],
    temperatures=['cold', 'room', 'hot'],
)

# Run across all corners
engine.prepare(t_stop=1e-6, dt=1e-9)
results = engine.run_corners(corners)
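The 27 corners are the Cartesian product of the three lists. The sketch below shows that enumeration with `itertools.product`; it is not the implementation of `create_pvt_corners`, and the dictionary keys are hypothetical.

```python
from itertools import product

processes = ["FF", "TT", "SS"]
voltages = [0.9, 1.0, 1.1]
temperatures = ["cold", "room", "hot"]

# Each corner is one (process, supply voltage, temperature) combination.
corners = [
    {"process": p, "vdd": v, "temperature": t}
    for p, v, t in product(processes, voltages, temperatures)
]
print(len(corners))  # 27
print(corners[0])
```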

Transfer Function Analysis

# DC incremental (small-signal gain)
dcinc_result = engine.run_dcinc()

# DC transfer function
dcxf_result = engine.run_dcxf(out="vout")

# AC transfer function
acxf_result = engine.run_acxf(out="vout", freq_start=1e3, freq_stop=1e9)

Verilog-A Integration

VAJAX compiles Verilog-A models directly to JAX functions via OpenVAF's intermediate representation. This means production PDK models (PSP103, BSIM, EKV, etc.) run natively on GPU without manual porting:

from vajax.devices.verilog_a import compile_va, VerilogADevice

# Compile a Verilog-A model to a JAX-compatible function
model = compile_va("psp103.va")

# Devices are typically instantiated via CircuitEngine from a .sim netlist,
# but can also be created directly for testing:
device = VerilogADevice(model, params={"type": 1, "vth0": 0.4})  # ...plus further model parameters

See docs/vacask_osdi_inputs.md for details on the OpenVAF integration.

Running Benchmarks

# Run specific benchmark
JAX_PLATFORMS=cpu uv run vajax benchmark ring

# Profile with GPU
JAX_PLATFORMS=cuda uv run python scripts/profile_gpu.py --benchmark ring

# Run all VACASK suite tests
JAX_PLATFORMS=cpu uv run pytest tests/test_vacask_suite.py -v

Platform Notes

Linux

Full support. CUDA GPU acceleration with pip install "vajax[cuda12]".

macOS

CPU backend. GPU acceleration via Metal is in development.

Windows (GPU via WSL2)

VAJAX runs natively on Windows for CPU simulations (pip install vajax). For GPU acceleration, use WSL2 — JAX's CUDA backend requires Linux. Most analog designers are comfortable with Linux environments, and WSL2 provides native GPU passthrough with no performance penalty:

# 1. Install WSL2 (one-time, from PowerShell as admin)
wsl --install -d Ubuntu-24.04

# 2. Inside WSL2, install VAJAX with CUDA
pip install "vajax[cuda12]"

# 3. Verify GPU detection
python -c "import jax; print(jax.devices())"
# [CudaDevice(id=0)]

# 4. Run simulations — your Windows filesystem is at /mnt/c/
vajax /mnt/c/Users/you/circuits/my_circuit.sim

Your NVIDIA GPU driver on Windows automatically provides CUDA support inside WSL2 — no separate Linux CUDA driver install needed.

Precision

Auto-configured based on backend:

  • CPU/CUDA: Float64 enabled for numerical precision
  • Use vajax.configure_precision(force_x64=True/False) to override
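Float64 matters because Newton updates can be far smaller than float32 resolution. The illustration below uses Python's `struct` module to emulate float32 rounding; the 1 V node voltage and 1 nV update are illustrative values, not VAJAX internals.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (IEEE float64) to the nearest IEEE float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


v_node = 1.0  # a 1 V node voltage
dv = 1e-9     # a 1 nV Newton update

print(v_node + dv == v_node)                          # False: float64 keeps the update
print(to_float32(v_node + dv) == to_float32(v_node))  # True: float32 loses it
```

Near 1.0, float32 values are spaced about 1.2e-7 apart, so a 1 nV correction vanishes entirely, while float64 spacing there is about 2.2e-16.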

Documentation

  • docs/getting_started.md - Getting Started guide with installation and first simulation
  • docs/for_spectre_users.md - For Spectre/ngspice Users — concept mapping and migration guide
  • docs/api_reference.md - API Reference (CircuitEngine, result types, I/O)
  • docs/cli_reference.md - Command-line interface reference
  • docs/architecture_overview.md - System architecture and design
  • docs/performance_analysis.md - Performance analysis and overhead breakdown
  • docs/gpu_solver_architecture.md - Detailed solver design and optimization
  • docs/gpu_solver_jacobian.md - Jacobian computation details
  • docs/debug_tools.md - Debug utilities reference
  • docs/vacask_osdi_inputs.md - OpenVAF/OSDI input handling
  • docs/vacask_sim_format.md - VACASK simulation file format
  • TODO.md - Development roadmap and known issues

Contributing

See CONTRIBUTING.md for development setup and guidelines.

License

Apache-2.0

About

JAX-based SPICE simulator with OpenVAF Verilog-A support
