ITB C++ Binding — Easy Mode Benchmark

Security notice. ITB is an experimental symmetric cipher construction without prior peer review, independent cryptanalysis, or formal certification. The construction's security properties have not been verified by independent cryptographers or mathematicians.

PRF-grade hash functions are required. No warranty is provided.

No bespoke cryptography. ITB introduces no cryptographic primitive of its own — no custom S-box, permutation, or round function. It is a construction over existing primitives, much as PGP composes standard ciphers rather than defining one. Such constructions are not the object of algorithm-level cryptographic certification: national regimes (NIST CAVP/FIPS in the US, GOST/FSB in Russia, KCMVP in South Korea, OSCCA's SM-series in China, SOG-IS/EUCC and national lists in the EU, ASD's ISM in Australia) certify primitives and the modules built on them, not compositional schemes. Eligibility for regulated use is therefore inherited from the primitives ITB is configured with, not conferred by ITB itself.

Two executables (bench_single, bench_triple) cover the Easy Mode encryption / decryption surface exposed by the C++ binding through the C binding's static archive (-litb_c -litb):

  • bench_single.cpp — Single Ouroboros (mode = 1, 3 seeds + optional dedicated lockSeed). Walks PRF-grade primitives plus one mixed-primitive variant.
  • bench_triple.cpp — Triple Ouroboros (mode = 3, 7 seeds + optional dedicated lockSeed).

Both binaries pin a 1024-bit ITB key width and a 16 MiB CSPRNG-filled payload, run four ops per case (encrypt, decrypt, encrypt_auth, decrypt_auth), and emit one Go-bench-style line per case (name iters ns/op MB/s).

The harness is a custom Go-bench-style runner in common.hpp. It uses no third-party bench framework: std::chrono::steady_clock covers timing and an inline xorshift64* PRNG covers random fill. One make bench invocation drives the whole compile.

Prerequisites

Build the shared library and the C binding's static archive first (the C++ binding consumes them at link time):

go build -trimpath -buildmode=c-shared \
    -o dist/linux-amd64/libitb.so ./cmd/cshared
cd bindings/c && make
cd ../cpp

A project-private opt-out build tag (noitbasm) is available for hosts where the 4-lane chain-absorb wrapper is dead weight, i.e. hosts without AVX-512+VL. The tag disables only the chain-absorb asm; upstream stdlib asm stays engaged, so the per-pixel single Func runs at upstream-asm speed via process_cgo's nil-BatchHash fallback:

go build -trimpath -tags=noitbasm -buildmode=c-shared \
    -o dist/linux-amd64/libitb.so ./cmd/cshared

The C++ binding links libitb_c.a plus libitb.so at build time; both must be in place before make bench succeeds. The Makefile's LDFLAGS resolve libitb_c.a from ../c/build/ and libitb.so from ../../dist/linux-amd64/, with -Wl,-rpath baked into the binary.

Run

From the binding root (bindings/cpp/):

make bench
./bench/build/bench_single
./bench/build/bench_triple

Both binaries land in bench/build/ after build. The four canonical passes (Single ±LockSeed, Triple ±LockSeed) run via:

./run_bench.sh

which is the canonical entry point that fills BENCH.md.

Environment variables

| Variable | Default | Purpose |
|---|---|---|
| ITB_NONCE_BITS | 128 | Process-wide nonce width: 128, 256, or 512. Maps to itb::set_nonce_bits before any encryptor is constructed. |
| ITB_LOCKBATCH | unset | A non-empty / non-0 value enables Lock Batch (the performance variant of Lock Soup): every encryptor additionally calls enc.set_lock_batch(1). Set it together with ITB_LOCKSEED; it is inert unless Lock Soup is engaged via ITB_LOCKSEED. |
| ITB_LOCKSEED | unset | When set to a non-empty / non-0 value, every encryptor in the run calls enc.set_lock_seed(1). Easy Mode auto-couples set_bit_soup(1) + set_lock_soup(1), so no separate flags are needed. The mixed-primitive cases attach a dedicated lockSeed primitive (via prim_l) only under this flag; otherwise prim_l is empty so the no-LockSeed bench arm measures the plain mixed-primitive cost. |
| ITB_BENCH_FILTER | unset | Substring filter on bench-case names: only cases whose name contains the filter are run. Useful when iterating on one primitive / op. |
| ITB_BENCH_MIN_SEC | 5.0 | Minimum measured wall-clock seconds per case. The runner keeps doubling the iteration count until the measured batch reaches the threshold, mirroring Go's -benchtime=Ns. The 5-second default absorbs the cold-cache / warm-up transient that distorts shorter measurement windows on the 16 MiB encrypt / decrypt path. |

Worker count is fixed at itb::set_max_workers(0) (auto-detect), matching the cross-binding bench default.

Examples

Whole grid, default settings (128-bit nonces, no lockSeed):

./bench/build/bench_single

512-bit nonces with the dedicated lockSeed channel + auto-coupled overlay (the ITB_LOCKBATCH=1 form selects the Lock Batch performance variant of Lock Soup):

ITB_NONCE_BITS=512 ITB_LOCKSEED=1 ITB_LOCKBATCH=1 ./bench/build/bench_triple
ITB_NONCE_BITS=512 ITB_LOCKSEED=1 ./bench/build/bench_triple

Just the BLAKE3 row of the Single grid:

ITB_BENCH_FILTER=blake3_1024bit ./bench/build/bench_single

Only the encrypt-with-MAC ops across every primitive in the Triple grid, with a longer 10-second per-case budget for tighter confidence intervals:

ITB_BENCH_FILTER=encrypt_auth_16mb ITB_BENCH_MIN_SEC=10 \
    ./bench/build/bench_triple

Just the mixed-primitive cases on the Single side:

ITB_BENCH_FILTER=mixed ./bench/build/bench_single

Output format

# easy_single primitives=9 key_bits=1024 mac=hmac-blake3 nonce_bits=128 lockseed=off workers=auto
# benchmarks=40 payload_bytes=16777216 min_seconds=5
bench_single_aescmac_1024bit_encrypt_16mb               64    493210110.0 ns/op    32.44 MB/s
bench_single_aescmac_1024bit_decrypt_16mb               64    488104225.0 ns/op    32.78 MB/s
...

The four columns are:

  1. Bench-case name (snake-cased; mixed is the case appended after the primitive loop).
  2. Iteration count chosen to reach ITB_BENCH_MIN_SEC.
  3. Per-iter wall-clock cost in nanoseconds.
  4. Throughput in MB/s, derived from payload_bytes / ns_per_op. The sample rows work out with 1 MB taken as 2^20 bytes: 16 MiB in 0.4932 s gives 32.44.

Expected runtime

At the default ITB_BENCH_MIN_SEC=5, each pass walks 40 cases ((9 single primitives + 1 mixed) × 4 ops) and converges per case in 5-15 wall-clock seconds, depending on the primitive's per-byte cost. A full pass therefore lands at 5-10 minutes; the four canonical passes (Single ±LockSeed, Triple ±LockSeed) fill BENCH.md in ~30 minutes of total wall-clock time. Filter to a single primitive (ITB_BENCH_FILTER=blake3_1024bit) for ~1-minute spot-check runs.

Recorded results

A snapshot of the four canonical pass results (Single + Triple, each with and without ITB_LOCKSEED=1) on an Intel Core i7-11700K is collected in BENCH.md. The same file briefly discusses the FFI overhead the binding adds on top of the native Go path through the C binding's static archive.