
Hypersync wasm investigation #127

Draft
JonoPrest wants to merge 17 commits into main from claude/hypersync-wasm-investigation-WHlK8

@JonoPrest
Collaborator

No description provided.

JonoPrest and others added 17 commits May 7, 2026 09:26
First-pass WASM port. Exposes only `Client::get_arrow(query)` — no
streaming, retries, rate limiting, capnp request encoding, or parquet
output. Server response is parsed (capnp envelope), and per-table Arrow
IPC payloads are returned as Uint8Arrays for JS to decode with
apache-arrow.

Includes a Node.js smoke test under tests/js that issues a small ERC20
transfer query and validates the IPC bytes decode into a non-empty logs
table. Runs against a live hypersync server, gated on ENVIO_API_TOKEN.

Co-authored-by: claude <noreply@anthropic.com>
The hypersync server sends LZ4/ZSTD-compressed Arrow IPC. Native Rust
decodes transparently via the `ipc_compression` feature, but
apache-arrow JS does not yet support compressed batches. Round-trip
each table through arrow's FileReader/FileWriter so JS sees plain
uncompressed IPC.

Also:
- tests load .env automatically via process.loadEnvFile (Node 20.12+)
- README documents Homebrew LLVM prereq for zstd-sys on macOS
- gitignore excludes pnpm-lock.yaml and tests/js/.env

Co-authored-by: claude <noreply@anthropic.com>
First incremental step toward sharing the main client with wasm. Adds
target-conditional dependencies and cfg-gates the modules / methods that
require a multi-thread runtime, the filesystem, rayon's thread pool, or
SSE / native TLS:

  * Cargo.toml splits tokio + reqwest + parquet + rayon + num_cpus +
    reqwest-eventsource into native-only deps. Wasm gets `tokio`
    (rt + macros + sync + time), `reqwest` (json only) plus `getrandom/js`
    and `uuid/js` for entropy.
  * `parquet_out`, `stream`, `rayon_async`, `util` modules and the
    streaming/collect/stream_height client methods are now native-only.
  * Replaces `std::time::Instant` with `web_time::Instant`, which is
    a drop-in shim that compiles on both targets.
  * Replaces `tokio::task::block_in_place` with a `run_blocking` helper
    that is `block_in_place` on native and a direct call on wasm.
  * Factors the reqwest builder behind `build_reqwest_client` so that
    the native-only `no_gzip`/`user_agent` knobs are not referenced on
    wasm.
  * `column_mapping::apply_to_batch` now uses serial iteration on wasm.
  * Integration tests (`tests/api_test.rs`, in-source `mod tests`) are
    gated to native targets, where they can exercise streaming.
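The `web_time::Instant` swap works because `web_time` re-exports `std::time` on native targets and supplies its own browser-backed clock on wasm, where `std::time::Instant::now()` panics. To illustrate the target switch that the bullets above rely on, here is a hand-rolled equivalent written with an explicit `cfg` (a sketch only; the commit uses `web_time::Instant` unconditionally, and `timed` is a hypothetical helper):

```rust
// Native: the standard-library clock works as-is.
#[cfg(not(target_arch = "wasm32"))]
use std::time::Instant;
// wasm32-unknown-unknown: std's Instant::now() panics, so use the web_time
// shim instead. This line is compiled out on native, so native builds do not
// need the web_time dependency at all.
#[cfg(target_arch = "wasm32")]
use web_time::Instant;

use std::time::Duration;

/// Hypothetical helper: run `f` and report how long it took.
/// The same source compiles on both targets because `Instant` is cfg-aliased.
pub fn timed<R>(f: impl FnOnce() -> R) -> (R, Duration) {
    let start = Instant::now();
    let out = f();
    (out, start.elapsed())
}
```

The same shape (one name, two `cfg`-gated definitions) underlies `run_blocking` and `build_reqwest_client`. Only the active branch is type-checked, so each branch can depend on target-only crates.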

After this commit `cargo build --target wasm32-unknown-unknown -p
hypersync-client` succeeds and the existing native build + 30 lib tests
still pass.

Co-authored-by: claude <noreply@anthropic.com>
…cating

With hypersync-client now wasm-clean, this crate no longer has to
re-implement the HTTP + capnp + arrow-IPC pipeline. It becomes a thin
wasm-bindgen wrapper that:

  * Constructs a real `hypersync_client::Client` (via `ClientConfig`),
    inheriting retries, payload-too-large halving, rate-limit tracking,
    cap'n proto query caching, and connection refresh logic.
  * Deserializes the JS query object into the canonical
    `hypersync_client::net_types::Query`.
  * Re-encodes the resulting `Vec<RecordBatch>` per table as uncompressed
    Arrow IPC bytes (apache-arrow JS still doesn't support compressed
    batches).

API surface gains:
  * `get_height()`, `get_chain_id()` (already wasm-clean on the inner
    client)
  * `Client.with_config(obj)` for full ClientConfig control
  * `decoded_logs` getter on `ArrowResponse`
  * `client.url` getter

The JS smoke test now also exercises `get_height` / `get_chain_id` /
`with_config` and asserts `decoded_logs` is empty when no event
signature is supplied.

Co-authored-by: claude <noreply@anthropic.com>
Adds 5 tests that run on native against the same code paths the wasm
runtime exercises:

  * encode_batches_empty        — empty input does not panic
  * encode_batches_round_trip   — IPC bytes parse back to identical batch
  * encode_batches_multiple     — preserves batch order in a single file
  * from_native_copies_header_and_empty_tables
  * from_native_routes_tables_independently — guards against accidental
    cross-wiring of logs into traces, etc.

These complement the JS smoke test (which hits a live server) by
locking down the pure-Rust encode path that bridges
`hypersync_client::ArrowResponse` → `Uint8Array`s in JS.
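The properties these tests lock down can be illustrated without Arrow: empty input is safe, encoding round-trips in order, and each table is encoded only from its own input. In the sketch below, a simple length-prefixed framing stands in for Arrow IPC. The names `encode_batches` and `from_native` echo the test names, but their types here, along with `NativeResponse`, `JsResponse`, and `decode_batches`, are hypothetical and are not the crate's real API:

```rust
/// Stand-in for a record batch: raw bytes.
pub type Batch = Vec<u8>;

/// Hypothetical native response: a header field plus batches per table.
pub struct NativeResponse {
    pub next_block: u64,
    pub logs: Vec<Batch>,
    pub traces: Vec<Batch>,
}

/// Hypothetical JS-facing response: one encoded buffer per table.
pub struct JsResponse {
    pub next_block: u64,
    pub logs: Vec<u8>,
    pub traces: Vec<u8>,
}

/// Concatenate batches, each prefixed by its u32 little-endian length.
/// An empty slice yields an empty buffer instead of panicking.
pub fn encode_batches(batches: &[Batch]) -> Vec<u8> {
    let mut out = Vec::new();
    for b in batches {
        out.extend_from_slice(&(b.len() as u32).to_le_bytes());
        out.extend_from_slice(b);
    }
    out
}

/// Inverse of `encode_batches`; returns None if the input is truncated.
pub fn decode_batches(mut buf: &[u8]) -> Option<Vec<Batch>> {
    let mut batches = Vec::new();
    while !buf.is_empty() {
        if buf.len() < 4 {
            return None; // truncated length prefix
        }
        let len = u32::from_le_bytes([buf[0], buf[1], buf[2], buf[3]]) as usize;
        let rest = &buf[4..];
        if rest.len() < len {
            return None; // truncated payload
        }
        batches.push(rest[..len].to_vec());
        buf = &rest[len..];
    }
    Some(batches)
}

/// Copy the header, then encode each table from its own field only,
/// so logs can never end up in the traces buffer (or vice versa).
pub fn from_native(r: &NativeResponse) -> JsResponse {
    JsResponse {
        next_block: r.next_block,
        logs: encode_batches(&r.logs),
        traces: encode_batches(&r.traces),
    }
}
```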

Co-authored-by: claude <noreply@anthropic.com>
Refactors the streaming pipeline so the same code drives native (tokio
multi-thread) and wasm (single-threaded event loop) targets:

  * Replaces `tokio::task::JoinSet` with `futures::stream::FuturesUnordered`
    in `stream_arrow`. FuturesUnordered polls many in-flight requests
    inside one async task instead of spawning each one — required on wasm
    (no `Send`, no thread pool) and equivalent for I/O-bound work on
    native.
  * Adds a `spawn_local_compat` helper: `tokio::spawn` on native,
    `wasm_bindgen_futures::spawn_local` on wasm. The latter doesn't
    require the future to be `Send`, which is critical because reqwest's
    wasm `Response` is `!Send`.
  * `rayon_async::spawn` now compiles for both targets — wasm just runs
    the closure synchronously and returns a ready oneshot.
  * `util::{hex_encode_batch, decode_logs_batch}` and
    `column_mapping::apply_to_batch` cfg-switch their inner `.par_iter()`
    to `.iter()` on wasm.
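The `.par_iter()` → `.iter()` switch is another instance of the same `cfg` pattern. The sketch below uses a hypothetical `map_rows` helper. Because rayon is not assumed to be available, the native branch uses `std::thread::scope` over chunks as a stand-in for rayon's thread pool; the wasm branch iterates serially on the single event-loop thread:

```rust
/// Hypothetical helper: apply `f` to every row, preserving order.
/// Native: fan chunks out to scoped threads (stand-in for rayon's `.par_iter()`).
#[cfg(not(target_arch = "wasm32"))]
pub fn map_rows<T, U, F>(rows: &[T], f: F) -> Vec<U>
where
    T: Sync,
    U: Send,
    F: Fn(&T) -> U + Sync,
{
    let threads = std::thread::available_parallelism().map_or(1, |n| n.get());
    // Ceiling division, clamped to 1 so `chunks` never gets a zero length.
    let chunk_len = ((rows.len() + threads - 1) / threads).max(1);
    let f = &f;
    std::thread::scope(|s| {
        let handles: Vec<_> = rows
            .chunks(chunk_len)
            .map(|chunk| s.spawn(move || chunk.iter().map(f).collect::<Vec<U>>()))
            .collect();
        // Joining in spawn order keeps the output in input order.
        handles.into_iter().flat_map(|h| h.join().unwrap()).collect()
    })
}

/// wasm32: no thread pool, so iterate serially.
#[cfg(target_arch = "wasm32")]
pub fn map_rows<T, U, F>(rows: &[T], f: F) -> Vec<U>
where
    F: Fn(&T) -> U,
{
    rows.iter().map(f).collect()
}
```

Both branches return identical results, so callers such as a column-mapping pass do not need to know which target they run on.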

With these changes, `Client::collect`, `Client::collect_arrow`,
`Client::collect_events`, `Client::stream`, `Client::stream_events`, and
`Client::stream_arrow` are no longer cfg-gated. Only `collect_parquet`
(needs `tokio::fs`) and `stream_height` (needs `reqwest_eventsource`)
stay native-only.

In `hypersync-client-wasm`:

  * Adds `client.get(query)` returning a JS object of decoded simple
    types (Block / Transaction / Log / Trace) with bigint number
    serialization.
  * Adds `client.stream_arrow(query, config?)` returning an `ArrowStream`
    handle whose `next()` yields one chunk at a time, terminating with
    `undefined` when the stream is exhausted.
  * Adds a JS smoke test (`tests/js/stream.test.mjs`) that streams a
    block range with `concurrency: 4` and verifies multiple chunks plus
    monotonic `next_block` advance.
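The `next()` contract above (one chunk per call, then `undefined` once the stream is exhausted) maps naturally onto a channel receiver whose sender is dropped when the producer finishes. The sketch below uses `std::sync::mpsc` and a thread as stand-ins for the crate's internal plumbing. `Chunk`, `ChunkStream`, and `is_monotonic` are hypothetical names, and `None` plays the role of JS `undefined`. It also includes the monotonic `next_block` check that the smoke test performs:

```rust
use std::sync::mpsc::{channel, Receiver};
use std::thread;

/// Hypothetical chunk: where the next request starts, plus a row count.
#[derive(Debug, Clone, PartialEq)]
pub struct Chunk {
    pub next_block: u64,
    pub log_rows: usize,
}

/// Hypothetical pull-based handle modeled on `ArrowStream.next()`.
pub struct ChunkStream {
    rx: Receiver<Chunk>,
}

impl ChunkStream {
    /// Spawn a producer that emits one chunk per `step` blocks in [from, to).
    pub fn spawn(from: u64, to: u64, step: u64) -> Self {
        assert!(step > 0, "step must be positive");
        let (tx, rx) = channel();
        thread::spawn(move || {
            let mut block = from;
            while block < to {
                let next_block = (block + step).min(to);
                let chunk = Chunk { next_block, log_rows: (next_block - block) as usize };
                // A send error means the consumer dropped the handle; stop early.
                if tx.send(chunk).is_err() {
                    break;
                }
                block = next_block;
            }
            // `tx` is dropped here, closing the channel and ending the stream.
        });
        ChunkStream { rx }
    }

    /// Next chunk, or None (JS: `undefined`) once the producer has finished.
    pub fn next(&mut self) -> Option<Chunk> {
        self.rx.recv().ok()
    }
}

/// Smoke-test property: `next_block` strictly increases across chunks.
pub fn is_monotonic(chunks: &[Chunk]) -> bool {
    chunks.windows(2).all(|w| w[0].next_block < w[1].next_block)
}
```

A consumer drains it with `while let Some(chunk) = stream.next() { ... }`, which mirrors a JS loop that stops when `next()` returns `undefined`.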

Co-authored-by: claude <noreply@anthropic.com>
Adds tests/js/bench.mjs that drives the same workload through both the
wasm client and the native napi-rs binding and prints a side-by-side
table:

  - cold get():         first-call latency
  - warm get():         median + min over BENCH_ITERATIONS calls
  - stream():           total time to drain a 2000-block range with
                        concurrency=8, plus chunk + log row counts
  - bundle size:        raw on-disk sizes for the wasm .wasm + .js shim
                        and each native platform .node binary

Both clients use the same SerializationFormat (CapnProto with query
caching, the library default) and the same field selection. Query
shapes differ — the native client takes camelCase JS objects, the wasm
client takes Rust-side snake_case via serde-wasm-bindgen — so the
benchmark passes a target-specific copy.
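Where the two query copies differ only in key casing, the snake_case copy could in principle be derived mechanically rather than maintained by hand. Here is a hypothetical helper, assuming plain ASCII camelCase keys without acronyms (for example, a `fromBlock` key would become `from_block`):

```rust
/// Hypothetical helper: convert an ASCII camelCase key to snake_case.
/// Assumes no acronyms ("fromBlock" -> "from_block").
pub fn camel_to_snake(key: &str) -> String {
    let mut out = String::with_capacity(key.len() + 4);
    for (i, ch) in key.chars().enumerate() {
        if ch.is_ascii_uppercase() {
            // Separate words at each interior capital letter.
            if i > 0 {
                out.push('_');
            }
            out.push(ch.to_ascii_lowercase());
        } else {
            out.push(ch);
        }
    }
    out
}
```

Applied recursively to object keys, this would keep the two benchmark inputs in sync. Keys with acronyms or digits would still need hand-written exceptions.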

Run:
  wasm-pack build --target nodejs --release
  cd tests/js && npm install
  ENVIO_API_TOKEN=... npm run bench

README updated with the bench instructions.

Co-authored-by: claude <noreply@anthropic.com>
Two fixes from running the JS smoke test for real:

1. `tokio::time::sleep` panics on wasm32-unknown-unknown — tokio's
   internal `Instant::now()` falls back to `std::time::Instant::now()`,
   which is unimplemented and panics. Symptom in node was:
       std::time::Instant::now → core::panicking::panic_fmt
   when the retry backoff fired after a non-success HTTP response.

   Adds a `sleep_compat` helper that uses `tokio::time::sleep` on native
   and `gloo_timers::future::TimeoutFuture` (a `setTimeout` wrapper) on
   wasm. Replaces all five non-SSE `tokio::time::sleep` call sites in
   `lib.rs`. Also drops the `time` feature from the wasm-only tokio dep
   since we no longer use it there.

2. `query.test.mjs` was asserting `client.url === HYPERSYNC_URL`, but
   `url::Url` serializes an empty path on an `https` URL as `/`, so
   `to_string()` adds a trailing slash (`https://eth.hypersync.xyz` →
   `https://eth.hypersync.xyz/`). Loosened the test to accept either form.
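The loosened assertion amounts to comparing URLs modulo one trailing slash. Here is a sketch of that comparison in Rust (`same_base_url` is a hypothetical name; the actual check lives in `query.test.mjs`):

```rust
/// Compare two base URLs, ignoring a single trailing '/' on either side,
/// so "https://eth.hypersync.xyz" and "https://eth.hypersync.xyz/" match.
pub fn same_base_url(actual: &str, expected: &str) -> bool {
    fn trim(s: &str) -> &str {
        s.strip_suffix('/').unwrap_or(s)
    }
    trim(actual) == trim(expected)
}
```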

After these, the wasm test gets all the way through query serialization,
HTTP POST, retry-on-403, and response parsing: the full pipeline executes
without panicking.

Co-authored-by: claude <noreply@anthropic.com>
Adds three pieces driven by running the JS tests for real:

1. Wasm `Decoder` (new). Wraps `hypersync_client::Decoder` for JS,
   matching the @envio-dev/hypersync-client API shape:

       const decoder = Decoder.from_signatures([sig]);
       const decoded = decoder.decode_logs(logs);

   Output entries are `null` (no match) or `{ indexed, body }`, where
   each value is `{ val }` with `val: boolean | bigint | string |
   Array<{val}>`. Built on top of `js_sys::Reflect`/`BigInt` to mirror
   the napi binding's exact wire shape so JS code can swap clients.

2. tests/js/decode-bench.mjs (new). Compute-only benchmark — no
   network. Decodes the same in-memory ERC20 Transfer batch through
   both clients and reports:
     * Decoder construction cost
     * decode_logs(batch=1000) median + min + per-log µs + logs/s
     * single-log latency (boundary cost)
     * bundle size: dev .wasm, release .wasm, native .node

   Sample numbers from the sandbox (Node 22 on linux-x64):
     batch  decode: wasm 48 ms vs native 6 ms   → 7.85×
     single decode: wasm 0.054 ms vs native 0.007 ms → 7.90×
     wasm dev .wasm    25 MB
     wasm release      10 MB (no wasm-opt) / 6.76 MB (with wasm-opt, on user host)
     native linux .node 18 MB

   Headline: wasm decode is ~8× slower per log, but after wasm-opt the
   release wasm bundle is well under half the size of the native binary
   (6.76 MB vs 18 MB).

3. demo/index.html (new). Static browser page that loads the wasm
   directly via `wasm-pack build --target web --out-dir demo/pkg`. UI
   for: get_height/get_chain_id, get_arrow + apache-arrow JS decode,
   get (decoded simple types), and an in-memory Decoder bench. No
   server, no proxy, no native deps. demo/README.md documents how to
   build + serve.
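The JS-facing `Decoder` output shape described in item 1 can be modeled on the Rust side as a recursive enum plus an `Option` per input log. This is a hypothetical model for illustration: `Val`, `DecodedEvent`, `DecodeOutput`, and `count_matches` are not names from the crate. Because EVM integers can be up to 256 bits wide, wider than any Rust primitive, `bigint` is represented here as its decimal digits:

```rust
/// One decoded value: JS `{ val }` with val: boolean | bigint | string | Array<{val}>.
#[derive(Debug, Clone, PartialEq)]
pub enum Val {
    Bool(bool),
    /// Decimal digits of a JS bigint.
    BigInt(String),
    Str(String),
    Array(Vec<Val>),
}

/// A matched log: JS `{ indexed, body }`.
#[derive(Debug, Clone, PartialEq)]
pub struct DecodedEvent {
    pub indexed: Vec<Val>,
    pub body: Vec<Val>,
}

/// `decode_logs` output: one entry per input log, None (JS `null`) on no match.
pub type DecodeOutput = Vec<Option<DecodedEvent>>;

/// Count how many logs matched one of the decoder's signatures.
pub fn count_matches(out: &DecodeOutput) -> usize {
    out.iter().filter(|e| e.is_some()).count()
}
```

Keeping one output slot per input log, rather than filtering out non-matches, lets callers zip results back to the original logs by index.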

Also updates the main README to point at both benches and the demo, and
gitignores `pkg-web/` and `demo/pkg/`.

Co-authored-by: claude <noreply@anthropic.com>
Slide-style markdown (Marp-compatible) covering:

  - Platforms the napi-rs native client cannot reach (browsers,
    Cloudflare Workers, Vercel Edge, Fastly Compute, browser
    extensions, Electron renderer, React Native, Windows, 32-bit
    ARM, Deno/Bun edges, WASI hosts) and what wasm unlocks for each.
  - Honest perf table: wasm trails native on CPU-bound decode and
    parallel streaming, but is dominated by network I/O for
    interactive workloads where the gap doesn't matter.
  - Bundle size: ~15 MB across 5 native binaries vs. one ~1.5 MB
    `.wasm` (release + wasm-opt) usable everywhere.
  - Architectural note: same Rust crate, cfg-gated dependencies.
  - Pointers to the demo and the bench harness for live numbers.

Co-authored-by: claude <noreply@anthropic.com>
@coderabbitai
Contributor

coderabbitai Bot commented May 15, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 7afbe57b-fd11-4f1b-9d00-d89cf594623a

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Comment @coderabbitai help to get the list of available commands and usage tips.


Labels: None yet
Projects: None yet


1 participant