Hypersync wasm investigation #127
Conversation
First-pass WASM port. Exposes only `Client::get_arrow(query)` — no streaming, retries, rate limiting, capnp request encoding, or parquet output. Server response is parsed (capnp envelope), and per-table Arrow IPC payloads are returned as Uint8Arrays for JS to decode with apache-arrow. Includes a Node.js smoke test under tests/js that issues a small ERC20 transfer query and validates the IPC bytes decode into a non-empty logs table. Runs against a live hypersync server, gated on ENVIO_API_TOKEN. Co-authored-by: claude <noreply@anthropic.com>
The hypersync server sends LZ4/ZSTD-compressed Arrow IPC. Native rust decodes transparently via the `ipc_compression` feature, but apache-arrow JS does not yet support compressed batches. Round-trip each table through arrow's FileReader/FileWriter so JS sees plain uncompressed IPC. Also: - tests load .env automatically via Node 20.6+ process.loadEnvFile - README documents Homebrew LLVM prereq for zstd-sys on macOS - gitignore excludes pnpm-lock.yaml and tests/js/.env Co-authored-by: claude <noreply@anthropic.com>
First incremental step toward sharing the main client with wasm. Adds
target-conditional dependencies and cfg-gates the modules / methods that
require a multi-thread runtime, the filesystem, rayon's thread pool, or
SSE / native TLS:
* Cargo.toml splits tokio + reqwest + parquet + rayon + num_cpus +
reqwest-eventsource into native-only deps. Wasm gets `tokio`
(rt + macros + sync + time), `reqwest` (json only) plus `getrandom/js`
and `uuid/js` for entropy.
* `parquet_out`, `stream`, `rayon_async`, `util` modules and the
streaming/collect/stream_height client methods are now native-only.
* Replaces `std::time::Instant` with `web_time::Instant`, which is
a drop-in shim that compiles on both targets.
* Replaces `tokio::task::block_in_place` with a `run_blocking` helper
that is `block_in_place` on native and a direct call on wasm.
* Factors the reqwest builder behind `build_reqwest_client` so that
the native-only `no_gzip`/`user_agent` knobs are not referenced on
wasm.
* `column_mapping::apply_to_batch` now uses serial iteration on wasm.
* Integration tests (`tests/api_test.rs`, in-source `mod tests`) are
gated to native targets where they could exercise streaming.
After this commit `cargo build --target wasm32-unknown-unknown -p
hypersync-client` succeeds and the existing native build + 30 lib tests
still pass.
Co-authored-by: claude <noreply@anthropic.com>
…cating
With hypersync-client now wasm-clean, this crate no longer has to
re-implement the HTTP + capnp + arrow-IPC pipeline. It becomes a thin
wasm-bindgen wrapper that:
* Constructs a real `hypersync_client::Client` (via `ClientConfig`),
inheriting retries, payload-too-large halving, rate-limit tracking,
cap'n proto query caching, and connection refresh logic.
* Deserializes the JS query object into the canonical
`hypersync_client::net_types::Query`.
* Re-encodes the resulting `Vec<RecordBatch>` per table as uncompressed
Arrow IPC bytes (apache-arrow JS still doesn't support compressed
batches).
API surface gains:
* `get_height()`, `get_chain_id()` (already wasm-clean on the inner
client)
* `Client.with_config(obj)` for full ClientConfig control
* `decoded_logs` getter on `ArrowResponse`
* `client.url` getter
The JS smoke test now also exercises `get_height` / `get_chain_id` /
`with_config` and asserts `decoded_logs` is empty when no event
signature is supplied.
Co-authored-by: claude <noreply@anthropic.com>
Adds 5 tests that run on native against the same code paths the wasm
runtime exercises:
* encode_batches_empty — empty input does not panic
* encode_batches_round_trip — IPC bytes parse back to identical batch
* encode_batches_multiple — preserves batch order in a single file
* from_native_copies_header_and_empty_tables
* from_native_routes_tables_independently — guards against accidental
cross-wiring of logs into traces, etc.
These complement the JS smoke test (which hits a live server) by
locking down the pure-Rust encode path that bridges
`hypersync_client::ArrowResponse` → `Uint8Array`s in JS.
Co-authored-by: claude <noreply@anthropic.com>
Refactors the streaming pipeline so the same code drives native (tokio
multi-thread) and wasm (single-threaded event loop) targets:
* Replaces `tokio::task::JoinSet` with `futures::stream::FuturesUnordered`
in `stream_arrow`. FuturesUnordered polls many in-flight requests
inside one async task instead of spawning each one — required on wasm
(no `Send`, no thread pool) and equivalent for I/O-bound work on
native.
* Adds a `spawn_local_compat` helper: `tokio::spawn` on native,
`wasm_bindgen_futures::spawn_local` on wasm. The latter doesn't
require the future to be `Send`, which is critical because reqwest's
wasm `Response` is `!Send`.
* `rayon_async::spawn` now compiles for both targets — wasm just runs
the closure synchronously and returns a ready oneshot.
* `util::{hex_encode_batch, decode_logs_batch}` and
`column_mapping::apply_to_batch` cfg-switch their inner `.par_iter()`
to `.iter()` on wasm.
With these changes, `Client::collect`, `Client::collect_arrow`,
`Client::collect_events`, `Client::stream`, `Client::stream_events`, and
`Client::stream_arrow` are no longer cfg-gated. Only `collect_parquet`
(needs `tokio::fs`) and `stream_height` (needs `reqwest_eventsource`)
stay native-only.
In `hypersync-client-wasm`:
* Adds `client.get(query)` returning a JS object of decoded simple
types (Block / Transaction / Log / Trace) with bigint number
serialization.
* Adds `client.stream_arrow(query, config?)` returning an `ArrowStream`
handle whose `next()` yields one chunk at a time, terminating with
`undefined` when the stream is exhausted.
* Adds a JS smoke test (`tests/js/stream.test.mjs`) that streams a
block range with `concurrency: 4` and verifies multiple chunks plus
monotonic `next_block` advance.
Co-authored-by: claude <noreply@anthropic.com>
Adds tests/js/bench.mjs that drives the same workload through both the
wasm client and the native napi-rs binding and prints a side-by-side
table:
- cold get(): first-call latency
- warm get(): median + min over BENCH_ITERATIONS calls
- stream(): total time to drain a 2000-block range with
concurrency=8, plus chunk + log row counts
- bundle size: raw on-disk sizes for the wasm .wasm + .js shim
and each native platform .node binary
Both clients use the same SerializationFormat (CapnProto with query
caching, the library default) and the same field selection. Query
shapes differ — the native client takes camelCase JS objects, the wasm
client takes Rust-side snake_case via serde-wasm-bindgen — so the
benchmark passes a target-specific copy.
Run:
wasm-pack build --target nodejs --release
cd tests/js && npm install
ENVIO_API_TOKEN=... npm run bench
README updated with the bench instructions.
Co-authored-by: claude <noreply@anthropic.com>
Two fixes from running the JS smoke test for real:
1. `tokio::time::sleep` panics on wasm32-unknown-unknown — tokio's
internal `Instant::now()` falls back to `std::time::Instant::now()`,
which is unimplemented and panics. Symptom in node was:
std::time::Instant::now → core::panicking::panic_fmt
when the retry backoff fired after a non-success HTTP response.
Adds a `sleep_compat` helper that uses `tokio::time::sleep` on native
and `gloo_timers::future::TimeoutFuture` (a `setTimeout` wrapper) on
wasm. Replaces all five non-SSE `tokio::time::sleep` call sites in
`lib.rs`. Also drops the `time` feature from the wasm-only tokio dep
since we no longer use it there.
2. `query.test.mjs` was asserting `client.url === HYPERSYNC_URL`, but
`url::Url::to_string()` always normalizes to a trailing slash on the
authority (`https://eth.hypersync.xyz` → `.../`). Loosened the test
to accept either form.
After these, the wasm test gets all the way through query serialization,
HTTP POST, retry-on-403, response parsing — full pipeline executes
without panicking.
Co-authored-by: claude <noreply@anthropic.com>
Adds three pieces driven by running the JS tests for real:
1. Wasm `Decoder` (new). Wraps `hypersync_client::Decoder` for JS,
matching the @envio-dev/hypersync-client API shape:
const decoder = Decoder.from_signatures([sig]);
const decoded = decoder.decode_logs(logs);
Output entries are `null` (no match) or `{ indexed, body }`, where
each value is `{ val }` with `val: boolean | bigint | string |
Array<{val}>`. Built on top of `js_sys::Reflect`/`BigInt` to mirror
the napi binding's exact wire shape so JS code can swap clients.
2. tests/js/decode-bench.mjs (new). Compute-only benchmark — no
network. Decodes the same in-memory ERC20 Transfer batch through
both clients and reports:
* Decoder construction cost
* decode_logs(batch=1000) median + min + per-log µs + logs/s
* single-log latency (boundary cost)
* bundle size: dev .wasm, release .wasm, native .node
Sample numbers from the sandbox (Node 22 on linux-x64):
batch decode: wasm 48 ms vs native 6 ms → 7.85×
single decode: wasm 0.054 ms vs native 0.007 ms → 7.90×
wasm dev .wasm 25 MB
wasm release 10 MB (no wasm-opt) / 6.76 MB (with wasm-opt, on user host)
native linux .node 18 MB
Headline: wasm decode is ~8× slower per-log, but the release wasm
bundle is roughly a third of the native binary's size after
wasm-opt.
3. demo/index.html (new). Static browser page that loads the wasm
directly via `wasm-pack build --target web --out-dir demo/pkg`. UI
for: get_height/get_chain_id, get_arrow + apache-arrow JS decode,
get (decoded simple types), and an in-memory Decoder bench. No
server, no proxy, no native deps. demo/README.md documents how to
build + serve.
Also updates the main README to point at both benches and the demo, and
gitignores `pkg-web/` and `demo/pkg/`.
Co-authored-by: claude <noreply@anthropic.com>
Slide-style markdown (Marp-compatible) covering:
- Platforms the napi-rs native client cannot reach (browsers,
Cloudflare Workers, Vercel Edge, Fastly Compute, browser
extensions, Electron renderer, React Native, Windows, 32-bit
ARM, Deno/Bun edges, WASI hosts) and what wasm unlocks for each.
- Honest perf table: wasm trails native on CPU-bound decode and
parallel streaming, but is dominated by network I/O for
interactive workloads where the gap doesn't matter.
- Bundle size: ~15 MB across 5 native binaries vs. one ~1.5 MB
`.wasm` (release + wasm-opt) usable everywhere.
- Architectural note: same Rust crate, cfg-gated dependencies.
- Pointers to the demo and the bench harness for live numbers.
Co-authored-by: claude <noreply@anthropic.com>
|
Important Review skippedDraft detected. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Comment |
No description provided.