Request
Add support for the recently released NVIDIA Nemotron 3.5 ASR streaming model:
Why this model
Nemotron 3.5 ASR is a 600M-parameter streaming ASR model with native streaming support via a Cache-Aware FastConformer-RNNT architecture. Key properties relevant to eddy-audio:
- Streaming-first design: Configurable chunk sizes (80ms, 160ms, 320ms, 560ms, 1120ms) with cache-aware processing that avoids redundant overlapping computation — directly relevant to low-latency edge inference.
- Multilingual: 40 language-locales from a single model via language-ID prompt conditioning, with optional automatic language detection. This is broader than Parakeet V3 (24 languages) or Whisper large-v3-turbo.
- Same model family as Parakeet: Uses FastConformer-RNNT, an architecture eddy already supports via the OpenVINO backend for Parakeet TDT. The decoder/tokenizer integration path is partially established.
- Edge-relevant: Designed for voice-agent low-latency streaming workloads, which is eddy's target use case.
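
To make the chunk sizes above concrete, here is a minimal sketch that converts each listed chunk duration into a PCM sample count. The 16 kHz sample rate is an assumption (the issue does not state the model's input rate), and `samples_per_chunk` is a hypothetical helper, not an existing eddy function.

```cpp
#include <array>
#include <cstddef>

// Chunk durations listed in this issue, in milliseconds.
constexpr std::array<int, 5> kChunkMs = {80, 160, 320, 560, 1120};

// Assumption: 16 kHz mono input. The issue does not state the sample rate.
constexpr int kAssumedSampleRateHz = 16000;

// Hypothetical helper: number of PCM samples in one chunk of `chunk_ms`.
constexpr std::size_t samples_per_chunk(int chunk_ms,
                                        int sample_rate_hz = kAssumedSampleRateHz) {
    return static_cast<std::size_t>(chunk_ms) *
           static_cast<std::size_t>(sample_rate_hz) / 1000;
}
```

Under that assumed rate, the smallest chunk (80 ms) is 1280 samples and the largest (1120 ms) is 17920 samples, which gives a rough sense of the per-call input size a streaming API would need to handle.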
Suggested scope
- Export the model to OpenVINO IR (as done for Parakeet V2/V3 and Whisper), or evaluate whether the streaming cache-aware variant requires a custom export path.
- Add a model variant (e.g. nemotron-3.5-asr) to hf_fetch_models and the model registry.
- Implement the streaming chunk interface in the C++ API (eddy currently appears to operate on whole WAV files; streaming would be a new capability).
- Benchmark on Intel Core Ultra NPU and CPU, consistent with existing BENCHMARK_RESULTS.md.
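
As a starting point for the streaming chunk interface item above, here is a rough sketch of the buffering side only. `ChunkFeeder` and its methods are hypothetical names, not part of eddy's current API, and the callback stands in for whatever encoder/decoder step the exported model ends up needing. Zero-padding the final partial chunk is also an assumption; the model may expect different end-of-stream handling.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical helper, not part of eddy's current API: buffers incoming PCM
// and hands fixed-size chunks to a callback as soon as each one is complete.
class ChunkFeeder {
public:
    using ChunkCallback = std::function<void(const std::vector<float>&)>;

    ChunkFeeder(std::size_t chunk_samples, ChunkCallback on_chunk)
        : chunk_samples_(chunk_samples), on_chunk_(std::move(on_chunk)) {
        assert(chunk_samples_ > 0);
        buffer_.reserve(chunk_samples_);
    }

    // Accepts any number of samples; emits zero or more full chunks.
    void push(const float* samples, std::size_t count) {
        for (std::size_t i = 0; i < count; ++i) {
            buffer_.push_back(samples[i]);
            if (buffer_.size() == chunk_samples_) {
                on_chunk_(buffer_);
                buffer_.clear();
            }
        }
    }

    // Flushes a trailing partial chunk, zero-padded to the full chunk length.
    // Assumption: zero-padding is acceptable at end of stream.
    void finish() {
        if (buffer_.empty()) return;
        buffer_.resize(chunk_samples_, 0.0f);
        on_chunk_(buffer_);
        buffer_.clear();
    }

    // Number of samples buffered but not yet emitted.
    std::size_t pending() const { return buffer_.size(); }

private:
    std::size_t chunk_samples_;
    ChunkCallback on_chunk_;
    std::vector<float> buffer_;
};
```

A caller would construct the feeder with the chosen chunk length, call `push` as audio arrives from a microphone or socket, and call `finish` once at end of stream. Keeping the buffering separate from inference would let the existing whole-file path reuse it by pushing the full WAV in one call.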
Notes
- This aligns with the "Additional audio model support" roadmap item.
- The streaming capability would be a new dimension for eddy (currently batch/whole-file). Worth scoping whether streaming lands as part of this issue or as a separate prerequisite.
- Language-ID prompt conditioning and optional auto language detection are features not present in current Parakeet/Whisper integrations.