feat: Support NVIDIA Nemotron 3.5 ASR streaming model

## Request

Add support for the recently released NVIDIA Nemotron 3.5 ASR streaming model:

- Model: [nvidia/nemotron-3.5-asr-streaming-0.6b](https://huggingface.co/nvidia/nemotron-3.5-asr-streaming-0.6b)
- Released: 2026-06-04
- License: Commercial use permitted

## Why this model

Nemotron 3.5 ASR is a 600M-parameter streaming ASR model with native streaming support via a Cache-Aware FastConformer-RNNT architecture. Key properties relevant to eddy-audio:

- **Streaming-first design**: Configurable chunk sizes (80ms, 160ms, 320ms, 560ms, 1120ms) with cache-aware processing that avoids redundant overlapping computation — directly relevant to low-latency edge inference.
- **Multilingual**: 40 language-locales from a single model via language-ID prompt conditioning, with optional automatic language detection. This is broader than Parakeet V3 (24 languages); Whisper large-v3-turbo covers more languages but is not a native streaming model.
- **Same model family as Parakeet**: Uses a FastConformer encoder with a transducer (RNNT) decoder, close to the FastConformer-TDT architecture eddy already supports via the OpenVINO backend for Parakeet TDT. The decoder/tokenizer integration path is partially established.
- **Edge-relevant**: Designed for voice-agent low-latency streaming workloads, which is eddy's target use case.
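
To make the chunk sizes above concrete, here is a minimal sketch of how they would map to per-chunk sample counts. The 16 kHz sample rate is an assumption (the issue does not state the model's input rate), and `samples_per_chunk` and `is_supported_chunk_ms` are hypothetical helper names, not existing eddy functions.

```cpp
#include <array>
#include <cstddef>

// Chunk durations listed in this issue, in milliseconds.
constexpr std::array<int, 5> kChunkMs = {80, 160, 320, 560, 1120};

// ASSUMPTION: 16 kHz mono input. The issue does not state the sample rate.
constexpr int kAssumedSampleRateHz = 16000;

// Number of PCM samples in one chunk of the given duration.
constexpr std::size_t samples_per_chunk(int chunk_ms,
                                        int sample_rate_hz = kAssumedSampleRateHz) {
    return static_cast<std::size_t>(chunk_ms) *
           static_cast<std::size_t>(sample_rate_hz) / 1000;
}

// True if chunk_ms is one of the durations listed above.
inline bool is_supported_chunk_ms(int chunk_ms) {
    for (int ms : kChunkMs) {
        if (ms == chunk_ms) return true;
    }
    return false;
}
```

Under that assumption, an 80 ms chunk is 1280 samples and a 1120 ms chunk is 17920 samples, which bounds the per-call input buffer a streaming API would need.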

## Suggested scope

1. Export the model to OpenVINO IR (as done for Parakeet V2/V3 and Whisper), or evaluate whether the streaming cache-aware variant requires a custom export path.
2. Add a model variant (e.g. `nemotron-3.5-asr`) to `hf_fetch_models` and the model registry.
3. Implement the streaming chunk interface in the C++ API (eddy currently appears to operate on whole WAV files; streaming would be a new capability).
4. Benchmark on Intel Core Ultra NPU and CPU, consistent with existing BENCHMARK_RESULTS.md.
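
As a starting point for item 3, the sketch below shows one possible shape for the input side of a streaming interface: a buffer that accepts audio in arbitrary-sized pushes and hands fixed-size chunks to a callback. `ChunkAccumulator` and its methods are hypothetical and not part of eddy's current API. The callback stands in for whatever cache-aware encoder step the exported model ends up needing, and zero-padding the final partial chunk is an assumption that would need to be checked against the model's expected end-of-stream handling.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical helper: re-chunks incoming audio into fixed-size blocks.
class ChunkAccumulator {
public:
    using ChunkCallback = std::function<void(const float*, std::size_t)>;

    ChunkAccumulator(std::size_t chunk_samples, ChunkCallback on_chunk)
        : chunk_samples_(chunk_samples), on_chunk_(std::move(on_chunk)) {}

    // Append samples and invoke the callback once per complete chunk.
    void push(const float* samples, std::size_t count) {
        if (chunk_samples_ == 0) return;  // nothing sensible to emit
        pending_.insert(pending_.end(), samples, samples + count);
        std::size_t offset = 0;
        while (pending_.size() - offset >= chunk_samples_) {
            on_chunk_(pending_.data() + offset, chunk_samples_);
            offset += chunk_samples_;
        }
        // Drop the samples that were handed to the callback.
        pending_.erase(pending_.begin(),
                       pending_.begin() + static_cast<std::ptrdiff_t>(offset));
    }

    // End of stream: zero-pad the remainder to a full chunk and emit it.
    // ASSUMPTION: zero-padding is acceptable for the final chunk.
    void flush() {
        if (chunk_samples_ == 0 || pending_.empty()) return;
        pending_.resize(chunk_samples_, 0.0f);
        on_chunk_(pending_.data(), chunk_samples_);
        pending_.clear();
    }

    // Samples buffered but not yet emitted.
    std::size_t pending() const { return pending_.size(); }

private:
    std::size_t chunk_samples_;
    ChunkCallback on_chunk_;
    std::vector<float> pending_;
};
```

Keeping the re-chunking separate from inference would let the existing whole-file path reuse it by pushing an entire WAV buffer and then calling `flush()`.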

## Notes

- This aligns with the "Additional audio model support" roadmap item.
- The streaming capability would be a new dimension for eddy (currently batch/whole-file). Worth scoping whether streaming lands as part of this issue or as a separate prerequisite.
- Language-ID prompt conditioning and optional auto language detection are features not present in current Parakeet/Whisper integrations.
