@pennycoders
Contributor

@pennycoders pennycoders commented Aug 2, 2025

Summary

This PR introduces bidirectional audio support to JetKVM, enabling both audio output (listening to the managed device) and audio input (microphone from the browser to the device). Audio is implemented using an in-process CGO architecture that calls C code directly for ALSA audio capture/playback and Opus encoding/decoding. JetKVM presents itself to the managed device as a USB Audio Class 1 (UAC1) gadget providing both stereo speakers and a stereo microphone interface over USB.

Key Features:

  • Bidirectional stereo audio (48kHz, 16-bit, 2 channels)
  • In-process CGO implementation for low latency and simplicity
  • USB Audio Gadget (UAC1) integration
  • WebRTC-based real-time streaming with Opus codec
  • Frontend controls for enabling/disabling audio output and input
  • HDMI or USB audio capture source selection
  • SDP munging for proper stereo audio support in browsers
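
As background for the last bullet: in RFC 7587 the Opus `fmtp` parameters `stereo` and `sprop-stereo` both default to 0 (mono), so stereo usually has to be requested explicitly in the SDP. Below is a minimal sketch of that kind of munging. It is written in Go only for consistency with the backend; the function names `mungeOpusStereo` and `hasParam` are hypothetical, and the PR's actual munging may live elsewhere (for example in the frontend) and differ in detail.

```go
package main

import (
	"fmt"
	"strings"
)

// hasParam reports whether the semicolon-separated fmtp parameter list
// already contains exactly the given key=value pair.
func hasParam(params, want string) bool {
	for _, p := range strings.Split(params, ";") {
		if strings.TrimSpace(p) == want {
			return true
		}
	}
	return false
}

// mungeOpusStereo (hypothetical helper) appends stereo=1 and sprop-stereo=1
// to the fmtp line of every payload type that the SDP maps to Opus.
// It assumes CRLF line endings, as required for SDP.
func mungeOpusStereo(sdp string) string {
	lines := strings.Split(sdp, "\r\n")

	// First pass: find the payload types whose rtpmap names Opus.
	opusPT := map[string]bool{}
	for _, l := range lines {
		if strings.HasPrefix(l, "a=rtpmap:") && strings.Contains(strings.ToLower(l), " opus/") {
			pt := strings.SplitN(strings.TrimPrefix(l, "a=rtpmap:"), " ", 2)[0]
			opusPT[pt] = true
		}
	}

	// Second pass: extend the matching fmtp lines.
	for i, l := range lines {
		if !strings.HasPrefix(l, "a=fmtp:") {
			continue
		}
		parts := strings.SplitN(strings.TrimPrefix(l, "a=fmtp:"), " ", 2)
		if len(parts) != 2 || !opusPT[parts[0]] {
			continue
		}
		params := parts[1]
		for _, p := range []string{"stereo=1", "sprop-stereo=1"} {
			if !hasParam(params, p) {
				params += ";" + p
			}
		}
		lines[i] = "a=fmtp:" + parts[0] + " " + params
	}
	return strings.Join(lines, "\r\n")
}

func main() {
	sdp := "a=rtpmap:111 opus/48000/2\r\na=fmtp:111 minptime=10;useinbandfec=1\r\n"
	fmt.Print(mungeOpusStereo(sdp))
}
```

The helper is idempotent: running it on an SDP that already carries both parameters leaves the SDP unchanged.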

Credits

Thanks!
Alex

@CLAassistant

CLAassistant commented Aug 2, 2025

CLA assistant check
All committers have signed the CLA.

@pennycoders pennycoders changed the title from "JetKVM Advanced, CGO Audio Support" to "JetKVM Advanced, CGO-based Audio Support" Aug 2, 2025
@adamshiervani adamshiervani added this to the 0.5.0 milestone Aug 4, 2025
@adamshiervani adamshiervani moved this to Backlog in JetKVM Aug 4, 2025
@adamshiervani adamshiervani moved this from Backlog to In progress in JetKVM Aug 4, 2025
@adamshiervani adamshiervani moved this from In progress to In review in JetKVM Aug 4, 2025
@adamshiervani adamshiervani moved this from In review to In progress in JetKVM Aug 4, 2025
@adamshiervani adamshiervani moved this from In progress to In Review in JetKVM Aug 4, 2025
@adamshiervani adamshiervani mentioned this pull request Aug 4, 2025
3 tasks
@pennycoders
Contributor Author

Great news! I'll soon update this PR with Audio Input pass-through functionality too

@adamshiervani adamshiervani linked an issue Aug 4, 2025 that may be closed by this pull request
@pennycoders pennycoders changed the title from "JetKVM Advanced, CGO-based Audio Support" to "JetKVM Advanced, CGO-based 2-way Audio Support" Aug 4, 2025
@IDisposable
Contributor

This is amazing!

Would it be possible to forward the audio channel on the device's HDMI input to the browser?

By this I mean that if I set my host/controlled device's audio output to the JetKVM virtual monitor, the sound will come in over the HDMI stream, which might be possible to extract (I know nothing about that hardware), so the host audio could come through without an additional (virtual) audio device.

image

@pennycoders
Contributor Author

pennycoders commented Aug 8, 2025

Hi @IDisposable

Glad you like this functionality. HDMI audio would be useful mainly to free up as much of that USB bandwidth as possible. I'm actually looking into this; however, it is a little trickier, as it most likely requires changes in the rv1106-system repo containing the OS too.

In case I do manage to pull that off before the v0.5.0 release for which this functionality has been scheduled, I'll update this PR.

Thanks,
Alex

@vvns

vvns commented Aug 10, 2025

JetKVM Audio PR Review & Test Feedback

Hi @pennycoders ,

First, thanks for the work on bringing audio and mic support into JetKVM.

I’ve tested the new functionality in a local LAN environment with both playback and microphone streaming active, including during real-world scenarios like a Teams call.

Test conditions:

  • Setup: Wired LAN, low network latency, tested with a headset mic.
  • Modes tested: Low, Medium, High, Ultra for both playback and mic.

Main observations:

  1. Mic quality constant across modes

    • Microphone stream sounds the same in all modes.
    • Quality is acceptable but not “HD” — there is a constant background noise floor, even with a headset mic on a clean LAN.
  2. Ultra playback distortion

    • In Ultra mode only, playback sometimes has a warped/buzzy/distorted effect.
    • Low/Medium/High playback modes sound good and consistent.
  3. Latency when mic is active

    • Mouse and keyboard control become noticeably less responsive whenever the mic stream is active, even on a low-latency LAN connection.
    • Likely due to video/control WebSocket traffic competing with audio packets on the same channel.
  4. Packet loss

    • Playback drop rate: ~22%
    • Mic drop rate: ~13%
    • Loss observed despite no network congestion, pointing to buffering or scheduling bottlenecks.

Potential Improvements (Technical):

  1. Separate transport channels

    • Move audio to a dedicated WebSocket endpoint (e.g. /ws/audio) or use WebRTC for audio transport.
    • Prevents video/control from being delayed by audio bursts.
  2. Opus tuning exposure

    • Make parameters adjustable via UI or JSON config:
      • bitrate, frame size, complexity
      • FEC, DTX, VBR/CBR
    • Lets users balance latency, quality, and bandwidth.
  3. ALSA parameter control

    • Expose period_time and buffer_time for fine-tuning latency vs underrun protection.
  4. Queue management

    • Use a bounded audio frame queue with drop-oldest to prevent latency spikes when encoding falls behind.
  5. Noise reduction & echo cancellation

    • Integrate RNNoise or WebRTC AEC/NS for mic clarity.
    • Even simple high-pass filtering can reduce constant hum.
  6. Thread/process separation

    • Run audio encode/decode in its own goroutine/process to isolate timing from video/control.
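
To make suggestion 4 (bounded queue with drop-oldest) concrete, here is a minimal sketch in Go. The type `FrameQueue` and its methods are hypothetical names for illustration and are not taken from the PR; a production version would more likely use a ring buffer and a blocking or signalled `Pop`.

```go
package main

import (
	"fmt"
	"sync"
)

// FrameQueue (hypothetical) is a bounded FIFO of encoded audio frames.
// When it is full, Push discards the oldest frame so that latency cannot
// grow beyond capacity frames, at the cost of an audible gap.
type FrameQueue struct {
	mu       sync.Mutex
	frames   [][]byte
	capacity int
	dropped  int
}

// NewFrameQueue creates a queue holding at most capacity frames (minimum 1).
func NewFrameQueue(capacity int) *FrameQueue {
	if capacity < 1 {
		capacity = 1
	}
	return &FrameQueue{frames: make([][]byte, 0, capacity), capacity: capacity}
}

// Push appends a frame, evicting the oldest one if the queue is full.
func (q *FrameQueue) Push(frame []byte) {
	q.mu.Lock()
	defer q.mu.Unlock()
	if len(q.frames) == q.capacity {
		copy(q.frames, q.frames[1:]) // shift left, overwriting the oldest frame
		q.frames = q.frames[:len(q.frames)-1]
		q.dropped++
	}
	q.frames = append(q.frames, frame)
}

// Pop removes and returns the oldest frame, or false if the queue is empty.
func (q *FrameQueue) Pop() ([]byte, bool) {
	q.mu.Lock()
	defer q.mu.Unlock()
	if len(q.frames) == 0 {
		return nil, false
	}
	frame := q.frames[0]
	copy(q.frames, q.frames[1:])
	q.frames = q.frames[:len(q.frames)-1]
	return frame, true
}

// Dropped returns how many frames have been evicted so far.
func (q *FrameQueue) Dropped() int {
	q.mu.Lock()
	defer q.mu.Unlock()
	return q.dropped
}

func main() {
	q := NewFrameQueue(2)
	for i := byte(1); i <= 3; i++ {
		q.Push([]byte{i})
	}
	for {
		f, ok := q.Pop()
		if !ok {
			break
		}
		fmt.Println("frame", f[0])
	}
	fmt.Println("dropped:", q.Dropped())
}
```

Exposing the drop counter makes it possible to report a drop rate like the ones measured above.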

Happy to re-run these tests and provide before/after metrics once adjustments are implemented.
This PR is already a big step forward, and with these improvements, we could get low-latency, clean mic audio without impacting remote control responsiveness.

@pennycoders
Contributor Author

Hi @vvns

Thank you very much for putting this through its paces! This is great feedback that I can definitely work with. I initially encountered the interference with the keyboard and mouse that you mention, and I made some optimizations. Do you happen to know the commit hash you tested? Was it the latest version of my branch? I am asking because I've tested actual calls with the latest implementation and it was definitely usable.

I will break down your feedback and see what I can do about each of the items.

If you want we can discuss more in-depth on other channels too.

Thanks,

Alex

@vvns

vvns commented Aug 10, 2025

Hi @pennycoders ,

Glad the feedback is useful! 👍
I’ve confirmed that my tests were run on the latest commit at the time — 5f905e7 from your feat/audio-support branch — so the results I reported already include your most recent optimizations.

The version is indeed usable, but in slightly more demanding conditions (e.g., during calls or with sustained mic usage) the remote control latency — which was very low before the audio feature — increases significantly, to the point where slow mouse movement becomes noticeably delayed.

I can still retest to be sure nothing was missed, but the latency impact with mic active, packet loss, and Ultra mode distortion were all observed on that commit.

I’m happy to continue sharing feedback as you push further updates, so we can iterate quickly toward the best possible audio and control experience. Let me know which channel you’d prefer for more direct discussion, so you can share any details privately if needed.

Thanks again for the great work — we’re close to a fully smooth audio + control experience.

@pennycoders
Contributor Author

Hi @vvns,

Are you on the JetKVM Discord?

If so, we can discuss there. What's your username?

Thanks,
Alex

Contributor

@IDisposable IDisposable left a comment

This really looks nice. All my comments are questions or nits, so feel free to ignore them... I wonder if we need to be more explicit in the priority assignment of the other RTC channels (media/serial/rpc), as we really want to ensure the control signals get through at very high fidelity... It might even be worthwhile splitting up the RPC messages into control vs. advisory messaging, but that's not this PR :)

@am-zed

am-zed commented Aug 23, 2025

Audio works in Firefox and Chrome, but clicking on the Audio button in Chrome throws the error "Cannot read properties of undefined (reading 'addEventListener')":

image

@pennycoders
Contributor Author

pennycoders commented Aug 24, 2025

Hi! I am actually developing and testing this using Chrome. You mean the Audio button in the Actions bar (the top menu), right? How did you deploy the feat/audio-support branch to your JetKVM? Also, what is your Chrome version? Is there anything particular about your networking setup, such as WebSockets or WebRTC being blocked? Are any unusual Chrome extensions installed? Can you please try again with the latest version of the branch?

Thanks!

- Delete unused audio_common.c and audio_common.h (237 lines of dead code)
- Remove redundant encoder/decoder pointer comparisons (mutex already held)
- Remove unused err variables in encode/decode hot paths
- Fix OPUS_BANDWIDTH constant from 1104 to 1105 (fullband 20kHz vs superwideband 12kHz)
- Fix PacketLossPerc default from 0 to 20 to match C layer default
- Add early return in FEC recovery when pcm_frames <= 0 to avoid zero-frame ALSA writes
- Remove unused InputRelay fields (source, ctx, cancel) and unused NewInputRelay parameter
- Add async cleanup on timeout in SetAudioOutputEnabled/SetAudioInputEnabled
- Add missing __sync_synchronize() in capture init to match playback init path
- Add error handling for os.Setenv calls with warning logs on failure
- Add logging when ALSA channel map is unavailable (assumes standard L/R order)
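
For context on the OPUS_BANDWIDTH fix in the list above: libopus defines its bandwidth constants in `opus_defines.h` as consecutive integers, so an off-by-one selects a different audio passband. The sketch below mirrors those values in Go; the Go identifiers and the `passbandHz` helper are illustrative only and are not the PR's code.

```go
package main

import "fmt"

// Values mirror the OPUS_BANDWIDTH_* macros from libopus' opus_defines.h.
// The Go names are illustrative, not taken from the PR.
const (
	OpusBandwidthNarrowband    = 1101 // 4 kHz passband
	OpusBandwidthMediumband    = 1102 // 6 kHz passband
	OpusBandwidthWideband      = 1103 // 8 kHz passband
	OpusBandwidthSuperwideband = 1104 // 12 kHz passband
	OpusBandwidthFullband      = 1105 // 20 kHz passband
)

// passbandHz maps a bandwidth constant to its audio passband in Hz.
// The second return value is false for unknown constants.
func passbandHz(bandwidth int) (int, bool) {
	switch bandwidth {
	case OpusBandwidthNarrowband:
		return 4000, true
	case OpusBandwidthMediumband:
		return 6000, true
	case OpusBandwidthWideband:
		return 8000, true
	case OpusBandwidthSuperwideband:
		return 12000, true
	case OpusBandwidthFullband:
		return 20000, true
	}
	return 0, false
}

func main() {
	for _, bw := range []int{1104, 1105} {
		hz, _ := passbandHz(bw)
		fmt.Printf("bandwidth %d -> %d Hz passband\n", bw, hz)
	}
}
```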
@pennycoders
Contributor Author

@IDisposable @adamshiervani I've also implemented dynamic resampler updates. For that feature to be supported, the following PR has to be merged first (the audio works without it too, though, defaulting to a 48 kHz source sample rate): jetkvm/rv1106-system#50

Please review, test, and let me know. @SuperKali has been of great help testing this from every angle (he is still seeing some issues with an Armbian SBC, which I was unable to reproduce: basically, the audio does not sound right when the SBC outputs at 44.1 kHz, and we are still tracking that down). USB audio works for him as well, though, and HDMI works quite well in my case too. Testing on as many hardware configurations as possible would be great.

Thanks,
Alex

- Copy opus data in ReadMessage to prevent aliasing with internal buffer
- Remove unused ctx field from OutputRelay struct
- Fix OPUS_BANDWIDTH comment accuracy (20kHz passband)
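
The aliasing problem named in the first bullet above is a general Go pitfall: a slice returned from a reused internal buffer is silently overwritten by the next read. The sketch below reproduces it with a hypothetical `frameReader` type; it is not the PR's `ReadMessage`, only an illustration of why the copy is needed.

```go
package main

import "fmt"

// frameReader (hypothetical) reuses one internal buffer for every read,
// which is cheap but makes returned slices share memory.
type frameReader struct {
	buf [4]byte
}

// fill simulates receiving a frame by writing b into the whole buffer.
func (r *frameReader) fill(b byte) int {
	for i := range r.buf {
		r.buf[i] = b
	}
	return len(r.buf)
}

// readAliased returns a slice that points into the internal buffer.
// The caller's data changes as soon as the next frame arrives.
func (r *frameReader) readAliased(b byte) []byte {
	n := r.fill(b)
	return r.buf[:n]
}

// readCopied returns an independent copy, so later reads cannot corrupt it.
func (r *frameReader) readCopied(b byte) []byte {
	n := r.fill(b)
	out := make([]byte, n)
	copy(out, r.buf[:n])
	return out
}

func main() {
	r := &frameReader{}

	aliased := r.readAliased(1)
	r.readAliased(2) // overwrites the memory behind aliased
	fmt.Println("aliased first frame now reads:", aliased[0])

	copied := r.readCopied(3)
	r.readCopied(4) // does not affect copied
	fmt.Println("copied first frame still reads:", copied[0])
}
```

The copy costs one allocation per frame; if that matters on the hot path, a buffer pool is a common alternative.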
Hardware sample rate is auto-negotiated by ALSA, and frame size is derived
from it. These parameters were being passed but ignored - remove them from
update_audio_constants() and update_audio_decoder_constants() signatures.
All callers pass non-NULL pointers for actual_rate_out and actual_frame_size_out.
Sample rate detection and periodic checking is only relevant for HDMI
where the source can change rates. USB Gadget is fixed at 48kHz.
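
As a worked example of "frame size is derived from it": the number of samples per channel in one frame is the sample rate multiplied by the frame duration. The 20 ms duration below is an assumption (a common Opus frame length), not a value stated in the commit message, and `frameSamples` is a hypothetical helper.

```go
package main

import "fmt"

// frameSamples returns the number of samples per channel in one frame.
// frameMs is the frame duration in milliseconds (assumed, not from the PR).
func frameSamples(sampleRateHz, frameMs int) int {
	return sampleRateHz * frameMs / 1000
}

func main() {
	// Rates taken from the EDID Audio Data Block listed in this PR.
	for _, rate := range []int{32000, 44100, 48000} {
		fmt.Printf("%d Hz -> %d samples per 20 ms frame\n", rate, frameSamples(rate, 20))
	}
}
```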
- Add CTA-861 extension block with HDMI Vendor-Specific Data Block
- Include Audio Data Block (2ch LPCM, 32/44.1/48kHz, 16/20/24-bit)
- Add Speaker Allocation and Video Capability data blocks
- Set display name to JetKVM with proper display size (71x40cm)
- EDID now passes edid-decode validation
- Use EDID 1.3 for HDMI specification compliance
- Add HDMI VSDB (OUI 00-0C-03) to enable audio on Linux/Ubuntu
- Add Display Range Limits descriptor for validation compliance
- Include standard resolutions: 1080p, 720p, 480p
- Set display name to "JetKVM v1"
- Passes edid-decode validation
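
One of the checks a validator such as edid-decode performs is the per-block checksum: all 128 bytes of an EDID block must sum to 0 modulo 256, with the last byte chosen to make that true. A minimal sketch of that rule follows; the function names are hypothetical and the block contents are placeholder data, not the JetKVM EDID.

```go
package main

import (
	"errors"
	"fmt"
)

const edidBlockSize = 128

// edidChecksum computes the value byte 127 must hold so that the whole
// 128-byte block sums to 0 modulo 256.
func edidChecksum(block []byte) (byte, error) {
	if len(block) != edidBlockSize {
		return 0, errors.New("EDID block must be exactly 128 bytes")
	}
	var sum byte // byte arithmetic wraps modulo 256
	for _, b := range block[:edidBlockSize-1] {
		sum += b
	}
	return byte(0) - sum, nil
}

// validEDIDBlock reports whether the block has the right size and checksum.
func validEDIDBlock(block []byte) bool {
	if len(block) != edidBlockSize {
		return false
	}
	var sum byte
	for _, b := range block {
		sum += b
	}
	return sum == 0
}

func main() {
	// Placeholder block: only the fixed 8-byte EDID header is filled in.
	block := make([]byte, edidBlockSize)
	copy(block, []byte{0x00, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0x00})

	sum, err := edidChecksum(block)
	if err != nil {
		panic(err)
	}
	block[edidBlockSize-1] = sum
	fmt.Printf("checksum byte: 0x%02X, valid: %v\n", sum, validEDIDBlock(block))
}
```

The same rule applies to each extension block, including the CTA-861 block added here.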
@J-Bu

J-Bu commented Dec 8, 2025

Hi,

I've started using and testing this branch. First thanks for the great work.

Audio output via USB and HDMI is working for me without any problems, but audio input did not work at first. The problem was the default format used by Windows: it was set to 2 channels @ 48kHz. After changing it to 1 channel @ 48kHz, sound input also started working.

Screenshot From 2025-12-08 10-35-21

Maybe the settings of the emulated input device can be changed so that it does not advertise support for the 2-channel format.

Test setup:

Host (device JetKVM is connected to):

  • Device: Lenovo Thinkpad P15 Gen1
  • OS: Windows 11

Client:

  • Device: Framework laptop with AMD Ryzen AI 7 350
  • OS: Linux (gentoo)
  • Browser: Firefox and Chromium

Change UAC1 gadget p_chmask from 0x01 (Left Front) to 0x04 (Center Front)
for the mono microphone endpoint.

This addresses an issue where Windows defaults to "2 channels @ 48kHz"
for the USB audio input device, even though the device advertises mono.
Users had to manually change Windows audio settings to "1 channel @ 48kHz"
for the microphone to work.

Per USB Audio Class 1.0 specification, mono streams should use the
Center Front (D2) channel position rather than Left Front (D0), as
Left Front may be interpreted as one half of a stereo pair.

The channel count remains 1 (num_channels(0x04) = 1 bit set = 1 channel),
only the spatial position metadata changes in the USB descriptor.

References:
- USB Device Class Definition for Audio Devices 1.0, Section 3.7.2.3
- Silicon Labs AN295: USB Audio Class Tutorial
  https://www.silabs.com/documents/public/application-notes/AN295.pdf
- https://stackoverflow.com/questions/23519753/usb-audio-descriptor
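
The channel-count argument in the commit message above can be checked with a few lines of Go: the number of channels is the number of set bits in the channel mask, so moving the single bit from D0 to D2 changes the position but not the count. The constant names below are illustrative; the bit positions follow the UAC1 channel configuration bitmap (D0 Left Front, D1 Right Front, D2 Center Front).

```go
package main

import (
	"fmt"
	"math/bits"
)

// UAC1 spatial location bits (illustrative names).
const (
	chLeftFront   uint32 = 1 << 0 // D0, 0x01
	chRightFront  uint32 = 1 << 1 // D1, 0x02
	chCenterFront uint32 = 1 << 2 // D2, 0x04
)

// numChannels returns how many channels a channel mask describes,
// which is the count of set bits in the mask.
func numChannels(mask uint32) int {
	return bits.OnesCount32(mask)
}

func main() {
	fmt.Println("old mono mask 0x01:", numChannels(chLeftFront), "channel(s)")
	fmt.Println("new mono mask 0x04:", numChannels(chCenterFront), "channel(s)")
	fmt.Println("stereo mask 0x03:  ", numChannels(chLeftFront|chRightFront), "channel(s)")
}
```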
@pennycoders
Contributor Author

Hi, can you please try it now, without updating anything on the windows side?

Thanks,
Alex

@J-Bu

J-Bu commented Dec 8, 2025

Yes, it seems to work now. "1 channel @ 48kHz" is the only option now:

Screenshot From 2025-12-08 15-42-17

The TC358743 HDMI receiver stops I2S clocks during silence periods,
causing corrupted samples (isolated ±32767 spikes) when clocks restart.
This manifests as audible clicks/pops during quiet audio passages.

Add NEON-optimized glitch filter that:
- Detects extreme values (>±32000) surrounded by low-amplitude neighbors
- Replaces glitches with interpolated values from adjacent samples
- Uses SIMD fast-path to skip clean audio chunks with zero overhead
- Only runs for HDMI capture (USB audio unaffected)

The filter processes 16 samples per iteration using ARM NEON intrinsics,
resulting in ~0.005% CPU overhead on Cortex-A7 at 1.2GHz.
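
For readers who want the detection logic without the NEON details, here is a scalar sketch of the idea described above, in Go rather than the C with intrinsics used by the commit. The spike threshold of 32000 comes from the commit message; the `quiet` threshold for "low-amplitude neighbors" and the function name `filterGlitches` are assumptions, and the real filter may differ in how it chooses neighbors and interpolates.

```go
package main

import "fmt"

// abs32 returns |v| as int32, which is safe even for v == -32768.
func abs32(v int16) int32 {
	if v < 0 {
		return -int32(v)
	}
	return int32(v)
}

// filterGlitches (hypothetical scalar sketch) scans interleaved PCM and
// replaces isolated spikes with the average of the previous and next
// sample of the same channel. A sample counts as a spike when its
// magnitude exceeds spike while both same-channel neighbors are below
// quiet. It returns the number of samples replaced.
func filterGlitches(samples []int16, channels int, spike, quiet int32) int {
	if channels < 1 {
		return 0
	}
	fixed := 0
	for i := channels; i < len(samples)-channels; i++ {
		if abs32(samples[i]) <= spike {
			continue // fast path: not an extreme value
		}
		prev, next := samples[i-channels], samples[i+channels]
		if abs32(prev) < quiet && abs32(next) < quiet {
			samples[i] = int16((int32(prev) + int32(next)) / 2)
			fixed++
		}
	}
	return fixed
}

func main() {
	// Mono example: a single full-scale spike in near silence.
	pcm := []int16{10, -5, 32767, 7, 3}
	n := filterGlitches(pcm, 1, 32000, 1000)
	fmt.Println("replaced:", n, "result:", pcm)
}
```

Genuinely loud passages are left alone because their neighbors are also loud, which is what keeps the filter from clipping real peaks.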
@pennycoders
Contributor Author

Hi @IDisposable, @adamshiervani, @ym,

I've finally managed to address the occasional HDMI audio artifacts (clicks/pops during quiet passages). The root cause is two-fold: ACR PLL instability during silence transitions, and I2S clock-stop behavior generating corrupted samples.

The fix is two-fold:

  • Kernel driver (Improve TC358743 HDMI audio quality rv1106-system#50): Disables automatic audio muting to prevent ACR PLL unlock during silence, tightens ACR clock tolerance from 976-3906 PPM to 122-244 PPM (8-32x better), increases divider settling time, and enables dynamic sample rate detection for non-48kHz sources.

  • Software filter (commit f39a838): NEON-optimized filter that detects remaining I2S glitches (isolated ±32767/±16383 spikes during silence) and removes them via linear interpolation. Zero overhead for clean audio.

Once both are merged, HDMI audio quality should be greatly improved. Would be great if you could test the combination as well.

Thanks,
Alex

The script was copied but never executed, causing Docker-based builds
(via dev_deploy.sh) to fail due to missing ALSA/Opus/SpeexDSP libraries.

Reported-by: J-Bu
@J-Bu

J-Bu commented Dec 10, 2025

I flashed the image and app with the HDMI fixes and it is working for me. But to be fair, I also did not notice any clicks/pops before those changes.

@pennycoders
Contributor Author

Thanks! They were happening at low volume, when sound with long silence periods was playing. I've spotted it on an Ubuntu 24.04 laptop.

It was one of those things driving me nuts because I couldn't figure it out... until I did. Just for reference, this was the video that made it occur: https://youtu.be/ucZl6vQ_8Uo?si=HpnJRBdnBJFn8OeA

@IDisposable
Contributor

Resolved the merge conflicts... We should probably delete all changes to the language message files (except the English one) and then rerun npm run i18n:machine-translate because (especially for the ZH Chinese) things have been updated a lot since this merry journey started.

Projects

Status: In progress

Development

Successfully merging this pull request may close these issues.

Add sound support