This is critical for server use cases, since it improves concurrency on the GPU. Faster-Whisper added this in SYSTRAN/faster-whisper#856. For live use cases this means multiple concurrent transcriptions can run on the same GPU with a single model, which would significantly lower deployment costs.
Is there a way to do this now, or is it on the roadmap?
Thanks!