[FEAT]: Implement OpenAI Whisper for audio-to-text transcription #499

@marcvergees

📝 Description

Add OpenAI Whisper audio transcription to FireForm so that users can record audio input and have it converted to text automatically. The feature should include a backend API endpoint for audio processing and a frontend microphone button that triggers recording, with an editable textarea for correcting the transcription afterwards.

💡 Rationale

Audio transcription improves accessibility and user experience for form filling, particularly for users who prefer voice input or have mobility constraints. Integrating OpenAI Whisper provides accurate, reliable speech-to-text conversion that can be corrected by users before form submission.

🛠️ Proposed Solution

Implement audio transcription functionality across backend and frontend:

Backend:

  • Create new API endpoint to receive audio files
  • Integrate OpenAI Whisper API for audio-to-text conversion
  • Add audio file validation and error handling
  • Update requirements.txt with OpenAI library dependency
  • Add Whisper API key configuration to environment variables
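
The backend steps above could be sketched roughly as follows. This is a minimal illustration, not FireForm's actual code: the names `validate_audio`, `make_client`, `transcribe`, and `AudioValidationError` are hypothetical, the environment variable is assumed to be `OPENAI_API_KEY`, and the extension list and 25 MB cap reflect the limits OpenAI documents for its hosted transcription endpoint. The client is passed in as a parameter so the function can be exercised without network access.

```python
import os
from pathlib import Path

# Assumed limits, taken from OpenAI's documented constraints for the hosted API.
ALLOWED_EXTENSIONS = {".mp3", ".mp4", ".mpeg", ".mpga", ".m4a", ".wav", ".webm"}
MAX_UPLOAD_BYTES = 25 * 1024 * 1024


class AudioValidationError(ValueError):
    """Hypothetical error type: the upload is rejected before transcription."""


def validate_audio(filename: str, size_bytes: int) -> None:
    """Reject files with an unsupported extension, no content, or too many bytes."""
    ext = Path(filename).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise AudioValidationError(f"unsupported audio format: {ext or 'none'}")
    if size_bytes <= 0:
        raise AudioValidationError("empty audio file")
    if size_bytes > MAX_UPLOAD_BYTES:
        raise AudioValidationError("audio file exceeds the 25 MB limit")


def make_client():
    """Build an OpenAI client from the environment (variable name is an assumption)."""
    from openai import OpenAI  # imported lazily so validation works without the package

    api_key = os.environ.get("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set")
    return OpenAI(api_key=api_key)


def transcribe(path: str, client) -> str:
    """Validate the file at `path`, send it to Whisper, and return the text."""
    validate_audio(path, os.path.getsize(path))
    with open(path, "rb") as fh:
        result = client.audio.transcriptions.create(model="whisper-1", file=fh)
    return result.text
```

An endpoint handler would save the upload, call `transcribe(path, make_client())`, and turn `AudioValidationError` into a 4xx response; which web framework hosts that handler is left open here.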

Frontend:

  • Create microphone button component to trigger audio recording
  • Implement audio recording with the browser's MediaRecorder API (microphone access via getUserMedia)
  • Add textarea component to display and allow editing of transcribed text
  • Handle audio upload to backend and display loading state
  • Display transcription results in textarea for user correction
  • Integrate transcription button with existing form fields

✅ Acceptance Criteria

How will we know this is finished?

  • Backend API endpoint accepts and processes audio files
  • Whisper successfully converts audio to text with reasonable accuracy
  • Frontend microphone button records audio and uploads it to the backend
  • Transcribed text displays in editable textarea
  • Users can modify transcribed text before submission
  • Feature works in Docker container
  • Failed transcriptions and network issues are handled and reported to the user
  • Documentation updated in docs/ with Whisper setup and usage
  • Audio recording works across modern browsers (Chrome, Firefox, Safari, Edge)
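
For the error-handling criterion, one possible backend approach is to retry transient failures a few times before reporting the transcription as failed. The sketch below is only an assumption about how that might look: `call_with_retry` and `TranscriptionFailed` are hypothetical names, and the default set of retryable exceptions would need to be adjusted to whatever the chosen client library actually raises.

```python
import time


class TranscriptionFailed(Exception):
    """Hypothetical error type: all attempts were exhausted."""


def call_with_retry(func, attempts=3, base_delay=0.5,
                    retry_on=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Call `func()` and retry with exponential backoff on transient errors.

    Exceptions not listed in `retry_on` propagate immediately.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return func()
        except retry_on as exc:
            last_error = exc
            # Wait before the next attempt, but not after the final one.
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    raise TranscriptionFailed(f"gave up after {attempts} attempts") from last_error
```

The endpoint could catch `TranscriptionFailed` and return an error response that the frontend shows next to the textarea, so the user can retry or type the text manually.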

📌 Additional Context

  • Consider audio format support (MP3, WAV, WebM, etc.)
  • Implement timeout for maximum recording duration
  • Add visual feedback during recording and transcription processing
  • Consider privacy implications and data retention policies for audio files
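
On the data-retention point, one simple option is for the backend to keep uploaded audio only for the duration of the transcription call. The following is a sketch under that assumption; `temporary_audio` is a hypothetical helper name, and whether audio should instead be stored for some period is a policy decision this issue leaves open.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def temporary_audio(data: bytes, suffix: str = ".webm"):
    """Write uploaded bytes to a temp file and delete it when the block exits."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(data)
        yield path
    finally:
        # Runs on success and on error, so no audio is left on disk.
        if os.path.exists(path):
            os.remove(path)
```

Usage would look like `with temporary_audio(upload_bytes, ".wav") as path: ...`, with the transcription call placed inside the block.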
