
Voice (PTT): typing during the finalize window drops the dictated transcript #3896

Description

@jay-tau

Summary

When using push-to-talk (PTT) voice dictation (hold the trigger key, speak, release), if I start typing before the final transcript lands, my entire dictation is lost. The grey "preview" text disappears and is replaced by whatever I type.

Steps to reproduce

  1. Place the cursor in the prompt input.
  2. Hold the voice push-to-talk trigger, speak a sentence (the live transcription appears as grey/dim preview text), then release the key.
  3. Immediately — before the text turns solid/committed (white) — type any character.

Expected

The pending dictation should be committed first (the grey preview becomes real text), and my typed character should be appended after it. I should not lose what I dictated.

Actual

The grey preview text is cleared and the finalized transcript is dropped entirely. Only the character I typed remains. The whole dictation is lost.

Root cause (from inspecting the bundled app.js, v1.0.63)

The voice reactor hook (ycr) drives a small state machine: idle → recording → finalizing → idle.
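A minimal sketch of that lifecycle as a transition table, assuming one event per arrow. The event names (`start`, `release`, `transcriptReady`) and the `nextState` helper are hypothetical and only illustrate the states named above. The bug lives in the `finalizing` state, between `release` and `transcriptReady`.

```javascript
// Hypothetical model of the voice session lifecycle (idle -> recording ->
// finalizing -> idle). Event names are illustrative, not taken from app.js.
const TRANSITIONS = {
  idle: { start: "recording" },            // trigger pressed, streaming begins
  recording: { release: "finalizing" },    // key released: finishSession({ commit: true })
  finalizing: { transcriptReady: "idle" }, // engine drained audio, final text returned
};

function nextState(state, event) {
  const next = TRANSITIONS[state]?.[event];
  if (next === undefined) {
    throw new Error(`invalid transition: ${state} --${event}-->`);
  }
  return next;
}

// Walk one PTT session end to end.
let state = "idle";
for (const event of ["start", "release", "transcriptReady"]) {
  state = nextState(state, event);
  console.log(`${event} -> ${state}`);
}
```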

  • During recording, streaming partial transcripts are shown via the input controller's setPreview(...) — this is the grey/dim preview text.

  • Releasing the PTT key calls finishSession({ commit: true }) and moves to the finalizing state, while the speech engine asynchronously drains audio and returns the final transcript.

  • The keyboard handler only intercepts/commits a keystroke when mode === "dictation" && state === "recording":

    if (Fe && Fe.mode === "dictation" && x.current === "recording" && (he.length > 0 || ...))
        return fe(), true; // commit + swallow the key

    In PTT mode this guard is always false, and during the finalizing window it is false in either mode (the state is no longer "recording"), so the keystroke is not intercepted: it falls through into the input box and mutates the buffer.

  • When finishSession resolves, the result handler (ae) clears the preview and, for PTT specifically, performs a strict snapshot check against the recording anchor:

    if (he.mode === "ptt") {
      if (Ge.text !== he.anchor.before + he.anchor.after || Ge.cursorPosition !== he.anchor.pos) {
        // logs: "ptt: snapshot diverged after release; dropped transcript"
        return; // <-- DROPS the transcript
      }
      Ge.insertInput(...);
    } else {
      // dictation (toggle) mode: on divergence it still inserts at the current cursor (does NOT drop)
    }
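The keyboard guard quoted above reduces to a single predicate. A minimal sketch, assuming the elided `...` part of the original condition can be collapsed into one hypothetical `hasPendingText` flag; `shouldInterceptKey` is an illustrative name, not the bundled function. Enumerating the mode/state combinations shows which keystrokes are committed and swallowed.

```javascript
// Hypothetical restatement of:
//   Fe && Fe.mode === "dictation" && x.current === "recording" && (he.length > 0 || ...)
// `hasPendingText` stands in for the final parenthesised condition (an assumption).
function shouldInterceptKey(mode, state, hasPendingText) {
  return mode === "dictation" && state === "recording" && hasPendingText;
}

for (const mode of ["dictation", "ptt"]) {
  for (const state of ["recording", "finalizing"]) {
    const hit = shouldInterceptKey(mode, state, true);
    console.log(`${mode}/${state}: ${hit ? "commit + swallow" : "falls through to the input buffer"}`);
  }
}
```

Three of the four combinations fall through, including both `finalizing` rows, which is why a keystroke typed after release reaches the buffer.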

Because the user typed during the finalizing window, the input no longer matches the anchor, so the PTT branch hits the divergence guard and drops the transcript. Dictation (toggle) mode does not drop on divergence — it recovers by inserting at the current cursor. This asymmetry is the bug: PTT loses the dictation, toggle mode does not.
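The asymmetry can be reproduced with a small model of the result handler. This sketch rests on several assumptions. The input is a plain `{ text, cursorPosition }` object standing in for the input controller (`Ge`). The anchor mirrors `he.anchor`. PTT's non-diverged `insertInput(...)` call, whose arguments are elided above, is assumed to insert at the cursor. `applyFinalTranscript` is a hypothetical name.

```javascript
// Hypothetical model of the result handler's divergence logic (not app.js code).
function applyFinalTranscript(mode, input, anchor, transcript) {
  const diverged =
    input.text !== anchor.before + anchor.after ||
    input.cursorPosition !== anchor.pos;

  if (mode === "ptt" && diverged) {
    // Current PTT behaviour: return early, transcript is lost.
    return { ...input, dropped: true };
  }
  // Toggle dictation (or an undiverged PTT session): insert at the current cursor.
  const at = input.cursorPosition;
  return {
    text: input.text.slice(0, at) + transcript + input.text.slice(at),
    cursorPosition: at + transcript.length,
    dropped: false,
  };
}

// Anchor recorded at press time: empty prompt, cursor at 0.
const anchor = { before: "", after: "", pos: 0 };
// User typed "x" during the finalizing window.
const typed = { text: "x", cursorPosition: 1 };

console.log(applyFinalTranscript("ptt", typed, anchor, "hello world"));
// { text: 'x', cursorPosition: 1, dropped: true }
console.log(applyFinalTranscript("dictation", typed, anchor, "hello world"));
// { text: 'xhello world', cursorPosition: 12, dropped: false }
```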

Suggested fixes (either would resolve it)

  1. Queue keystrokes during finalizing: while a session is finalizing, hold incoming keystrokes until the transcript has been inserted, then apply them. This mirrors how toggle dictation already swallows a keystroke during recording in order to commit, and it commits the grey preview before the typed text.
  2. Make PTT recover like dictation: on snapshot divergence, insert the committed transcript at the current cursor position instead of dropping it. This is the smaller change and preserves the dictation in every case.
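Both suggested fixes can be sketched against the same plain-buffer model. All names here (`insertAtCursor`, `FinalizeKeyQueue`, `applyFinalTranscriptFixed`) are hypothetical. The sketch assumes the real input controller can be reduced to `{ text, cursorPosition }` and that keystrokes are single characters.

```javascript
// Hypothetical sketches of the two suggested fixes (names are illustrative).

// Insert `s` at the cursor of a plain { text, cursorPosition } buffer.
function insertAtCursor(buf, s) {
  const at = buf.cursorPosition;
  return {
    text: buf.text.slice(0, at) + s + buf.text.slice(at),
    cursorPosition: at + s.length,
  };
}

// Fix 1: hold keystrokes while finalizing, then replay them after the transcript.
class FinalizeKeyQueue {
  constructor() {
    this.finalizing = false;
    this.pending = [];
  }
  begin() {
    this.finalizing = true; // PTT key released, finishSession pending
  }
  // Returns true if the key was held back instead of reaching the buffer.
  onKey(ch) {
    if (!this.finalizing) return false;
    this.pending.push(ch);
    return true;
  }
  // Final transcript arrived: commit it first, then apply the held keys in order.
  finish(buf, transcript) {
    let out = insertAtCursor(buf, transcript);
    for (const ch of this.pending) out = insertAtCursor(out, ch);
    this.pending = [];
    this.finalizing = false;
    return out;
  }
}

// Fix 2: on snapshot divergence, insert at the current cursor instead of dropping.
function applyFinalTranscriptFixed(buf, anchor, transcript) {
  const diverged =
    buf.text !== anchor.before + anchor.after || buf.cursorPosition !== anchor.pos;
  // Before the fix, the PTT branch returned early here when `diverged` was true.
  return { ...insertAtCursor(buf, transcript), diverged };
}

const q = new FinalizeKeyQueue();
q.begin();
q.onKey("x"); // typed during the finalizing window: held, buffer untouched
const viaQueue = q.finish({ text: "", cursorPosition: 0 }, "hello world");
console.log(viaQueue.text); // prints: hello worldx

const anchor = { before: "", after: "", pos: 0 };
const viaRecover = applyFinalTranscriptFixed({ text: "x", cursorPosition: 1 }, anchor, "hello world");
console.log(viaRecover.text, viaRecover.diverged); // prints: xhello world true
```

In this sketch the two fixes differ in ordering. Fix 1 produces the order described under Expected (dictation first, then the typed character). Fix 2 keeps the dictation but inserts it at the cursor, which is after whatever was typed during the finalizing window.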

Environment

  • GitHub Copilot CLI v1.0.63
  • Platform: Linux
  • Voice mode: push-to-talk (hold-to-talk)

Labels

  • area:input-keyboard (Keyboard shortcuts, keybindings, copy/paste, clipboard, mouse, and text input)