feat(Azure STT): add an option to use the lexical form of the transcription #4350

Open
tarekasishm wants to merge 6 commits into livekit:main from tarekasishm:feat/azure-add-lexical-result-option

@tarekasishm

This PR introduces optional support for returning Azure Speech-to-Text results in lexical format.

What’s new

  • A new flag, lexical_output, has been added to STTOptions to control whether the Azure STT plugin returns lexical or normalized text.
  • The option defaults to false, so the current behavior remains unchanged.
  • When enabled, the STT plugin will return Azure’s lexical form directly in the transcription result.

🔄 Backward compatibility

  • This change is fully backward-compatible.
  • Existing users will see no behavior change unless the new option is explicitly enabled.

🧩 Motivation

Some downstream use cases (e.g. custom NLU pipelines, post-processing, or domain-specific text handling) require access to the raw lexical transcription provided by Azure, rather than the normalized output. This change makes that possible without affecting existing integrations.

⚙️ Usage

  • The new option is exposed as an additional field in STTOptions.
  • Default behavior remains identical to the current implementation.
  • Enabling the option switches the Azure STT response to lexical format.
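
To make the lexical/normalized distinction concrete, here is a minimal sketch of how such a switch could pick text out of Azure's detailed recognition JSON. It assumes the detailed output format is enabled, so that each `NBest` entry carries both a `Lexical` and a `Display` field. The `pick_text` helper and the sample payload are hypothetical and are not taken from this PR.

```python
import json

# Hypothetical sample shaped like Azure's detailed recognition result.
# The text values are invented for illustration.
SAMPLE = json.dumps(
    {
        "NBest": [
            {"Lexical": "i have two apples", "Display": "I have 2 apples."}
        ]
    }
)


def pick_text(result_json: str, lexical_output: bool = False) -> str:
    """Return the lexical form when requested, else the normalized display text."""
    detailed = json.loads(result_json)
    # Guard against a missing or empty NBest list.
    best = (detailed.get("NBest") or [{}])[0]
    if lexical_output and best.get("Lexical"):
        return best["Lexical"]
    # Fall back to the normalized form (assumed to live in "Display").
    return best.get("Display", "")
```

With the flag left at its default, `pick_text(SAMPLE)` yields the normalized sentence; passing `lexical_output=True` yields the raw lowercase, spelled-out form.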

@davidzhao
Member

Thanks for the PR. Could you run `ruff format . && ruff check --fix .` and rebase from main? It'd be great to get all the CI checks passing.

    detailed_result = json.loads(evt.result.json)
    lexical = detailed_result.get("NBest", [{}])[0].get("Lexical", None)
except Exception as e:
    pass
Member

exception should be logged at a minimum

Author

Done! Thank you for the comment
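
For reference, one way the logging request could be addressed is sketched below. This is not the code that was pushed to the PR. The `extract_lexical` helper and the logger name are hypothetical, and the function takes the raw JSON string rather than the SDK event so that it runs on its own. Note that the reviewed snippet's `.get("NBest", [{}])[0]` would raise `IndexError` on an empty `NBest` list, which the sketch guards against.

```python
import json
import logging
from typing import Optional

# Hypothetical logger name; the plugin would use its own module logger.
logger = logging.getLogger("azure-stt-example")


def extract_lexical(result_json: str) -> Optional[str]:
    """Return the lexical form from a detailed result, or None on failure."""
    try:
        detailed_result = json.loads(result_json)
        # An empty or missing NBest list falls back to a single empty entry.
        nbest = detailed_result.get("NBest") or [{}]
        return nbest[0].get("Lexical")
    except (json.JSONDecodeError, AttributeError, TypeError):
        # Log with traceback instead of silently swallowing the error.
        logger.exception("failed to read lexical form from Azure STT result")
        return None
```

Catching only the parsing-related exceptions keeps unrelated bugs visible, while `logger.exception` records the traceback for the failures that are swallowed.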

    detailed_result = json.loads(evt.result.json)
    lexical = detailed_result.get("NBest", [{}])[0].get("Lexical", None)
except Exception as e:
    pass
Member

same comment here

Member

@theomonnom left a comment

thanks

profanity: NotGivenOr[speechsdk.enums.ProfanityOption] = NOT_GIVEN
phrase_list: NotGivenOr[list[str] | None] = NOT_GIVEN
explicit_punctuation: bool = False
lexical_output: bool = False
Member

lexical_output isn't exposed since STTOptions is private.

Can you add it to the constructor?
How did you test the PR?
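
A minimal sketch of what exposing the flag through the constructor might look like follows. Both classes are simplified, hypothetical stand-ins: the real plugin's constructor takes many more parameters, and only the `explicit_punctuation` and `lexical_output` field names come from the diff above.

```python
from dataclasses import dataclass


@dataclass
class _STTOptions:
    """Simplified stand-in for the plugin's private options container."""

    explicit_punctuation: bool = False
    lexical_output: bool = False


class STT:
    """Hypothetical, trimmed-down stand-in for the plugin's public STT class."""

    def __init__(
        self,
        *,
        explicit_punctuation: bool = False,
        lexical_output: bool = False,
    ) -> None:
        # Forward the public keyword argument into the private options object,
        # so callers never need to touch _STTOptions directly.
        self._opts = _STTOptions(
            explicit_punctuation=explicit_punctuation,
            lexical_output=lexical_output,
        )
```

Under this shape, `STT(lexical_output=True)` would opt in, and `STT()` would keep the existing default behavior.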
