Refine timestamps in spans and recording alignment#982
Conversation
🦋 Changeset detectedLatest commit: 2fe2557 The changes in this PR will be included in the next version bump. This PR includes changesets to release 18 packages
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
|
Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. 📝 WalkthroughWalkthroughThe changes introduce precise timestamp tracking and OpenTelemetry context propagation throughout the voice agent system. They add support for explicit span start times, rejection tracking in futures, and implement event-driven timing for audio playback and speech events to improve span accuracy and recording alignment. Changes
Sequence DiagramsequenceDiagram
participant SpeechDet as Speech Detection
participant AgentAct as Agent Activity
participant Tracer as OTEL Tracer
participant AudioOut as Audio Output
participant Playback as Playback System
SpeechDet->>SpeechDet: Detect speech start<br/>(VAD event)
SpeechDet->>SpeechDet: Compute startTime =<br/>now - speechDuration
SpeechDet->>Tracer: startSpan("user_turn",<br/>{startTime})
AgentAct->>AgentAct: Capture OTEL context
AgentAct->>AgentAct: Update agent state<br/>with context & timing
AgentAct->>Tracer: startSpan("agent_speaking",<br/>{startTime, context})
AudioOut->>AudioOut: Begin playback
AudioOut->>Playback: Register onPlaybackStarted<br/>listener
Playback->>Playback: Start audio output
Playback->>AudioOut: Emit PLAYBACK_STARTED<br/>event (createdAt)
AudioOut->>AudioOut: onPlaybackStarted(createdAt)
AudioOut->>AudioOut: Emit PlaybackStartedEvent<br/>with timestamp
AudioOut->>Tracer: Spans reference<br/>precise timestamps
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~35 minutes Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
📜 Recent review detailsConfiguration used: Organization UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
✏️ Tip: You can disable this entire section by setting Comment |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2eb8d02b56
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8f38e2c44b
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@agents/src/voice/agent_activity.ts`:
- Around line 640-646: onStartOfSpeech computes speechStartTime by subtracting
VADEvent.speechDuration from Date.now() but speechDuration is in seconds while
Date.now() is milliseconds; update the subtraction in onStartOfSpeech to convert
ev.speechDuration to milliseconds (multiply by 1000) before subtracting, so the
timestamp passed to this.agentSession._updateUserState('speaking', ...) is
correct.
In `@agents/src/voice/recorder_io/recorder_io.ts`:
- Around line 693-711: captureFrame sets _startedWallTime and
_lastSpeechStartTime unconditionally while only pushing frames into accFrames
when this.recorderIO.recording is true; move the initialization of
_startedWallTime and _lastSpeechStartTime so they only occur when recording is
active (i.e., inside the same this.recorderIO.recording branch that pushes into
accFrames) to ensure timestamps align with when frames are actually recorded,
leaving the await this.nextInChain.captureFrame and await super.captureFrame
calls unchanged.
🧹 Nitpick comments (2)
agents/src/voice/agent_activity.ts (2)
1229-1231: Consider logging the actual error for debugging purposes.The catch handler assumes the rejection is always due to cancellation, but other errors might occur. Logging the error would help with debugging unexpected failures.
♻️ Suggested improvement
textOut.firstTextFut.await .then(() => onFirstFrame()) - .catch(() => this.logger.debug('firstTextFut cancelled before first frame')); + .catch((e) => this.logger.debug({ error: e }, 'firstTextFut rejected before first frame'));
1686-1697: Consider extracting the duplicate filtering logic.This filtering logic is duplicated at lines 1486-1493. While acceptable, extracting to a helper function would reduce duplication.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
agents/src/voice/agent_activity.tsagents/src/voice/recorder_io/recorder_io.ts
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{ts,tsx,js,jsx}
📄 CodeRabbit inference engine (.cursor/rules/agent-core.mdc)
Add SPDX-FileCopyrightText and SPDX-License-Identifier headers to all newly added files with '// SPDX-FileCopyrightText: 2025 LiveKit, Inc.' and '// SPDX-License-Identifier: Apache-2.0'
Files:
agents/src/voice/recorder_io/recorder_io.tsagents/src/voice/agent_activity.ts
**/*.{ts,tsx}?(test|example|spec)
📄 CodeRabbit inference engine (.cursor/rules/agent-core.mdc)
When testing inference LLM, always use full model names from
agents/src/inference/models.ts(e.g., 'openai/gpt-4o-mini' instead of 'gpt-4o-mini')
Files:
agents/src/voice/recorder_io/recorder_io.tsagents/src/voice/agent_activity.ts
**/*.{ts,tsx}?(test|example)
📄 CodeRabbit inference engine (.cursor/rules/agent-core.mdc)
Initialize logger before using any LLM functionality with
initializeLogger({ pretty: true })from '@livekit/agents'
Files:
agents/src/voice/recorder_io/recorder_io.tsagents/src/voice/agent_activity.ts
🧬 Code graph analysis (1)
agents/src/voice/agent_activity.ts (2)
agents/src/vad.ts (1)
VADEvent(24-56)agents/src/llm/chat_context.ts (1)
FunctionCallOutput(284-350)
🔇 Additional comments (12)
agents/src/voice/agent_activity.ts (6)
7-7: LGTM!The import alias
otelContextforcontextis clear and helps distinguish OpenTelemetry context from other context references in the codebase.
1174-1175: LGTM!Good pattern for capturing the OTel context at task entry and propagating it through
onFirstFrameto_updateAgentState. This ensures accurate span parent-child relationships across async boundaries.Also applies to: 1220-1225
1486-1493: LGTM!Good fix to prevent duplicate
FunctionCallentries in session history. The filtering ensures onlyFunctionCallOutputitems are added here sinceFunctionCallitems were already added byonToolExecutionStarted.
1517-1520: LGTM!Good naming improvement using the
InSsuffix to explicitly indicate the unit is seconds, addressing previous feedback about unit clarity.
1318-1319: LGTM!Consistent application of the OTel context capture and first-frame callback patterns in
_pipelineReplyTaskImpl.Also applies to: 1419-1424, 1436-1438, 1443-1445
1765-1766: LGTM!Consistent implementation of OTel context capture and first-frame handling in
_realtimeGenerationTaskImpl.Also applies to: 1804-1808, 1896-1903
agents/src/voice/recorder_io/recorder_io.ts (6)
125-129: LGTM!Passing the last speech end time to
takeBufenables proper alignment between input and output recordings.
139-152: LGTM!Correct logic for returning the minimum of input/output start times, with proper handling of undefined cases.
562-600: LGTM!Good improvements to playback finish handling:
- Properly handles pause state when calculating finish time
- Clamps playback position to actual speech window
- Tracks last speech timing for future padding decisions
- Logs warning when speech start time is missing
603-621: LGTM!Good adoption of the
InSsuffix convention for variables representing seconds. This makes the code much easier to reason about and addresses previous feedback about unit clarity.
731-735: LGTM!Updated
createSilenceFrameto usedurationInSparameter name, consistent with the seconds-based naming convention used throughout the file.
680-685: LGTM!Properly appends trailing silence to the buffer when needed, with correct ms-to-seconds conversion.
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
|
|
||
| export interface PlaybackFinishedEvent { | ||
| // How much of the audio was played back | ||
| /** How much of the audio was played back, in seconds */ |
There was a problem hiding this comment.
@lukasIO I'm going to keep the naming of playbackPositon for this PR. Otherwise, if will trigger a lot of renamings to playbackPositionInS, which I will do in a different PR.
There was a problem hiding this comment.
makes sense, the comment is already helpful, thank you!
Summary
This PR ports the Python PR #4131 (AGT-2316) to TypeScript, refining timestamp accuracy for telemetry spans and improving recording alignment.
Changes
Telemetry Timestamp Accuracy
speechDurationfrom detection time, rather than recording when VAD triggeredstartTimeparameter support totracer.startSpan()to allow backdating spansRecording Alignment
recorder_io.ts: Added_lastSpeechEndTimeand_lastSpeechStartTimetracking for proper audio alignmenttakeBuf()now supportspadSinceparameter to prepend silence frames when neededEvent Propagation
PlaybackStartedEventinterface andEVENT_PLAYBACK_STARTEDconstant toio.tsParticipantAudioOutputnow emitsplaybackStartedevent when first audio frame is capturedgeneration.tslistens for playback events to resolvefirstFrameFutwith accurate timestampOTel Context Propagation
_agentTurnContexttoSpeechHandleto maintain proper span hierarchyBug Fix: Duplicate Tool Calls
FunctionCallentries in session history by filteringtoolsMessagesto only addFunctionCallOutputitems (sinceFunctionCallitems are already added byonToolExecutionStarted)Utilities
rejectedproperty toFutureclass to check if a future was rejectedFiles Changed
telemetry/traces.tsstartTimetoStartSpanOptions, pass directly to OTel SDKvoice/io.tsPlaybackStartedEvent,EVENT_PLAYBACK_STARTED,onPlaybackStarted()voice/room_io/_output.tsplaybackStartedon first frame capturevoice/generation.tsplaybackStarted, resolvefirstFrameFutwith timestampvoice/audio_recognition.tsspeechDurationvoice/agent_session.tsstartTimeandotelContextto state update methodsvoice/agent_activity.ts_agentTurnContext, fix duplicate tool callsvoice/speech_handle.ts_agentTurnContextpropertyvoice/recorder_io/recorder_io.tsutils.tsrejectedgetter toFutureclassTesting
Summary by CodeRabbit
New Features
Chores
✏️ Tip: You can customize this high-level summary in your review settings.