Refine timestamps in spans and recording alignment#982

Merged
toubatbrian merged 13 commits into main from brian/refine-ts-recording
Jan 21, 2026
Conversation

@toubatbrian

@toubatbrian toubatbrian commented Jan 16, 2026

Summary

This PR ports the Python PR #4131 (AGT-2316) to TypeScript, refining timestamp accuracy for telemetry spans and improving recording alignment.

Changes

Telemetry Timestamp Accuracy

  • User speech timing: Calculate accurate speech start time by subtracting speechDuration from detection time, rather than recording when VAD triggered
  • Agent speech timing: Track when audio playback actually starts (first frame captured) instead of when generation begins
  • Span start times: Added startTime parameter support to tracer.startSpan() to allow backdating spans
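
A minimal sketch of the user speech backdating described above. It assumes, as the review thread on this PR states, that speechDuration is expressed in seconds while wall-clock timestamps are in milliseconds. The VadStartEvent type and speechStartTimeMs helper are hypothetical names used only for illustration, not the PR's actual code.

```typescript
// Hypothetical, trimmed-down event shape; the real VADEvent carries more fields.
interface VadStartEvent {
  /** Length of speech already detected when the event fires, in seconds. */
  speechDuration: number;
}

/** Backdate the detection time by the speech already heard (seconds -> ms). */
function speechStartTimeMs(ev: VadStartEvent, detectedAtMs: number = Date.now()): number {
  return detectedAtMs - ev.speechDuration * 1000;
}

const detectedAtMs = 1700000000000;
const startedAtMs = speechStartTimeMs({ speechDuration: 0.25 }, detectedAtMs);
console.log(detectedAtMs - startedAtMs); // 250
```

The resulting millisecond timestamp is the kind of value that could then be passed as startTime when the span is opened.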

Recording Alignment

  • recorder_io.ts: Added _lastSpeechEndTime and _lastSpeechStartTime tracking for proper audio alignment
  • Silence padding: takeBuf() now supports padSince parameter to prepend silence frames when needed
  • Recording start time: Now returns the minimum of input/output start times for accurate alignment
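
The silence padding idea can be sketched roughly as follows. This is an illustration under stated assumptions, not the recorder's real implementation: silenceSamples and padWithSilence are hypothetical helpers, timestamps are assumed to be in milliseconds, and audio is assumed to be mono 16-bit PCM.

```typescript
/** Number of silent samples needed to cover the gap between padSince and the buffer start. */
function silenceSamples(padSinceMs: number, bufferStartMs: number, sampleRate: number): number {
  const gapInS = Math.max(0, (bufferStartMs - padSinceMs) / 1000);
  return Math.round(gapInS * sampleRate);
}

/** Prepend padSamples zero-valued samples (silence) to a mono PCM buffer. */
function padWithSilence(samples: Int16Array, padSamples: number): Int16Array {
  const out = new Int16Array(padSamples + samples.length); // zero-initialised, i.e. silent
  out.set(samples, padSamples);
  return out;
}

const pad = silenceSamples(1000, 1500, 48000);
console.log(pad); // 24000
console.log(padWithSilence(Int16Array.of(1, 2), 3).length); // 5
```

Clamping the gap at zero keeps the helper safe when the buffer starts before the reference time, in which case no padding is added.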

Event Propagation

  • Added PlaybackStartedEvent interface and EVENT_PLAYBACK_STARTED constant to io.ts
  • ParticipantAudioOutput now emits playbackStarted event when first audio frame is captured
  • generation.ts listens for playback events to resolve firstFrameFut with accurate timestamp
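
The once-per-playback emission can be sketched as below. PlaybackStartNotifier is a hypothetical stand-in for an audio output, and the createdAt field on the event is an assumption based on the sequence diagram in this thread; the real PlaybackStartedEvent interface may differ.

```typescript
// Assumed event shape; only the timestamp matters for this sketch.
interface PlaybackStartedEvent {
  createdAt: number;
}
type PlaybackStartedListener = (ev: PlaybackStartedEvent) => void;

class PlaybackStartNotifier {
  private firstFrameEmitted = false;
  private readonly listener: PlaybackStartedListener;

  constructor(listener: PlaybackStartedListener) {
    this.listener = listener;
  }

  /** Notify the listener only for the first frame of a playback cycle. */
  captureFrame(nowMs: number): void {
    if (!this.firstFrameEmitted) {
      this.firstFrameEmitted = true;
      this.listener({ createdAt: nowMs });
    }
  }

  /** Reset so the next captured frame starts a new playback cycle. */
  flush(): void {
    this.firstFrameEmitted = false;
  }
}

const seen: number[] = [];
const notifier = new PlaybackStartNotifier((ev) => {
  seen.push(ev.createdAt);
});
notifier.captureFrame(100);
notifier.captureFrame(120); // ignored: playback already started
notifier.flush();
notifier.captureFrame(300);
console.log(seen); // [ 100, 300 ]
```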

OTel Context Propagation

  • Added _agentTurnContext to SpeechHandle to maintain proper span hierarchy
  • Agent state updates now pass OTel context for correct parent-child relationships

Bug Fix: Duplicate Tool Calls

  • Fixed duplicate FunctionCall entries in session history by filtering toolsMessages to only add FunctionCallOutput items (since FunctionCall items are already added by onToolExecutionStarted)
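
As a rough illustration of the filter described above, the sketch below keeps only output items from a mixed list. The FunctionCall and FunctionCallOutput shapes here are simplified, hypothetical versions defined for the example (including the type discriminator strings), not the classes from agents/src/llm/chat_context.ts.

```typescript
// Simplified, hypothetical item shapes for illustration only.
interface FunctionCall {
  type: 'function_call';
  callId: string;
  name: string;
}
interface FunctionCallOutput {
  type: 'function_call_output';
  callId: string;
  output: string;
}
type ToolMessage = FunctionCall | FunctionCallOutput;

/** Keep only outputs; the calls are assumed to be in the history already. */
function onlyOutputs(toolsMessages: ToolMessage[]): FunctionCallOutput[] {
  return toolsMessages.filter(
    (m): m is FunctionCallOutput => m.type === 'function_call_output',
  );
}

const toolsMessages: ToolMessage[] = [
  { type: 'function_call', callId: 'c1', name: 'lookup' },
  { type: 'function_call_output', callId: 'c1', output: 'ok' },
];
console.log(onlyOutputs(toolsMessages).length); // 1
```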

Utilities

  • Added rejected property to Future class to check if a future was rejected
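
A minimal sketch of a future that exposes a rejected flag, to show what such a getter makes possible. SketchFuture is a hypothetical class written for this example; the actual Future in utils.ts may be structured differently, although the review excerpt in this thread does show an await property on it.

```typescript
class SketchFuture<T> {
  private _done = false;
  private _rejected = false;
  private _resolve!: (value: T) => void;
  private _reject!: (reason: Error) => void;
  readonly await: Promise<T>;

  constructor() {
    // The executor runs synchronously, so both callbacks are set before use.
    this.await = new Promise<T>((resolve, reject) => {
      this._resolve = resolve;
      this._reject = reject;
    });
  }

  get done(): boolean {
    return this._done;
  }

  /** True only if the future settled through reject(). */
  get rejected(): boolean {
    return this._rejected;
  }

  resolve(value: T): void {
    if (this._done) return;
    this._done = true;
    this._resolve(value);
  }

  reject(reason: Error): void {
    if (this._done) return;
    this._done = true;
    this._rejected = true;
    this._reject(reason);
  }
}

const fut = new SketchFuture<number>();
fut.await.catch(() => {}); // swallow the rejection so it is not reported as unhandled
fut.reject(new Error('cancelled'));
console.log(fut.done, fut.rejected); // true true
```

With a flag like this, a caller can distinguish "settled with a value" from "settled with an error" without awaiting the promise.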

Files Changed

| File | Changes |
| --- | --- |
| telemetry/traces.ts | Added startTime to StartSpanOptions, pass directly to OTel SDK |
| voice/io.ts | Added PlaybackStartedEvent, EVENT_PLAYBACK_STARTED, onPlaybackStarted() |
| voice/room_io/_output.ts | Emit playbackStarted on first frame capture |
| voice/generation.ts | Listen for playbackStarted, resolve firstFrameFut with timestamp |
| voice/audio_recognition.ts | Calculate accurate speech start time with speechDuration |
| voice/agent_session.ts | Pass startTime and otelContext to state update methods |
| voice/agent_activity.ts | Propagate timestamps, set _agentTurnContext, fix duplicate tool calls |
| voice/speech_handle.ts | Added _agentTurnContext property |
| voice/recorder_io/recorder_io.ts | Added speech timing tracking, silence padding, aligned recording start |
| utils.ts | Added rejected getter to Future class |

Testing

  • Verified telemetry spans now have accurate start times
  • Confirmed no duplicate function calls in Agent Insights transcript
  • All existing tests pass

Summary by CodeRabbit

  • New Features

    • Added explicit start timestamp support for tracing spans to improve observability and timing precision of voice interactions.
    • Introduced playback start event signals for enhanced audio playback monitoring.
    • Improved audio recording and playback synchronization through refined timing and boundary alignment.
  • Chores

    • Updated test environment configuration for example applications.


@changeset-bot

changeset-bot bot commented Jan 16, 2026

🦋 Changeset detected

Latest commit: 2fe2557

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 18 packages
| Name | Type |
| --- | --- |
| @livekit/agents | Patch |
| @livekit/agents-plugin-anam | Patch |
| @livekit/agents-plugin-baseten | Patch |
| @livekit/agents-plugin-bey | Patch |
| @livekit/agents-plugin-cartesia | Patch |
| @livekit/agents-plugin-deepgram | Patch |
| @livekit/agents-plugin-elevenlabs | Patch |
| @livekit/agents-plugin-google | Patch |
| @livekit/agents-plugin-hedra | Patch |
| @livekit/agents-plugin-inworld | Patch |
| @livekit/agents-plugin-livekit | Patch |
| @livekit/agents-plugin-neuphonic | Patch |
| @livekit/agents-plugin-openai | Patch |
| @livekit/agents-plugin-resemble | Patch |
| @livekit/agents-plugin-rime | Patch |
| @livekit/agents-plugin-silero | Patch |
| @livekit/agents-plugins-test | Patch |
| @livekit/agents-plugin-xai | Patch |


@coderabbitai

coderabbitai bot commented Jan 16, 2026

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

The changes introduce precise timestamp tracking and OpenTelemetry context propagation throughout the voice agent system. They add support for explicit span start times, rejection tracking in futures, and implement event-driven timing for audio playback and speech events to improve span accuracy and recording alignment.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Telemetry and Timing Infrastructure: agents/src/telemetry/traces.ts, agents/src/utils.ts | Added optional startTime field to StartSpanOptions to support explicit span initialization timestamps. Added rejection state tracking to Future<T> with public rejected getter. |
| Voice Agent State Management: agents/src/voice/agent_activity.ts, agents/src/voice/agent_session.ts, agents/src/voice/speech_handle.ts | Refactored onStartOfSpeech signature to compute speechStartTime from VAD event duration. Added OpenTelemetry context capture at multiple entry points. Updated _updateAgentState and _updateUserState to accept optional timing and context options. Added internal _agentTurnContext field to SpeechHandle. |
| Speech Recognition Timing: agents/src/voice/audio_recognition.ts | Compute explicit startTime for user_turn spans based on detected speech duration in VAD START_OF_SPEECH events. |
| Audio Generation and Playback Events: agents/src/voice/generation.ts, agents/src/voice/io.ts | Changed firstFrameFut type from Future to Future<number> to capture numeric timestamps. Added PlaybackStartedEvent interface and onPlaybackStarted() method to AudioOutput with static event identifier. Wired event forwarding through audio chain. |
| Recorder Audio I/O Alignment: agents/src/voice/recorder_io/recorder_io.ts | Extensive timing updates: added padSince parameter to takeBuf() for silence padding at speech boundaries; updated recordingStartedAt to return minimum of wall times; added trailing silence duration calculations; updated createSilenceFrame() signature to accept duration in seconds. Introduced internal timing state tracking (_padded, _lastSpeechEndTime, _lastSpeechStartTime). |
| Audio Output Playback Events: agents/src/voice/room_io/_output.ts, agents/src/voice/avatar/datastream_io.ts | Added firstFrameEmitted flag to emit onPlaybackStarted() exactly once per playback cycle, resetting on playout completion or flush. |
| Configuration and Examples: .changeset/lazy-spies-worry.md, examples/src/drive-thru/drivethru_agent.ts, examples/src/frontdesk/frontdesk_agent.ts | Added changeset documenting patch release for timestamp refinement. Added ESLint-disable comments and conditional checks to skip CLI startup during Vitest test execution. |

Sequence Diagram

sequenceDiagram
    participant SpeechDet as Speech Detection
    participant AgentAct as Agent Activity
    participant Tracer as OTEL Tracer
    participant AudioOut as Audio Output
    participant Playback as Playback System

    SpeechDet->>SpeechDet: Detect speech start<br/>(VAD event)
    SpeechDet->>SpeechDet: Compute startTime =<br/>now - speechDuration
    SpeechDet->>Tracer: startSpan("user_turn",<br/>{startTime})
    
    AgentAct->>AgentAct: Capture OTEL context
    AgentAct->>AgentAct: Update agent state<br/>with context & timing
    AgentAct->>Tracer: startSpan("agent_speaking",<br/>{startTime, context})
    
    AudioOut->>AudioOut: Begin playback
    AudioOut->>Playback: Register onPlaybackStarted<br/>listener
    
    Playback->>Playback: Start audio output
    Playback->>AudioOut: Emit PLAYBACK_STARTED<br/>event (createdAt)
    
    AudioOut->>AudioOut: onPlaybackStarted(createdAt)
    AudioOut->>AudioOut: Emit PlaybackStartedEvent<br/>with timestamp
    AudioOut->>Tracer: Spans reference<br/>precise timestamps

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~35 minutes

Suggested reviewers

  • davidzhao
  • theomonnom

Poem

🐰✨ Timestamps and context flow,
Through spans they swiftly go,
Playback events now chime with glee,
Recording times align precisely! 🎙️

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 33.33%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'Refine timestamps in spans and recording alignment' accurately summarizes the main changes: improving telemetry timestamp precision and aligning audio recording timing. |
| Description check | ✅ Passed | The PR description is comprehensive, well-structured, and covers all major changes with clear sections for telemetry accuracy, recording alignment, event propagation, OTel context, bug fixes, and affected files. |


📜 Recent review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fef7fd0 and 2fe2557.

📒 Files selected for processing (1)
  • agents/src/voice/io.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • agents/src/voice/io.ts


@toubatbrian toubatbrian changed the title Refine timestamps in spans and recording alignment [AGT-2450] Refine timestamps in spans and recording alignment Jan 16, 2026
@toubatbrian toubatbrian changed the title [AGT-2450] Refine timestamps in spans and recording alignment https://linear.app/livekit/issue/AGT-2450/refine-timestamps-in-spans-and-recording-alignment Jan 16, 2026
@toubatbrian toubatbrian changed the title https://linear.app/livekit/issue/AGT-2450/refine-timestamps-in-spans-and-recording-alignment Refine timestamps in spans and recording alignment Jan 16, 2026

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2eb8d02b56

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@toubatbrian toubatbrian requested a review from lukasIO January 16, 2026 21:41
@toubatbrian

@codex


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8f38e2c44b



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@agents/src/voice/agent_activity.ts`:
- Around line 640-646: onStartOfSpeech computes speechStartTime by subtracting
VADEvent.speechDuration from Date.now() but speechDuration is in seconds while
Date.now() is milliseconds; update the subtraction in onStartOfSpeech to convert
ev.speechDuration to milliseconds (multiply by 1000) before subtracting, so the
timestamp passed to this.agentSession._updateUserState('speaking', ...) is
correct.

In `@agents/src/voice/recorder_io/recorder_io.ts`:
- Around line 693-711: captureFrame sets _startedWallTime and
_lastSpeechStartTime unconditionally while only pushing frames into accFrames
when this.recorderIO.recording is true; move the initialization of
_startedWallTime and _lastSpeechStartTime so they only occur when recording is
active (i.e., inside the same this.recorderIO.recording branch that pushes into
accFrames) to ensure timestamps align with when frames are actually recorded,
leaving the await this.nextInChain.captureFrame and await super.captureFrame
calls unchanged.
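
The second suggestion above (only initialising the timing fields while recording) might look roughly like this. RecorderOutputSketch is a hypothetical, heavily simplified stand-in for the recorder's output class: the field names mirror the review comment without the leading underscore, the set-once behaviour is an assumption, and forwarding to the next output in the chain is left out.

```typescript
class RecorderOutputSketch {
  startedWallTime: number | undefined;
  lastSpeechStartTime: number | undefined;
  readonly accFrames: Int16Array[] = [];
  recording = false;

  captureFrame(frame: Int16Array, nowMs: number): void {
    if (this.recording) {
      // Timestamps are set in the same branch that stores frames,
      // so they line up with the first frame actually recorded.
      if (this.startedWallTime === undefined) this.startedWallTime = nowMs;
      if (this.lastSpeechStartTime === undefined) this.lastSpeechStartTime = nowMs;
      this.accFrames.push(frame);
    }
    // Forwarding the frame down the chain would happen here regardless.
  }
}

const out = new RecorderOutputSketch();
out.captureFrame(new Int16Array(4), 100); // not recording: nothing is tracked
out.recording = true;
out.captureFrame(new Int16Array(4), 250);
console.log(out.startedWallTime, out.accFrames.length); // 250 1
```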
🧹 Nitpick comments (2)
agents/src/voice/agent_activity.ts (2)

1229-1231: Consider logging the actual error for debugging purposes.

The catch handler assumes the rejection is always due to cancellation, but other errors might occur. Logging the error would help with debugging unexpected failures.

♻️ Suggested improvement
       textOut.firstTextFut.await
         .then(() => onFirstFrame())
-        .catch(() => this.logger.debug('firstTextFut cancelled before first frame'));
+        .catch((e) => this.logger.debug({ error: e }, 'firstTextFut rejected before first frame'));

1686-1697: Consider extracting the duplicate filtering logic.

This filtering logic is duplicated at lines 1486-1493. While acceptable, extracting to a helper function would reduce duplication.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8f38e2c and 6a77734.

📒 Files selected for processing (2)
  • agents/src/voice/agent_activity.ts
  • agents/src/voice/recorder_io/recorder_io.ts
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (.cursor/rules/agent-core.mdc)

Add SPDX-FileCopyrightText and SPDX-License-Identifier headers to all newly added files with '// SPDX-FileCopyrightText: 2025 LiveKit, Inc.' and '// SPDX-License-Identifier: Apache-2.0'

Files:

  • agents/src/voice/recorder_io/recorder_io.ts
  • agents/src/voice/agent_activity.ts
**/*.{ts,tsx}?(test|example|spec)

📄 CodeRabbit inference engine (.cursor/rules/agent-core.mdc)

When testing inference LLM, always use full model names from agents/src/inference/models.ts (e.g., 'openai/gpt-4o-mini' instead of 'gpt-4o-mini')

Files:

  • agents/src/voice/recorder_io/recorder_io.ts
  • agents/src/voice/agent_activity.ts
**/*.{ts,tsx}?(test|example)

📄 CodeRabbit inference engine (.cursor/rules/agent-core.mdc)

Initialize logger before using any LLM functionality with initializeLogger({ pretty: true }) from '@livekit/agents'

Files:

  • agents/src/voice/recorder_io/recorder_io.ts
  • agents/src/voice/agent_activity.ts
🧬 Code graph analysis (1)
agents/src/voice/agent_activity.ts (2)
agents/src/vad.ts (1)
  • VADEvent (24-56)
agents/src/llm/chat_context.ts (1)
  • FunctionCallOutput (284-350)
🔇 Additional comments (12)
agents/src/voice/agent_activity.ts (6)

7-7: LGTM!

The import alias otelContext for context is clear and helps distinguish OpenTelemetry context from other context references in the codebase.


1174-1175: LGTM!

Good pattern for capturing the OTel context at task entry and propagating it through onFirstFrame to _updateAgentState. This ensures accurate span parent-child relationships across async boundaries.

Also applies to: 1220-1225


1486-1493: LGTM!

Good fix to prevent duplicate FunctionCall entries in session history. The filtering ensures only FunctionCallOutput items are added here since FunctionCall items were already added by onToolExecutionStarted.


1517-1520: LGTM!

Good naming improvement using the InS suffix to explicitly indicate the unit is seconds, addressing previous feedback about unit clarity.


1318-1319: LGTM!

Consistent application of the OTel context capture and first-frame callback patterns in _pipelineReplyTaskImpl.

Also applies to: 1419-1424, 1436-1438, 1443-1445


1765-1766: LGTM!

Consistent implementation of OTel context capture and first-frame handling in _realtimeGenerationTaskImpl.

Also applies to: 1804-1808, 1896-1903

agents/src/voice/recorder_io/recorder_io.ts (6)

125-129: LGTM!

Passing the last speech end time to takeBuf enables proper alignment between input and output recordings.


139-152: LGTM!

Correct logic for returning the minimum of input/output start times, with proper handling of undefined cases.
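
A small sketch of the "minimum of the two start times, tolerating undefined" logic described here. The recordingStartedAt function below is a standalone illustration with assumed millisecond inputs, not the getter from recorder_io.ts.

```typescript
/** Earliest known start time, or undefined if neither side has started. */
function recordingStartedAt(
  inputStartMs: number | undefined,
  outputStartMs: number | undefined,
): number | undefined {
  if (inputStartMs === undefined) return outputStartMs;
  if (outputStartMs === undefined) return inputStartMs;
  return Math.min(inputStartMs, outputStartMs);
}

console.log(recordingStartedAt(1200, 1000)); // 1000
console.log(recordingStartedAt(undefined, 1000)); // 1000
console.log(recordingStartedAt(undefined, undefined)); // undefined
```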


562-600: LGTM!

Good improvements to playback finish handling:

  • Properly handles pause state when calculating finish time
  • Clamps playback position to actual speech window
  • Tracks last speech timing for future padding decisions
  • Logs warning when speech start time is missing

603-621: LGTM!

Good adoption of the InS suffix convention for variables representing seconds. This makes the code much easier to reason about and addresses previous feedback about unit clarity.


731-735: LGTM!

Updated createSilenceFrame to use durationInS parameter name, consistent with the seconds-based naming convention used throughout the file.


680-685: LGTM!

Properly appends trailing silence to the buffer when needed, with correct ms-to-seconds conversion.


@toubatbrian toubatbrian requested a review from lukasIO January 19, 2026 22:17

 export interface PlaybackFinishedEvent {
-  // How much of the audio was played back
+  /** How much of the audio was played back, in seconds */

@toubatbrian toubatbrian Jan 19, 2026


@lukasIO I'm going to keep the name playbackPosition for this PR. Otherwise, it will trigger a lot of renamings to playbackPositionInS, which I will do in a separate PR.


makes sense, the comment is already helpful, thank you!

@toubatbrian toubatbrian merged commit 25df43a into main Jan 21, 2026
8 checks passed
@toubatbrian toubatbrian deleted the brian/refine-ts-recording branch January 21, 2026 19:04
@github-actions github-actions bot mentioned this pull request Jan 21, 2026