26 commits
7d30c9d
Add AgentKit ASR interaction language handling
digitallysavvy Jun 1, 2026
a95214e
Document AgentKit ASR language and STT params
digitallysavvy Jun 1, 2026
eeac05d
Move prompt and greeting docs to vendor config
digitallysavvy Jun 1, 2026
f652c69
[fern-generated] Update SDK
fern-api[bot] Jun 2, 2026
df2c8d6
[fern-replay] Applied customizations
fern-api[bot] Jun 2, 2026
499f754
Merge generated Python core SDK updates
digitallysavvy Jun 2, 2026
a94bac6
Align AgentKit provider wrappers with regenerated core schemas
digitallysavvy Jun 2, 2026
49af6f6
Align AgentKit TTS provider options with docs
digitallysavvy Jun 2, 2026
bad47d9
Align AgentKit provider BYOK parameter requirements
digitallysavvy Jun 2, 2026
477f40a
[fern-generated] Update SDK
fern-api[bot] Jun 2, 2026
3f7ba38
[fern-replay] Applied customizations
fern-api[bot] Jun 2, 2026
147c3e6
Merge regenerated core SDK TTS provider params
digitallysavvy Jun 2, 2026
198f367
Update AgentKit TTS provider docs and examples
digitallysavvy Jun 2, 2026
0297a70
Update AgentKit v2.1 provider docs and examples
digitallysavvy Jun 2, 2026
96afe78
align v2.1 provider docs with AgentKit validation
digitallysavvy Jun 2, 2026
434c8af
Align AgentKit LLM and ASR vendor validation
digitallysavvy Jun 2, 2026
968e1f0
Restrict managed OpenAI LLM models in AgentKit
digitallysavvy Jun 2, 2026
676b93b
Align managed vendor validation with generated core shapes
digitallysavvy Jun 2, 2026
8d52340
fix(agentkit): flatten Deepgram TTS passthrough params
digitallysavvy Jun 2, 2026
403a1a9
[fern-generated] Update SDK
fern-api[bot] Jun 2, 2026
33f9229
[fern-replay] Applied customizations
fern-api[bot] Jun 2, 2026
21682aa
chore(core): regenerate v2.1.0 types
digitallysavvy Jun 2, 2026
cb9ab8b
docs(agentkit): align OpenAI TTS instructions support
digitallysavvy Jun 2, 2026
c902235
docs(agentkit): align TTS provider reference fields
digitallysavvy Jun 2, 2026
420547b
docs: add v2.1.0 changelog
digitallysavvy Jun 2, 2026
299e4bd
fix(agentkit): resolve provider config type checks
digitallysavvy Jun 2, 2026
4 changes: 4 additions & 0 deletions .fernignore
@@ -10,6 +10,10 @@ src/agora_agent/agentkit/
# Documentation - managed manually, not generated by Fern
docs/
README.md
reference.md

# Tests - managed manually, not generated by Fern
tests/

# Compatibility shim and CI/release workflows are managed manually
compat/
49 changes: 8 additions & 41 deletions README.md
@@ -20,6 +20,7 @@ pip install agora-agents
## Quick Start

Start with the `Agent` builder: create a client with app credentials, choose your ASR, LLM, and TTS providers, then start a session. Omit vendor API keys for supported Agora-managed models, or provide keys when you want BYOK.
Use `with_interaction_language()` for Agora `asr.language`; provider-specific STT language values remain under `asr.params`.

```python
import os
@@ -29,12 +30,9 @@ from agora_agent (
Agent,
Agora,
Area,
DataChannel,
DeepgramSTT,
GenericAvatar,
MiniMaxTTS,
OpenAI,
XaiGrok,
expires_in_hours,
)

@@ -56,49 +54,18 @@ def start_conversation() -> str:
app_certificate=app_certificate,
)

agent = Agent(
name=f"conversation-{int(time.time())}",
instructions=AGENT_PROMPT,
greeting=GREETING,
failure_message="Please wait a moment.",
max_history=50,
turn_detection={
"config": {
"speech_threshold": 0.5,
"start_of_speech": {
"mode": "vad",
"vad_config": {
"interrupt_duration_ms": 160,
"prefix_padding_ms": 300,
},
},
"end_of_speech": {
"mode": "vad",
"vad_config": {
"silence_duration_ms": 480,
},
},
},
},
advanced_features={
"enable_rtm": True,
"enable_tools": True,
},
parameters={
"data_channel": DataChannel.RTM,
"enable_error_message": True,
},
).with_stt(
agent = Agent(name=f"conversation-{int(time.time())}").with_interaction_language("en-US").with_stt(
DeepgramSTT(
model="nova-3",
language="en",
)
).with_llm(
OpenAI(
model="gpt-4o-mini",
system_messages=[{"role": "system", "content": AGENT_PROMPT}],
greeting_message=GREETING,
failure_message="Please wait a moment.",
max_history=15,
max_history=50,
params={
"max_tokens": 1024,
"temperature": 0.7,
@@ -134,10 +101,7 @@ def start_conversation() -> str:
Use the same `Agent` builder shape, but provide credentials explicitly when you want vendor-managed billing and routing instead of Agora-managed models.

```python
agent = Agent(
instructions=AGENT_PROMPT,
greeting=GREETING,
).with_stt(
agent = Agent().with_interaction_language("en-US").with_stt(
DeepgramSTT(
api_key=os.environ["DEEPGRAM_API_KEY"],
model="nova-3",
@@ -146,7 +110,10 @@ agent = Agent(
).with_llm(
OpenAI(
api_key=os.environ["OPENAI_API_KEY"],
base_url="https://api.openai.com/v1/chat/completions",
model="gpt-4o-mini",
system_messages=[{"role": "system", "content": AGENT_PROMPT}],
greeting_message=GREETING,
max_tokens=1024,
temperature=0.7,
top_p=0.95,
23 changes: 21 additions & 2 deletions changelog.md
@@ -4,6 +4,25 @@ All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/).

## [v2.1.0] — 2026-06-02

### Added

- **ASR interaction language** — AgentKit now manages Agora `asr.language` through `interaction_language` / `Agent.with_interaction_language()`, validates it against the supported BCP-47 interaction language list, and sends the default `en-US` when no language is provided.
- **Provider parameter parity** — ASR, LLM, MLLM, TTS, and avatar wrappers expose typed provider parameters plus passthrough fields where the generated core supports additional properties.

### Changed

- **Generated core refresh** — Regenerated core types from the v2.1 API schema.
- **Deepgram TTS passthrough** — `DeepgramTTS` now uses `additional_params` for passthrough fields and flattens them into `tts.params`; the removed nested `params.params` shape is no longer documented or emitted.
- **OpenAI TTS** — Docs and tests now reflect the generated core shape, including `instructions` and `speed` under `tts.params`.
- **TTS provider docs** — Updated TTS provider reference tables to match implemented wrapper fields and generated core params.
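
As a rough illustration of the `DeepgramTTS` passthrough change, the sketch below flattens `additional_params` into one `params` mapping instead of nesting them under `params.params`. The helper name `build_tts_params`, the placeholder model string, the example passthrough key `encoding`, and the precedence rule (typed fields win on conflict) are all assumptions for illustration, not the SDK's actual implementation.

```python
from typing import Any, Dict, Optional


def build_tts_params(
    typed: Dict[str, Any],
    additional_params: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: merge passthrough fields into one flat mapping."""
    # Start from the passthrough fields so they end up at the top level.
    params: Dict[str, Any] = dict(additional_params or {})
    # Assumption: typed wrapper fields take precedence; unset (None) fields are dropped.
    params.update({key: value for key, value in typed.items() if value is not None})
    return params


flat = build_tts_params(
    {"model": "your-deepgram-model", "sample_rate": 24000, "base_url": None},
    additional_params={"encoding": "linear16"},
)
print(flat)
```

The result has no nested `params` key; every field sits directly in the flat mapping.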

### Fixed

- **Managed-provider validation** — AgentKit validation now distinguishes preset-backed providers from BYOK providers so required provider fields are only required when credentials are caller-supplied.
- **ASR language separation** — Provider-specific STT language values remain under `asr.params`, while Agora interaction language is emitted separately as `asr.language`.

## [v2.0.0] — 2026-05-21

### Added
@@ -52,7 +71,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/).

### Added

- **`DeepgramTTS`** — New TTS vendor wrapper for Deepgram (Beta). Accepts `api_key`, `model`, `base_url`, `sample_rate`, `params`, and `skip_patterns`.
- **`DeepgramTTS`** — New TTS vendor wrapper for Deepgram (Beta). Accepts `api_key`, `model`, `base_url`, `sample_rate`, `additional_params`, and `skip_patterns`.
- **`Agent.with_tools(enabled=True)`** — Dedicated builder method to enable MCP tool invocation (`advanced_features.enable_tools`). Replaces the raw `with_advanced_features(AdvancedFeatures(enable_tools=True))` call.
- **LLM vendors: `headers` field** — All four LLM vendors (`OpenAI`, `AzureOpenAI`, `Anthropic`, `Gemini`) now accept an optional `headers: Dict[str, str]` parameter. Use this to pass custom HTTP headers to the LLM provider (e.g., tenant identifiers, routing headers).
- **`AgentSession.think()` / `AsyncAgentSession.think()`** — Send a custom instruction to a running agent through the `agent_management` API.
@@ -107,7 +126,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/).

### Added

- **`OpenAITTS`** — New optional parameters: `response_format` (str, e.g. `"pcm"`) and `speed` (float).
- **`OpenAITTS`** — New optional parameters: `instructions` (str) and `speed` (float).
- **`CartesiaTTS`** — `voice_id` user-facing field is preserved; voice is serialized to the required nested object format automatically.
- **`RimeTTS`** — New optional parameters: `lang` (str), `sampling_rate` (int, serialized as `samplingRate`), `speed_alpha` (float, serialized as `speedAlpha`).
- **`OpenAIRealtime`** — New optional parameter: `failure_message` (str).
58 changes: 36 additions & 22 deletions docs/concepts/agent.md
@@ -12,24 +12,28 @@ The `Agent` class is a fluent builder for configuring AI agent properties. It co

<!-- snippet: executable -->
```python
from agora_agent import Agent

agent = Agent(
name='support-assistant',
instructions='You are a helpful voice assistant.',
greeting='Hello! How can I help you?',
failure_message='Sorry, something went wrong.',
max_history=20,
from agora_agent import Agent, OpenAI

agent = Agent(name='support-assistant').with_llm(
OpenAI(
api_key='your-openai-key',
base_url='https://api.openai.com/v1/chat/completions',
model='gpt-4o-mini',
system_messages=[{'role': 'system', 'content': 'You are a helpful voice assistant.'}],
greeting_message='Hello! How can I help you?',
failure_message='Sorry, something went wrong.',
max_history=20,
)
)
```

| Parameter | Type | Required | Description |
|---|---|---|---|
| `name` | `str` | No | Agent display name (used as session name if not overridden) |
| `instructions` | `str` | No | System prompt for the LLM |
| `greeting` | `str` | No | Message spoken when the agent joins |
| `failure_message` | `str` | No | Message spoken on error |
| `max_history` | `int` | No | Maximum conversation history length |
| `instructions` | `str` | No | Deprecated. Use LLM vendor `system_messages` instead. |
| `greeting` | `str` | No | Deprecated. Use LLM/MLLM vendor `greeting_message` instead. |
| `failure_message` | `str` | No | Deprecated. Use LLM/MLLM vendor `failure_message` instead. |
| `max_history` | `int` | No | Deprecated. Use LLM vendor `max_history` instead. |
| `turn_detection` | `TurnDetectionConfig` | No | Turn detection settings |
| `sal` | `SalConfig` | No | SAL (Speech Activity Level) configuration |
| `advanced_features` | `Dict[str, Any]` | No | Advanced features (e.g., `{'enable_rtm': True}`) |
@@ -57,15 +61,15 @@ Each `with_*` method returns a **new** `Agent` instance — the original is unch

| Method | Accepts | Purpose |
|---|---|---|
| `with_instructions(text)` | `str` | Override the system prompt |
| `with_greeting(text)` | `str` | Override the greeting message |
| `with_instructions(text)` | `str` | Deprecated. Use LLM vendor `system_messages` instead. |
| `with_greeting(text)` | `str` | Deprecated. Use LLM/MLLM vendor `greeting_message` instead. |
| `with_name(name)` | `str` | Override the agent name |
| `with_turn_detection(config)` | `TurnDetectionConfig` | Override cascading-flow SOS/EOS detection; use `with_interruption()` for interruption behavior |
| `with_sal(config)` | `SalConfig` | Set SAL configuration |
| `with_advanced_features(features)` | `Dict[str, Any]` | Set advanced features |
| `with_parameters(parameters)` | `SessionParams` | Set session parameters |
| `with_failure_message(message)` | `str` | Set failure message |
| `with_max_history(max_history)` | `int` | Set max history length |
| `with_failure_message(message)` | `str` | Deprecated. Use LLM/MLLM vendor `failure_message` instead. |
| `with_max_history(max_history)` | `int` | Deprecated. Use LLM vendor `max_history` instead. |
| `with_geofence(geofence)` | `GeofenceConfig` | Set geofence configuration |
| `with_labels(labels)` | `Dict[str, str]` | Set custom labels |
| `with_rtc(rtc)` | `RtcConfig` | Set RTC configuration |
@@ -79,9 +83,14 @@ from agora_agent import Agent
from agora_agent import OpenAI, ElevenLabsTTS, DeepgramSTT

agent = (
Agent(name='my-agent', instructions='You are a helpful assistant.')
.with_llm(OpenAI(api_key='your-openai-key', model='gpt-4o-mini'))
.with_tts(ElevenLabsTTS(key='your-elevenlabs-key', model_id='eleven_flash_v2_5', voice_id='your-voice-id'))
Agent(name='my-agent')
.with_llm(OpenAI(
api_key='your-openai-key',
base_url='https://api.openai.com/v1/chat/completions',
model='gpt-4o-mini',
system_messages=[{'role': 'system', 'content': 'You are a helpful assistant.'}],
))
.with_tts(ElevenLabsTTS(key='your-elevenlabs-key', model_id='eleven_flash_v2_5', voice_id='your-voice-id', base_url='wss://api.elevenlabs.io/v1'))
.with_stt(DeepgramSTT(api_key='your-deepgram-key', language='en-US'))
)
```
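
The deprecated `Agent` arguments in the tables above move onto the LLM vendor. The sketch below shows that mapping with plain dictionaries. `migrate_agent_kwargs` is a hypothetical helper, not part of `agora_agent`, and it covers only the four fields the tables mark as deprecated.

```python
from typing import Any, Dict, Tuple

# Deprecated Agent argument -> LLM vendor field, per the tables above.
RENAMES = {
    "greeting": "greeting_message",
    "failure_message": "failure_message",
    "max_history": "max_history",
}


def migrate_agent_kwargs(
    agent_kwargs: Dict[str, Any],
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Hypothetical helper: split old Agent kwargs into (agent_kwargs, llm_kwargs)."""
    remaining = dict(agent_kwargs)
    llm_kwargs: Dict[str, Any] = {}
    if "instructions" in remaining:
        # The system prompt becomes a single system message.
        llm_kwargs["system_messages"] = [
            {"role": "system", "content": remaining.pop("instructions")}
        ]
    for old_name, new_name in RENAMES.items():
        if old_name in remaining:
            llm_kwargs[new_name] = remaining.pop(old_name)
    return remaining, llm_kwargs


agent_kwargs, llm_kwargs = migrate_agent_kwargs({
    "name": "my-agent",
    "instructions": "You are a helpful assistant.",
    "greeting": "Hello! How can I help you?",
    "max_history": 20,
})
```

Under these assumptions, the two dictionaries would then be passed as `Agent(**agent_kwargs)` and `OpenAI(model='gpt-4o-mini', **llm_kwargs)`.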
@@ -97,9 +106,14 @@ from agora_agent import Agent, Agora, Area, OpenAI, ElevenLabsTTS, DeepgramSTT
client = Agora(area=Area.US, app_id='your-app-id', app_certificate='your-app-certificate')

base = (
Agent(instructions='You are a helpful assistant.')
.with_llm(OpenAI(api_key='your-openai-key', model='gpt-4o-mini'))
.with_tts(ElevenLabsTTS(key='your-elevenlabs-key', model_id='eleven_flash_v2_5', voice_id='your-voice-id'))
Agent()
.with_llm(OpenAI(
api_key='your-openai-key',
base_url='https://api.openai.com/v1/chat/completions',
model='gpt-4o-mini',
system_messages=[{'role': 'system', 'content': 'You are a helpful assistant.'}],
))
.with_tts(ElevenLabsTTS(key='your-elevenlabs-key', model_id='eleven_flash_v2_5', voice_id='your-voice-id', base_url='wss://api.elevenlabs.io/v1'))
.with_stt(DeepgramSTT(api_key='your-deepgram-key', language='en-US'))
)

11 changes: 8 additions & 3 deletions docs/concepts/session.md
@@ -40,9 +40,14 @@ from agora_agent import Agent, Agora, Area, OpenAI, ElevenLabsTTS, DeepgramSTT
client = Agora(area=Area.US, app_id='your-app-id', app_certificate='your-app-certificate')

agent = (
Agent(name='my-agent', instructions='You are helpful.')
.with_llm(OpenAI(api_key='your-openai-key', model='gpt-4o-mini'))
.with_tts(ElevenLabsTTS(key='your-elevenlabs-key', model_id='eleven_flash_v2_5', voice_id='your-voice-id'))
Agent(name='my-agent')
.with_llm(OpenAI(
api_key='your-openai-key',
base_url='https://api.openai.com/v1/chat/completions',
model='gpt-4o-mini',
system_messages=[{'role': 'system', 'content': 'You are helpful.'}],
))
.with_tts(ElevenLabsTTS(key='your-elevenlabs-key', model_id='eleven_flash_v2_5', voice_id='your-voice-id', base_url='wss://api.elevenlabs.io/v1'))
.with_stt(DeepgramSTT(api_key='your-deepgram-key', language='en-US'))
)

47 changes: 25 additions & 22 deletions docs/concepts/vendors.md
@@ -21,21 +21,21 @@ Used with `agent.with_llm()` for the cascading flow (ASR → LLM → TTS).

| Class | Provider | Required Parameters |
|---|---|---|
| `OpenAI` | OpenAI | `api_key` |
| `AzureOpenAI` | Azure OpenAI | `api_key`, `endpoint`, `deployment_name` |
| `Anthropic` | Anthropic | `api_key` |
| `Gemini` | Google Gemini | `api_key` |
| `Groq` | Groq | `api_key` |
| `VertexAILLM` | Google Vertex AI | `api_key`, `project_id`, `location` |
| `AmazonBedrock` | Amazon Bedrock | `api_key`, `url`, `model` |
| `Dify` | Dify | `api_key`, `url` |
| `OpenAI` | OpenAI | `model` for Agora-managed models; `api_key`, `base_url`, `model` for BYOK |
| `AzureOpenAI` | Azure OpenAI | `api_key`, `model`, `endpoint`, `deployment_name` |
| `Anthropic` | Anthropic | `api_key`, `model`, `url`, `headers`, `max_tokens` |
| `Gemini` | Google Gemini | `api_key`, `model` |
| `Groq` | Groq | `api_key`, `model`, `base_url` |
| `VertexAILLM` | Google Vertex AI | `api_key`, `model`, `project_id`, `location` |
| `AmazonBedrock` | Amazon Bedrock | `access_key`, `secret_key`, `region`, `model` |
| `Dify` | Dify | `api_key`, `url`, `model` |
| `CustomLLM` | OpenAI-compatible LLM | `api_key`, `base_url`, `model` |

<!-- snippet: executable -->
```python
from agora_agent import OpenAI

llm = OpenAI(api_key='your-openai-key', model='gpt-4o-mini')
llm = OpenAI(api_key='your-openai-key', base_url='https://api.openai.com/v1/chat/completions', model='gpt-4o-mini')
```
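
The `OpenAI` row in the table above lists two sets of required parameters. A minimal sketch of that rule follows, assuming BYOK is detected by the presence of `api_key`. The function name and the detection rule are assumptions, not the SDK's validator.

```python
from typing import Any, Dict, List


def missing_openai_fields(config: Dict[str, Any]) -> List[str]:
    """Hypothetical check: return the required fields that are absent from config."""
    # Assumption: supplying api_key means the caller is bringing their own key.
    byok = bool(config.get("api_key"))
    required = ["api_key", "base_url", "model"] if byok else ["model"]
    return [name for name in required if not config.get(name)]


managed = {"model": "gpt-4o-mini"}
byok_incomplete = {"api_key": "your-openai-key", "model": "gpt-4o-mini"}

print(missing_openai_fields(managed))          # prints []
print(missing_openai_fields(byok_incomplete))  # prints ['base_url']
```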

## TTS Vendors
@@ -44,17 +44,17 @@ Used with `agent.with_tts()`. Each TTS vendor produces audio at a specific sampl

| Class | Provider | Required Parameters | Sample Rate |
|---|---|---|---|
| `ElevenLabsTTS` | ElevenLabs | `key`, `model_id`, `voice_id` | 16000, 22050, 24000, or 44100 Hz |
| `ElevenLabsTTS` | ElevenLabs | `key`, `model_id`, `voice_id`, `base_url` | 16000, 22050, 24000, or 44100 Hz |
| `MicrosoftTTS` | Microsoft Azure | `key`, `region`, `voice_name` | 8000, 16000, 24000, or 48000 Hz |
| `OpenAITTS` | OpenAI | `key`, `voice` | 24000 Hz (fixed) |
| `CartesiaTTS` | Cartesia | `key`, `voice_id` | 8000–48000 Hz |
| `OpenAITTS` | OpenAI | `voice` for Agora-managed `tts-1`; `api_key`, `model`, `base_url`, `voice` for BYOK | 24000 Hz (fixed) |
| `CartesiaTTS` | Cartesia | `api_key`, `voice_id`, `model_id` | 8000–48000 Hz |
| `GoogleTTS` | Google Cloud | `key`, `voice_name` | — |
| `AmazonTTS` | Amazon Polly | `access_key`, `secret_key`, `region`, `voice_id` | — |
| `HumeAITTS` | Hume AI | `key` | — |
| `RimeTTS` | Rime | `key`, `speaker` | — |
| `FishAudioTTS` | Fish Audio | `key`, `reference_id` | — |
| `AmazonTTS` | Amazon Polly | `access_key`, `secret_key`, `region`, `voice_id`, `engine` | — |
| `HumeAITTS` | Hume AI | `key`, `voice_id`, `provider` | — |
| `RimeTTS` | Rime | `key`, `speaker`, `model_id` | — |
| `FishAudioTTS` | Fish Audio | `key`, `reference_id`, `backend` | — |
| `GroqTTS` | Groq | `key` | — |
| `MiniMaxTTS` | MiniMax | `key` | — |
| `MiniMaxTTS` | MiniMax | `model` for supported Agora-managed models; `key`, `group_id`, `model`, `voice_id`, `url` for BYOK | — |
| `DeepgramTTS` | Deepgram | `api_key`, `model` | Configurable |
| `SarvamTTS` | Sarvam | `api_key` | — |

@@ -66,6 +66,7 @@ tts = ElevenLabsTTS(
key='your-elevenlabs-key',
model_id='eleven_flash_v2_5',
voice_id='your-voice-id',
base_url='wss://api.elevenlabs.io/v1',
sample_rate=24000,
)
```
@@ -74,15 +75,17 @@ tts = ElevenLabsTTS(

Used with `agent.with_stt()`.

Use `agent.with_interaction_language()` for Agora `asr.language`; it defaults to `en-US`. STT vendor `language` options are serialized under `asr.params` using each provider's own format.
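
A rough sketch of how the two language settings might sit side by side in the serialized `asr` block. Only the `asr.language` / `asr.params` split and the `en-US` default come from the text above. The helper name, the `vendor` key, and the small `SUPPORTED` set (a stand-in for the real supported-language list) are assumptions.

```python
from typing import Any, Dict, Optional

# Stand-in for the real supported interaction-language list (assumption).
SUPPORTED = {"en-US", "es-ES", "ja-JP"}


def build_asr(
    vendor: str,
    vendor_params: Dict[str, Any],
    interaction_language: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: keep Agora's language apart from vendor params."""
    language = interaction_language or "en-US"  # default stated in the docs
    if language not in SUPPORTED:
        raise ValueError(f"unsupported interaction language: {language}")
    # The vendor's own language value stays inside params, untouched.
    return {"language": language, "vendor": vendor, "params": dict(vendor_params)}


asr = build_asr("deepgram", {"model": "nova-3", "language": "en"})
print(asr)
```

Here the top-level `language` carries the Agora interaction language, while the vendor's `language` value (`en`) stays under `params` in the provider's own format.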

| Class | Provider | Required Parameters |
|---|---|---|
| `SpeechmaticsSTT` | Speechmatics | `api_key`, `language` |
| `DeepgramSTT` | Deepgram | — (all optional) |
| `MicrosoftSTT` | Microsoft Azure | `key`, `region` |
| `DeepgramSTT` | Deepgram | `model` for Agora-managed `nova-2`/`nova-3`; `api_key` for BYOK |
| `MicrosoftSTT` | Microsoft Azure | `key`, `region`, `language` |
| `OpenAISTT` | OpenAI | `api_key` |
| `GoogleSTT` | Google Cloud | `api_key` |
| `AmazonSTT` | Amazon Transcribe | `access_key`, `secret_key`, `region` |
| `AssemblyAISTT` | AssemblyAI | `api_key` |
| `GoogleSTT` | Google Cloud | `project_id`, `location`, `adc_credentials_string`, `language` |
| `AmazonSTT` | Amazon Transcribe | `access_key`, `secret_key`, `region`, `language` |
| `AssemblyAISTT` | AssemblyAI | `api_key`, `language` |
| `AresSTT` | Ares | — (all optional) |
| `SarvamSTT` | Sarvam | `api_key`, `language` |

7 changes: 5 additions & 2 deletions docs/getting-started/authentication.md
@@ -20,9 +20,12 @@ client = Agora(
)

agent = (
Agent(instructions="Be concise.")
Agent()
.with_stt(DeepgramSTT(model="nova-3"))
.with_llm(OpenAI(model="gpt-4o-mini"))
.with_llm(OpenAI(
model="gpt-4o-mini",
system_messages=[{"role": "system", "content": "Be concise."}],
))
.with_tts(MiniMaxTTS(model="speech_2_6_turbo", voice_id="English_captivating_female1"))
)

14 changes: 7 additions & 7 deletions docs/getting-started/quick-start.md
@@ -27,14 +27,14 @@ def main() -> None:
)

agent = (
Agent(
name="support-assistant",
instructions="You are a concise support voice assistant.",
greeting="Hello! How can I help you today?",
max_history=10,
)
Agent(name="support-assistant")
.with_stt(DeepgramSTT(model="nova-3", language="en"))
.with_llm(OpenAI(model="gpt-4o-mini"))
.with_llm(OpenAI(
model="gpt-4o-mini",
system_messages=[{"role": "system", "content": "You are a concise support voice assistant."}],
greeting_message="Hello! How can I help you today?",
max_history=10,
))
.with_tts(MiniMaxTTS(model="speech_2_6_turbo", voice_id="English_captivating_female1"))
)
