Voice Agents

Real-time bidirectional voice conversation. Audio flows in from the microphone and out to the speaker simultaneously, with tool calling in between.

Overview

Voice agents use the StrandsBidiStreaming module, which provides a BidiAgent that manages an audio session with persistent duplex streaming. Unlike text agents, voice agents maintain a single long-lived connection to the model where audio chunks flow continuously in both directions.

Microphone → audio chunks → BidiAgent → audio response → Speaker
                                ↕
                  Tools (called mid-conversation)

Cloud Backends

Cloud backends connect to a hosted speech model over WebSocket or a proprietary real-time protocol. The model handles speech-to-text, reasoning, and text-to-speech in a single round trip.

OpenAI Realtime

OpenAI Realtime API (Swift)
import StrandsBidiStreaming

/// Get the current time.
@Tool func currentTime() -> String { ISO8601DateFormatter().string(from: Date()) }

let agent = BidiAgent(
    model: OpenAIRealtimeModel(model: "gpt-4o-realtime-preview"),
    tools: [currentTime],
    config: BidiSessionConfig(
        voice: "alloy",
        systemPrompt: "You are a helpful voice assistant."
    )
)

try await agent.start()

// Send mic audio
Task {
    for await chunk in mic.audioStream {
        try await agent.send(.audio(chunk, format: .openAI))
    }
}

// Receive and play responses
for try await event in agent.receive() {
    switch event {
    case .audio(let data, _):
        speaker.play(data)
    case .transcript(let text):
        print("Agent said: \(text)")
    case .toolUse(let use):
        print("Calling tool: \(use.name)")
    default:
        break
    }
}
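The `mic` and `speaker` objects in the examples are assumed helpers, not part of the module. On Apple platforms they could be sketched with AVAudioEngine as below; the sample rate and float-PCM format used here are assumptions and must be converted to whatever the chosen backend actually expects (for example, 24 kHz mono PCM16 for OpenAI Realtime).

```swift
import AVFoundation

// Sketch of the `mic` helper: taps the default input node and yields raw
// float sample buffers as Data. Real code should resample/convert to the
// backend's required wire format before sending.
final class Microphone {
    private let engine = AVAudioEngine()

    var audioStream: AsyncStream<Data> {
        AsyncStream { continuation in
            let input = engine.inputNode
            let format = input.outputFormat(forBus: 0)
            input.installTap(onBus: 0, bufferSize: 2048, format: format) { buffer, _ in
                guard let channel = buffer.floatChannelData?[0] else { return }
                continuation.yield(Data(bytes: channel,
                                        count: Int(buffer.frameLength) * MemoryLayout<Float>.size))
            }
            try? engine.start()
            continuation.onTermination = { [engine] _ in
                engine.inputNode.removeTap(onBus: 0)
                engine.stop()
            }
        }
    }
}

// Sketch of the `speaker` helper: schedules received float PCM for playback.
// Assumes 24 kHz mono 32-bit float; adjust to match the agent's output format.
final class Speaker {
    private let engine = AVAudioEngine()
    private let player = AVAudioPlayerNode()
    private let format = AVAudioFormat(standardFormatWithSampleRate: 24_000, channels: 1)!

    init() throws {
        engine.attach(player)
        engine.connect(player, to: engine.mainMixerNode, format: format)
        try engine.start()
        player.play()
    }

    func play(_ data: Data) {
        let frames = AVAudioFrameCount(data.count / MemoryLayout<Float>.size)
        guard let buffer = AVAudioPCMBuffer(pcmFormat: format, frameCapacity: frames) else { return }
        buffer.frameLength = frames
        data.withUnsafeBytes { raw in
            guard let src = raw.bindMemory(to: Float.self).baseAddress else { return }
            buffer.floatChannelData![0].update(from: src, count: Int(frames))
        }
        player.scheduleBuffer(buffer)
    }
}
```

In a real app you would also handle interruptions (route changes, the user talking over the agent) and backpressure; this sketch only covers the happy path.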

AWS Nova Sonic

AWS Nova Sonic (Swift)
import StrandsBidiStreaming

let agent = BidiAgent(
    model: NovaSonicModel(config: NovaSonicConfig(
        region: "us-east-1",
        voice: "tiffany"
    )),
    tools: [currentTime],  // e.g. the tool defined in the OpenAI example above
    config: BidiSessionConfig(systemPrompt: "You are a voice assistant.")
)

try await agent.start()
// same send/receive pattern as above

Google Gemini Live

Gemini Live API (Swift)
import StrandsBidiStreaming

let agent = BidiAgent(
    model: GeminiLiveModel(model: "gemini-2.0-flash-live-001"),
    config: BidiSessionConfig(voice: "Puck")
)
try await agent.start()
// same send/receive pattern as above

On-Device Voice (MLX)

Apple Silicon only

Run the full voice pipeline on-device: STT, LLM, and TTS all on Apple Silicon. No network required after the initial model download.

Fully local voice agent (Swift)
import StrandsMLXBidiProvider

// Load models (downloaded from HuggingFace and cached locally)
let sttModel = try await MLXSTTProcessor.load(model: glmASRModel)
let ttsModel = try await MLXTTSProcessor.load(model: sopranoModel)

let agent = MLXBidiFactory.createAgent(
    llmProcessor: MLXLLMProcessor(modelId: "mlx-community/Qwen3-8B-4bit"),
    sttProcessor: sttModel,
    ttsProcessor: ttsModel,
    tools: [currentTime, calculator],
    systemPrompt: "You are a helpful on-device assistant."
)

try await agent.start()
// same send/receive pattern
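The `calculator` tool passed to the agent above is not defined anywhere in this page. Following the `@Tool` convention from the earlier `currentTime` example, a minimal sketch might look like the following; the use of NSExpression for evaluation is an illustrative assumption, and production code should validate input rather than evaluate arbitrary strings.

```swift
import Foundation

/// Evaluate a simple arithmetic expression, e.g. "2 + 2 * 10".
@Tool func calculator(expression: String) -> String {
    // NSExpression handles basic arithmetic; it raises on malformed input,
    // so a real implementation should sanitize `expression` first.
    let result = NSExpression(format: expression)
        .expressionValue(with: nil, context: nil)
    return (result as? NSNumber)?.stringValue ?? "Could not evaluate \(expression)"
}
```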

Local Pipeline Components

Component         Role                     Example model
MLXSTTProcessor   Speech to text           GLM ASR, Parakeet
MLXLLMProcessor   Language model + tools   Qwen3-8B-4bit
MLXTTSProcessor   Text to speech           Soprano, Marvis

Supported Backends

Backend             Module                  Platform
OpenAI Realtime     StrandsBidiStreaming    macOS, iOS
AWS Nova Sonic      StrandsBidiStreaming    macOS, iOS
Google Gemini Live  StrandsBidiStreaming    macOS, iOS
MLX (local)         StrandsMLXBidiProvider  macOS (Apple Silicon)