Voice Agents (Coming Soon)

Real-time bidirectional voice conversation with tool calling. Coming soon.

🚧

This feature is in active development. The infrastructure for voice agents is built -- BidiAgent, NovaSonicModel, OpenAIRealtimeModel, GeminiLiveModel, and the local MLX pipeline all exist in the SDK. We are working through platform-specific audio session issues before marking this stable for production use.

What's coming

Microphone → audio → BidiAgent → audio → Speaker
BidiAgent ↔ MCP Tools (desktop control, APIs, etc.)
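
As a rough illustration of that flow, here is a minimal Swift sketch of a loop that feeds microphone audio to a backend, plays back the audio it returns, and routes tool calls to registered tools. Every name in it (`AgentEvent`, `VoiceBackend`, `runConversation`, `FakeBackend`) is hypothetical and invented for this example; it does not show the SDK's BidiAgent API, and it processes chunks synchronously, whereas a real bidirectional session would stream in both directions concurrently.

```swift
import Foundation

// All names below are hypothetical and for illustration only;
// they are NOT the SDK's BidiAgent API.

/// Something a backend can emit while the conversation runs.
enum AgentEvent {
    case audio(Data)                                // speech to send to the speaker
    case toolCall(name: String, arguments: String)  // request to run a tool
}

/// Minimal stand-in for a voice backend.
protocol VoiceBackend {
    mutating func send(audio chunk: Data) -> [AgentEvent]
    mutating func send(toolResult: String, forTool name: String) -> [AgentEvent]
}

typealias Tool = (String) -> String

/// Feeds microphone chunks to the backend, plays returned audio,
/// and routes each tool call to the matching tool.
func runConversation<B: VoiceBackend>(
    backend: inout B,
    microphone: [Data],
    tools: [String: Tool],
    speaker: (Data) -> Void
) {
    for chunk in microphone {
        var pending = backend.send(audio: chunk)
        while !pending.isEmpty {
            switch pending.removeFirst() {
            case .audio(let data):
                speaker(data)
            case .toolCall(let name, let arguments):
                // Unknown tools produce an error string instead of crashing.
                let result = tools[name]?(arguments) ?? "error: unknown tool \(name)"
                pending.append(contentsOf: backend.send(toolResult: result, forTool: name))
            }
        }
    }
}

/// Fake backend: requests one tool call per chunk, then "speaks" the result.
struct FakeBackend: VoiceBackend {
    mutating func send(audio chunk: Data) -> [AgentEvent] {
        [.toolCall(name: "clock", arguments: "")]
    }
    mutating func send(toolResult: String, forTool name: String) -> [AgentEvent] {
        [.audio(Data(toolResult.utf8))]
    }
}

// Usage: one microphone chunk, one tool, and a "speaker" that records text.
var backend = FakeBackend()
var played: [String] = []
runConversation(
    backend: &backend,
    microphone: [Data([0x00, 0x01])],
    tools: ["clock": { _ in "12:00" }],
    speaker: { played.append(String(decoding: $0, as: UTF8.self)) }
)
```

Modeling backend output as a list of events keeps audio playback and tool dispatch in a single loop, which makes the data flow easy to follow even though it leaves out interruption handling and real audio I/O.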

Planned backends

| Backend | Status |
| --- | --- |
| AWS Nova Sonic | Blocked on HTTP/2 bidi stream support in AWS SDK for Swift |
| OpenAI Realtime | In progress |
| Google Gemini Live | In progress |
| On-device MLX (STT + LLM + TTS) | In progress -- audio session issues in menu bar context |

Follow progress on GitHub or the blog.