Voice Agents (Coming Soon)
Real-time bidirectional voice conversation with tool calling. Coming soon.
This feature is in active development. The infrastructure for voice agents is already in the SDK: BidiAgent, NovaSonicModel, OpenAIRealtimeModel, GeminiLiveModel, and the local MLX pipeline. We are working through platform-specific audio session issues before marking the feature stable for production use.
What's coming
Microphone → audio → BidiAgent → audio → Speaker

BidiAgent ↕ MCP Tools (desktop control, APIs, etc.)
Planned backends
| Backend | Status |
|---|---|
| AWS Nova Sonic | Blocked on HTTP/2 bidi stream support in AWS SDK for Swift |
| OpenAI Realtime | In progress |
| Google Gemini Live | In progress |
| On-device MLX (STT + LLM + TTS) | In progress; working through audio session issues in the menu bar context |