voice-call: document realtime mode in README; annotate tested private methods

- Add Realtime voice mode section to README covering requirements,
  config fields, env var overrides, how the bridge works, and
  networking requirements (HTTPS+WSS endpoint, Docker bind, Tailscale)
- Add JSDoc comments to registerCallInManager and executeToolCall
  explaining why they are tested via 'as unknown as' type assertion

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
Forrest Blount 2026-03-11 21:10:28 +00:00
parent 766c6bc141
commit 6327d64070
2 changed files with 67 additions and 0 deletions

View File

@ -172,6 +172,63 @@ Actions:
- `voicecall.end` (callId)
- `voicecall.status` (callId)
## Realtime voice mode (OpenAI Realtime API)
Realtime mode routes inbound calls directly to the [OpenAI Realtime API](https://platform.openai.com/docs/guides/realtime) for voice-to-voice conversation (~200400 ms latency vs ~23 s for the STT/TTS pipeline). It is disabled by default and mutually exclusive with `streaming.enabled`.
### Requirements
- `OPENAI_API_KEY` set in your environment (or `streaming.openaiApiKey` in config).
- A **publicly reachable HTTPS endpoint with WebSocket support** — the webhook server must accept both POST requests (Twilio webhook) and WebSocket upgrades (Twilio Media Stream). A plain HTTP tunnel is not sufficient; Twilio requires WSS.
- `inboundPolicy` set to `"open"` or `"allowlist"` (not `"disabled"`) so the plugin accepts inbound calls.
### Config
```json5
{
inboundPolicy: "open", // required: realtime needs inbound calls enabled
realtime: {
enabled: true,
voice: "alloy", // Realtime API voices: alloy, ash, ballad, cedar, coral,
// echo, marin, sage, shimmer, verse
instructions: "You are a helpful assistant.",
model: "gpt-4o-mini-realtime-preview", // optional, this is the default
temperature: 0.8, // 02, optional
vadThreshold: 0.5, // voice activity detection sensitivity, 01, optional
silenceDurationMs: 500, // ms of silence before end-of-turn, optional
},
}
```
### Environment variable overrides
All `realtime.*` fields can be set via environment variables (config takes precedence):
| Env var | Config field |
|---|---|
| `REALTIME_VOICE_ENABLED=true` | `realtime.enabled` |
| `REALTIME_VOICE_MODEL` | `realtime.model` |
| `REALTIME_VOICE_VOICE` | `realtime.voice` |
| `REALTIME_VOICE_INSTRUCTIONS` | `realtime.instructions` |
| `REALTIME_VOICE_TEMPERATURE` | `realtime.temperature` |
| `VAD_THRESHOLD` | `realtime.vadThreshold` |
| `SILENCE_DURATION_MS` | `realtime.silenceDurationMs` |
### How it works
1. Twilio sends a POST webhook to `serve.path` (default `/voice/webhook`).
2. The plugin responds with TwiML `<Connect><Stream>` pointing to `wss://<host>/voice/stream/realtime`.
3. Twilio opens a WebSocket to that path carrying the caller's audio in μ-law format.
4. The plugin bridges the WebSocket to the OpenAI Realtime API — audio flows in both directions in real time.
5. The call is registered with CallManager and appears in `openclaw voice status` / `openclaw voice history`.
### Networking notes
- `serve.bind` defaults to `127.0.0.1`. If running inside Docker with an external port mapping, set `serve.bind: "0.0.0.0"` so the container's port is reachable from the host.
- The WebSocket upgrade path (`/voice/stream/realtime`) must be reachable on the same host and port as the webhook. Reverse proxies must pass `Upgrade: websocket` headers through.
- When using Tailscale Funnel on the host (outside Docker), configure Funnel to route `/voice/` to the plugin's local port. The gateway itself does not need to be exposed via Tailscale Funnel.
## Notes
- Uses webhook signature verification for Twilio/Telnyx/Plivo.

View File

@ -222,6 +222,9 @@ export class RealtimeCallHandler {
/**
* Emit synthetic NormalizedEvents to register the call with CallManager.
* Returns the internal callId generated by the manager.
*
* Tested directly via `as unknown as` cast the logic is non-trivial
* enough to warrant unit testing without promoting to a public method.
*/
private registerCallInManager(callSid: string): string {
const now = Date.now();
@ -269,6 +272,13 @@ export class RealtimeCallHandler {
});
}
/**
* Dispatch a tool call from the Realtime API to the registered handler.
* Submits the result (or an error object) back to the bridge.
*
* Tested directly via `as unknown as` cast the routing/error logic is
* worth unit testing without exposing the method publicly.
*/
private async executeToolCall(
bridge: OpenAIRealtimeVoiceBridge,
callId: string,