voice-call: document realtime mode in README; annotate tested private methods
- Add Realtime voice mode section to README covering requirements, config fields, env var overrides, how the bridge works, and networking requirements (HTTPS+WSS endpoint, Docker bind, Tailscale)
- Add JSDoc comments to registerCallInManager and executeToolCall explaining why they are tested via 'as unknown as' type assertion

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
parent 766c6bc141
commit 6327d64070
@ -172,6 +172,63 @@ Actions:
- `voicecall.end` (callId)
- `voicecall.status` (callId)

## Realtime voice mode (OpenAI Realtime API)

Realtime mode routes inbound calls directly to the [OpenAI Realtime API](https://platform.openai.com/docs/guides/realtime) for voice-to-voice conversation (~200–400 ms latency vs ~2–3 s for the STT/TTS pipeline). It is disabled by default and mutually exclusive with `streaming.enabled`.

### Requirements

- `OPENAI_API_KEY` set in your environment (or `streaming.openaiApiKey` in config).
- A **publicly reachable HTTPS endpoint with WebSocket support**: the webhook server must accept both POST requests (Twilio webhook) and WebSocket upgrades (Twilio Media Stream). A plain HTTP tunnel is not sufficient; Twilio requires WSS.
- `inboundPolicy` set to `"open"` or `"allowlist"` (not `"disabled"`) so the plugin accepts inbound calls.

### Config

```json5
{
  inboundPolicy: "open", // required: realtime needs inbound calls enabled

  realtime: {
    enabled: true,
    voice: "alloy", // Realtime API voices: alloy, ash, ballad, cedar, coral,
                    // echo, marin, sage, shimmer, verse
    instructions: "You are a helpful assistant.",
    model: "gpt-4o-mini-realtime-preview", // optional, this is the default
    temperature: 0.8, // 0–2, optional
    vadThreshold: 0.5, // voice activity detection sensitivity, 0–1, optional
    silenceDurationMs: 500, // ms of silence before end-of-turn, optional
  },
}
```
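
The constraints stated in this section (realtime is mutually exclusive with `streaming.enabled`, inbound calls must not be disabled, and the numeric fields have documented ranges) can be sketched as a small check. This is only an illustration: `VoiceCallConfigSketch` and `checkRealtimeConfig` are hypothetical names, not the plugin's actual schema or validator, and only the field names and ranges come from the text above.

```typescript
// Hypothetical config shape: field names are taken from the README,
// everything else is an assumption made for illustration.
interface VoiceCallConfigSketch {
  inboundPolicy?: string;
  streaming?: { enabled?: boolean };
  realtime?: {
    enabled?: boolean;
    temperature?: number;
    vadThreshold?: number;
  };
}

// Returns human-readable problems; an empty list means the config satisfies
// the constraints described in the README.
function checkRealtimeConfig(cfg: VoiceCallConfigSketch): string[] {
  const problems: string[] = [];
  const rt = cfg.realtime;
  if (!rt || rt.enabled !== true) {
    return problems; // realtime is disabled by default, so nothing to check
  }
  if (cfg.streaming?.enabled === true) {
    problems.push("realtime.enabled and streaming.enabled are mutually exclusive");
  }
  if (cfg.inboundPolicy !== "open" && cfg.inboundPolicy !== "allowlist") {
    problems.push('inboundPolicy must be "open" or "allowlist"');
  }
  if (rt.temperature !== undefined && (rt.temperature < 0 || rt.temperature > 2)) {
    problems.push("realtime.temperature must be between 0 and 2");
  }
  if (rt.vadThreshold !== undefined && (rt.vadThreshold < 0 || rt.vadThreshold > 1)) {
    problems.push("realtime.vadThreshold must be between 0 and 1");
  }
  return problems;
}
```

Returning a list instead of throwing on the first problem lets a caller report every misconfiguration at once.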

### Environment variable overrides

All `realtime.*` fields can be set via environment variables (config takes precedence):

| Env var | Config field |
|---|---|
| `REALTIME_VOICE_ENABLED=true` | `realtime.enabled` |
| `REALTIME_VOICE_MODEL` | `realtime.model` |
| `REALTIME_VOICE_VOICE` | `realtime.voice` |
| `REALTIME_VOICE_INSTRUCTIONS` | `realtime.instructions` |
| `REALTIME_VOICE_TEMPERATURE` | `realtime.temperature` |
| `VAD_THRESHOLD` | `realtime.vadThreshold` |
| `SILENCE_DURATION_MS` | `realtime.silenceDurationMs` |
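
The precedence rule (config wins, the environment fills in whatever config leaves unset) can be illustrated with a minimal sketch. `RealtimeSettings`, `parseNumber`, and `resolveRealtimeSettings` are hypothetical names, and the parsing rules (only the literal `"true"` enables, unparsable numbers are ignored) are assumptions; the plugin's real resolution logic may differ.

```typescript
// Hypothetical settings shape mirroring the realtime.* fields in the table.
interface RealtimeSettings {
  enabled?: boolean;
  model?: string;
  voice?: string;
  instructions?: string;
  temperature?: number;
  vadThreshold?: number;
  silenceDurationMs?: number;
}

type Env = Record<string, string | undefined>;

// Parse a numeric env value; unset, empty, or non-numeric input yields undefined.
function parseNumber(raw: string | undefined): number | undefined {
  if (raw === undefined || raw.trim() === "") return undefined;
  const n = Number(raw);
  return Number.isFinite(n) ? n : undefined;
}

// Config takes precedence; an env var is consulted only when the matching
// config field is undefined.
function resolveRealtimeSettings(config: RealtimeSettings, env: Env): RealtimeSettings {
  const rawEnabled = env["REALTIME_VOICE_ENABLED"];
  return {
    enabled: config.enabled ?? (rawEnabled === undefined ? undefined : rawEnabled === "true"),
    model: config.model ?? env["REALTIME_VOICE_MODEL"],
    voice: config.voice ?? env["REALTIME_VOICE_VOICE"],
    instructions: config.instructions ?? env["REALTIME_VOICE_INSTRUCTIONS"],
    temperature: config.temperature ?? parseNumber(env["REALTIME_VOICE_TEMPERATURE"]),
    vadThreshold: config.vadThreshold ?? parseNumber(env["VAD_THRESHOLD"]),
    silenceDurationMs: config.silenceDurationMs ?? parseNumber(env["SILENCE_DURATION_MS"]),
  };
}
```

Passing the environment in as a plain object (for example `process.env` under Node.js) keeps the function easy to test.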

### How it works

1. Twilio sends a POST webhook to `serve.path` (default `/voice/webhook`).
2. The plugin responds with TwiML `<Connect><Stream>` pointing to `wss://<host>/voice/stream/realtime`.
3. Twilio opens a WebSocket to that path carrying the caller's audio in μ-law format.
4. The plugin bridges the WebSocket to the OpenAI Realtime API; audio flows in both directions in real time.
5. The call is registered with CallManager and appears in `openclaw voice status` / `openclaw voice history`.
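
Step 2 above can be sketched as a function that builds the TwiML response. The function names are hypothetical and the plugin's real response may carry additional attributes or parameters; only the `<Connect><Stream>` structure and the stream path come from the steps above.

```typescript
// Escape characters that are unsafe inside an XML attribute value.
function escapeXmlAttr(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build a TwiML document telling Twilio to open a Media Stream WebSocket.
// `host` is the public hostname Twilio can reach (an assumption of this sketch).
function buildRealtimeTwiml(host: string, streamPath: string = "/voice/stream/realtime"): string {
  const url = `wss://${host}${streamPath}`;
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    "<Response>",
    "  <Connect>",
    `    <Stream url="${escapeXmlAttr(url)}" />`,
    "  </Connect>",
    "</Response>",
  ].join("\n");
}
```

The webhook handler would return this string with a `text/xml` content type; Twilio then opens the WebSocket described in step 3.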

### Networking notes

- `serve.bind` defaults to `127.0.0.1`. If running inside Docker with an external port mapping, set `serve.bind: "0.0.0.0"` so the container's port is reachable from the host.
- The WebSocket upgrade path (`/voice/stream/realtime`) must be reachable on the same host and port as the webhook. Reverse proxies must pass `Upgrade: websocket` headers through.
- When using Tailscale Funnel on the host (outside Docker), configure Funnel to route `/voice/` to the plugin's local port. The gateway itself does not need to be exposed via Tailscale Funnel.
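
When a reverse proxy strips the `Upgrade` header, the stream request arrives as an ordinary GET and the bridge never starts. A small predicate like the one below can help when debugging that case. It is a sketch, not the plugin's code: `isRealtimeStreamUpgrade` is a hypothetical name, and it assumes header names have already been lower-cased, as Node.js does for incoming requests.

```typescript
type HeaderMap = Record<string, string | string[] | undefined>;

// True when the request looks like a WebSocket upgrade aimed at the realtime
// stream path. Header names are assumed to be lower-cased already.
function isRealtimeStreamUpgrade(
  method: string,
  url: string,
  headers: HeaderMap,
  streamPath: string = "/voice/stream/realtime",
): boolean {
  const rawUpgrade = headers["upgrade"];
  const upgrade = Array.isArray(rawUpgrade) ? rawUpgrade.join(",") : rawUpgrade ?? "";
  const path = url.split("?")[0]; // ignore any query string
  return method === "GET" && path === streamPath && upgrade.toLowerCase().includes("websocket");
}
```

Logging the result of this check for requests to the stream path shows quickly whether the proxy is forwarding the upgrade.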

## Notes

- Uses webhook signature verification for Twilio/Telnyx/Plivo.

@ -222,6 +222,9 @@ export class RealtimeCallHandler {
  /**
   * Emit synthetic NormalizedEvents to register the call with CallManager.
   * Returns the internal callId generated by the manager.
   *
   * Tested directly via `as unknown as` cast: the logic is non-trivial
   * enough to warrant unit testing without promoting to a public method.
   */
  private registerCallInManager(callSid: string): string {
    const now = Date.now();
@ -269,6 +272,13 @@ export class RealtimeCallHandler {
    });
  }

  /**
   * Dispatch a tool call from the Realtime API to the registered handler.
   * Submits the result (or an error object) back to the bridge.
   *
   * Tested directly via `as unknown as` cast: the routing/error logic is
   * worth unit testing without exposing the method publicly.
   */
  private async executeToolCall(
    bridge: OpenAIRealtimeVoiceBridge,
    callId: string,
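
For readers unfamiliar with the pattern named in the JSDoc comments, here is a minimal, self-contained sketch of calling a private method through an `as unknown as` cast. `HandlerSketch`, `HandlerInternals`, and `registerCall` are hypothetical stand-ins, not the real `RealtimeCallHandler` or its tests.

```typescript
// Stand-in class with a private method, loosely analogous to the handler above.
class HandlerSketch {
  private registerCall(callSid: string): string {
    return `call-${callSid}`;
  }
}

// Test-only view of the private surface. TypeScript's `private` is enforced at
// compile time only, so the method still exists on the instance at runtime.
type HandlerInternals = {
  registerCall(callSid: string): string;
};

const handler = new HandlerSketch();

// A direct cast is rejected because the types do not overlap sufficiently;
// going through `unknown` first is what makes the assertion compile.
const internals = handler as unknown as HandlerInternals;
const registeredId = internals.registerCall("CA123");
```

The trade-off is that `HandlerInternals` is maintained by hand, so the compiler will not flag a test whose view has drifted from the real method signature.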