docs(plugins): document capability ownership model

This commit is contained in: parent 662031a88e, commit 6da9ba3267

@@ -204,7 +204,7 @@ Example with a stable public host:

## TTS for calls

Voice Call uses the core `messages.tts` configuration for
streaming speech on calls. You can override it under the plugin config with the
**same shape** — it deep‑merges with `messages.tts`.

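A minimal sketch of what that deep-merge means in practice (hypothetical helper code, not OpenClaw's actual merge implementation): nested objects merge key by key, and scalar values in the override win.

```ts
// Hypothetical illustration of the "same shape deep-merges" rule above.
// This is not OpenClaw's merge code; it only shows the semantics.
type TtsConfig = Record<string, unknown>;

function deepMerge(base: TtsConfig, override: TtsConfig): TtsConfig {
  const out: TtsConfig = { ...base };
  for (const [key, value] of Object.entries(override)) {
    const prev = out[key];
    const bothObjects =
      value !== null && typeof value === "object" && !Array.isArray(value) &&
      prev !== null && typeof prev === "object" && !Array.isArray(prev);
    out[key] = bothObjects
      ? deepMerge(prev as TtsConfig, value as TtsConfig)
      : value;
  }
  return out;
}

// Core `messages.tts` plus a plugin-level override with the same shape:
const core = { provider: "openai", openai: { voice: "alloy" } };
const override = { openai: { voice: "verse" } };
const effective = deepMerge(core, override);
// `provider` survives from core; `openai.voice` comes from the override.
```

The practical consequence: a plugin override only needs to specify the keys it changes; everything else falls through to the core config.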
@@ -222,7 +222,7 @@ streaming speech on calls. You can override it under the plugin config with the

Notes:

- **Microsoft speech is ignored for voice calls** (telephony audio needs PCM; the current Microsoft transport does not expose telephony PCM output).
- Core TTS is used when Twilio media streaming is enabled; otherwise calls fall back to provider native voices.

### More examples

@@ -97,6 +97,76 @@ The important design boundary:

That split lets OpenClaw validate config, explain missing/disabled plugins, and
build UI/schema hints before the full runtime is active.

## Capability ownership model

OpenClaw treats a native plugin as the ownership boundary for a **company** or a
**feature**, not as a grab bag of unrelated integrations.

That means:

- a company plugin should usually own all of that company's OpenClaw-facing
  surfaces
- a feature plugin should usually own the full feature surface it introduces
- channels should consume shared core capabilities instead of re-implementing
  provider behavior ad hoc

Examples:

- the bundled `openai` plugin owns OpenAI model-provider behavior and OpenAI
  speech behavior
- the bundled `elevenlabs` plugin owns ElevenLabs speech behavior
- the bundled `microsoft` plugin owns Microsoft speech behavior
- the `voice-call` plugin is a feature plugin: it owns call transport, tools,
  CLI, routes, and runtime, but it consumes core TTS/STT capability instead of
  inventing a second speech stack

The intended end state is:

- OpenAI lives in one plugin even if it spans text models, speech, images, and
  future video
- another vendor can do the same for its own surface area
- channels do not care which vendor plugin owns the provider; they consume the
  shared capability contract exposed by core

This is the key distinction:

- **plugin** = ownership boundary
- **capability** = core contract that multiple plugins can implement or consume

So if OpenClaw adds a new domain such as video, the first question is not
"which provider should hardcode video handling?" The first question is "what is
the core video capability contract?" Once that contract exists, vendor plugins
can register against it and channel/feature plugins can consume it.

If the capability does not exist yet, the right move is usually:

1. define the missing capability in core
2. expose it through the plugin API/runtime in a typed way
3. wire channels/features against that capability
4. let vendor plugins register implementations

This keeps ownership explicit while avoiding core behavior that depends on a
single vendor or a one-off plugin-specific code path.

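As a concrete sketch of step 1, a core-owned video capability contract might look like this. Every name below is invented for illustration; none of these types exist in `OpenClawPluginApi` today.

```ts
// Hypothetical sketch of a core-owned capability contract.
// Core owns these interfaces; vendor plugins implement them, and
// channel/feature plugins consume them without vendor knowledge.
interface VideoRequest {
  prompt: string;
  durationSeconds?: number;
}

interface VideoResult {
  videoBuffer: Buffer;
  mimeType: string;
}

interface VideoProvider {
  id: string;
  isConfigured(ctx: { config: unknown }): boolean;
  generate(req: VideoRequest): Promise<VideoResult>;
}

// What a vendor-side implementation of that contract would look like
// (a stub, just to show the registration payload shape):
const acmeVideo: VideoProvider = {
  id: "acme-video",
  isConfigured: () => true,
  generate: async () => ({
    videoBuffer: Buffer.from([]),
    mimeType: "video/mp4",
  }),
};
```

The contract stays small and capability-specific; vendor detail (auth, endpoints, model names) lives entirely behind `generate`.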
### Capability layering

Use this mental model when deciding where code belongs:

- **core capability layer**: shared orchestration, policy, fallback, config
  merge rules, delivery semantics, and typed contracts
- **vendor plugin layer**: vendor-specific APIs, auth, model catalogs, speech
  synthesis, image generation, future video backends, usage endpoints
- **channel/feature plugin layer**: Slack/Discord/voice-call/etc. integration
  that consumes core capabilities and presents them on a surface

For example, TTS follows this shape:

- core owns reply-time TTS policy, fallback order, prefs, and channel delivery
- `openai`, `elevenlabs`, and `microsoft` own synthesis implementations
- `voice-call` consumes the telephony TTS runtime helper

That same pattern should be preferred for future capabilities.

## Compatible bundles

OpenClaw also recognizes two compatible external bundle layouts:

@@ -193,6 +263,8 @@ Important trust note:

- Model Studio provider catalog — bundled as `modelstudio` (enabled by default)
- Moonshot provider runtime — bundled as `moonshot` (enabled by default)
- NVIDIA provider catalog — bundled as `nvidia` (enabled by default)
- ElevenLabs speech provider — bundled as `elevenlabs` (enabled by default)
- Microsoft speech provider — bundled as `microsoft` (enabled by default; legacy `edge` input maps here)
- OpenAI provider runtime — bundled as `openai` (enabled by default; owns both `openai` and `openai-codex`)
- OpenCode Go provider capabilities — bundled as `opencode-go` (enabled by default)
- OpenCode Zen provider capabilities — bundled as `opencode` (enabled by default)

@@ -218,6 +290,8 @@ Native OpenClaw plugins can register:

- Gateway HTTP routes
- Agent tools
- CLI commands
- Speech providers
- Web search providers
- Background services
- Context engines
- Provider auth flows and model catalogs

@@ -229,6 +303,62 @@ Native OpenClaw plugins can register:

Native OpenClaw plugins run **in‑process** with the Gateway, so treat them as trusted code.
Tool authoring guide: [Plugin agent tools](/plugins/agent-tools).

Think of these registrations as **capability claims**. A plugin is not supposed
to reach into random internals and "just make it work." It should register
against explicit surfaces that OpenClaw understands, validates, and can expose
consistently across config, onboarding, status, docs, and runtime behavior.

## Contracts and enforcement

The plugin API surface is intentionally typed and centralized in
`OpenClawPluginApi`. That contract defines the supported registration points and
the runtime helpers a plugin may rely on.

Why this matters:

- plugin authors get one stable internal standard
- core can reject duplicate ownership such as two plugins registering the same
  provider id
- startup can surface actionable diagnostics for malformed registration
- contract tests can enforce bundled-plugin ownership and prevent silent drift

There are two layers of enforcement:

1. **runtime registration enforcement**

   The plugin registry validates registrations as plugins load. Examples:
   duplicate provider ids, duplicate speech provider ids, and malformed
   registrations produce plugin diagnostics instead of undefined behavior.

2. **contract tests**

   Bundled plugins are captured in contract registries during test runs so
   OpenClaw can assert ownership explicitly. Today this is used for model
   providers, web search providers, and bundled registration ownership.

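A toy version of the runtime enforcement idea (hypothetical code, not the actual OpenClaw registry): a second plugin claiming an already-owned provider id produces a diagnostic instead of silently overwriting the first registration.

```ts
// Minimal sketch of duplicate-ownership rejection at registration time.
type Diagnostic = { level: "error"; message: string };

const providers = new Map<string, { pluginId: string }>();
const diagnostics: Diagnostic[] = [];

function registerProviderId(pluginId: string, providerId: string): void {
  const existing = providers.get(providerId);
  if (existing) {
    // Duplicate claim: record a diagnostic, keep the original owner.
    diagnostics.push({
      level: "error",
      message: `provider "${providerId}" is already owned by "${existing.pluginId}"`,
    });
    return;
  }
  providers.set(providerId, { pluginId });
}

registerProviderId("openai", "openai");
registerProviderId("some-fork", "openai"); // rejected with a diagnostic
```

The first claim wins deterministically, and the conflict is surfaced where startup diagnostics can report it.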
The practical effect is that OpenClaw knows, up front, which plugin owns which
surface. That lets core and channels compose seamlessly because ownership is
declared, typed, and testable rather than implicit.

### What belongs in a contract

Good plugin contracts are:

- typed
- small
- capability-specific
- owned by core
- reusable by multiple plugins
- consumable by channels/features without vendor knowledge

Bad plugin contracts are:

- vendor-specific policy hidden in core
- one-off plugin escape hatches that bypass the registry
- channel code reaching straight into a vendor implementation
- ad hoc runtime objects that are not part of `OpenClawPluginApi` or
  `api.runtime`

When in doubt, raise the abstraction level: define the capability first, then
let plugins plug into it.

## Provider runtime hooks

Provider plugins now have two layers:

@@ -530,9 +660,36 @@ const result = await api.runtime.tts.textToSpeechTelephony({

Notes:

- Uses core `messages.tts` configuration and provider selection.
- Returns PCM audio buffer + sample rate. Plugins must resample/encode for providers.
- OpenAI and ElevenLabs support telephony today. Microsoft does not.

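The resample/encode step is easy to underestimate. As a sketch of the kind of conversion involved (illustrative code, not OpenClaw's): telephony media streams such as Twilio's typically carry 8 kHz G.711 μ-law, so a plugin receiving PCM16 must downsample and then compand each sample. The per-sample companding looks like this:

```ts
// Classic 16-bit linear PCM -> G.711 mu-law companding.
// Resampling to 8 kHz mono is a separate (required) step not shown here.
const BIAS = 0x84;
const CLIP = 32635;

function linearToMulaw(sample: number): number {
  const sign = (sample >> 8) & 0x80; // keep the sign bit
  if (sign) sample = -sample;
  if (sample > CLIP) sample = CLIP; // clamp to avoid overflow after biasing
  sample += BIAS;
  // Find the segment (exponent): position of the highest set bit in 8..14.
  let exponent = 7;
  for (let mask = 0x4000; (sample & mask) === 0 && exponent > 0; mask >>= 1) {
    exponent--;
  }
  const mantissa = (sample >> (exponent + 3)) & 0x0f;
  // mu-law bytes are transmitted inverted.
  return ~(sign | (exponent << 4) | mantissa) & 0xff;
}

function encodePcm16ToMulaw(pcm: Int16Array): Uint8Array {
  const out = new Uint8Array(pcm.length);
  for (let i = 0; i < pcm.length; i++) out[i] = linearToMulaw(pcm[i]);
  return out;
}
```

Silence (0) encodes to `0xFF` and full-scale positive to `0x80`, matching the usual G.711 reference implementations; a real plugin would run this after resampling the returned PCM to the provider's expected rate.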
Plugins can also register speech providers via `api.registerSpeechProvider(...)`.

```ts
api.registerSpeechProvider({
  id: "acme-speech",
  label: "Acme Speech",
  isConfigured: ({ config }) => Boolean(config.messages?.tts),
  synthesize: async (req) => {
    return {
      audioBuffer: Buffer.from([]),
      outputFormat: "mp3",
      fileExtension: ".mp3",
      voiceCompatible: false,
    };
  },
});
```

Notes:

- Keep TTS policy, fallback, and reply delivery in core.
- Use speech providers for vendor-owned synthesis behavior.
- Legacy Microsoft `edge` input is normalized to the `microsoft` provider id.
- The preferred ownership model is company-oriented: one vendor plugin can own
  text, speech, image, and future media providers as OpenClaw adds those
  capability contracts.

For STT/transcription, plugins can call:

@@ -1110,12 +1267,49 @@ Plugins export either:

- `on(...)` for typed lifecycle hooks
- `registerChannel`
- `registerProvider`
- `registerSpeechProvider`
- `registerWebSearchProvider`
- `registerHttpRoute`
- `registerCommand`
- `registerCli`
- `registerContextEngine`
- `registerService`

In practice, `register(api)` is also where a plugin declares **ownership**.
That ownership should map cleanly to either:

- a vendor surface such as OpenAI, ElevenLabs, or Microsoft
- a feature surface such as Voice Call

Avoid splitting one vendor's capabilities across unrelated plugins unless there
is a strong product reason to do so. The default should be one plugin per
vendor/feature, with core capability contracts separating shared orchestration
from vendor-specific behavior.

## Adding a new capability

When a plugin needs behavior that does not fit the current API, do not bypass
the plugin system with a private reach-in. Add the missing capability.

Recommended sequence:

1. define the core contract

   Decide what shared behavior core should own: policy, fallback, config merge,
   lifecycle, channel-facing semantics, and runtime helper shape.

2. add typed plugin registration/runtime surfaces

   Extend `OpenClawPluginApi` and/or `api.runtime` with the smallest useful
   typed seam.

3. wire core + channel/feature consumers

   Channels and feature plugins should consume the new capability through core,
   not by importing a vendor implementation directly.

4. register vendor implementations

   Vendor plugins then register their backends against the capability.

5. add contract coverage

   Add tests so ownership and registration shape stay explicit over time.

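The sequence above can be miniaturized into one self-contained sketch. All the names here are invented; only the shape matters: core owns the typed seam and the fallback walk, vendors register against it, and consumers never import a vendor directly.

```ts
// Hypothetical end-to-end sketch of the recommended sequence.
type Synthesize = (text: string) => Promise<Uint8Array>;

// Steps 1-2: core defines the typed seam and owns the registry.
const speechImpls = new Map<string, Synthesize>();

function registerSpeechImpl(id: string, fn: Synthesize): void {
  speechImpls.set(id, fn);
}

// Step 3: consumers call core, which owns the fallback policy.
async function synthesizeWithFallback(
  text: string,
  order: string[],
): Promise<Uint8Array> {
  for (const id of order) {
    const fn = speechImpls.get(id);
    if (!fn) continue; // not registered/configured: try the next provider
    try {
      return await fn(text);
    } catch {
      // runtime failure: fall through to the next provider in order
    }
  }
  throw new Error("no speech implementation available");
}

// Step 4: a vendor plugin registers its backend against the capability.
registerSpeechImpl("acme", async (text) => new TextEncoder().encode(text));
```

Step 5 would then pin this down with contract tests: assert which plugin owns which id and that the fallback order behaves as declared.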
This is how OpenClaw stays opinionated without becoming hardcoded to one
provider's worldview.

Context engine plugins can also register a runtime-owned context manager:

```ts
75 docs/tts.md
@@ -9,26 +9,27 @@ title: "Text-to-Speech"

# Text-to-speech (TTS)

OpenClaw can convert outbound replies into audio using ElevenLabs, Microsoft, or OpenAI.
It works anywhere OpenClaw can send audio; Telegram gets a round voice-note bubble.

## Supported services

- **ElevenLabs** (primary or fallback provider)
- **Microsoft** (primary or fallback provider; current bundled implementation uses `node-edge-tts`, default when no API keys)
- **OpenAI** (primary or fallback provider; also used for summaries)

### Microsoft speech notes

The bundled Microsoft speech provider currently uses Microsoft Edge's online
neural TTS service via the `node-edge-tts` library. It's a hosted service (not
local), uses Microsoft endpoints, and does not require an API key.
`node-edge-tts` exposes speech configuration options and output formats, but
not all options are supported by the service. Legacy config and directive input
using `edge` still works and is normalized to `microsoft`.

Because this path is a public web service without a published SLA or quota,
treat it as best-effort. If you need guaranteed limits and support, use OpenAI
or ElevenLabs.

## Optional keys

@@ -37,8 +38,9 @@ If you want OpenAI or ElevenLabs:

- `ELEVENLABS_API_KEY` (or `XI_API_KEY`)
- `OPENAI_API_KEY`

Microsoft speech does **not** require an API key. If no API keys are found,
OpenClaw defaults to Microsoft (unless disabled via
`messages.tts.microsoft.enabled=false` or `messages.tts.edge.enabled=false`).

If multiple providers are configured, the selected provider is used first and the others are fallback options.
Auto-summary uses the configured `summaryModel` (or `agents.defaults.model.primary`),

@@ -58,7 +60,7 @@ so that provider must also be authenticated if you enable summaries.

No. Auto‑TTS is **off** by default. Enable it in config with
`messages.tts.auto` or per session with `/tts always` (alias: `/tts on`).

Microsoft speech **is** enabled by default once TTS is on, and is used automatically
when no OpenAI or ElevenLabs API keys are available.

## Config

@@ -118,15 +120,15 @@ Full schema is in [Gateway configuration](/gateway/configuration).

}
```

### Microsoft primary (no API key)

```json5
{
  messages: {
    tts: {
      auto: "always",
      provider: "microsoft",
      microsoft: {
        enabled: true,
        voice: "en-US-MichelleNeural",
        lang: "en-US",

|
|||||||
}
|
}
|
||||||
```
|
```
|
||||||
|
|
||||||
### Disable Edge TTS
|
### Disable Microsoft speech
|
||||||
|
|
||||||
```json5
|
```json5
|
||||||
{
|
{
|
||||||
messages: {
|
messages: {
|
||||||
tts: {
|
tts: {
|
||||||
edge: {
|
microsoft: {
|
||||||
enabled: false,
|
enabled: false,
|
||||||
},
|
},
|
||||||
},
|
},
|
||||||
@@ -205,9 +207,10 @@ Then run:

- `tagged` only sends audio when the reply includes `[[tts]]` tags.
- `enabled`: legacy toggle (doctor migrates this to `auto`).
- `mode`: `"final"` (default) or `"all"` (includes tool/block replies).
- `provider`: speech provider id such as `"elevenlabs"`, `"microsoft"`, or `"openai"` (fallback is automatic).
- If `provider` is **unset**, OpenClaw prefers `openai` (if key), then `elevenlabs` (if key),
  otherwise `microsoft`.
- Legacy `provider: "edge"` still works and is normalized to `microsoft`.
- `summaryModel`: optional cheap model for auto-summary; defaults to `agents.defaults.model.primary`.
  - Accepts `provider/model` or a configured model alias.
- `modelOverrides`: allow the model to emit TTS directives (on by default).

@@ -227,15 +230,16 @@ Then run:

- `elevenlabs.applyTextNormalization`: `auto|on|off`
- `elevenlabs.languageCode`: 2-letter ISO 639-1 (e.g. `en`, `de`)
- `elevenlabs.seed`: integer `0..4294967295` (best-effort determinism)
- `microsoft.enabled`: allow Microsoft speech usage (default `true`; no API key).
- `microsoft.voice`: Microsoft neural voice name (e.g. `en-US-MichelleNeural`).
- `microsoft.lang`: language code (e.g. `en-US`).
- `microsoft.outputFormat`: Microsoft output format (e.g. `audio-24khz-48kbitrate-mono-mp3`).
  - See Microsoft Speech output formats for valid values; not all formats are supported by the bundled Edge-backed transport.
- `microsoft.rate` / `microsoft.pitch` / `microsoft.volume`: percent strings (e.g. `+10%`, `-5%`).
- `microsoft.saveSubtitles`: write JSON subtitles alongside the audio file.
- `microsoft.proxy`: proxy URL for Microsoft speech requests.
- `microsoft.timeoutMs`: request timeout override (ms).
- `edge.*`: legacy alias for the same Microsoft settings.

## Model-driven overrides (default on)

@@ -260,7 +264,7 @@ Here you go.

Available directive keys (when enabled):

- `provider` (registered speech provider id, for example `openai`, `elevenlabs`, or `microsoft`; requires `allowProvider: true`)
- `voice` (OpenAI voice) or `voiceId` (ElevenLabs)
- `model` (OpenAI TTS model or ElevenLabs model id)
- `stability`, `similarityBoost`, `style`, `speed`, `useSpeakerBoost`

@@ -319,13 +323,12 @@ These override `messages.tts.*` for that host.

- 48kHz / 64kbps is a good voice-note tradeoff and required for the round bubble.
- **Other channels**: MP3 (`mp3_44100_128` from ElevenLabs, `mp3` from OpenAI).
  - 44.1kHz / 128kbps is the default balance for speech clarity.
- **Microsoft**: uses `microsoft.outputFormat` (default `audio-24khz-48kbitrate-mono-mp3`).
  - The bundled transport accepts an `outputFormat`, but not all formats are available from the service.
  - Output format values follow Microsoft Speech output formats (including Ogg/WebM Opus).
- Telegram `sendVoice` accepts OGG/MP3/M4A; use OpenAI/ElevenLabs if you need
  guaranteed Opus voice notes.
- If the configured Microsoft output format fails, OpenClaw retries with MP3.

OpenAI/ElevenLabs formats are fixed; Telegram expects Opus for voice-note UX.

@@ -98,7 +98,7 @@ See the plugin docs for recommended ranges and production examples:

## TTS for calls

Voice Call uses the core `messages.tts` configuration for
streaming speech on calls. Override examples and provider caveats live here:
`https://docs.openclaw.ai/plugins/voice-call#tts-for-calls`