diff --git a/docs/.i18n/glossary.zh-CN.json b/docs/.i18n/glossary.zh-CN.json
index 36e44b6d909..d1b1f3f3058 100644
--- a/docs/.i18n/glossary.zh-CN.json
+++ b/docs/.i18n/glossary.zh-CN.json
@@ -47,6 +47,10 @@
     "source": "Quick Start",
     "target": "快速开始"
   },
+  {
+    "source": "Capability Cookbook",
+    "target": "能力扩展手册"
+  },
   {
     "source": "Setup Wizard Reference",
     "target": "设置向导参考"
diff --git a/docs/concepts/models.md b/docs/concepts/models.md
index f3b7797eedb..88cf928568e 100644
--- a/docs/concepts/models.md
+++ b/docs/concepts/models.md
@@ -26,6 +26,7 @@ Related:
 
 - `agents.defaults.models` is the allowlist/catalog of models OpenClaw can use (plus aliases).
 - `agents.defaults.imageModel` is used **only when** the primary model can’t accept images.
+- `agents.defaults.imageGenerationModel` is used by the shared image-generation capability.
 - Per-agent defaults can override `agents.defaults.model` via `agents.list[].model` plus bindings (see [/concepts/multi-agent](/concepts/multi-agent)).
 
 ## Quick model policy
@@ -49,6 +50,7 @@ subscription** (OAuth) and **Anthropic** (API key or `claude setup-token`).
 
 - `agents.defaults.model.primary` and `agents.defaults.model.fallbacks`
 - `agents.defaults.imageModel.primary` and `agents.defaults.imageModel.fallbacks`
+- `agents.defaults.imageGenerationModel.primary` and `agents.defaults.imageGenerationModel.fallbacks`
 - `agents.defaults.models` (allowlist + aliases + provider params)
 - `models.providers` (custom providers written into `models.json`)
 
diff --git a/docs/gateway/configuration-reference.md b/docs/gateway/configuration-reference.md
index 235d4a18a7b..910b6db2b62 100644
--- a/docs/gateway/configuration-reference.md
+++ b/docs/gateway/configuration-reference.md
@@ -875,6 +875,9 @@ Time format in system prompt. Default: `auto` (OS preference).
         primary: "openrouter/qwen/qwen-2.5-vl-72b-instruct:free",
         fallbacks: ["openrouter/google/gemini-2.0-flash-vision:free"],
       },
+      imageGenerationModel: {
+        primary: "openai/gpt-image-1",
+      },
       pdfModel: {
         primary: "anthropic/claude-opus-4-6",
         fallbacks: ["openai/gpt-5-mini"],
@@ -899,6 +902,8 @@ Time format in system prompt. Default: `auto` (OS preference).
 - `imageModel`: accepts either a string (`"provider/model"`) or an object (`{ primary, fallbacks }`).
   - Used by the `image` tool path as its vision-model config.
   - Also used as fallback routing when the selected/default model cannot accept image input.
+- `imageGenerationModel`: accepts either a string (`"provider/model"`) or an object (`{ primary, fallbacks }`).
+  - Used by the shared image-generation capability and any future tool/plugin surface that generates images.
 - `pdfModel`: accepts either a string (`"provider/model"`) or an object (`{ primary, fallbacks }`).
   - Used by the `pdf` tool for model routing.
   - If omitted, the PDF tool falls back to `imageModel`, then to best-effort provider defaults.
diff --git a/docs/tools/capability-cookbook.md b/docs/tools/capability-cookbook.md
new file mode 100644
index 00000000000..345c7b1ebd6
--- /dev/null
+++ b/docs/tools/capability-cookbook.md
@@ -0,0 +1,112 @@
+---
+summary: "Cookbook for adding a new shared capability to OpenClaw"
+read_when:
+  - Adding a new core capability and plugin seam
+  - Deciding whether code belongs in core, a vendor plugin, or a feature plugin
+  - Wiring a new runtime helper for channels or tools
+title: "Capability Cookbook"
+---
+
+# Capability Cookbook
+
+Use this when OpenClaw needs a new domain such as image generation, video
+generation, or some future vendor-backed feature area.
+
+The rule:
+
+- plugin = ownership boundary
+- capability = shared core contract
+
+That means you should not start by wiring a vendor directly into a channel or a
+tool. Start by defining the capability.
+
+## When to create a capability
+
+Create a new capability when all of these are true:
+
+1. more than one vendor could plausibly implement it
+2. channels, tools, or feature plugins should consume it without caring about
+   the vendor
+3. core needs to own fallback, policy, config, or delivery behavior
+
+If the work is vendor-only and no shared contract exists yet, stop and define
+the contract first.
+
+## The standard sequence
+
+1. Define the typed core contract.
+2. Add plugin registration for that contract.
+3. Add a shared runtime helper.
+4. Wire one real vendor plugin as proof.
+5. Move feature/channel consumers onto the runtime helper.
+6. Add contract tests.
+7. Document the operator-facing config and ownership model.
+
+## What goes where
+
+Core:
+
+- request/response types
+- provider registry + resolution
+- fallback behavior
+- config schema and labels/help
+- runtime helper surface
+
+Vendor plugin:
+
+- vendor API calls
+- vendor auth handling
+- vendor-specific request normalization
+- registration of the capability implementation
+
+Feature/channel plugin:
+
+- calls `api.runtime.*` or the matching `plugin-sdk/*-runtime` helper
+- never calls a vendor implementation directly
+
+## File checklist
+
+For a new capability, expect to touch these areas:
+
+- `src//types.ts`
+- `src//...registry/runtime.ts`
+- `src/plugins/types.ts`
+- `src/plugins/registry.ts`
+- `src/plugins/captured-registration.ts`
+- `src/plugins/contracts/registry.ts`
+- `src/plugins/runtime/types-core.ts`
+- `src/plugins/runtime/index.ts`
+- `src/plugin-sdk/.ts`
+- `src/plugin-sdk/-runtime.ts`
+- one or more `extensions//...`
+- config/docs/tests
+
+## Example: image generation
+
+Image generation follows the standard shape:
+
+1. core defines `ImageGenerationProvider`
+2. core exposes `registerImageGenerationProvider(...)`
+3. core exposes `runtime.imageGeneration.generate(...)`
+4. the `openai` plugin registers an OpenAI-backed implementation
+5. future vendors can register the same contract without changing channels/tools
+
+The config key is separate from vision-analysis routing:
+
+- `agents.defaults.imageModel` = analyze images
+- `agents.defaults.imageGenerationModel` = generate images
+
+Keep those separate so fallback and policy remain explicit.
+
+## Review checklist
+
+Before shipping a new capability, verify:
+
+- no channel/tool imports vendor code directly
+- the runtime helper is the shared path
+- at least one contract test asserts bundled ownership
+- config docs name the new model/config key
+- plugin docs explain the ownership boundary
+
+If a PR skips the capability layer and hardcodes vendor behavior into a
+channel/tool, send it back and define the contract first.
diff --git a/docs/tools/plugin.md b/docs/tools/plugin.md
index db074c011d9..77ff383feb6 100644
--- a/docs/tools/plugin.md
+++ b/docs/tools/plugin.md
@@ -113,7 +113,7 @@ That means:
 Examples:
 
 - the bundled `openai` plugin owns OpenAI model-provider behavior and OpenAI
-  speech + media-understanding behavior
+  speech + media-understanding + image-generation behavior
 - the bundled `elevenlabs` plugin owns ElevenLabs speech behavior
 - the bundled `microsoft` plugin owns Microsoft speech behavior
 - the bundled `google`, `minimax`, `mistral`, `moonshot`, and `zai` plugins own
@@ -257,6 +257,9 @@ If OpenClaw adds a new domain later, such as video generation, use the
 same sequence again: define the core capability first, then let vendor plugins
 register implementations against it.
 
+Need a concrete rollout checklist? See
+[Capability Cookbook](/tools/capability-cookbook).
+
 ## Compatible bundles
 
 OpenClaw also recognizes two compatible external bundle layouts:
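The cookbook's standard sequence (typed contract, plugin registration, shared runtime helper, one vendor as proof) and the `string | { primary, fallbacks }` config shape can be illustrated with a small TypeScript sketch. This is not OpenClaw code. The patch only names `ImageGenerationProvider`, `registerImageGenerationProvider(...)`, and `runtime.imageGeneration.generate(...)`. The request/result fields, the `id` property, the helpers `resolveCandidates`, `splitModelRef`, and `generateImage`, and the stubbed `openai` implementation are all hypothetical. The sketch also assumes the provider is the segment before the first `/` in a `"provider/model"` ref, as in `openrouter/qwen/qwen-2.5-vl-72b-instruct:free`.

```typescript
// Hypothetical sketch of the Capability Cookbook sequence. Names mirror the
// docs where possible, but every shape and helper here is illustrative,
// not OpenClaw's actual API.

// 1. Typed core contract: request/response types plus the provider interface.
interface ImageGenerationRequest {
  prompt: string;
  model: string; // model part of a "provider/model" ref
}

interface ImageGenerationResult {
  provider: string;
  model: string;
  images: string[]; // assumed representation (URLs or data refs)
}

interface ImageGenerationProvider {
  id: string; // assumed to match the provider prefix, e.g. "openai"
  generate(req: ImageGenerationRequest): Promise<ImageGenerationResult>;
}

// Config form from the docs: "provider/model" or { primary, fallbacks }.
type ModelConfig = string | { primary: string; fallbacks?: string[] };

// 2. Plugin registration: core owns the registry, vendor plugins fill it.
const providers = new Map<string, ImageGenerationProvider>();

function registerImageGenerationProvider(provider: ImageGenerationProvider): void {
  if (providers.has(provider.id)) {
    throw new Error(`image-generation provider already registered: ${provider.id}`);
  }
  providers.set(provider.id, provider);
}

// Normalize both config forms into an ordered candidate list.
function resolveCandidates(config: ModelConfig): string[] {
  return typeof config === "string"
    ? [config]
    : [config.primary, ...(config.fallbacks ?? [])];
}

// Split at the first "/" only, so nested model ids survive intact.
function splitModelRef(ref: string): [string, string] {
  const slash = ref.indexOf("/");
  if (slash <= 0 || slash === ref.length - 1) {
    throw new Error(`expected "provider/model", got: ${ref}`);
  }
  return [ref.slice(0, slash), ref.slice(slash + 1)];
}

// 3. Shared runtime helper: channels and tools call this, never a vendor.
// Core owns fallback: try each candidate in order and collect failures.
// A malformed ref throws immediately (treated as a config error).
async function generateImage(
  config: ModelConfig,
  prompt: string,
): Promise<ImageGenerationResult> {
  const failures: string[] = [];
  for (const ref of resolveCandidates(config)) {
    const [providerId, model] = splitModelRef(ref);
    const provider = providers.get(providerId);
    if (!provider) {
      failures.push(`${ref}: no provider registered`);
      continue;
    }
    try {
      return await provider.generate({ prompt, model });
    } catch (err) {
      failures.push(`${ref}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`all image-generation candidates failed:\n${failures.join("\n")}`);
}

// 4. One vendor plugin as proof (stubbed: no real OpenAI API call).
registerImageGenerationProvider({
  id: "openai",
  async generate(req) {
    return { provider: "openai", model: req.model, images: [`stub:${req.prompt}`] };
  },
});
```

With only the stub `openai` provider registered, `generateImage({ primary: "acme/img-2", fallbacks: ["openai/gpt-image-1"] }, "a cat")` skips the unregistered `acme` candidate and resolves with the `openai` result. Keeping the fallback loop in the core helper is what lets a second vendor plug in with one more `registerImageGenerationProvider` call and no change to channel or tool callers.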