docs(plugins): add capability cookbook

This commit is contained in:
Peter Steinberger 2026-03-16 22:56:03 -07:00
parent cc88b4a72d
commit c79ade10e6
No known key found for this signature in database
5 changed files with 127 additions and 1 deletions

View File

@ -47,6 +47,10 @@
"source": "Quick Start",
"target": "快速开始"
},
{
"source": "Capability Cookbook",
"target": "能力扩展手册"
},
{
"source": "Setup Wizard Reference",
"target": "设置向导参考"

View File

@ -26,6 +26,7 @@ Related:
- `agents.defaults.models` is the allowlist/catalog of models OpenClaw can use (plus aliases).
- `agents.defaults.imageModel` is used **only when** the primary model cant accept images.
- `agents.defaults.imageGenerationModel` is used by the shared image-generation capability.
- Per-agent defaults can override `agents.defaults.model` via `agents.list[].model` plus bindings (see [/concepts/multi-agent](/concepts/multi-agent)).
## Quick model policy
@ -49,6 +50,7 @@ subscription** (OAuth) and **Anthropic** (API key or `claude setup-token`).
- `agents.defaults.model.primary` and `agents.defaults.model.fallbacks`
- `agents.defaults.imageModel.primary` and `agents.defaults.imageModel.fallbacks`
- `agents.defaults.imageGenerationModel.primary` and `agents.defaults.imageGenerationModel.fallbacks`
- `agents.defaults.models` (allowlist + aliases + provider params)
- `models.providers` (custom providers written into `models.json`)

View File

@ -875,6 +875,9 @@ Time format in system prompt. Default: `auto` (OS preference).
primary: "openrouter/qwen/qwen-2.5-vl-72b-instruct:free",
fallbacks: ["openrouter/google/gemini-2.0-flash-vision:free"],
},
imageGenerationModel: {
primary: "openai/gpt-image-1",
},
pdfModel: {
primary: "anthropic/claude-opus-4-6",
fallbacks: ["openai/gpt-5-mini"],
@ -899,6 +902,8 @@ Time format in system prompt. Default: `auto` (OS preference).
- `imageModel`: accepts either a string (`"provider/model"`) or an object (`{ primary, fallbacks }`).
- Used by the `image` tool path as its vision-model config.
- Also used as fallback routing when the selected/default model cannot accept image input.
- `imageGenerationModel`: accepts either a string (`"provider/model"`) or an object (`{ primary, fallbacks }`).
- Used by the shared image-generation capability and any future tool/plugin surface that generates images.
- `pdfModel`: accepts either a string (`"provider/model"`) or an object (`{ primary, fallbacks }`).
- Used by the `pdf` tool for model routing.
- If omitted, the PDF tool falls back to `imageModel`, then to best-effort provider defaults.

View File

@ -0,0 +1,112 @@
---
summary: "Cookbook for adding a new shared capability to OpenClaw"
read_when:
- Adding a new core capability and plugin seam
- Deciding whether code belongs in core, a vendor plugin, or a feature plugin
- Wiring a new runtime helper for channels or tools
title: "Capability Cookbook"
---
# Capability Cookbook
Use this when OpenClaw needs a new domain such as image generation, video
generation, or some future vendor-backed feature area.
The rule:
- plugin = ownership boundary
- capability = shared core contract
That means you should not start by wiring a vendor directly into a channel or a
tool. Start by defining the capability.
## When to create a capability
Create a new capability when all of these are true:
1. more than one vendor could plausibly implement it
2. channels, tools, or feature plugins should consume it without caring about
the vendor
3. core needs to own fallback, policy, config, or delivery behavior
If the work is vendor-only and no shared contract exists yet, stop and define
the contract first.
## The standard sequence
1. Define the typed core contract.
2. Add plugin registration for that contract.
3. Add a shared runtime helper.
4. Wire one real vendor plugin as proof.
5. Move feature/channel consumers onto the runtime helper.
6. Add contract tests.
7. Document the operator-facing config and ownership model.
## What goes where
Core:
- request/response types
- provider registry + resolution
- fallback behavior
- config schema and labels/help
- runtime helper surface
Vendor plugin:
- vendor API calls
- vendor auth handling
- vendor-specific request normalization
- registration of the capability implementation
Feature/channel plugin:
- calls `api.runtime.*` or the matching `plugin-sdk/*-runtime` helper
- never calls a vendor implementation directly
## File checklist
For a new capability, expect to touch these areas:
- `src/<capability>/types.ts`
- `src/<capability>/...registry/runtime.ts`
- `src/plugins/types.ts`
- `src/plugins/registry.ts`
- `src/plugins/captured-registration.ts`
- `src/plugins/contracts/registry.ts`
- `src/plugins/runtime/types-core.ts`
- `src/plugins/runtime/index.ts`
- `src/plugin-sdk/<capability>.ts`
- `src/plugin-sdk/<capability>-runtime.ts`
- one or more `extensions/<vendor>/...`
- config/docs/tests
## Example: image generation
Image generation follows the standard shape:
1. core defines `ImageGenerationProvider`
2. core exposes `registerImageGenerationProvider(...)`
3. core exposes `runtime.imageGeneration.generate(...)`
4. the `openai` plugin registers an OpenAI-backed implementation
5. future vendors can register the same contract without changing channels/tools
The config key is separate from vision-analysis routing:
- `agents.defaults.imageModel` = analyze images
- `agents.defaults.imageGenerationModel` = generate images
Keep those separate so fallback and policy remain explicit.
## Review checklist
Before shipping a new capability, verify:
- no channel/tool imports vendor code directly
- the runtime helper is the shared path
- at least one contract test asserts bundled ownership
- config docs name the new model/config key
- plugin docs explain the ownership boundary
If a PR skips the capability layer and hardcodes vendor behavior into a
channel/tool, send it back and define the contract first.

View File

@ -113,7 +113,7 @@ That means:
Examples:
- the bundled `openai` plugin owns OpenAI model-provider behavior and OpenAI
speech + media-understanding behavior
speech + media-understanding + image-generation behavior
- the bundled `elevenlabs` plugin owns ElevenLabs speech behavior
- the bundled `microsoft` plugin owns Microsoft speech behavior
- the bundled `google`, `minimax`, `mistral`, `moonshot`, and `zai` plugins own
@ -257,6 +257,9 @@ If OpenClaw adds a new domain later, such as video generation, use the same
sequence again: define the core capability first, then let vendor plugins
register implementations against it.
Need a concrete rollout checklist? See
[Capability Cookbook](/tools/capability-cookbook).
## Compatible bundles
OpenClaw also recognizes two compatible external bundle layouts: