From c05cfccc176779cccb6783db03c46d66c0c53b15 Mon Sep 17 00:00:00 2001
From: Peter Steinberger
Date: Sun, 15 Mar 2026 16:57:32 -0700
Subject: [PATCH] docs(plugins): document provider runtime usage hooks

---
 docs/concepts/model-providers.md | 23 ++++++++----
 docs/tools/plugin.md             | 61 +++++++++++++++++++++++++-------
 2 files changed, 65 insertions(+), 19 deletions(-)

diff --git a/docs/concepts/model-providers.md b/docs/concepts/model-providers.md
index a56b8f76284..7a5ef04ab11 100644
--- a/docs/concepts/model-providers.md
+++ b/docs/concepts/model-providers.md
@@ -22,7 +22,8 @@ For model selection rules, see [/concepts/models](/concepts/models).
 - Provider plugins can also own provider runtime behavior via
   `resolveDynamicModel`, `prepareDynamicModel`, `normalizeResolvedModel`,
   `capabilities`, `prepareExtraParams`, `wrapStreamFn`,
-  `isCacheTtlEligible`, and `prepareRuntimeAuth`.
+  `isCacheTtlEligible`, `prepareRuntimeAuth`, `resolveUsageAuth`, and
+  `fetchUsageSnapshot`.

 ## Plugin-owned provider behavior

@@ -43,22 +44,32 @@ Typical split:
 - `isCacheTtlEligible`: provider decides which upstream model ids support
   prompt-cache TTL
 - `prepareRuntimeAuth`: provider turns a configured credential into a short
   lived runtime token
+- `resolveUsageAuth`: provider resolves usage/quota credentials for `/usage`
+  and related status/reporting surfaces
+- `fetchUsageSnapshot`: provider owns the usage endpoint fetch/parsing while
+  core still owns the summary shell and formatting

 Current bundled examples:

 - `openrouter`: pass-through model ids, request wrappers, provider capability
   hints, and cache-TTL policy
 - `github-copilot`: forward-compat model fallback, Claude-thinking transcript
-  hints, and runtime token exchange
+  hints, runtime token exchange, and usage endpoint fetching
 - `openai-codex`: forward-compat model fallback, transport normalization, and
-  default transport params
+  default transport params plus usage endpoint fetching
+- `google-gemini-cli`: Gemini 3.1 forward-compat fallback plus usage-token
+  parsing and quota endpoint fetching for usage surfaces
 - `moonshot`: shared transport, plugin-owned thinking payload normalization
 - `kilocode`: shared transport, plugin-owned request headers, reasoning
   payload normalization, Gemini transcript hints, and cache-TTL policy
+- `zai`: GLM-5 forward-compat fallback, `tool_stream` defaults, cache-TTL
+  policy, and usage auth + quota fetching
+- `mistral`, `opencode`, and `opencode-go`: plugin-owned capability metadata
 - `byteplus`, `cloudflare-ai-gateway`, `huggingface`, `kimi-coding`,
-  `minimax`, `minimax-portal`, `modelstudio`, `nvidia`, `qianfan`,
-  `qwen-portal`, `synthetic`, `together`, `venice`, `vercel-ai-gateway`,
-  `volcengine`, and `xiaomi`: plugin-owned catalogs only
+  `minimax-portal`, `modelstudio`, `nvidia`, `qianfan`, `qwen-portal`,
+  `synthetic`, `together`, `venice`, `vercel-ai-gateway`, and `volcengine`:
+  plugin-owned catalogs only
+- `minimax` and `xiaomi`: plugin-owned catalogs plus usage auth/snapshot logic

 That covers providers that still fit OpenClaw's normal transports.
 A provider that needs a totally custom request executor is a separate, deeper extension
diff --git a/docs/tools/plugin.md b/docs/tools/plugin.md
index de162c2ab42..983c69f0a12 100644
--- a/docs/tools/plugin.md
+++ b/docs/tools/plugin.md
@@ -172,12 +172,15 @@ Important trust note:
 - Hugging Face provider catalog — bundled as `huggingface` (enabled by default)
 - Kilo Gateway provider runtime — bundled as `kilocode` (enabled by default)
 - Kimi Coding provider catalog — bundled as `kimi-coding` (enabled by default)
-- MiniMax provider catalog — bundled as `minimax` (enabled by default)
+- MiniMax provider catalog + usage — bundled as `minimax` (enabled by default)
 - MiniMax OAuth (provider auth + catalog) — bundled as `minimax-portal-auth` (enabled by default)
+- Mistral provider capabilities — bundled as `mistral` (enabled by default)
 - Model Studio provider catalog — bundled as `modelstudio` (enabled by default)
 - Moonshot provider runtime — bundled as `moonshot` (enabled by default)
 - NVIDIA provider catalog — bundled as `nvidia` (enabled by default)
 - OpenAI Codex provider runtime — bundled as `openai-codex` (enabled by default)
+- OpenCode Go provider capabilities — bundled as `opencode-go` (enabled by default)
+- OpenCode Zen provider capabilities — bundled as `opencode` (enabled by default)
 - OpenRouter provider runtime — bundled as `openrouter` (enabled by default)
 - Qianfan provider catalog — bundled as `qianfan` (enabled by default)
 - Qwen OAuth (provider auth + catalog) — bundled as `qwen-portal-auth` (enabled by default)
@@ -186,7 +189,8 @@ Important trust note:
 - Venice provider catalog — bundled as `venice` (enabled by default)
 - Vercel AI Gateway provider catalog — bundled as `vercel-ai-gateway` (enabled by default)
 - Volcengine provider catalog — bundled as `volcengine` (enabled by default)
-- Xiaomi provider catalog — bundled as `xiaomi` (enabled by default)
+- Xiaomi provider catalog + usage — bundled as `xiaomi` (enabled by default)
+- Z.AI provider runtime — bundled as `zai` (enabled by default)
 - Copilot Proxy (provider auth) — local VS Code Copilot Proxy bridge; distinct from built-in `github-copilot` device login (bundled, disabled by default)

 Native OpenClaw plugins are **TypeScript modules** loaded at runtime via jiti.
@@ -202,7 +206,7 @@ Native OpenClaw plugins can register:
 - Background services
 - Context engines
 - Provider auth flows and model catalogs
-- Provider runtime hooks for dynamic model ids, transport normalization, capability metadata, stream wrapping, cache TTL policy, and runtime auth exchange
+- Provider runtime hooks for dynamic model ids, transport normalization, capability metadata, stream wrapping, cache TTL policy, runtime auth exchange, and usage/billing auth + snapshot resolution
 - Optional config validation
 - **Skills** (by listing `skills` directories in the plugin manifest)
 - **Auto-reply commands** (execute without invoking the AI agent)
@@ -215,7 +219,7 @@ Tool authoring guide: [Plugin agent tools](/plugins/agent-tools).
 Provider plugins now have two layers:

 - config-time hooks: `catalog` / legacy `discovery`
-- runtime hooks: `resolveDynamicModel`, `prepareDynamicModel`, `normalizeResolvedModel`, `capabilities`, `prepareExtraParams`, `wrapStreamFn`, `isCacheTtlEligible`, `prepareRuntimeAuth`
+- runtime hooks: `resolveDynamicModel`, `prepareDynamicModel`, `normalizeResolvedModel`, `capabilities`, `prepareExtraParams`, `wrapStreamFn`, `isCacheTtlEligible`, `prepareRuntimeAuth`, `resolveUsageAuth`, `fetchUsageSnapshot`

 OpenClaw still owns the generic agent loop, failover, transcript handling, and
 tool policy. These hooks are the seam for provider-specific behavior without
@@ -249,6 +253,12 @@ For model/provider plugins, OpenClaw uses hooks in this rough order:
 10. `prepareRuntimeAuth`
     Exchanges a configured credential into the actual runtime token/key just
     before inference.
+11. `resolveUsageAuth`
+    Resolves usage/billing credentials for `/usage` and related status
+    surfaces.
+12. `fetchUsageSnapshot`
+    Fetches and normalizes provider-specific usage/quota snapshots after auth
+    is resolved.

 ### Which hook to use

@@ -261,6 +271,8 @@ For model/provider plugins, OpenClaw uses hooks in this rough order:
 - `wrapStreamFn`: add provider-specific headers/payload/model compat patches while still using the normal `pi-ai` execution path
 - `isCacheTtlEligible`: decide whether provider/model pairs should use cache TTL metadata
 - `prepareRuntimeAuth`: exchange a configured credential into the actual short-lived runtime token/key used for requests
+- `resolveUsageAuth`: resolve provider-owned credentials for usage/billing endpoints without hardcoding token parsing in core
+- `fetchUsageSnapshot`: own provider-specific usage endpoint fetch/parsing while core keeps summary fan-out and formatting

 Rule of thumb:

@@ -273,12 +285,14 @@ Rule of thumb:
 - provider needs request headers/body/model compat wrappers without a custom transport: use `wrapStreamFn`
 - provider needs proxy-specific cache TTL gating: use `isCacheTtlEligible`
 - provider needs a token exchange or short-lived request credential: use `prepareRuntimeAuth`
+- provider needs custom usage/quota token parsing or a different usage credential: use `resolveUsageAuth`
+- provider needs a provider-specific usage endpoint or payload parser: use `fetchUsageSnapshot`

 If the provider needs a fully custom wire protocol or custom request executor,
 that is a different class of extension. These hooks are for provider behavior
 that still runs on OpenClaw's normal inference loop.

-### Example
+### Provider example

 ```ts
 api.registerProvider({
@@ -322,6 +336,13 @@ api.registerProvider({
       expiresAt: exchanged.expiresAt,
     };
   },
+  resolveUsageAuth: async (ctx) => {
+    const auth = await ctx.resolveOAuthToken();
+    return auth ? { token: auth.token } : null;
+  },
+  fetchUsageSnapshot: async (ctx) => {
+    return await fetchExampleProxyUsage(ctx.token, ctx.timeoutMs, ctx.fetchFn);
+  },
 });
 ```
@@ -331,12 +352,17 @@ api.registerProvider({
   `prepareDynamicModel` because the provider is pass-through and may expose
   new model ids before OpenClaw's static catalog updates.
 - GitHub Copilot uses `catalog`, `resolveDynamicModel`, and
-  `capabilities` plus `prepareRuntimeAuth` because it needs model fallback
-  behavior, Claude transcript quirks, and a GitHub token -> Copilot token exchange.
+  `capabilities` plus `prepareRuntimeAuth` and `fetchUsageSnapshot` because it
+  needs model fallback behavior, Claude transcript quirks, a GitHub token ->
+  Copilot token exchange, and a provider-owned usage endpoint.
 - OpenAI Codex uses `catalog`, `resolveDynamicModel`, and
-  `normalizeResolvedModel` plus `prepareExtraParams` because it still runs on
-  core OpenAI transports but owns its transport/base URL normalization and
-  default transport choice.
+  `normalizeResolvedModel` plus `prepareExtraParams`, `resolveUsageAuth`, and
+  `fetchUsageSnapshot` because it still runs on core OpenAI transports but owns
+  its transport/base URL normalization, default transport choice, and ChatGPT
+  usage endpoint integration.
+- Gemini CLI OAuth uses `resolveDynamicModel`, `resolveUsageAuth`, and
+  `fetchUsageSnapshot` because it owns Gemini 3.1 forward-compat fallback plus
+  the token parsing and quota endpoint wiring needed by `/usage`.
 - OpenRouter uses `capabilities`, `wrapStreamFn`, and `isCacheTtlEligible` to
   keep provider-specific request headers, routing metadata, reasoning patches,
   and prompt-cache policy out of core.
@@ -346,10 +372,19 @@ api.registerProvider({
   `isCacheTtlEligible` because it needs provider-owned request headers,
   reasoning payload normalization, Gemini transcript hints, and Anthropic
   cache-TTL gating.
+- Z.AI uses `resolveDynamicModel`, `prepareExtraParams`, `wrapStreamFn`,
+  `isCacheTtlEligible`, `resolveUsageAuth`, and `fetchUsageSnapshot` because it
+  owns GLM-5 fallback, `tool_stream` defaults, and both usage auth + quota
+  fetching.
+- Mistral, OpenCode Zen, and OpenCode Go use `capabilities` only to keep
+  transcript/tooling quirks out of core.
 - Catalog-only bundled providers such as `byteplus`, `cloudflare-ai-gateway`,
-  `huggingface`, `kimi-coding`, `minimax`, `minimax-portal`, `modelstudio`,
-  `nvidia`, `qianfan`, `qwen-portal`, `synthetic`, `together`, `venice`,
-  `vercel-ai-gateway`, `volcengine`, and `xiaomi` use `catalog` only.
+  `huggingface`, `kimi-coding`, `minimax-portal`, `modelstudio`, `nvidia`,
+  `qianfan`, `qwen-portal`, `synthetic`, `together`, `venice`,
+  `vercel-ai-gateway`, and `volcengine` use `catalog` only.
+- MiniMax and Xiaomi use `catalog` plus usage hooks because their `/usage`
+  behavior is plugin-owned even though inference still runs through the shared
+  transports.

 ## Load pipeline
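
The patch documents a two-step usage flow: core calls `resolveUsageAuth` first, then passes the resolved credential to the provider-owned `fetchUsageSnapshot`. A minimal host-side sketch of that ordering is below; the type names, context fields, and stub provider are illustrative assumptions for this note, not OpenClaw's actual plugin API.

```typescript
// Hypothetical sketch of the resolveUsageAuth -> fetchUsageSnapshot
// ordering described in the patch. All names are illustrative
// assumptions, not OpenClaw's real plugin surface.
type UsageAuth = { token: string } | null;
type UsageSnapshot = { used: number; limit: number };

interface UsageHooks {
  // Plugin resolves a usage/billing credential (or null if unavailable).
  resolveUsageAuth(): Promise<UsageAuth>;
  // Plugin owns the provider-specific quota endpoint fetch/parsing.
  fetchUsageSnapshot(ctx: { token: string }): Promise<UsageSnapshot>;
}

// Core-side driver: resolve auth first, skip the snapshot fetch when no
// usage credential is available, so a /usage surface can degrade gracefully.
async function loadUsage(hooks: UsageHooks): Promise<UsageSnapshot | null> {
  const auth = await hooks.resolveUsageAuth();
  if (!auth) return null;
  return hooks.fetchUsageSnapshot({ token: auth.token });
}

// Stub provider standing in for a plugin; a real plugin would call the
// provider's quota endpoint inside fetchUsageSnapshot.
const stubProvider: UsageHooks = {
  resolveUsageAuth: async () => ({ token: "t-123" }),
  fetchUsageSnapshot: async ({ token }) =>
    token ? { used: 10, limit: 100 } : { used: 0, limit: 0 },
};

async function main(): Promise<void> {
  const snapshot = await loadUsage(stubProvider);
  console.log("usage snapshot:", snapshot);
}

main();
```

Core keeping `loadUsage` while plugins own both hook bodies mirrors the split in the docs: summary shell and formatting stay in core, credential parsing and endpoint quirks stay in the plugin.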