diff --git a/docs/.internal/extension-host-migration/openclaw-capability-catalog-and-arbitration-spec.md b/docs/.internal/extension-host-migration/openclaw-capability-catalog-and-arbitration-spec.md
index b5f1ba788fc..5323830a82a 100644
--- a/docs/.internal/extension-host-migration/openclaw-capability-catalog-and-arbitration-spec.md
+++ b/docs/.internal/extension-host-migration/openclaw-capability-catalog-and-arbitration-spec.md
@@ -502,6 +502,8 @@ The selected provider integration may also contribute:
 - token refresh behavior
 - model-selected hooks
 
+It should not silently absorb unrelated subsystem runtimes such as embeddings, transcription, media understanding, or TTS.
+
 ## Memory Arbitration
 
 Memory needs both backend arbitration and agent action arbitration.
@@ -541,6 +543,26 @@ Selection rules:
 
 This is why `capability.runtime-backend` must be a first-class family.
 
+The same model should be available for other subsystem runtimes discovered during migration:
+
+- embeddings
+- audio transcription
+- image understanding
+- video understanding
+- text-to-speech
+
+Selection rules for these subsystem runtimes should preserve the useful parts of provider-capability prototypes:
+
+- capability-based selection
+- normalized provider ids
+- explicit built-in fallback policy
+- typed host-injected request envelopes
+
+Architecture rule:
+
+- keep those selection and envelope rules inside host-owned subsystem runtime registries or typed backend families
+- do not widen provider-integration or legacy plugin-provider APIs into a universal surface for unrelated runtime subsystems
+
 ## Catalog Publication
 
 The kernel should publish:
@@ -584,6 +606,7 @@ Capability selection must emit structured events for:
 - channel capabilities from `extensions/discord/src/channel.ts:74`, `extensions/slack/src/channel.ts:107`, and `extensions/telegram/src/channel.ts:120` collapse into canonical messaging action families
 - diffs becomes an agent-visible tool family plus a host-managed route surface from `extensions/diffs/index.ts:27`
 - provider integration from `extensions/google-gemini-cli-auth/index.ts:24` becomes operator-visible setup and auth capabilities
+- embedding, media-understanding, and TTS provider overrides should become runtime-internal subsystem registries rather than remaining part of a universal plugin-provider API
 - voice-call from `extensions/voice-call/index.ts:230` becomes a mix of agent-visible actions, runtime providers, and operator surfaces
 - ACP backend registration from `extensions/acpx/src/service.ts:55` becomes runtime-internal backend arbitration
 - context-engine registration becomes runtime-internal slot arbitration from `src/context-engine/registry.ts:60`
@@ -599,6 +622,7 @@ Capability selection must emit structured events for:
 6. Migrate the existing provider auth and setup selection path onto host-owned setup catalogs and canonical provider metadata.
 7. Add provider selection logic for the broader messaging action family before migrating all channels.
 8. Add runtime-backend and context-engine arbitration using the same rank and slot model where appropriate.
-9. Ensure lightweight setup catalogs can be built from static descriptors alone.
-10. Add a reviewed core registry for canonical action families and document how new ids are introduced.
-11. Record catalog and arbitration parity for `thread-ownership` first and `telegram` second before broader rollout.
+9. Add host-owned embedding, media-understanding, and TTS subsystem registries with explicit capability routing and built-in fallback policy.
+10. Ensure lightweight setup catalogs can be built from static descriptors alone.
+11. Add a reviewed core registry for canonical action families and document how new ids are introduced.
+12. Record catalog and arbitration parity for `thread-ownership` first and `telegram` second before broader rollout.
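Reviewer sketch for the catalog spec change above: the selection rules it lists (capability-based selection, normalized provider ids, explicit built-in fallback policy) could be combined in a host-owned registry roughly as follows. This is a minimal illustration, not code from the repository. `SubsystemRuntimeRegistry`, `normalizeProviderId`, `FallbackPolicy`, and the normalization rule itself are hypothetical names and assumptions.

```typescript
// Hypothetical sketch: none of these names exist in the codebase under review.
type SubsystemCapability =
  | "embeddings"
  | "audio-transcription"
  | "image-understanding"
  | "video-understanding"
  | "text-to-speech";

interface SubsystemRuntime {
  providerId: string;
  capabilities: readonly SubsystemCapability[];
  builtIn?: boolean;
}

// The fallback policy is passed explicitly on every selection, never implied.
type FallbackPolicy = "built-in" | "none";

interface Selection {
  runtime: SubsystemRuntime;
  reason: "requested" | "built-in-fallback";
}

// Assumed normalization rule: trim, lowercase, collapse spaces and underscores to "-".
function normalizeProviderId(raw: string): string {
  return raw.trim().toLowerCase().replace(/[\s_]+/g, "-");
}

class SubsystemRuntimeRegistry {
  private readonly runtimes = new Map<string, SubsystemRuntime>();

  register(runtime: SubsystemRuntime): void {
    const providerId = normalizeProviderId(runtime.providerId);
    this.runtimes.set(providerId, { ...runtime, providerId });
  }

  select(
    capability: SubsystemCapability,
    requestedId: string | undefined,
    fallback: FallbackPolicy,
  ): Selection | undefined {
    // 1. Honor the requested provider only if it declares the capability.
    if (requestedId !== undefined) {
      const requested = this.runtimes.get(normalizeProviderId(requestedId));
      if (requested !== undefined && requested.capabilities.includes(capability)) {
        return { runtime: requested, reason: "requested" };
      }
    }
    // 2. Otherwise apply the explicit fallback policy.
    if (fallback === "none") {
      return undefined;
    }
    const builtIn = Array.from(this.runtimes.values()).find(
      (runtime) => runtime.builtIn === true && runtime.capabilities.includes(capability),
    );
    return builtIn === undefined ? undefined : { runtime: builtIn, reason: "built-in-fallback" };
  }
}

// Usage: provider ids are matched after normalization.
const registry = new SubsystemRuntimeRegistry();
registry.register({ providerId: "Builtin_Embedder", capabilities: ["embeddings"], builtIn: true });
registry.register({ providerId: "Example Vendor", capabilities: ["embeddings", "text-to-speech"] });
const picked = registry.select("embeddings", " example_vendor ", "built-in");
console.log(picked?.runtime.providerId, picked?.reason);
```

Returning the `reason` alongside the runtime is one way to feed the structured selection events the spec asks for, since a fallback is then visible rather than silent.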
diff --git a/docs/.internal/extension-host-migration/openclaw-extension-host-implementation-guide.md b/docs/.internal/extension-host-migration/openclaw-extension-host-implementation-guide.md
index d1415e1e5fd..ce10f07ce58 100644
--- a/docs/.internal/extension-host-migration/openclaw-extension-host-implementation-guide.md
+++ b/docs/.internal/extension-host-migration/openclaw-extension-host-implementation-guide.md
@@ -217,6 +217,7 @@ What is still missing for these phases:
 - broader lifecycle ownership beyond the loader state machine, service-lifecycle boundary, CLI-lifecycle boundary, session-owned activation state, and explicit discovery-policy, activation-policy, and finalization-policy outcomes, remaining policy gate ownership, and broad host-owned registries described for Phase 2
 - minimal SDK compatibility work beyond preserving current behavior indirectly through existing loading
 - host-owned conversation binding, interaction routing, ingress claim, and generic interactive control surfaces identified by external-plugin validation
+- host-owned subsystem runtime registries for embeddings, media understanding, and TTS identified by provider-capability evaluation
 - any pilot migration, event pipeline, canonical catalog, or arbitration implementation
 
 Recent plan refinement from external-plugin validation:
@@ -225,6 +226,8 @@ Recent plan refinement from external-plugin validation:
 - it now explicitly treats interactive callback routing, namespace ownership, dedupe, and fallback behavior as first-class migration surfaces
 - it now explicitly treats inbound claim as a canonical ingress-stage concern rather than a permanent plugin-era hook shape
 - it now explicitly treats Telegram and Discord as the first validated rollout targets for interactive control surfaces while keeping the underlying contracts generic, host-owned, and kernel-agnostic
+- it now explicitly treats embeddings, media understanding, and TTS as host-owned subsystem runtimes with capability routing, typed request envelopes, provider-id normalization, and fallback policy
+- it now explicitly rejects widening the legacy `registerProvider(...)` or `ProviderPlugin` surface into a universal runtime API, even when harvesting useful capability-routing ideas from provider-capability prototypes
 
 ## Implementation Order
 
diff --git a/docs/.internal/extension-host-migration/openclaw-extension-host-lifecycle-and-security-spec.md b/docs/.internal/extension-host-migration/openclaw-extension-host-lifecycle-and-security-spec.md
index d169822b6af..f4523f6acba 100644
--- a/docs/.internal/extension-host-migration/openclaw-extension-host-lifecycle-and-security-spec.md
+++ b/docs/.internal/extension-host-migration/openclaw-extension-host-lifecycle-and-security-spec.md
@@ -136,6 +136,7 @@ What is still pending from this spec:
 - broader extension-host lifecycle ownership beyond the loader state machine, service-lifecycle boundary, CLI-lifecycle boundary, session-owned activation state, and explicit discovery-policy, activation-policy, and finalization-policy outcomes
 - activation pipeline ownership
 - host-owned registries for setup, CLI, routes, services, slots, and backends
+- host-owned subsystem runtime registries for embeddings, media understanding, and TTS, including explicit fallback and override policy instead of plugin-era capability reads
 - permission-mode enforcement
 - per-extension state ownership and migration
 - provenance, reload, and hardening parity tracking
@@ -733,9 +734,10 @@ The host must emit structured telemetry for:
 4. Add a policy evaluator that understands advisory versus enforced permission modes.
 5. Add host-owned credential and per-extension state boundaries for extension services.
 6. Generalize backend registration into a host-managed `capability.runtime-backend` registry.
-7. Add slot-backed provider management for context engines and other exclusive runtime providers.
-8. Preserve provenance, origin precedence, and current workspace and bundled enablement rules in host policy.
-9. Preserve prompt-mutation policy gates and add explicit state migration handling.
-10. Add explicit host registries and typed contracts for extension-owned hooks, channels, providers, tools, commands, CLI, setup flows, config surfaces, and status surfaces.
-11. Preserve config redaction-aware schema behavior and current reload or gateway feature contracts during migration.
-12. Record lifecycle parity for `thread-ownership` first and `telegram` second before broadening the compatibility bridges.
+7. Add host-owned subsystem runtime registries for embeddings, media understanding, and TTS instead of widening `registerProvider(...)`.
+8. Add slot-backed provider management for context engines and other exclusive runtime providers.
+9. Preserve provenance, origin precedence, and current workspace and bundled enablement rules in host policy.
+10. Preserve prompt-mutation policy gates and add explicit state migration handling.
+11. Add explicit host registries and typed contracts for extension-owned hooks, channels, providers, tools, commands, CLI, setup flows, config surfaces, and status surfaces.
+12. Preserve config redaction-aware schema behavior and current reload or gateway feature contracts during migration.
+13. Record lifecycle parity for `thread-ownership` first and `telegram` second before broadening the compatibility bridges.
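Reviewer sketch for the "typed request envelope" idea these specs keep referring to: the transition plan in this change names the host-injected fields `apiKey`, `baseUrl`, `headers`, `timeoutMs`, and `fetchFn`. One possible shape for an embedding runtime is shown below. Everything beyond those five field names is an assumption made for illustration: the `EmbeddingRequestEnvelope` type, the `/embeddings` path, the bearer-token header, and the `{ vectors }` response body are all hypothetical.

```typescript
// Hypothetical sketch: only the five host-injected field names come from the spec text.
interface HttpResponseLike {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}

type FetchFn = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<HttpResponseLike>;

interface EmbeddingRequestEnvelope {
  // Injected by the host, never read from global state by the runtime.
  apiKey: string;
  baseUrl: string;
  headers: Record<string, string>;
  timeoutMs: number;
  fetchFn: FetchFn;
  // Assumed payload fields for this example.
  model: string;
  input: string[];
}

// Rejects if `work` does not settle within `timeoutMs`. It does not cancel the underlying request.
function withTimeout<T>(work: Promise<T>, timeoutMs: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`request timed out after ${timeoutMs}ms`)),
      timeoutMs,
    );
    work.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (error) => {
        clearTimeout(timer);
        reject(error);
      },
    );
  });
}

async function embedWithEnvelope(req: EmbeddingRequestEnvelope): Promise<number[][]> {
  const response = await withTimeout(
    req.fetchFn(`${req.baseUrl}/embeddings`, {
      method: "POST",
      headers: {
        ...req.headers,
        authorization: `Bearer ${req.apiKey}`,
        "content-type": "application/json",
      },
      body: JSON.stringify({ model: req.model, input: req.input }),
    }),
    req.timeoutMs,
  );
  if (!response.ok) {
    throw new Error(`embedding request failed with status ${response.status}`);
  }
  // Assumed response shape: { vectors: number[][] }.
  const body = (await response.json()) as { vectors: number[][] };
  return body.vectors;
}
```

Because the runtime only sees what the envelope carries, the host keeps control of credentials, endpoints, and transport, which seems to be the point of keeping these contracts out of a widened `registerProvider(...)`.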
diff --git a/docs/.internal/extension-host-migration/openclaw-kernel-extension-host-transition-plan.md b/docs/.internal/extension-host-migration/openclaw-kernel-extension-host-transition-plan.md
index 964569d480d..89800b69018 100644
--- a/docs/.internal/extension-host-migration/openclaw-kernel-extension-host-transition-plan.md
+++ b/docs/.internal/extension-host-migration/openclaw-kernel-extension-host-transition-plan.md
@@ -548,6 +548,7 @@ Suggested mapping:
 
 - `registerChannel(...)` -> `adapter.runtime` plus lightweight dock metadata and optional `surface.config`, `surface.status`, `surface.setup`
 - `registerProvider(...)` -> `capability.provider-integration` plus optional setup and auth surfaces
+- plugin-provided embeddings, transcription, image or video understanding, and TTS -> typed subsystem runtime contributions or `capability.runtime-backend`, not a widened `registerProvider(...)` end state
 - `registerTool(...)` -> `capability.agent-tool`
 - `registerCommand(...)` -> `capability.control-command`
 - `on(...)` returning context or side effects -> `capability.context-augmenter` or `capability.event-handler`
@@ -564,6 +565,9 @@ Suggested mapping:
 Concrete examples:
 
 - `extensions/google-gemini-cli-auth/index.ts:25` becomes `capability.provider-integration`
+- plugin-provided embeddings become a host-owned embedding runtime contribution
+- plugin-provided transcription, image understanding, and video understanding become host-owned media runtime contributions
+- plugin-provided TTS becomes a host-owned TTS runtime contribution
 - `extensions/diffs/index.ts:27` becomes `capability.agent-tool`
 - `extensions/diffs/index.ts:28` becomes a host-managed route or interaction surface
 - `extensions/diffs/index.ts:38` becomes `capability.context-augmenter`
@@ -823,6 +827,25 @@ Provider integration contributions need host-injected capabilities for:
 - token refresh or credential renewal
 - model-selected lifecycle hooks
 
+Scope rule:
+
+- `capability.provider-integration` is for chat or model-provider discovery, setup, auth, and post-selection lifecycle
+- embeddings, transcription, image understanding, video understanding, and TTS should not be folded into that family just because they also use remote providers
+- those subsystem runtimes should use host-owned capability routing and typed runtime registries or runtime-backend families instead
+
+Useful ideas harvested from provider-capability validation:
+
+- capability-based selection is good
+- typed request envelopes with host-injected `apiKey`, `baseUrl`, `headers`, `timeoutMs`, and `fetchFn` are good
+- provider-id normalization is good
+- graceful built-in fallback is good
+
+Architecture rule:
+
+- harvest those behaviors into host-owned subsystem runtime contracts
+- do not widen legacy `registerProvider(...)` into a universal plugin API for unrelated runtime subsystems
+- do not make `src/plugins/runtime.ts` capability filters or global active-registry reads the long-term selection surface for embeddings, media understanding, or TTS
+
 Example:
 
 - `extensions/google-gemini-cli-auth/index.ts:25`