diff --git a/docs/.internal/extension-host-migration/openclaw-capability-catalog-and-arbitration-spec.md b/docs/.internal/extension-host-migration/openclaw-capability-catalog-and-arbitration-spec.md
index 5323830a82a..8b7cb47a147 100644
--- a/docs/.internal/extension-host-migration/openclaw-capability-catalog-and-arbitration-spec.md
+++ b/docs/.internal/extension-host-migration/openclaw-capability-catalog-and-arbitration-spec.md
@@ -503,6 +503,7 @@ The selected provider integration may also contribute:
 - model-selected hooks

 It should not silently absorb unrelated subsystem runtimes such as embeddings, transcription, media understanding, or TTS.
+It should also not silently absorb agent-visible search surfaces, which belong in the agent-tool catalog even when they call remote search services.

 ## Memory Arbitration

@@ -562,6 +563,7 @@ Architecture rule:

 - keep those selection and envelope rules inside host-owned subsystem runtime registries or typed backend families
 - do not widen provider-integration or legacy plugin-provider APIs into a universal surface for unrelated runtime subsystems
+- if search is agent-visible, publish it through canonical tool catalogs; reserve runtime-backend modeling for search backends that are consumed internally by the host or another subsystem

 ## Catalog Publication

@@ -607,6 +609,7 @@ Capability selection must emit structured events for:
 - diffs becomes an agent-visible tool family plus a host-managed route surface from `extensions/diffs/index.ts:27`
 - provider integration from `extensions/google-gemini-cli-auth/index.ts:24` becomes operator-visible setup and auth capabilities
 - embedding, media-understanding, and TTS provider overrides should become runtime-internal subsystem registries rather than remaining part of a universal plugin-provider API
+- extension-backed web search should become an agent-visible tool family unless it is only an internal backend feeding another host-owned surface
 - voice-call from `extensions/voice-call/index.ts:230` becomes a mix of agent-visible actions, runtime providers, and operator surfaces
 - ACP backend registration from `extensions/acpx/src/service.ts:55` becomes runtime-internal backend arbitration
 - context-engine registration becomes runtime-internal slot arbitration from `src/context-engine/registry.ts:60`
@@ -623,6 +626,7 @@ Capability selection must emit structured events for:
 7. Add provider selection logic for the broader messaging action family before migrating all channels.
 8. Add runtime-backend and context-engine arbitration using the same rank and slot model where appropriate.
 9. Add host-owned embedding, media-understanding, and TTS subsystem registries with explicit capability routing and built-in fallback policy.
-10. Ensure lightweight setup catalogs can be built from static descriptors alone.
-11. Add a reviewed core registry for canonical action families and document how new ids are introduced.
-12. Record catalog and arbitration parity for `thread-ownership` first and `telegram` second before broader rollout.
+10. Decide whether extension-backed search needs only canonical tool publication or also a host-owned internal search-backend registry, and keep those two cases distinct.
+11. Ensure lightweight setup catalogs can be built from static descriptors alone.
+12. Add a reviewed core registry for canonical action families and document how new ids are introduced.
+13. Record catalog and arbitration parity for `thread-ownership` first and `telegram` second before broader rollout.
diff --git a/docs/.internal/extension-host-migration/openclaw-extension-contribution-schema-spec.md b/docs/.internal/extension-host-migration/openclaw-extension-contribution-schema-spec.md
index 62c31a2fe34..579c3112f75 100644
--- a/docs/.internal/extension-host-migration/openclaw-extension-contribution-schema-spec.md
+++ b/docs/.internal/extension-host-migration/openclaw-extension-contribution-schema-spec.md
@@ -476,6 +476,8 @@ The lightweight dock contract should be specific enough to preserve current host

 Represents an agent-visible action.

+This is the correct family for extension-backed search when the search surface is directly exposed to the agent, for example a canonical `web.search` or workspace-search action.
+
 Required descriptor metadata:

 - canonical action id
@@ -529,6 +531,13 @@ Required descriptor metadata:

 This family exists because today's provider plugin contract includes more than auth, as shown in `src/plugins/types.ts:158`.

+Scope rule:
+
+- this family is specifically for chat or model-provider discovery, setup, auth, and post-selection lifecycle
+- agent-visible search should not be folded into this family only because it may call remote providers under the hood
+- non-chat subsystem providers such as embeddings, transcription, image understanding, video understanding, and TTS should not be folded into this family only because they also use remote providers
+- those subsystem runtimes should use typed runtime contributions or `capability.runtime-backend` with subsystem-specific capability metadata
+
 ### `capability.memory`

 Represents a memory store or memory query runtime.
@@ -678,6 +687,38 @@ Required descriptor metadata:

 This family exists because not all runtime providers are user-facing adapters.

+This family is also the right home for plugin-provided subsystem runtimes when the runtime is consumed by a host or subsystem rather than directly by the agent.
+
+Examples to support during migration:
+
+- embeddings
+- audio transcription
+- image understanding
+- video understanding
+- text-to-speech
+- search backends only when they are runtime-internal and not directly exposed as agent tools
+
+Required metadata for these subsystem runtimes:
+
+- subsystem id such as `embedding`, `media.audio`, `media.image`, `media.video`, or `tts`
+- supported capability list
+- typed request envelope contract
+- provider-id normalization rules
+- fallback policy
+- override policy when a built-in implementation already exists
+
+Useful harvested behavior:
+
+- capability-based routing is worth keeping
+- typed host-injected request fields such as `apiKey`, `baseUrl`, `headers`, `timeoutMs`, and `fetchFn` are worth keeping
+- graceful fallback to built-in implementations is worth keeping
+
+Important rule:
+
+- keep these as host-owned runtime registries or backend families
+- do not widen `registerProvider(...)` into the permanent universal surface for every runtime subsystem
+- if search is directly agent-visible, model it as `capability.agent-tool` instead of treating it as a generic provider family
+
 ### Adapter-runtime helper contracts

 Some interactive and bound-conversation extensions need a bounded set of runtime helper contracts from the active adapter.
diff --git a/docs/.internal/extension-host-migration/openclaw-extension-host-implementation-guide.md b/docs/.internal/extension-host-migration/openclaw-extension-host-implementation-guide.md
index ce10f07ce58..8b67277f721 100644
--- a/docs/.internal/extension-host-migration/openclaw-extension-host-implementation-guide.md
+++ b/docs/.internal/extension-host-migration/openclaw-extension-host-implementation-guide.md
@@ -218,6 +218,7 @@ What is still missing for these phases:
 - minimal SDK compatibility work beyond preserving current behavior indirectly through existing loading
 - host-owned conversation binding, interaction routing, ingress claim, and generic interactive control surfaces identified by external-plugin validation
 - host-owned subsystem runtime registries for embeddings, media understanding, and TTS identified by provider-capability evaluation
+- explicit support for extension-backed search, with a generic split between agent-visible tool publication and optional runtime-internal search backends
 - any pilot migration, event pipeline, canonical catalog, or arbitration implementation

 Recent plan refinement from external-plugin validation:
@@ -228,6 +229,7 @@ Recent plan refinement from external-plugin validation:
 - it now explicitly treats Telegram and Discord as the first validated rollout targets for interactive control surfaces while keeping the underlying contracts generic, host-owned, and kernel-agnostic
 - it now explicitly treats embeddings, media understanding, and TTS as host-owned subsystem runtimes with capability routing, typed request envelopes, provider-id normalization, and fallback policy
 - it now explicitly rejects widening the legacy `registerProvider(...)` or `ProviderPlugin` surface into a universal runtime API, even when harvesting useful capability-routing ideas from provider-capability prototypes
+- it now explicitly treats extension-backed search as either a canonical tool contribution or a host-owned runtime backend depending on whether the search surface is agent-visible

 ## Implementation Order

diff --git a/docs/.internal/extension-host-migration/openclaw-extension-host-lifecycle-and-security-spec.md b/docs/.internal/extension-host-migration/openclaw-extension-host-lifecycle-and-security-spec.md
index f4523f6acba..c037451a3af 100644
--- a/docs/.internal/extension-host-migration/openclaw-extension-host-lifecycle-and-security-spec.md
+++ b/docs/.internal/extension-host-migration/openclaw-extension-host-lifecycle-and-security-spec.md
@@ -137,6 +137,7 @@ What is still pending from this spec:
 - activation pipeline ownership
 - host-owned registries for setup, CLI, routes, services, slots, and backends
 - host-owned subsystem runtime registries for embeddings, media understanding, and TTS, including explicit fallback and override policy instead of plugin-era capability reads
+- a clear host-owned split for extension-backed search between agent-visible tool publication and any optional runtime-internal search backend registry
 - permission-mode enforcement
 - per-extension state ownership and migration
 - provenance, reload, and hardening parity tracking
@@ -735,9 +736,10 @@ The host must emit structured telemetry for:
 5. Add host-owned credential and per-extension state boundaries for extension services.
 6. Generalize backend registration into a host-managed `capability.runtime-backend` registry.
 7. Add host-owned subsystem runtime registries for embeddings, media understanding, and TTS instead of widening `registerProvider(...)`.
-8. Add slot-backed provider management for context engines and other exclusive runtime providers.
-9. Preserve provenance, origin precedence, and current workspace and bundled enablement rules in host policy.
-10. Preserve prompt-mutation policy gates and add explicit state migration handling.
-11. Add explicit host registries and typed contracts for extension-owned hooks, channels, providers, tools, commands, CLI, setup flows, config surfaces, and status surfaces.
-12. Preserve config redaction-aware schema behavior and current reload or gateway feature contracts during migration.
-13. Record lifecycle parity for `thread-ownership` first and `telegram` second before broadening the compatibility bridges.
+8. Keep extension-backed search generic by publishing agent-visible search through tool contracts and using runtime-backend only for search backends consumed internally by the host or another subsystem.
+9. Add slot-backed provider management for context engines and other exclusive runtime providers.
+10. Preserve provenance, origin precedence, and current workspace and bundled enablement rules in host policy.
+11. Preserve prompt-mutation policy gates and add explicit state migration handling.
+12. Add explicit host registries and typed contracts for extension-owned hooks, channels, providers, tools, commands, CLI, setup flows, config surfaces, and status surfaces.
+13. Preserve config redaction-aware schema behavior and current reload or gateway feature contracts during migration.
+14. Record lifecycle parity for `thread-ownership` first and `telegram` second before broadening the compatibility bridges.
diff --git a/docs/.internal/extension-host-migration/openclaw-kernel-extension-host-transition-plan.md b/docs/.internal/extension-host-migration/openclaw-kernel-extension-host-transition-plan.md
index 89800b69018..9a6319f1ded 100644
--- a/docs/.internal/extension-host-migration/openclaw-kernel-extension-host-transition-plan.md
+++ b/docs/.internal/extension-host-migration/openclaw-kernel-extension-host-transition-plan.md
@@ -549,6 +549,8 @@ Suggested mapping:
 - `registerChannel(...)` -> `adapter.runtime` plus lightweight dock metadata and optional `surface.config`, `surface.status`, `surface.setup`
 - `registerProvider(...)` -> `capability.provider-integration` plus optional setup and auth surfaces
 - plugin-provided embeddings, transcription, image or video understanding, and TTS -> typed subsystem runtime contributions or `capability.runtime-backend`, not a widened `registerProvider(...)` end state
+- extension-backed search exposed to the agent -> `capability.agent-tool`
+- extension-backed search consumed only by a host or subsystem -> typed runtime contribution or `capability.runtime-backend`
 - `registerTool(...)` -> `capability.agent-tool`
 - `registerCommand(...)` -> `capability.control-command`
 - `on(...)` returning context or side effects -> `capability.context-augmenter` or `capability.event-handler`
@@ -568,6 +570,7 @@ Concrete examples:
 - plugin-provided embeddings become a host-owned embedding runtime contribution
 - plugin-provided transcription, image understanding, and video understanding become host-owned media runtime contributions
 - plugin-provided TTS becomes a host-owned TTS runtime contribution
+- extension-backed web search becomes a canonical search tool contribution unless it is only a runtime-internal backend
 - `extensions/diffs/index.ts:27` becomes `capability.agent-tool`
 - `extensions/diffs/index.ts:28` becomes a host-managed route or interaction surface
 - `extensions/diffs/index.ts:38` becomes `capability.context-augmenter`
@@ -830,6 +833,7 @@ Provider integration contributions need host-injected capabilities for:
 Scope rule:

 - `capability.provider-integration` is for chat or model-provider discovery, setup, auth, and post-selection lifecycle
+- agent-visible search should not be folded into that family only because it may call remote services
 - embeddings, transcription, image understanding, video understanding, and TTS should not be folded into that family just because they also use remote providers
 - those subsystem runtimes should use host-owned capability routing and typed runtime registries or runtime-backend families instead

@@ -839,6 +843,7 @@ Useful ideas harvested from provider-capability validation:
 - typed request envelopes with host-injected `apiKey`, `baseUrl`, `headers`, `timeoutMs`, and `fetchFn` are good
 - provider-id normalization is good
 - graceful built-in fallback is good
+- the same host-owned routing pattern is useful for runtime-internal search backends, but agent-visible search should still surface as a tool family rather than a universal provider API

 Architecture rule:

diff --git a/src/extension-host/cutover-inventory.md b/src/extension-host/cutover-inventory.md
index a0a1d56ea38..b8106255b16 100644
--- a/src/extension-host/cutover-inventory.md
+++ b/src/extension-host/cutover-inventory.md
@@ -97,6 +97,9 @@ This is an implementation checklist, not a future-design spec.
 | Interactive channel control verbs for bound agents | product-shaped runtime helpers added under `src/plugins/runtime/*` and direct channel-specific helpers in extension code | host-owned adapter runtime contracts and interaction capabilities | `not started` | The host needs a bounded first-cut set of control verbs for interactive agents, such as typing leases plus message or conversation actions. Those verbs should be expressed as generic host-owned adapter capabilities, even if the first validated rollout only exercises them through Telegram and Discord. |
 | Slot arbitration | `src/plugins/slots.ts` | host-owned arbitration model | `not started` | Current slot selection remains plugin-era logic. |
 | ACP backend registry | `src/acp/runtime/registry.ts` | host-owned runtime-backend registry | `not started` | ACP backends still mutate a global ACP runtime registry directly. |
+| Embedding provider registry and fallback routing | `src/memory/embeddings.ts` plus plugin provider capability filtering through `src/plugins/runtime.ts` | host-owned embedding runtime registry or typed runtime-backend family | `not started` | Provider-capability evaluation showed this is a real missing scope area. Embedding providers should be modeled as host-owned subsystem runtimes with explicit capability metadata, request envelopes, provider-id normalization, and fallback rules, not by widening legacy `registerProvider(...)` as the long-term architecture. |
+| Media-understanding provider registry and fallback routing | `src/media-understanding/providers/index.ts` plus plugin provider capability filtering through `src/plugins/runtime.ts` | host-owned media runtime registry or typed runtime-backend family | `not started` | Audio transcription, image understanding, and video understanding should be modeled as host-owned subsystem runtimes with capability routing, explicit request envelopes, and fallback behavior rather than as permanent extensions of the plugin-era provider API. |
+| TTS provider registry and telephony override routing | `src/tts/providers.ts`, `src/tts/tts.ts`, and plugin provider capability filtering through `src/plugins/runtime.ts` | host-owned TTS runtime registry or typed runtime-backend family | `not started` | TTS providers and telephony TTS overrides should move behind host-owned runtime registries with explicit capability and fallback policy rather than staying coupled to plugin-era provider capabilities and global active-registry reads. |
 | Onboarding/install/setup surfaces | `src/plugins/install.ts`, package manifests, channel catalog, onboarding commands | host-owned static descriptors | `partial` | Static metadata normalization has started; full setup/install descriptor migration is not done. |
 | Pilot migrations | `extensions/thread-ownership`, `extensions/telegram`, `extensions/acpx` | extension-host path with parity tracking | `not started` | No pilot runs through the host path yet. |

@@ -127,6 +130,8 @@ That pattern has been used for:
 - CLI duplicate detection, registrar invocation, and async failure logging
 - gateway method-id aggregation, plugin diagnostic shaping, and extra-handler composition
 - explicit scoping of still-unimplemented migration targets discovered by external-plugin evaluation: conversation binding ownership, interactive callback routing, ingress claim semantics, and bounded first-cut interactive channel controls
+- explicit scoping of still-unimplemented subsystem-runtime targets discovered by provider-capability evaluation: embeddings, media understanding, and TTS as host-owned runtime registries with capability routing and fallback
+- explicit scoping of extension-backed search as either a canonical tool contribution or an optional host-owned runtime backend, rather than as another universal provider surface

 ## Immediate Next Targets

@@ -152,6 +157,8 @@ The following remain legacy-owned today:
 - interaction namespace routing, dedupe, and callback fallback rules
 - canonical ingress claim semantics
 - generic host-owned interactive channel control contracts
+- embedding, media-understanding, and TTS runtime registries
+- a clear host-owned split for extension-backed search between canonical tool publication and any optional runtime-internal search backend registry
 - slot arbitration
 - ACP backend registration
 - channel runtime compatibility bridges
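
Reviewer sketch (not part of the patch): the search split that recurs across these specs can be read as a single routing decision. The TypeScript below is a minimal illustration under stated assumptions. The `SearchContribution` type, its `agentVisible` flag, and the `familyForSearch` helper are hypothetical names invented for this note and do not come from the OpenClaw source. Only the family ids `capability.agent-tool` and `capability.runtime-backend` and the `web.search` example id are taken from the diff.

```typescript
// Hypothetical descriptor shape, for illustration only.
// The real contribution schema is defined by the specs in this diff, not here.
type ContributionFamily = "capability.agent-tool" | "capability.runtime-backend";

interface SearchContribution {
  id: string;            // e.g. the canonical `web.search` action id named in the spec
  agentVisible: boolean; // assumption: a flag saying whether the agent calls this surface directly
}

// Agent-visible search is published as a tool; search consumed only by the
// host or another subsystem is modeled as a runtime backend. Neither case is
// routed through a provider-integration family.
function familyForSearch(contribution: SearchContribution): ContributionFamily {
  return contribution.agentVisible
    ? "capability.agent-tool"
    : "capability.runtime-backend";
}

const webSearch = familyForSearch({ id: "web.search", agentVisible: true });
const internalIndex = familyForSearch({ id: "internal.search-index", agentVisible: false });
console.log(webSearch, internalIndex);
```

The specs also allow a "typed runtime contribution" for the internal case, and they leave open whether one extension may need both publications. This sketch collapses those options to a single family per contribution for brevity.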