Docs: clarify extension-backed search plan

Gustavo Madeira Santana 2026-03-15 18:23:21 +00:00
parent e7ad81179d
commit e2635aaf00
6 changed files with 70 additions and 9 deletions

View File

@@ -503,6 +503,7 @@ The selected provider integration may also contribute:
- model-selected hooks
It should not silently absorb unrelated subsystem runtimes such as embeddings, transcription, media understanding, or TTS.
It should also not silently absorb agent-visible search surfaces, which belong in the agent-tool catalog even when they call remote search services.
## Memory Arbitration
@@ -562,6 +563,7 @@ Architecture rule:
- keep those selection and envelope rules inside host-owned subsystem runtime registries or typed backend families
- do not widen provider-integration or legacy plugin-provider APIs into a universal surface for unrelated runtime subsystems
- if search is agent-visible, publish it through canonical tool catalogs; reserve runtime-backend modeling for search backends that are consumed internally by the host or another subsystem
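The rule above can be sketched as a small classification function. This is only an illustration: the `SearchContribution` shape, the `consumers` field, and `planSearchPublication` are hypothetical names, and only the two family ids come from the plan itself.

```typescript
// Hypothetical types for illustration; only the two family ids come from the plan.
type SearchConsumer = "agent" | "host" | "subsystem";
type Family = "capability.agent-tool" | "capability.runtime-backend";

interface SearchContribution {
  id: string; // for example "web.search"
  consumers: SearchConsumer[]; // who calls this search surface
}

// Agent-visible search is published through the tool catalog; search consumed
// by the host or another subsystem is modeled as a runtime backend. A
// contribution with both kinds of consumer yields two distinct publications.
function planSearchPublication(contribution: SearchContribution): Family[] {
  const families: Family[] = [];
  if (contribution.consumers.indexOf("agent") !== -1) {
    families.push("capability.agent-tool");
  }
  if (contribution.consumers.some((consumer) => consumer !== "agent")) {
    families.push("capability.runtime-backend");
  }
  return families;
}
```

Returning a list rather than a single family keeps the two cases distinct even when one extension serves both an agent tool and an internal backend.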
## Catalog Publication
@@ -607,6 +609,7 @@ Capability selection must emit structured events for:
- diffs becomes an agent-visible tool family plus a host-managed route surface from `extensions/diffs/index.ts:27`
- provider integration from `extensions/google-gemini-cli-auth/index.ts:24` becomes operator-visible setup and auth capabilities
- embedding, media-understanding, and TTS provider overrides should become runtime-internal subsystem registries rather than remaining part of a universal plugin-provider API
- extension-backed web search should become an agent-visible tool family unless it is only an internal backend feeding another host-owned surface
- voice-call from `extensions/voice-call/index.ts:230` becomes a mix of agent-visible actions, runtime providers, and operator surfaces
- ACP backend registration from `extensions/acpx/src/service.ts:55` becomes runtime-internal backend arbitration
- context-engine registration becomes runtime-internal slot arbitration from `src/context-engine/registry.ts:60`
@@ -623,6 +626,7 @@ Capability selection must emit structured events for:
7. Add provider selection logic for the broader messaging action family before migrating all channels.
8. Add runtime-backend and context-engine arbitration using the same rank and slot model where appropriate.
9. Add host-owned embedding, media-understanding, and TTS subsystem registries with explicit capability routing and built-in fallback policy.
-10. Ensure lightweight setup catalogs can be built from static descriptors alone.
-11. Add a reviewed core registry for canonical action families and document how new ids are introduced.
-12. Record catalog and arbitration parity for `thread-ownership` first and `telegram` second before broader rollout.
+10. Decide whether extension-backed search needs only canonical tool publication or also a host-owned internal search-backend registry, and keep those two cases distinct.
+11. Ensure lightweight setup catalogs can be built from static descriptors alone.
+12. Add a reviewed core registry for canonical action families and document how new ids are introduced.
+13. Record catalog and arbitration parity for `thread-ownership` first and `telegram` second before broader rollout.

View File

@@ -476,6 +476,8 @@ The lightweight dock contract should be specific enough to preserve current host
Represents an agent-visible action.
This is the correct family for extension-backed search when the search surface is directly exposed to the agent, for example a canonical `web.search` or workspace-search action.
Required descriptor metadata:
- canonical action id
@@ -529,6 +531,13 @@ Required descriptor metadata:
This family exists because today's provider plugin contract includes more than auth, as shown in `src/plugins/types.ts:158`.
Scope rule:
- this family is specifically for chat or model-provider discovery, setup, auth, and post-selection lifecycle
- agent-visible search should not be folded into this family only because it may call remote providers under the hood
- non-chat subsystem providers such as embeddings, transcription, image understanding, video understanding, and TTS should not be folded into this family only because they also use remote providers
- those subsystem runtimes should use typed runtime contributions or `capability.runtime-backend` with subsystem-specific capability metadata
### `capability.memory`
Represents a memory store or memory query runtime.
@@ -678,6 +687,38 @@ Required descriptor metadata:
This family exists because not all runtime providers are user-facing adapters.
This family is also the right home for plugin-provided subsystem runtimes when the runtime is consumed by a host or subsystem rather than directly by the agent.
Examples to support during migration:
- embeddings
- audio transcription
- image understanding
- video understanding
- text-to-speech
- search backends only when they are runtime-internal and not directly exposed as agent tools
Required metadata for these subsystem runtimes:
- subsystem id such as `embedding`, `media.audio`, `media.image`, `media.video`, or `tts`
- supported capability list
- typed request envelope contract
- provider-id normalization rules
- fallback policy
- override policy when a built-in implementation already exists
Useful harvested behavior:
- capability-based routing is worth keeping
- typed host-injected request fields such as `apiKey`, `baseUrl`, `headers`, `timeoutMs`, and `fetchFn` are worth keeping
- graceful fallback to built-in implementations is worth keeping
Important rule:
- keep these as host-owned runtime registries or backend families
- do not widen `registerProvider(...)` into the permanent universal surface for every runtime subsystem
- if search is directly agent-visible, model it as `capability.agent-tool` instead of treating it as a generic provider family
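A minimal sketch of what a host-owned subsystem runtime registry with capability routing and built-in fallback might look like. Every type and class name here is hypothetical; only the subsystem ids and the injected field names (`apiKey`, `baseUrl`, `headers`, `timeoutMs`, `fetchFn`) are taken from the text above. The override policy shown (contributed runtimes are tried before built-ins) is an assumption, not a decision recorded in the plan.

```typescript
// All names are hypothetical; subsystem ids and injected field names follow the plan.
type SubsystemId = "embedding" | "media.audio" | "media.image" | "media.video" | "tts";

type FetchLike = (url: string, init?: Record<string, unknown>) => Promise<unknown>;

// Typed request envelope: the host injects credentials and transport settings.
interface RequestEnvelope<TInput> {
  input: TInput;
  apiKey?: string;
  baseUrl?: string;
  headers?: Record<string, string>;
  timeoutMs?: number;
  fetchFn?: FetchLike;
}

interface SubsystemRuntime<TInput, TOutput> {
  providerId: string;
  subsystem: SubsystemId;
  capabilities: string[];
  builtIn: boolean;
  run(request: RequestEnvelope<TInput>): Promise<TOutput>;
}

class SubsystemRuntimeRegistry<TInput, TOutput> {
  private readonly subsystem: SubsystemId;
  private readonly runtimes: SubsystemRuntime<TInput, TOutput>[] = [];

  constructor(subsystem: SubsystemId) {
    this.subsystem = subsystem;
  }

  register(runtime: SubsystemRuntime<TInput, TOutput>): void {
    if (runtime.subsystem !== this.subsystem) {
      throw new Error(`expected ${this.subsystem}, got ${runtime.subsystem}`);
    }
    this.runtimes.push(runtime);
  }

  // Capability routing: contributed runtimes first, built-ins last as fallback.
  candidatesFor(capability: string): SubsystemRuntime<TInput, TOutput>[] {
    const matching = this.runtimes.filter((r) => r.capabilities.indexOf(capability) !== -1);
    const contributed = matching.filter((r) => !r.builtIn);
    const builtIn = matching.filter((r) => r.builtIn);
    return contributed.concat(builtIn);
  }

  // Try each candidate in order; fall through to the next one on failure.
  async run(capability: string, request: RequestEnvelope<TInput>): Promise<TOutput> {
    let lastError: unknown = new Error(`no ${this.subsystem} runtime supports ${capability}`);
    for (const runtime of this.candidatesFor(capability)) {
      try {
        return await runtime.run(request);
      } catch (error) {
        lastError = error;
      }
    }
    throw lastError;
  }
}
```

One registry instance per subsystem keeps selection and envelope rules host-owned, without routing them through `registerProvider(...)`.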
### Adapter-runtime helper contracts
Some interactive and bound-conversation extensions need a bounded set of runtime helper contracts from the active adapter.

View File

@@ -218,6 +218,7 @@ What is still missing for these phases:
- minimal SDK compatibility work beyond preserving current behavior indirectly through existing loading
- host-owned conversation binding, interaction routing, ingress claim, and generic interactive control surfaces identified by external-plugin validation
- host-owned subsystem runtime registries for embeddings, media understanding, and TTS identified by provider-capability evaluation
- explicit support for extension-backed search, with a generic split between agent-visible tool publication and optional runtime-internal search backends
- any pilot migration, event pipeline, canonical catalog, or arbitration implementation
Recent plan refinement from external-plugin validation:
@@ -228,6 +229,7 @@ Recent plan refinement from external-plugin validation:
- it now explicitly treats Telegram and Discord as the first validated rollout targets for interactive control surfaces while keeping the underlying contracts generic, host-owned, and kernel-agnostic
- it now explicitly treats embeddings, media understanding, and TTS as host-owned subsystem runtimes with capability routing, typed request envelopes, provider-id normalization, and fallback policy
- it now explicitly rejects widening the legacy `registerProvider(...)` or `ProviderPlugin` surface into a universal runtime API, even when harvesting useful capability-routing ideas from provider-capability prototypes
- it now explicitly treats extension-backed search as either a canonical tool contribution or a host-owned runtime backend depending on whether the search surface is agent-visible
## Implementation Order

View File

@@ -137,6 +137,7 @@ What is still pending from this spec:
- activation pipeline ownership
- host-owned registries for setup, CLI, routes, services, slots, and backends
- host-owned subsystem runtime registries for embeddings, media understanding, and TTS, including explicit fallback and override policy instead of plugin-era capability reads
- a clear host-owned split for extension-backed search between agent-visible tool publication and any optional runtime-internal search backend registry
- permission-mode enforcement
- per-extension state ownership and migration
- provenance, reload, and hardening parity tracking
@@ -735,9 +736,10 @@ The host must emit structured telemetry for:
5. Add host-owned credential and per-extension state boundaries for extension services.
6. Generalize backend registration into a host-managed `capability.runtime-backend` registry.
7. Add host-owned subsystem runtime registries for embeddings, media understanding, and TTS instead of widening `registerProvider(...)`.
-8. Add slot-backed provider management for context engines and other exclusive runtime providers.
-9. Preserve provenance, origin precedence, and current workspace and bundled enablement rules in host policy.
-10. Preserve prompt-mutation policy gates and add explicit state migration handling.
-11. Add explicit host registries and typed contracts for extension-owned hooks, channels, providers, tools, commands, CLI, setup flows, config surfaces, and status surfaces.
-12. Preserve config redaction-aware schema behavior and current reload or gateway feature contracts during migration.
-13. Record lifecycle parity for `thread-ownership` first and `telegram` second before broadening the compatibility bridges.
+8. Keep extension-backed search generic by publishing agent-visible search through tool contracts and using runtime-backend only for search backends consumed internally by the host or another subsystem.
+9. Add slot-backed provider management for context engines and other exclusive runtime providers.
+10. Preserve provenance, origin precedence, and current workspace and bundled enablement rules in host policy.
+11. Preserve prompt-mutation policy gates and add explicit state migration handling.
+12. Add explicit host registries and typed contracts for extension-owned hooks, channels, providers, tools, commands, CLI, setup flows, config surfaces, and status surfaces.
+13. Preserve config redaction-aware schema behavior and current reload or gateway feature contracts during migration.
+14. Record lifecycle parity for `thread-ownership` first and `telegram` second before broadening the compatibility bridges.

View File

@@ -549,6 +549,8 @@ Suggested mapping:
- `registerChannel(...)` -> `adapter.runtime` plus lightweight dock metadata and optional `surface.config`, `surface.status`, `surface.setup`
- `registerProvider(...)` -> `capability.provider-integration` plus optional setup and auth surfaces
- plugin-provided embeddings, transcription, image or video understanding, and TTS -> typed subsystem runtime contributions or `capability.runtime-backend`, not a widened `registerProvider(...)` end state
- extension-backed search exposed to the agent -> `capability.agent-tool`
- extension-backed search consumed only by a host or subsystem -> typed runtime contribution or `capability.runtime-backend`
- `registerTool(...)` -> `capability.agent-tool`
- `registerCommand(...)` -> `capability.control-command`
- `on(...)` returning context or side effects -> `capability.context-augmenter` or `capability.event-handler`
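The legacy-call part of this mapping could be captured as a lookup table, for example to drive a compatibility bridge or a migration report. The table name, the `LegacyCall` type, and `candidateFamilies` are hypothetical; the family ids are the ones listed in the mapping, and the optional surfaces (`surface.config`, `surface.status`, `surface.setup`) and dock metadata are left out for brevity.

```typescript
// Hypothetical lookup table; the family ids are the ones named in the mapping.
type LegacyCall =
  | "registerChannel"
  | "registerProvider"
  | "registerTool"
  | "registerCommand"
  | "on";

const LEGACY_TO_FAMILIES: Record<LegacyCall, readonly string[]> = {
  registerChannel: ["adapter.runtime"],
  registerProvider: ["capability.provider-integration"],
  registerTool: ["capability.agent-tool"],
  registerCommand: ["capability.control-command"],
  // `on(...)` maps to one of two families depending on what the handler returns.
  on: ["capability.context-augmenter", "capability.event-handler"],
};

// Returns every family a legacy registration call may translate into.
function candidateFamilies(call: LegacyCall): readonly string[] {
  return LEGACY_TO_FAMILIES[call];
}
```

Extension-backed search is deliberately absent from the table: its target family depends on who consumes the search surface, not on which legacy call registered it.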
@@ -568,6 +570,7 @@ Concrete examples:
- plugin-provided embeddings become a host-owned embedding runtime contribution
- plugin-provided transcription, image understanding, and video understanding become host-owned media runtime contributions
- plugin-provided TTS becomes a host-owned TTS runtime contribution
- extension-backed web search becomes a canonical search tool contribution unless it is only a runtime-internal backend
- `extensions/diffs/index.ts:27` becomes `capability.agent-tool`
- `extensions/diffs/index.ts:28` becomes a host-managed route or interaction surface
- `extensions/diffs/index.ts:38` becomes `capability.context-augmenter`
@@ -830,6 +833,7 @@ Provider integration contributions need host-injected capabilities for:
Scope rule:
- `capability.provider-integration` is for chat or model-provider discovery, setup, auth, and post-selection lifecycle
- agent-visible search should not be folded into that family only because it may call remote services
- embeddings, transcription, image understanding, video understanding, and TTS should not be folded into that family just because they also use remote providers
- those subsystem runtimes should use host-owned capability routing and typed runtime registries or runtime-backend families instead
@@ -839,6 +843,7 @@ Useful ideas harvested from provider-capability validation:
- typed request envelopes with host-injected `apiKey`, `baseUrl`, `headers`, `timeoutMs`, and `fetchFn` are good
- provider-id normalization is good
- graceful built-in fallback is good
- the same host-owned routing pattern is useful for runtime-internal search backends, but agent-visible search should still surface as a tool family rather than a universal provider API
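As a rough illustration of two of the harvested ideas, the sketch below normalizes a provider id and builds a request envelope from host-injected fields. The normalization rule (trim, lowercase, collapse whitespace and underscores to `-`), the 30-second default timeout, and all function and type names are assumptions made for this example; the plan only names the injected fields.

```typescript
// Hypothetical helper types; the field names follow the list above.
interface HostInjectedFields {
  apiKey?: string;
  baseUrl?: string;
  headers?: Record<string, string>;
  timeoutMs?: number;
}

interface HostRequestEnvelope<TInput> extends HostInjectedFields {
  providerId: string;
  input: TInput;
}

// Assumed normalization rule: trim, lowercase, collapse whitespace and
// underscores to a single "-". The real rules are not specified in the plan.
function normalizeProviderId(raw: string): string {
  return raw.trim().toLowerCase().replace(/[\s_]+/g, "-");
}

// The host, not the extension, fills in credentials and transport settings.
function buildEnvelope<TInput>(
  rawProviderId: string,
  input: TInput,
  host: HostInjectedFields,
): HostRequestEnvelope<TInput> {
  return {
    ...host,
    providerId: normalizeProviderId(rawProviderId),
    input,
    // Hypothetical default applied only when the host did not set a timeout.
    timeoutMs: host.timeoutMs ?? 30000,
  };
}
```

The same envelope shape would serve a runtime-internal search backend; an agent-visible search tool would still be published through the tool catalog.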
Architecture rule:

View File

@@ -97,6 +97,9 @@ This is an implementation checklist, not a future-design spec.
| Interactive channel control verbs for bound agents | product-shaped runtime helpers added under `src/plugins/runtime/*` and direct channel-specific helpers in extension code | host-owned adapter runtime contracts and interaction capabilities | `not started` | The host needs a bounded first-cut set of control verbs for interactive agents, such as typing leases plus message or conversation actions. Those verbs should be expressed as generic host-owned adapter capabilities, even if the first validated rollout only exercises them through Telegram and Discord. |
| Slot arbitration | `src/plugins/slots.ts` | host-owned arbitration model | `not started` | Current slot selection remains plugin-era logic. |
| ACP backend registry | `src/acp/runtime/registry.ts` | host-owned runtime-backend registry | `not started` | ACP backends still mutate a global ACP runtime registry directly. |
| Embedding provider registry and fallback routing | `src/memory/embeddings.ts` plus plugin provider capability filtering through `src/plugins/runtime.ts` | host-owned embedding runtime registry or typed runtime-backend family | `not started` | Provider-capability evaluation showed this is a real missing scope area. Embedding providers should be modeled as host-owned subsystem runtimes with explicit capability metadata, request envelopes, provider-id normalization, and fallback rules, not by widening legacy `registerProvider(...)` as the long-term architecture. |
| Media-understanding provider registry and fallback routing | `src/media-understanding/providers/index.ts` plus plugin provider capability filtering through `src/plugins/runtime.ts` | host-owned media runtime registry or typed runtime-backend family | `not started` | Audio transcription, image understanding, and video understanding should be modeled as host-owned subsystem runtimes with capability routing, explicit request envelopes, and fallback behavior rather than as permanent extensions of the plugin-era provider API. |
| TTS provider registry and telephony override routing | `src/tts/providers.ts`, `src/tts/tts.ts`, and plugin provider capability filtering through `src/plugins/runtime.ts` | host-owned TTS runtime registry or typed runtime-backend family | `not started` | TTS providers and telephony TTS overrides should move behind host-owned runtime registries with explicit capability and fallback policy rather than staying coupled to plugin-era provider capabilities and global active-registry reads. |
| Onboarding/install/setup surfaces | `src/plugins/install.ts`, package manifests, channel catalog, onboarding commands | host-owned static descriptors | `partial` | Static metadata normalization has started; full setup/install descriptor migration is not done. |
| Pilot migrations | `extensions/thread-ownership`, `extensions/telegram`, `extensions/acpx` | extension-host path with parity tracking | `not started` | No pilot runs through the host path yet. |
@@ -127,6 +130,8 @@ That pattern has been used for:
- CLI duplicate detection, registrar invocation, and async failure logging
- gateway method-id aggregation, plugin diagnostic shaping, and extra-handler composition
- explicit scoping of still-unimplemented migration targets discovered by external-plugin evaluation: conversation binding ownership, interactive callback routing, ingress claim semantics, and bounded first-cut interactive channel controls
- explicit scoping of still-unimplemented subsystem-runtime targets discovered by provider-capability evaluation: embeddings, media understanding, and TTS as host-owned runtime registries with capability routing and fallback
- explicit scoping of extension-backed search as either a canonical tool contribution or an optional host-owned runtime backend, rather than as another universal provider surface
## Immediate Next Targets
@@ -152,6 +157,8 @@ The following remain legacy-owned today:
- interaction namespace routing, dedupe, and callback fallback rules
- canonical ingress claim semantics
- generic host-owned interactive channel control contracts
- embedding, media-understanding, and TTS runtime registries
- a clear host-owned split for extension-backed search between canonical tool publication and any optional runtime-internal search backend registry
- slot arbitration
- ACP backend registration
- channel runtime compatibility bridges