Docs: refine subsystem runtime provider plan

Gustavo Madeira Santana 2026-03-15 18:19:54 +00:00
parent 1548728a28
commit e7ad81179d
No known key found for this signature in database
4 changed files with 61 additions and 9 deletions

@@ -502,6 +502,8 @@ The selected provider integration may also contribute:
- token refresh behavior
- model-selected hooks
It should not silently absorb unrelated subsystem runtimes such as embeddings, transcription, media understanding, or TTS.
## Memory Arbitration
Memory needs both backend arbitration and agent action arbitration.
@@ -541,6 +543,26 @@ Selection rules:
This is why `capability.runtime-backend` must be a first-class family.
The same model should be available for other subsystem runtimes discovered during migration:
- embeddings
- audio transcription
- image understanding
- video understanding
- text-to-speech
Selection rules for these subsystem runtimes should preserve the useful parts of provider-capability prototypes:
- capability-based selection
- normalized provider ids
- explicit built-in fallback policy
- typed host-injected request envelopes
Architecture rule:
- keep those selection and envelope rules inside host-owned subsystem runtime registries or typed backend families
- do not widen provider-integration or legacy plugin-provider APIs into a universal surface for unrelated runtime subsystems
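A minimal TypeScript sketch of a host-owned subsystem runtime registry that applies these selection rules. Every name in it (`SubsystemRuntimeRegistry`, `Selection`, `FallbackPolicy`, `normalizeProviderId`) is hypothetical, the kind list is taken from the subsystem list above, and the normalization rule (trim, lowercase, collapse spaces and underscores to `-`) is an illustrative choice rather than an existing convention. The registry is an instance the host owns and injects, not a global active-registry read.

```typescript
// Hypothetical sketch: none of these names exist in the codebase.
type SubsystemKind =
  | "embeddings"
  | "transcription"
  | "image-understanding"
  | "video-understanding"
  | "tts";

interface SubsystemRuntime {
  providerId: string; // stored in normalized form
  kind: SubsystemKind;
  builtIn: boolean; // built-in runtimes are eligible as explicit fallbacks
}

type FallbackPolicy = "builtin" | "none";

type Selection =
  | { status: "selected"; runtime: SubsystemRuntime; reason: "requested" | "builtin-fallback" }
  | { status: "unavailable"; kind: SubsystemKind; requested: string | undefined };

// Illustrative normalization: " Acme_Speech " -> "acme-speech".
function normalizeProviderId(id: string): string {
  return id.trim().toLowerCase().replace(/[\s_]+/g, "-");
}

class SubsystemRuntimeRegistry {
  private readonly byKind = new Map<SubsystemKind, Map<string, SubsystemRuntime>>();

  register(runtime: SubsystemRuntime): void {
    const providerId = normalizeProviderId(runtime.providerId);
    const byId = this.byKind.get(runtime.kind) ?? new Map<string, SubsystemRuntime>();
    if (byId.has(providerId)) {
      throw new Error(`duplicate ${runtime.kind} runtime: ${providerId}`);
    }
    byId.set(providerId, { ...runtime, providerId });
    this.byKind.set(runtime.kind, byId);
  }

  // Capability-based selection: look up by kind, then by normalized id, and
  // fall back to a built-in runtime only when the policy explicitly allows it.
  select(kind: SubsystemKind, requested: string | undefined, fallback: FallbackPolicy): Selection {
    const byId = this.byKind.get(kind) ?? new Map<string, SubsystemRuntime>();
    if (requested !== undefined) {
      const hit = byId.get(normalizeProviderId(requested));
      if (hit) return { status: "selected", runtime: hit, reason: "requested" };
    }
    if (fallback === "builtin") {
      for (const runtime of byId.values()) {
        if (runtime.builtIn) return { status: "selected", runtime, reason: "builtin-fallback" };
      }
    }
    return { status: "unavailable", kind, requested };
  }
}
```

Returning a `Selection` record instead of a bare runtime gives the host something concrete to emit as a structured selection event, including the case where the built-in fallback was used.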
## Catalog Publication
The kernel should publish:
@@ -584,6 +606,7 @@ Capability selection must emit structured events for:
- channel capabilities from `extensions/discord/src/channel.ts:74`, `extensions/slack/src/channel.ts:107`, and `extensions/telegram/src/channel.ts:120` collapse into canonical messaging action families
- diffs becomes an agent-visible tool family plus a host-managed route surface from `extensions/diffs/index.ts:27`
- provider integration from `extensions/google-gemini-cli-auth/index.ts:24` becomes operator-visible setup and auth capabilities
- embedding, media-understanding, and TTS provider overrides should become runtime-internal subsystem registries rather than remaining part of a universal plugin-provider API
- voice-call from `extensions/voice-call/index.ts:230` becomes a mix of agent-visible actions, runtime providers, and operator surfaces
- ACP backend registration from `extensions/acpx/src/service.ts:55` becomes runtime-internal backend arbitration
- context-engine registration becomes runtime-internal slot arbitration from `src/context-engine/registry.ts:60`
@@ -599,6 +622,7 @@ Capability selection must emit structured events for:
6. Migrate the existing provider auth and setup selection path onto host-owned setup catalogs and canonical provider metadata.
7. Add provider selection logic for the broader messaging action family before migrating all channels.
8. Add runtime-backend and context-engine arbitration using the same rank and slot model where appropriate.
-9. Ensure lightweight setup catalogs can be built from static descriptors alone.
-10. Add a reviewed core registry for canonical action families and document how new ids are introduced.
-11. Record catalog and arbitration parity for `thread-ownership` first and `telegram` second before broader rollout.
+9. Add host-owned embedding, media-understanding, and TTS subsystem registries with explicit capability routing and built-in fallback policy.
+10. Ensure lightweight setup catalogs can be built from static descriptors alone.
+11. Add a reviewed core registry for canonical action families and document how new ids are introduced.
+12. Record catalog and arbitration parity for `thread-ownership` first and `telegram` second before broader rollout.

@@ -217,6 +217,7 @@ What is still missing for these phases:
- broader lifecycle ownership beyond the loader state machine, the service-lifecycle boundary, the CLI-lifecycle boundary, session-owned activation state, and explicit discovery-policy, activation-policy, and finalization-policy outcomes; remaining policy-gate ownership; and the broad host-owned registries described for Phase 2
- minimal SDK compatibility work beyond preserving current behavior indirectly through existing loading
- host-owned conversation binding, interaction routing, ingress claim, and generic interactive control surfaces identified by external-plugin validation
- host-owned subsystem runtime registries for embeddings, media understanding, and TTS identified by provider-capability evaluation
- any pilot migration, event pipeline, canonical catalog, or arbitration implementation
Recent plan refinement from external-plugin validation:
@@ -225,6 +226,8 @@ Recent plan refinement from external-plugin validation:
- it now explicitly treats interactive callback routing, namespace ownership, dedupe, and fallback behavior as first-class migration surfaces
- it now explicitly treats inbound claim as a canonical ingress-stage concern rather than a permanent plugin-era hook shape
- it now explicitly treats Telegram and Discord as the first validated rollout targets for interactive control surfaces while keeping the underlying contracts generic, host-owned, and kernel-agnostic
- it now explicitly treats embeddings, media understanding, and TTS as host-owned subsystem runtimes with capability routing, typed request envelopes, provider-id normalization, and fallback policy
- it now explicitly rejects widening the legacy `registerProvider(...)` or `ProviderPlugin` surface into a universal runtime API, even when harvesting useful capability-routing ideas from provider-capability prototypes
## Implementation Order

@@ -136,6 +136,7 @@ What is still pending from this spec:
- broader extension-host lifecycle ownership beyond the loader state machine, service-lifecycle boundary, CLI-lifecycle boundary, session-owned activation state, and explicit discovery-policy, activation-policy, and finalization-policy outcomes
- activation pipeline ownership
- host-owned registries for setup, CLI, routes, services, slots, and backends
- host-owned subsystem runtime registries for embeddings, media understanding, and TTS, including explicit fallback and override policy instead of plugin-era capability reads
- permission-mode enforcement
- per-extension state ownership and migration
- provenance, reload, and hardening parity tracking
@@ -733,9 +734,10 @@ The host must emit structured telemetry for:
4. Add a policy evaluator that understands advisory versus enforced permission modes.
5. Add host-owned credential and per-extension state boundaries for extension services.
6. Generalize backend registration into a host-managed `capability.runtime-backend` registry.
-7. Add slot-backed provider management for context engines and other exclusive runtime providers.
-8. Preserve provenance, origin precedence, and current workspace and bundled enablement rules in host policy.
-9. Preserve prompt-mutation policy gates and add explicit state migration handling.
-10. Add explicit host registries and typed contracts for extension-owned hooks, channels, providers, tools, commands, CLI, setup flows, config surfaces, and status surfaces.
-11. Preserve config redaction-aware schema behavior and current reload or gateway feature contracts during migration.
-12. Record lifecycle parity for `thread-ownership` first and `telegram` second before broadening the compatibility bridges.
+7. Add host-owned subsystem runtime registries for embeddings, media understanding, and TTS instead of widening `registerProvider(...)`.
+8. Add slot-backed provider management for context engines and other exclusive runtime providers.
+9. Preserve provenance, origin precedence, and current workspace and bundled enablement rules in host policy.
+10. Preserve prompt-mutation policy gates and add explicit state migration handling.
+11. Add explicit host registries and typed contracts for extension-owned hooks, channels, providers, tools, commands, CLI, setup flows, config surfaces, and status surfaces.
+12. Preserve config redaction-aware schema behavior and current reload or gateway feature contracts during migration.
+13. Record lifecycle parity for `thread-ownership` first and `telegram` second before broadening the compatibility bridges.
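A minimal TypeScript sketch of slot-backed arbitration for an exclusive runtime provider such as a context engine. The plan names slots and ranks but does not define their fields, so `SlotCandidate`, `SlotDecision`, the "higher rank wins" rule, the id tie-break, and the `configured-missing` reason are all assumptions for illustration. The returned decision record is the kind of data the host could emit as structured telemetry.

```typescript
// Hypothetical shapes: the plan names slots and ranks but not their fields.
interface SlotCandidate {
  providerId: string;
  rank: number; // assumption: higher rank wins when nothing is configured
}

interface SlotDecision {
  slot: string;
  winner: string | undefined;
  reason: "configured" | "configured-missing" | "highest-rank" | "empty";
  rejected: string[];
}

// An exclusive slot holds exactly one provider; every other candidate is
// reported as rejected rather than silently dropped.
function arbitrateSlot(slot: string, candidates: SlotCandidate[], configured?: string): SlotDecision {
  // Highest rank first; ties broken by provider id so the order is stable.
  const ordered = [...candidates].sort(
    (a, b) => b.rank - a.rank || a.providerId.localeCompare(b.providerId),
  );
  const top = ordered[0];
  if (top === undefined) {
    return { slot, winner: undefined, reason: "empty", rejected: [] };
  }
  // An explicitly configured provider wins if it is actually registered.
  const pinned = configured === undefined ? undefined : ordered.find((c) => c.providerId === configured);
  const winner = pinned ?? top;
  const reason = pinned ? "configured" : configured === undefined ? "highest-rank" : "configured-missing";
  return {
    slot,
    winner: winner.providerId,
    reason,
    rejected: ordered.filter((c) => c !== winner).map((c) => c.providerId),
  };
}
```

Reporting `configured-missing` instead of quietly using the top-ranked provider keeps a misconfigured slot visible to operators.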

@@ -548,6 +548,7 @@ Suggested mapping:
- `registerChannel(...)` -> `adapter.runtime` plus lightweight dock metadata and optional `surface.config`, `surface.status`, `surface.setup`
- `registerProvider(...)` -> `capability.provider-integration` plus optional setup and auth surfaces
- plugin-provided embeddings, transcription, image or video understanding, and TTS -> typed subsystem runtime contributions or `capability.runtime-backend`, not a widened `registerProvider(...)` end state
- `registerTool(...)` -> `capability.agent-tool`
- `registerCommand(...)` -> `capability.control-command`
- `on(...)` returning context or side effects -> `capability.context-augmenter` or `capability.event-handler`
@@ -564,6 +565,9 @@ Suggested mapping:
Concrete examples:
- `extensions/google-gemini-cli-auth/index.ts:25` becomes `capability.provider-integration`
- plugin-provided embeddings become a host-owned embedding runtime contribution
- plugin-provided transcription, image understanding, and video understanding become host-owned media runtime contributions
- plugin-provided TTS becomes a host-owned TTS runtime contribution
- `extensions/diffs/index.ts:27` becomes `capability.agent-tool`
- `extensions/diffs/index.ts:28` becomes a host-managed route or interaction surface
- `extensions/diffs/index.ts:38` becomes `capability.context-augmenter`
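One way to make part of the suggested mapping checkable is a typed translation function. This TypeScript sketch covers only a subset of the legacy calls; the `LegacyRegistration` and `Target` shapes are hypothetical, while the family ids are copied from the mapping list. It routes non-chat providers to subsystem runtime contributions, although the plan also allows `capability.runtime-backend` for those.

```typescript
// Hypothetical translation table; family ids come from the suggested mapping,
// the LegacyRegistration and Target shapes are illustrative only.
type CapabilityFamily =
  | "capability.provider-integration"
  | "capability.agent-tool"
  | "capability.control-command"
  | "capability.context-augmenter"
  | "capability.event-handler";

type SubsystemRuntimeKind =
  | "embeddings"
  | "transcription"
  | "image-understanding"
  | "video-understanding"
  | "tts";

type LegacyRegistration =
  | { call: "registerProvider"; serves: "chat-model" | SubsystemRuntimeKind }
  | { call: "registerTool" }
  | { call: "registerCommand" }
  | { call: "on"; returnsContext: boolean };

type Target =
  | { kind: "capability"; family: CapabilityFamily }
  | { kind: "subsystem-runtime"; subsystem: SubsystemRuntimeKind };

function mapRegistration(reg: LegacyRegistration): Target {
  switch (reg.call) {
    case "registerProvider": {
      const serves = reg.serves;
      // Scope rule: only chat or model providers become provider-integration.
      if (serves === "chat-model") {
        return { kind: "capability", family: "capability.provider-integration" };
      }
      return { kind: "subsystem-runtime", subsystem: serves };
    }
    case "registerTool":
      return { kind: "capability", family: "capability.agent-tool" };
    case "registerCommand":
      return { kind: "capability", family: "capability.control-command" };
    case "on":
      // Context-returning handlers augment context; the rest are event handlers.
      return {
        kind: "capability",
        family: reg.returnsContext ? "capability.context-augmenter" : "capability.event-handler",
      };
  }
}
```

Making `serves` a required field forces every legacy provider registration to declare which subsystem it serves, so nothing lands in `capability.provider-integration` by default.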
@@ -823,6 +827,25 @@ Provider integration contributions need host-injected capabilities for:
- token refresh or credential renewal
- model-selected lifecycle hooks
Scope rule:
- `capability.provider-integration` is for chat or model-provider discovery, setup, auth, and post-selection lifecycle
- embeddings, transcription, image understanding, video understanding, and TTS should not be folded into that family just because they also use remote providers
- those subsystem runtimes should use host-owned capability routing and typed runtime registries or runtime-backend families instead
Useful ideas harvested from provider-capability validation:
- capability-based selection is good
- typed request envelopes with host-injected `apiKey`, `baseUrl`, `headers`, `timeoutMs`, and `fetchFn` are good
- provider-id normalization is good
- graceful built-in fallback is good
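A TypeScript sketch of a typed request envelope carrying the host-injected fields listed above (`apiKey`, `baseUrl`, `headers`, `timeoutMs`, `fetchFn`). The `buildEnvelope`, `postJson`, and `embed` helpers, the `/embeddings` path, the 30-second default timeout, and the bearer-token header are assumptions for illustration. The point is that the runtime reads nothing from globals or plugin state, so a test can inject a fake `fetchFn`.

```typescript
// Hypothetical types; only the envelope field names come from the plan.
interface FetchResponse {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}

type FetchFn = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string; signal: AbortSignal },
) => Promise<FetchResponse>;

interface RequestEnvelope {
  apiKey: string;
  baseUrl: string;
  headers: Record<string, string>;
  timeoutMs: number;
  fetchFn: FetchFn;
}

interface EmbeddingRequest {
  model: string;
  input: string[];
}

// Host side: build the envelope from already-resolved config (assumed shape).
function buildEnvelope(
  config: { apiKey: string; baseUrl: string; headers?: Record<string, string>; timeoutMs?: number },
  fetchFn: FetchFn,
): RequestEnvelope {
  return {
    apiKey: config.apiKey,
    baseUrl: config.baseUrl.replace(/\/+$/, ""), // drop trailing slashes
    headers: config.headers ?? {},
    timeoutMs: config.timeoutMs ?? 30_000,
    fetchFn,
  };
}

// Runtime side: uses only what the envelope provides, and aborts on timeout.
async function postJson(env: RequestEnvelope, path: string, payload: unknown): Promise<unknown> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), env.timeoutMs);
  try {
    const res = await env.fetchFn(`${env.baseUrl}${path}`, {
      method: "POST",
      headers: { ...env.headers, "content-type": "application/json", authorization: `Bearer ${env.apiKey}` },
      body: JSON.stringify(payload),
      signal: controller.signal,
    });
    if (!res.ok) throw new Error(`request failed with status ${res.status}`);
    return await res.json();
  } finally {
    clearTimeout(timer);
  }
}

function embed(env: RequestEnvelope, request: EmbeddingRequest): Promise<unknown> {
  return postJson(env, "/embeddings", request); // hypothetical endpoint path
}
```

In production the host could pass the global `fetch`, which should be structurally compatible with the `FetchFn` type used here.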
Architecture rule:
- harvest those behaviors into host-owned subsystem runtime contracts
- do not widen legacy `registerProvider(...)` into a universal plugin API for unrelated runtime subsystems
- do not make `src/plugins/runtime.ts` capability filters or global active-registry reads the long-term selection surface for embeddings, media understanding, or TTS
Example:
- `extensions/google-gemini-cli-auth/index.ts:25`