feat: add CLI-only streaming hardening and interactive subagent panel plans

- Introduced a new plan for CLI-only streaming hardening, focusing on protocol-level improvements and the removal of web WS clients.
- Added a plan for an interactive subagent panel, enabling subagents to operate independently from the parent agent's event stream, with unified API routes for enhanced interactivity.
kumarabhirup 2026-02-21 14:52:27 -08:00
parent c8ae7acbf4
commit 5b41523f17
2 changed files with 234 additions and 0 deletions


@ -0,0 +1,102 @@
---
name: cli-only-streaming-hardening
overview: Harden the CLI-only web streaming refactor by fixing protocol-level flaws first, then replacing web WS consumers with managed CLI subscribe processes and adding guardrails for dedupe, lifecycle, and long-wait stability.
todos:
- id: fix-subscribe-cli-semantics
content: Make `agent --stream-json --subscribe-session-key` long-lived and session-filtered, with tests.
status: completed
- id: add-subscribe-spawner
content: Add `spawnAgentSubscribeProcess` helper in `apps/web/lib/agent-runner.ts` with profile/workspace env wiring.
status: completed
- id: parent-wait-cli-subscribe
content: Refactor `apps/web/lib/active-runs.ts` waiting flow to use managed subscribe child + globalSeq dedupe.
status: completed
- id: subagent-cli-subscribe
content: Refactor `apps/web/lib/subagent-runs.ts` fallback/rehydration to managed subscribe child + globalSeq dedupe.
status: completed
- id: remove-web-ws-client
content: Remove `apps/web/lib/gateway-events.ts` usages and delete file after typecheck passes.
status: completed
- id: sse-keepalive
content: Add keepalive behavior for long idle waiting streams.
status: completed
- id: verify-regressions
content: Run targeted tests/smoke checks for handoff, refresh, replay, and duplicate/cross-session safety.
status: completed
isProject: false
---
# CLI-Only Streaming Plan (Flaw-Hardened)
## Critical flaws to fix before WS removal
- The current subscribe CLI path in [src/commands/agent-via-gateway.ts](/Users/kumareth/Documents/projects/openclaw/src/commands/agent-via-gateway.ts) calls `callGateway(... expectFinal: false)` and exits after `agent.subscribe` response; it does not remain attached for live events.
- `agent.subscribe` clients still receive global `agent` broadcasts unless filtered client-side; without filtering, per-session subscribe children can ingest unrelated events and cause cross-session noise/duplication.
- Handoff/replay can duplicate already-buffered events unless consumers gate on `globalSeq`, ignoring any event whose `globalSeq` is `<= lastSeen`.
- Long “waiting for subagents” SSE windows in [apps/web/app/api/chat/stream/route.ts](/Users/kumareth/Documents/projects/openclaw/apps/web/app/api/chat/stream/route.ts) have no keepalive signal, increasing disconnect risk during quiet periods.
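The filtering and dedupe flaws above both come down to a consumer-side guard. A minimal sketch, assuming each event carries `sessionKey` and `globalSeq` fields; the `GatewayEvent` shape and the `createEventGuard` name are hypothetical and not taken from the codebase:

```typescript
// Hypothetical event shape; only `sessionKey` and `globalSeq` come from the plan.
interface GatewayEvent {
  sessionKey: string;
  globalSeq: number;
}

// Builds a predicate that accepts an event only when it belongs to the
// subscribed session and is newer than anything already seen.
function createEventGuard(
  sessionKey: string,
  lastSeen = 0,
): (ev: GatewayEvent) => boolean {
  let highest = lastSeen;
  return (ev) => {
    if (ev.sessionKey !== sessionKey) return false; // cross-session broadcast
    if (ev.globalSeq <= highest) return false; // replayed or duplicate event
    highest = ev.globalSeq;
    return true;
  };
}
```

Seeding `lastSeen` with the last buffered sequence number is what would make a replay handoff safe: replayed events at or below that number are dropped, and live events pass through.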
## Revised implementation sequence
1. **Stabilize subscribe transport semantics first**
- Rework subscribe mode in [src/commands/agent-via-gateway.ts](/Users/kumareth/Documents/projects/openclaw/src/commands/agent-via-gateway.ts) to use a long-lived gateway client session (not one-shot `callGateway`) that:
- connects,
- sends `agent.subscribe { sessionKey, afterSeq }`,
- streams events until SIGTERM/SIGINT,
- emits only matching `sessionKey` events,
- exits cleanly with `aborted` on signal.
- Add targeted tests for subscribe staying alive and session-key filtering.
2. **Add reusable CLI subscribe spawner**
- In [apps/web/lib/agent-runner.ts](/Users/kumareth/Documents/projects/openclaw/apps/web/lib/agent-runner.ts), add `spawnAgentSubscribeProcess(sessionKey, afterSeq)` using:
- `node <scriptPath> agent --stream-json --subscribe-session-key <key> --after-seq <n>`
- same profile/workspace env wiring as `spawnAgentProcess`.
3. **Replace parent waiting flow with subscribe child process**
- In [apps/web/lib/active-runs.ts](/Users/kumareth/Documents/projects/openclaw/apps/web/lib/active-runs.ts):
- replace `subscribeToSessionKey(...)` usage with a managed subscribe child,
- parse NDJSON from subscribe child and route through existing parent event processor,
- dedupe using `globalSeq` (drop stale/replayed duplicates),
- store/cleanup process handle across finalize/abort/cleanup.
4. **Replace subagent fallback/rehydration with subscribe child process**
- In [apps/web/lib/subagent-runs.ts](/Users/kumareth/Documents/projects/openclaw/apps/web/lib/subagent-runs.ts):
- swap `subscribeToSessionKey(...)` for one managed subscribe child per running subagent session,
- feed NDJSON into existing `routeRawEvent`/transform path,
- use `lastGlobalSeq` dedupe and robust teardown on completion/error/cleanup.
5. **Retire direct web WS client**
- Remove [apps/web/lib/gateway-events.ts](/Users/kumareth/Documents/projects/openclaw/apps/web/lib/gateway-events.ts) imports/usages from web runtime.
- Delete file only after all references are gone and typecheck passes.
6. **Long-wait stream resilience**
- Add lightweight SSE keepalive comments/events while run status is `waiting-for-subagents`, either in [apps/web/app/api/chat/stream/route.ts](/Users/kumareth/Documents/projects/openclaw/apps/web/app/api/chat/stream/route.ts) or in the run subscription layer, so idle waits don't silently time out.
7. **Verification gates**
- Run targeted checks for:
- parent run -> subagent spawn -> parent wait -> announcement turn -> finalize,
- page refresh during parent wait,
- page refresh during subagent live stream,
- no cross-session event bleed,
- no duplicate tool/lifecycle events after replay handoff.
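For the long-wait resilience step, one possible shape is a timer that writes SSE comment frames while the run is waiting. This is a sketch only: `startKeepalive`, its callbacks, and the 15-second interval are assumptions, not the implementation in `route.ts`.

```typescript
// A line starting with ":" is an SSE comment; EventSource clients ignore it,
// but it still keeps bytes flowing on the connection.
const KEEPALIVE_FRAME = ": keepalive\n\n";

// Starts a timer that writes a keepalive frame while `isWaiting()` is true.
// Returns a function that stops the timer; call it on finalize/abort/cleanup.
function startKeepalive(
  write: (chunk: string) => void,
  isWaiting: () => boolean,
  intervalMs = 15_000, // assumed interval, not taken from the plan
): () => void {
  const timer = setInterval(() => {
    if (isWaiting()) write(KEEPALIVE_FRAME);
  }, intervalMs);
  return () => clearInterval(timer);
}
```

Using a comment frame rather than a named event means existing client-side parsers need no changes, since comments never reach `onmessage` handlers.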
## Flow target
```mermaid
flowchart TD
webRun[WebRunManager] --> parentCli[agent --stream-json main run]
parentCli --> ndjsonParent[Parent NDJSON events]
parentCli -->|parent exits while subagents running| waitState[waitingForSubagents]
waitState --> subscribeCliParent[agent --stream-json subscribe parentSessionKey]
subscribeCliParent --> ndjsonReplayParent[ReplayedPlusLive NDJSON]
subagentMgr[SubagentRunManager] --> subscribeCliSub[agent --stream-json subscribe subagentSessionKey]
subscribeCliSub --> ndjsonSub[Subagent NDJSON]
ndjsonReplayParent --> sse[API chat stream SSE]
ndjsonSub --> sse
```
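Every NDJSON edge in the diagram implies splitting a child's stdout into complete lines before parsing, because chunk boundaries rarely align with line boundaries. A minimal sketch of that step; `createNdjsonParser` is a hypothetical helper name, and treating non-JSON lines as skippable log output is an assumption:

```typescript
// Accumulates stdout chunks and emits one parsed object per complete line.
function createNdjsonParser(
  onEvent: (ev: unknown) => void,
): (chunk: string) => void {
  let buffer = "";
  return (chunk) => {
    buffer += chunk;
    const lines = buffer.split("\n");
    buffer = lines.pop() ?? ""; // keep the trailing partial line for the next chunk
    for (const line of lines) {
      const trimmed = line.trim();
      if (trimmed === "") continue;
      let parsed: unknown;
      try {
        parsed = JSON.parse(trimmed);
      } catch {
        continue; // assumed to be stray non-JSON output; skip it
      }
      onEvent(parsed);
    }
  };
}
```

The sketch expects string chunks, for example after calling `child.stdout.setEncoding("utf8")` on the subscribe child, so that multi-byte characters are not split across chunks.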


@ -0,0 +1,132 @@
---
name: Interactive Subagent Panel
overview: Make subagents fully independent of the parent agent's event stream. Each subagent gets its own gateway subscription immediately on registration. Then make the subagent panel interactive with stop, send, queue matching the main chat, using unified API routes.
todos:
- id: decouple-subagent
content: "SubagentRunManager: subscribe immediately on registration, remove routeRawEvent/preRegBuffer/activateGatewayFallback"
status: pending
- id: remove-parent-routing
content: "active-runs.ts: remove subagent event routing from parent NDJSON stream"
status: pending
- id: srm-methods
content: "SubagentRunManager: add persistUserMessage(), reactivateSubagent(), abortSubagent(), spawnSubagentMessage()"
status: pending
- id: unify-chat-route
content: Extend POST /api/chat to dispatch to subagent flow when sessionKey is a subagent key
status: pending
- id: unify-stop-route
content: Extend POST /api/chat/stop to dispatch to SubagentRunManager when sessionKey is a subagent key
status: pending
- id: unify-stream-route
content: Extend GET /api/chat/stream to dispatch to SubagentRunManager when sessionKey is a subagent key
status: pending
- id: parser-turns
content: Extend createStreamParser to handle user-message events as turn boundaries
status: pending
- id: panel-rewrite
content: Rewrite SubagentPanel with ChatEditor, send/stop/queue, multi-turn conversation
status: pending
isProject: false
---
# Interactive Subagent Panel
## Core Problem
Subagent events piggyback on the parent agent's CLI NDJSON stream. When the parent finishes (it spawns subagents, then exits), the stream dies and subagent events stop flowing. `activateGatewayFallback()` partially compensates but loses early events.
The root cause is architectural: subagents are treated as appendages of the parent. They should be independent sessions.
## Architecture Change
A subagent is just an agent session. The only link to the parent is the completion announcement. Each subagent gets its own gateway subscription from the moment it's registered.
```mermaid
flowchart TB
subgraph before [Current: Coupled]
GW1[Gateway] --> ParentCLI[Parent CLI stdout]
ParentCLI --> ARM1[ActiveRunManager]
ParentCLI -.->|"routeRawEvent<br/>(filtered by runId, never arrives)"| SRM1[SubagentRunManager]
ARM1 -.->|"activateGatewayFallback<br/>(after parent exits, loses early events)"| SRM1
end
subgraph after [New: Independent]
GW2[Gateway] --> ParentCLI2[Parent CLI stdout]
GW2 --> SubProc[Subscribe Process per subagent]
ParentCLI2 --> ARM2[ActiveRunManager]
SubProc --> SRM2[SubagentRunManager]
end
```
## Phase 1: Decouple Subagents
### 1. SubagentRunManager ([subagent-runs.ts](apps/web/lib/subagent-runs.ts))
**In `registerSubagent()` (lines 266-270)**: replace the comment with:
```typescript
if (run.status === "running") {
  startSubagentSubscribeStream(run);
}
```
Each subagent immediately gets its own subscribe process (`spawnAgentSubscribeProcess`) that connects to the gateway and streams events for that subagent's sessionKey. No dependency on the parent's stream.
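As an illustration of what the subscribe process invocation might look like, here is a sketch that builds the argument vector from the command form given in the CLI-only streaming plan (`agent --stream-json --subscribe-session-key <key> --after-seq <n>`). `buildSubscribeArgs` is a hypothetical helper, not the body of `spawnAgentSubscribeProcess`:

```typescript
// Builds the argument vector for a subscribe child process.
// `scriptPath` is whatever entry script the existing spawner resolves (assumed).
function buildSubscribeArgs(
  scriptPath: string,
  sessionKey: string,
  afterSeq: number,
): string[] {
  if (!Number.isInteger(afterSeq) || afterSeq < 0) {
    throw new RangeError(`afterSeq must be a non-negative integer, got ${afterSeq}`);
  }
  return [
    scriptPath,
    "agent",
    "--stream-json",
    "--subscribe-session-key",
    sessionKey,
    "--after-seq",
    String(afterSeq),
  ];
}
```

Passing the result to `spawn("node", args, { env })` from `node:child_process` as an array, rather than building a shell string, avoids quoting problems with session keys that contain colons or other special characters.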
**Remove dead code:**
- `routeRawEvent()` (lines 419-448) -- no longer called; events come from per-subagent subscribe processes
- `preRegBuffer` from the registry type and `getRegistry()` -- no pre-registration buffering needed; the subscribe process handles everything
- `activateGatewayFallback()` (lines 368-375) -- no longer needed; subscription starts at registration time
### 2. active-runs.ts ([active-runs.ts](apps/web/lib/active-runs.ts))
**Remove subagent event routing from the parent NDJSON handler**: delete the block that checks `ev.sessionKey !== parentSessionKey` and calls `routeSubagentEvent()` entirely. The parent NDJSON stream now processes only parent events, so the imports of `routeRawEvent`, `ensureRegisteredFromDisk`, and `hasActiveSubagent` from subagent-runs are no longer needed for routing.
**Remove `activateGatewayFallback()` call** from the parent exit handler.
**Keep**: the `waiting-for-subagents` state transition and `hasRunningSubagentsForParent()` check -- the parent still needs to know when all subagents finish so it can finalize.
### 3. No CLI changes needed
The `runId` filter in `src/commands/agent.ts` is correct -- the parent's NDJSON stream should only contain parent events. Subagent events flow independently through their own subscribe processes.
## Phase 2: Unified API Routes
Same primitive, same routes. Dispatch based on session key format (`:subagent:` vs `:web:`).
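The dispatch rule can be sketched as a small classifier plus a handler table. The `:subagent:` marker comes from the plan; `classifySessionKey`, `dispatchBySessionKey`, and the `Handlers` type are hypothetical names used only for illustration:

```typescript
type SessionKind = "subagent" | "parent";

// Classifies a session key by the `:subagent:` marker named in the plan.
// Anything else is treated as the existing parent (web) flow.
function classifySessionKey(sessionKey: string): SessionKind {
  return sessionKey.includes(":subagent:") ? "subagent" : "parent";
}

// Hypothetical handler table; the real handlers live in the route files.
interface Handlers<T> {
  subagent: (sessionKey: string) => T;
  parent: (sessionKey: string) => T;
}

// Routes a request to the handler that matches the session key's kind.
function dispatchBySessionKey<T>(sessionKey: string, handlers: Handlers<T>): T {
  return handlers[classifySessionKey(sessionKey)](sessionKey);
}
```

Keeping the classification in one function means the chat, stop, and stream routes would all agree on what counts as a subagent key.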
### 4. SubagentRunManager: interactive methods
- **`persistUserMessage(sessionKey, msg)`** -- append `{type: "user-message", text, id}` to event buffer + JSONL
- **`reactivateSubagent(sessionKey)`** -- set status to `"running"`, clear `endedAt`, restart subscribe process
- **`abortSubagent(sessionKey)`** -- spawn CLI `gateway call chat.abort`, mark `"error"`, signal subscribers
- **`spawnSubagentMessage(sessionKey, message)`** -- spawn CLI `gateway call agent --params '{"message":"...", "sessionKey":"...", "lane":"subagent", ...}'`
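Because `spawnSubagentMessage()` embeds user text in a JSON argument, it is safer to serialize the params than to interpolate them into a string. A sketch under that assumption; `buildSubagentMessageArgs` is a hypothetical helper, and it includes only the three fields the plan spells out, leaving the elided ones out:

```typescript
// Builds argv for `gateway call agent --params <json>`.
// Only the three fields spelled out in the plan are included; the plan elides
// the rest, so they are deliberately left out here.
function buildSubagentMessageArgs(sessionKey: string, message: string): string[] {
  // JSON.stringify escapes quotes and newlines in user-provided text.
  const params = JSON.stringify({ message, sessionKey, lane: "subagent" });
  return ["gateway", "call", "agent", "--params", params];
}
```

When the array is handed to a process spawner without a shell, the JSON arrives as a single argument regardless of what the message contains.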
### 5. Extend `POST /api/chat` ([route.ts](apps/web/app/api/chat/route.ts))
If `sessionKey` contains `:subagent:`:
- Reject with 409 if the subagent is already running
- `persistUserMessage()` + `reactivateSubagent()` + `spawnSubagentMessage()`
- Subscribe via `subscribeToSubagent(sessionKey, ..., { replay: false })` for SSE response
Otherwise: existing parent flow.
### 6. Extend `POST /api/chat/stop` ([stop/route.ts](apps/web/app/api/chat/stop/route.ts))
Accept `sessionKey`. If `:subagent:`: `abortSubagent()`. Otherwise: `abortRun()`.
### 7. Extend `GET /api/chat/stream` ([stream/route.ts](apps/web/app/api/chat/stream/route.ts))
Accept `sessionKey`. If `:subagent:`: lazy-register from disk, `ensureSubagentStreamable()`, `subscribeToSubagent()`. Otherwise: existing parent flow.
Remove `apps/web/app/api/chat/subagent-stream/route.ts` after migration.
## Phase 3: Frontend
### 8. Stream parser turn boundaries ([chat-panel.tsx](apps/web/app/components/chat-panel.tsx))
Add `user-message` to `ParsedPart` and `createStreamParser` for multi-turn subagent conversations.
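To illustrate what treating `user-message` as a turn boundary could mean, here is a sketch that groups a flat list of parts into turns. The `Part` and `Turn` shapes are hypothetical and much simpler than the real `ParsedPart`; only the `user-message` type name comes from the plan:

```typescript
// Hypothetical part shapes; only the "user-message" type name comes from the plan.
type Part =
  | { type: "user-message"; id: string; text: string }
  | { type: "text"; text: string };

interface Turn {
  user?: { id: string; text: string };
  parts: Part[];
}

// Splits a flat part stream into turns, starting a new turn at each user message.
function groupIntoTurns(parts: Part[]): Turn[] {
  const turns: Turn[] = [];
  let current: Turn | undefined;
  for (const part of parts) {
    if (part.type === "user-message") {
      current = { user: { id: part.id, text: part.text }, parts: [] };
      turns.push(current);
    } else {
      if (!current) {
        // Output that precedes any user message, e.g. the initial task run.
        current = { parts: [] };
        turns.push(current);
      }
      current.parts.push(part);
    }
  }
  return turns;
}
```

The first turn has no `user` entry in this sketch because a subagent's initial output is produced before anyone types into the panel.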
### 9. Rewrite SubagentPanel ([subagent-panel.tsx](apps/web/app/components/subagent-panel.tsx))
Full ChatPanel-like experience: ChatEditor, send/stop/queue buttons, AttachmentStrip, message queue, auto-scroll. Uses the unified routes with `sessionKey`.