When gateway.mode=remote is configured with a non-loopback remote.url,
a loopback gatewayUrl (ws://127.0.0.1:...) is likely an SSH port-forward
tunnel endpoint (ssh -N -L <local>:remote-host:<remote>). Previously,
resolveGatewayTarget() classified any loopback URL as 'local', causing
gateway-tool to forward live deliveryContext into remote config.apply /
config.patch / update.run writes. Because server handlers prefer
params.deliveryContext, post-restart wake messages were misrouted to the
caller's local chat context instead of the remote session.
Fix both classification sites:
1. validateGatewayUrlOverrideForAgentTools: when a loopback URL hits
localAllowed, check isNonLoopbackRemoteUrlConfigured(cfg); if true,
return 'remote' (tunnel) rather than 'local'.
2. resolveGatewayTarget fallback (rejected URL path): same check for
the isLoopback branch: prefer 'remote' when mode=remote and a
non-loopback remote.url is present.
Add isNonLoopbackRemoteUrlConfigured() helper (returns true iff
gateway.mode=remote AND gateway.remote.url is a non-loopback hostname).
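A minimal sketch of what such a helper could look like, assuming a simplified config shape. The `GatewayConfigSketch` type and the `isLoopbackHostname` function are illustrative assumptions, not the actual definitions in gateway.ts:

```typescript
// Hypothetical, simplified config shape (the real config type is not shown here).
interface GatewayConfigSketch {
  gateway?: { mode?: string; remote?: { url?: string } };
}

// Assumed loopback test: localhost, 127.0.0.0/8, and ::1.
function isLoopbackHostname(hostname: string): boolean {
  // URL.hostname keeps the brackets around IPv6 literals, so strip them first.
  const host = hostname.toLowerCase().replace(/^\[|\]$/g, "");
  return host === "localhost" || host === "::1" || /^127(?:\.\d{1,3}){3}$/.test(host);
}

// True iff gateway.mode=remote AND gateway.remote.url parses to a non-loopback host.
function isNonLoopbackRemoteUrlConfigured(cfg: GatewayConfigSketch): boolean {
  if (cfg.gateway?.mode !== "remote") return false;
  const raw = cfg.gateway?.remote?.url?.trim();
  if (!raw) return false;
  try {
    return !isLoopbackHostname(new URL(raw).hostname);
  } catch {
    return false; // unparseable URL: treated as "not configured" in this sketch
  }
}
```

With a helper of this shape, a loopback gatewayUrl is classified as 'remote' (a tunnel) only when the helper returns true.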
Tests: add SSH tunnel cases in gateway.test.ts; update the
'OPENCLAW_GATEWAY_URL takes precedence' test, which now correctly
returns 'remote' when mode=remote with a non-loopback remote.url; add
a variant without remote config to cover the 'local' precedence case.
When OPENCLAW_GATEWAY_URL/CLAWDBOT_GATEWAY_URL is set to a valid remote URL
that doesn't match gateway.remote.url (or has a non-root path like /ws),
validateGatewayUrlOverrideForAgentTools throws and the old code silently fell
through to config-based resolution, returning undefined (local). But
callGateway/buildGatewayConnectionDetails still uses the env URL verbatim, so
the actual call goes remote while resolveGatewayTarget returned local — causing
gateway-tool to forward live deliveryContext into remote config.apply /
config.patch / update.run writes, which can misroute or leak post-restart wake
messages across hosts.
Fix: when validateGatewayUrlOverrideForAgentTools throws for an env URL
override, fall back to hostname-based loopback detection instead of silently
treating the target as local. Only truly malformed URLs (that new URL() cannot
parse) fall through to config-based resolution.
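The fallback described above can be sketched roughly as follows. `classifyRejectedEnvUrl` is a hypothetical name and the loopback check is an assumption; the sketch only covers the hostname-based step that runs after validation has thrown:

```typescript
type GatewayTargetSketch = "local" | "remote";

// Hypothetical helper: classify an env URL override that validation rejected.
// Returns undefined only when new URL() cannot parse the value, so the caller
// can fall through to config-based resolution.
function classifyRejectedEnvUrl(rawUrl: string): GatewayTargetSketch | undefined {
  let parsed: URL;
  try {
    parsed = new URL(rawUrl);
  } catch {
    return undefined; // truly malformed
  }
  // Assumed loopback set: localhost, 127.0.0.0/8, ::1. The path is ignored.
  const host = parsed.hostname.toLowerCase().replace(/^\[|\]$/g, "");
  const isLoopback =
    host === "localhost" || host === "::1" || /^127(?:\.\d{1,3}){3}$/.test(host);
  return isLoopback ? "local" : "remote";
}
```

Because only the hostname is inspected, a non-root path such as /ws no longer changes the classification.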
Adds tests for:
- env-only remote URL not matching gateway.remote.url → 'remote'
- env URL with no configured remote URL → 'remote'
- env URL with /ws path → 'remote'
- loopback env URL with /ws path → 'local'
Two regressions addressed (PR #34580 CR review):
1. server-restart-sentinel.ts – deliverOutboundPayloads catch block was
silently swallowing errors and proceeding to agentCommand without any
fallback. When delivery throws before bestEffort handling (e.g. channel
plugin error), the user would receive neither the deterministic notice
nor a system event if the resumed run emits no payloads. Fix: enqueue a
system event in the catch block, mirroring prior behaviour.
2. gateway.ts resolveGatewayTarget – two mismatches with callGateway's
actual URL resolution path:
a. gateway.mode=remote + missing gateway.remote.url: callGateway falls
back to local loopback, but resolveGatewayTarget returned 'remote',
suppressing deliveryContext for what is actually a local call.
b. Env URL overrides (OPENCLAW_GATEWAY_URL / CLAWDBOT_GATEWAY_URL) are
picked up by callGateway but were ignored here, causing incorrect
local/remote classification. Fix: check env overrides first, then
require both mode=remote AND remote.url present for 'remote'.
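A rough sketch of the resolution order described in item 2, as of this commit. The config and env shapes, the `classifyUrl` helper, and the relative precedence of the two env variables are assumptions; tunnel handling is not modeled:

```typescript
type TargetSketch = "local" | "remote" | undefined;

interface ConfigSketch {
  gateway?: { mode?: string; remote?: { url?: string } };
}
type EnvSketch = Record<string, string | undefined>;

// Assumed hostname-based classification; undefined when the URL cannot be parsed.
function classifyUrl(raw: string): TargetSketch {
  try {
    const host = new URL(raw).hostname.toLowerCase().replace(/^\[|\]$/g, "");
    const loopback =
      host === "localhost" || host === "::1" || /^127(?:\.\d{1,3}){3}$/.test(host);
    return loopback ? "local" : "remote";
  } catch {
    return undefined;
  }
}

function resolveTargetSketch(cfg: ConfigSketch, env: EnvSketch): TargetSketch {
  // 1. Env overrides are checked first (variable order is an assumption).
  const envUrl =
    env["OPENCLAW_GATEWAY_URL"]?.trim() || env["CLAWDBOT_GATEWAY_URL"]?.trim();
  const fromEnv = envUrl ? classifyUrl(envUrl) : undefined;
  if (fromEnv) return fromEnv;
  // 2. Config counts as 'remote' only when mode=remote AND remote.url is present.
  const remoteUrl = cfg.gateway?.remote?.url?.trim();
  if (cfg.gateway?.mode === "remote" && remoteUrl) return "remote";
  return undefined;
}
```

The second branch is the point of fix 2a: mode=remote without a remote.url yields undefined rather than 'remote'.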
Tests: add regression coverage for both fixes.
Vitest infers mock return types from the initial factory function, causing
tsgo to reject mockReturnValueOnce/mockResolvedValueOnce calls that pass
types incompatible with the inferred return. Fix by:
- Widening resolveGatewayTarget mock to () => 'local' | 'remote' | undefined
- Widening extractDeliveryInfo mock threadId/accountId to string | undefined
- Using 'as never' on mockReturnValueOnce/mockResolvedValueOnce overrides
that pass edge-case values (null sentinel, undefined merge results,
partial payload objects) that intentionally don't match the strict type
All 71 tests still pass.
When a caller forwards live channel/to via deliveryContext but omits
accountId (e.g. /tools/invoke without x-openclaw-account-id), the
update.run handler was using paramsDeliveryContext as-is, dropping any
account binding that existed in the session store. The restart sentinel
would then be written without accountId, causing scheduleRestartSentinelWake
to deliver using the channel default account — misrouting in multi-account
setups.
Apply the same accountId fallback pattern that config.ts already uses:
when paramsDeliveryContext is present but its accountId is undefined,
merge in extractedDeliveryContext.accountId as fallback. See #18612.
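The fallback pattern can be illustrated with a small sketch. The `DeliveryContextSketch` type and the function name `withAccountIdFallback` are hypothetical; the real handlers may structure this differently:

```typescript
// Hypothetical shape; field names follow the ones mentioned in this message.
interface DeliveryContextSketch {
  channel?: string;
  to?: string;
  accountId?: string | undefined;
}

// Prefer the caller-supplied context, but keep the session store's accountId
// when the caller did not provide one.
function withAccountIdFallback(
  paramsDeliveryContext: DeliveryContextSketch | undefined,
  extractedDeliveryContext: DeliveryContextSketch | undefined,
): DeliveryContextSketch | undefined {
  if (!paramsDeliveryContext) return extractedDeliveryContext;
  if (paramsDeliveryContext.accountId !== undefined) return paramsDeliveryContext;
  return { ...paramsDeliveryContext, accountId: extractedDeliveryContext?.accountId };
}
```

For example, params of `{ channel: "slack", to: "C123" }` merged with an extracted `{ accountId: "work" }` keeps the live channel/to and gains `accountId: "work"`.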
When liveContext (gateway-tool.ts) or paramsDeliveryContext (config.ts) is
present but lacks an accountId, fall back to the accountId from the extracted
session store rather than dropping it entirely. This prevents restart follow-up
notices from being misrouted on multi-account channels when callers supply
channel/to without an explicit accountId.
Addresses CR comments on PR #34580.
resolveGatewayTarget() previously returned undefined when no gatewayUrl
override was provided, even when gateway.mode=remote routes the call to
gateway.remote.url. This caused isRemoteGateway to be false in that path,
so deliveryContext was forwarded to the remote host and could stamp the
restart sentinel with the local chat route, misdelivering post-restart
wake messages.
Fix: check gateway.mode=remote in the no-override branch and return
'remote' so deliveryContext is suppressed for config-based remote targets
the same way it is for explicit gatewayUrl overrides.
Adds a test covering the config-based remote mode case (no gatewayUrl).
Closes #34580 (P1 review comment).
isRemoteGateway was inferred from gatewayUrl being present, but gatewayUrl
overrides are valid for loopback/local targets too (ws://127.0.0.1:<port>,
localhost, [::1]). These local-override calls should still forward
deliveryContext — treating them as remote falls back to
extractDeliveryInfo(sessionKey) and reintroduces the stale heartbeat routing
this patch was meant to fix.
Fix: export resolveGatewayTarget() from gateway.ts (returns 'local' | 'remote'
| undefined) and use it instead of Boolean(gatewayUrl?.trim()). Only
gatewayUrl values that classify as 'remote' now suppress deliveryContext.
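A minimal sketch of the gating change, assuming simplified types. `deliveryContextToForward` and `LiveContextSketch` are illustrative names, not the actual code in gateway-tool.ts:

```typescript
type ResolvedTarget = "local" | "remote" | undefined;

interface LiveContextSketch {
  channel: string;
  to: string;
}

// Suppress the live context only for targets classified as 'remote'.
// 'local' and undefined (no override) both keep forwarding it.
function deliveryContextToForward(
  target: ResolvedTarget,
  live: LiveContextSketch | undefined,
): LiveContextSketch | undefined {
  return target === "remote" ? undefined : live;
}
```

Compared with a `Boolean(gatewayUrl?.trim())` check, a loopback override such as ws://127.0.0.1:<port> now classifies as 'local' and still forwards the live context.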
Adds test coverage for the local loopback case.
resolveGatewayWriteMeta() was forwarding the local agent run's
deliveryContext to config.apply/config.patch/update.run even when
gatewayUrl pointed to a remote gateway. Server handlers now prefer
params.deliveryContext over extractDeliveryInfo(sessionKey), so a
remote restart sentinel would be written with the local chat's channel/
to, causing post-restart wake messages to be delivered to the caller's
chat instead of the session that lives on the remote gateway.
Fix: gate deliveryContext forwarding on isRemoteGateway (truthy
gatewayOpts.gatewayUrl). When targeting a remote gateway, omit
deliveryContext so the remote server's extractDeliveryInfo(sessionKey)
remains authoritative for the routing of that session. See #18612.
Restore deliverOutboundPayloads() to send the restart summary
deterministically before calling agentCommand() for the agent resume
turn. Previously the notice was only delivered as input to the model
via agentCommand(), making it model-dependent: if the model rewrote or
omitted the content, the user would never see the restart summary/note.
The new two-step flow:
1. deliverOutboundPayloads() — guaranteed delivery of the exact restart
notice (model-independent). Restores the Slack replyToId mapping
from main that ensures threaded replies land in the right thread.
2. agentCommand() — agent resume turn so the agent can continue
autonomously and optionally provide additional context.
Update test to assert deliverOutboundPayloads fires before agentCommand
and verify the two-step ordering is preserved.
When a non-default agent (e.g. agent:shopping-claw:main) calls restart or
config/update with sessionKey="main", the gateway resolves "main" via
resolveMainSessionKey(cfg), i.e. the default agent's session. Previously,
isTargetingOtherSession canonicalized the target key using the CURRENT
session's agentId, so "main" mapped to the current agent's main session
rather than the default agent's — falsely treating a cross-agent request
as same-session and forwarding the wrong chat's deliveryContext.
Fix: canonicalize each key using its own agentId (resolveAgentIdFromSessionKey
on the key itself). For bare "main", this returns DEFAULT_AGENT_ID so
"main" → "agent:main:main" regardless of which agent is calling. Applied
to both the restart path and the RPC path (resolveGatewayWriteMeta).
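The per-key canonicalization can be sketched as below. These are simplified stand-ins with a `Sketch` suffix, not the real helpers; the `agent:<agentId>:<rest>` key format and the value of the default agent id are inferred from the "main" → "agent:main:main" example above:

```typescript
// Assumed value, inferred from the "main" -> "agent:main:main" mapping.
const DEFAULT_AGENT_ID_SKETCH = "main";

// Derive the agent id from the key itself; bare keys belong to the default agent.
function resolveAgentIdFromSessionKeySketch(key: string): string {
  const match = /^agent:([^:]+):/.exec(key);
  return match?.[1] ?? DEFAULT_AGENT_ID_SKETCH;
}

// Canonicalize each key with its OWN agent id, never the caller's.
function canonicalizeSessionKeySketch(key: string): string {
  if (key.startsWith("agent:")) return key;
  return `agent:${resolveAgentIdFromSessionKeySketch(key)}:${key}`;
}

function isTargetingOtherSessionSketch(
  currentKey: string,
  targetKey: string | undefined,
): boolean {
  if (!targetKey) return false; // no explicit key: falls back to own session
  return canonicalizeSessionKeySketch(targetKey) !== canonicalizeSessionKeySketch(currentKey);
}
```

Under these assumptions, a call from agent:shopping-claw:main with sessionKey="main" is treated as cross-session, so its live deliveryContext is not forwarded.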
Add two regression tests covering the cross-agent alias scenario.
Forwarding liveDeliveryContextForRpc (or liveContext for restart) when
only agentChannel is set but agentTo is missing causes the server to
prefer an incomplete deliveryContext over extractDeliveryInfo(). The
sentinel is then written without `to`, and scheduleRestartSentinelWake
bails on `if (!channel || !to)`, silently degrading to a system event
with no delivery or agent resume.
Fix: guard both liveContext (restart path) and liveDeliveryContextForRpc
(config.apply/config.patch/update.run) to require both channel and to
before forwarding.
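A small sketch of the partial-context guard, assuming the option names mentioned above. `buildLiveContextSketch` is a hypothetical helper, not the actual code in gateway-tool.ts:

```typescript
interface LiveContextResult {
  channel: string;
  to: string;
  accountId: string | undefined;
}

// Forward a live context only when BOTH channel and to are known; otherwise
// return undefined so the server falls back to extractDeliveryInfo().
function buildLiveContextSketch(opts: {
  agentChannel?: string;
  agentTo?: string;
  agentAccountId?: string;
}): LiveContextResult | undefined {
  if (!opts.agentChannel || !opts.agentTo) return undefined;
  return { channel: opts.agentChannel, to: opts.agentTo, accountId: opts.agentAccountId };
}
```

With only agentChannel set, the helper returns undefined, so no incomplete deliveryContext reaches the sentinel.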
Add gateway-tool.test.ts covering the partial-context guard for both
the restart and RPC code paths.
Fixes: chatgpt-codex-connector P2 review on #34580
opts.agentThreadId belongs to the current agent's thread. When the
restart action targets a different sessionKey, forwarding it into the
sentinel would cause scheduleRestartSentinelWake to deliver the
post-restart reply to the wrong thread.
Apply the same isTargetingOtherSession guard used for deliveryContext:
only take opts.agentThreadId when the restart targets the current
session; otherwise use extracted.threadId from extractDeliveryInfo,
which correctly derives threadId from the target session key.
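The threadId selection can be sketched as follows. `pickThreadIdSketch` is a hypothetical name, and falling back to the extracted threadId when the current session has no agentThreadId is an assumption, not something this message states:

```typescript
function pickThreadIdSketch(args: {
  targetsOtherSession: boolean;
  agentThreadId: string | undefined;
  extractedThreadId: string | undefined;
}): string | undefined {
  // Another session's restart must never inherit the caller's thread.
  if (args.targetsOtherSession) return args.extractedThreadId;
  // Same session: prefer the live thread (the fallback here is an assumption).
  return args.agentThreadId ?? args.extractedThreadId;
}
```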
When a gateway tool call (restart, config.apply, config.patch, update.run)
specifies an explicit sessionKey that differs from the current agent's
session, the live delivery context (agentChannel/agentTo/agentAccountId)
belongs to the wrong session and would misroute post-restart replies.
Only set deliveryContext in the sentinel/RPC params when:
- No explicit sessionKey is provided (falls back to own session), or
- The explicit sessionKey matches the current agent's session key
Otherwise omit deliveryContext so the server falls back to
extractDeliveryInfo(sessionKey), which correctly resolves routing for
the target session.
- Extract parseDeliveryContextFromParams() into restart-request.ts and
import it in both config.ts and update.ts, eliminating the duplicated
inline IIFE parsing in update.ts
- Add comment in gateway-tool.ts explaining why agentThreadId is
intentionally excluded from liveDeliveryContextForRpc: threadId is
reliably derived server-side from the session key via
parseSessionThreadInfo() and is not subject to heartbeat contamination
- Add beforeEach(vi.clearAllMocks) to server-restart-sentinel.test.ts
and remove ad-hoc mockClear() calls from individual tests to prevent
mock state from leaking between test cases
Fixes two related issues:
- #12768: Gateway restart notification
- #18612: Agent does not resume after self-triggered gateway restart
## Root cause
In ab4a08a82 ('fix: defer gateway restart until all replies are sent'),
agentCommand() was replaced with deliverOutboundPayloads() in
scheduleRestartSentinelWake(). That change correctly fixed a pre-restart race
condition but accidentally made delivery one-way: the user is notified
but the agent never sees the restart message and does not resume.
A compounding bug meant the sentinel was also being written with stale
routing data. extractDeliveryInfo() reads the persisted session store,
which heartbeat runs frequently overwrite to { channel: 'webchat',
to: 'heartbeat' } — an internal sink. So even restoring agentCommand()
alone would fail: the sentinel's deliveryContext was pointing nowhere.
## Fix (four parts)
**Part 1 — src/agents/openclaw-tools.ts**
Forward the live delivery fields (agentChannel, agentTo, agentThreadId,
agentAccountId) from createOpenClawTools() into createGatewayTool().
These values are captured from the current inbound message context and
are accurate; they were available at the callsite but not being passed.
**Part 2 — src/agents/tools/gateway-tool.ts**
Prefer liveDeliveryContext (built from opts.agentChannel / agentTo /
agentAccountId) over extractDeliveryInfo() when writing the sentinel.
Pass deliveryContext in config.apply, config.patch, and update.run RPC
calls so the server-side handlers receive it.
**Part 3 — src/gateway/server-restart-sentinel.ts**
Restore agentCommand() in place of deliverOutboundPayloads(). The
function runs in the new process post-restart, where there are zero
in-flight replies, so the pre-restart race condition does not apply.
agentCommand() creates a full agent turn: the restart message is
delivered to the user AND the agent sees it in its conversation history,
allowing it to resume without waiting for the user to send a new message.
**Part 4 — src/gateway/protocol/schema/config.ts**
Add deliveryContext as an optional field to ConfigApplyLikeParamsSchema
(shared by config.apply and config.patch) and UpdateRunParamsSchema.
The additionalProperties: false constraint was silently dropping the
field before it reached the server-side handlers. Also updated
resolveConfigRestartRequest() in config.ts and the update.run handler
in update.ts to prefer params.deliveryContext over extractDeliveryInfo().
## Why the heartbeat approach fails
An alternative approach (requestHeartbeatNow + enqueueSystemEvent) was
tested and rejected: the heartbeat does fire, but its delivery target
comes from the session store (lastChannel/lastTo), which reads
'webchat/heartbeat' due to heartbeat contamination. Responses route to
an internal sink and are silently dropped. agentCommand() is the correct
tool because it creates a turn with explicit delivery context attached.
- Added a test to ensure no warnings for legacy Brave config when bundled web search allowlist compatibility is applied.
- Updated validation logic to incorporate compatibility configuration for bundled web search plugins.
- Refactored the ensureRegistry function to utilize the new compatibility handling.
* test: align extension runtime mocks with plugin-sdk
Update stale extension tests to mock the plugin-sdk runtime barrels that production code now imports, and harden the Signal tool-result harness around system-event assertions so the channels lane matches current extension boundaries.
Regeneration-Prompt: |
Verify the failing channels-lane tests against current origin/main in an isolated worktree before changing anything. If the failures reproduce on main, keep the fix test-only unless production behavior is clearly wrong. Recent extension refactors moved Telegram, WhatsApp, and Signal code onto plugin-sdk runtime barrels, so update stale tests that still mock old core module paths to intercept the seams production code now uses. For Signal reaction notifications, avoid brittle assertions that depend on shared queued system-event state when a direct harness spy on enqueue behavior is sufficient. Preserve scope: only touch the failing tests and their local harness, then rerun the reproduced targeted tests plus the full channels lane and repo check gate.
* test: fix extension test drift on main
* fix: lazy-load bundled web search plugin registry
* test: make matrix sweeper failure injection portable
* fix: split heavy matrix runtime-api seams
* fix: simplify bundled web search id lookup
* test: tolerate windows env key casing
Reuse pi-ai's Anthropic client injection seam for streaming, and add
the OpenClaw-side provider discovery, auth, model catalog, and tests
needed to expose anthropic-vertex cleanly.
Signed-off-by: sallyom <somalley@redhat.com>