156 Commits

Author SHA1 Message Date
Sally O'Malley
e5fe818a74
fix(gateway/ui): restore control-ui auth bypass and classify connect failures (#45512)
Merged via squash.

Prepared head SHA: 42b5595edec71897b479b3bbaa94bcb4ac6fab17
Co-authored-by: sallyom <11166065+sallyom@users.noreply.github.com>
Co-authored-by: BunsDev <68980965+BunsDev@users.noreply.github.com>
Reviewed-by: @BunsDev
2026-03-13 20:13:35 -05:00
Peter Steinberger
4523260dda test: share gateway route auth helpers 2026-03-14 00:35:07 +00:00
Peter Steinberger
4674fbf923 refactor: share handshake auth helper builders 2026-03-13 21:40:53 +00:00
Peter Steinberger
2f58647033 refactor: share plugin route auth test harness 2026-03-13 18:38:12 +00:00
Peter Steinberger
db9c755045 refactor: share readiness test harness 2026-03-13 18:38:11 +00:00
Peter Steinberger
7dc447f79f fix(gateway): strip unbound scopes for shared-auth connects 2026-03-13 02:51:55 +00:00
Vincent Koc
8661c271e9 Gateway: preserve trusted-proxy browser scopes 2026-03-12 21:00:43 -04:00
Peter Steinberger
01e4845f6d
refactor: extract websocket handshake auth helpers 2026-03-12 22:46:28 +00:00
Peter Steinberger
bf89947a8e
fix: switch pairing setup codes to bootstrap tokens 2026-03-12 22:23:07 +00:00
Peter Steinberger
445ff0242e
refactor(gateway): cache hook proxy config in runtime state 2026-03-12 21:43:36 +00:00
Peter Steinberger
4da617e178
fix(gateway): honor trusted proxy hook auth rate limits 2026-03-12 21:35:57 +00:00
Vincent Koc
5e389d5e7c
Gateway/ws: clear unbound scopes for shared-token auth (#44306)
* Gateway/ws: clear unbound shared-auth scopes

* Gateway/auth: cover shared-token scope stripping

* Changelog: add shared-token scope stripping entry

* Gateway/ws: preserve allowed control-ui scopes

* Gateway/auth: assert control-ui admin scopes survive allowed device-less auth

* Gateway/auth: cover shared-password scope stripping
2026-03-12 14:52:24 -04:00
Vincent Koc
eff0d5a947
Hardening: tighten preauth WebSocket handshake limits (#44089)
* Gateway: tighten preauth handshake limits

* Changelog: note WebSocket preauth hardening

* Gateway: count preauth frame bytes accurately

* Gateway: cap WebSocket payloads before auth
2026-03-12 10:55:41 -04:00
Robin Waslander
ebed3bbde1
fix(gateway): enforce browser origin check regardless of proxy headers
In trusted-proxy mode, enforceOriginCheckForAnyClient was set to false
whenever proxy headers were present. This allowed browser-originated
WebSocket connections from untrusted origins to bypass origin validation
entirely, as the check only ran for control-ui and webchat client types.

An attacker serving a page from an untrusted origin could connect through
a trusted reverse proxy, inherit proxy-injected identity, and obtain
operator.admin access via the sharedAuthOk / roleCanSkipDeviceIdentity
path without any origin restriction.

Remove the hasProxyHeaders exemption so origin validation runs for all
browser-originated connections regardless of how the request arrived.

Fixes GHSA-5wcw-8jjv-m286
2026-03-12 01:16:52 +01:00
Robin Waslander
a1520d70ff
fix(gateway): propagate real gateway client into plugin subagent runtime
Plugin subagent dispatch used a hardcoded synthetic client carrying
operator.admin, operator.approvals, and operator.pairing for all
runtime.subagent.* calls. Plugin HTTP routes with auth:"plugin" require
no gateway auth by design, so an unauthenticated external request could
drive admin-only gateway methods (sessions.delete, agent.run) through
the subagent runtime.

Propagate the real gateway client into the plugin runtime request scope
when one is available. Plugin HTTP routes now run inside a scoped
runtime client: auth:"plugin" routes receive a non-admin synthetic
operator.write client; gateway-authenticated routes retain admin-capable
scopes. The security boundary is enforced at the HTTP handler level.

Fixes GHSA-xw77-45gv-p728
2026-03-11 14:17:01 +01:00
Josh Avant
a76e810193
fix(gateway): harden token fallback/reconnect behavior and docs (#42507)
* fix(gateway): harden token fallback and auth reconnect handling

* docs(gateway): clarify auth retry and token-drift recovery

* fix(gateway): tighten auth reconnect gating across clients

* fix: harden gateway token retry (#42507) (thanks @joshavant)
2026-03-10 17:05:57 -05:00
Mariano
d4e59a3666
Cron: enforce cron-owned delivery contract (#40998)
Merged via squash.

Prepared head SHA: 5877389e33d5b3a518925b5793a6f6294cb3fb3d
Co-authored-by: mbelinky <132747814+mbelinky@users.noreply.github.com>
Co-authored-by: mbelinky <132747814+mbelinky@users.noreply.github.com>
Reviewed-by: @mbelinky
2026-03-09 20:12:37 +01:00
Tak Hoffman
d9e8e8ac15
fix: resolve live config paths in status and gateway metadata (#39952)
* fix: resolve live config paths in status and gateway metadata

* fix: resolve remaining runtime config path references

* test: cover gateway config.set config path response
2026-03-08 09:59:32 -05:00
Peter Steinberger
ac86deccee fix(gateway): harden plugin HTTP route auth 2026-03-07 19:55:06 +00:00
Peter Steinberger
4113a0f39e refactor(gateway): dedupe readiness healthy snapshot fixtures 2026-03-07 17:58:31 +00:00
ql-wade
a5c07fa115
fix(gateway): skip stale-socket restarts for Telegram polling (openclaw#38405)
Verified:
- pnpm build
- pnpm check
- pnpm test:macmini

Co-authored-by: ql-wade <262266039+ql-wade@users.noreply.github.com>
2026-03-07 00:20:34 -06:00
Vincent Koc
ab5fcfcc01
feat(gateway): add channel-backed readiness probes (#38285)
* Changelog: add channel-backed readiness probe entry

* Gateway: add channel-backed readiness probes

* Docs: describe readiness probe behavior

* Gateway: add readiness probe regression tests

* Changelog: dedupe gateway probe entries

* Docs: fix readiness startup grace description

* Changelog: remove stale readiness entry

* Gateway: cover readiness hardening

* Gateway: harden readiness probes
2026-03-06 15:15:23 -05:00
Tak Hoffman
1be39d4250
fix(gateway): synthesize lifecycle robustness for restart and startup probes (#33831)
* fix(gateway): correct launchctl command sequence for gateway restart (closes #20030)

* fix(restart): expand HOME and escape label in launchctl plist path

* fix(restart): poll port free after SIGKILL to prevent EADDRINUSE restart loop

When cleanStaleGatewayProcessesSync() kills a stale gateway process,
the kernel may not immediately release the TCP port. Previously the
function returned after a fixed 500ms sleep (300ms SIGTERM + 200ms
SIGKILL), allowing triggerOpenClawRestart() to hand off to systemd
before the port was actually free. The new systemd process then raced
the dying socket for port 18789, hit EADDRINUSE, and exited with
status 1, causing systemd to retry indefinitely — the zombie restart
loop reported in #33103.

Fix: add waitForPortFreeSync() that polls lsof at 50ms intervals for
up to 2 seconds after SIGKILL. cleanStaleGatewayProcessesSync() now
blocks until the port is confirmed free (or the budget expires with a
warning) before returning. The increased SIGTERM/SIGKILL wait budgets
(600ms / 400ms) also give slow processes more time to exit cleanly.

Fixes #33103
Related: #28134

* fix: add EADDRINUSE retry and TIME_WAIT port-bind checks for gateway startup

* fix(ports): treat EADDRNOTAVAIL as non-retryable and fix flaky test

* fix(gateway): hot-reload agents.defaults.models allowlist changes

The reload plan had a rule for `agents.defaults.model` (singular) but
not `agents.defaults.models` (plural — the allowlist array).  Because
`agents.defaults.models` does not prefix-match `agents.defaults.model.`,
it fell through to the catch-all `agents` tail rule (kind=none), so
allowlist edits in openclaw.json were silently ignored at runtime.

Add a dedicated reload rule so changes to the models allowlist trigger
a heartbeat restart, which re-reads the config and serves the updated
list to clients.

Fixes #33600

Co-authored-by: HCL <chenglunhu@gmail.com>
Signed-off-by: HCL <chenglunhu@gmail.com>

* test(restart): 100% branch coverage — audit round 2

Audit findings fixed:
- remove dead guard: terminateStaleProcessesSync pids.length===0 check was
  unreachable (only caller cleanStaleGatewayProcessesSync already guards)
- expose __testing.callSleepSyncRaw so sleepSync's real Atomics.wait path
  can be unit-tested directly without going through the override
- fix broken sleepSync Atomics.wait test: previous test set override=null
  but cleanStaleGatewayProcessesSync returned before calling sleepSync —
  replaced with direct callSleepSyncRaw calls that actually exercise L36/L42-47
- fix pid collision: two tests used process.pid+304 (EPERM + dead-at-SIGTERM);
  EPERM test changed to process.pid+305
- fix misindented tests: 'deduplicates pids' and 'lsof status 1 container
  edge case' were outside their intended describe blocks; moved to correct
  scopes (findGatewayPidsOnPortSync and pollPortOnce respectively)
- add missing branch tests:
  - status 1 + non-empty stdout with zero openclaw pids → free:true (L145)
  - mid-loop non-openclaw cmd in &&-chain (L67)
  - consecutive p-lines without c-line between them (L67)
  - invalid PID in p-line (p0 / pNaN) — ternary false branch (L67)
  - unknown lsof output line (else-if false branch L69)

Coverage: 100% stmts / 100% branch / 100% funcs / 100% lines (36 tests)

* test(restart): fix stale-pid test typing for tsgo

* fix(gateway): address lifecycle review findings

* test(update): make restart-helper path assertions windows-safe

---------

Signed-off-by: HCL <chenglunhu@gmail.com>
Co-authored-by: Glucksberg <markuscontasul@gmail.com>
Co-authored-by: Efe Büken <efe@arven.digital>
Co-authored-by: Riccardo Marino <rmarino@apple.com>
Co-authored-by: HCL <chenglunhu@gmail.com>
2026-03-03 21:31:12 -06:00
Liu Xiaopai
ae29842158
Gateway: fix stale self version in status output (#32655)
Merged via squash.

Prepared head SHA: b9675d1f90ef0eabb7e68c24a72d4b2fb27def22
Co-authored-by: liuxiaopai-ai <73659136+liuxiaopai-ai@users.noreply.github.com>
Co-authored-by: gumadeiras <5599352+gumadeiras@users.noreply.github.com>
Reviewed-by: @gumadeiras
2026-03-03 02:41:52 -05:00
Peter Steinberger
8768487aee refactor(shared): dedupe protocol schema typing and session/media helpers 2026-03-02 19:57:33 +00:00
Peter Steinberger
7a7eee920a refactor(gateway): harden plugin http route contracts 2026-03-02 16:48:00 +00:00
Peter Steinberger
33e76db12a refactor(gateway): scope ws origin fallback metrics to runtime 2026-03-02 16:47:00 +00:00
Peter Steinberger
d5ae4b8337 fix(gateway): require local client for loopback origin fallback 2026-03-02 16:37:45 +00:00
Peter Steinberger
2fd8264ab0 refactor(gateway): hard-break plugin wildcard http handlers 2026-03-02 16:24:06 +00:00
Peter Steinberger
93b0724025 fix(gateway): fail closed plugin auth path canonicalization 2026-03-02 15:55:32 +00:00
Peter Steinberger
d3e0c0b29c test(gateway): dedupe gateway and infra test scaffolds 2026-03-02 07:13:10 +00:00
Sid
e1e715c53d
fix(gateway): skip device pairing for local backend self-connections (#30801)
* fix(gateway): skip device pairing for local backend self-connections

When gateway.tls is enabled, sessions_spawn (and other internal
callGateway operations) creates a new WebSocket to the gateway.
The gateway treated this self-connection like any external client
and enforced device pairing, rejecting it with "pairing required"
(close code 1008). This made sub-agent spawning impossible when
TLS was enabled in Docker with bind: "lan".

Skip pairing for connections that are gateway-client self-connections
from localhost with valid shared auth (token/password). These are
internal backend calls (e.g. sessions_spawn, subagent-announce) that
already have valid credentials and connect from the same host.

Closes #30740

* gateway: tighten backend self-pair bypass guard

* tests: cover backend self-pairing local-vs-remote auth path

* changelog: add gateway tls pairing fix credit

---------

Co-authored-by: Vincent Koc <vincentkoc@ieee.org>
2026-03-01 21:46:33 -08:00
Peter Steinberger
2d31126e6a refactor(shared): extract reused path and normalization helpers 2026-03-02 05:20:19 +00:00
Peter Steinberger
cef5fae0a2 refactor(gateway): dedupe origin seeding and plugin route auth matching 2026-03-02 00:42:22 +00:00
Peter Steinberger
53d10f8688 fix(gateway): land access/auth/config migration cluster
Land #28960 by @Glucksberg (Tailscale origin auto-allowlist).
Land #29394 by @synchronic1 (allowedOrigins upgrade migration).
Land #29198 by @Mariana-Codebase (plugin HTTP auth guard + route precedence).
Land #30910 by @liuxiaopai-ai (tailscale bind/config.patch guard).

Co-authored-by: Glucksberg <markuscontasul@gmail.com>
Co-authored-by: synchronic1 <synchronic1@users.noreply.github.com>
Co-authored-by: Mariana Sinisterra <mariana.data@outlook.com>
Co-authored-by: liuxiaopai-ai <73659136+liuxiaopai-ai@users.noreply.github.com>
2026-03-02 00:10:51 +00:00
Ayaan Zaidi
54eaf17327 feat(gateway): add node canvas capability refresh flow 2026-02-27 12:16:36 +05:30
Vincent Koc
cb9374a2a1
Gateway: improve device-auth v2 migration diagnostics (#28305)
* Gateway: add device-auth detail code resolver

* Gateway: emit specific device-auth detail codes

* Gateway tests: cover nonce and signature detail codes

* Docs: add gateway device-auth migration diagnostics

* Docs: add device-auth v2 troubleshooting signatures
2026-02-26 21:05:43 -08:00
Peter Steinberger
081b1aa1ed refactor(gateway): unify v3 auth payload builders and vectors 2026-02-26 15:08:50 +01:00
Kevin Shenghui
9c142993b8 fix: preserve operator scopes for shared auth connections
When connecting via shared gateway token (no device identity),
the operator scopes were being cleared, causing API operations
to fail with 'missing scope' errors.

This fix preserves scopes when sharedAuthOk is true, allowing
headless/API operator clients to retain their requested scopes.

Fixes #27494

(cherry picked from commit c71c8948bd693de0391f861c31d4d6c2cce96061)
2026-02-26 13:40:58 +00:00
Peter Steinberger
7d8aeaaf06 fix(gateway): pin paired reconnect metadata for node policy 2026-02-26 14:11:04 +01:00
Peter Steinberger
4b71de384c fix(core): unify session-key normalization and plugin boundary checks 2026-02-26 12:41:23 +00:00
Peter Steinberger
0cc3e8137c refactor(gateway): centralize trusted-proxy control-ui bypass policy 2026-02-26 02:26:52 +01:00
Peter Steinberger
ec45c317f5 fix(gateway): block trusted-proxy control-ui node bypass 2026-02-26 01:54:19 +01:00
Peter Steinberger
20c2db2103 refactor(gateway): split browser auth hardening paths 2026-02-26 01:37:00 +01:00
Peter Steinberger
c736f11a16 fix(gateway): harden browser websocket auth chain 2026-02-26 01:22:49 +01:00
Peter Steinberger
8d1481cb4a fix(gateway): require pairing for unpaired operator device auth 2026-02-26 00:52:50 +01:00
SidQin-cyber
20523b918a fix(gateway): allow trusted-proxy control-ui auth to skip device pairing
Control UI connections authenticated via gateway.auth.mode=trusted-proxy were
still forced through device pairing because pairing bypass only considered
shared token/password auth (sharedAuthOk). In trusted-proxy deployments,
this produced persistent "pairing required" failures despite valid trusted
proxy headers.

Treat authenticated trusted-proxy control-ui connections as pairing-bypass
eligible and allow missing device identity in that mode.

Fixes #25293

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-24 14:17:54 +00:00
Marco Di Dionisio
83689fc838 fix: include trusted-proxy in sharedAuthOk check
In trusted-proxy mode, sharedAuthResult is null because hasSharedAuth
only triggers for token/password in connectParams.auth. But the primary
auth (authResult) already validated the trusted-proxy — the connection
came from a CIDR in trustedProxies with a valid userHeader. This IS
shared auth semantically (the proxy vouches for identity), so operator
connections should be able to skip device identity.

Without this fix, trusted-proxy operator connections are rejected with
"device identity required" because roleCanSkipDeviceIdentity() sees
sharedAuthOk=false.

(cherry picked from commit e87048a6a650d391e1eb5704546eb49fac5f0091)
2026-02-24 04:33:51 +00:00
Peter Steinberger
223d7dc23d feat(gateway)!: require explicit non-loopback control-ui origins 2026-02-24 01:57:11 +00:00
Vincent Koc
7fb69b7cd2
Gateway: stop repeated unauthorized WS request floods per connection (#24294)
* Gateway WS: add unauthorized flood guard primitive

* Gateway WS: close repeated unauthorized post-handshake request floods

* Gateway WS: test unauthorized flood guard behavior

* Changelog: note gateway WS unauthorized flood guard hardening

* Update CHANGELOG.md
2026-02-23 09:58:47 -05:00