fix(update): guard stale-PID cleanup behind viable restart path

Addresses Codex P1 review: cleanStaleGatewayProcessesSync() ran
unconditionally, before the restart strategy was resolved. When the
gateway runs under nohup (no managed service, restartScriptPath == null),
runDaemonRestart() returns false, so the live gateway is killed and
nothing replaces it. This is a regression from the pre-change behaviour.

Fix: move the proactive PID cleanup inside the restartScriptPath branch,
where a managed service script guarantees that a replacement will start.
The daemon-restart fallback path skips the proactive cleanup; the existing
post-restart health check (waitForGatewayHealthyRestart) already cleans up
stale PIDs once the new gateway is confirmed healthy.
Ash (Bug Lab) 2026-03-06 10:25:56 +05:30
parent 127bc620fe
commit f7760757fd
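To make the ordering problem concrete, here is a simplified, synchronous model of the restart decision. Everything in it is hypothetical: World, cleanStale, runScript and daemonRestart are stand-in stubs for cleanStaleGatewayProcessesSync(), runRestartScript() and runDaemonRestart(), which in the real code are async and act on actual processes. The stubs only track whether some gateway is serving.

```typescript
// Hypothetical, simplified model of the restart decision in maybeRestartService.
// The real helpers are async and touch the OS; these synchronous stubs only
// record whether a gateway process is currently serving.

interface World {
  gatewayRunning: boolean; // is any gateway process serving right now?
  daemonLoaded: boolean;   // is a managed daemon available to restart it?
}

interface RestartParams {
  restartScriptPath: string | null;
}

// Stub for cleanStaleGatewayProcessesSync(): kills every gateway PID on the
// port, including a live nohup gateway.
function cleanStale(world: World): void {
  world.gatewayRunning = false;
}

// Stub for runRestartScript(): a managed service script always brings a
// fresh gateway up.
function runScript(world: World): void {
  world.gatewayRunning = true;
}

// Stub for runDaemonRestart(): returns false when no daemon is loaded and
// leaves the process table untouched in that case.
function daemonRestart(world: World): boolean {
  if (!world.daemonLoaded) return false;
  world.gatewayRunning = true;
  return true;
}

// Pre-fix ordering: cleanup runs before the restart strategy is chosen.
function restartBeforeFix(world: World, params: RestartParams): void {
  cleanStale(world);
  if (params.restartScriptPath) {
    runScript(world);
  } else {
    daemonRestart(world);
  }
}

// Post-fix ordering: cleanup only on the path that guarantees a replacement.
function restartAfterFix(world: World, params: RestartParams): void {
  if (params.restartScriptPath) {
    cleanStale(world);
    runScript(world);
  } else {
    daemonRestart(world);
  }
}
```

In the nohup case (restartScriptPath null, no daemon loaded), restartBeforeFix leaves gatewayRunning false, while restartAfterFix leaves the original gateway serving. With a restart script, both orderings end with a gateway up.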

@@ -590,14 +590,20 @@ async function maybeRestartService(params: {
 }
 }
 }
-// Proactively kill any stale gateway processes (e.g. bare-process nohup gateways)
-// holding the port before we attempt the restart. Without this, the new process
-// fails to bind the port and openclaw update leaves two conflicting gateway PIDs.
-cleanStaleGatewayProcessesSync();
 if (params.restartScriptPath) {
+  // A managed service restart script is available: kill stale bare-process
+  // gateway PIDs first so the service can bind the port on the way up.
+  // We only do this when we have a guaranteed restart path — killing the
+  // live gateway without a viable replacement would leave it down.
+  cleanStaleGatewayProcessesSync();
   await runRestartScript(params.restartScriptPath);
   restartInitiated = true;
 } else {
+  // No restart script — fall back to daemon restart. Skip the proactive
+  // PID cleanup here: if the daemon is not loaded runDaemonRestart returns
+  // false and we would kill the live gateway with no replacement. The
+  // post-restart health check (waitForGatewayHealthyRestart) already
+  // handles stale PID cleanup once the new gateway is confirmed healthy.
   restarted = await runDaemonRestart();
 }
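One way to keep this from regressing is a call-order test. The sketch below assumes a hypothetical dependency-injected, synchronous version of the branch in the diff: the Deps interface, restartBranch and recordingDeps do not exist in the source, where the helpers are module-level and async. It records which helpers run and in what order.

```typescript
// Hypothetical, synchronous, dependency-injected version of the if/else
// branch from the diff, so its call order can be unit-tested without
// touching real processes. The Deps interface is an assumption for testing.

interface Deps {
  cleanStaleGatewayProcessesSync(): void;
  runRestartScript(path: string): void;
  runDaemonRestart(): boolean;
}

interface Outcome {
  restartInitiated: boolean;
  restarted: boolean;
}

function restartBranch(restartScriptPath: string | null, deps: Deps): Outcome {
  const outcome: Outcome = { restartInitiated: false, restarted: false };
  if (restartScriptPath) {
    // Managed script guarantees a replacement, so stale PIDs may be killed first.
    deps.cleanStaleGatewayProcessesSync();
    deps.runRestartScript(restartScriptPath);
    outcome.restartInitiated = true;
  } else {
    // Fallback: never kill PIDs up front, because runDaemonRestart may return false.
    outcome.restarted = deps.runDaemonRestart();
  }
  return outcome;
}

// Builds stub Deps that append each call to a shared log; daemonLoaded
// controls what the stubbed runDaemonRestart returns.
function recordingDeps(daemonLoaded: boolean): { deps: Deps; calls: string[] } {
  const calls: string[] = [];
  const deps: Deps = {
    cleanStaleGatewayProcessesSync: () => {
      calls.push("cleanup");
    },
    runRestartScript: (path) => {
      calls.push(`script:${path}`);
    },
    runDaemonRestart: () => {
      calls.push("daemon");
      return daemonLoaded;
    },
  };
  return { deps, calls };
}
```

With a restart script, the log should read cleanup followed by the script call; on the fallback path it should contain only the daemon call, whether or not the daemon is loaded.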