fix(update): guard stale-PID cleanup behind viable restart path

Addresses Codex P1 review: cleanStaleGatewayProcessesSync() ran unconditionally, before the restart strategy was resolved. For a nohup gateway (no managed service, restartScriptPath == null), runDaemonRestart() returns false, so the gateway was killed with no replacement, a regression from pre-change behaviour.

Fix: move the proactive PID cleanup inside the restartScriptPath branch, where a managed service script guarantees a replacement is coming. The daemon-restart fallback path skips the proactive cleanup; the existing post-restart health check (waitForGatewayHealthyRestart) already handles stale PIDs once the new gateway is confirmed healthy.
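
The ordering this change enforces can be sketched as a pure function. This is only an illustration under stated assumptions: `RestartStep` and `planRestartSteps` are hypothetical names that do not exist in update-command.ts, and the step labels merely stand in for the real calls (cleanStaleGatewayProcessesSync, runRestartScript, runDaemonRestart).

```typescript
// Hypothetical model of the restart ordering; not code from the repository.
// Each label stands in for one of the real calls named in the commit message.
type RestartStep = "cleanStalePids" | "runRestartScript" | "runDaemonRestart";

// Returns the steps that would run, in order, for a given restartScriptPath.
// The truthiness check mirrors `if (params.restartScriptPath)` in the diff,
// so null and the empty string both take the fallback branch.
function planRestartSteps(restartScriptPath: string | null): RestartStep[] {
  if (restartScriptPath) {
    // A managed restart script is available, so a replacement gateway is
    // expected: stale PIDs are cleared first to free the port.
    return ["cleanStalePids", "runRestartScript"];
  }
  // No script: the daemon restart may return false, so nothing is killed
  // up front and the live gateway stays running.
  return ["runDaemonRestart"];
}
```

In this model the cleanup step only ever appears immediately before a script-driven restart, which is the property the fix is meant to guarantee; stale PIDs on the fallback path are left to the post-restart health check.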
2026-03-06 10:25:56 +05:30 · 2026-03-06 10:25:56 +05:30 · f7760757fd
commit f7760757fd
parent 127bc620fe
1 changed file with 10 additions and 4 deletions
--- a/src/cli/update-cli/update-command.ts
+++ b/src/cli/update-cli/update-command.ts
@@ -590,14 +590,20 @@ async function maybeRestartService(params: {
          }
        }
      }
-      // Proactively kill any stale gateway processes (e.g. bare-process nohup gateways)
-      // holding the port before we attempt the restart. Without this, the new process
-      // fails to bind the port and openclaw update leaves two conflicting gateway PIDs.
-      cleanStaleGatewayProcessesSync();
      if (params.restartScriptPath) {
+        // A managed service restart script is available: kill stale bare-process
+        // gateway PIDs first so the service can bind the port on the way up.
+        // We only do this when we have a guaranteed restart path — killing the
+        // live gateway without a viable replacement would leave it down.
+        cleanStaleGatewayProcessesSync();
        await runRestartScript(params.restartScriptPath);
        restartInitiated = true;
      } else {
+        // No restart script — fall back to daemon restart.  Skip the proactive
+        // PID cleanup here: if the daemon is not loaded runDaemonRestart returns
+        // false and we would kill the live gateway with no replacement.  The
+        // post-restart health check (waitForGatewayHealthyRestart) already
+        // handles stale PID cleanup once the new gateway is confirmed healthy.
        restarted = await runDaemonRestart();
      }