feat: split restart sentinel into user-facing and internal context messages

- Add formatRestartSentinelUserMessage for clean user-facing delivery
  (omits status prefix and doctorHint)
- Add formatRestartSentinelInternalContext for agent system prompt injection
  with full diagnostic details
- Update server-restart-sentinel to use userMessage for direct channel delivery
- Add tests for new formatting functions
This commit is contained in:
Bryan Marty 2026-03-11 04:38:00 +00:00
parent 29206a9bc6
commit 3e1bee931a
No known key found for this signature in database
7 changed files with 238 additions and 16 deletions

View File

@ -214,6 +214,37 @@ Think of it like a human reviewing their journal and updating their mental model
The goal: Be helpful without being annoying. Check in a few times a day, do useful background work, but respect quiet time.
## 🔄 Gateway Restarts — Do It Right!
**Never use `openclaw gateway restart` (CLI/shell).** It bypasses the restart sentinel, so you won't auto-resume or notify the user after restart. You'll just sit there silently until someone pings you.
**Always restart via the gateway tool** (action=restart) or via `config.patch`/`config.apply` — these write a sentinel file before restarting, which the new process consumes to wake you up and message the user automatically.
```bash
# ✅ Correct: restart via gateway tool (action=restart, sessionKey, note)
# ✅ Correct: config.patch with a key that requires restart (writes sentinel automatically)
# ❌ Wrong: openclaw gateway restart — no sentinel, silent after restart
# ❌ Wrong: systemctl --user restart openclaw-gateway.service — same problem
```
### Which config keys trigger a real restart vs dynamic reload?
**Full process restart** (sentinel written, agent wakes up):
- `gateway.*`, `discovery.*`, `plugins.*`, `canvasHost.*`
- Any unrecognized/new config key
**Hot reload** (no restart, no sentinel needed):
- `hooks.*`, `cron.*`, `browser.*`, `models.*`, `agents.defaults.heartbeat`
**Dynamic no-op** (read on next access, no process action):
- `messages.*`, `agents.*`, `tools.*`, `routing.*`, `session.*`, `skills.*`, `secrets.*`, `meta.*`, `identity.*`, `logging.*`, `ui.*`
**Rule of thumb:** If you want a test restart, patch `discovery.mdns.mode` to its current value — it's recognized as a restart-triggering key even if the value is unchanged.
## Make It Yours
This is a starting point. Add your own conventions, style, and rules as you figure out what works.

View File

@ -120,6 +120,29 @@ git commit -m "Add Clawd workspace"
- **bird** — X/Twitter CLI无需浏览器即可发推、回复、阅读话题和搜索。
- **agent-tools** — 用于自动化和辅助脚本的实用工具包。
## 🔄 网关重启 — 正确做法!
**永远不要使用 `openclaw gateway restart`CLI/shell。** 这会绕过重启哨兵机制,导致重启后你无法自动恢复,也无法通知用户。你会静静地等待,直到有人 ping 你。
**始终通过 gateway 工具**action=restart`config.patch`/`config.apply` 触发重启——这些方式会在重启前写入哨兵文件,新进程启动后会消费该文件以唤醒你并自动通知用户。
### 哪些配置键会触发真正的重启?
**完整进程重启**(写入哨兵,代理唤醒):
- `gateway.*``discovery.*``plugins.*``canvasHost.*`
- 任何无法识别的新配置键
**热重载**(无需重启,无需哨兵):
- `hooks.*``cron.*``browser.*``models.*``agents.defaults.heartbeat`
**动态无操作**(下次访问时读取,不触发任何进程操作):
- `messages.*``agents.*``tools.*``routing.*``session.*``skills.*``secrets.*``meta.*`
**经验法则:** 如果需要测试重启,将 `discovery.mdns.mode` 修改为当前值——即使值未改变,它也会触发重启流程。
## 使用说明
- 脚本编写优先使用 `openclaw` CLImac 应用处理权限。

View File

@ -80,7 +80,7 @@ export function createGatewayTool(opts?: {
name: "gateway",
ownerOnly: true,
description:
"Restart, inspect a specific config schema path, apply config, or update the gateway in-place (SIGUSR1). Use config.schema.lookup with a targeted dot path before config edits. Use config.patch for safe partial config updates (merges with existing). Use config.apply only when replacing entire config. Both trigger restart after writing. Always pass a human-readable completion message via the `note` parameter so the system can deliver it to the user after restart.",
"Restart, inspect a specific config schema path, apply config, or update the gateway in-place (SIGUSR1). Use config.schema.lookup with a targeted dot path before config edits. Use config.patch for safe partial config updates (merges with existing). Use config.apply only when replacing entire config. Both trigger restart after writing. Always pass a human-readable completion message via the `note` parameter so the system can deliver it to the user after restart. IMPORTANT: Never use the `openclaw gateway restart` CLI command to restart — it bypasses the restart sentinel so the agent will not auto-resume or notify the user after restart. Always restart via this tool (action=restart) or via config.patch/config.apply, which write the sentinel before restarting. Config keys under gateway.*, discovery.*, plugins.*, and canvasHost.* trigger a real process restart; keys under messages.*, agents.*, tools.*, hooks.*, and most others apply dynamically without a restart.",
parameters: GatewayToolSchema,
execute: async (_toolCallId, args) => {
const params = args as Record<string, unknown>;

View File

@ -13,6 +13,10 @@ const mocks = vi.hoisted(() => ({
},
})),
formatRestartSentinelMessage: vi.fn(() => "restart message"),
formatRestartSentinelUserMessage: vi.fn(() => "Gateway restarted successfully."),
formatRestartSentinelInternalContext: vi.fn(
() => "[Gateway restart context — internal]\nkind: restart\nstatus: ok",
),
summarizeRestartSentinel: vi.fn(() => "restart summary"),
resolveMainSessionKeyFromConfig: vi.fn(() => "agent:main:main"),
parseSessionThreadInfo: vi.fn(() => ({ baseSessionKey: null, threadId: undefined })),
@ -39,6 +43,8 @@ vi.mock("../agents/agent-scope.js", () => ({
vi.mock("../infra/restart-sentinel.js", () => ({
consumeRestartSentinel: mocks.consumeRestartSentinel,
formatRestartSentinelMessage: mocks.formatRestartSentinelMessage,
formatRestartSentinelUserMessage: mocks.formatRestartSentinelUserMessage,
formatRestartSentinelInternalContext: mocks.formatRestartSentinelInternalContext,
summarizeRestartSentinel: mocks.summarizeRestartSentinel,
}));
@ -105,21 +111,22 @@ describe("scheduleRestartSentinelWake two-step delivery + resume", () => {
it("delivers restart notice directly (model-independent) then resumes agent", async () => {
await scheduleRestartSentinelWake({ deps: {} as never });
// Step 1: deterministic delivery
// Step 1: deterministic delivery — uses human-friendly userMessage, not raw diagnostic
expect(mocks.deliverOutboundPayloads).toHaveBeenCalledWith(
expect.objectContaining({
channel: "whatsapp",
to: "+15550002",
accountId: "acct-2",
payloads: [{ text: "restart message" }],
payloads: [{ text: "Gateway restarted successfully." }],
bestEffort: true,
}),
);
// Step 2: agent resume
// Step 2: agent resume — userMessage as prompt, internalContext via extraSystemPrompt
expect(mocks.agentCommand).toHaveBeenCalledWith(
expect.objectContaining({
message: "restart message",
message: "Gateway restarted successfully.",
extraSystemPrompt: expect.stringContaining("[Gateway restart context"),
sessionKey: "agent:main:main",
to: "+15550002",
channel: "whatsapp",
@ -190,7 +197,7 @@ describe("scheduleRestartSentinelWake fallback to enqueueSystemEvent", () =>
await scheduleRestartSentinelWake({ deps: {} as never });
expect(mocks.agentCommand).not.toHaveBeenCalled();
expect(mocks.enqueueSystemEvent).toHaveBeenCalledWith("restart message", {
expect(mocks.enqueueSystemEvent).toHaveBeenCalledWith("Gateway restarted successfully.", {
sessionKey: "agent:main:main",
});
});
@ -204,7 +211,7 @@ describe("scheduleRestartSentinelWake fallback to enqueueSystemEvent", () =>
await scheduleRestartSentinelWake({ deps: {} as never });
expect(mocks.agentCommand).not.toHaveBeenCalled();
expect(mocks.enqueueSystemEvent).toHaveBeenCalledWith("restart message", {
expect(mocks.enqueueSystemEvent).toHaveBeenCalledWith("Gateway restarted successfully.", {
sessionKey: "agent:main:main",
});
});
@ -218,7 +225,7 @@ describe("scheduleRestartSentinelWake fallback to enqueueSystemEvent", () =>
await scheduleRestartSentinelWake({ deps: {} as never });
expect(mocks.agentCommand).not.toHaveBeenCalled();
expect(mocks.enqueueSystemEvent).toHaveBeenCalledWith("restart message", {
expect(mocks.enqueueSystemEvent).toHaveBeenCalledWith("Gateway restarted successfully.", {
sessionKey: "agent:main:main",
});
});
@ -254,7 +261,7 @@ describe("scheduleRestartSentinelWake fallback to enqueueSystemEvent", () =>
await scheduleRestartSentinelWake({ deps: {} as never });
expect(mocks.enqueueSystemEvent).toHaveBeenCalledWith("restart message", {
expect(mocks.enqueueSystemEvent).toHaveBeenCalledWith("Gateway restarted successfully.", {
sessionKey: "agent:main:main",
});
// Resume step must still run after delivery failure

View File

@ -9,7 +9,9 @@ import { buildOutboundSessionContext } from "../infra/outbound/session-context.j
import { resolveOutboundTarget } from "../infra/outbound/targets.js";
import {
consumeRestartSentinel,
formatRestartSentinelInternalContext,
formatRestartSentinelMessage,
formatRestartSentinelUserMessage,
summarizeRestartSentinel,
} from "../infra/restart-sentinel.js";
import { enqueueSystemEvent } from "../infra/system-events.js";
@ -24,7 +26,12 @@ export async function scheduleRestartSentinelWake(params: { deps: CliDeps }) {
}
const payload = sentinel.payload;
const sessionKey = payload.sessionKey?.trim();
// Raw diagnostic message (used for system events and enqueue fallbacks).
const message = formatRestartSentinelMessage(payload);
// Human-friendly message for direct user delivery — omits status prefix and doctorHint.
const userMessage = formatRestartSentinelUserMessage(payload);
// Full technical context injected into the agent's system prompt.
const internalContext = formatRestartSentinelInternalContext(payload);
const summary = summarizeRestartSentinel(payload);
if (!sessionKey) {
@ -56,7 +63,7 @@ export async function scheduleRestartSentinelWake(params: { deps: CliDeps }) {
const channel = channelRaw ? normalizeChannelId(channelRaw) : null;
const to = origin?.to;
if (!channel || !to) {
enqueueSystemEvent(message, { sessionKey });
enqueueSystemEvent(userMessage, { sessionKey });
return;
}
@ -68,7 +75,7 @@ export async function scheduleRestartSentinelWake(params: { deps: CliDeps }) {
mode: "implicit",
});
if (!resolved.ok) {
enqueueSystemEvent(message, { sessionKey });
enqueueSystemEvent(userMessage, { sessionKey });
return;
}
@ -78,7 +85,9 @@ export async function scheduleRestartSentinelWake(params: { deps: CliDeps }) {
sessionThreadId ??
(origin?.threadId != null ? String(origin.threadId) : undefined);
// Step 1: deliver the restart notice deterministically — model-independent, guaranteed.
// Step 1: deliver a human-friendly restart notice deterministically — model-independent,
// guaranteed. Uses userMessage (omits raw diagnostic fields like status prefix and
// doctorHint) so the user sees a clean message even if the agent turn in Step 2 fails.
// Slack uses replyToId (thread_ts) for threading; deliverOutboundPayloads does not do
// this mapping automatically, so we convert here. See #17716.
const isSlack = channel === "slack";
@ -93,7 +102,7 @@ export async function scheduleRestartSentinelWake(params: { deps: CliDeps }) {
accountId: origin?.accountId,
replyToId,
threadId: resolvedThreadId,
payloads: [{ text: message }],
payloads: [{ text: userMessage }],
session: outboundSession,
bestEffort: true,
});
@ -102,18 +111,22 @@ export async function scheduleRestartSentinelWake(params: { deps: CliDeps }) {
// If it does throw (channel plugin/runtime error before best-effort handling is applied),
// enqueue a system event so the user receives the restart notice even on delivery failure.
// This preserves the prior behaviour where delivery errors in this path produced a fallback event.
enqueueSystemEvent(message, { sessionKey });
enqueueSystemEvent(userMessage, { sessionKey });
}
// Step 2: trigger an agent resume turn so the agent can continue autonomously
// after restart. The model sees the restart context and can respond/take actions.
// internalContext is injected via extraSystemPrompt so the agent has full technical
// details (kind, status, note, doctorHint) without exposing raw diagnostics as a
// user-visible chat message. The agent's reply is what the user ultimately sees.
// This is safe post-restart: scheduleRestartSentinelWake() runs in the new process
// with zero in-flight replies, so the pre-restart race condition (ab4a08a82) does
// not apply here.
try {
await agentCommand(
{
message,
message: userMessage,
extraSystemPrompt: internalContext,
sessionKey,
to: resolved.to,
channel,

View File

@ -5,8 +5,9 @@ import { afterEach, beforeEach, describe, expect, it } from "vitest";
import { captureEnv } from "../test-utils/env.js";
import {
consumeRestartSentinel,
formatDoctorNonInteractiveHint,
formatRestartSentinelInternalContext,
formatRestartSentinelMessage,
formatRestartSentinelUserMessage,
readRestartSentinel,
resolveRestartSentinelPath,
summarizeRestartSentinel,
@ -160,6 +161,112 @@ describe("restart sentinel", () => {
});
});
describe("formatRestartSentinelUserMessage", () => {
it("returns note for successful restart with note", () => {
const payload = {
kind: "config-patch" as const,
status: "ok" as const,
ts: Date.now(),
message: "testing restart sentinel",
doctorHint: "Run: openclaw doctor --non-interactive",
};
const result = formatRestartSentinelUserMessage(payload);
expect(result).toBe("testing restart sentinel");
expect(result).not.toContain("Gateway restart");
expect(result).not.toContain("config-patch");
expect(result).not.toContain("doctor");
});
it("returns generic success message when no note", () => {
const payload = {
kind: "update" as const,
status: "ok" as const,
ts: Date.now(),
};
expect(formatRestartSentinelUserMessage(payload)).toBe("Gateway restarted successfully.");
});
it("returns failure message with note for error status", () => {
const payload = {
kind: "config-apply" as const,
status: "error" as const,
ts: Date.now(),
message: "disk full",
};
const result = formatRestartSentinelUserMessage(payload);
expect(result).toBe("Gateway restart failed: disk full");
});
it("returns generic failure message for error without note", () => {
const payload = {
kind: "restart" as const,
status: "error" as const,
ts: Date.now(),
};
expect(formatRestartSentinelUserMessage(payload)).toBe("Gateway restart failed.");
});
it("never includes doctorHint", () => {
const payload = {
kind: "config-patch" as const,
status: "ok" as const,
ts: Date.now(),
message: "applied config",
doctorHint: "Run: openclaw doctor --non-interactive",
};
expect(formatRestartSentinelUserMessage(payload)).not.toContain("doctor");
expect(formatRestartSentinelUserMessage(payload)).not.toContain("openclaw");
});
});
describe("formatRestartSentinelInternalContext", () => {
it("includes kind, status, note, and doctorHint", () => {
const payload = {
kind: "config-patch" as const,
status: "ok" as const,
ts: Date.now(),
message: "testing restart sentinel",
doctorHint: "Run: openclaw doctor --non-interactive",
stats: { mode: "gateway.config-patch", reason: "discovery.mdns.mode changed" },
};
const result = formatRestartSentinelInternalContext(payload);
expect(result).toContain("kind: config-patch");
expect(result).toContain("status: ok");
expect(result).toContain("note: testing restart sentinel");
expect(result).toContain("hint: Run: openclaw doctor");
expect(result).toContain("mode: gateway.config-patch");
expect(result).toContain("reason: discovery.mdns.mode changed");
expect(result).toContain("internal");
});
it("omits empty optional fields", () => {
const payload = {
kind: "restart" as const,
status: "ok" as const,
ts: Date.now(),
};
const result = formatRestartSentinelInternalContext(payload);
expect(result).not.toContain("note:");
expect(result).not.toContain("hint:");
expect(result).not.toContain("reason:");
expect(result).not.toContain("mode:");
});
it("omits reason when it duplicates note", () => {
const note = "Applying config changes";
const payload = {
kind: "config-apply" as const,
status: "ok" as const,
ts: Date.now(),
message: note,
stats: { reason: note },
};
const result = formatRestartSentinelInternalContext(payload);
const noteOccurrences = result.split(note).length - 1;
expect(noteOccurrences).toBe(1);
});
});
describe("restart sentinel message dedup", () => {
it("omits duplicate Reason: line when stats.reason matches message", () => {
const payload = {

View File

@ -127,6 +127,47 @@ export function formatRestartSentinelMessage(payload: RestartSentinelPayload): s
return lines.join("\n");
}
/**
* Human-friendly message for direct user delivery after a gateway restart.
* Omits raw diagnostic fields (status prefix, doctorHint) those belong in
* the agent's internal context, not in the user-facing chat message.
*/
export function formatRestartSentinelUserMessage(payload: RestartSentinelPayload): string {
const note = payload.message?.trim();
if (payload.status === "error") {
return note ? `Gateway restart failed: ${note}` : "Gateway restart failed.";
}
return note ?? "Gateway restarted successfully.";
}
/**
* Full technical restart context injected into the agent's system prompt so
* it can reason about and respond to the restart without exposing raw
* diagnostic text directly in the user-facing chat message.
*/
export function formatRestartSentinelInternalContext(payload: RestartSentinelPayload): string {
const lines: string[] = [
"[Gateway restart context — internal, do not surface raw details to user]",
`kind: ${payload.kind}`,
`status: ${payload.status}`,
];
const note = payload.message?.trim();
if (note) {
lines.push(`note: ${note}`);
}
const reason = payload.stats?.reason?.trim();
if (reason && reason !== note) {
lines.push(`reason: ${reason}`);
}
if (payload.stats?.mode?.trim()) {
lines.push(`mode: ${payload.stats.mode.trim()}`);
}
if (payload.doctorHint?.trim()) {
lines.push(`hint: ${payload.doctorHint.trim()}`);
}
return lines.join("\n");
}
export function summarizeRestartSentinel(payload: RestartSentinelPayload): string {
const kind = payload.kind;
const status = payload.status;