fix: drain inbound debounce buffers and followup queues before SIGUSR1 reload
When config.patch triggers a SIGUSR1 restart, two in-memory message
buffers were silently wiped:
1. Per-channel inbound debounce buffers (closure-local Map + setTimeout)
2. Followup queues (global Map of pending session messages)
This caused inbound messages received during the debounce window to be
permanently lost on config-triggered gateway restarts.
Fix:
- Add a global registry of inbound debouncers so they can be flushed
collectively during restart. Each createInboundDebouncer() call now
auto-registers in a shared Symbol.for() map, with a new flushAll()
method that immediately processes all buffered items.
- Add flushAllInboundDebouncers() which iterates the global registry
and forces all debounce timers to fire immediately.
- Add waitForFollowupQueueDrain() which polls the FOLLOWUP_QUEUES map
until all queues finish processing (or timeout).
- Hook both into the SIGUSR1 restart flow in run-loop.ts: before
markGatewayDraining(), flush all debouncers first (pushing buffered
messages into the followup queues), then wait up to 5s for the
followup drain loops to process them.
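The registry mechanism described above can be sketched roughly as follows. The names `createInboundDebouncer`, `flushAllInboundDebouncers`, and the `Symbol.for()` key come from the description, but the internals shown here (a `Set` stashed on `globalThis`, a plain string buffer, the debounce timer itself omitted for brevity, and clearing the registry after a collective flush) are illustrative assumptions, not the actual implementation:

```typescript
// Sketch only: internal shapes are assumed, not taken from the real module.
type Debouncer = { flushAll: () => number };

// Symbol.for() keys the registry in the global symbol registry, so every
// copy of the module (even across duplicated bundles) sees the same Set.
const REGISTRY_KEY = Symbol.for("gateway.inboundDebouncers");

function getRegistry(): Set<Debouncer> {
  const g = globalThis as unknown as Record<symbol, Set<Debouncer> | undefined>;
  if (!g[REGISTRY_KEY]) {
    g[REGISTRY_KEY] = new Set<Debouncer>();
  }
  return g[REGISTRY_KEY]!;
}

// Each debouncer auto-registers on creation. flushAll() delivers every
// buffered item immediately and reports how many it flushed. (The real
// implementation also manages a setTimeout per key, omitted here.)
function createInboundDebouncer(onFlush: (items: string[]) => void): {
  push: (item: string) => void;
  flushAll: () => number;
} {
  const buffer: string[] = [];
  const debouncer = {
    push: (item: string) => {
      buffer.push(item);
    },
    flushAll: () => {
      const count = buffer.length;
      if (count > 0) onFlush(buffer.splice(0));
      return count;
    },
  };
  getRegistry().add(debouncer);
  return debouncer;
}

// Restart-time helper: force every registered debouncer to flush, then
// deregister them all. Returns how many debouncers had buffered messages,
// which lets the caller skip the followup-drain wait when nothing moved.
function flushAllInboundDebouncers(): number {
  const registry = getRegistry();
  let flushed = 0;
  for (const d of registry) {
    if (d.flushAll() > 0) flushed++;
  }
  registry.clear();
  return flushed;
}
```

Keying the registry on `Symbol.for()` rather than a module-local variable is what makes the flush collective: closure-local maps are invisible to the restart path, which is exactly the bug being fixed.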
The ordering is critical: flush debouncers → wait for followup drain →
then mark draining. This ensures messages that were mid-debounce get
delivered to sessions before the gateway reinitializes.
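Under the same assumptions, the restart-time ordering might look like the minimal sketch below. The function name `onSigusr1Restart` and the dependency-injection shape are hypothetical; `markGatewayDraining`, the 5s timeout, the skip-when-nothing-flushed shortcut, and the timeout warning come from the description above:

```typescript
// Illustrative ordering of the SIGUSR1 restart hook; helper signatures are
// assumed from the commit description, not verified against run-loop.ts.
async function onSigusr1Restart(deps: {
  flushAllInboundDebouncers: () => number;
  waitForFollowupQueueDrain: (
    timeoutMs: number,
  ) => Promise<{ drained: boolean; remaining: number }>;
  markGatewayDraining: () => void;
  warn: (msg: string) => void;
}): Promise<void> {
  // 1. Push any mid-debounce messages into the followup queues.
  const flushed = deps.flushAllInboundDebouncers();

  // 2. Only wait for the drain loops if the flush actually enqueued work.
  if (flushed > 0) {
    const result = await deps.waitForFollowupQueueDrain(5_000);
    if (!result.drained) {
      deps.warn(`followup drain timed out with ${result.remaining} pending`);
    }
  }

  // 3. Only now stop accepting new work and let the gateway reinitialize.
  deps.markGatewayDraining();
}
```

Inverting steps 1 and 3 would reintroduce the bug: once the gateway is marked draining, the flushed messages would have nowhere to go.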
Tests:
- flushAllInboundDebouncers: flushes multiple registered debouncers,
returns count, deregisters after flush
- createInboundDebouncer.flushAll: flushes all keys in a single debouncer
- waitForFollowupQueueDrain: immediate return when empty, waits for
drain, returns not-drained on timeout, counts draining queues
- run-loop: SIGUSR1 calls flush before markGatewayDraining, skips
followup wait when no debouncers had buffered messages, logs warning
on followup drain timeout
2026-03-14 11:54:01 -04:00
import { FOLLOWUP_QUEUES } from "./state.js";

/**
 * Wait for all followup queues to finish draining, up to `timeoutMs`.
 * Returns `{ drained: true }` if all queues are empty, or `{ drained: false }`
 * if the timeout was reached with items still pending.
 *
 * Called during SIGUSR1 restart after flushing inbound debouncers, so the
 * newly enqueued items have time to be processed before the server tears down.
 */
export async function waitForFollowupQueueDrain(
  timeoutMs: number,
): Promise<{ drained: boolean; remaining: number }> {
  const deadline = Date.now() + timeoutMs;
  const POLL_INTERVAL_MS = 50;

  const getPendingCount = (): number => {
    let total = 0;
    for (const queue of FOLLOWUP_QUEUES.values()) {
      // Add 1 for the in-flight item owned by an active drain loop.
      const queuePending = queue.items.length + (queue.draining ? 1 : 0);
      total += queuePending;
    }
    return total;
  };

  let remaining = getPendingCount();
  if (remaining === 0) {
    return { drained: true, remaining: 0 };
  }

  while (Date.now() < deadline) {
    await new Promise<void>((resolve) => {
      const timer = setTimeout(resolve, Math.min(POLL_INTERVAL_MS, deadline - Date.now()));
      timer.unref?.();
    });
    remaining = getPendingCount();
    if (remaining === 0) {
      return { drained: true, remaining: 0 };
    }
  }

  return { drained: false, remaining };
}