sort -V is a GNU extension; BSD sort on macOS does not support it. When
node is absent from PATH and the nvm fallback runs, set -euo pipefail
causes the unsupported flag to abort the hook before lint/format can
run, blocking commits on macOS.
Replace the sort -V | tail -1 pipeline with a Bash for-loop that
zero-pads each semver component to five digits and emits a tab-delimited
key+path line. Plain sort + tail -1 + cut then selects the highest
semantic version — no GNU-only flags required.
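The loop can be sketched as follows. This is a minimal illustration, not the hook itself: `pick_newest` is an invented helper that reads bare version strings on stdin, whereas the real script pairs each key with its install path.

```shell
# pick_newest: read vMAJOR.MINOR.PATCH strings on stdin and print the
# semantically newest one without GNU `sort -V`. Each component is
# zero-padded to five digits so plain lexicographic sort orders correctly.
pick_newest() {
  while IFS= read -r v; do
    ver=${v#v}             # strip the leading "v"
    maj=${ver%%.*}         # 22.4.1 -> 22
    rest=${ver#*.}         # 22.4.1 -> 4.1
    min=${rest%%.*}        # 4.1    -> 4
    pat=${rest#*.}         # 4.1    -> 1
    printf '%05d%05d%05d\t%s\n' "$maj" "$min" "$pat" "$v"
  done | sort | tail -1 | cut -f2
}
```

With v18.20.4 and v22.4.1 on stdin this prints v22.4.1 under both GNU and BSD sort, since only POSIX `sort`, `tail`, and `cut` are used.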
Smoke-tested with v18 vs v22 paths; v22 is correctly selected on both
GNU and BSD sort.
appendRecord previously called fs.writeFile(token-usage.json, …) directly.
A process crash or SIGKILL during that write can leave the file truncated;
readJsonArray then throws (SyntaxError), and since attempt.ts swallows the
error with .catch(), that one interrupted write silently disables all future
token logging for the workspace until the file is manually repaired.
Fix: write the new content to a uniquely-named sibling temp file first, then
call fs.rename() to atomically replace the real file. rename(2) is atomic on
POSIX when src and dst share the same directory/filesystem, so readers always
see either the old complete file or the new complete file — never a partial
write. The temp file is unlinked on error to avoid leaving orphans.
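A minimal sketch of the pattern, assuming Node's fs/promises API; `writeFileAtomic` is an illustrative name, not the actual appendRecord code:

```typescript
import { randomBytes } from "node:crypto";
import { promises as fs } from "node:fs";
import * as path from "node:path";

// Write data to a uniquely named temp file in the same directory, then
// rename() it over the target. rename(2) is atomic within one filesystem,
// so readers never observe a partially written file.
async function writeFileAtomic(file: string, data: string): Promise<void> {
  const tmp = path.join(
    path.dirname(file),
    `.${path.basename(file)}.${randomBytes(6).toString("hex")}.tmp`,
  );
  try {
    await fs.writeFile(tmp, data, "utf8");
    await fs.rename(tmp, file);
  } catch (err) {
    await fs.unlink(tmp).catch(() => {}); // no orphaned temp files on failure
    throw err;
  }
}
```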
A process killed or crashed after creating token-usage.json.lock but
before the finally-unlink runs leaves a permanent stale lock. All
subsequent recordTokenUsage calls for that workspace time out and drop
their entries.
Fix:
- Write the holder's PID into the lock file on acquisition (O_EXCL + writeFile).
- On each EEXIST retry, call isLockStale() which reads the PID and sends
signal 0 (kill(pid, 0)) to check liveness without delivering a signal.
ESRCH means the process is gone → lock is stale; any other result
(alive, EPERM, unreadable file) is treated as live so we never break a
legitimately held lock.
- If stale, unlink and continue to the next O_EXCL attempt; multiple
concurrent waiters racing on the steal are safe because only one O_EXCL
open succeeds.
- Recovery is immediate (no need to wait for LOCK_TIMEOUT_MS).
Add a test that spawns a subprocess, waits for it to exit, writes its
dead PID into the lock file, and asserts recordTokenUsage succeeds and
cleans up the lock.
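A sketch of the liveness check under these rules. The helper name comes from the description above; the exact read/parse handling is an assumption, and the O_EXCL retry loop around it is elided.

```typescript
import { promises as fs } from "node:fs";

// Returns true only when the lock file names a PID that provably no longer
// exists (kill(pid, 0) fails with ESRCH). Anything ambiguous (holder alive,
// EPERM, unreadable or garbled file) is treated as a live lock.
async function isLockStale(lockPath: string): Promise<boolean> {
  let pid: number;
  try {
    pid = Number.parseInt(await fs.readFile(lockPath, "utf8"), 10);
  } catch {
    return false; // cannot read the holder's PID: assume live
  }
  if (!Number.isInteger(pid) || pid <= 0) return false;
  try {
    process.kill(pid, 0); // signal 0: existence probe, delivers no signal
    return false; // holder is alive
  } catch (err) {
    return (err as NodeJS.ErrnoException).code === "ESRCH";
  }
}
```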
Unconditionally unlinking the lock file after LOCK_TIMEOUT_MS is unsafe:
the holder may legitimately still be running (slow disk, large usage file),
so removing its lock breaks mutual exclusion and allows concurrent
read-modify-write cycles to overwrite each other's entries.
Remove the stale-lock-removal path entirely and throw ERR_LOCK_TIMEOUT
instead. Callers already swallow the error via .catch() in the write queue,
so the only effect is that the write is skipped rather than risking data
loss through a race.
After the retry loop timed out, withFileLock unconditionally deleted the
lock file and called fn() without reacquiring the lock. If multiple
waiters timed out concurrently they would all enter the critical section
together, defeating the serialisation guarantee and allowing concurrent
read-modify-write cycles to overwrite each other's records.
Fix: after unlinking the stale lock, attempt one final O_EXCL open so
that exactly one concurrent waiter wins the lock and the rest receive
ERR_LOCK_TIMEOUT. The unlocked fast-path is removed entirely.
readJsonArray treated any valid JSON that is not an array as [], causing
appendRecord to overwrite the file with only the new entry — silently
deleting all prior data. This is the same data-loss mode the
malformed-JSON fix was trying to prevent.
Fix: throw ERR_UNEXPECTED_TOKEN_LOG_SHAPE when parsed JSON is not an
array so appendRecord aborts and the existing file is preserved.
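A minimal sketch of the guard. The error code string is taken from this description; attaching it via a `code` property is an assumption about the codebase's error convention.

```typescript
// Parse text that must be a JSON array; any other valid JSON shape is an
// error so the caller aborts instead of clobbering the existing file.
function parseJsonArray(text: string): unknown[] {
  const parsed: unknown = JSON.parse(text);
  if (!Array.isArray(parsed)) {
    const err = new Error("token log is valid JSON but not an array");
    (err as NodeJS.ErrnoException).code = "ERR_UNEXPECTED_TOKEN_LOG_SHAPE";
    throw err;
  }
  return parsed;
}
```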
The in-memory writeQueues Map serialises writes within one Node process,
but two concurrent OpenClaw processes sharing the same workspaceDir
(e.g. parallel CLI runs) can still race: both read the same snapshot
before either writes, and the later writer silently overwrites the
earlier entry.
Add withFileLock() — an O_EXCL advisory lock on <file>.lock — to
coordinate across processes. The per-file in-memory queue is kept to
reduce lock contention within the same process. On lock-acquire failure
the helper retries every 50 ms up to a 5 s timeout; on timeout it
removes a potentially stale lock file and makes one final attempt to
prevent permanent blocking after a crash.
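The acquire/release shape can be sketched as below. Constants follow the description; the timeout steal path (removed and reworked in later commits) is reduced to a plain throw in this sketch.

```typescript
import { promises as fs } from "node:fs";

const LOCK_RETRY_MS = 50;
const LOCK_TIMEOUT_MS = 5000;

// Acquire <file>.lock via O_EXCL (flag "wx" fails with EEXIST when the
// lock already exists), run fn, then release in a finally block.
async function withFileLock<T>(file: string, fn: () => Promise<T>): Promise<T> {
  const lockPath = `${file}.lock`;
  const deadline = Date.now() + LOCK_TIMEOUT_MS;
  for (;;) {
    try {
      await fs.writeFile(lockPath, String(process.pid), { flag: "wx" });
      break; // lock acquired
    } catch (err) {
      if ((err as NodeJS.ErrnoException).code !== "EEXIST") throw err;
      if (Date.now() >= deadline) throw new Error("lock acquisition timed out");
      await new Promise((r) => setTimeout(r, LOCK_RETRY_MS));
    }
  }
  try {
    return await fn();
  } finally {
    await fs.unlink(lockPath).catch(() => {}); // always release
  }
}
```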
pre-commit: guard the resolve-node.sh source with a file-existence
check so the hook works in test environments that stub only the files
they care about (the integration test creates run-node-tool.sh but not
resolve-node.sh; node is provided via a fake binary in PATH so the
nvm fallback is never needed in that context).
usage-log: replace Math.random() in makeId() with crypto.randomBytes()
to satisfy the temp-path-guard security lint rule that rejects weak
randomness in source files.
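A sketch of the swap; the id length is an assumption, only the randomness source is the point.

```typescript
import { randomBytes } from "node:crypto";

// CSPRNG-backed id for temp-file names; unlike Math.random(), outputs
// cannot be predicted from earlier ones.
function makeId(): string {
  return randomBytes(8).toString("hex"); // 16 hex chars
}
```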
readJsonArray previously caught all errors and returned [], so a
malformed token-usage.json (e.g. from an interrupted writeFile) caused
the next recordTokenUsage call to overwrite the file with only the new
entry, permanently erasing all prior records.
Fix: only suppress ENOENT (file not yet created). Any other error
(SyntaxError, EACCES, …) is re-thrown so appendRecord aborts and the
existing file is left intact. The write-queue slot still absorbs the
rejection via .catch() so future writes are not stalled; callers that
need to observe the failure (e.g. attempt.ts) can attach their own
.catch() handler.
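A sketch of the narrowed error handling. The function name is from the description; the array-shape check added in a separate commit is omitted here.

```typescript
import { promises as fs } from "node:fs";

// A missing file means "no records yet"; every other failure propagates so
// appendRecord aborts and the existing file stays intact.
async function readJsonArray(file: string): Promise<unknown[]> {
  let text: string;
  try {
    text = await fs.readFile(file, "utf8");
  } catch (err) {
    if ((err as NodeJS.ErrnoException).code === "ENOENT") return [];
    throw err; // EACCES, EIO, ... are real problems
  }
  return JSON.parse(text) as unknown[]; // SyntaxError propagates too
}
```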
The previous loop used Bash glob expansion (lexicographic order) and
stopped at the first match, so environments with multiple Node installs
could select an older runtime (e.g. v18 before v22).
Extract the nvm resolution into a shared scripts/pre-commit/resolve-node.sh
that pipes `ls` output through `sort -V | tail -1` to select the
semantically newest version. Both pre-commit and run-node-tool.sh now
source the shared script, eliminating the duplicated logic.
taskId was set to params.runId, the same value already stored in the
runId field, giving downstream consumers two identical fields with
different names. Remove taskId from the type and the entry constructor
to avoid confusion.
Fire-and-forget callers (attempt.ts) can trigger two concurrent
recordTokenUsage() calls for the same workspaceDir. The previous
read-modify-write pattern had no locking, so the last writer silently
overwrote the first, losing that run's entry.
Fix: keep a Map<file, Promise<void>> write queue so each write awaits
the previous one. The queue slot is replaced with a no-throw wrapper so
a failed write does not stall future writes.
Added a concurrent-write test (20 parallel calls) that asserts no
record is lost.
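A minimal sketch of the queue; `enqueueWrite` is an illustrative name, and the real code keys the Map by the resolved usage-file path.

```typescript
const writeQueues = new Map<string, Promise<void>>();

// Chain each write after the previous one for the same file. The stored
// slot is a no-throw wrapper, so one rejected write never stalls the queue;
// the unwrapped promise is returned so callers can still observe failures.
function enqueueWrite(file: string, write: () => Promise<void>): Promise<void> {
  const prev = writeQueues.get(file) ?? Promise.resolve();
  const next = prev.then(write);
  writeQueues.set(file, next.catch(() => {}));
  return next;
}
```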
When git hooks run, the shell profile is not sourced so nvm-managed
Node installations are not in PATH. This caused 'node: command not
found' errors on every commit for users relying on nvm.
Add a PATH-extension fallback in both pre-commit and run-node-tool.sh
that walks ~/.nvm/versions/node/*/bin/node and prepends the directory of
the first binary found to PATH, mirroring how nvm itself resolves the
runtime.
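The fallback can be sketched as a function; the function name and the parameterised base directory are for illustration, where the hook itself inlines this against the fixed ~/.nvm path.

```shell
# Prepend the bin dir of the first nvm-managed node found, so the hook and
# its child processes inherit it. Glob order is lexicographic; selecting
# the newest version is handled by a follow-up fix.
extend_path_with_nvm_node() {
  base=${1:-"$HOME/.nvm/versions/node"}
  for candidate in "$base"/*/bin/node; do
    if [ -x "$candidate" ]; then
      PATH="$(dirname "$candidate"):$PATH"
      export PATH
      return 0
    fi
  done
  return 1
}
```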
The recordTokenUsage function previously only persisted the aggregate tokensUsed
total, discarding the input/output breakdown that was already available via
getUsageTotals(). This meant token-usage.json had no per-record IO split,
making it impossible to analyse input vs output token costs in dashboards.
Changes:
- Add inputTokens, outputTokens, cacheReadTokens, cacheWriteTokens optional
fields to TokenUsageRecord type in usage-log.ts (new file)
- Write these fields (when non-zero) into each usage entry
- Fields are omitted (not null) when unavailable, keeping existing records valid
- Wire up recordTokenUsage() call in attempt.ts after llm_output hook
This is a purely additive change; existing consumers that only read tokensUsed
are unaffected.
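A sketch of the additive shape. The optional field names come from this change; the rest of TokenUsageRecord, the builder, and how tokensUsed is computed are assumptions for illustration.

```typescript
// Optional breakdown fields are omitted (not written as null) when
// unavailable, so pre-existing records and readers stay valid.
interface TokenUsageRecord {
  tokensUsed: number;
  inputTokens?: number;
  outputTokens?: number;
  cacheReadTokens?: number;
  cacheWriteTokens?: number;
}

function buildRecord(
  tokensUsed: number,
  io: { input: number; output: number; cacheRead: number; cacheWrite: number },
): TokenUsageRecord {
  const record: TokenUsageRecord = { tokensUsed };
  if (io.input) record.inputTokens = io.input;
  if (io.output) record.outputTokens = io.output;
  if (io.cacheRead) record.cacheReadTokens = io.cacheRead;
  if (io.cacheWrite) record.cacheWriteTokens = io.cacheWrite;
  return record;
}
```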
4 entries were added to the 2026.3.12 section after the v2026.3.12
tag was cut. Move them to ## Unreleased where they belong.
Verified: 2026.3.12 section now matches the 74 entries present at
the v2026.3.12 release tag (28d64c48e).
* fix(telegram): preserve media download transport policy
* refactor(telegram): thread media transport policy
* fix(telegram): sync fallback media policy
* fix: note telegram media transport fix (#44639)
Process messageData via handleDeltaEvent for both delta and final states
before resolving the turn, so ACP clients no longer drop the last visible
assistant text when the gateway sends the final message body on the
terminal chat event.
Closes #15377
Based on #17615
Co-authored-by: PJ Eby <3527052+pjeby@users.noreply.github.com>