69 Commits

Author SHA1 Message Date
Simone Macario
09d5f508b1
fix(cron): persist delivered flag in job state to surface delivery failures (openclaw#19174) thanks @simonemacario
Verified:
- pnpm build
- pnpm check
- pnpm test:macmini

Co-authored-by: simonemacario <2116609+simonemacario@users.noreply.github.com>
Co-authored-by: Tak Hoffman <781889+Takhoffman@users.noreply.github.com>
2026-02-21 12:47:29 -06:00
Tak Hoffman
7417c36268
fix(cron): honor maxConcurrentRuns in timer loop (openclaw#22413) thanks @Takhoffman
Verified:
- pnpm install --frozen-lockfile
- pnpm build
- pnpm check
- pnpm test:macmini (failed on unrelated baseline test: src/memory/qmd-manager.test.ts > throws when sqlite index is busy)

Co-authored-by: Takhoffman <781889+Takhoffman@users.noreply.github.com>
Co-authored-by: Tak Hoffman <781889+Takhoffman@users.noreply.github.com>
2026-02-20 22:31:58 -06:00
Peter Steinberger
b8b43175c5 style: align formatting with oxfmt 0.33 2026-02-18 01:34:35 +00:00
Peter Steinberger
31f9be126c style: run oxfmt and fix gate failures 2026-02-18 01:29:02 +00:00
Peter Steinberger
6dcc052bb4 fix: stabilize model catalog and pi discovery auth storage compatibility 2026-02-18 02:09:40 +01:00
Peter Steinberger
dd4eb8bf63 fix(cron): retry next-second schedule compute on undefined 2026-02-17 23:48:14 +01:00
Peter Steinberger
c26cf6aa83 feat(cron): add default stagger controls for scheduled jobs 2026-02-17 23:48:14 +01:00
Tyler Yust
75001a0490 fix cron announce routing and timeout handling 2026-02-17 11:40:04 -08:00
cpojer
d0cb8c19b2
chore: wtf. 2026-02-17 13:36:48 +09:00
Sebastian
ed11e93cf2 chore(format) 2026-02-16 23:20:16 -05:00
Vignesh Natarajan
f988abf202 Cron: route reminders by session namespace 2026-02-17 01:54:59 +01:00
cpojer
c70597daeb
chore: Fix formatting. 2026-02-17 09:40:00 +09:00
Peter Steinberger
80c7d04ad2 refactor(cron): reuse shared run outcome telemetry types 2026-02-17 00:32:34 +00:00
cpojer
90ef2d6bdf
chore: Update formatting. 2026-02-17 09:18:40 +09:00
Marcus Widing
8af4712c40 fix(cron): prevent spin loop when job completes within scheduled second (#17821)
When a cron job fires and completes within the same wall-clock second it
was scheduled for, the next-run computation could return undefined or the
same second, causing the scheduler to re-trigger the job hundreds of
times in a tight loop.

Two-layer fix:

1. computeJobNextRunAtMs: When computeNextRunAtMs returns undefined for a
   cron-kind schedule (edge case where floored nowSecondMs matches the
   schedule), retry with the ceiling (next second) as reference time.
   This ensures we always get the next valid occurrence.

2. applyJobResult: Add MIN_REFIRE_GAP_MS (2s) safety net for cron-kind
   jobs.  After a successful run, nextRunAtMs is guaranteed to be at
   least 2s in the future.  This breaks any remaining spin-loop edge
   cases without affecting normal daily/hourly schedules (where the
   natural next run is hours/days away).

Fixes #17821
2026-02-16 23:59:44 +01:00
Rob Dunn
ddea5458d0 cron: log model+token usage per run + add usage report script 2026-02-16 23:58:38 +01:00
Peter Steinberger
b991919755 refactor(cron): dedupe next-run recompute paths 2026-02-16 17:06:40 +00:00
pierreeurope
fec4be8dec
fix(cron): prevent daily jobs from skipping days (48h jump) #17852 (#17903)
Merged via /review-pr -> /prepare-pr -> /merge-pr.

Prepared head SHA: 1ffe6a45afac27fd5b0a8c4fd087f7a8fadd1143
Co-authored-by: pierreeurope <248892285+pierreeurope@users.noreply.github.com>
Co-authored-by: sebslight <19554889+sebslight@users.noreply.github.com>
Reviewed-by: @sebslight
2026-02-16 08:35:49 -05:00
Advait Paliwal
bc67af6ad8
cron: separate webhook POST delivery from announce (#17901)
* cron: split webhook delivery from announce mode

* cron: validate webhook delivery target

* cron: remove legacy webhook fallback config

* fix: finalize cron webhook delivery prep (#17901) (thanks @advaitpaliwal)

---------

Co-authored-by: Tyler Yust <TYTYYUST@YAHOO.COM>
2026-02-16 02:36:00 -08:00
Peter Steinberger
17e5a5015c perf: avoid async cron timer callbacks 2026-02-16 02:45:00 +00:00
Peter Steinberger
5b2cb8ba11 refactor(cron): dedupe finished event emit 2026-02-16 01:37:03 +00:00
Peter Steinberger
a73e7786e7 refactor(cron): share runnable job filter 2026-02-16 01:29:01 +00:00
Peter Steinberger
2679089e9e refactor(cron): dedupe next-run recompute loop 2026-02-16 01:27:40 +00:00
Peter Steinberger
c95a61aa9d refactor(cron): dedupe read-only load flow 2026-02-16 01:26:37 +00:00
Advait Paliwal
115cfb4430
gateway: add cron finished-run webhook (#14535)
* gateway: add cron finished webhook delivery

* config: allow cron webhook in runtime schema

* cron: require notify flag for webhook posts

* ui/docs: add cron notify toggle and webhook docs

* fix: harden cron webhook auth and fill notify coverage (#14535) (thanks @advaitpaliwal)

---------

Co-authored-by: Tyler Yust <TYTYYUST@YAHOO.COM>
2026-02-15 16:14:17 -08:00
Peter Steinberger
b3ef3fca75 refactor(cron): share legacy delivery helpers 2026-02-15 17:29:08 +00:00
Gustavo Madeira Santana
88caa4b50c chore(cron): simplify enabled checks for lint 2026-02-15 10:30:19 -05:00
Alejandro Santander
9a344da298
fix(cron): treat missing enabled as true in update() (openclaw#15477) thanks @eternauta1337
Verified:
- pnpm exec vitest src/cron/service.issue-regressions.test.ts

Co-authored-by: eternauta1337 <550409+eternauta1337@users.noreply.github.com>
Co-authored-by: Tak Hoffman <781889+Takhoffman@users.noreply.github.com>
2026-02-15 08:52:02 -06:00
Vignesh Natarajan
4c4d2558e3 fix (heartbeat/cron): preserve cron prompts for tagged interval events 2026-02-14 19:46:31 -08:00
Vignesh Natarajan
7b89e68d18 fix (cron): skip startup replay for interrupted running jobs 2026-02-14 19:06:37 -08:00
Peter Steinberger
20dea3cdb1 perf(cron): make wakeMode now busy-wait configurable 2026-02-14 23:51:47 +00:00
Peter Steinberger
6b400eca5c refactor(cron): share job tick state normalization 2026-02-14 21:44:30 +00:00
zerone0x
c60844931b
fix(cron): prevent list/status from silently skipping recurring jobs (openclaw#16201) thanks @zerone0x
Verified:
- pnpm install --frozen-lockfile
- pnpm build
- pnpm check
- pnpm test:macmini

Co-authored-by: zerone0x <39543393+zerone0x@users.noreply.github.com>
Co-authored-by: Tak Hoffman <781889+Takhoffman@users.noreply.github.com>
2026-02-14 13:33:29 -06:00
青雲
80407cbc6a
fix: recompute all cron next-run times after job update (openclaw#15905) thanks @echoVic
Verified:
- pnpm check
- pnpm vitest src/cron/service.issue-regressions.test.ts src/cron/service.issue-13992-regression.test.ts

Co-authored-by: echoVic <16428813+echoVic@users.noreply.github.com>
Co-authored-by: Tak Hoffman <781889+Takhoffman@users.noreply.github.com>
2026-02-14 12:37:22 -06:00
Peter Steinberger
45a2cd55cc fix: harden isolated cron announce delivery fallback (#15739) (thanks @widingmarcus-cyber) 2026-02-13 23:49:10 +01:00
Marcus Widing
ea95e88dd6 fix(cron): prevent duplicate delivery for isolated jobs with announce mode
When an isolated cron job delivers its output via deliverOutboundPayloads
or the subagent announce flow, the finish handler in executeJobCore
unconditionally posts a summary to the main agent session and wakes it
via requestHeartbeatNow. The main agent then generates a second response
that is also delivered to the target channel, resulting in duplicate
messages with different content.

Add a `delivered` flag to RunCronAgentTurnResult that is set to true
when the isolated run successfully delivers its output. In executeJobCore,
skip the enqueueSystemEvent + requestHeartbeatNow call when the flag is
set, preventing the main agent from waking up and double-posting.

Fixes #15692
2026-02-13 23:49:10 +01:00
Sebastian
d31caa81ef fix(runtime): guard cleanup and preserve skipped cron jobs 2026-02-12 09:28:47 -05:00
niceysam
f7e05d0136
fix: exclude maxTokens from config redaction + honor deleteAfterRun on skipped cron jobs (#13342)
* fix: exclude maxTokens and token-count fields from config redaction

The /token/i regex in SENSITIVE_KEY_PATTERNS falsely matched fields like
maxTokens, maxOutputTokens, maxCompletionTokens etc. These are numeric
config fields for token counts, not sensitive credentials.

Added a whitelist (SENSITIVE_KEY_WHITELIST) that explicitly excludes
known token-count field names from redaction. This prevents config
corruption when maxTokens gets replaced with __OPENCLAW_REDACTED__
during config round-trips.

Fixes #13236

* fix: honor deleteAfterRun for one-shot 'at' jobs with 'skipped' status

Previously, deleteAfterRun only triggered when result.status was 'ok'.
For one-shot 'at' jobs, a 'skipped' status (e.g. empty heartbeat file)
would leave the job in state but disabled, never getting cleaned up.

Now deleteAfterRun also triggers on 'skipped' status for 'at' jobs,
since a skipped one-shot job has no meaningful retry path.

Fixes #13249

* Cron: format timer.ts

---------

Co-authored-by: nice03 <niceyslee@gmail.com>
Co-authored-by: Tak Hoffman <781889+Takhoffman@users.noreply.github.com>
2026-02-12 07:55:05 -06:00
WalterSumbon
39e3d58fe1
fix: prevent cron jobs from skipping execution when nextRunAtMs advances (#14068)
Co-authored-by: Tak Hoffman <781889+Takhoffman@users.noreply.github.com>
2026-02-11 23:33:22 -06:00
大猫子
a88ea42ec7
fix(cron): prevent one-shot at jobs from re-firing on restart after skip/error (#13845) (#13878)
Co-authored-by: Tak Hoffman <781889+Takhoffman@users.noreply.github.com>
2026-02-11 23:33:15 -06:00
石川 諒
04e3a66f90
fix(cron): pass agentId to runHeartbeatOnce for main-session jobs (#14140)
* fix(cron): pass agentId to runHeartbeatOnce for main-session jobs

Main-session cron jobs with agentId always ran the heartbeat under
the default agent, ignoring the job's agent binding. enqueueSystemEvent
correctly routed the system event to the bound agent's session, but
runHeartbeatOnce was called without agentId, so the heartbeat ran under
the default agent and never picked up the event.

Thread agentId from job.agentId through the CronServiceDeps type,
timer execution, and the gateway wrapper so heartbeat-runner uses the
correct agent.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* cron: add heartbeat agentId propagation regression test (#14140) (thanks @ishikawa-pro)

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Tak Hoffman <781889+Takhoffman@users.noreply.github.com>
2026-02-11 22:22:29 -06:00
MarvinDontPanic
04f695e562
fix(cron): isolate schedule errors to prevent one bad job from breaking all jobs (#14385)
Previously, if one cron job had a malformed schedule expression (e.g. invalid cron syntax),
the error would propagate up and break the entire scheduler loop. This meant one misconfigured
job could prevent ALL cron jobs from running.

Changes:
- Wrap per-job schedule computation in try/catch in recomputeNextRuns()
- Track consecutive schedule errors via new scheduleErrorCount field
- Log warnings for schedule errors with job ID and name
- Auto-disable jobs after 3 consecutive schedule errors (with error-level log)
- Clear error count when schedule computation succeeds
- Continue processing other jobs even when one fails

This ensures the scheduler is resilient to individual job misconfigurations while still
providing visibility into problems through logging.

Co-authored-by: Marvin <numegilagent@gmail.com>
2026-02-11 22:17:07 -06:00
Tom Ron
ace5e33cee
fix(cron): re-arm timer when onTimer fires during active job execution (#14233)
* fix(cron): re-arm timer when onTimer fires during active job execution

When a cron job takes longer than MAX_TIMER_DELAY_MS (60s), the clamped
timer fires while state.running is still true.  The early return in
onTimer() previously exited without re-arming the timer, leaving no
setTimeout scheduled.  This silently kills the cron scheduler until the
next gateway restart.

The fix calls armTimer(state) before the early return so the scheduler
continues ticking even when a job is in progress.

This is the likely root cause of recurring cron jobs silently skipping,
as reported in #12025.  One-shot (kind: 'at') jobs were unaffected
because they typically complete within a single timer cycle.

Includes a regression test that simulates a slow job exceeding the
timer clamp period and verifies the next occurrence still fires.

* fix: update tests for timer re-arm behavior

- Update existing regression test to expect timer re-arm with non-zero
  delay instead of no timer at all
- Simplify new test to directly verify state.timer is set after onTimer
  returns early due to running guard

* fix: use fixed 60s delay for re-arm to prevent zero-delay hot-loop

When the running guard re-arms the timer, use MAX_TIMER_DELAY_MS
directly instead of calling armTimer() which can compute a zero delay
for past-due jobs.  This prevents a tight spin while still keeping the
scheduler alive.

* style: add curly braces to satisfy eslint(curly) rule
2026-02-11 22:13:27 -06:00
Gustavo Madeira Santana
e19a23520c
fix: unify session maintenance and cron run pruning (#13083)
* fix: prune stale session entries, cap entry count, and rotate sessions.json

The sessions.json file grows unbounded over time. Every heartbeat tick (default: 30m)
triggers multiple full rewrites, and session keys from groups, threads, and DMs
accumulate indefinitely with large embedded objects (skillsSnapshot,
systemPromptReport). At >50MB the synchronous JSON parse blocks the event loop,
causing Telegram webhook timeouts and effectively taking the bot down.

Three mitigations, all running inside saveSessionStoreUnlocked() on every write:

1. Prune stale entries: remove entries with updatedAt older than 30 days
   (configurable via session.maintenance.pruneDays in openclaw.json)

2. Cap entry count: keep only the 500 most recently updated entries
   (configurable via session.maintenance.maxEntries). Entries without updatedAt
   are evicted first.

3. File rotation: if the existing sessions.json exceeds 10MB before a write,
   rename it to sessions.json.bak.{timestamp} and keep only the 3 most recent
   backups (configurable via session.maintenance.rotateBytes).

All three thresholds are configurable under session.maintenance in openclaw.json
with Zod validation. No env vars.

Existing tests updated to use Date.now() instead of epoch-relative timestamps
(1, 2, 3) that would be incorrectly pruned as stale.

27 new tests covering pruning, capping, rotation, and integration scenarios.

* feat: auto-prune expired cron run sessions (#12289)

Add TTL-based reaper for isolated cron run sessions that accumulate
indefinitely in sessions.json.

New config option:
  cron.sessionRetention: string | false  (default: '24h')

The reaper runs piggy-backed on the cron timer tick, self-throttled
to sweep at most every 5 minutes. It removes session entries matching
the pattern cron:<jobId>:run:<uuid> whose updatedAt + retention < now.

Design follows the Kubernetes ttlSecondsAfterFinished pattern:
- Sessions are persisted normally (observability/debugging)
- A periodic reaper prunes expired entries
- Configurable retention with sensible default
- Set to false to disable pruning entirely

Files changed:
- src/config/types.cron.ts: Add sessionRetention to CronConfig
- src/config/zod-schema.ts: Add Zod validation for sessionRetention
- src/cron/session-reaper.ts: New reaper module (sweepCronRunSessions)
- src/cron/session-reaper.test.ts: 12 tests covering all paths
- src/cron/service/state.ts: Add cronConfig/sessionStorePath to deps
- src/cron/service/timer.ts: Wire reaper into onTimer tick
- src/gateway/server-cron.ts: Pass config and session store path to deps

Closes #12289

* fix: sweep cron session stores per agent

* docs: add changelog for session maintenance (#13083) (thanks @skyfallsin, @Glucksberg)

* fix: add warn-only session maintenance mode

* fix: warn-only maintenance defaults to active session

* fix: deliver maintenance warnings to active session

* docs: add session maintenance examples

* fix: accept duration and size maintenance thresholds

* refactor: share cron run session key check

* fix: format issues and replace defaultRuntime.warn with console.warn

---------

Co-authored-by: Pradeep Elankumaran <pradeepe@gmail.com>
Co-authored-by: Glucksberg <markuscontasul@gmail.com>
Co-authored-by: max <40643627+quotentiroler@users.noreply.github.com>
Co-authored-by: quotentiroler <max.nussbaumer@maxhealth.tech>
2026-02-09 20:42:35 -08:00
Tyler Yust
8fae55e8e0
fix(cron): share isolated announce flow + harden cron scheduling/delivery (#11641)
* fix(cron): comprehensive cron scheduling and delivery fixes

- Fix delivery target resolution for isolated agent cron jobs
- Improve schedule parsing and validation
- Add job retry logic and error handling
- Enhance cron ops with better state management
- Add timer improvements for more reliable cron execution
- Add cron event type to protocol schema
- Support cron events in heartbeat runner (skip empty-heartbeat check,
  use dedicated CRON_EVENT_PROMPT for relay)

* fix: remove cron debug test and add changelog/docs notes (#11641) (thanks @tyler6204)
2026-02-07 19:46:01 -08:00
Tyler Yust
d90cac990c
fix: cron scheduler reliability, store hardening, and UX improvements (#10776)
* refactor: update cron job wake mode and run mode handling

- Changed default wake mode from 'next-heartbeat' to 'now' in CronJobEditor and related CLI commands.
- Updated cron-tool tests to reflect changes in run mode, introducing 'due' and 'force' options.
- Enhanced cron-tool logic to handle new run modes and ensure compatibility with existing job structures.
- Added new tests for delivery plan consistency and job execution behavior under various conditions.
- Improved normalization functions to handle wake mode and session target casing.

This refactor aims to streamline cron job configurations and enhance the overall user experience with clearer defaults and improved functionality.

* test: enhance cron job functionality and UI

- Added tests to ensure the isolated agent correctly announces the final payload text when delivering messages via Telegram.
- Implemented a new function to pick the last deliverable payload from a list of delivery payloads.
- Enhanced the cron service to maintain legacy "every" jobs while minute cron jobs recompute schedules.
- Updated the cron store migration tests to verify the addition of anchorMs to legacy every schedules.
- Improved the UI for displaying cron job details, including job state and delivery information, with new styles and layout adjustments.

These changes aim to improve the reliability and user experience of the cron job system.

* test: enhance sessions thinking level handling

- Added tests to verify that the correct thinking levels are applied during session spawning.
- Updated the sessions-spawn-tool to include a new parameter for overriding thinking levels.
- Enhanced the UI to support additional thinking levels, including "xhigh" and "full", and improved the handling of current options in dropdowns.

These changes aim to improve the flexibility and accuracy of thinking level configurations in session management.

* feat: enhance session management and cron job functionality

- Introduced passthrough arguments in the test-parallel script to allow for flexible command-line options.
- Updated session handling to hide cron run alias session keys from the sessions list, improving clarity.
- Enhanced the cron service to accurately record job start times and durations, ensuring better tracking of job execution.
- Added tests to verify the correct behavior of the cron service under various conditions, including zero-delay timers.

These changes aim to improve the usability and reliability of session and cron job management.

* feat: implement job running state checks in cron service

- Added functionality to prevent manual job runs if a job is already in progress, enhancing job management.
- Updated the `isJobDue` function to include checks for running jobs, ensuring accurate scheduling.
- Enhanced the `run` function to return a specific reason when a job is already running.
- Introduced a new test case to verify the behavior of forced manual runs during active job execution.

These changes aim to improve the reliability and clarity of cron job execution and management.

* feat: add session ID and key to CronRunLogEntry model

- Introduced `sessionid` and `sessionkey` properties to the `CronRunLogEntry` struct for enhanced tracking of session-related information.
- Updated the initializer and Codable conformance to accommodate the new properties, ensuring proper serialization and deserialization.

These changes aim to improve the granularity of logging and session management within the cron job system.

* fix: improve session display name resolution

- Updated the `resolveSessionDisplayName` function to ensure that both label and displayName are trimmed and default to an empty string if not present.
- Enhanced the logic to prevent returning the key if it matches the label or displayName, improving clarity in session naming.

These changes aim to enhance the accuracy and usability of session display names in the UI.

* perf: skip cron store persist when idle timer tick produces no changes

recomputeNextRuns now returns a boolean indicating whether any job
state was mutated. The idle path in onTimer only persists when the
return value is true, eliminating unnecessary file writes every 60s
for far-future or idle schedules.

* fix: prep for merge - explicit delivery mode migration, docs + changelog (#10776) (thanks @tyler6204)
2026-02-06 18:03:03 -08:00
fujiwara-tofu-shop
b0befb5f5d
fix(cron): handle legacy atMs field in schedule when computing next run (#9932)
* fix(cron): handle legacy atMs field in schedule when computing next run

The cron scheduler only checked for `schedule.at` (string) but legacy jobs
may have `schedule.atMs` (number) from before the schema migration.

This caused nextRunAtMs to stay null because:
1. Store migration runs on load but may not persist immediately
2. Race conditions or file mtime issues can skip migration
3. computeJobNextRunAtMs/computeNextRunAtMs only checked `at`, not `atMs`

Fix: Make both functions defensive by checking `atMs` first (number),
then `atMs` (string, for edge cases), then falling back to `at` (string).

This ensures jobs fire correctly even if:
- Migration hasn't run yet
- Old data was written by a previous version
- The store was manually edited

Fixes #9930

* fix: validate numeric atMs to prevent NaN/Infinity propagation

Addresses review feedback - numeric atMs values are now validated with
Number.isFinite() && atMs > 0 before use. This prevents corrupted or
manually edited stores from causing hot timer loops via setTimeout(..., NaN).
2026-02-05 15:49:03 -08:00
Maksym Brashchenko
40e23b05f7
fix(cron): re-arm timer in finally to survive transient errors (#9948) 2026-02-05 15:46:59 -08:00
Igor Markelov
313e2f2e85
fix(cron): prevent recomputeNextRuns from skipping due jobs in onTimer (#9823)
* fix(cron): prevent recomputeNextRuns from skipping due jobs in onTimer

ensureLoaded(forceReload) called recomputeNextRuns before runDueJobs,
which recalculated nextRunAtMs to a strictly future time. Since
setTimeout always fires a few ms late, the due check (now >= nextRunAtMs)
always failed and every/cron jobs never executed. Fixes #9788.

* docs: add changelog entry for cron timer race fix (#9823) (thanks @pycckuu)

---------

Co-authored-by: Tyler Yust <TYTYYUST@YAHOO.COM>
2026-02-05 15:43:37 -08:00
Tyler Yust
821520a057
fix cron scheduling and reminder delivery regressions (#9733)
* fix(cron): prevent timer from allowing process exit (fixes #9694)

The cron timer was using .unref(), which caused the Node.js event
loop to exit or sleep if no other handles were active. This prevented
cron jobs from firing in some environments.

* fix(cron): infer delivery target for isolated jobs (fixes #9683)

When creating isolated agentTurn jobs (e.g. reminders) without explicit
delivery options, the job would default to 'announce' but fail to
resolve the target conversation. Now, we infer the channel and
recipient from the agent's current session key.

* fix(cron): enhance delivery inference for threaded sessions and null inputs (#9733)

Improves the delivery inference logic in the cron tool to correctly handle threaded session keys and cases where delivery is explicitly set to null. This ensures that the appropriate delivery mode and target are inferred based on the agent's session key, enhancing the reliability of job execution.

* fix: preserve telegram topic delivery inference (#9733) (thanks @tyler6204)

* fix: simplify cron delivery merge spread (#9733) (thanks @tyler6204)
2026-02-05 13:08:41 -08:00