When a cron job fires and completes within the same wall-clock second it
was scheduled for, the next-run computation could return undefined or the
same second, causing the scheduler to re-trigger the job hundreds of
times in a tight loop.
Two-layer fix:
1. computeJobNextRunAtMs: When computeNextRunAtMs returns undefined for a
cron-kind schedule (edge case where floored nowSecondMs matches the
schedule), retry with the ceiling (next second) as reference time.
This ensures we always get the next valid occurrence.
2. applyJobResult: Add MIN_REFIRE_GAP_MS (2s) safety net for cron-kind
jobs. After a successful run, nextRunAtMs is guaranteed to be at
least 2s in the future. This breaks any remaining spin-loop edge
cases without affecting normal daily/hourly schedules (where the
natural next run is hours/days away).
Fixes#17821
When an isolated cron job delivers its output via deliverOutboundPayloads
or the subagent announce flow, the finish handler in executeJobCore
unconditionally posts a summary to the main agent session and wakes it
via requestHeartbeatNow. The main agent then generates a second response
that is also delivered to the target channel, resulting in duplicate
messages with different content.
Add a `delivered` flag to RunCronAgentTurnResult that is set to true
when the isolated run successfully delivers its output. In executeJobCore,
skip the enqueueSystemEvent + requestHeartbeatNow call when the flag is
set, preventing the main agent from waking up and double-posting.
Fixes#15692
* fix: exclude maxTokens and token-count fields from config redaction
The /token/i regex in SENSITIVE_KEY_PATTERNS falsely matched fields like
maxTokens, maxOutputTokens, maxCompletionTokens etc. These are numeric
config fields for token counts, not sensitive credentials.
Added a whitelist (SENSITIVE_KEY_WHITELIST) that explicitly excludes
known token-count field names from redaction. This prevents config
corruption when maxTokens gets replaced with __OPENCLAW_REDACTED__
during config round-trips.
Fixes#13236
* fix: honor deleteAfterRun for one-shot 'at' jobs with 'skipped' status
Previously, deleteAfterRun only triggered when result.status was 'ok'.
For one-shot 'at' jobs, a 'skipped' status (e.g. empty heartbeat file)
would leave the job in state but disabled, never getting cleaned up.
Now deleteAfterRun also triggers on 'skipped' status for 'at' jobs,
since a skipped one-shot job has no meaningful retry path.
Fixes#13249
* Cron: format timer.ts
---------
Co-authored-by: nice03 <niceyslee@gmail.com>
Co-authored-by: Tak Hoffman <781889+Takhoffman@users.noreply.github.com>
* fix(cron): pass agentId to runHeartbeatOnce for main-session jobs
Main-session cron jobs with agentId always ran the heartbeat under
the default agent, ignoring the job's agent binding. enqueueSystemEvent
correctly routed the system event to the bound agent's session, but
runHeartbeatOnce was called without agentId, so the heartbeat ran under
the default agent and never picked up the event.
Thread agentId from job.agentId through the CronServiceDeps type,
timer execution, and the gateway wrapper so heartbeat-runner uses the
correct agent.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* cron: add heartbeat agentId propagation regression test (#14140) (thanks @ishikawa-pro)
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Tak Hoffman <781889+Takhoffman@users.noreply.github.com>
Previously, if one cron job had a malformed schedule expression (e.g. invalid cron syntax),
the error would propagate up and break the entire scheduler loop. This meant one misconfigured
job could prevent ALL cron jobs from running.
Changes:
- Wrap per-job schedule computation in try/catch in recomputeNextRuns()
- Track consecutive schedule errors via new scheduleErrorCount field
- Log warnings for schedule errors with job ID and name
- Auto-disable jobs after 3 consecutive schedule errors (with error-level log)
- Clear error count when schedule computation succeeds
- Continue processing other jobs even when one fails
This ensures the scheduler is resilient to individual job misconfigurations while still
providing visibility into problems through logging.
Co-authored-by: Marvin <numegilagent@gmail.com>