Files
nanoclaw/docs/v1-vs-v2/task-scheduler.md
T
gavrielc 47950671fa docs: add v1→v2 action-items analysis + SDK signal probe tool
- docs/v1-vs-v2/: full v1→v2 regression analysis (SUMMARY + 21 per-module
  docs + ACTION-ITEMS rollup with decisions + timezone recreation spec).
- container/agent-runner/scripts/sdk-signal-probe.ts: empirical harness
  used to characterise Claude Agent SDK event/hook/stderr timing for the
  stuck-detection design in item 9.
- src/channels/chat-sdk-bridge.ts: document the conversations Map staleness
  in a code comment; fix deferred to when dynamic group registration lands
  (ACTION-ITEMS item 17).

No runtime behavior change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 01:00:04 +03:00

10 KiB
Raw Blame History

task-scheduler: v1 vs v2

Scope

v1 task scheduler:

  • Files: src/v1/task-scheduler.ts (241 lines), src/v1/task-scheduler.test.ts (122 lines)
  • Self-contained scheduler loop with DB persistence and container execution
  • Stores tasks in central DB table scheduled_tasks
  • Runs a polling loop at SCHEDULER_POLL_INTERVAL (configurable, typically 560s)

v2 task distribution:

  • No central task-scheduler file; tasks spread across host sweep and session DBs
  • Core files: src/host-sweep.ts (174 lines), src/delivery.ts (task handlers ~line 654713), src/db/session-db.ts (task mutation logic)
  • Optional: container/agent-runner/src/task-script.ts (pre-task script execution)
  • Task rows live in per-session inbound.db table messages_in (polymorphic message kind)
  • Recurrence computed in host-sweep.ts (host-sweep.ts:159173)

Capability map

v1 Behavior v2 Location Status Notes
One-shot tasks (schedule_type='once') insertTask() in src/db/session-db.ts:103122; processAfter field set, recurrence=NULL Supported Task inserted into messages_in with process_after timestamp, processed once, no recurrence
Recurring via cron (schedule_type='cron') insertTask() with recurrence field; host-sweep.ts:159173 parses cron Supported Cron expression stored in messages_in.recurrence, next occurrence computed on completion via CronExpressionParser
Recurring via fixed interval (schedule_type='interval') Not directly supported; v2 uses cron for all recurring ⚠️ Removed v2 requires cron syntax for recurrence. No interval-based scheduling (e.g., "every 5 minutes") without converting to cron
Timezone handling host-sweep.ts:159161 uses CronExpressionParser with no explicit TZ param; cron-parser respects system TZ ⚠️ Degraded v1's explicit TIMEZONE config (via timezone.ts helpers) is absent in v2. Cron evaluation uses system/Node.js default TZ, not agent/session-level configuration
Persistence Per-session inbound.db messages_in table + series_id grouping Supported Tasks persisted as DB rows with status (pending/completed/paused). Series_id backfilled for recurring task groups
Restart recovery host-sweep.ts:8596 syncs processing_ack on startup to detect stale containers; tasks marked paused if container crashes Supported Stale container detection via heartbeat file mtime (host-sweep.ts:122131); stuck messages retried with exponential backoff
Due-message wake host-sweep.ts:9196 queries countDueMessages, wakes container if due tasks exist Supported 60s sweep checks for pending tasks with process_after in the past and wakes container if found
Missed-run catch-up (interval-based) computeNextRun() skips past missed intervals to prevent cumulative drift; tests verify no infinite loop ⚠️ Degraded v2 doesn't handle missed intervals — if a recurring cron task gets skipped, next occurrence is computed from completion time only. No "make up" for missed runs
Cancellation updateTask(id, {status: 'paused'}) prevents retry churn Supported cancelTask() in src/db/session-db.ts:128132 sets status='completed' and clears recurrence; matches by id OR series_id
Pause/resume updateTask(id, {status: 'paused'}) / resume Supported pauseTask() (line 134138) and resumeTask() (line 140144); both match id or series_id
Retry-on-failure updateTaskAfterRun() on error; no explicit retry logic in scheduler loop ⚠️ Degraded v2 uses retryWithBackoff() only when container goes stale (host-sweep.ts:147). No automatic retry for task execution errors
Concurrent-run prevention Task status 'active' gate (task-scheduler.ts:221); no concurrent-run logic ⚠️ Degraded v2 allows multiple pending tasks to wake the container in the same sweep; container processes serially but no host-level concurrency control
Idempotency Task ID is primary key; insertTask() will fail if re-run with same ID Supported messages_in.id is PRIMARY KEY; insertTask() fails on duplicate (caller must handle or use ON CONFLICT)
Max-age drop No explicit max-age field; tasks can remain pending indefinitely ⚠️ Missing No max-age or TTL in v2 messages_in schema. A stuck task can remain pending forever unless manually cancelled
Task context mode (group vs isolated session) v1: context_mode field drives session reuse (task-scheduler.ts:122) ⚠️ Removed v2 doesn't track context_mode; all tasks are processed in the container's default session context; no isolation toggle
Task result logging logTaskRun() writes to task_runs table; stores error + result summary ⚠️ Degraded v2 has no equivalent task_runs table. Task output is written as system messages back to the agent; no persistent audit trail
Task script execution v1: prompt + optional script field, passed to container Supported v2: applyPreTaskScripts() in container/agent-runner/src/task-script.ts:79121 runs scripts pre-prompt, enriches prompt with scriptOutput

Missing from v2

  1. Interval-based recurrence — v1 schedule_type='interval' (e.g., "every 5000ms") is gone. v2 only supports cron expressions. Workaround: convert to equivalent cron (e.g., */5 * * * * * for every 5 min).

  2. Timezone awareness — v1 passed TIMEZONE config to cron parser and had explicit formatLocalTime() helpers. v2 has no way to specify a session/agent timezone for cron evaluation; it uses the system/Node.js TZ.

  3. Task context modes — v1's context_mode: 'group' | 'isolated' is removed. No way to force a task into a dedicated session vs. the agent group's shared session.

  4. Task result audit trail — v1 logged every run to task_runs(task_id, run_at, duration_ms, status, result, error). v2 has no persistent task execution history; output is a system message only.

  5. Max-age / task TTL — v1 tasks could be implicitly aged out (not directly visible in the code, but conceivable via cleanup logic). v2 has no TTL; a paused/completed task lingers in messages_in forever.

  6. Task-level concurrency control — v1 prevented concurrent runs of the same task (single status check per loop iteration). v2 can queue multiple pending tasks in one sweep, though the container processes them serially.


Behavioral discrepancies

  1. Missed-interval catch-up (v1 computeNextRun() lines 3246 vs. v2 absence):

    • v1: If a task is due at 10:00, 10:05, 10:10 but the scheduler is down during 10:0010:15, it computes next_run = 10:20 (skips missed intervals, stays on the grid).
    • v2: If the same recurring cron task is skipped, the next occurrence is computed from the completion time (host-sweep.ts:160161), not from the original grid. A task that should run at :00 and :05 every 10 minutes might drift if completions are delayed.
  2. Stale-container recovery (v1 none vs. v2 heartbeat-based):

    • v1: Tasks remain due if the container crashes; the scheduler will retry on the next poll.
    • v2: If the heartbeat goes stale (container unresponsive for 10 min), stuck processing messages are retried with exponential backoff. Tasks stuck in 'processing' state are reset.
  3. Task script pre-processing (v1 prompt + script → container vs. v2 script → output enrichment):

    • v1: Passes script alongside prompt to container; container execution model unclear from scheduler.ts (likely runs in group-queue).
    • v2: Host runs script before waking container; script output (scriptOutput) is merged into prompt JSON via applyPreTaskScripts() (task-script.ts:115117). If script fails or returns wakeAgent=false, the task is skipped entirely.
  4. Retry semantics:

    • v1: On execution error (runTask throws), updateTaskAfterRun() is called with error. Next retry relies on scheduler polling the same task again (no backoff).
    • v2: Execution errors are not retried; container processes the task once. If the container crashes mid-task, the message is retried with exponential backoff only up to MAX_TRIES=5 (host-sweep.ts:145150).

Worth preserving?

Interval-based recurrence (v1 schedule_type='interval') is a practical feature that v2 trades away. Cron syntax is powerful but less intuitive for simple "every X milliseconds" patterns. If users want "run every 30 seconds," they must learn cron (*/30 * * * * * for seconds doesn't exist in standard cron; workaround is job-level looping in the prompt). Consider a thin adapter layer in agent-facing APIs to accept {interval: 5000} and convert to cron, or extend the v2 schema to support an optional interval_ms alongside recurrence.

Task context modes (group vs. isolated) were a way to isolate task execution context. v2's removal simplifies the model but loses the ability to run a task in a fresh container state. If a task needs a clean slate (no session history), that's now impossible; workaround is a manual system-action to clear session state before running the task.

Task result audit trail is a gap for operational visibility. v2's system messages are ephemeral; there's no way to query "how many times did task X run and what were the outcomes?" Adding a lightweight task_execution_log table (optional, populated on task completion) would help without burdening the common case.


References by line

  • v1 task-scheduler: src/v1/task-scheduler.ts:2049 (computeNextRun), :203235 (startSchedulerLoop)
  • v1 test coverage: src/v1/task-scheduler.test.ts:49121 (drift, missed-interval, once-task tests)
  • v1 timezone: src/v1/timezone.ts:2637 (formatLocalTime with explicit TZ)
  • v1 types: src/v1/types.ts:6074 (ScheduledTask interface with context_mode)
  • v2 sweep: src/host-sweep.ts:154173 (handleRecurrence, insertRecurrence)
  • v2 delivery system actions: src/delivery.ts:645713 (handleSystemAction switch on schedule_task/cancel_task/pause_task/resume_task/update_task)
  • v2 session-db: src/db/session-db.ts:103198 (insertTask, cancelTask, pauseTask, resumeTask, updateTask, all with series_id matching)
  • v2 task-script: container/agent-runner/src/task-script.ts:79121 (applyPreTaskScripts, wakeAgent logic)
  • v2 DB schema: docs/db-session.md:3156 (messages_in table with process_after, recurrence, series_id)