mirror of https://github.com/qwibitai/nanoclaw.git synced 2026-06-12 18:11:51 +08:00

Files

T

gavrielc 47950671fa docs: add v1→v2 action-items analysis + SDK signal probe tool

- docs/v1-vs-v2/: full v1→v2 regression analysis (SUMMARY + 21 per-module
  docs + ACTION-ITEMS rollup with decisions + timezone recreation spec).
- container/agent-runner/scripts/sdk-signal-probe.ts: empirical harness
  used to characterise Claude Agent SDK event/hook/stderr timing for the
  stuck-detection design in item 9.
- src/channels/chat-sdk-bridge.ts: document the conversations Map staleness
  in a code comment; fix deferred to when dynamic group registration lands
  (ACTION-ITEMS item 17).

No runtime behavior change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-04-20 01:00:04 +03:00

4.4 KiB

Raw Blame History

group-queue: v1 vs v2

Scope

v1: src/v1/group-queue.ts (325 LOC), group-queue.test.ts (457 LOC) — in-memory per-group state machine, IPC-file dispatch
v2: no equivalent class. Serialization is now DB-based and distributed across src/session-manager.ts, src/host-sweep.ts, src/container-runner.ts, src/delivery.ts

Capability map

v1 behavior	v2 location	Status	Notes
Per-group message queue	`inbound.db.messages_in` + `status='pending'`	replaced	Atomic status transitions serialize work per-session
Per-group task queue	`inbound.db.messages_in` with `kind='task'`	replaced	Same table; `kind` discriminates
`MAX_CONCURRENT_CONTAINERS` global cap	`container-runner.ts:42-52` `activeContainers` Map + `wakeContainer` dedup	kept	Enforced at spawn
One container per group invariant	One container per session	redefined	Session is identity unit now
Task-before-message priority (`drainGroup`)	`host-sweep.ts` recurrence + `delivery.ts` active poll	partially lost	No priority; polled by `process_after` timestamp ordering
Exponential retry backoff	`host-sweep.ts:145-147` `BACKOFF_BASE_MS * 2^tries`	kept	Max 5 tries, same shape
Idle preemption (`notifyIdle`/`closeStdin`)	heartbeat file mtime	removed	No interrupt signal — container polls continuously
Message dispatch to active container (`sendMessage`)	Write to `messages_in` table	replaced	Host writes; container polls
Cascading drain on task arrival	`delivery.ts` (~1s) + `host-sweep.ts` (~60s) polls	async-ized	Work discovery on next tick, not synchronous
Shutdown without kill	containers continue under `--rm`	similar	Host shutdown does not stop containers
Task dedup (`pendingTasks.some(t => t.id === id)`)	PK on `messages_in.id`	partial	Unique ID prevents DB duplicates; does not prevent two distinct rows with same series_id
`drainWaiting` (waiting-group fairness)	Implicit: any session can wake if slot free	async	No explicit fairness

Serialization model diff

v1 (push-based): GroupState in memory per group: active, pendingMessages, pendingTasks, idleWaiting, runningTaskId. drainGroup() synchronously dispatches. IPC file write signals container readiness. State lost on restart.

v2 (pull-based via DB): messages_in.status is the queue (pending → processing → completed/failed). Host writes rows + calls wakeContainer(); container polls + atomic UPDATE to take work. One writer per DB file (host→inbound, container→outbound) eliminates cross-mount contention. Heartbeat file mtime replaces IPC for liveness. State persisted; survives crashes.

Missing from v2

Idle-state preemption — v1 could interrupt an idle container on task arrival via closeStdin. v2 has no interrupt; container finishes current work and polls again
Synchronous drain cascade — v1's drainGroup immediately ran the next item; v2 discovers it on the next poll tick (~1s active, ~60s sweep)
In-memory task dedup — v1 checked pending-task list before enqueue. v2 can have two task rows with the same series_id coexisting (both pending) — relies on atomic status update for single-execution, best-effort
Priority ordering — v1 tasks preempted messages; v2 is timestamp-ordered only

Behavioral discrepancies

Aspect	v1	v2
Wake trigger	on enqueue (sync)	on `wakeContainer()` call, or poll finding due message
Idle timeout	implicit via IPC	explicit heartbeat mtime (10 min)
Task ordering	FIFO within group, tasks preempt messages	`process_after` timestamp; ties by insert seq
Retry	host `scheduleRetry()`	host sweep detects stale, increments `tries`, sets backoff
Concurrency cap	same	same (enforced in `spawnContainer` dedup)

Worth preserving?

Explicit task dedup — add (kind, series_id, session_id) unique index on messages_in, or dedup in host-sweep.ts before inserting retry rows. Currently best-effort via atomic status update
Priority ordering — add a priority column or document the ~1s task-wake latency as the SLA
Idle preemption — not critical; 1s polling is acceptable for most workflows
Fairness — v1's drainWaiting ensured no group starved. v2 is fair by timestamp but untested under concurrent load. Monitor in production

4.4 KiB Raw Blame History