mirror of
https://github.com/qwibitai/nanoclaw.git
synced 2026-06-12 18:11:51 +08:00
47950671fa
- docs/v1-vs-v2/: full v1→v2 regression analysis (SUMMARY + 21 per-module docs + ACTION-ITEMS rollup with decisions + timezone recreation spec). - container/agent-runner/scripts/sdk-signal-probe.ts: empirical harness used to characterise Claude Agent SDK event/hook/stderr timing for the stuck-detection design in item 9. - src/channels/chat-sdk-bridge.ts: document the conversations Map staleness in a code comment; fix deferred to when dynamic group registration lands (ACTION-ITEMS item 17). No runtime behavior change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
4.4 KiB
4.4 KiB
group-queue: v1 vs v2
Scope
- v1:
src/v1/group-queue.ts(325 LOC),group-queue.test.ts(457 LOC) — in-memory per-group state machine, IPC-file dispatch - v2: no equivalent class. Serialization is now DB-based and distributed across
src/session-manager.ts,src/host-sweep.ts,src/container-runner.ts,src/delivery.ts
Capability map
| v1 behavior | v2 location | Status | Notes |
|---|---|---|---|
| Per-group message queue | inbound.db.messages_in + status='pending' |
replaced | Atomic status transitions serialize work per-session |
| Per-group task queue | inbound.db.messages_in with kind='task' |
replaced | Same table; kind discriminates |
MAX_CONCURRENT_CONTAINERS global cap |
container-runner.ts:42-52 activeContainers Map + wakeContainer dedup |
kept | Enforced at spawn |
| One container per group invariant | One container per session | redefined | Session is identity unit now |
Task-before-message priority (drainGroup) |
host-sweep.ts recurrence + delivery.ts active poll |
partially lost | No priority; polled by process_after timestamp ordering |
| Exponential retry backoff | host-sweep.ts:145-147 BACKOFF_BASE_MS * 2^tries |
kept | Max 5 tries, same shape |
Idle preemption (notifyIdle/closeStdin) |
heartbeat file mtime | removed | No interrupt signal — container polls continuously |
Message dispatch to active container (sendMessage) |
Write to messages_in table |
replaced | Host writes; container polls |
| Cascading drain on task arrival | delivery.ts (~1s) + host-sweep.ts (~60s) polls |
async-ized | Work discovery on next tick, not synchronous |
| Shutdown without kill | containers continue under --rm |
similar | Host shutdown does not stop containers |
Task dedup (pendingTasks.some(t => t.id === id)) |
PK on messages_in.id |
partial | Unique ID prevents DB duplicates; does not prevent two distinct rows with same series_id |
drainWaiting (waiting-group fairness) |
Implicit: any session can wake if slot free | async | No explicit fairness |
Serialization model diff
v1 (push-based): GroupState in memory per group: active, pendingMessages, pendingTasks, idleWaiting, runningTaskId. drainGroup() synchronously dispatches. IPC file write signals container readiness. State lost on restart.
v2 (pull-based via DB): messages_in.status is the queue (pending → processing → completed/failed). Host writes rows + calls wakeContainer(); container polls + atomic UPDATE to take work. One writer per DB file (host→inbound, container→outbound) eliminates cross-mount contention. Heartbeat file mtime replaces IPC for liveness. State persisted; survives crashes.
Missing from v2
- Idle-state preemption — v1 could interrupt an idle container on task arrival via
closeStdin. v2 has no interrupt; container finishes current work and polls again - Synchronous drain cascade — v1's
drainGroupimmediately ran the next item; v2 discovers it on the next poll tick (~1s active, ~60s sweep) - In-memory task dedup — v1 checked pending-task list before enqueue. v2 can have two task rows with the same series_id coexisting (both pending) — relies on atomic
statusupdate for single-execution, best-effort - Priority ordering — v1 tasks preempted messages; v2 is timestamp-ordered only
Behavioral discrepancies
| Aspect | v1 | v2 |
|---|---|---|
| Wake trigger | on enqueue (sync) | on wakeContainer() call, or poll finding due message |
| Idle timeout | implicit via IPC | explicit heartbeat mtime (10 min) |
| Task ordering | FIFO within group, tasks preempt messages | process_after timestamp; ties by insert seq |
| Retry | host scheduleRetry() |
host sweep detects stale, increments tries, sets backoff |
| Concurrency cap | same | same (enforced in spawnContainer dedup) |
Worth preserving?
- Explicit task dedup — add
(kind, series_id, session_id)unique index onmessages_in, or dedup inhost-sweep.tsbefore inserting retry rows. Currently best-effort via atomic status update - Priority ordering — add a
prioritycolumn or document the ~1s task-wake latency as the SLA - Idle preemption — not critical; 1s polling is acceptable for most workflows
- Fairness — v1's
drainWaitingensured no group starved. v2 is fair by timestamp but untested under concurrent load. Monitor in production