mirror of
https://github.com/qwibitai/nanoclaw.git
synced 2026-06-12 18:11:51 +08:00
47950671fa
- docs/v1-vs-v2/: full v1→v2 regression analysis (SUMMARY + 21 per-module docs + ACTION-ITEMS rollup with decisions + timezone recreation spec). - container/agent-runner/scripts/sdk-signal-probe.ts: empirical harness used to characterise Claude Agent SDK event/hook/stderr timing for the stuck-detection design in item 9. - src/channels/chat-sdk-bridge.ts: document the conversations Map staleness in a code comment; fix deferred to when dynamic group registration lands (ACTION-ITEMS item 17). No runtime behavior change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
49 lines
4.4 KiB
Markdown
49 lines
4.4 KiB
Markdown
# group-queue: v1 vs v2
|
|
|
|
## Scope
|
|
- v1: `src/v1/group-queue.ts` (325 LOC), `group-queue.test.ts` (457 LOC) — in-memory per-group state machine, IPC-file dispatch
|
|
- v2: **no equivalent class**. Serialization is now DB-based and distributed across `src/session-manager.ts`, `src/host-sweep.ts`, `src/container-runner.ts`, `src/delivery.ts`
|
|
|
|
## Capability map
|
|
|
|
| v1 behavior | v2 location | Status | Notes |
|
|
|---|---|---|---|
|
|
| Per-group message queue | `inbound.db.messages_in` + `status='pending'` | replaced | Atomic status transitions serialize work per-session |
|
|
| Per-group task queue | `inbound.db.messages_in` with `kind='task'` | replaced | Same table; `kind` discriminates |
|
|
| `MAX_CONCURRENT_CONTAINERS` global cap | `container-runner.ts:42-52` `activeContainers` Map + `wakeContainer` dedup | kept | Enforced at spawn |
|
|
| One container per group invariant | One container per **session** | redefined | Session is identity unit now |
|
|
| Task-before-message priority (`drainGroup`) | `host-sweep.ts` recurrence + `delivery.ts` active poll | **partially lost** | No priority; polled by `process_after` timestamp ordering |
|
|
| Exponential retry backoff | `host-sweep.ts:145-147` `BACKOFF_BASE_MS * 2^tries` | kept | Max 5 tries, same shape |
|
|
| Idle preemption (`notifyIdle`/`closeStdin`) | heartbeat file mtime | **removed** | No interrupt signal — container polls continuously |
|
|
| Message dispatch to active container (`sendMessage`) | Write to `messages_in` table | replaced | Host writes; container polls |
|
|
| Cascading drain on task arrival | `delivery.ts` (~1s) + `host-sweep.ts` (~60s) polls | **async-ized** | Work discovery on next tick, not synchronous |
|
|
| Shutdown without kill | containers continue under `--rm` | similar | Host shutdown does not stop containers |
|
|
| Task dedup (`pendingTasks.some(t => t.id === id)`) | PK on `messages_in.id` | partial | Unique ID prevents DB duplicates; does not prevent two distinct rows with same series_id |
|
|
| `drainWaiting` (waiting-group fairness) | Implicit: any session can wake if slot free | async | No explicit fairness |
|
|
|
|
## Serialization model diff
|
|
**v1 (push-based):** `GroupState` in memory per group: `active`, `pendingMessages`, `pendingTasks`, `idleWaiting`, `runningTaskId`. `drainGroup()` synchronously dispatches. IPC file write signals container readiness. State lost on restart.
|
|
|
|
**v2 (pull-based via DB):** `messages_in.status` is the queue (`pending` → `processing` → `completed`/`failed`). Host writes rows + calls `wakeContainer()`; container polls + atomic UPDATE to take work. One writer per DB file (host→inbound, container→outbound) eliminates cross-mount contention. Heartbeat file mtime replaces IPC for liveness. State persisted; survives crashes.
|
|
|
|
## Missing from v2
|
|
1. **Idle-state preemption** — v1 could interrupt an idle container on task arrival via `closeStdin`. v2 has no interrupt; container finishes current work and polls again
|
|
2. **Synchronous drain cascade** — v1's `drainGroup` immediately ran the next item; v2 discovers it on the next poll tick (~1s active, ~60s sweep)
|
|
3. **In-memory task dedup** — v1 checked pending-task list before enqueue. v2 can have two task rows with the same series_id coexisting (both pending) — relies on atomic `status` update for single-execution, best-effort
|
|
4. **Priority ordering** — v1 tasks preempted messages; v2 is timestamp-ordered only
|
|
|
|
## Behavioral discrepancies
|
|
| Aspect | v1 | v2 |
|
|
|---|----|----|
|
|
| Wake trigger | on enqueue (sync) | on `wakeContainer()` call, or poll finding due message |
|
|
| Idle timeout | implicit via IPC | explicit heartbeat mtime (10 min) |
|
|
| Task ordering | FIFO within group, tasks preempt messages | `process_after` timestamp; ties by insert seq |
|
|
| Retry | host `scheduleRetry()` | host sweep detects stale, increments `tries`, sets backoff |
|
|
| Concurrency cap | same | same (enforced in `spawnContainer` dedup) |
|
|
|
|
## Worth preserving?
|
|
1. **Explicit task dedup** — add `(kind, series_id, session_id)` unique index on `messages_in`, or dedup in `host-sweep.ts` before inserting retry rows. Currently best-effort via atomic status update
|
|
2. **Priority ordering** — add a `priority` column or document the ~1s task-wake latency as the SLA
|
|
3. **Idle preemption** — not critical; 1s polling is acceptable for most workflows
|
|
4. **Fairness** — v1's `drainWaiting` ensured no group starved. v2 is fair by timestamp but untested under concurrent load. Monitor in production
|