# container-runner: v1 vs v2
## Scope

- v1: `src/v1/container-runner.ts` (677 LOC) + `container-runner.test.ts` (204 LOC). Covers spawn, IPC plumbing, stdin/stdout JSON, process supervision, and output-marker parsing.
- v2: `src/container-runner.ts` (405 LOC) + `src/container-config.ts` (114 LOC) + `src/session-manager.ts` (DB paths). `container-runner.ts` itself shrinks by ~272 LOC (677 → 405) by eliminating IPC and output parsing.
## Capability map

| v1 behavior | v2 location | Status | Notes |
|---|---|---|---|
| Image selection | `container-runner.ts:348-349` | kept | Reads `imageTag` from `container.json` or env |
| Env injection | `container-runner.ts:266-284` | changed | Replaced IPC vars with `SESSION_INBOUND/OUTBOUND_DB_PATH`, `SESSION_HEARTBEAT_PATH`, `AGENT_PROVIDER`, `NANOCLAW_*` admin IDs |
| Volume mounts | `container-runner.ts:200-252` | changed | Removed per-group IPC dir; added session folder `/workspace` + agent group `/workspace/agent` |
| Mount validation | `container-runner.ts:240-244` | kept | Validates `additionalMounts` from `container.json` |
| Provider integration | `container-runner.ts:184-198` | new | `resolveProviderContribution()` wires provider host-side configs |
| stdin/stdout IPC | n/a | removed | v1 lines 318-387; v2 uses DB polling only; `stdio=['ignore','pipe','pipe']` |
| Process spawn | `container-runner.ts:119` | kept | |
| OneCLI `ensureAgent` + `applyContainerConfig` | `container-runner.ts:301-313` | enhanced | v2 calls `ensureAgent` first |
| Admin ID injection | `container-runner.ts:289-295` | new | Queries `getOwners`/`getGlobalAdmins`/`getAdminsOfAgentGroup` at wake |
| Idle timeout | `container-runner.ts:135-140` | changed | v2 uses a `resetIdle()` callback on the `activeContainers` entry, settable by `delivery.ts` |
| Timeout logic | n/a | removed | v1 had a configurable per-group timeout that reset on output markers |
| Output parsing | n/a | removed | v1 parsed `---NANOCLAW_OUTPUT_START/END---` from stdout; v2 ignores stdout |
| Streaming output callback | n/a | removed | v1 had `onOutput()` for real-time delivery |
| Per-exit log file | n/a | removed | v1 wrote `groups/<folder>/logs/container-*.log` with full I/O; v2 only logs stderr to `logger.debug` |
| Graceful SIGTERM→SIGKILL | n/a | simplified | v2 just calls `stopContainer()` |
| Concurrent wake dedup | `container-runner.ts:44-82` | new | `wakePromises` Map prevents a race on spawn |
| Per-group image builds | `container-runner.ts:357-405` | new | `buildAgentGroupImage()` writes `imageTag` |
| Session folder init | `container-runner.ts:210` | new | `initGroupFilesystem()` at spawn |
| Heartbeat file `/workspace/.heartbeat` | `session-manager.ts` | new | File touch replaces IPC liveness |
| Task/group JSON snapshots (`current_tasks.json`, `available_groups.json`) | n/a | removed | v2 pushes this data into inbound.db via `writeDestinations`/`writeSessionRouting` |
| Container name | `container-runner.ts:103` | changed | `nanoclaw-v2-${folder}-${Date.now()}` |
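The concurrent-wake dedup row relies on the in-flight-promise pattern. The sketch below shows that pattern in isolation. It assumes the Map is keyed by a string identifier and that spawning returns a promise. `wakeOnce`, `SpawnFn`, and the key choice are hypothetical and are not the actual v2 code at `container-runner.ts:44-82`.

```typescript
// Hypothetical spawn signature: resolves to the started container's name.
type SpawnFn = (key: string) => Promise<string>;

// In-flight wakes, keyed by whatever identifies one container (assumed: a string id).
const wakePromises = new Map<string, Promise<string>>();

/**
 * Start a container for `key` unless a wake for the same key is already
 * running. In that case, return the existing promise so concurrent callers
 * share one spawn instead of racing.
 */
function wakeOnce(key: string, spawn: SpawnFn): Promise<string> {
  const inFlight = wakePromises.get(key);
  if (inFlight !== undefined) {
    return inFlight;
  }
  const pending = spawn(key).finally(() => {
    // Drop the entry once settled, on success or failure, so a later wake can spawn again.
    wakePromises.delete(key);
  });
  wakePromises.set(key, pending);
  return pending;
}
```

The entry is cleared in `finally` rather than only on success. A failed spawn therefore does not wedge the key, and the next wake simply retries.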
## Missing from v2

- Streaming output markers: `---NANOCLAW_OUTPUT_START/END---` enabled delivery before the run completed; v2 must wait for the outbound.db poll to deliver results.
- Configurable per-group timeout: the `group.containerConfig.timeout` override is gone; all groups share `IDLE_TIMEOUT`.
- Per-exit detailed logs: v1 wrote timestamped logs with full I/O, mounts, stderr, and stdout, which were invaluable for post-mortems.
- Graceful-stop sentinel: v1 sent SIGTERM and waited for the `_close` marker before sending SIGKILL.
- JSON snapshots for tasks/groups: `current_tasks.json` / `available_groups.json` in the group IPC dir.
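Pre-completion delivery with v1-style markers amounts to incremental parsing of stdout. The sketch below assumes two things: the markers expand to `---NANOCLAW_OUTPUT_START---` and `---NANOCLAW_OUTPUT_END---`, and each payload is a JSON string. `OutputMarkerParser` is a hypothetical name and is not v1's implementation. The detail that matters is buffering across chunk boundaries, because a marker can be split between two stdout `data` events.

```typescript
// Assumed expansion of the v1 `---NANOCLAW_OUTPUT_START/END---` markers.
const OUTPUT_START_MARKER = "---NANOCLAW_OUTPUT_START---";
const OUTPUT_END_MARKER = "---NANOCLAW_OUTPUT_END---";

/** Incrementally extracts marker-delimited payloads from a stdout stream. */
class OutputMarkerParser {
  private buffer = "";

  /** Feed one stdout chunk; returns every payload completed by it, in order. */
  push(chunk: string): string[] {
    this.buffer += chunk;
    const payloads: string[] = [];
    for (;;) {
      const start = this.buffer.indexOf(OUTPUT_START_MARKER);
      if (start === -1) {
        // No start marker yet: keep only a tail long enough to hold a split marker.
        const keep = OUTPUT_START_MARKER.length - 1;
        this.buffer = this.buffer.slice(Math.max(0, this.buffer.length - keep));
        break;
      }
      const bodyStart = start + OUTPUT_START_MARKER.length;
      const end = this.buffer.indexOf(OUTPUT_END_MARKER, bodyStart);
      if (end === -1) {
        // Payload still open: discard noise before the start marker and wait for more data.
        this.buffer = this.buffer.slice(start);
        break;
      }
      payloads.push(this.buffer.slice(bodyStart, end).trim());
      this.buffer = this.buffer.slice(end + OUTPUT_END_MARKER.length);
    }
    return payloads;
  }
}
```

Each returned payload would then be parsed and passed to something like v1's `onOutput()` callback. This is the hook a hybrid v2 could use for early delivery, alongside the outbound.db poll.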
## Behavioral discrepancies

- Async result model: v1 `runContainerAgent()` returned a `Promise<ContainerOutput>` carrying the result inline; v2 `wakeContainer()` is fire-and-forget, and results arrive asynchronously via the delivery poll.
- No stdin: v1 wrote the full `ContainerInput` JSON to stdin; the v2 container reads everything from inbound.db.
- Admin injection at wake: v2 queries admins fresh on every spawn (`NANOCLAW_ADMIN_USER_IDS`).
- Destination routing timing: v2 calls `writeDestinations()` + `writeSessionRouting()` on every wake, so routing changes apply without a restart.
- Session lifecycle: v1 created a session per spawn; v2 resolves the session via the router before waking the container.
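Wake-time admin injection can be sketched as a pure function that rebuilds the container environment on every spawn. Several details are assumptions for illustration:

- the query signatures and the `WakeTarget` shape;
- splitting `SESSION_INBOUND/OUTBOUND_DB_PATH` into two variables;
- the comma-separated encoding of `NANOCLAW_ADMIN_USER_IDS`.

```typescript
// Hypothetical query interface; the real getOwners/getGlobalAdmins/getAdminsOfAgentGroup signatures may differ.
interface AdminQueries {
  getOwners(): string[];
  getGlobalAdmins(): string[];
  getAdminsOfAgentGroup(agentGroupId: string): string[];
}

// Hypothetical per-wake inputs.
interface WakeTarget {
  agentGroupId: string;
  inboundDbPath: string;
  outboundDbPath: string;
  heartbeatPath: string;
  provider: string;
}

/**
 * Build the container env from scratch on every wake, so admin changes
 * take effect on the next spawn without a host restart.
 */
function buildWakeEnv(target: WakeTarget, admins: AdminQueries): Record<string, string> {
  // Union of all admin sources; the Set drops users who hold several roles.
  const adminIds = new Set<string>([
    ...admins.getOwners(),
    ...admins.getGlobalAdmins(),
    ...admins.getAdminsOfAgentGroup(target.agentGroupId),
  ]);
  return {
    SESSION_INBOUND_DB_PATH: target.inboundDbPath,
    SESSION_OUTBOUND_DB_PATH: target.outboundDbPath,
    SESSION_HEARTBEAT_PATH: target.heartbeatPath,
    AGENT_PROVIDER: target.provider,
    // Comma-separated list is an assumed encoding.
    NANOCLAW_ADMIN_USER_IDS: Array.from(adminIds).join(","),
  };
}
```

Because the queries run inside the function, each call reflects the current admin set. No cached state needs invalidating.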
## Worth preserving?

- Streaming output: a meaningful latency improvement. A hybrid model (DB polling plus optional marker-based pre-delivery) could reduce perceived latency for long outputs.
- Per-group timeout: restore it; different agent groups have different expected latencies.
- Per-exit logs: at minimum, restore them for non-zero exits. Cheap forensics with high debugging value.
- Graceful-stop sentinel: not critical; the bun container is disposable.
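If per-group timeouts and per-exit logs are restored, both reduce to small pure helpers. This sketch makes three assumptions:

- The `timeout` field mirrors v1's `group.containerConfig.timeout`.
- The default is passed in rather than read from `IDLE_TIMEOUT`, whose value is not stated here.
- `ExitRecord` and `formatExitLog` are hypothetical names.

A `null` exit code stands for a signal-terminated process, which is how Node's `child_process` reports it.

```typescript
// Hypothetical per-group config shape, mirroring v1's group.containerConfig.timeout (ms).
interface GroupContainerConfig {
  timeout?: number;
}

/** Per-group override wins; otherwise fall back to the shared default (e.g. IDLE_TIMEOUT). */
function resolveIdleTimeout(config: GroupContainerConfig | undefined, defaultMs: number): number {
  const override = config?.timeout;
  return typeof override === "number" && override > 0 ? override : defaultMs;
}

// Hypothetical record of one container run, gathered by the host while supervising the process.
interface ExitRecord {
  containerName: string;
  exitCode: number | null; // null = terminated by a signal, as Node's child_process reports it
  mounts: string[];
  stderr: string;
  stdout: string;
  endedAt: Date;
}

/** Log text for abnormal exits only; clean exits (code 0) produce nothing. */
function formatExitLog(rec: ExitRecord): string | null {
  if (rec.exitCode === 0) {
    return null;
  }
  return [
    `container: ${rec.containerName}`,
    `ended: ${rec.endedAt.toISOString()}`,
    `exit code: ${rec.exitCode ?? "signal"}`,
    "mounts:",
    ...rec.mounts.map((m) => `  ${m}`),
    "--- stderr ---",
    rec.stderr,
    "--- stdout ---",
    rec.stdout,
  ].join("\n");
}
```

The caller decides where to persist the text. v1 used `groups/<folder>/logs/`. Skipping clean exits keeps the forensic value without a log file per run.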