Rename the CLI binary, socket path, container wrapper, error prefixes,
and all references from `nc` to `ncl`. Add ~/.local/bin symlink during
setup and pnpm script alias.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Corepack with no version pin pulls latest pnpm (currently 11.0.8), which
silently stops honoring `only-built-dependencies[]=` in `.npmrc` for
global installs. The allowlist file ends up correctly written but
ignored, so:
- `@anthropic-ai/claude-code`'s postinstall — which downloads the
platform-native Claude binary — never runs. Agents then crash at
runtime with "claude native binary not installed... postinstall did
not run."
- `agent-browser`'s postinstall, which chmods the linux-arm64 binary,
is also skipped, so the binary fails with EPERM the first time it's
invoked.
Pin the container's pnpm to 10.33.0 (the same version host's
package.json already pins via `packageManager`). Keep the two in
lockstep so a host bump triggers a deliberate container bump.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Drop the nc MCP tool in favor of a standalone Bun CLI script at
container/agent-runner/src/cli/nc.ts. Same interface as host-side
bin/nc — all three callers (operator, Claude on host, agent in
container) now use the same nc CLI.
Container transport: writes cli_request to outbound.db (BEGIN
IMMEDIATE for seq safety), polls inbound.db for response, acks via
processing_ack. Dockerfile adds a /usr/local/bin/nc wrapper that
execs the mounted source.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
vercel@53.0.1 declares a dep on @vercel/static-build@2.9.22 which is not
published on npm (only 2.9.21 exists), breaking every fresh container
build that resolves vercel@latest.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the per-group "written once at init, owned by the group" CLAUDE.md
with a host-regenerated entry point that imports:
- a shared base (`container/CLAUDE.md` mounted RO at `/app/CLAUDE.md`)
- optional per-skill fragments (skills that ship `instructions.md`)
- optional per-MCP-server fragments (inline `instructions` field in
`container.json`)
- per-group agent memory (`CLAUDE.local.md`, auto-loaded by Claude Code)
Principle: RW = per-group memory, RO = shared content. Source/skills/base
are shared; personality, config, working files, and Claude state stay
per-group.
Key changes:
- New `src/claude-md-compose.ts` — per-spawn composition +
`migrateGroupsToClaudeLocal()` one-time cutover.
- New `container/CLAUDE.md` — shared base, seeded verbatim from the
former `groups/global/CLAUDE.md`.
- `src/container-runner.ts` — swap `/workspace/global` mount for RO
`/app/CLAUDE.md`; call `composeGroupClaudeMd()` after
`initGroupFilesystem()`.
- `src/group-init.ts` — drop `.claude-global.md` symlink + initial
`CLAUDE.md` write; seed `CLAUDE.local.md` from `opts.instructions`.
- `src/index.ts` — call `migrateGroupsToClaudeLocal()` at startup.
- `src/container-config.ts` — add optional `instructions` field to
`McpServerConfig` (inline per-MCP guidance fragment).
- `container/Dockerfile` — drop dead `/workspace/global` mkdir.
- Remove obsolete `scripts/migrate-group-claude-md.ts`.
Migration (runs once at host startup, idempotent):
- Delete `.claude-global.md` symlinks in each group.
- Rename each `groups/<folder>/CLAUDE.md` → `CLAUDE.local.md`
(preserves existing per-group content as memory).
- Delete `groups/global/` directory.
Design docs: `docs/claude-md-composition.md` and `docs/shared-source.md`
(the latter is the sibling design discussion this refactor builds on).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the per-group agent-runner-src copy model with a single shared
read-only mount. Source and skills are now RO + shared; personality,
config, working files, and Claude state stay RW + per-group.
Key changes:
- Mount container/agent-runner/src/ RO at /app/src (all groups share one copy)
- Mount container/skills/ RO at /app/skills; per-group skill selection via
symlinks in .claude-shared/skills/ based on container.json "skills" field
- Mount container.json as nested RO bind on top of RW group dir
- Move all NANOCLAW_* env vars to container.json (runner reads at startup)
- New runner config.ts module replaces process.env reads
- Move command gate (filtered/admin) from container to host router
- Dockerfile: remove source COPY, split CLI installs (claude-code last),
move agent-runner deps above CLIs for better layer caching
- Add writeOutboundDirect for router denial responses
- Design doc at docs/shared-src.md
Not included (follow-up): DB migration to drop agent_provider columns,
cleanup of orphaned agent-runner-src directories.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The Agent SDK's IPC protocol must match the Claude Code version. Also
allowlist @anthropic-ai/claude-code in only-built-dependencies so its
postinstall script runs during Docker build — without this, the native
binary is never installed and the SDK fails at spawn time.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
v2 ships with only claude baked in. opencode now lives on the `providers`
branch and gets copied in via the /add-opencode skill.
Removed:
- src/providers/opencode.ts
- container/agent-runner/src/providers/{opencode,mcp-to-opencode}.ts + test
- @opencode-ai/sdk from agent-runner package.json + bun.lock
- opencode-ai global install + OPENCODE_VERSION ARG from Dockerfile
- opencode self-registration imports from both provider barrels
- opencode test case from factory.test.ts
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Container side:
- agent-runner switches to Bun. Drops better-sqlite3 (native compile gone),
drops tsc build step in-image AND the tsc-on-every-session-wake in the
entrypoint — bun runs src/index.ts directly. bun:sqlite replaces
better-sqlite3; cross-mount DB invariants (journal_mode=DELETE, busy_timeout)
preserved. Named params converted from @name to $name because bun:sqlite
does not auto-strip the prefix the way better-sqlite3 does.
- Tests ported from vitest to bun:test (only describe/it/expect/before/afterEach
used, API-compatible). vitest.config.ts excludes container/agent-runner/.
- bun.lock replaces pnpm-lock.yaml + pnpm-workspace.yaml under
container/agent-runner/. Host pnpm workspace does NOT include this tree.
Dockerfile improvements (independent of Bun but bundled while touching the file):
- tini as PID 1 for correct SIGTERM propagation (prevents half-written
outbound.db on shutdown).
- Extracted entrypoint.sh — readable and diffable vs the old inline printf.
- BuildKit cache mounts for apt + bun install + pnpm install.
- --no-install-recommends on apt, pinned CLAUDE_CODE_VERSION, AGENT_BROWSER,
VERCEL, BUN_VERSION.
- CJK fonts (~200MB) behind ARG INSTALL_CJK_FONTS=false; build.sh reads from
.env; setup/container.ts reads the same .env so /setup and manual rebuild
stay in sync.
- PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD=1 in case any postinstall tries to pull a
redundant Chromium.
- /home/node 755 (was 777).
Host side:
- src/container-runner.ts dynamic spawn command collapses from
`pnpm exec tsc --outDir /tmp/dist … && node /tmp/dist/index.js` to
`exec bun run /app/src/index.ts` — cold start ~200-500ms faster per wake.
CI:
- oven-sh/setup-bun@v2 alongside Node/pnpm. Adds explicit container
typecheck (was documented in CLAUDE.md, not enforced) and `bun test` for
agent-runner tests.
Enable corepack and PNPM_HOME in container, switch all npm/npx
invocations to pnpm/pnpm exec. Use wildcard COPY for optional
pnpm-lock.yaml in agent-runner.
- Add unregistered_senders table to capture dropped message origins
(one row per sender, upserted with message_count and last_seen)
- Add inbound DM logging to chat-sdk-bridge for debugging
- Add vercel CLI to base container image
- Install vercel-cli and frontend-engineer container skills
- Default requiresTrigger to false in register step
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The /workspace/ipc/* tree is a v1 leftover — v2 routes everything
through inbound.db / outbound.db. Refresh the surrounding comment to
describe what the entrypoint actually does.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
POSIX-style TZ strings like IST-2 cause a hard RangeError crash in
formatMessages because Intl.DateTimeFormat only accepts IANA identifiers.
- Add isValidTimezone/resolveTimezone helpers to src/timezone.ts
- Make formatLocalTime fall back to UTC on invalid timezone
- Validate TZ candidates in config.ts before accepting
- Add timezone setup step to detect and prompt when autodetection fails
- Use node:22-slim in Dockerfile (node:24-slim Trixie package renames)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: implement credential proxy for enhanced container environment isolation
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: address PR review — bind proxy to loopback, scope OAuth injection, add tests
- Bind credential proxy to 127.0.0.1 instead of 0.0.0.0 (security)
- OAuth mode: only inject Authorization on token exchange endpoint
- Add 5 integration tests for credential-proxy.ts
- Remove dangling comment
- Extract host gateway into container-runtime.ts abstraction
- Update Apple Container skill for credential proxy compatibility
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: scope OAuth token injection by header presence instead of path
Path-based matching missed auth probe requests the CLI sends before
the token exchange. Now the proxy replaces Authorization only when
the container actually sends one, leaving x-api-key-only requests
(post-exchange) untouched.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: bind credential proxy to docker0 bridge IP on Linux
On bare-metal Linux Docker, containers reach the host via the bridge IP
(e.g. 172.17.0.1), not loopback. Detect the docker0 interface address
via os.networkInterfaces() and bind there instead of 0.0.0.0, so the
proxy is reachable by containers but not exposed to the LAN.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: bind credential proxy to loopback on WSL
WSL uses Docker Desktop with the same VM routing as macOS, so
127.0.0.1 is correct and secure. Without this, the fallback to
0.0.0.0 was triggered because WSL has no docker0 interface.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: detect WSL via /proc instead of env var
WSL_DISTRO_NAME isn't set under systemd. Use
/proc/sys/fs/binfmt_misc/WSLInterop which is always present on WSL.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Add fonts-noto-cjk to the container image so that Chromium renders
CJK (Chinese, Japanese, Korean) text correctly in screenshots and
PDF exports. Without this, CJK characters appear as empty rectangles.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: WA 515 stream error reconnect exiting early before key sync
Pass isReconnect flag on 515 reconnect so the registered-creds check
doesn't bail out before the handshake completes (caused "logging in..."
hang after successful pairing).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: container permission errors on Docker with non-default uid
Make /home/node world-writable in the Dockerfile so the SDK can write
.claude.json. Add --user flag matching host uid/gid in container-runner
so bind-mounted files are accessible. Skip when running as root (uid 0),
as the container's node user (uid 1000), or on native Windows.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: write ASSISTANT_NAME to .env during setup
When a custom assistant name is chosen, persist it to .env so config.ts
picks it up at runtime. Uses temp file for cross-platform sed
compatibility (macOS/Linux/WSL).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Pass secrets to the SDK via the `env` query option instead of setting
process.env, so Bash subprocesses never inherit API keys. Delete
/tmp/input.json immediately after reading to remove secrets from disk.
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Use a PreToolUse SDK hook to prepend `unset ANTHROPIC_API_KEY
CLAUDE_CODE_OAUTH_TOKEN` to every Bash command Kit runs, preventing
secret leakage via env/printenv/echo/$PROC. Secrets are now passed
via stdin JSON instead of mounted env files, closing all known
exfiltration vectors.
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* feat: streaming container mode, IPC messaging, agent teams support
Major architectural shift from single-shot container runs to long-lived
streaming containers with IPC-based message injection.
- Agent runner: query loop with AsyncIterable prompt to keep stdin open
for agent teams (fixes isSingleUserTurn premature shutdown)
- New standalone stdio MCP server (ipc-mcp-stdio.ts) inheritable by
subagents, with send_message and schedule_task tools
- Streaming output: parse OUTPUT_START/END markers in real-time, send
results to WhatsApp as they arrive
- IPC file-based messaging: host writes to ipc/{group}/input/, agent
polls for follow-up messages without respawning containers
- Per-group settings.json with CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1
- SDK bumped to 0.2.34 for TeamCreate tool support
- Container idle timeout (30min) with _close sentinel for shutdown
- Orphaned container cleanup on startup
- alwaysRespond flag for groups that skip trigger pattern check
- Uncaught exception/rejection handlers with timestamps in logger
- Combined SDK documentation into single deep dive reference
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* chore: remove unused ipc-mcp.ts (replaced by ipc-mcp-stdio.ts)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: clarify agent communication model in docs and tool descriptions
- CLAUDE.md (main + global): split communication instructions into
"responding to messages" vs "scheduled tasks" sections
- send_message tool: note that scheduled task output is not sent to user
- Remove structured output (outputFormat) — not needed with current flow
- Regular output is sent to WhatsApp; scheduled task output is only logged
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* chore: ignore dynamic group data while preserving base structure
Only track groups/main/CLAUDE.md and groups/global/CLAUDE.md. All other
group directories and files are ignored to prevent tracking user-specific
session data.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: resolve critical bugs in streaming container mode
Bug 1 (scheduled task hang): Task scheduler now passes onOutput callback
with idle timer that writes _close sentinel after IDLE_TIMEOUT, so
containers exit cleanly instead of blocking queue slots for 30 minutes.
Scheduled tasks stay alive for interactive follow-up via IPC.
Bug 2 (timeout disabled): Remove resetTimeout() from stderr handler.
SDK writes debug logs continuously, resetting the timer on every line.
Timeout now only resets on actual output markers in stdout.
Bug 3 (trigger bypass): Piped messages in startMessageLoop now check
trigger pattern for non-main groups. Non-trigger messages accumulate in
DB and are pulled as context via getMessagesSince when a trigger arrives.
Bug 7 (non-atomic IPC writes): GroupQueue.sendMessage uses temp file +
rename for atomic writes, matching ipc-mcp-stdio.ts pattern.
Also: flip isVerbose back to false (debug leftover), add isScheduledTask
to host-side ContainerInput interface.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: idle timer not starting + scheduled task groupFolder missing
Two bugs that prevented the scheduled task idle timeout fix from working:
1. onOutput was only called when parsed.result !== null, but session
update markers have result: null. The idle timer never started for
"silent" query completions, leaving containers parked at
waitForIpcMessage until hard timeout.
2. Scheduler's onProcess callback didn't pass groupFolder to
queue.registerProcess, so closeStdin no-oped (groupFolder was null).
The _close sentinel was never written even when the idle timer fired.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: duplicate messages and timestamp rollback in piping path
Two bugs introduced by the trigger context accumulation change:
1. processGroupMessages didn't advance lastAgentTimestamp until after
the container finished. The piping path's getMessagesSince(lastAgent
Timestamp) re-fetched messages already sent as the initial prompt,
causing duplicates.
2. processGroupMessages overwrote lastAgentTimestamp with the original
batch timestamp on completion, rolling back any advancement made by
the piping path while the container was running.
Fix: advance lastAgentTimestamp immediately after building the prompt,
before starting the container. This matches the piping path behavior
and eliminates both the overlap and the rollback.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: container idles 30 extra minutes after _close during query
When _close was detected during pollIpcDuringQuery, it was consumed
(deleted) and stream.end() was called. But after runQuery returned,
main() still emitted a session-update marker (resetting the host's idle
timer) and called waitForIpcMessage (which polled forever since _close
was already gone). The container had to wait for a second _close.
Fix: runQuery now returns closedDuringQuery. When true, main() skips
the session-update marker and waitForIpcMessage, exiting immediately.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: resume branching, internal tags, and output forwarding
- Fix resume branching: pass resumeSessionAt with last assistant UUID
to anchor each query loop resume to the correct conversation tree
position. Prevents agent responses landing on invisible branches
when agent teams subagents create parallel JSONL entries.
- Add <internal> tag stripping: agent can wrap internal reasoning in
<internal> tags which are logged but not sent to WhatsApp. Prevents
duplicate messages and internal monologue reaching users.
- Forward scheduled task output: scheduled tasks now send result text
to WhatsApp (with <internal> stripping), matching regular message
behavior. No more special-case instructions.
- Update Communication guidance in CLAUDE.md: simplified to "your
output is sent to the user or group" with soft guidance on
<internal> tags and send_message usage.
- Add messaging behavior docs to schedule_task tool: prompts the
scheduling agent to include guidance on whether the task should
always/conditionally/never message the user.
- Mount security: containerPath now optional, defaults to basename
of hostPath.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: cursor rollback on error, flush guard, verbose logging
- Roll back lastAgentTimestamp on container error so retries can
re-process the messages instead of silently losing them.
- Add guard flag to flushOutgoingQueue to prevent duplicate sends
from concurrent flushes during rapid WA reconnects.
- Revert isVerbose from hardcoded false back to env-based check
(LOG_LEVEL=debug|trace).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: orphan container cleanup was silently failing
The startup cleanup used `container ls --format {{.Names}}` which is
Docker Go-template syntax. Apple Container only supports `--format json`
or `--format table`. The command errored with exit code 64, but the
catch block silently swallowed it — orphan containers were never cleaned
up on restart.
Fixed to use `--format json` and parse `configuration.id` from the
JSON output. Also filters by `status: running` and logs a warning on
failure instead of silently catching.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* docs: add Discord badge and community section
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: idle timer reset on null results and flush queue message loss
- Only reset idle timer on actual results (non-null), not session-update
markers. Prevents containers staying alive 30 extra minutes after the
agent finishes work.
- flushOutgoingQueue now uses shift() instead of splice(0) so unattempted
messages stay in the queue if an unexpected error bails the loop.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* docs: add Agent Swarms to README
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: update Telegram skill for current architecture
Rewrite integration instructions to match the per-group queue/SQLite
architecture: remove onMessage callback pattern (store to DB, let
message loop pick up), fix startSchedulerLoop signature, add
TELEGRAM_ONLY service startup, SQLite registration, data/env/env sync,
@mention-to-trigger translation, and BotFather group privacy docs.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: Telegram skill message chunking, media placeholders, chat discovery
- Split long messages at Telegram's 4096 char limit to prevent silent
send failures
- Store placeholder text for non-text messages (photos, voice, stickers,
etc.) so the agent knows media was sent
- Update getAvailableGroups filter to include tg: chats so the agent can
discover and register Telegram chats via IPC
- Fix removal step numbering
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* docs: update REQUIREMENTS.md and SPEC.md for SQLite architecture
- Replace all registered_groups.json / sessions.json / router_state.json
references with SQLite equivalents
- Fix CONTAINER_TIMEOUT default (300000 → 1800000)
- Add missing config exports (IDLE_TIMEOUT, MAX_CONCURRENT_CONTAINERS)
- Update folder structure: add missing src files (logger, group-queue,
mount-security), remove non-existent utils.ts, list all skills
- Fix agent-runner entry (ipc-mcp.ts → ipc-mcp-stdio.ts)
- Update startup sequence to reflect per-group queue architecture
- Fix env mounting description (data/env/env, not extracted vars)
- Update troubleshooting to use sqlite3 commands
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* docs: fix README architecture description, revert SPEC.md env error
- README: update architecture blurb to mention per-group queue, add
group-queue.ts to key files, update file descriptions
- SPEC.md: restore correct credential filtering description (only auth
vars are extracted from .env, not the full file)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Container fixes:
- Run as non-root 'node' user (required for --dangerously-skip-permissions)
- Add allowDangerouslySkipPermissions: true to SDK options
- Mount .env file to work around Apple Container -i env var bug
- Use --mount for readonly, -v for read-write (Apple Container quirk)
- Bump SDK to 0.2.29, zod to v4
- Install Claude Code CLI globally in container
Logging improvements:
- Write per-run logs to groups/{folder}/logs/container-*.log
- Add debug-level logging for mounts and container args
Documentation:
- Add /debug skill with comprehensive troubleshooting guide
- Update /setup skill with API key configuration step
- Update SPEC.md with container details, mount syntax, security notes
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Agents run in isolated Linux VMs via Apple Container
- All groups get Bash access (safe - sandboxed in container)
- Browser automation via agent-browser + Chromium
- Per-group configurable additional directory mounts
- File-based IPC for messages and scheduled tasks
- Container image with Node.js 22, Chromium, agent-browser
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>