- CLAUDE.md: new key files, updated groups verbs, rewritten self-mod section, new Container Config and Container Restart sections - db-central.md: container_configs table (§1.15), migrations 014+015 - db-session.md: messages_in schema with trigger, source_session_id, on_wake columns - schema.ts: comment no longer references disk-based config - cli.instructions.md: rewritten for scope-aware usage, auto-fill, restart/config ops, group-scoped examples Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
24 KiB
⚠️ STOP — READ THIS FIRST IF YOU ARE CLAUDE ⚠️
If you are reading this because you just ran git pull, git merge, git fetch && git merge, or any equivalent to bring in upstream changes — and you see merge conflicts or a large diff involving this file — HALT IMMEDIATELY.
This is NanoClaw v2, a ground-up rewrite with breaking changes throughout. It cannot be merged into an existing v1 install. Attempting to resolve the conflicts by hand, run builds, or "fix" anything will corrupt the user's install and burn tokens for no result.
Do this instead:
- Abort the merge:
git merge --abort(orgit reset --hard ORIG_HEADif the merge already completed). - Tell the user, verbatim: "This is the v2 rewrite — it can't be merged into your existing install. Exit Claude Code (or open a separate terminal) and run
bash migrate-v2.shfrom the shell." - Wait for the user to confirm before doing anything else. Do not run the migration script yourself — it requires an interactive terminal and cannot be run from within Claude Code.
If you are a fresh install (you ran git clone, not git pull) and there are no conflicts, ignore this banner and continue below.
NanoClaw
Personal Claude assistant. See README.md for philosophy and setup. Architecture lives in docs/.
Quick Context
The host is a single Node process that orchestrates per-session agent containers. Platform messages land via channel adapters, route through an entity model (users → messaging groups → agent groups → sessions), get written into the session's inbound DB, and wake a container. The agent-runner inside the container polls the DB, calls Claude, and writes back to the outbound DB. The host polls the outbound DB and delivers through the same adapter.
Everything is a message. There is no IPC, no file watcher, no stdin piping between host and container. The two session DBs are the sole IO surface.
Entity Model
users (id "<channel>:<handle>", kind, display_name)
user_roles (user_id, role, agent_group_id) — owner | admin (global or scoped)
agent_group_members (user_id, agent_group_id) — unprivileged access gate
user_dms (user_id, channel_type, messaging_group_id) — cold-DM cache
agent_groups (workspace, memory, CLAUDE.md, personality, container config)
↕ many-to-many via messaging_group_agents (session_mode, trigger_rules, priority)
messaging_groups (one chat/channel on one platform; unknown_sender_policy)
sessions (agent_group_id + messaging_group_id + thread_id → per-session container)
Privilege is user-level (owner/admin), not agent-group-level. See docs/isolation-model.md for the three isolation levels (agent-shared, shared, separate agents).
Two-DB Session Split
Each session has two SQLite files under data/v2-sessions/<session_id>/:
inbound.db— host writes, container reads.messages_in, routing, destinations, pending_questions, processing_ack.outbound.db— container writes, host reads.messages_out, session_state.
Exactly one writer per file — no cross-mount lock contention. Heartbeat is a file touch at /workspace/.heartbeat, not a DB update. Host uses even seq numbers, container uses odd.
Central DB
data/v2.db holds everything that isn't per-session: users, user_roles, agent_groups, messaging_groups, wiring, pending_approvals, user_dms, chat_sdk_* (for the Chat SDK bridge), schema_version. Migrations live at src/db/migrations/.
For ad-hoc queries from skills or scripts, use the in-tree wrapper rather than the sqlite3 CLI: pnpm exec tsx scripts/q.ts <db> "<sql>". The host setup intentionally avoids depending on the sqlite3 binary (setup/verify.ts:5); the wrapper goes through the better-sqlite3 dep that setup already installs and verifies. Default-output format matches sqlite3 -list (pipe-separated, no header) so existing skill text reads identically.
Key Files
| File | Purpose |
|---|---|
src/index.ts |
Entry point: init DB, migrations, channel adapters, delivery polls, sweep, shutdown |
src/router.ts |
Inbound routing: messaging group → agent group → session → inbound.db → wake |
src/delivery.ts |
Polls outbound.db, delivers via adapter, handles system actions (schedule, approvals, etc.) |
src/host-sweep.ts |
60s sweep: processing_ack sync, stale detection, due-message wake, recurrence |
src/session-manager.ts |
Resolves sessions; opens inbound.db / outbound.db; manages heartbeat path |
src/container-runner.ts |
Spawns per-agent-group Docker containers with session DB + outbox mounts, OneCLI ensureAgent |
src/container-runtime.ts |
Runtime selection (Docker vs Apple containers), orphan cleanup |
src/modules/permissions/access.ts |
canAccessAgentGroup — owner / global admin / scoped admin / member resolution against user_roles + agent_group_members |
src/modules/approvals/primitive.ts |
pickApprover, pickApprovalDelivery, requestApproval, approval-handler registry |
src/command-gate.ts |
Router-side admin command gate — queries user_roles directly (no env var, no container-side check) |
src/onecli-approvals.ts |
OneCLI credentialed-action approval bridge |
src/user-dm.ts |
Cold-DM resolution + user_dms cache |
src/group-init.ts |
Per-agent-group filesystem scaffold (CLAUDE.md, skills, agent-runner-src overlay) |
src/db/container-configs.ts |
CRUD for container_configs table (per-group container runtime config) |
src/backfill-container-configs.ts |
Migrates legacy container.json files into the DB on startup |
src/container-restart.ts |
Kill + on-wake respawn for agent group containers |
src/db/ |
DB layer — agent_groups, messaging_groups, sessions, container_configs, user_roles, user_dms, pending_*, migrations |
src/channels/ |
Channel adapter infra (registry, Chat SDK bridge); specific channel adapters are skill-installed from the channels branch |
src/providers/ |
Host-side provider container-config (claude baked in; opencode etc. installed from the providers branch) |
container/agent-runner/src/ |
Agent-runner: poll loop, formatter, provider abstraction, MCP tools, destinations |
container/skills/ |
Container skills mounted into every agent session (onecli-gateway, welcome, self-customize, agent-browser, slack-formatting) |
groups/<folder>/ |
Per-agent-group filesystem (CLAUDE.md, skills, per-group agent-runner-src/ overlay) |
scripts/init-first-agent.ts |
Bootstrap the first DM-wired agent (used by /init-first-agent skill) |
migrate-v2.sh + setup/migrate-v2/ |
v1→v2 migration. Standalone script: bash migrate-v2.sh. Seeds DB, copies groups/sessions, installs channels, builds container, offers service switchover, then hands off to /migrate-from-v1 skill for owner setup and CLAUDE.md cleanup. See docs/migration-dev.md. |
Admin CLI (ncl)
ncl queries and modifies the central DB — agent groups, messaging groups, wirings, users, roles, and more. On the host it connects via Unix socket (src/cli/socket-server.ts); inside containers it uses the session DB transport (container/agent-runner/src/cli/ncl.ts).
ncl <resource> <verb> [<id>] [--flags]
ncl <resource> help
ncl help
| Resource | Verbs | What it is |
|---|---|---|
| groups | list, get, create, update, delete, restart, config get/update, config add-mcp-server/remove-mcp-server, config add-package/remove-package | Agent groups (workspace, personality, container config) |
| messaging-groups | list, get, create, update, delete | A single chat/channel on one platform |
| wirings | list, get, create, update, delete | Links a messaging group to an agent group (session mode, triggers) |
| users | list, get, create, update | Platform identities (<channel>:<handle>) |
| roles | list, grant, revoke | Owner / admin privileges (global or scoped to an agent group) |
| members | list, add, remove | Unprivileged access gate for an agent group |
| destinations | list, add, remove | Where an agent group can send messages |
| sessions | list, get | Active sessions (read-only) |
| user-dms | list | Cold-DM cache (read-only) |
| dropped-messages | list | Messages from unregistered senders (read-only) |
| approvals | list, get | Pending approval requests (read-only) |
Key files: src/cli/dispatch.ts (dispatcher + approval handler), src/cli/crud.ts (generic CRUD registration), src/cli/resources/ (per-resource definitions).
Channels and Providers (skill-installed)
Trunk does not ship any specific channel adapter or non-default agent provider. The codebase is the registry/infra; the actual adapters and providers live on long-lived sibling branches and get copied in by skills:
channelsbranch — Discord, Slack, Telegram, WhatsApp, Teams, Linear, GitHub, iMessage, Webex, Resend, Matrix, Google Chat, WhatsApp Cloud (+ helpers, tests, channel-specific setup steps). Installed via/add-<channel>skills.providersbranch — OpenCode (and any future non-default agent providers). Installed via/add-opencode.
Each /add-<name> skill is idempotent: git fetch origin <branch> → copy module(s) into the standard paths → append a self-registration import to the relevant barrel → pnpm install <pkg>@<pinned-version> → build.
Self-Modification
One tier of agent self-modification today:
install_packages/add_mcp_server— changes to the per-agent-group container config in the DB (apt/npm deps, wire an existing MCP server). Single admin approval per request; on approve, the handler insrc/modules/self-mod/apply.tsrebuilds the image when needed (install_packagesonly), writes anon_wakemessage, kills the container, and respawns viaonExitcallback. The on-wake message is only picked up by the fresh container's first poll — dying containers can never steal it.container/agent-runner/src/mcp-tools/self-mod.ts.
A second tier (direct source-level self-edits via a draft/activate flow) is planned but not yet implemented.
Container Config
Per-agent-group container runtime config (provider, model, packages, MCP servers, mounts, etc.) lives in the container_configs table in the central DB. Materialized to groups/<folder>/container.json at spawn time so the container runner can read it. Managed via ncl groups config get/update and the self-mod MCP tools.
cli_scope — controls what the agent can do with ncl from inside the container:
| Value | Behavior |
|---|---|
disabled |
Agent never learns about ncl (instructions excluded from CLAUDE.md). Host dispatch rejects any cli_request. |
group (default) |
Agent can access groups, sessions, destinations, members only, scoped to its own agent group. --id and group args are auto-filled. Cross-group access rejected. cli_scope changes blocked. |
global |
Unrestricted. Set automatically for owner agent groups via init-first-agent. |
Key files: src/db/container-configs.ts, src/container-config.ts, src/cli/dispatch.ts (scope enforcement), src/claude-md-compose.ts (instructions exclusion).
Container Restart
ncl groups restart --id <group-id> [--rebuild] [--message <text>]. Kills running containers; if --message is provided, writes an on_wake message and respawns via onExit callback. Without --message, containers come back on the next user message. From inside a container, --id is auto-filled and only the calling session is restarted.
The on_wake column on messages_in ensures wake messages are only picked up by a fresh container's first poll iteration. This prevents the race where a dying container (still in its SIGTERM grace period) could steal the message. killContainer accepts an optional onExit callback that fires after the process exits, guaranteeing the old container is gone before the new one spawns.
Key files: src/container-restart.ts, src/container-runner.ts (killContainer), container/agent-runner/src/db/messages-in.ts (getPendingMessages).
Secrets / Credentials / OneCLI
API keys, OAuth tokens, and auth credentials are managed by the OneCLI gateway. Secrets are injected into per-agent containers at request time — none are passed in env vars or through chat context. The container agent sees this via the onecli-gateway container skill (container/skills/onecli-gateway/SKILL.md), which teaches it how the proxy works, how to handle auth errors, and to never ask for raw credentials. Host-side wiring: src/onecli-approvals.ts, ensureAgent() in container-runner.ts. Run onecli --help.
Gotcha: auto-created agents start in selective secret mode
When the host first spawns a session for a new agent group, container-runner.ts:385 calls onecli.ensureAgent({ name, identifier }). The OneCLI POST /api/agents endpoint creates the agent in selective secret mode — meaning no secrets are assigned to it by default, even if the secrets exist in the vault and have host patterns that would otherwise match.
Symptom: container starts, the proxy + CA cert are wired correctly, but the agent gets 401 Unauthorized (or similar) from APIs whose credentials are in the vault. The credential just isn't in this agent's allow-list.
The SDK does not expose setSecretMode — the only fix is the CLI (or the web UI at http://127.0.0.1:10254).
# Find the agent (identifier is the agent group id)
onecli agents list
# Flip to "all" so every vault secret with a matching host pattern gets injected
onecli agents set-secret-mode --id <agent-id> --mode all
# Or, stay selective and assign specific secrets
onecli secrets list # find secret ids
onecli agents set-secrets --id <agent-id> --secret-ids <id1>,<id2>
# Inspect what an agent currently has
onecli agents secrets --id <agent-id> # secrets assigned to this agent
onecli secrets list # all vault secrets (with host patterns)
If you've just enabled mode all, no container restart is needed — the gateway looks up secrets per request, so the next API call from the running container will see the new credentials.
Requiring approval for credential use
Approval-gating credentialed actions is a two-sided flow:
- Server-side (OneCLI gateway): decides when to hold a request and emit a pending approval. As of
onecli@1.3.0, the CLI does not expose this —rules create --actiononly acceptsblockorrate_limit, andsecrets createhas no approval flag. Approval policies must be configured via the OneCLI web UI athttp://127.0.0.1:10254. If/when the CLI grows anapproveaction, this section needs updating. - Host-side (nanoclaw): receives pending approvals and routes them to a human.
src/modules/approvals/onecli-approvals.tsregisters a callback viaonecli.configureManualApproval(cb)(long-pollsGET /api/approvals/pending). The callback usespickApprover+pickApprovalDeliveryfromsrc/modules/approvals/primitive.tsto DM an approver. Approvers are resolved from theuser_rolestable — preference order: scoped admins for the agent group → global admins → owners. There is no env var likeNANOCLAW_ADMIN_USER_IDS; roles are persisted in the central DB only.
If approvals are configured server-side but the host callback isn't running (or throws), every credentialed call hangs until the gateway times out. Conversely, if the gateway has no rule asking for approval, the host callback never fires regardless of how it's wired.
Skills
Four types of skills. See CONTRIBUTING.md for the full taxonomy.
- Channel/provider install skills — copy the relevant module(s) in from the
channelsorprovidersbranch, wire imports, install pinned deps (e.g./add-discord,/add-slack,/add-whatsapp,/add-opencode). - Utility skills — ship code files alongside
SKILL.md(e.g./claw). - Operational skills — instruction-only workflows (
/setup,/debug,/customize,/init-first-agent,/manage-channels,/init-onecli,/update-nanoclaw). - Container skills — loaded inside agent containers at runtime (
container/skills/:onecli-gateway,welcome,self-customize,agent-browser,slack-formatting).
| Skill | When to Use |
|---|---|
/setup |
First-time install, auth, service config |
/init-first-agent |
Bootstrap the first DM-wired agent (channel pick → identity → wire → welcome DM) |
/manage-channels |
Wire channels to agent groups with isolation level decisions |
/customize |
Adding channels, integrations, behavior changes |
/debug |
Container issues, logs, troubleshooting |
/update-nanoclaw |
Bring upstream updates into a customized install |
/init-onecli |
Install OneCLI Agent Vault and migrate .env credentials |
Contributing
Before creating a PR, adding a skill, or preparing any contribution, you MUST read CONTRIBUTING.md. It covers accepted change types, the four skill types and their guidelines, SKILL.md format rules, and the pre-submission checklist.
PR Hygiene
Before creating a PR, run these checks:
git diff upstream/main --stat HEAD
git log upstream/main..HEAD --oneline
Show the output and wait for approval. Installation-specific files (group files, .claude/settings.json, local configs) should not be included.
Development
Run commands directly — don't tell the user to run them.
# Host (Node + pnpm)
pnpm run dev # Host with hot reload
pnpm run build # Compile host TypeScript (src/)
./container/build.sh # Rebuild agent container image (nanoclaw-agent:latest)
pnpm test # Host tests (vitest)
# Agent-runner (Bun — separate package tree under container/agent-runner/)
cd container/agent-runner && bun install # After editing agent-runner deps
cd container/agent-runner && bun test # Container tests (bun:test)
Container typecheck is a separate tsconfig — if you edit container/agent-runner/src/, run pnpm exec tsc -p container/agent-runner/tsconfig.json --noEmit from root (or bun run typecheck from container/agent-runner/).
Service management:
# macOS (launchd)
launchctl load ~/Library/LaunchAgents/com.nanoclaw.plist
launchctl unload ~/Library/LaunchAgents/com.nanoclaw.plist
launchctl kickstart -k gui/$(id -u)/com.nanoclaw # restart
# Linux (systemd)
systemctl --user start|stop|restart nanoclaw
Troubleshooting
Check these first when something goes wrong:
| What | Where |
|---|---|
| Host logs | logs/nanoclaw.error.log first (delivery failures, crash-loop backoff, warnings), then logs/nanoclaw.log for the full routing chain |
| Setup logs | logs/setup.log (overall), logs/setup-steps/*.log (per-step: bootstrap, environment, container, onecli, mounts, service, etc.) |
| Session DBs | data/v2-sessions/<agent-group>/<session>/ — inbound.db (messages_in: did the message reach the container?), outbound.db (messages_out: did the agent produce a response?) |
Note: container logs are lost after the container exits (--rm flag). If the agent silently failed inside the container, there's no persistent log to inspect.
Supply Chain Security (pnpm)
This project uses pnpm with minimumReleaseAge: 4320 (3 days) in pnpm-workspace.yaml. New package versions must exist on the npm registry for 3 days before pnpm will resolve them.
Rules — do not bypass without explicit human approval:
minimumReleaseAgeExclude: Never add entries without human sign-off. If a package must bypass the release age gate, the human must approve and the entry must pin the exact version being excluded (e.g.package@1.2.3), never a range.onlyBuiltDependencies: Never add packages to this list without human approval — build scripts execute arbitrary code during install.pnpm install --frozen-lockfileshould be used in CI, automation, and container builds. Never run barepnpm installin those contexts.
Docs Index
| Doc | Purpose |
|---|---|
| docs/architecture.md | Full architecture writeup |
| docs/api-details.md | Host API + DB schema details |
| docs/db.md | DB architecture overview: three-DB model, cross-mount rules, readers/writers map |
| docs/db-central.md | Central DB (data/v2.db) — every table + migration system |
| docs/db-session.md | Per-session inbound.db + outbound.db schemas + seq parity |
| docs/agent-runner-details.md | Agent-runner internals + MCP tool interface |
| docs/isolation-model.md | Three-level channel isolation model |
| docs/setup-wiring.md | What's wired, what's open in the setup flow |
| docs/architecture-diagram.md | Diagram version of the architecture |
| docs/build-and-runtime.md | Runtime split (Node host + Bun container), lockfiles, image build surface, CI, key invariants |
| docs/v1-to-v2-changes.md | v1→v2 architecture diff — vocabulary for where v1 things moved |
| docs/migration-dev.md | Migration development guide — testing, debugging, dev loop |
Container Build Cache
The container buildkit caches the build context aggressively. --no-cache alone does NOT invalidate COPY steps — the builder's volume retains stale files. To force a truly clean rebuild, prune the builder then re-run ./container/build.sh.
Container Runtime (Bun)
The agent container runs on Bun; the host runs on Node (pnpm). They communicate only via session DBs — no shared modules. Details and rationale: docs/build-and-runtime.md.
Gotchas — trigger + action:
- Adding or bumping a runtime dep in
container/agent-runner/→ editpackage.json, thencd container/agent-runner && bun installand commit the updatedbun.lock. Do not runpnpm installthere — agent-runner is not a pnpm workspace. - Bumping
@anthropic-ai/claude-agent-sdk,@modelcontextprotocol/sdk, or any agent-runner runtime dep → nominimumReleaseAgepolicy applies to this tree. Check the release date on npm, pin deliberately, neverbun updateblindly. - Writing a new named-param SQL insert/update in the container → use
$namein both SQL and JS keys:.run({ $id: msg.id }).bun:sqlitedoes not auto-strip the prefix the waybetter-sqlite3does on the host. Positional?params work normally. - Adding a test in
container/agent-runner/src/→ import frombun:test, notvitest. Vitest runs on Node and can't loadbun:sqlite.vitest.config.tsexcludes this tree. - Adding a Node CLI the agent invokes at runtime (like
agent-browser,claude-code,vercel) → put it in the Dockerfile's pnpm global-install block, pinned to an exact version via a newARG. Don't usebun install -g— that bypasses the pnpm supply-chain policy. - Changing the Dockerfile entrypoint or the dynamic-spawn command (
src/container-runner.tsline ~301) → keepexec bun ...so signals forward cleanly. The image has no/app/dist; don't reintroduce a tsc build step. - Changing session-DB pragmas (
container/agent-runner/src/db/connection.ts) →journal_mode=DELETEis load-bearing for cross-mount visibility. Read the comment block at the top of the file first.
CJK font support
Agent containers ship without CJK fonts by default (~200MB saved). If you notice signals the user works with Chinese/Japanese/Korean content — conversing in CJK, CJK timezone (e.g., Asia/Tokyo, Asia/Shanghai, Asia/Seoul, Asia/Taipei, Asia/Hong_Kong), system locale hint, or mentions of needing to render CJK in screenshots/PDFs/scraped pages — offer to enable it:
# Ensure .env has INSTALL_CJK_FONTS=true (overwrite or append)
grep -q '^INSTALL_CJK_FONTS=' .env && sed -i.bak 's/^INSTALL_CJK_FONTS=.*/INSTALL_CJK_FONTS=true/' .env && rm -f .env.bak || echo 'INSTALL_CJK_FONTS=true' >> .env
# Rebuild and restart so new sessions pick up the new image
./container/build.sh
launchctl kickstart -k gui/$(id -u)/com.nanoclaw # macOS
# systemctl --user restart nanoclaw # Linux
container/build.sh reads INSTALL_CJK_FONTS from .env and passes it through as a Docker build-arg. Without CJK fonts, Chromium-rendered screenshots and PDFs containing CJK text show tofu (empty rectangles) instead of characters.