nanoclaw/docs/v1-to-v2-changes.md
gabi-simons 3ee7d2147e feat: add v1 → v2 migration to setup flow (experimental)
`bash nanoclaw.sh` detects a v1 install before channel pairing and does a
best-effort automated port of operationally important state. Hands off to
a new `/migrate-from-v1` skill for owner seeding and fork customizations.

Between the timezone and channel steps, `setup/auto.ts` calls
`runMigrateV1()` which orchestrates these registered sub-steps (each a
separate entry in the progression log with its own raw log + status
block — failures never abort the chain):

- **migrate-detect** — scans siblings of the v2 checkout + common $HOME
  locations; `$NANOCLAW_V1_PATH` overrides authoritatively. Relaxed
  `package.json` check lets forks + partial installs still match; DB
  presence is the strongest signal.
- **migrate-validate** — asserts v1 DB shape (tables + required
  columns); writes `schema-mismatch.json` on failure. Subsequent steps
  short-circuit their DB-dependent parts but still run.
- **migrate-db** — seeds `agent_groups` + `messaging_groups` +
  `messaging_group_agents` from v1's `registered_groups`. JID
  decomposition (`dc:123` → `channel_type='discord'`,
  `platform_id='discord:123'`); `trigger_pattern` + `requires_trigger`
  → `engage_mode` + `engage_pattern` (mirrors migration 010 backfill).
  Users + user_roles are NOT seeded — the skill does that with an owner
  interview. Idempotent: existing rows reused, not duplicated.
- **migrate-groups** — rsync group folders. v1 `CLAUDE.md` → v2
  `CLAUDE.local.md` (v2 composes `CLAUDE.md` at container spawn); v1
  `container_config` JSON → `.v1-container-config.json` sidecar for the
  skill to translate. Tight v1-pattern scan (`/workspace/ipc/tasks`,
  `store/messages.db`, `[PR_CONTEXT:`, etc.) flags files referencing
  v1-specific infrastructure — content is NOT modified, just flagged in
  the handoff.
- **migrate-env** — merges v1 `.env` into v2 `.env`, never overwriting
  existing v2 keys.
- **migrate-channel-auth** — per-channel registry tracks v1 env keys,
  v2 required keys (with source-of-key instructions — e.g. Discord
  needs `DISCORD_PUBLIC_KEY` which v1 never stored), and candidate
  on-disk auth state paths (Baileys keystore, matrix sync state,
  etc.). Missing required v2 keys surface as actionable followups and
  flip the step to `partial`.
- **migrate-channels** — runs `setup/install-<channel>.sh` for each
  detected channel in non-interactive mode. Install-script output is
  captured to `logs/setup-migration/install-<channel>.log` sidecars
  (silent under the parent spinner). Channels with no v2 adapter get
  a `not_supported` followup but don't degrade status.
- **migrate-tasks** — v1 `scheduled_tasks` → `messages_in` rows with
  `kind='task'` in each session's `inbound.db`. `schedule_type`
  mapping (cron / interval / once → v2 cron). Idempotent: skips v1
  task ids already present. Inactive rows dumped to
  `inactive-tasks.json` for reference.

Everything writes to `logs/setup-migration/handoff.json` — the source
of truth the skill consumes.

`.claude/skills/migrate-from-v1/SKILL.md`:

- **Phase A** (always): owner seeding + v1 access policy flip
  (`unknown_sender_policy` public/strict) via `AskUserQuestion`. Pulls
  sender candidates from v1's `messages` table as hints.
- **Phase B** (if followups exist): walks
  `handoff.followups` — translates `.v1-container-config.json`
  sidecars, handles `not_supported` channels, fills in missing
  required keys with instructions on where to get them.
- **Phase C** (fork-aware): `git log <upstream>..HEAD` in v1. Empty →
  "no customizations to port." Non-empty → scope choice (mechanical /
  full interview / reference-only). Portable categories
  (`container/skills/*`, `.claude/skills/*`, docs) scan+copy with
  `scanForV1Patterns`. Non-portable (`src/*`,
  `container/agent-runner/src/*`) stash to `docs/v1-fork-reference/`
  — explicit "don't translate v1 infra to v2" warning because v1's
  IPC file queue / single DB don't exist in v2.

Clearly marked in README, CLAUDE.md, SKILL.md header, and via a `p.warn`
that fires once per run when v1 is detected. Users with no v1 install
see a silent skip — no prompts, no noise.

Verified end-to-end against a live v1 install (300 discord + 1
discord-supervisor groups, fork with ~15 commits of PR-factory work):
- Detect → validate → db (301 rows seeded) → groups (301 CLAUDE.local.md
  + 178 other files + 1 container_config sidecar) → env (4 keys copied)
  → channel-auth (flagged missing `DISCORD_APPLICATION_ID` +
  `DISCORD_PUBLIC_KEY`) → channels (discord installed, discord-supervisor
  → not_supported) → tasks (0 rows, skipped)
- Idempotent re-run: 0 rows created, 903 rows reused; tasks skip if
  id already present
- Fresh-user case: silent skip, no prompts, straight to "You're ready!"
- Schema-mismatch case: recorded to `schema-mismatch.json`, chain
  continues

Follow-ups not covered by this commit:
- Unit tests for the pure transforms (`parseJid`,
  `inferChannelType`, `triggerToEngage`, `scanForV1Patterns`,
  `looksLikeV1Install`)
- Validate `requiredV2Keys` for telegram/slack/matrix/teams/webex/
  resend/linear against the actual Chat SDK packages (Discord was
  verified from real error output)
- Widen candidate auth file paths for WhatsApp/Matrix/iMessage based
  on real non-Discord v1 installs once we have some

See docs/v1-to-v2-changes.md for the v1 → v2 architecture diff.
2026-04-23 13:06:14 +00:00


NanoClaw v1 → v2 — what changed

Big-picture differences between NanoClaw v1 (the ~/nanoclaw checkout you've been running) and v2 (this rewrite). Not a migration guide — that's what bash nanoclaw.sh and the /migrate-from-v1 skill are for. This doc is the vocabulary: when something has moved or been renamed, find it here.

Read this before touching the migration code or porting customizations forward.


One-line summary

v1 was one Node process with one SQLite file and native channel adapters. v2 is a host that spawns per-session Docker containers, splits state across a central DB + per-session DB pair, routes through an explicit entity model, and installs channels as skills from a sibling branch.


Entity model — the biggest shift

v1: one flat table registered_groups(jid, name, folder, trigger_pattern, requires_trigger, is_main, channel_name). A group folder is the unit of agent identity. A chat (JID) is wired to exactly one folder, and trigger_pattern is an opaque regex the router applies to every incoming message.

v2: three tables, with a deliberate many-to-many in the middle:

agent_groups  ─┐
               ├─ messaging_group_agents ─┬─ messaging_groups
               │   (engage_mode,          │   (channel_type,
               │    engage_pattern,       │    platform_id,
               │    sender_scope,         │    unknown_sender_policy)
               │    ignored_message_policy,
               │    session_mode, priority)

Consequences:

  • One agent can answer on many chats, and one chat can fan out to many agents. v1 couldn't do either.
  • No is_main flag. Privilege is now explicit via user_roles (owner/admin, global or scoped). See below.
  • No trigger_pattern regex. Replaced with four orthogonal columns. Mapping rule used by the automated migration and by the /migrate-from-v1 skill:
    • v1 trigger_pattern non-empty → v2 engage_mode='pattern', engage_pattern = <the regex>
    • v1 requires_trigger=0, or the pattern was . or .* → v2 engage_mode='pattern', engage_pattern='.' (the "always" flavor)
    • no pattern and requires a trigger → v2 engage_mode='mention'
    • sender_scope and ignored_message_policy are new; defaults all / drop
  • JID decomposition. v1's jid column stored dc:12345 / tg:67890. v2 splits this into channel_type + platform_id. Concretely: dc:12345 becomes channel_type='discord', platform_id='discord:12345'. Prefix aliases (dc → discord, tg → telegram, wa → whatsapp) are in setup/migrate-v1/shared.ts.
  • channel_name was unreliable in v1. Many rows had it empty; the actual channel had to be guessed from the JID prefix. v2's channel_type is always explicit.
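The trigger and JID mapping rules above can be sketched as two pure functions. This is an illustrative sketch only: the real helpers live in setup/migrate-v1/shared.ts and may differ in signatures and edge cases. The type names, the return shapes, and the choice to let requires_trigger=0 take precedence over a non-empty pattern are assumptions made here.

```typescript
// Hypothetical sketch of the mapping rules; not the repository's actual code.

interface ParsedJid {
  channelType: string;
  platformId: string;
}

interface Engage {
  engageMode: "pattern" | "mention";
  engagePattern: string | null;
}

// Prefix aliases named in this doc; any other prefix is treated as unknown here.
const PREFIX_ALIASES: Record<string, string> = {
  dc: "discord",
  tg: "telegram",
  wa: "whatsapp",
};

// "dc:12345" becomes channel_type 'discord' and platform_id 'discord:12345'.
function parseJid(jid: string): ParsedJid | null {
  const sep = jid.indexOf(":");
  if (sep <= 0) return null;
  const channelType = PREFIX_ALIASES[jid.slice(0, sep)];
  if (!channelType) return null;
  return { channelType, platformId: `${channelType}:${jid.slice(sep + 1)}` };
}

function triggerToEngage(triggerPattern: string | null, requiresTrigger: boolean): Engage {
  const pattern = (triggerPattern ?? "").trim();
  // requires_trigger=0 or a match-everything regex: the "always" flavor.
  // Assumption: this rule wins over a non-empty pattern.
  if (!requiresTrigger || pattern === "." || pattern === ".*") {
    return { engageMode: "pattern", engagePattern: "." };
  }
  // Any other non-empty regex is carried over verbatim.
  if (pattern !== "") return { engageMode: "pattern", engagePattern: pattern };
  // No pattern, but a trigger is required: fall back to mentions.
  return { engageMode: "mention", engagePattern: null };
}
```

For example, `parseJid("tg:67890")` yields channel type `telegram` with platform id `telegram:67890`, while an unrecognized prefix returns `null` so the caller can flag the row rather than guess.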

Central DB vs session DBs

v1: one SQLite file at store/messages.db. Every chat, message, registered group, scheduled task, and session lived there. Host and any agent processes all opened the same file.

v2: three DB shapes.

  1. data/v2.db (central). Everything that isn't per-session: users, roles, agent groups, messaging groups, wirings, pending approvals, user DMs, schema migrations.
  2. data/v2-sessions/<session_id>/inbound.db (host writes, container reads). messages_in, routing, destinations, pending questions, processing_ack. This is where scheduled tasks live (see "Scheduling" below).
  3. data/v2-sessions/<session_id>/outbound.db (container writes, host reads). messages_out, session_state.

Exactly one writer per file. No cross-mount lock contention. Heartbeat is a file touch at /workspace/.heartbeat, not a DB update. Host uses even seq numbers, container uses odd.
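The even/odd split can be illustrated with a small allocator. This is a sketch under assumptions: the function name and the idea of deriving the next number from the highest one seen so far are hypothetical, and the doc does not say how the real host or container picks its next value. Presumably the parity rule exists so the two writers can never produce the same sequence number.

```typescript
// Hypothetical sketch: host allocates even seq numbers, container allocates odd.
type Writer = "host" | "container";

// Returns the smallest seq strictly greater than lastSeen with the writer's parity.
// Assumes lastSeen is a non-negative integer.
function nextSeq(lastSeen: number, writer: Writer): number {
  const wantOdd = writer === "container";
  let next = lastSeen + 1;
  if ((next % 2 === 1) !== wantOdd) next += 1;
  return next;
}
```

Because each side only ever emits its own parity, ordering by seq across both session DBs stays unambiguous without any shared lock.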

Message history (v1 messages table, v1 chats table) is not migrated. The migration copies operationally important state forward (agents, channels, wirings, scheduled tasks, group folders) and leaves chat logs behind.


Scheduling

v1: dedicated scheduled_tasks table in store/messages.db with its own columns (schedule_type, schedule_value, next_run, last_run, context_mode, script, status). A separate cron-ish scheduler process read from it.

v2: scheduled tasks are messages_in rows with kind='task' in a session's inbound.db. Relevant columns:

  • process_after (ISO8601) — host sweep wakes the container when datetime(process_after) <= datetime('now')
  • recurrence — cron string; NULL = one-shot
  • series_id — groups recurring occurrences; set to the task id on first insert
  • status: pending | processing | completed | failed | paused

The public API is insertTask() in src/modules/scheduling/db.ts. Recurrence is computed in the user's TZ via cron-parser (see src/modules/scheduling/recurrence.ts). The migration maps v1's schedule_type+schedule_value pair into a single cron string before calling insertTask().

Tasks can exist before a session is awake — the host sweep creates/wakes the container on the first due tick.
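As a rough illustration of the schedule mapping and the due check described above, the sketch below converts a v1 schedule_type + schedule_value pair into a cron string. Everything here is an assumption for illustration: the function names are hypothetical, v1 interval values are assumed to be milliseconds, and a one-shot task is represented as a NULL recurrence. The real migration may handle these cases differently.

```typescript
// Hypothetical sketch; not the migration's actual mapping code.
type V1ScheduleType = "cron" | "interval" | "once";

// Returns a cron string for recurring tasks, or null for a one-shot.
function toRecurrence(type: V1ScheduleType, value: string): string | null {
  if (type === "cron") return value.trim();
  if (type === "once") return null;

  // Assumption: v1 stored intervals as milliseconds.
  const minutes = Number(value) / 60_000;
  if (!Number.isInteger(minutes) || minutes < 1) {
    throw new Error(`interval ${value} is not a whole number of minutes`);
  }
  // Only intervals that divide an hour or a day evenly have a faithful cron form.
  if (minutes < 60 && 60 % minutes === 0) return `*/${minutes} * * * *`;
  const hours = minutes / 60;
  if (Number.isInteger(hours) && hours <= 24 && 24 % hours === 0) {
    return hours === 24 ? "0 0 * * *" : `0 */${hours} * * *`;
  }
  throw new Error(`interval ${value} has no evenly spaced cron equivalent`);
}

// Mirrors the sweep condition datetime(process_after) <= datetime('now').
function isDue(processAfter: string, now: Date): boolean {
  return new Date(processAfter).getTime() <= now.getTime();
}
```

Note that an interval rewritten as cron becomes aligned to the clock rather than to the task's last run, so a sketch like this changes the phase of the schedule even when the period is preserved.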


Credentials

v1: .env — plain environment variables. DISCORD_BOT_TOKEN, ANTHROPIC_API_KEY, etc. The host read them directly and passed them in to any code that needed them.

v2: OneCLI Agent Vault. A separate local service at http://127.0.0.1:10254 holds secrets. Agents are scoped to specific secrets and the vault injects them into approved API requests as they leave the container. The container never sees the raw secret value.

Gotcha: auto-created agents default to selective secret mode — no secrets attached, even if matching secrets exist in the vault. See the "auto-created agents start in selective secret mode" section of the root CLAUDE.md for the fix (onecli agents set-secret-mode --mode all).

What the automated migration does: copies every v1 .env key verbatim into v2 .env, never overwriting existing v2 keys. The OneCLI vault migration is a separate step owned by the /init-onecli skill, which knows how to pull from .env.
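The "never overwrite existing v2 keys" rule amounts to an append-only merge. The sketch below shows one way to express it on the file contents as strings. The function names and return shape are hypothetical, and it deliberately ignores multi-line quoted values; the real migrate-env step may parse .env files differently.

```typescript
// Hypothetical sketch of an append-only .env merge.
const KEY_LINE = /^\s*(?:export\s+)?([A-Za-z_][A-Za-z0-9_]*)\s*=/;

// Collects the variable names defined in a .env file's text.
function envKeys(text: string): Set<string> {
  const keys = new Set<string>();
  for (const line of text.split(/\r?\n/)) {
    const key = KEY_LINE.exec(line)?.[1];
    if (key) keys.add(key);
  }
  return keys;
}

// Appends v1 lines whose key is absent from v2; existing v2 lines are untouched.
function mergeEnv(v2Text: string, v1Text: string): { merged: string; copied: string[] } {
  const existing = envKeys(v2Text);
  const copied: string[] = [];
  const extra: string[] = [];
  for (const line of v1Text.split(/\r?\n/)) {
    const key = KEY_LINE.exec(line)?.[1];
    if (!key || existing.has(key)) continue; // comments, blanks, and v2-owned keys
    existing.add(key); // if v1 repeats a key, its first occurrence wins
    copied.push(key);
    extra.push(line); // copied verbatim, quoting included
  }
  if (extra.length === 0) return { merged: v2Text, copied };
  const base = v2Text === "" || v2Text.endsWith("\n") ? v2Text : v2Text + "\n";
  return { merged: base + extra.join("\n") + "\n", copied };
}
```

Running the merge a second time against its own output copies nothing, which is the property that makes the step safe to re-run.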


Channel adapters

v1: native adapters (e.g. discord.js used directly) imported in src/channels/. Installing a channel meant editing code, adding a dependency, and setting env vars.

v2: channel adapters live on a sibling channels branch. Each /add-<channel> skill:

  1. git fetch origin channels
  2. git show channels:src/channels/<name>.ts > src/channels/<name>.ts
  3. Appends import './<name>.js'; to src/channels/index.ts
  4. pnpm install @chat-adapter/<name>@<pinned>
  5. pnpm run build

Idempotent: re-running is a no-op. Pinned versions keep the supply chain honest. The automated migration detects which channels were wired in v1 (via distinct channel_name / JID prefix) and runs the matching setup/install-<channel>.sh for each. Channels in v1 that don't have a v2 skill (rare now, and rarer as v2 catches up) are recorded in the handoff file for the /migrate-from-v1 skill to raise with the user.
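The import-append step is where idempotency matters most, because a naive append would duplicate the line on every run. A minimal sketch of a guard, assuming the index file is handled as a string (the function name is hypothetical and the real skill may implement the check in shell instead):

```typescript
// Hypothetical sketch: add "import './<name>.js';" to src/channels/index.ts once.
function appendImportOnce(indexSource: string, channel: string): string {
  const line = `import './${channel}.js';`;
  const present = indexSource.split(/\r?\n/).some((l) => l.trim() === line);
  if (present) return indexSource; // already wired: re-running changes nothing
  const base =
    indexSource === "" || indexSource.endsWith("\n") ? indexSource : indexSource + "\n";
  return base + line + "\n";
}
```

Calling it twice with the same channel returns the same text as calling it once.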

Channel auth beyond .env. Some channels store session state on disk (Baileys WhatsApp keystore, Matrix sync state, iMessage tokens). The channel-auth sub-step has a per-channel registry (setup/migrate-v1/shared.ts: CHANNEL_AUTH_REGISTRY) that knows which file globs to copy alongside env keys.


Privilege — from implicit to explicit

v1: registered_groups.is_main = 1 flagged one group as the privileged one. No users table. Permissions were conventions, not enforced.

v2: explicit tables.

  • users(id = "<channel_type>:<handle>", kind, display_name) — one row per messaging-platform identifier
  • user_roles(user_id, role ∈ {owner, admin}, agent_group_id nullable, granted_by, granted_at) — owner is always global; admin can be global or scoped
  • agent_group_members(user_id, agent_group_id, ...) — "known" membership for the sender_scope='known' gate

Owner gets seeded during the /migrate-from-v1 skill's interview phase ("Which handle is you?"). The automated migration doesn't guess — v1 has no source of truth for it.

Default access — "anyone can talk to the bot" vs "only known users". v1 stored this implicitly (via trigger regex + is_main). v2 exposes it as messaging_groups.unknown_sender_policy ∈ {'strict', 'request_approval', 'public'}. The skill asks the user which mode v1 ran in and flips the migrated messaging groups accordingly.
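To make the role semantics concrete, here is a small sketch of how a privilege check could read the user_roles rows described above. The interface, the camelCase field names, and the function are hypothetical; this doc does not describe how the host actually evaluates roles, so treat it as one plausible reading of "owner is always global; admin can be global or scoped".

```typescript
// Hypothetical sketch of the v2 privilege model.
type Role = "owner" | "admin";
type UnknownSenderPolicy = "strict" | "request_approval" | "public";

interface UserRole {
  userId: string;              // "<channel_type>:<handle>"
  role: Role;
  agentGroupId: string | null; // null = global grant
}

// Builds a users.id value from its two parts.
function makeUserId(channelType: string, handle: string): string {
  return `${channelType}:${handle}`;
}

// True if the user is an owner, a global admin, or an admin scoped to this agent group.
function isPrivileged(roles: UserRole[], userId: string, agentGroupId: string): boolean {
  return roles.some(
    (r) =>
      r.userId === userId &&
      (r.role === "owner" || r.agentGroupId === null || r.agentGroupId === agentGroupId),
  );
}

// The three access modes a messaging group can be flipped to.
const POLICIES: UnknownSenderPolicy[] = ["strict", "request_approval", "public"];
```

Unlike v1's single is_main flag, a check of this kind is per user and per agent group, so two people can hold different privileges in the same chat.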


Group folders on disk

v1: groups/<folder>/CLAUDE.md and optional logs/. CLAUDE.md was a plain instruction file, group-specific.

v2: each group still lives at groups/<folder>/, but the shape is richer:

  • CLAUDE.md: composed at container spawn from .claude-shared.md (symlink to global) + .claude-fragments/*.md (module fragments) + CLAUDE.local.md. Don't edit CLAUDE.md directly.
  • CLAUDE.local.md — per-group content. The migration writes v1's old CLAUDE.md here.
  • container.json — optional per-group container config (apt deps, env, mounts). v1's registered_groups.container_config JSON is close but not identical — the migration stores the v1 payload at groups/<folder>/.v1-container-config.json for the skill to reconcile, rather than silently mapping it.
  • .claude-fragments/ and .claude-shared.md are installed by initGroupFilesystem() the first time the host touches the group, so the migration only has to write CLAUDE.local.md and leave the scaffolding to the host.
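The composition of CLAUDE.md from its three sources can be pictured as a simple concatenation. The sketch below works on in-memory strings and assumes details the doc does not state: fragments are ordered by filename, parts are separated by a blank line, and empty parts are dropped. The names are hypothetical and the host's real composer may differ.

```typescript
// Hypothetical sketch of composing CLAUDE.md at container spawn.
interface GroupSources {
  shared: string;                    // contents of .claude-shared.md
  fragments: Record<string, string>; // filename -> contents of .claude-fragments/*.md
  local: string;                     // contents of CLAUDE.local.md
}

function composeClaudeMd(src: GroupSources): string {
  // Assumption: fragments are applied in filename order.
  const fragmentBodies = Object.keys(src.fragments)
    .sort()
    .map((name) => src.fragments[name] ?? "");
  return (
    [src.shared, ...fragmentBodies, src.local]
      .map((part) => part.trim())
      .filter((part) => part !== "")
      .join("\n\n") + "\n"
  );
}
```

Since the output is regenerated from these inputs, any manual edit to CLAUDE.md would be lost on the next spawn, which is why migrated v1 content goes into CLAUDE.local.md.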

Host process vs containers

v1: single Node process. The "agent" was the same process as the router.

v2: Node host at top, Bun-runtime Docker container per session. They communicate only via the two session DBs. No shared modules, no IPC, no stdin piping. If you wrote custom code that reached from the agent into host internals (or vice versa), that surface no longer exists — porting it is a /migrate-from-v1 skill topic, not a mechanical copy.

Lockfiles: host uses pnpm-lock.yaml, agent-runner uses bun.lock. minimumReleaseAge: 4320 on the host side (3-day supply-chain wait); agent-runner has no release-age gate.


Self-modification and MCP tools

v1: if you added MCP servers or self-modification plumbing, it was usually direct edits to the long-running process.

v2:

  • MCP servers register through container/agent-runner/src/mcp-tools/*.ts and load per-session. There's also install_packages and add_mcp_server self-mod tools that go through an admin-approval flow (src/modules/self-mod/apply.ts) before rebuilding the container image.
  • Custom MCP tools you wrote in v1 map cleanly to the v2 tool registry, but the import paths, runtime (Bun vs Node), and SQL helper differences (bun:sqlite uses $name-prefixed params) may need adjustment. The skill walks through this.

Things that are gone or don't map

  • scheduled_tasks as a separate table — moved into session inbound.db under kind='task'. Migration ports active rows; inactive/completed are exported to logs/setup-migration/inactive-tasks.json for reference.
  • messages / chats tables (chat history) — not migrated. Stay in the v1 checkout if you need them.
  • router_state (key/value) — not migrated. v2 state lives in the explicit tables above.
  • sessions (v1 group→session_id) — v1 sessions don't map; v2 sessions are keyed by (agent_group_id, messaging_group_id, thread_id) and are created on demand.
  • Raw access to the old store/messages.db — the v1 DB is left in place and untouched. If migration goes wrong you can re-run it (the migration sub-steps are idempotent for agents/channels/wirings; folders use rsync semantics).
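The point about sessions being keyed by (agent_group_id, messaging_group_id, thread_id) and created on demand can be sketched as a get-or-create lookup. The class, the in-memory map, and the nullable thread id are assumptions for illustration; the real host stores sessions in the central DB.

```typescript
// Hypothetical sketch of on-demand session lookup by composite key.
interface SessionKey {
  agentGroupId: string;
  messagingGroupId: string;
  threadId: string | null; // assumption: null when the chat has no threads
}

class SessionRegistry {
  private sessions = new Map<string, string>();

  // Returns the existing session id for this key, or creates one via makeId.
  getOrCreate(key: SessionKey, makeId: () => string): string {
    // JSON-encoding the tuple avoids collisions from ids that contain separators.
    const k = JSON.stringify([key.agentGroupId, key.messagingGroupId, key.threadId]);
    const existing = this.sessions.get(k);
    if (existing !== undefined) return existing;
    const id = makeId();
    this.sessions.set(k, id);
    return id;
  }
}
```

A v1 group-to-session mapping has no thread or agent dimension, which is why v1 session rows cannot be carried across.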

Migration surface — where the code lives

  • setup/migrate-v1.ts — orchestrator called from setup/auto.ts between the timezone and channel steps.
  • setup/migrate-v1/<sub-step>.ts — registered in setup/index.ts STEPS; each runs under runQuietStep.
  • logs/setup.log — progression log. Each sub-step appends one entry.
  • logs/setup-steps/NN-migrate-<x>.log — raw per-sub-step stdout/stderr.
  • logs/setup-migration/handoff.json — summary of what was migrated, what failed, what was deferred. Read by the /migrate-from-v1 skill.
  • logs/setup-migration/schema-mismatch.json — written only if migrate-validate finds a v1 DB that doesn't match the expected shape. The skill uses this to decide what to hand back to the user.
  • logs/setup-migration/inactive-tasks.json — completed or stopped v1 scheduled_tasks, exported for reference.
  • .claude/skills/migrate-from-v1/SKILL.md — tells Claude how to finish anything the automation couldn't and then interview the user about custom code changes.