docs(migrate-from-v1): trim Phase 0 to intent only

The previous version spelled out launchctl/systemctl commands, log lines to grep for, and diagnostic recipes. The agent reading this skill already knows all of that. Keep only the parts that aren't obvious from the rest of the codebase: which steps are blocking vs deferred, the smoke-test ordering, and the non-destructive framing for the user.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-04 10:14:47 +08:00 · 2026-05-02 18:31:23 +03:00
parent 2617313f19
commit 2bc1279a12
1 changed file with 11 additions and 83 deletions
@@ -39,97 +39,25 @@ Once `handoff.json` exists, proceed to Phase 0.

 ## Phase 0: Get v2 routing real messages

-Goal: get from "the script finished" to "the user sent a message and v2 answered" as fast as possible — *before* spending tokens on CLAUDE.local.md cleanup, fork customisations, or anything that requires deeper engagement. v1 is paused, not touched; flipping back is a one-line restart.
+Before any deeper migration work, prove v2 actually answers messages on the user's real channels. v1 is paused, not touched — flipping back is a service restart.

-### 0a — Fix blockers (only the blockers)
+### 0a — Fix blockers only

-Walk `handoff.steps`. A step is **blocking** only if its failure prevents v2 from routing a single message. Treat these as blockers:
+A step is **blocking** if its failure stops the bot from routing one message. Re-run or hand-fix only these; defer everything else to its later phase:

-| Step | Why blocking |
-|------|--------------|
-| `1b-db` | No `messaging_groups` → router has nothing to match |
-| `1d-sessions` | No session → no inbound DB to write into |
-| `2c-install-<channel>` | No adapter for the channel the user wants to test |
-| `2d-whatsapp-lids` | WhatsApp DMs may arrive as `<lid>@lid` and miss migrated phone-keyed rows |
-| `3a-docker` / `3e-build` | No container image → agent can't run |
-| `3b-onecli` | Anthropic credentials not injected → first agent call 401s |
+| Blocking | Deferred |
+|---|---|
+| `1b-db`, `1d-sessions`, `2c-install-<channel>`, `2d-whatsapp-lids`, `3a-docker`, `3b-onecli`, `3e-build` | `1a-env`, `1c-groups`, `1e-tasks`, `2b-channel-auth`, `3c-auth` |

-**Defer** these — they don't block a smoke test, and most surface naturally in later phases:
+### 0b — Smoke test, then continue

- `1a-env`, `1c-groups`, `1e-tasks`, `2b-channel-auth`, `3c-auth`
+Tell the user the switch is non-destructive (v1 is paused, not modified; reverting is one command). Help them stop v1's service unit and start v2's, tail the host log for a clean boot, and have them send a real test message. Use `AskUserQuestion` to confirm the bot responded.

-For each blocker: read `handoff.step_logs_dir/<step>.log`, identify the cause, re-run the underlying script directly (`pnpm exec tsx setup/migrate-v2/<step>.ts <v1_path>`) or hand-fix mechanically. Use `AskUserQuestion` for judgment calls. Don't simulate the script's work.
+If yes, continue to Phase 1. If no, diagnose from `logs/nanoclaw.log` and re-test — don't proceed to deeper work on a broken router.

-Common blockers:
- **`1b-db` failed**: JID couldn't be parsed. Insert `agent_groups` + `messaging_groups` for the user's confirmed channel.
- **`2c-install-<channel>` failed**: `git fetch origin channels` issue. The user can run `bash setup/install-<channel>.sh` directly.
- **`3e-build` failed**: usually stale builder cache. `docker buildx prune -f && ./container/build.sh`.
+### Deferred failures

-### 0b — Smoke test before any further migration work
-
-Tell the user, verbatim:
-
-> Before we touch CLAUDE.local.md or fork customisations, let's confirm v2 actually answers your real messages. **This is non-destructive — v1 is just paused, not touched.** v1 and v2 share your WhatsApp identity (we copied `store/auth/` over), so only one can be online at a time, but flipping back is instant.
-
-Find the v2 service unit (per-checkout hash):
-
-```bash
-# macOS
-launchctl list | grep nanoclaw
-# Linux
-systemctl --user list-units 'nanoclaw*'
-```
-
-Switch v1 → v2:
-
-```bash
-# macOS
-launchctl unload ~/Library/LaunchAgents/com.nanoclaw.plist
-launchctl load   ~/Library/LaunchAgents/com.nanoclaw-v2-<hash>.plist
-
-# Linux
-systemctl --user stop nanoclaw
-systemctl --user start nanoclaw-v2-<hash>
-```
-
-Tail the log and confirm clean boot:
-
-```bash
-tail -f logs/nanoclaw.log
-```
-
-Watch for `NanoClaw running` plus `Channel adapter started` for each installed channel (and `Connected to <channel>` for native adapters like WhatsApp).
-
-Ask the user to send a real test message — a DM to the bot, or a post in a known group from a non-bot account. A working route logs an inbound event → session resolution → container spawn → outbound delivery.
-
-`AskUserQuestion`: *"Did v2 respond? — Yes / No, here's what happened."*
-
-**If yes**: continue to Phase 1.
-
-**If no**: do not proceed. Read `logs/nanoclaw.log` + `logs/nanoclaw.error.log` and diagnose. Common patterns:
- WhatsApp DM with no routing chain in the log → check `SELECT platform_id FROM messaging_groups WHERE platform_id LIKE '%@lid'`. If empty, re-run `setup/migrate-v2/whatsapp-resolve-lids.ts`.
- Agent inside container fails on Anthropic 401 → OneCLI agents start in `selective` secret mode. `onecli agents set-secret-mode --id <agent-id> --mode all`.
- Channel disconnected silently → restart: `launchctl kickstart -k gui/$(id -u)/com.nanoclaw-v2-<hash>`.
-
-Re-test before continuing.
-
-### Reverting (anytime — not just now)
-
-```bash
-# macOS — back to v1
-launchctl unload ~/Library/LaunchAgents/com.nanoclaw-v2-<hash>.plist
-launchctl load   ~/Library/LaunchAgents/com.nanoclaw.plist
-
-# Linux
-systemctl --user stop nanoclaw-v2-<hash>
-systemctl --user start nanoclaw
-```
-
-v1's process, data, credentials, and groups are untouched the whole time. Reverting is just a service restart.
-
-### Deferred non-blocker failures
-
-If you skipped non-blocker failures in 0a (`1a-env`, `1c-groups`, `1e-tasks`, `2b-channel-auth`, `3c-auth`), they still need fixing — most surface naturally in later phases (`1c-groups` ↔ Phase 2 CLAUDE.local.md cleanup, `1e-tasks` ↔ task verification). Re-run any that don't get covered before declaring the migration done.
+Re-visit anything you skipped in 0a before declaring the migration done. Most surface naturally in later phases (`1c-groups` ↔ Phase 2, `1e-tasks` ↔ task verification).

 ## Phase 1: Owner and access
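
For reference, the blocking/deferred triage that the trimmed Phase 0 keeps can be sketched roughly as below. The step names come from the 0a table in the diff above. The `StepResult` shape, the `triageFailures` helper, and the choice to treat unlisted steps as deferred are all assumptions made for illustration: the real `handoff.json` schema is not shown in this commit.

```typescript
// Step names are taken from the Phase 0 "Blocking" column in the diff.
const BLOCKING_EXACT = new Set<string>([
  "1b-db",
  "1d-sessions",
  "2d-whatsapp-lids",
  "3a-docker",
  "3b-onecli",
  "3e-build",
]);

// `2c-install-<channel>` is a pattern, so it is matched by prefix.
const BLOCKING_PREFIXES: string[] = ["2c-install-"];

// Hypothetical shape: the actual entries in `handoff.steps` may differ.
interface StepResult {
  name: string;
  ok: boolean;
}

interface Triage {
  blocking: string[];
  deferred: string[];
}

// A step is blocking if its failure stops the bot from routing one message.
function isBlocking(step: string): boolean {
  return (
    BLOCKING_EXACT.has(step) ||
    BLOCKING_PREFIXES.some((prefix) => step.startsWith(prefix))
  );
}

// Split failed steps into "fix now" and "revisit later"; passing steps are skipped.
// Assumption: any failed step not listed as blocking is treated as deferred.
function triageFailures(steps: StepResult[]): Triage {
  const triage: Triage = { blocking: [], deferred: [] };
  for (const step of steps) {
    if (step.ok) continue;
    if (isBlocking(step.name)) {
      triage.blocking.push(step.name);
    } else {
      triage.deferred.push(step.name);
    }
  }
  return triage;
}
```

Under these assumptions, an empty `blocking` list would mean the smoke test in 0b can start, while the `deferred` list is what needs revisiting before the migration is declared done.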