Single forward-looking reference that replaces the two untracked planning docs (REFACTOR_PLAN.md + REFACTOR_EXECUTION.md) which had become a mix of historical PR timeline and still-relevant decisions. Keeps only what's actionable going forward: - Module tiers, the four registries, and the module distribution model (architecture summary). - Remaining work: Phase 5 (v2 → main) and the modules-branch decision. - Operational patterns worth preserving (standing checks, TDZ rule, branch-sync file-presence diff procedure, prettier drift pattern). - 17 curated open questions across design, distribution, core slotting, and documentation. Canonical references (docs/module-contract.md, docs/architecture.md, etc.) are linked but not duplicated. This doc is transient — retire when the refactor is fully behind us. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
13 KiB
NanoClaw Refactor — Forward-Looking Reference
Consolidates what's still relevant from REFACTOR_PLAN.md and REFACTOR_EXECUTION.md: open decisions, remaining work, operational patterns worth keeping. Historical PR timeline and phase framing have been dropped — the work is in the commit history.
Architecture (still authoritative)
Module tiers
Three categories, distinguished by shipping model and dependency direction:
| Tier | Where it lives | Loaded by default? | Removal cost |
|---|---|---|---|
| Core | src/** (outside src/modules/, src/channels/, src/providers/) |
always | N/A — can't remove |
| Default modules | src/modules/<name>/ on main |
yes — imported by src/modules/index.ts |
edit core imports (intentional friction) |
| Optional modules | src/modules/<name>/ on main (for now — see open q #7) |
yes, via barrel import | delete files + barrel line + revert MODULE-HOOK edits |
| Channel adapters | src/channels/<name>.ts on channels branch |
no — cherry-pick via /add-<name> |
delete files + barrel line |
| Providers | on providers branch |
no — cherry-pick via /add-<provider> |
delete files + barrel line |
Default modules today: typing, mount-security, approvals, cli.
Optional modules: interactive, scheduling, permissions, agent-to-agent, self-mod.
Dependency rule: core ← default modules ← optional modules. Optional modules must not depend on each other. Known transitional violation (flagged): src/db/messaging-groups.ts auto-wires agent_destinations when agent-to-agent is installed.
The four registries
Full contract in docs/module-contract.md. Summary:
- Delivery action handlers —
delivery.ts; modules callregisterDeliveryAction(name, fn). - Router inbound gate —
router.ts; single setter (setSenderResolver+setAccessGate). Default: allow-all. - Response dispatcher —
response-registry.ts; modules callregisterResponseHandler(fn). First to returntrueclaims. - Container MCP tool self-registration —
container/agent-runner/src/mcp-tools/server.ts; modules callregisterTools([...])at import.
Anything else single-consumer uses either a sqlite_master-guarded inline read or a MODULE-HOOK:<name>:start/end skill edit.
Module distribution (pending)
main— core + default modules + default channel (cli). Ships clean.channels— fully loaded runnable branch with all channel adapters; skills cherry-pick from it.providers— same pattern for agent providers (OpenCode).modulesbranch — proposed but NOT created yet. See "Remaining work" below.
Remaining work
Phase 5: merge v2 → main
Cut-over the refactor. Pre-reqs (already met): green build, green tests, green service boot, clean channels / providers syncs.
Open logistics:
- Release versioning: bump to
1.3.0at merge time or cut av2-rctag first for internal testing? Non-blocking — decide at merge. - Coordinate with anyone still running the old
main(v1.2.53) — breaking change for them. - Announce the new layout + the one shell command that changed (
pnpm run chatis new default).
modules branch — create, skip, or defer?
The original plan (PR #10) was to fork a modules branch and populate it with the 5 optional modules, so future /add-<module> skills pull via git show origin/modules:path. Three paths:
- (a) Create it now. Matches the
channels/providerspattern for consistency. Extra surface to maintain: every core change must be merged intomodulesat phase boundaries (same cadence as channels/providers). Pays off if we ever want to make a module truly optional (not shipped on main). - (b) Skip it. Leave all 5 optional modules shipped on main. No
modulesbranch, no install skills, no cherry-picking. Simpler but loses the "opt-in" property for users who want a leaner install. - (c) Defer. Ship main without the modules branch; create it later if someone actually wants to slim their install. No-cost option for now.
Recommendation leans toward (c) — we've already paid the architectural cost (tier boundary, dependency rule, registries) without needing the branch today.
Per-module follow-ups (tracked as open questions below)
Each has a specific landing zone when we get to it:
- #11–13 (admin mechanism, providers registry, container-runner audit) — scope a focused cleanup pass.
- #14 (CLAUDE.md review) — single dedicated PR touching every module.
- #15 (A2A / destinations rethink) — requires design, not just cleanup.
- #17–18 (self-mod rethink, per-group source) — requires design.
- #19 (system vs user CLAUDE.md) — requires install-skill tooling.
Operational patterns (keep using these)
Standing checks for every PR
Non-negotiable; a unit test suite alone doesn't catch circular-import TDZ bugs:
pnpm run buildclean.pnpm test+bun test(incontainer/agent-runner/) all green.- Service actually starts.
gtimeout 5 node dist/index.js(orlaunchctl kickstart) must reachNanoClaw running. Unit tests import individual files; onlymain()exercises the module-init order. - Expected boot log lines present (at least:
Central DB ready,Delivery polls started,Host sweep started,NanoClaw running, plus any module lifecycle line likeOneCLI approval handler startedorCLI channel listening).
Module architecture rule (TDZ bug, PR #3)
Any registry state a module writes to at import time must live in a file with no back-edge to src/index.ts — transitively. src/index.ts imports src/modules/index.js for side effects; if a module calls registerX() at top level and X lives in src/index.ts, the ES module loader hits a TDZ reference on the const declaration. Fix: registry state lives in its own dependency-free file (e.g. src/response-registry.ts). Any new registry follows the same pattern.
Branch sync procedure
After every v2 (or future main) sync into channels / providers / future modules:
-
File-presence diff. Enumerate files that existed pre-sync but are missing post-sync:
git ls-tree -r <pre-sync> | awk '{print $4}' | sort > /tmp/pre.txt git ls-tree -r <post-sync> | awk '{print $4}' | sort > /tmp/post.txt comm -23 /tmp/pre.txt /tmp/post.txtClassify each missing file:
- Intentional (core deleted it) → leave deleted.
- Branch-owned (channels branch still needs it) → restore from pre-sync HEAD.
This has caught real losses on both
channels(17 adapter files plus 3 setup scripts after PR #2's channel move) andproviders(opencode files after PR #2). -
Cross-file consistency. When restoring a file, check whether something else that also changed references it (e.g.
setup/index.ts'sSTEPSmap). -
Run the standing checks against the synced branch (not just v2).
Prettier drift pattern
The format:fix pre-commit hook sometimes reformats peer files after the commit completes, leaving cosmetic-only diffs in the working tree. Discard with git checkout -- <files>. Do not re-commit the drift — it's trivial whitespace and noise.
Open questions (curated)
Design / architecture
-
NANOCLAW_ADMIN_USER_IDSas the admin mechanism. Host queriesuser_rolesat container wake, collapses into env var, container compares sender IDs. Conflates identity-at-send with privilege-at-wake and forces the container to care about namespaced user IDs. Revisit during a container-runner audit. -
Host-side
src/providers/registry. One real consumer (OpenCode). A registry is probably overkill — the install skill could just editcontainer-runner.tsviaMODULE-HOOK. Fold into the container-runner audit. -
Container-runner audit.
src/container-runner.tshas accreted wake/spawn/kill, mount assembly, OneCLI credential application, admin-ID env var, idle timers, image rebuild. Some pieces should pull apart or move into modules. Not blocking. Related to #1 and #2. -
Revisit destinations + A2A capability holistically. The destination projection invariant, dual-purpose routing+ACL table, channel vs agent destination shapes,
createMessagingGroupAgentauto-wire coupling — more machinery than the feature warrants. Phase 3 moved it out of core intact; a redesign is warranted but scoped post-refactor. -
Self-mod approach rethink. Three separate MCP tools + three delivery actions + three approval handlers for what's essentially "mutate container.config.json and rebuild." Also: post-rebuild latency (host sweep waits up to 60s), and agents sometimes send redundant
add_mcp_server+request_rebuildpairs. Consider collapsing into a single "apply this container-config diff" approval primitive. -
Per-agent-group source / per-group base image. Self-mod today layers packages/MCP on a shared base. As groups diverge (different base images, provider configs, runtime toolchains), the shared-base assumption won't scale. Scope post-refactor.
Distribution / operational
-
Providers on a consolidated
modulesbranch? Staying separate for now. Revisit if a second optional provider appears. -
Per-group module enablement. Modules are currently project-wide. If one agent group wants approvals and another doesn't, we'd need per-group feature flags. Flag if asked.
-
Module removal UX. We do not drop tables on uninstall. Is that the right default? (Alternative:
/remove-<module>optionally runs a down migration. YAGNI until requested.) -
Cross-module ordering for the response dispatcher. Registration order determines who claims a given
questionId. IDs are disjoint in practice (q-…vsappr-…), so first-match-wins is safe. If a third response-consuming module arrives, we may need keyed dispatch. -
Versioned module migrations. Reinstalls are idempotent (migrator skips anything already in
schema_version). If a module ships a new migration in a later version, the install skill must append the new file + barrel entry without touching prior ones. Simplest rule: install skills are additive; content changes to an already-applied migration are a hard error. -
Telegram pairing imports from permissions (channels branch).
src/channels/telegram.tsreaches intosrc/modules/permissions/db/*forgrantRole/hasAnyOwner/upsertUserin the pairing-bootstrap branch. Cross-branch tier violation. Fix: extract those writes into a pairing helper (e.g.src/channels/telegram-pairing-accept.tsorsetup/pair-telegram.ts). Non-blocking.
Core slotting (files not explicitly discussed)
-
state-sqlite.ts,webhook-server.ts,timezone.ts. state-sqlite is likely core (host tracker). Webhook-server likely core (channel infra). Timezone likely core utility. Confirm if any of them prove to be module-shaped during future audits. -
Chat SDK bridge location.
src/channels/chat-sdk-bridge.tsis channel infra that bridges adapters on thechannelsbranch. Stays insrc/channels/for now. -
OneCLI credential injection. Lives in
container-runner.ts. Every agent call uses it, no clean optional boundary. Stays core. Related:onecli-approvals.tsis bundled inside theapprovalsdefault module on the assumption OneCLI stays in core. If OneCLI later moves to its own module,onecli-approvalsfollows.
Documentation
-
CLAUDE.md content per module. Every module ships with project.md + agent.md. Need a dedicated review pass: (a) write the missing agent-to-agent snippets, (b) audit other modules for accuracy/tone, (c) confirm
agent.mdfiles are actually tailored for the agent vs. copy-pastes ofproject.md. -
Split system CLAUDE.md from user CLAUDE.md. Project
CLAUDE.mdandgroups/global/CLAUDE.mdmix system-authored content (module contracts, install-skill appends) with user customizations. Updates currently risk clobbering user intent. Look at a system-owned region (or separate file) that skills rewrite freely plus a user-owned one that's never touched. Related to #16.
Where the canonical references live
- Module contract —
docs/module-contract.md - Architecture overview —
docs/architecture.md - DB layout —
docs/db.md,docs/db-central.md,docs/db-session.md - Agent-runner internals —
docs/agent-runner-details.md - Channel isolation model —
docs/isolation-model.md - Build + runtime split —
docs/build-and-runtime.md - Top-level —
CLAUDE.md
This doc (REFACTOR.md) is transient — prune when open questions close; retire entirely once the refactor is fully behind us and the operational patterns have been absorbed into CLAUDE.md or docs/.