Documents the fix from #2510 (closes#2465) in user-facing prose
following the RELEASING.md style guide. Single-bullet release —
no rollup opener since this is a clean one-bump cycle.
The `destinations add` and `destinations remove` custom ops in the admin
CLI INSERT/DELETE rows in the central `agent_destinations` table, but
did not project the change into running sessions' `inbound.db`. The
agent-runner container reads its destination map from the per-session
projection, so until the next container spawn (`container-runner.ts`
hydrates on every wake), the running agent saw a stale map — explaining
the "dropped: unknown destination" symptom after a fresh `ncl
destinations add` even though the central row was clearly committed.
Same handler runs for both the direct-host path and the approval-execution
path because the `cli_command` approval handler in `dispatch.ts` re-enters
`dispatch()` as `caller: 'host'`, so the fix at the handler level covers
both surfaces.
Helper iterates over `getSessionsByAgentGroup(agentGroupId)` (every
active session for the affected agent), guarded by `hasTable('agent_destinations')`
and a lazy dynamic import of `writeDestinations` to keep the agent-to-agent
module optional. Per-session try/catch keeps one bad session from killing
the whole projection; failures are logged at WARN with session id + error.
Regression test invokes the dispatcher with `caller: 'host'` (the same
re-entry the approval handler uses after admin approves), with two active
sessions on the source agent group, and asserts the `destinations` row
lands in every session's inbound.db after `add` and is cleared after `remove`.
Fixes#2465
RELEASING.md frames the per-bump release policy as a goal that is cut
manually, not as automation. The v2.0.63 CHANGELOG rollup line still
asserted the stronger claim ("NanoClaw publishes a GitHub Release on
every package.json version bump"), which contradicts the policy doc.
Soften to match RELEASING.md so the two land consistently on main.
The "For detailed release notes, see the full changelog on the
documentation site" line pointed at a docs portal that does not exist.
CHANGELOG.md is the canonical record, so the header now says only what
is true: all notable changes are documented in this file.
Two revisions in RELEASING.md based on review feedback:
1. Soften the "release per bump" claim. The policy is aspirational and
release publication is manual, so the opening now states the goal
("publish a GitHub Release for every package.json version bump that
lands on main") and acknowledges that there can be lag between a bump
merging and the release being cut. Intent: timeliness, not strict 1:1.
2. Add a "Channels and stability" section that explicitly states NanoClaw
ships a single channel today, distinguishes latest/stable/pinned for
consumers, and reserves space for a future pre-release channel without
inventing structure that does not yet exist. Folds the previous Pinning
section into the new structure as the Pinned bullet.
CHANGELOG.md gets a rollup entry covering v2.0.55..v2.0.63 in the
project voice (bold lead-ins, [BREAKING] prefix with inline workaround,
doc links to setup/lib/install-slug.sh, no PR numbers).
RELEASING.md is new and documents the per-bump release policy starting
with v2.0.63: tag every package.json bump, mirror the CHANGELOG entry
into the GitHub Release body, append Contributors and (when relevant)
New Contributors sections, and use rollup framing when multiple bumps
collapsed into one release.
The gmail/gcal Phase 4 restart blocks and uninstall one-liners
still hardcoded `com.nanoclaw` / `restart nanoclaw`, so on a v2
install they would fail with "no such service" or kick the
wrong unit.
Phase 4 restart now uses the canonical
`source setup/lib/install-slug.sh` + `$(launchd_label)` /
`$(systemd_unit)` pattern with the standalone `Run from your
NanoClaw project root:` lead-in. Uninstall one-liners switch
to the inline-subshell form
`"$(. setup/lib/install-slug.sh && systemd_unit)"`.
(Folds in #2489's v2-alignment changes to the same two files;
the deferral noted in the original PR body is no longer needed
now that #2489 has merged.)
Split the embedded forms ("... — run from your NanoClaw project root:")
into a separate `Run from your NanoClaw project root:` line directly
above the code block, so the lead-in pattern is uniform across all
restart blocks.
Replace inline `# run from your NanoClaw project root` annotations on
`source setup/lib/install-slug.sh` lines with one standalone prose
lead-in per code block. Also drop parenthetical "(run from the project
root...)" mentions where the same convention is already obvious.
- swap remaining inline subshells from `; helper` to `&& helper` so source
failures surface as the source error instead of a downstream 'command not
found' on the helper call
- fix two service-status checks that still grepped for the bare v1 name
(init-first-agent, add-emacs) — same canonical inline form as the rest of
the sweep, scoped to the per-install slug
- collapse add-parallel's verify block to the inline form so it stops
shadowing the canonical pattern
- note 'run from your NanoClaw project root' beside every restart snippet
that sources `setup/lib/install-slug.sh` (inline as a bash comment on
the source line, plus parenthetical lead-ins where the snippet is
prose-form) so the relative-path dependency is loud at the spot it
matters
The `ncl` transport-error message and ~20 skill docs hardcoded v1's
`com.nanoclaw` / `nanoclaw` for launchd labels and systemd units. Under
v2 the names are slug-suffixed per checkout (`com.nanoclaw.<slug>`,
`nanoclaw-<slug>.service`), so those commands no longer match a real
service on the host.
- `src/cli/client.ts` — extract `formatTransportError` into
`src/cli/transport-errors.ts` so it can read `install-slug` and call
`getLaunchdLabel()` / `getSystemdUnit()`.
- `src/cli/transport-errors.test.ts` — regression test for #2484: the
error string must not contain the bare v1 names.
- `.claude/skills/**/*.md` — replace hardcoded restart snippets with
the canonical `source setup/lib/install-slug.sh` + `$(systemd_unit)` /
`$(launchd_label)` pattern (or the inline subshell form where the
snippet is a one-liner).
Closes#2484Closes#2485
Three issues with the DB-edit steps that ship in #2489:
- `'$[#]'` was double-quoted in the surrounding bash string, so bash
arith-expanded `$#` (positional-arg count, 0 in interactive shell)
before sqlite ever saw it — silently overwrote index 0 instead of
appending. Now escaped as `'\$[#]'`.
- `sqlite3` CLI replaced with `pnpm exec tsx scripts/q.ts` — clean
installs have no sqlite3 binary; setup/verify.ts:5 codifies that
NanoClaw avoids depending on it.
- `strftime('%s','now')` replaced with `datetime('now')` — the column
stores ISO strings everywhere else; mixing epoch ints made any
consumer doing `datetime(updated_at)` parse those rows as 1970.
Also: reworded the "approval-gated" wording to distinguish container
vs host-operator-shell invocation, and added the "Why this can't be
container.json" note to add-gcal-tool (gmail had it, gcal didn't).
Two pieces of post-v1 drift in the gmail/gcal skills made the instructions
either dead-code edits or silently broken installs:
1. The TOOL_ALLOWLIST edit step is redundant. claude.ts derives
mcp__<name>__* allow-patterns dynamically from each group's
mcpServers map (claude.ts:294-297), so registering the MCP server in
Phase 3 already authorizes the tools. Removed the edit step, its
pre-check, its troubleshooting attribution, and its uninstall mirror;
replaced with an explanatory note pointing at the dynamic derivation.
2. The "edit groups/<folder>/container.json" step doesn't stick.
materializeContainerJson rewrites that file from the central DB on
every spawn (post-migration 014-container-configs), so hand edits are
silently overwritten on next restart. Rewrote Phase 3 to use
`ncl groups config add-mcp-server` (which persists to DB) for the
MCP-server entry, and a sqlite3 json_insert workaround for the mount
entry — with a note to switch to `ncl groups config add-mount` once
#2395 lands. Removal step rewritten the same way using
`remove-mcp-server` and a sqlite3 json_group_array filter.
Fixes#2488
Follow-up to #2467. The trailing "anything outside these tags is also
treated as scratchpad" clause contradicted the rest of the system prompt,
which requires bare text to be wrapped in `<message>` blocks. Removing it
keeps the description focused on what `<internal>` actually does.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The welcome skill told the agent to send the greeting via `send_message`,
but the destinations system prompt also requires the final response to
be wrapped in `<message to="…">` blocks (since 1d4d920). The agent
followed both, sending the greeting once via the MCP tool and once via
the wrapped final output.
- welcome/SKILL.md: drop the mechanism — "send a short, warm greeting"
lets the system prompt steer how it's delivered.
- destinations.ts: reframe `<message>` blocks and `send_message` as the
same delivery surface, with the explicit note that each call/block
lands as its own message — so they compose into a sequence rather than
reading as additive duplicates of the same content.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Without files:read, @chat-adapter/slack cannot download attachments —
Slack returns an HTML login page in place of file bytes and the adapter
throws a NetworkError. Bundles files:write for symmetric outbound
(files.uploadV2).
Closes#2457
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The parenthetical "(single-destination: just write)" was stale after
9db39b2 removed the bare-text routing fallback. Agents following this
hint had their responses silently dropped to scratchpad.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The upstream install script supports ONECLI_VERSION; use it to avoid
pulling an untested gateway release during setup.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Platforms like Teams send userIds in "29:xxx" format which already
include a colon. Blindly prefixing with channelType produced double-
namespaced ids (e.g. "teams:29:xxx") that never matched the users
table, causing all approval clicks to be rejected. Mirror the
resolveOrCreateUser logic: only prefix when the raw id has no colon.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When the agent outputs bare text without <message to="..."> blocks,
nothing gets delivered — silent failure. Now the poll-loop pushes a
one-shot correction back into the active query telling the agent to
re-send with proper wrapping. Capped at once per user turn to avoid
loops; resets when a new follow-up message arrives.
Also updates destination instructions to require explicit <internal>
wrapping for scratchpad instead of treating bare text as implicit
scratchpad.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Tell the compactor to include the <message to="name"> wrapping reminder
verbatim at the END of the summary so it's the last thing the agent sees
after compaction. Previously the instruction just asked to "preserve"
routing info, which the compactor could place anywhere or summarize away.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The compacted event handler injected a system-tagged reminder into the
live query after SDK auto-compaction, which caused the agent to send
an unintended message. Reverts the four changes from #2327:
- Remove `compacted` variant from ProviderEvent union
- Restore `result` yield for compact_boundary in ClaudeProvider
- Remove compacted event handler and getAllDestinations import in poll-loop
- Remove compaction integration tests and CompactingProvider helper
Closes#2325 differently — the reminder approach is not viable.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The container opens inbound.db read-only, so it can't ALTER TABLE.
If the host hasn't run migrateMessagesInTable yet (e.g., container
rebuilt before host restart), the on_wake column won't exist and
the query crashes, causing a restart loop.
Detect the column via PRAGMA table_info and conditionally include
the filter clause.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sweep of outbound strings, doc URLs, comments, and clone instructions
that were missed in the original org rename. One both-match case in
setup/lib/channels-remote.sh (URL detection) accepts either name so
existing forks with a `qwibitai` remote continue to resolve cleanly;
everywhere else is a straight rename.
Historical mentions left intact:
- CHANGELOG.md (v2.0.0 entry, frozen history)
- .claude/skills/add-gmail-tool/SKILL.md (pre-v2 qwibitai skill — historical)
- repo-tokens/badge.svg (auto-regenerated by update-tokens.yml)
Addresses review feedback on this branch:
- Fix TS2352 build error in dispatch.ts: `getSession()` returns `Session`,
which has no index signature, so `(s as Record<string, unknown>)` is rejected
by tsc. `Session.agent_group_id` exists — read it directly.
- Fix a regression introduced by dropping the `groupField in data` guard:
the post-handler scope check now runs for *every* command under a whitelisted
resource, including custom ops, which return ad-hoc shapes. `ncl groups config
get` (access:open, reachable by a group-scoped agent) returns a config object
with no `id` field → `data['id'] !== ctx.agentGroupId` → `forbidden`, even on
the agent's own config. Fix: tag the auto-generated list/get handlers with
`generic: 'list' | 'get'` on `CommandDef` (set in `registerResource`) and run
the post-handler check only when `cmd.generic` is set. Generic handlers return
raw DB rows that carry `scopeField`; custom ops are already pinned to the
caller's group by the pre-handler `--id` auto-fill or the approval gate.
Fail-closed-when-`scopeField`-missing is preserved (now scoped to generic
list/get).
- Tests: `dispatch.test.ts` mocks `getResource` (the real resources aren't
registered in this unit), tags the two post-handler test commands as `generic`,
and adds coverage for: custom op returning a non-row object not being rejected;
`sessions-get` pre-handler returning "session not found" for foreign and
non-existent UUIDs (no existence oracle) and allowing the caller's own session;
generic list/get failing closed when a resource declares no `scopeField`.
Full suite: 323 passing.
- Remove FORK.md from the PR diff — it's the fork's personal README, carried in
because the branch was cut from the fork's `main` rather than upstream.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
12 patch versions ahead. The 2.1.120 binary baseline introduced a
number of plugin and skill behaviors that have since landed in the
public Claude Code docs: ${CLAUDE_EFFORT} substitution, settled
`arguments` field in skill frontmatter, plugin `channels` field.
No breaking changes for nanoclaw's runtime contract. Verified by
running container/skills/{agent-browser,vercel-cli,slack-formatting}
under the bumped image; all three load and execute as expected.
SDK at ^0.2.116 (caret) remains compatible with claude-code 2.1.128.
Bumping CLAUDE_CODE_VERSION invalidates the pnpm install layer in
container/Dockerfile and triggers a full rebuild of the agent image.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Allow individual agent groups to opt into different models or effort levels
without changing host-wide defaults. Useful when one group is high-stakes
(opus, high effort) but most are routine (sonnet/haiku, low effort).
container.json gains two optional fields:
- model: alias ("sonnet" | "opus" | "haiku") or full model ID
- effort: "low" | "medium" | "high" | "xhigh" | "max"
Both omitted = SDK default (current behavior). The host plumbs them as
NANOCLAW_MODEL / NANOCLAW_EFFORT env vars at container spawn time; the
agent-runner reads them in providers/index.ts and threads through to the
provider via ProviderOptions. The Claude provider passes them straight to
sdkQuery options.
`effort` is currently typed as `any` because the @anthropic-ai/claude-
agent-sdk type doesn't surface it yet — passing it through still works at
runtime via the SDK's loose option handling. Drop the cast once the SDK
adds an `effort` field to its options type.
Explain that --message sets an on-wake instruction so the fresh
container can continue after restart (verify tools, notify user).
Without it, the container only comes back on the next user message.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
One-liner in cli.instructions.md pointing to `ncl groups config help`.
Each config operation's description now says whether restart or rebuild
is needed — agent discovers it via progressive disclosure.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Capture staged file list before prettier runs, then re-add only
those files. Prevents pulling in unrelated unstaged changes.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The hook ran format:fix but didn't re-stage the modified files, so
commits went through with unformatted code and CI caught the diff.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Group-scoped agents could previously:
- See all agent groups via `groups list` (generic list skips --id filter)
- Look up any session by UUID via `sessions get`
- Request cli_scope change to global via config update approval
Fixed by:
- Post-handler filtering: list results filtered, get results verified
against caller's agent_group_id
- Pre-handler --id check scoped to resources where id IS the group ID
(groups, destinations) so session UUIDs aren't falsely rejected
- cli_scope/cli-scope args blocked outright for group-scoped agents,
before the approval gate
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When init-first-agent creates an agent group for an owner, set
cli_scope to 'global' so the owner's personal agent has full ncl
access. All other agent groups remain 'group'-scoped by default.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add cli_scope column to container_configs with three levels:
- disabled: agent never learns about ncl (instructions excluded from
CLAUDE.md) and host dispatch rejects any cli_request
- group (default): agent can only access groups, sessions, destinations,
and members resources, scoped to its own agent group with auto-filled
--id/--agent_group_id/--group args. Help output reflects the scope.
- global: unrestricted access (current behavior)
Enforcement is host-side only — no image rebuild or env var needed.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Decouple container restart from config updates — config CLI ops now only
write to the DB; restart is a separate `ncl groups restart` command with
--rebuild and --message flags. Add on_wake column to messages_in so wake
messages are only picked up by a fresh container's first poll, preventing
dying containers from stealing them during the SIGTERM grace window.
killContainer accepts an onExit callback for race-free respawn. Agent-
called restart auto-scopes to the calling session.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
config add/remove-package should only update the DB and restart.
Image rebuild is handled by the self-mod approval flow or manually.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Operation keys like 'config get' read naturally and crud.ts normalizes
spaces to dashes for the registry name.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
`ncl groups config get` now works alongside `ncl groups config-get`.
Parser joins all positionals with dashes; dispatcher falls back by
trimming the last segment as a target ID (`ncl groups get abc123`).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The 5-minute timeout in buildAgentGroupImage was tight for first-time
apt + pnpm global installs on slow networks (the exact scenario
install_packages triggers, since the image hasn't pre-installed the
requested packages). Hit ETIMEDOUT on a real install with apt + npm
packages.
900_000ms gives realistic headroom without masking genuinely hung builds.
Source of truth for container runtime config moves from
groups/<folder>/container.json to a new container_configs table.
The file becomes a materialized view written at spawn time.
- New container_configs table with scalar columns (provider, model,
effort, image_tag, assistant_name, max_messages_per_prompt) and
JSON columns (mcp_servers, packages_apt, packages_npm, skills,
additional_mounts)
- Startup backfill seeds DB from existing container.json files
- materializeContainerJson() replaces readContainerConfig + ensureRuntimeFields
- Self-mod handlers (install_packages, add_mcp_server) write to DB
- Provider cascade simplified: session -> container_configs -> 'claude'
- ncl groups config-{get,update,add-mcp-server,remove-mcp-server,
add-package,remove-package} custom operations
- restartAgentGroupContainers() helper for config change propagation
- Container side unchanged (still reads /workspace/agent/container.json)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- genericList now accepts column filters (--flag value) and LIMIT (default 200)
- Remove early inDb.close() in container pollResponse to avoid double-close
- Document filtering and --limit in cli.instructions.md
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When a container agent calls an approval-gated ncl command, dispatch
now sends an approval card to an admin instead of returning a stub
error. On approve, the handler re-dispatches the original command
and notifies the agent with the result.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Rename the CLI binary, socket path, container wrapper, error prefixes,
and all references from `nc` to `ncl`. Add ~/.local/bin symlink during
setup and pnpm script alias.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Failures now launch an interactive Claude session instead of the
non-interactive assist (REASON/COMMAND parser). The user debugs
with full terminal access and types /exit to return to setup.
The original assist mode is available via --assist-mode flag or
NANOCLAW_SETUP_ASSIST_MODE=1 env var.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- host-core.test.ts: add in_reply_to: null to routeAgentMessage calls
(required after #2267 added the field to RoutableAgentMessage)
- agent-route.test.ts: use 'closed' instead of 'archived' (not a valid
Session status)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove the dynamic `onecli.getGatewaySkill()` fetch from `buildMounts` —
the skill content ships as a static SKILL.md. This avoids adding latency
to every container spawn and dirtying the source tree at runtime.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The skipped coexistence test and the findSessionByAgentGroup
bug-documenting test were written before the A2A return-path fix
(#2267). That fix sidesteps findSessionByAgentGroup entirely —
A2A replies now use source_session_id for routing, so the
"newest session wins" behavior is only a fallback for unsolicited
first-contact A2A where any session will do.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Auto-discovered by composeGroupClaudeMd() as module-cli.md fragment,
included in every agent group's composed CLAUDE.md.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Squash merge of PR #2267 by ddaniels.
When an agent group has more than one active session, A2A replies landed
in the newest session via findSessionByAgentGroup's ORDER BY created_at
DESC. The session that asked the question never saw the answer.
Adds origin-aware return-path routing with three layers:
1. Direct return-path: if the reply has in_reply_to, look up the
triggering inbound row's source_session_id and route there.
2. Peer-affinity fallback: find the most recent A2A inbound from this
peer and use its source_session_id.
3. Legacy fallback: newest active session (pre-migration compat).
Container-side: MCP send_message/send_file now thread the current
batch's in_reply_to through to outbound rows via current-batch.ts.
Also flips our A2A bug-documenting test (#2332) from asserting the
broken behavior to asserting the fixed behavior.
Co-Authored-By: Doug Daniels <ddaniels888@gmail.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Three tests that exercise agent-to-agent routing and document the broken
behavior that #2332 describes:
1. A2A outbound lands in target session — basic happy path, passes.
2. A2A return path resolves to wrong session when source agent has
multiple channel sessions. Researcher responds to PA, but
findSessionByAgentGroup picks PA's newest session (Discord) instead
of the Slack session that originated the A2A call. Test asserts the
buggy behavior (response in Discord, nothing in Slack).
3. A2A-only session gets null session_routing. writeSessionRouting on a
session with messaging_group_id=NULL writes all nulls — the target
agent has no default routing for replies. Test asserts the nulls.
These tests pass today by asserting the broken state. When #2332 is
fixed (origin-aware return routing), these assertions should flip to
the correct behavior.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Host-side (vitest):
- Routed message preserves platformId/channelType/threadId on messages_in
- Fan-out gives each agent correct per-agent routing
- writeSessionRouting populates session_routing from messaging group
- writeSessionRouting writes null routing for agent-shared sessions
- Per-thread session includes thread_id in session_routing
- Agent-shared resolves to same session on repeated calls
- Agent-shared session has null messaging_group_id
- findSessionByAgentGroup returns channel-bound session (documents #2332)
- Skip: agent-shared/channel-bound coexistence (blocked on #2332 fix)
Container-side (bun:test):
- Internal tags stripped between message blocks
- Mixed task + chat batch with correct routing
The agent-shared tests uncovered the exact bug from #2332:
findSessionByAgentGroup doesn't distinguish agent-shared from
channel-bound sessions, so A2A resolution reuses a channel session
when one exists.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add 14 tests covering key routing and dispatch flows that previously had
zero direct coverage:
dispatchResultText:
- bare text produces no outbound (scratchpad only)
- unknown destination dropped, valid destination sent
- multiple <message> blocks each produce correct outbound
- internal tags stripped from scratchpad
originAttr / from= metadata:
- chat/task/webhook/system messages include from= when destination matches
- fallback to raw unknown:channel:platform when no match
- from= omitted when routing is null
resolveDestinationThread:
- null thread_id when no prior inbound from destination
- most recent thread_id wins with multiple inbound messages
Also fix merge issue: restore getAllDestinations import removed by our PR
but still needed by #2327's compaction reminder. Fix stale destinations
test assertion from #2328 ("no special wrapping needed" → "Every response
must be wrapped").
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Corepack with no version pin pulls latest pnpm (currently 11.0.8), which
silently stops honoring `only-built-dependencies[]=` in `.npmrc` for
global installs. The allowlist file ends up correctly written but
ignored, so:
- `@anthropic-ai/claude-code`'s postinstall — which downloads the
platform-native Claude binary — never runs. Agents then crash at
runtime with "claude native binary not installed... postinstall did
not run."
- `agent-browser`'s postinstall, which chmods the linux-arm64 binary,
is also skipped, so the binary fails with EPERM the first time it's
invoked.
Pin the container's pnpm to 10.33.0 (the same version host's
package.json already pins via `packageManager`). Keep the two in
lockstep so a host bump triggers a deliberate container bump.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Without this, an unrecoverable failure such as TokenInvalid causes the
gateway listener to restart ~10x/sec, which Discord's Cloudflare layer
treats as abuse and answers with a multi-hour IP block. Both the clean-
expiry path and the error path now share a backoff that doubles up to
1h, with a >5min healthy run resetting the counter.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Add integration test for per-destination thread_id resolution: seeds two
destinations with different thread IDs, verifies each outbound message
carries the correct thread_id (not a global one from the batch routing).
- Add log line in resolveDestinationThread catch block for debuggability.
- Remove stray "(ensurePreCompactHook is defined after the main function.)"
comment from group-init.ts.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The poll loop had a bare-text routing fallback in dispatchResultText: when
the agent produced text without <message to="..."> wrapping, it would auto-
route to the session's originating channel (via a frozen RoutingContext) or
to the single configured destination. This caused three problems:
1. Routing drift: RoutingContext was extracted once from the initial batch
and never refreshed. When the initial batch was a null-routed cron task
and a real chat arrived mid-query, replies were silently dropped to
scratchpad because the frozen routing had all-null fields.
2. Cross-channel thread bleed: sendToDestination applied a single
routing.threadId to every outbound message regardless of destination.
In agent-shared sessions (multiple channels sharing one session), one
channel's thread ID was stamped onto messages to a different channel.
3. Inconsistent formatting: task, webhook, and system messages had no
origin metadata in their formatted output, so the agent couldn't tell
which destination they came from — even when the underlying messages_in
rows carried routing fields.
Changes:
- Remove the bare-text routing fallbacks in dispatchResultText (both the
routing-based and single-destination shortcuts). All agent output must
be wrapped in <message to="name">...</message>. Bare text is scratchpad.
- Update buildDestinationsSection() to require explicit wrapping for all
groups, including single-destination. No more "no special wrapping
needed" shortcut.
- Resolve thread_id per-destination via resolveDestinationThread(), which
queries messages_in for the most recent message matching the target
channel+platform. Falls back to null (top-level channel message) when
no prior inbound exists for that destination.
- Extract originAttr() helper in formatter.ts and apply it to all message
types. Tasks now render as <task from="dest" time="...">, webhooks as
<webhook from="dest" source="..." event="...">, system responses as
<system_response from="dest" ...>. The agent always sees where a
message originated.
- Add a PreCompact shell hook (compact-instructions.ts) that outputs
custom compaction instructions, telling the compactor to preserve
recent message XML structure and routing metadata in the summary.
Wired via settings.json in the .claude-shared scaffold, with a
migration path (ensurePreCompactHook) for existing groups.
Relation to open PRs:
- #2277 (mergeRouting) becomes unnecessary — the routing fallback it
patches no longer exists. Can be closed.
- #2327 (post-compaction destination reminder) is complementary — it
handles the post-compaction push, this handles pre-compaction
instructions. Both can merge independently.
- #2328 (default routing instruction) is complementary — it adds "reply
to the from= destination" guidance to the multi-destination section.
Compatible with the unified instruction format here.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Closesqwibitai/nanoclaw#2325.
When the Claude Code SDK auto-compacts the conversation context, the
compaction summary tends to drop the agent's learned <message to="…">
wrapping discipline. The destinations table is still populated and the
system prompt still lists them, but the behavioral pattern degrades —
A2A sends and multi-channel routing silently revert to bare-text or
single-channel delivery for the rest of the session, until the next
/clear.
Three small changes wire a reminder back into the live query when this
fires:
- New `compacted` event on ProviderEvent. Distinct from `result` so it
doesn't mark the turn completed or get dispatched as a chat message
(which is also why "Context compacted (N tokens compacted)." stops
appearing as noise in user-facing chats — it was a side-effect of
reusing the result event path).
- ClaudeProvider yields `compacted` instead of `result` for the SDK's
compact_boundary system event.
- Poll-loop's event handler reacts by pushing a system-tagged reminder
back into the active query when there are >1 destinations. Single-
destination groups skip the push since they have a fallback that
works without wrapping.
Tests cover both branches (multi-destination → reminder fires;
single-destination → no reminder) using a CompactingProvider that
emits the new event mid-stream.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When a multi-destination agent receives an inbound message, the model
had no explicit guidance about which destination to address by default
and would sometimes pick the wrong one — e.g. Casa replying to the
admin's group questions in Laura's DM instead of in the group itself.
The formatter already injects `from="<destname>"` on every inbound
<message> tag (formatter.ts:184), so the origin is right there in the
prompt — the system prompt just never told the agent to use it.
Added one line to buildDestinationsSection() that nudges the agent
toward replying via the same destination the message came from, with
an out for explicit cross-destination requests ("tell Laura that…").
Single-destination groups are unaffected (they take a separate
short-circuit path with a fallback that auto-replies to the origin).
Tests cover the multi-destination, single-destination, and
no-destination cases.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Today the Claude auth picker has only three real-auth options. A user
without a Pro/Max subscription, an OAuth token, or an API key has no
graceful escape — Ctrl-C kills setup entirely.
Add a fourth option that confirms the trade-off (no agent runtime + no
Claude debug help during setup) and, on Yes, marks auth skipped and
lets setup continue. On No, loop back to the picker. Existing
NANOCLAW_SKIP=auth env hatch is unchanged.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a skill that installs the mnemon CLI into agent containers, giving each
agent group a persistent, queryable knowledge graph across sessions.
Mnemon stores facts (insights) with categories, importance scores, and entity
tags, and connects them with typed edges (causal, semantic, temporal, entity).
The agent can remember, recall, search, link, and forget facts — surviving
container restarts and context compaction.
Installation: drops the mnemon binary from the channels branch, creates the
per-agent-group data directory, and configures the agent's CLAUDE.md to load
the skill on every spawn.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace "full E.164, e.g. +15551234567" with plain-language guidance
mirroring the WhatsApp setup card: "start with + and your country code,
no spaces or dashes" plus a worked example. "E.164" is the technical
name for the format and means nothing to non-telecom users; the
explanation it stands in for is one sentence.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
After picking "Other…" from the channel picker, today's flow drops the
user straight into a free-text prompt with no way back. Replace it with
a brightSelect that offers either "Type the channel name" (existing
behavior) or "← Back to channel selection" — same back-affording pattern
the channel sub-flows already use.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Teams setup is 6+ Azure steps over 30+ minutes. Today, every
"Done / Stuck / Show again" gate forces continuation; the only escape
is Ctrl-C, which kills setup entirely. Add a fourth option at each gate
that returns to the channel picker so a stuck operator can pick a
different channel without losing the rest of setup.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The first-keyword check (`WITH` → SELECT path) was wrong for CTEs that
precede mutations (e.g. `WITH stale AS (...) DELETE FROM t WHERE ...`).
These would be routed through `db.prepare().all()` instead of executing
the mutation.
Use better-sqlite3's `stmt.reader` property, which asks SQLite's own
parser whether the statement returns data. Single mutations go through
`stmt.run()`; compound statements (which `prepare()` rejects) fall back
to `db.exec()`.
Add a regression test for WITH...DELETE.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Setup deliberately avoids the sqlite3 CLI (`setup/verify.ts:5` calls
this out: "Uses better-sqlite3 directly (no sqlite3 CLI)") and never
installs or probes for the binary. Despite that, 13 skills shelled out
to `sqlite3 ...` directly, breaking on hosts where the CLI isn't
preinstalled — the same root cause as #2191 but spread across the
skill surface.
Add `scripts/q.ts`, a ~30-LOC wrapper over the `better-sqlite3` dep
that setup already installs and verifies. Default output matches
`sqlite3 -list` (pipe-separated, no header) so existing skill text
reads identically — only the binary changes. SELECT/WITH queries go
through `db.prepare().all()`; everything else (INSERT/UPDATE/DELETE,
including compound statements) goes through `db.exec()`.
Migrate every in-tree caller:
- 17 hardcoded invocations across 8 SKILL.md files (init-first-agent,
add-deltachat, add-signal, add-emacs, add-whatsapp, add-ollama-provider,
debug, add-parallel) plus add-deltachat/VERIFY.md.
- `manage-channels/SKILL.md` shows canonical SQL but never prescribed
a tool, so the assistant defaulted to `sqlite3` and silently failed.
Add a one-line wrapper hint above the SQL block.
- `migrate-v2.sh` schema/count probes (was the original #2191 case).
Replace `.tables` with `SELECT name FROM sqlite_master`.
- Document the wrapper convention in root `CLAUDE.md` under "Central DB".
Add `scripts/q.test.ts` with 6 vitest cases covering both modes,
NULL rendering, empty-result, compound mutations, and arg validation.
Wire `scripts/**/*.test.ts` into `vitest.config.ts`.
Out of scope (flagged for follow-up):
- `debug` and `add-parallel` still reference the v1-only path
`store/messages.db`. Routing through the wrapper now produces a
cleaner "no such file" error, but the surrounding sections are
v1-era throughout — a v1-content cleanup is its own PR.
- `cleanup-sessions.sh` is being addressed in #1889 (different style,
hard-fail rather than wrap); left untouched here to avoid stepping
on that author's work.
Closes#2191.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Slack's profile button is in the bottom-left of the desktop sidebar (not
the top-right), and the "More" overflow icon next to "Copy member ID" is
the vertical kebab `⋮`, not the horizontal `⋯`. Match what users actually
see in Slack.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Move the "Get started: …" URL above the numbered instructions and
render it in bright white so it pops against the brand-cyan body.
(Headless-only — interactive runs still auto-open the URL in a
browser, no card line.)
- Group the OAuth scope list vertically by family (im, channels,
groups, chat, users, reactions) instead of one comma-run wall.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The skill's "Assess Current State" step said only "query agent_groups,
messaging_groups, ..." without specifying columns. The `register` CLI
takes `--assistant-name "<name>"` (mentioned three times in the same
SKILL.md), but the schema column is `name`, not `assistant_name` — and
the SKILL.md never linked the two.
When the agent had to compose a SELECT against `agent_groups` from the
SKILL.md vocabulary alone, it extrapolated `--assistant-name` into a
column name and produced:
SELECT id, folder, assistant_name FROM agent_groups;
-> Error: in prepare, no such column: assistant_name
Replace the prose pointer with canonical SQL queries that match the
real schema. The `name AS assistant_name` alias preserves the familiar
term in the agent's output.
Verified locally as a drop-in: `/manage-channels` runs clean from end
to end with this version, no further inference needed.
Closes#2289
SQLite TIMESTAMP columns store UTC without a zone marker. `Date.parse`
treats timezoneless ISO strings as local time, so on any non-UTC host
every claim and processAfter looks (TZ offset) hours stale. That makes
fresh claims trip the kill-claim path on the first sweep tick — every
container gets killed within seconds of spawn.
Two affected sites in host-sweep.ts:
- decideStuckAction reads claim.status_changed and computes claimAge.
On a TZ=Europe/Madrid host (UTC+2), a claim made 5s ago looks
7205s old and exceeds CLAIM_STUCK_MS (60s).
- The orphan retry loop reads msg.processAfter and skips messages
rescheduled into the future. On the same host, future timestamps
look (TZ offset) hours in the past, so the skip is missed and
tries gets bumped on every tick.
Fix: introduce parseSqliteUtc(s) which appends "Z" only when no zone
marker is present, then call it from both sites. Behavior under
TZ=UTC is unchanged.
Verified on a production v2 install on TZ=Europe/Madrid: with the
patch applied, an idle container survived 30+ minutes without being
killed (previously: killed within 60s of spawn).
Tests: 5 new cases covering the bare/Z/+offset/invalid input matrix
and a TZ-independence check. All 19 host-sweep tests pass and tsc
clears against main.
migrate-v2.sh probes ${ONECLI_URL_CHECK}/health (with ONECLI_URL_CHECK
defaulting to http://127.0.0.1:10254, the OneCLI web port). That path
returns 404, so the detection branch never matches an already-running
OneCLI instance and the script falls through to the install path.
The web app's health endpoint is /api/health
(apps/web/src/app/api/health/route.ts) and has been since the OneCLI
repo was made public. /health was never exposed by the web on :10254
nor by the gateway on :10255 (the gateway uses /healthz).
Verified against a running OneCLI v1.21.0:
GET :10254/api/health -> 200 {"status":"ok","version":"1.21.0",...}
GET :10254/health -> 404 (Next.js fallback HTML)
GET :10255/healthz -> 200
GET :10255/health -> 400 (gateway parses non-/healthz as CONNECT)
Closes#2285
Resource-first CLI: `nc groups list`, `nc wirings get <id>`, etc.
Seven resources defined (groups, messaging-groups, wirings, users,
roles, members, sessions) with full column documentation that serves
as the single source of truth for help output and arg validation.
- CRUD helper auto-registers list/get/create/update/delete from
declarative resource definitions with generic SQL
- Custom operations for composite-PK resources (roles grant/revoke,
members add/remove)
- Access model: open (reads) / approval (writes) / hidden
- `nc help` lists resources; `nc <resource> help` shows fields
- Positional target IDs: `nc groups get <id>`
- Removed unused priority column from wirings
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Drop the nc MCP tool in favor of a standalone Bun CLI script at
container/agent-runner/src/cli/nc.ts. Same interface as host-side
bin/nc — all three callers (operator, Claude on host, agent in
container) now use the same nc CLI.
Container transport: writes cli_request to outbound.db (BEGIN
IMMEDIATE for seq safety), polls inbound.db for response, acks via
processing_ack. Dockerfile adds a /usr/local/bin/nc wrapper that
execs the mounted source.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add delivery action handler (cli_request) so the host dispatches CLI
commands arriving from container agents via outbound.db and writes
responses back to inbound.db. Add nc MCP tool in the agent-runner
following the ask_user_question blocking pattern.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
PR #2259 (Baileys v6→v7) was merged into the channels branch instead of
main. PR #2260 was merged into main 28s later assuming v7 was already
in place. The v6 pin survived in three sites while the WhatsApp adapter
copied from origin/channels at install time was already on the v7 LID
API, breaking every fresh migrate-v2.sh run at 2c-install-whatsapp with
TS errors on remoteJidAlt/participantAlt/lid-mapping.update.
Bumps the pin to 7.0.0-rc.9 (the version v1 has been running on for
months) in:
- setup/install-whatsapp.sh
- setup/add-whatsapp.sh
- .claude/skills/add-whatsapp/SKILL.md (install instruction)
package.json + pnpm-lock.yaml are not touched here — install-whatsapp.sh
mutates them at runtime via pnpm install with the corrected pin.
Closes#2283
Today's copy says "Check that signal-cli is installed (we'll guide
you if not)" but the auto-install PR (#2281) makes that misleading —
we don't guide, we just install. Update the intro list to match what
will actually happen, and add a "no input needed for any of it" lead
so users know to expect a hands-off run.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When a user picks Signal in setup and signal-cli isn't on PATH, today
NanoClaw bails with a GitHub releases link and tells them to re-run.
That's a hard wall for non-technical users — GitHub releases pages
are intimidating, and the Linux native build / Java decision isn't
obvious.
Replace the bail-out with a real install: a new install-signal-cli.sh
script that does `brew install signal-cli` on macOS or downloads the
native Linux release into ~/.local/bin (no Java, no sudo). Wired into
ensureSignalCli with a spinner; probe again after, fall back to the
original manual-install copy if anything fails.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Resolve conflict in src/index.ts shutdown sequence — keep both
stopCliServer() from nc-cli and try/finally + resetCircuitBreaker()
from main.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
openInboundDb() always opened /workspace/inbound.db which doesn't exist
in CI. In test mode, return a thin wrapper over the in-memory singleton
that delegates prepare/exec but no-ops close(), so callers' try/finally
cleanup doesn't destroy the shared DB mid-test.
One flag (_testMode), no monkey-patching, no saved-close bookkeeping.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The test wrapper forwards the in-memory outDb as the writable handle,
avoiding the filesystem reopen that fails in CI. The function stays
private — the optional writableOutDb param is an internal detail, not
a public API.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
WhatsApp's mobile UI calls the menu "You" on iOS and "Settings" on
Android (depending on platform/version). Both QR-scan and pairing-code
captions only mentioned "Settings", so iOS users had to figure out the
iOS-specific path on their own.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Stacked on #2269 (back-nav scaffolding) plus the Telegram, Slack, and
Teams PRs. They share the same scaffolding file from #2269 — they
don't compile without it, so they have to stack.
Signal had no user-facing prompt before the install kicked off, so
there was nothing to attach a Back option to. This adds a brief "Set
up Signal" info card (what's about to happen, no new phone number
needed) followed by a Continue/Back brightSelect. The card serves
double duty — context for the install plus the Back gate.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Stacked on #2269 (back-nav scaffolding) plus the Telegram and Slack
PRs. They share the same scaffolding file from #2269 — they don't
compile without it, so they have to stack.
Both Teams paths already had a brightSelect at the right place, so we
just extend each with a Back option — no new prompts:
- Existing-credentials path: Yes/No confirm becomes Yes/No/Back
- Fresh-setup path: the very first stepGate ("How did that go?") gets
a 4th option. Subsequent stepGates keep the original 3 options so
we never lose mid-flow state.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Stacked on the back-nav scaffolding from #2269 and the Telegram PR.
Slack's first prompt was already a single-purpose "Press Enter to open
Slack app settings" confirm. Replacing it with a 2-option brightSelect
(Open / ← Back) folds the Back gate into the existing screen — net
same number of prompts as before, just with a way out. The redundant
confirmThenOpen Press-Enter step is dropped; openUrl is called inline.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Stacked on the back-nav scaffolding from the Discord/WhatsApp/iMessage
PR — depends on setup/lib/back-nav.ts and the auto.ts loop.
Telegram's "no existing token" path adds one extra prompt — a
brightSelect "Ready to paste your bot token?" between the BotFather
instructions and the token paste. Clack's p.password prompt doesn't
support menu options so we can't fold Back into the paste itself; the
cleanest fix is a separate gate immediately before. The "existing
token" path doesn't add noise — the Yes/No confirm becomes Yes/No/Back.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Picking the wrong messaging channel during setup left users with no way
to bail out — they had to either complete the chosen flow or kill setup
and start over. This adds a Back option to the first prompt of three
channel sub-flows that share the same simple shape (one leading
brightSelect that's easy to extend).
Mechanics:
- New `setup/lib/back-nav.ts` exports a BACK_TO_CHANNEL_SELECTION
sentinel and ChannelFlowResult type.
- `setup/auto.ts` wraps the channel dispatch in a while-loop; channels
return BACK_TO_CHANNEL_SELECTION to bounce back to the chooser
without restarting setup. Channels not yet wired return void and the
loop exits after one pass, so the change is backwards compatible.
- Discord, WhatsApp, iMessage each add a `← Back to channel selection`
option to their first prompt.
Telegram, Slack, Teams, and Signal will follow as separate PRs — they
each need a slightly different shape (extra prompt insertions, gating
inside multi-step flows, etc.) and are easier to review independently.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
nodeenv doesn't support major-only version specifiers. Use lts
which resolves to the latest LTS release.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The disk threshold was unreliable on hosts with separate /home or /var
mounts where df underreports free space. Simplify the pre-flight to a
RAM-only check.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@chat-adapter/discord@4.27.0 includes vercel/chat#256, which fixes the
Discord adapter unconditionally setting payload.content alongside
payload.embeds when posting a card. In 4.26.0 every Discord card
appeared twice (text content above the embed, identical content inside
the embed) — every new install reproduced this on the welcome tour and
on every approval card.
The other 7 skills bump in lockstep because @chat-adapter/discord@4.27.0
depends on chat@4.27.0 while @chat-adapter/<other>@4.26.0 depend on
chat@4.26.0. Mixing the cohort produces a TypeScript dual-version
conflict between the bridge and adapter ChatInstance types.
Files updated (one line per file in each pnpm install command):
- add-discord (the user-visible bug fix)
- add-gchat, add-github, add-linear, add-slack, add-teams, add-telegram,
add-whatsapp-cloud (cohort consistency)
Out of scope: add-imessage, add-matrix, add-webex, add-resend use
third-party packages with independent versioning.
Closes#2264
The send_card MCP tool wrote outbound rows with type='card' but the
chat-sdk-bridge deliver() had no branch for them, so the payload fell
through to the text fallback (where text is undefined) and silently
returned without calling the adapter. delivery.ts then marked the
message delivered with platformMsgId=undefined and the user saw nothing.
Add a dedicated card branch mirroring the ask_question structure:
- Build Card from title, description, and string-or-{text} children
- Render only URL actions as LinkButtons (send_card is fire-and-forget
per its docstring, so callback buttons would have nowhere to land)
- Drop empty cards with a warn log instead of posting blank
- Fall back text: content.fallbackText > description > title
Affects every Chat SDK adapter that goes through the bridge: Discord,
Telegram, Slack, Teams, GChat, GitHub, Linear, WhatsApp Cloud, iMessage,
Matrix, Webex, Resend.
Tests: adds five cases covering normal render, action filtering,
link-button rendering, empty-card skip, and a regression check that
non-card chat-sdk payloads still flow through the text branch.
Closes#2263
Remove step 2d (whatsapp-resolve-lids.ts) which pre-created duplicate
messaging_groups rows keyed by @lid alongside the phone-keyed rows.
This caused split sessions — the same contact got separate sessions
depending on which JID format arrived.
With the Baileys v7 upgrade (PR #2259 on channels), the adapter
resolves every LID to a phone JID via extractAddressingContext before
the message reaches the router, making dual rows unnecessary.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two friction points in the Telegram channel's "Open Telegram" card,
both surfaced when running setup on a VM-via-SSH where the user's
local laptop has no Telegram client installed:
1. The opening sentence read "Opening @yourbot in Telegram so it's
ready when the pairing code shows up." On a headless device that's
misleading — nothing is auto-opened, the user has to click the
link or use their phone. Rewrite as a direct, action-led
instruction on the headless flow only:
Open @yourbot in Telegram now — the pairing code is coming next,
and that's where you'll send it.
Plus a "Get started: <url>" line and a full-strength mobile
fallback hint inside the card so headless users have all
self-serve options visible.
On non-headless the original status-style line stays accurate
(`xdg-open` / `open` does fire for users with Telegram desktop
installed), so the card stays a single line.
2. Clicking `https://t.me/yourbot` silently fails when the user's
local device has no Telegram client. Non-headless gains:
- a "(must be installed here)" qualifier on the confirm prompt
so users without Telegram desktop know up-front;
- a single combined dim fallback line below the prompt:
"If browser does not appear, please visit: <url> — or
search for @yourbot on your mobile."
Direct `p.confirm` + `openUrl` instead of `confirmThenOpen` for
the non-headless branch so we control the dim line fully (single
combined line vs the helper's default URL-only line).
Headless layout drives the same self-serve content via the card body
itself; no confirm prompt fires there.
NanoClaw is known to not run reliably on GCE instances. Detect via DMI
during pre-flight (between the spec check and root warning) and let the
user abort before sinking time into bootstrap.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pre-flight check in nanoclaw.sh that detects available RAM and free disk
on the project-root partition (Linux + macOS) before the bootstrap
spinner runs. Below 3700 MB RAM or 20 GB free disk, surfaces a "likely
cannot run" warning with a Try-anyway prompt defaulting to abort. The
3700 MB floor sits below 4 GB because "4 GB" VMs typically report
3700–3900 MB after kernel reserves (Hetzner CX21 ≈ 3814, AWS t3.medium
≈ 3800). Cheaper to fail here than to wait through pnpm install on a
host that can't run the agent container. Diagnostic events fire on
continue/abort.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
setup/lib/windowed-runner.ts was the one place on main still printing
elapsed time as raw seconds (`(170s)`) instead of using the
minute-aware `fmtDuration` helper from #2108. Two spots — the live
spinner suffix that ticks during the build, and the
success/error completion suffix — both now go through `fmtDuration`,
so anything past 60 seconds renders as `Xm Ys` (e.g. `2m 50s`) like
the rest of the setup flow.
The miss happened because a separate PR (closed) was supposed to
remove the timer entirely from this file, so #2108 deliberately
skipped it. With that other PR closed, applying `fmtDuration` here
is the consistent fix.
Pure formatting change. The helper itself is unchanged from #2108;
behavior under 60s is identical (`Xs`); behavior past 60s now
matches everywhere else.
Step 1 of the Telegram channel's BotFather instructions used to read:
1. Open Telegram and message @BotFather
Two small UX issues with that:
- "BotFather" reads slightly sketchy without context — a first-time
user has no way to know it's the official, sanctioned account
rather than an impersonator.
- Typing the username from memory leaves room for picking a typo'd
impostor account (Telegram has many @BotF4ther / @BotFAther / etc.
look-alikes).
Update the line so the official-bot framing is part of the instruction
itself:
1. Open Telegram and message @BotFather — Telegram's official bot
for creating and managing bots
One-line change in the existing note() body. No new dependencies, no
asset churn, no other behavior change.
Claude Code 2.1.116+ treats SDK `allowedTools` as a hard whitelist:
servers whose namespace isnt listed are filtered out before the agent
ever sees them, regardless of `permissionMode: bypassPermissions` or
any `permissions.allow` in settings. The static TOOL_ALLOWLIST only
contained `mcp__nanoclaw__*`, so any MCP wired via add_mcp_server (or
directly in container.json) was silently dropped.
Derive `mcp__<sanitized-name>__*` entries at the SDK call site from
the already-aggregated `this.mcpServers` map, mirroring the SDKs own
sanitization rule (chars outside [A-Za-z0-9_-] become _).
Prior diagnosis by @jsboige in #2028 (withdrawn, not upstreamed).
The OneCLI installer (curl onecli.sh/install | sh) doesn't pass
--remove-orphans to docker compose up. After the upstream service rename
(app -> onecli), the legacy onecli-app-1 container keeps :10254 bound
and crashes the new bring-up. This breaks /migrate-v2.sh on any host
that has a pre-rename OneCLI installed.
Workaround: before invoking the installer, remove containers in the
"onecli" compose project whose service name isn't in the v2 set
({onecli, postgres}). Label-keyed and no-op on fresh installs.
Filed upstream; remove this once the installer adds --remove-orphans.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous approach deleted the v1 unit file and symlinked it to v2,
making rollback impossible. Now we just disable v1 and leave the file
on disk so users can switch back with a single command.
Also adds rollback instructions to the migration summary output.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
After migration keeps v2, the old unslugged `nanoclaw.service` (or
`com.nanoclaw.plist`) was only disabled — the unit file stayed on disk.
A `systemctl --user restart nanoclaw` would start v1 instead of v2.
Now the migration removes the old file and symlinks it to the v2 unit,
so the legacy name transparently starts v2. Handles systemd (Linux/WSL)
and launchd (macOS). Idempotent — skips if the symlink already exists.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The migration script has interactive prompts and streams progress
output that gets collapsed when run via Claude Code's Bash tool.
Add a TTY guard that exits early with instructions to use the !
prefix instead.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The credential proxy already reads ANTHROPIC_AUTH_TOKEN (credential-proxy.ts
line 33) and uses it for OAuth-mode authentication, but setup/verify.ts did
not include it in its credential-detection regex. Users with
ANTHROPIC_AUTH_TOKEN in .env saw 'CREDENTIALS: missing' even though their
credentials were valid at runtime.
Add ANTHROPIC_AUTH_TOKEN to the regex and add a matching test case.
Closes gh-853
Container typecheck and bun install gracefully skip when bun isn't
installed on the host. Linux service restart now detects the actual
systemd service name instead of hardcoding 'nanoclaw'.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The skill was written for v1 and missed several v2 changes: container
rebuild after merge, dependency install for both pnpm and bun lockfiles,
container typecheck, channel/provider branch update awareness, and
platform-aware service restart instructions.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The pre-message printed by setup/register-claude-token.sh used to
say "A browser window will open for you to sign in with your
Claude account." Accurate on a laptop or desktop, but a lie on
headless devices (Pi, SSH'd-into Linux server, CI) where the
browser auto-open never lands and the user actually has to copy
the URL `claude setup-token` prints to another device.
Add a small bash isHeadless check (mirrors `isHeadless()` in
setup/platform.ts: Linux without DISPLAY / WAYLAND_DISPLAY) and
swap the heredoc accordingly:
- Headless: "A sign-in link will appear for you to sign in with
your Claude account. When you finish, we'll save the token
to your OneCLI vault automatically."
- GUI: existing "A browser window will open…" copy, unchanged.
The trailing "Press Enter to continue, or edit the command first."
line and the actual `claude setup-token` invocation are unchanged
— only the leading sentence flips.
/./ requires at least one character and silently drops messages with no
text (e.g. Telegram photo/video/file sent without a caption). Switching
to /[\s\S]*/ matches the empty string too, so media-only messages now
reach the router and then the agent.
#2183 added orphan-claim cleanup that reopens `outbound.db` by session
path (`openOutboundDbRw(session.agent_group_id, session.id)`) so the
delete runs against a writable handle even when callers pass a readonly
one. That works for the production caller — there's a real on-disk
session DB at the expected path.
The test wrapper `_resetStuckProcessingRowsForTesting` (introduced in
the same series, #2151) is called with in-memory DBs that have no
on-disk path. The reopen creates a fresh empty file at
`<DATA_DIR>/v2-sessions/ag-test/sess-test/outbound.db`, runs the delete
against that, and leaves the in-memory `outDb` (which the test reads
afterward) untouched. The two `resetStuckProcessingRows — orphan claim
cleanup` tests assert `getProcessingClaims(outDb).toEqual([])` after
the call and fail on the row that's still there.
Fix: drop the `_…ForTesting` wrapper, export `resetStuckProcessingRows`
directly with an optional `writableOutDb` parameter. When omitted
(production), the function reopens `outbound.db` RW by session path —
existing behavior, existing safety guarantee. When provided (tests, or
any future caller that already holds a writable handle), the function
uses it directly and skips the reopen. The optional parameter has a
real meaning, not a "for tests" hack.
Public API surface change: `_resetStuckProcessingRowsForTesting` is
gone, `resetStuckProcessingRows` is now exported. No other callers
inside the repo besides the test.
The first-time setup picker only listed seven channels with bash
installers. Users wanting to install one of the other channels (matrix,
github, linear, webex, etc.) had no entry point from the picker and had
to know to run /add-<name> from Claude Code afterwards.
Add an "Other…" option that prompts for a free-text name, normalizes it
(accepts "matrix", "add-matrix", or "/add-matrix"), and prints a hint
telling the user to run /add-<name> from Claude Code after setup
finishes. The verify step's "What's left" panel already covers the
empty-channels case, so the user is not left thinking the channel was
wired.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Skill files only — copied from PR #2192 (channels branch).
Source adapter (src/channels/deltachat.ts) lives on the channels
branch and is installed by the skill.
Co-Authored-By: Axel McLaren <scm@axml.uk>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
If 1b-db is re-run after the v2 service has already started (e.g.
recovering from an earlier failure), the messaging_group it would
otherwise create may already exist — auto-created by the runtime router
on the first inbound message, with the router's default
unknown_sender_policy ('request_approval'), not the migration's intent
('public'). The previous reuse path skipped creation but never updated
the policy, so re-runs left the bot hanging every message waiting for
an approver that wasn't seeded yet.
When reusing an existing row that has zero wired agent_groups (signal
of a router auto-create), reset the policy to 'public'. Once any wiring
exists, the user has had a chance to tighten via the skill — leave it.
Also adds a CHANGELOG entry covering this and the two sibling fixes
(Discord DM resolution, symlink skip in copyTree).
fs.copyFileSync follows symlinks, so a single broken/dangling link in v1
(e.g. .claude-shared.md → /app/CLAUDE.md, a container-side path that
doesn't resolve on the host) crashed the alphabetical traversal with
ENOENT — preventing later folders, including the actual registered
group, from being copied.
Check entry.isSymbolicLink() and skip with a one-line log. v2 uses
composed CLAUDE.md fragments, so v1's container-path symlinks have no v2
meaning and don't need to be carried forward.
The resolver only enumerated guild channels, so any v1 install whose
registered Discord chat was a DM (a common case for personal-bot
installs) failed 1b-db with "not found in any guild" — leaving the
migration without an agent_group or wiring, and the user with a bot that
received messages but had nowhere to route them.
Add an unresolved-channel classification pass: for any v1 channel id not
found in a guild, GET /channels/<id> and emit discord:@me:<id> when the
type is DM (1) or GROUP_DM (3). Matches the runtime adapter's
guild_id || "@me" encoding. Other types / 404 / 403 keep current
skip-with-warning behavior.
Caller passes the v1 channel id list (already on hand). Test coverage
extends the existing mock-fetch pattern with DM, GROUP_DM, orphan, and
dedupe cases.
Trust the agent to figure out which failed steps actually stop
routing. The rule is the goal ("can the bot route one message?"),
not a hardcoded list.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2b-channel-auth: copies the Baileys keystore + channel-specific env
keys. Without it WhatsApp can't connect — saw this firsthand when
the original candidatePaths bug left env_keys=0,files=0.
3c-auth: registers Anthropic credentials in OneCLI. 3b installs the
gateway; 3c puts the secret in the vault. Without 3c every agent
request 401s regardless of 3b's status.
1c-groups stays deferred — agent runs on stock CLAUDE.md without it,
but routing works.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previous version spelled out launchctl/systemctl commands, log lines
to grep for, diagnostic recipes — the agent reading this skill knows
all of that. Keep only the parts that aren't obvious from the rest of
the codebase: which steps are blocking vs deferred, the smoke-test
ordering, and the non-destructive framing for the user.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 0 used to be "triage every failed step before doing anything
else", which front-loaded a bunch of fixes for things that don't
actually block the user from proving v2 works. Restructure:
- 0a — fix blockers only (1b/1d/2c/2d/3a/3b/3e). Defer non-blockers
(1a, 1c, 1e, 2b, 3c) — most surface naturally in later phases.
- 0b — smoke test: switch v1 → v2, send a real message, verify the
routing chain in logs/nanoclaw.log. AskUserQuestion gates whether
to continue.
- Revert recipe (launchctl/systemctl) called out as always-available,
not destructive — v1 process, data, and credentials are untouched.
Up-front list of what the script handled now also mentions the
WhatsApp LID resolution and Baileys keystore copy, so users see
exactly what continuity they're getting.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
README: replace the one-line v1 migration note with a collapsed
<details> block. Quick Start stays compact for the common case (fresh
install) while v1 users get the actual instructions. Calls out
explicitly that the script must be run from a real terminal — not from
inside a Claude session — so the channel-select / switchover prompts
and the Node/pnpm/Docker bootstrap all work.
migrate-from-v1 skill: add a Preflight section that aborts if
logs/setup-migration/handoff.json is missing. Without this, invoking
the skill before the script just leads Claude to start guessing /
running shell commands. The new message redirects them to the script
and tells them it'll hand back to Claude on completion.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
v1 stored every WhatsApp DM as `<phone>@s.whatsapp.net`. v2's WA
adapter sometimes resolves the chat to `<lid>@lid` instead — when
WhatsApp delivers via the LID protocol and Baileys hasn't yet learned
a LID→phone mapping for that contact (cold cache after migration).
The router then can't find the phone-keyed messaging_group and
silently drops the message at router.ts:184.
Baileys persists every LID↔phone pair it has ever learned to disk as
`store/auth/lid-mapping-<phone>.json` (forward) and
`lid-mapping-<lid>_reverse.json` (reverse). v1 will already have these
populated for every contact it has talked to. New step 2d-whatsapp-lids
parses the reverse files and writes paired LID-keyed `messaging_groups`
+ `messaging_group_agents` rows so both `<phone>@s.whatsapp.net` and
`<lid>@lid` route to the same agent_group with the same engage rules.
No Baileys boot, no WhatsApp connectivity required — pure filesystem
read of files we've already copied via 2b-channel-auth. Step is
no-op-on-skip if either store/auth or whatsapp DM rows are missing.
Anything that slips through (a contact whose LID v1 never learned)
falls back to the runtime approval flow once the WA adapter sets
isMention=true on DMs — each unknown LID DM auto-creates an
approval-required messaging_group and the owner gets a one-tap
register prompt.
Verified end-to-end on a 12-group v1 install: 3 DM rows aliased,
inbound DM routed via the LID-keyed row.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
v1 didn't track is_group separately; db.ts hardcoded `is_group: 1` for
every messaging_group. v2 uses is_group=0 to collapse DM sub-thread
sessions and to drive routing decisions, so getting it wrong is latent
risk on otherwise-working installs.
New helper inferIsGroup(channelType, platformId) lives in shared.ts so
tasks.ts and any future migration step can reuse it. Inferred per
channel:
- whatsapp: `<id>@g.us` is a group, anything else is a DM
- telegram: negative chat IDs are groups, positive are DMs
- everything else: default to 1 (least surprising for chats v1 chose
to register, where DM auto-create paths weren't used)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
migrate-v2.sh
Replace `declare -A STEP_RESULTS` with two parallel indexed arrays
(STEP_NAMES + STEP_STATUSES) plus a `record_step` helper. macOS ships
bash 3.2 which has no associative arrays — `declare -A` errored out
silently and every `STEP_RESULTS["1a-env"]=...` triggered a fatal
bash arithmetic error (interpreting "1a" as a number). Visible
symptom: `steps: {}` in handoff.json. Latent symptom: phase 2c's
install loop sometimes bailed mid-iteration before invoking the
channel install script, leaving channel code uninstalled while
reporting `overall_status: success`.
migrate-v2-reset.sh
Cover the gaps that left install side-effects in place between
iterations:
- Remove untracked adapter files in src/channels/ (mirror the
pattern already used for container/skills/).
- Restore tracked setup helpers that channel installs overwrite
(setup/whatsapp-auth.ts, setup/pair-telegram.ts, setup/index.ts)
and remove untracked ones they create (setup/groups.ts).
- Restore package.json + pnpm-lock.yaml (channel installs add
deps like @whiskeysockets/baileys).
Setup/migrate-v2/* is intentionally not touched — that's where user
WIP lives.
Verified end-to-end: reset → migrate → all 9 steps reported in
handoff.json with status "success", phase 2c install actually runs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- shared.ts: parseJid now recognizes raw Baileys WhatsApp JIDs
(`<id>@s.whatsapp.net`, `@g.us`, etc.); v2PlatformId returns the raw
JID for whatsapp to match what the runtime adapter emits. Without this,
every WhatsApp group in a v1 install was silently skipped.
- discord-resolver.ts: new helper that uses DISCORD_BOT_TOKEN to look up
channelId → guildId via the Discord API, since v1 stored only the
channel id but v2 needs `discord:<guildId>:<channelId>`. Best-effort:
on missing/invalid token or network error, returns empty resolver and
the affected groups are skipped with the reason surfaced per channel.
- db.ts, tasks.ts: route Discord groups through the resolver; other
channels go through v2PlatformId unchanged. Resolver only built when
at least one Discord group exists, so non-Discord installs incur no
network.
- db.ts: when every v1 group is skipped, exit non-zero with a FAIL line
instead of `OK:groups=N,...,skipped=N`, so the wrapper doesn't hide
total failure under a successful-looking summary.
- migrate-v2.sh: run_step now surfaces ERROR: lines from successful
steps (with count + first 3 + raw log path); phase 2c install loop
populates STEP_RESULTS so install failures show in handoff.json
instead of silently passing.
- sessions.ts: copyTree skips dangling symlinks (e.g. v1's
`.claude/debug/latest`) instead of crashing the entire step.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
initTestSessionDb() creates an in-memory inbound singleton, but
openInboundDb() always opened the hardcoded /workspace/inbound.db
path. Every test that exercised getPendingMessages — directly, or via
test fixtures that load data through it (e.g. poll-loop.test.ts:29
loads formatter test rows via getPendingMessages) — failed with
SQLITE_CANTOPEN under `bun test` outside a real container.
Baseline on main: 34 pass, 25 fail across 6 files. After this fix:
59 pass, 0 fail.
In test mode, openInboundDb returns the in-memory singleton. The
singleton's .close() is no-op'd in initTestSessionDb so caller
try/finally cleanup doesn't tear down the shared DB; closeSessionDb
invokes the saved original close to do the real teardown.
Production behavior is unchanged — _inboundIsTest only flips inside
initTestSessionDb, which is never called outside the test runner.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR #2151 added deleteOrphanProcessingClaims() to resetStuckProcessingRows(),
but outDb is always opened readonly (openOutboundDb uses immutable: true).
The write silently failed, leaving orphan processing_ack rows that block
future message delivery for the session.
Fix: add openOutboundDbRw() alongside the existing readonly opener and use
it in resetStuckProcessingRows() to open a short-lived writable handle just
for the delete. The readonly handle is still used for all reads above.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The follow-up poller filtered /clear out of every tick without acking
the row, and pushed every other slash command through plain
formatMessages() (XML wrapping). On a warm container the outer
while(true) loop never regains control, so:
- /clear sat pending in messages_in forever (no response at all)
- /compact, /cost, /context, /files, /remote-control arrived at the
SDK as XML-wrapped user text and were never dispatched as commands
Both modes are invisible to host monitoring: rows are either left
pending without a processing_ack claim, or marked completed normally;
heartbeat keeps firing inside the SDK event loop.
When the follow-up poller observes any slash command (admin or
passthrough — categorizeMessage decides), end the active query so the
current turn winds down cleanly and the outer loop wakes, re-fetches
the same pending set, and runs them through the canonical path
(/clear handler + formatMessagesWithCommands raw dispatch). Leave the
rows untouched so the outer-loop fetch sees the same set the poller
saw.
Cost: each slash command on a warm container forces close+reopen of
the SDK stream — a few seconds of subprocess startup. The Anthropic
prompt cache is server-side with a 5-min TTL keyed on prefix hash, so
stream lifecycle does not affect cache lifetime; close+reopen within
5 min still gets cache hits.
Also corrects the warm-stream rationale comment on processQuery, which
implied keeping the stream open preserved cache warmth — it doesn't.
Testing evidence — cache stays warm across stream close+reopen:
Turn 1 (warm session):
Usage: in=6 out=245 cache_create=92 cache_read=22996
Full cache hit (22996 tokens).
Turn 2 — /clear arrives:
Pending slash command — ending stream so outer loop can process
Clearing session (resetting continuation)
Usage: in=6 out=95 cache_create=9393 cache_read=13600
System prompt + tool defs (~13600 tokens) still hit cache;
conversation history is gone (continuation reset) so the new turn
writes fresh context.
Turn 3 — /cost arrives:
Pending slash command — ending stream so outer loop can process
Usage: in=0 out=0 cache_create=0 cache_read=0 wall=0.0s api=0.0s
/cost is a CLI built-in: dispatched locally by the SDK, no API
call. Pre-fix this would have arrived as XML-wrapped user text
and never dispatched — confirms the broader fix works.
Turn 4 (next chat after /cost):
Usage: in=6 out=142 cache_create=328 cache_read=22993
Full cache hit again (22993 tokens read, 328 written). Despite the
/cost-induced stream close+reopen, the server-side prompt cache
survived: the new sdkQuery() resumed the same continuation, the
request prefix matched the cached entry.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
OneCLI runs in a Docker container, so Docker must be installed first.
Reordered: Docker (3a) → OneCLI (3b) → Auth (3c) → Skills (3d) →
Build (3e). OneCLI install now skips with a clear message if Docker
isn't available.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
migrate-v2.sh now runs setup/install-docker.sh when Docker isn't
found instead of just printing a message. The container build step
reports failure (not skip) when Docker is unavailable so the skill
can triage it.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The migration is no longer experimental — it's been tested end-to-end
with service switchover, session continuity, and revert. Updated the
changelog entry to reflect the new migrate-v2.sh flow.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Extracted the helpers we use (JID parsing, trigger mapping, channel
auth registry, generateId, v2PlatformId) into setup/migrate-v2/shared.ts.
Deleted setup/migrate-v1/ entirely — no code references it anymore.
Updated README, CLAUDE.md, docs/v1-to-v2-changes.md, and
docs/migration-dev.md to reference the new paths and migrate-v2.sh
entry point.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The old migration flow (detect → validate → db → groups → env →
channel-auth → channels → tasks) ran inside `bash nanoclaw.sh` via
setup/auto.ts. Replaced by the standalone `bash migrate-v2.sh` flow.
Deleted:
- setup/migrate-v1.ts (orchestrator)
- setup/migrate-v1/{detect,validate,db,env,groups,channel-auth,channels,tasks}.ts
Kept:
- setup/migrate-v1/shared.ts (used by new migrate-v2/ steps)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New entry point: `bash migrate-v2.sh` from the v2 checkout.
Replaces the old setup-embedded migration flow with a standalone
4-phase script + rewritten Claude skill for the interactive parts.
Phase 0: Bootstrap (Node/pnpm/deps via setup.sh) + find v1
Phase 1: Core state (env, DB, groups, sessions, tasks)
Phase 2: Channels (clack multiselect, auth copy, code install)
Phase 3: Infrastructure (OneCLI, auth, Docker, skills, container build)
Service switchover: stop v1 → start v2 → test → keep or revert
Phase 4: Handoff → exec claude "/migrate-from-v1"
The skill handles: owner seeding, access policy, CLAUDE.local.md
cleanup, container config validation, fork customization porting.
Key fixes found during testing:
- triggerToEngage: requires_trigger=0 must override non-empty pattern
- unknown_sender_policy defaults to 'public' (strict drops all msgs
before owner is seeded)
- Service revert must stop v2 (parse unit name from step log, not
early tsx one-liner that can fail)
- Session continuity: copy JSONL from -workspace-group/ to
-workspace-agent/ and write continuation:claude into outbound.db
- container_config.additionalMounts written directly to container.json
(same shape in v1 and v2)
- EXIT trap writes handoff.json; explicit write_handoff before exec
Includes migrate-v2-reset.sh for dev iteration and docs/migration-dev.md
for testing/debugging reference.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- verify: remove the CLI ping; cli-agent step earlier in setup already
proved the round-trip works, and the test agent gets cleaned up before
verify runs — so the ping was guaranteed to fail on installs that wired
a messaging app instead of staying CLI-only. Status now collapses to
service-running ∧ credentials ∧ ≥1 wired group.
- agent-ping: catch Claude Code's "Please run /login" / "Not logged in" /
"Invalid API key" banners so a successfully-spawned agent that has no
credentials no longer reports as 'ok'.
- auth paste: validate the full sk-ant-oat…AA shape; when the cleaned
input is under 90 chars, surface a truncation-specific hint pointing at
terminal wrap as the likely cause. Strip internal whitespace at both
validate and assignment so multi-line pastes that survive clack also
go through cleanly.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Resolve import conflict in setup/auto.ts — keep runMigrateV1 import,
deduplicate runWindowedStep and getLaunchdLabel/getSystemdUnit imports.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
vercel@53.0.1 declares a dep on @vercel/static-build@2.9.22 which is not
published on npm (only 2.9.21 exists), breaking every fresh container
build that resolves vercel@latest.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mirrors the four defenses on the outbound side onto extractAttachmentFiles:
1. Reject unsafe messageId via isSafeAttachmentName before any inbox path
is built. WhatsApp passes msg.key.id through raw and that field is
client generated, so a peer can craft it; future end to end encrypted
adapters will have the same property.
2. lstatSync on the inbox dir refuses a pre placed symlink before
mkdirSync would silently follow it.
3. realpathSync + isPathInside contains the resolved dir under the
session inbox root.
4. writeFileSync uses the wx flag so a pre placed symlink at the file
path is refused atomically by the kernel; EEXIST surfaces as a
logged skip.
Threat: the session dir is mounted writable into the container at
/workspace, so a compromised agent can pre place inbox/<future msgId>/
as a symlink and wait for a chat message with a matching id to redirect
the host write. The four guards together close that window.
Consolidates with the existing isSafeAttachmentName helper from
attachment-safety.ts rather than introducing a duplicate basename
validator inside session-manager.
Co-Authored-By: Daisuke Tsuji <dim0627@gmail.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two fixes on top of the follow-up pre-task-script work:
1. The void async IIFE inside the interval handler had no catch, so a
throw from the dynamic import or applyPreTaskScripts escaped as an
unhandled rejection — terminating the container. The initial-batch
path is wrapped by processQuery's outer try/catch; the follow-up
path needs its own. Now logs the error and lets the next tick retry.
2. Re-check `done` immediately before query.push. The flag can flip
true while applyPreTaskScripts is awaited (outer stream finishes
during the script execution); without the re-check we'd push into a
closed query. Claimed messages get released by the host's
processing-claim sweep — same recovery posture as the rest of the
poller.
Co-Authored-By: Michael Zazon <mzazon@gmail.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Routes the post-ping `_ping-test` cleanup through `spawnQuiet` +
`setupLog.step` so a non-zero exit from `delete-cli-agent.ts` lands
in `logs/setup-steps/cleanup-cli-agent.log` and the progression log,
and prints a one-line warn to the user. Previously the spawnSync was
fire-and-forget with `stdio: 'ignore'`, leaving an orphan agent group
silently if cleanup failed.
Restores the original copy on the cli-agent step labels, the ping
explainer paragraph, and the post-ping spinner stop line — those
copy changes are out of scope for this PR.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cached singleton can return stale rows on virtiofs/NFS mounts,
causing follow-up messages to silently never be polled. Add
openInboundDb() with mmap_size=0 and switch the three messages_in
readers to it.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces the single-line `NanoClaw` wordmark printed by
nanoclaw.sh with a multi-line splash frame: the lobster mascot
rendered as truecolor braille, drifting bubbles on either side,
the figlet wordmark below (Nano in bold, Claw in cyan bold),
three taglines — "Small.", "Runs on your machine.", "Yours to
modify." — and a navy seafloor line.
The frame is pre-rendered into `assets/setup-splash.txt` (built
from `assets/nanoclaw-icon.png` via chafa for the lobster +
figlet for the wordmark). nanoclaw.sh just streams the literal
bytes — no runtime dependency on chafa, figlet, or
ImageMagick.
Total height: 30 lines. Visible width: ~40 columns (fits any
terminal). Truecolor ANSI codes are used directly; terminals
without truecolor support will see a degraded but still
readable frame.
Also removes the standalone "Small. Runs on your machine.
Yours to modify." tagline line that nanoclaw.sh used to print
above the bootstrap spinner — those taglines now appear inside
the splash, so showing them again would duplicate.
The wordmark-suppression flow downstream (`setup:auto` honoring
`NANOCLAW_BOOTSTRAPPED=1`) is unchanged: the splash prints once
in nanoclaw.sh, setup:auto's `printIntro()` sees the flag and
keeps the clack `p.intro` line clean ("Let's get you set up.").
On GUI devices the URL was previously rendered dim inside the
instructional `note(...)` card, then `confirmThenOpen` printed
its prompt below: read the card, see the URL, then a separate
"Press Enter to open the X" prompt with no link near it. Two
visual moments for what's really one decision.
This PR pulls the URL out of the card on GUI devices and
relocates it directly under the action line of the confirm
prompt, separated only by a dim "If browser does not appear,
please visit: <url>" line:
│
◆ Press Enter to open the Developer Portal
│ If browser does not appear, please visit: … (dim)
│ ● Yes / ○ No
│
Action and fallback live as one prompt block — the user sees
both at the same time, no need to scroll back up to grab the
URL if the auto-open misses.
Headless behavior is unchanged: `formatNoteLink` still emits
"Get started: <url>" inside the card on headless devices (per
#2146), and `confirmThenOpen` still no-ops on headless (per
#2145). The only thing that changed for headless is the leading
`\n` in the helper output, which acts as a visual separator from
the steps above.
Five call sites adjusted (Discord ×3, Slack ×1, Telegram ×1) to
use `.filter((line) => line !== null)` so the now-nullable
`formatNoteLink` cleanly drops out of GUI-rendered cards.
When a card's auto-open is gated on `confirmThenOpen`, the URL also
appears inside the surrounding `note(...)` as a copy-paste fallback —
rendered dim because on a GUI device the auto-open is doing the
heavy lifting and the printed URL is just an incidental backup.
On headless devices the auto-open doesn't run (per #2145), so the
URL inside the note is the user's *only* path forward. A dim URL
reads as "incidental reference" exactly when it should be reading
as "this is the action."
Adds `formatNoteLink(url)` to setup/lib/browser.ts:
- GUI device → `k.dim(url)` (unchanged from today)
- Headless device → `Get started: <url>` at full strength
Replaces five call sites (Discord ×3, Slack ×1, Telegram ×1).
Single helper, atomic switch via the same `isHeadless()` plumbing
introduced in #2145, so the headless behavior across all five
flows stays in sync.
Wires the existing `isHeadless()` from setup/platform.ts into
`confirmThenOpen`. When the helper detects a headless device
(Linux without `DISPLAY`/`WAYLAND_DISPLAY`), both the
"Press Enter to open your browser" prompt and the actual
`openUrl(...)` call are skipped — there's no browser to launch
and the user can't usefully press Enter to summon one.
Why this is enough — the surrounding flow already supports the
headless path implicitly:
- Every `confirmThenOpen` call site sits beneath a `note(...)`
that prints the URL and the steps the user needs to take.
The URL is already visible to copy-paste onto another
device.
- Every site is followed by an explicit confirmation prompt
("Got your bot token?", "Done with the X?", etc.) that
naturally serves as the headless user's "I finished the
thing on my other device" signal.
So the headless branch becomes: read the note, do the thing,
answer the next prompt — without a useless "Press Enter to
open your browser" detour in between.
Coverage rationale (~95% accurate for the cases that actually
cause user confusion today):
- Linux + no `DISPLAY`/`WAYLAND_DISPLAY` → headless. Catches:
• Raspberry Pi headless installs
• Bare-metal Linux servers
• SSH'd into Linux without X11 forwarding
• CI environments on Linux
• Linux containers (which have no display)
- macOS → never headless. Even SSH'd Macs can usually still
open URLs through the local user's session, so treating
them as GUI-capable is the right default.
- Windows → never headless (effectively always GUI in
practice).
The remaining ~5% are edge cases (someone manually unset
`DISPLAY` on a desktop Linux session, etc.) that almost never
happen accidentally and recover gracefully — the URL is still
visible in the surrounding note.
Six call sites in channel adapters (Discord ×3, Slack ×1,
Telegram ×1, Teams ×1) all change behavior atomically through
the single helper. No per-site copy changes needed; consistency
is enforced by the central wiring.
Adds a `fmtDuration(ms)` helper in `setup/lib/theme.ts` that returns
`47s` under a minute and `1m 34s` from 60s onward, then routes every
elapsed-time spinner suffix in the setup flow through it. Replaces
the inline `Math.round((Date.now() - start) / 1000)` + `(${elapsed}s)`
pattern at every site.
Format is consistent past 60s — `1m 0s` over `1m` — so the live
spinner doesn't change shape at every whole-minute crossing.
Sites updated: setup/auto.ts, setup/lib/{runner,tz-from-claude,
claude-assist}.ts, and setup/channels/{signal,whatsapp,telegram,
discord,slack}.ts. Pre-allocated suffix budgets in `fitToWidth`
calls bumped from `' (999s)'` to `' (99m 59s)'` so long-running
steps don't blow past the reserved width.
Remove the grouped detectExistingEnv() block that asked "reuse all or
start fresh" at the top of setup. Each channel step now reads credentials
directly from .env on disk via readEnvKey() and offers to reuse them
individually at the point of use.
- Add readEnvKey() helper in setup/environment.ts
- Remove ENV_KEY_GROUPS, ExistingEnvGroup, detectExistingEnv from auto.ts
- Move detectRegisteredGroups skip to right before cli-agent step
- Switch all channel files (telegram, discord, slack, teams, imessage)
from process.env to readEnvKey()
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The spinner label exceeded terminal width, breaking clack's cursor-up
redraw and causing each animation tick to print a new line instead of
updating in-place. Wrap with fitToWidth() like other setup spinners.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When the host kills a container (absolute-ceiling, claim-stuck, or crashed),
resetStuckProcessingRows reset messages_in but left orphan rows in
processing_ack. The next sweep tick spawned a fresh container and, on the
same tick, ran enforceRunningContainerSla against outbound.db that still
contained the previous container's claim with a hours-old status_changed
timestamp — instant kill-claim, before the agent-runner could open
outbound.db to run its own clearStaleProcessingAcks(). Loop until tries
hit MAX_TRIES.
Add deleteOrphanProcessingClaims() in session-db and call it at the end of
resetStuckProcessingRows. Safe to write outbound.db here because the host
only enters this path after killContainer (or when no container is running).
Tests in host-sweep.test.ts cover the helper plus the regression: orphan
claim from a 2h-old kill is now removed atomically with the messages_in
reset, so the next sweep tick sees an empty claims list and the freshly
respawned container survives long enough to start its agent-runner.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Setup steps like install-node.sh and install-docker.sh run sudo
non-interactively. Without NOPASSWD, password prompts can silently
hang when piped through the setup runner.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When maybeReexecUnderSg() re-launches setup:auto under `sg docker`,
the new process had no memory of completed steps — it re-prompted the
welcome menu, re-ran environment and container checks, and then failed
on onecli because the earlier run's state was lost.
Pass NANOCLAW_SKIP with completedStepNames() so the re-exec'd process
skips already-finished steps, suppress the welcome menu and existing-env
prompts on re-exec since the user already answered them.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The root cause of broken keyboard navigation was sg docker prompting
for the (unset) group password when the user wasn't in the docker
group. Fix by running sudo usermod -aG docker before sg docker.
This makes the stty sane calls and p.confirm workaround unnecessary,
so revert those. Also remove the manual docker group instruction from
nanoclaw.sh since container.ts handles it automatically.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Users running setup as root hit permission issues with containers,
services, and file ownership. Warn early with an interactive prompt
and provide step-by-step instructions to create a regular user.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move the MIME/type-to-extension maps and derivation helpers out of
session-manager.ts into a dedicated attachment-naming module — keeps
session-manager focused on session lifecycle and gives the helpers
a natural home for unit tests alongside the existing attachment-safety
module.
Two small fixes alongside the extraction:
- extForMime now guards `typeof mime !== 'string'` before .split, so a
buggy bridge passing `mimeType: { ... }` (object) no longer crashes
the inbound write loop.
- deriveAttachmentName computes Date.now() once per call instead of
twice, and tightens the explicit-name check to a string-and-truthy
guard so non-string values fall through to derivation.
Adds attachment-naming.test.ts with 11 cases covering MIME normalization
(case + parameters), Telegram type fallback, the non-string defensive
guard, and the bare-timestamp fallback.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The schedule_task MCP tool wrote routing fields (platform_id, channel_type,
thread_id) onto the outbound system message's row columns, but
handleSystemAction (src/delivery.ts) parses content JSON and forwards only
that to handlers. handleScheduleTask (src/modules/scheduling/actions.ts)
reads content.platformId/channelType/threadId — which the writer never
populated — so every kind='task' row landed in messages_in with all-null
routing.
When host-sweep wakes a scheduled task, dispatchResultText's fast path
requires routing on the message and bails when it's null, falling through
to the "Routing recovery" retry prompt. End-user delivery still works
because the agent can pick a destination from its destinations table on
retry — so the bug went undetected, silently costing one extra LLM turn
per scheduled-task wake. Sessions whose destinations table has no channel
row (e.g. agent-only destinations) fail outright with a recovery loop.
Fix: add the routing fields to the content JSON so the writer matches the
contract handleScheduleTask already expects. cancel/pause/resume/update_task
operate by id alone and don't need routing.
Removing the "Not logged in · Please run /login" detection and
substitution from this PR — narrowing scope to just the OneCLI
gateway transient-retry change. The login-message handling will be
addressed separately.
Reverts:
- AgentProvider.isAuthRequired / authRequiredMessage
- ClaudeProvider auth-required regex, classifier, and remediation text
- poll-loop writeAuthRequiredMessage helper + call sites
- claude.test.ts (auth-only test file)
OneCLI/wakeContainer changes (the remaining content of the PR) are
unaffected.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes#1820.
The container agent-runner sets CLAUDE_CODE_AUTO_COMPACT_WINDOW
unconditionally on the container process env, with no way to override
it per-deployment without editing source. Read process.env first and
fall back to the existing 165000 literal when unset.
Default behavior is unchanged for installs that do not set the env
var. Operators running 1M-context models or emergency-tuning a live
deployment can now raise or lower the threshold from the host env.
Pre-commit hook ran prettier on the prior commit but left the reformats
unstaged. Folding them in here so the branch is clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a paired `authRequiredMessage()` method to AgentProvider so
per-provider auth-failure remediation can differ. Claude returns the
Anthropic/`claude` instruction; future providers (Codex, OpenCode, …)
can return their own remediation text. The poll-loop calls
`provider.authRequiredMessage?.()` and falls back to a generic message
if a provider implements `isAuthRequired` without supplying its own
remediation.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a transport-agnostic CLI control plane shared between three eventual
callers (host shell, Claude in project, container agent) — though only the
host-side socket transport is wired in this commit. Container DB transport
and approval flow land alongside the first risky command.
- src/cli/frame.ts: wire format (RequestFrame, ResponseFrame, CallerContext)
- src/cli/registry.ts: command registry with RiskClass
- src/cli/dispatch.ts: transport-agnostic dispatcher
- src/cli/transport.ts: Transport interface
- src/cli/socket-client.ts: SocketTransport against data/nc.sock
- src/cli/socket-server.ts: host-side listener (chmod 0600, line-delimited JSON)
- src/cli/format.ts: human table / --json output modes
- src/cli/client.ts: `nc` argv -> frame -> transport -> stdout
- src/cli/commands/list-groups.ts: first command (riskClass: safe)
- bin/nc: bash launcher (resolves project root via symlink)
- src/index.ts: start/stop server + import command barrel
`data/nc.sock` is intentionally separate from `data/cli.sock` (which the
existing chat-style channel adapter still owns).
Verified end-to-end: `nc list-groups`, `nc list groups`, `--json`,
unknown-command error, host-down ENOENT message with start instructions.
typecheck clean; eslint reports only the same `no-catch-all` warnings the
rest of the codebase has.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When a channel bridge passes an attachment without an explicit `name`,
extractAttachmentFiles fell back to `attachment-<ts>` with no extension.
Agents could not tell whether the file was a JPEG, PDF, or audio clip,
and tools keyed on extension (image viewers, exiftool, etc.) misbehaved.
Two cases are now covered:
1. Channels that set `mimeType` but no `name` (Discord/Slack documents,
Telegram document uploads). A small MIME-to-extension table covers
the common content types — image/*, audio/*, video/*, pdf, zip,
txt, json. Unknown MIMEs fall back to the unsuffixed name.
2. Channels that set `att.type` but no `mimeType` (Telegram photos,
stickers, voice, animations). The chat-sdk bridge sets a coarse
media-class (`photo` / `sticker` / `voice` / `video` /
`animation`) which is reliable enough to derive a canonical
extension. Telegram GIFs are MP4 under the hood.
The existing isSafeAttachmentName security guard is preserved — the
derived name still passes through it before disk I/O. The new lookup
tables emit static values from internal maps and cannot construct a
path-traversal payload; attacker-controlled att.name continues to flow
through the same validator.
Tasks arriving during an active query were pushed into the stream as
follow-ups without running their `script` gate — so a wakeAgent=false
pre-script that was supposed to suppress the tick silently leaked
through and woke the agent every time. Evidence: monitoring cron
firing every 10 min with [task-script] log lines never showing.
Run applyPreTaskScripts on the follow-up batch too: wakeAgent=false
tasks get marked completed and dropped; wakeAgent=true tasks have
scriptOutput enriched exactly like the initial-batch path. Added a
pollInFlight guard to serialize async runs and avoid overlapping
script executions when the interval fires while one is still going.
Wrapped in a MODULE-HOOK:scheduling-pre-task-followup marker block
to match the existing initial-batch hook convention.
- wakeContainer now never throws — returns Promise<boolean>, catches
internally. Closes the regression risk for the 5 awaited callers in
agent-to-agent, interactive, and approvals/response-handler that the
previous version left unwrapped. Router uses the boolean to stop the
typing indicator on transient failure; host-sweep just awaits.
- Tighten AUTH_REQUIRED_RE: anchor to start-of-string with the specific
`·` (U+00B7) separator the CLI uses, so an agent that quotes the
banner mid-sentence in a normal reply doesn't trip the classifier.
- Log a one-line note from writeAuthRequiredMessage so substitutions
are visible when debugging "user got the credentials message but I
don't see why."
- Add unit tests for ClaudeProvider.isAuthRequired covering both banner
variants, trailing content, mid-sentence quoting, leading-prose
quoting, alternate separators, and unrelated text.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Scratch agent uses fixed folder `_ping-test` so it can never collide
with a real agent on re-runs
- Added --folder flag to init-cli-agent.ts and cli-agent step wrapper
- Delete always targets `_ping-test` exactly — no re-derivation needed
- Removed normalizeName coupling and FOLDER status field (no longer needed)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- delete-cli-agent.ts discovers tables with agent_group_id dynamically
instead of hardcoding a list
- cli-agent step emits FOLDER in its status block so setup/auto.ts
reads it from the step result instead of re-deriving via normalizeName
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The "Terminal Agent" created for the connection test is now silently
deleted after a successful ping. If the user chooses to chat, a new
agent is auto-created as "{name}'s Terminal" — no name prompt needed.
Condensed the three-line ping section into a single "Connection verified."
status line.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two related fixes for the case where credentials aren't usable:
1. Replace Claude Code's "Not logged in / Invalid API key · Please run
/login" output with a host-aware message. The user can't run /login
from chat, so the raw text is unhelpful. Provider gains an optional
isAuthRequired() classifier; the poll-loop substitutes the message
on both result-text and error paths.
2. Treat OneCLI gateway failure as a transient hard error instead of
spawning a credential-less container. The catch in container-runner
now propagates; router and host-sweep wrap wakeContainer to log and
leave the inbound row pending so the next 60s sweep tick retries.
Router also stops the typing indicator on failure.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the hardcoded Approve/Ignore card with a multi-step flow:
- Single agent: "Connect to [name]" / "Connect new agent" / "Reject"
- Multiple agents: "Choose existing agent" (follow-up list) / "Connect new agent" / "Reject"
- "Connect new agent" prompts for a free-text name via DM, creates immediately on reply
- Add setMessageInterceptor router hook for capturing free-text replies
- Add resolveChannelName optional method to ChannelAdapter interface
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Wraps the word "assistant" in `accentGreen` (#3fba50, added in #2103)
across the six channel adapters that ask "What should your assistant
be called?" — Discord, iMessage, Signal, Slack, Telegram, WhatsApp.
Mirrors the green emphasis on "you" in the display-name prompt: the
green word names the subject of the question (assistant vs operator)
so the operator parses it at a glance.
Adds an `accentGreen` helper (#3fba50) with the same TTY/NO_COLOR/
truecolor gating as the rest of the palette, then wraps the word
"you" in the "What should your assistant call you?" prompt so the
operator parses at a glance who the question is about — the user,
not the assistant. The mirror prompt that asks for the assistant's
name ("What should your assistant be called?") is left for a
follow-up.
Customize `brightSelect`'s render function so the focused option's
label paints in brand cyan during selection and the submitted answer
paints in dim cyan after the user moves on. Inactive options keep
their default rendering — only the cursor and submitted state pick
up the color, matching the body-text emphasis added in #2101.
Also migrate the one remaining `p.select` call site (the "What next?"
prompt after the first chat) to `brightSelect` so every menu in the
setup flow goes through the same render path. The shape of the call
matches what `brightSelect` already supports — message + options
with value/label/hint — so no feature is lost in the swap.
Reuses `brandBody` from #2101 for the cyan, so the prompt highlight
and the body prose share one definition of the brand body color.
Adds a `brandBody` helper in setup/lib/theme.ts that wraps prose in
brand cyan (#2BB7CE), with the same TTY/NO_COLOR/truecolor gating used
by `brand`/`brandBold`/`brandChip`. The helper splits multi-line input
and colors each line independently so the SGR sequence doesn't bleed
across clack's gutter prefix.
Routing:
- `note()` (the un-dim card wrapper from #2095) now passes
`brandBody` as its `format` callback, so card bodies render
cyan line-by-line.
- Every prose `p.log.{message,info,success,step,warn}` call in the
setup flow wraps its body argument in `brandBody`. Calls whose
body is explicitly `k.dim(...)` (failure transcript tails, log
paths, claude-assist response previews) are left alone — those
are the "preview/debug" cases the dim-policy comment in
theme.ts already carves out.
- Spinner-finish lines in windowed-runner / claude-assist color
only the message portion; the `(5s)` elapsed suffix stays dim.
Brand cyan accents (chips, wordmark, inline emphasis) are unchanged.
This PR only adds the body color.
A follow-up will add OSC 11 dark/light detection so light-mode
terminals get a brand blue (#2b6fdc) variant — opt-in upgrade with
no regression for the dark-mode default.
When pasting an invalid token, the old value stayed in the input
field. Pasting a new token appended to the old one instead of
replacing it, causing repeated validation failures.
Add clearOnError: true to all 8 password prompts across setup.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When re-running setup on a machine that already has a .env with
channel tokens or OneCLI config, detect them early and offer to
reuse instead of prompting the user to paste everything again.
- Add detectExistingEnv() to parse .env and group known keys
- Add detectExistingDisplayName() to read display name from v2.db
- Defer display name prompt until actually needed (cli-agent or channel)
- Skip cli-agent and first-chat when groups are already wired
- Add token reuse checks to Telegram, Discord, Slack, Teams, iMessage
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Clack's `p.note` defaults to `format: e => styleText("dim", e)`, which
fades note bodies regardless of the project's stated readability stance
(see comment on `dimWrap` in setup/lib/theme.ts: "prose renders at the
terminal's regular weight"). The dim styling makes body copy hard to
read on dark terminals and visibly washes out brand-colored segments
embedded in cards (e.g. the chip + bold heading rows).
Add a `note()` helper in setup/lib/theme.ts that wraps `p.note` with a
pass-through formatter, and route every setup-flow `p.note` call site
through it: setup/auto.ts, every setup/channels/*.ts adapter, and the
two setup/lib/claude-* helpers.
Pre-styled segments (brandBold, brandChip, formatPairingCard,
formatCodeCard) now render at full strength instead of being faded
alongside surrounding prose.
Use script(1) to capture PTY output and extract OAuth token when
browser-based auth isn't available, with fallback code-paste flow.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When setup fails and claude-assist kicks in, instead of silently
skipping when the CLI is missing or unauthenticated, interactively
offer to install it (via install-claude.sh) and sign in (via
claude setup-token) so the user can get diagnostic help immediately.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- getDelay indexed by attempt (1-based) into a 0-indexed array, so the
leading 0 was unreachable and every "after a crash" delay was shifted
up one slot. Use attempt - 1 so the documented schedule (0s → 0s →
10s → 30s → 2min → 5min → 15min cap) actually holds.
- enforceStartupBackoff runs before initDb (which creates DATA_DIR), so
on a fresh checkout fs.writeFileSync hit ENOENT. write() now
mkdirSync's DATA_DIR first.
- shutdown() didn't run resetCircuitBreaker if teardownChannelAdapters
threw, so a graceful exit with a teardown error would be counted as a
crash on the next start. Wrap teardown in try/finally.
- Adds src/circuit-breaker.test.ts: state transitions, full schedule
(parameterized), reset-window expiry, malformed file, and the
fresh-install path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Backs off on rapid restarts to avoid exhausting Discord gateway identify
limits and triggering Cloudflare IP bans. Resets on clean shutdown so only
crashes accumulate the counter. Also adds a troubleshooting section to
CLAUDE.md with the most useful diagnostic locations.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Slack interactive buttons (channel approval cards) require Interactivity
to be enabled in the app settings. Without it, button clicks silently
fail to reach the host. Added the step to both the setup wizard
post-install checklist and the add-slack SKILL.md.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Slack setup previously stopped after installing the adapter, leaving
users to manually discover /init-first-agent. When they DM'd the bot,
the channel-approval flow silently failed because no owner existed.
Now the Slack setup flow matches Discord/Telegram:
- Collects the operator's Slack member ID
- Opens a DM channel via conversations.open (requires im:write scope)
- Runs init-first-agent to establish ownership, wiring, and welcome DM
- Updates post-install note to focus on webhook URL (the only remaining step)
The welcome DM is delivered via chat.postMessage (outbound), which works
before Event Subscriptions are configured. The user sees the greeting
immediately; inbound replies require webhooks.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
setup/auto.ts spawned register-claude-token.sh via runInheritScript, which
inherits the parent Node process's PATH. When OneCLI was installed earlier
in the same setup run, its installer wrote the binary to ~/.local/bin and
appended a PATH line to the user's shell rc — but rc updates do not reach
an already-running process. The script's first guard, `command -v onecli`,
failed instantly (~3ms), and the auth step reported "Couldn't complete the
Claude sign-in" even though the real blocker was OneCLI not on PATH.
Patch process.env.PATH at the top of main() so every subsequent shell-out
sees ~/.local/bin. Idempotent — no-op if already present. Also drops a
duplicate `pollHealth` import that was lurking in the import block.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Slipped through during the #2035 rebase resolution — both #2030's import
and ours landed in the merge. TypeScript dedups by symbol so it didn't
fail the typecheck, but it's noise and would've eventually tripped a
linter rule.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The original approach passed ANTHROPIC_AUTH_TOKEN into the container
as an env var and disabled the proxy for the custom host (NO_PROXY) —
which works, but bypasses OneCLI entirely for that credential. The
container holds the raw secret, the gateway loses audit/rotation, and
we lose the rest of the vault's protections for this cohort.
OneCLI-native version: store the token as a generic secret with header
injection (--header-name Authorization --value-format 'Bearer {value}'
+ host-pattern matching the base URL hostname). The container only
needs ANTHROPIC_BASE_URL plus a placeholder ANTHROPIC_AUTH_TOKEN — the
proxy rewrites the Authorization header on the wire.
setup/lib/setup-config.ts — adds --anthropic-auth-token alongside the
existing --anthropic-base-url.
setup/auto.ts — runAuthStep short-circuits the auth-method prompt when
both NANOCLAW_ANTHROPIC_BASE_URL and NANOCLAW_ANTHROPIC_AUTH_TOKEN are
set: creates the OneCLI generic secret, writes ANTHROPIC_BASE_URL to
.env (so the runtime reads it), and appends `import './claude.js';` to
src/providers/index.ts (so the provider only registers when the user
has configured a custom endpoint — no branching for everyone else).
src/providers/claude.ts — drops ANTHROPIC_AUTH_TOKEN/NO_PROXY
passthrough. Reads ANTHROPIC_BASE_URL from .env, sets a placeholder
ANTHROPIC_AUTH_TOKEN in container env so the SDK includes an
Authorization header for OneCLI to overwrite.
src/providers/index.ts — removes the unconditional import; setup
appends it on demand.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Users with a custom Anthropic-compatible endpoint (ANTHROPIC_BASE_URL) were
getting 401s because the OneCLI proxy injects ANTHROPIC_API_KEY=placeholder
and forwards to api.anthropic.com, overriding the custom endpoint and key.
Add a claude provider host config that reads ANTHROPIC_BASE_URL,
ANTHROPIC_AUTH_TOKEN, and CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC from .env
and passes them into the container. Also sets NO_PROXY for the custom host so
the OneCLI proxy doesn't intercept those requests.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Matches the OneCLI CLI's own format expectation ("oc_... format" per
`onecli auth login --help`) so a malformed token gets caught at setup
time rather than at first vault call.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces the example.internal placeholder with the hosted gateway URL
so the advanced screen and --help suggest the canonical destination
out of the box.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Without `onecli auth login`, setup-time CLI calls (e.g. `secrets list`
inside anthropicSecretExists, `secrets create` in runPasteAuth) hit a
secured remote vault unauthenticated and fail silently — the auth step
sees no existing Anthropic credential and prompts the user to add one
even when it's already in the remote vault.
Two auth surfaces matter here: the CLI's persistent store via
`onecli auth login --api-key`, and ONECLI_API_KEY in .env that the
runtime SDK reads at request time. We need both.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a single config registry that drives both CLI flags and an opt-in
advanced-settings screen, so power users can override defaults like
remote OneCLI host/token or alt Anthropic endpoints without burdening
the standard linear flow with extra prompts.
Why: advanced configurations didn't fit cleanly into the existing
sequenced setup. PR #2030 took the "add another prompt step" route for
remote OneCLI; this approach instead routes those overrides through a
single source of truth so adding the next knob (alt endpoint, custom
host pattern, …) doesn't mean another prompt-or-skip decision.
setup/lib/setup-config.ts — schema (typed entry list with surface
'flag' | 'flag+ui'), name derivation (camelCase → NANOCLAW_UPPER_SNAKE
+ --kebab-case), seeded with --onecli-api-host, --onecli-api-token,
--anthropic-base-url, plus existing NANOCLAW_SKIP / NANOCLAW_DISPLAY_NAME
as flag-only entries.
setup/lib/setup-config-parse.ts — argv parser (--key value, --key=value,
--no-bool, -- terminator), env reader, applyToEnv() bridge that writes
resolved values back to process.env so existing step code keeps reading
env vars unchanged. Also --help printer.
setup/lib/setup-config-screen.ts — interactive menu loop. Entries
render with current value as hint; selecting one opens the right prompt
type (text / password for secrets / confirm / brightSelect for enums);
"Done" returns to the main flow.
setup/auto.ts — parses argv first (--help short-circuits before any
render), folds env+flags into process.env, then offers a welcome menu:
"Standard setup" (default) vs "Advanced". The onecli step branches on
NANOCLAW_ONECLI_API_HOST: if set, skips the local-vs-fresh prompt
entirely, runs pollHealth pre-flight, then calls runQuietStep with
--remote-url. Token, when provided, writes through to ONECLI_API_KEY in
.env. Welcome copy tightened (drops the duplicate wordmark/tagline) so
the bash → clack handoff reads as one flow.
setup/onecli.ts — cherries the --remote-url implementation from PR
run()) and generalizes writeEnvOnecliUrl into a writeEnvVar helper so
ONECLI_API_KEY follows the same upsert path.
nanoclaw.sh — forwards "$@" to setup:auto so flags reach the parser;
trims the redundant "Setting up your personal AI assistant" subtitle
and the bootstrap teach line so the pre-clack section isn't competing
with the clack intro for the same role.
Token plumbing only fires in --remote-url mode; local installs are
unauthenticated against localhost and don't need it.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Allow connecting to an OneCLI gateway running on another host instead
of installing one locally. Adds a third choice ('Connect to a remote
OneCLI') alongside reuse/fresh in the setup wizard, prompts for the
remote URL, validates reachability before proceeding, and passes
--remote-url to the onecli step.
In onecli.ts: extracts installOnecliCliOnly() for the remote path
(installs the CLI binary but skips the gateway), exports pollHealth
for use by auto.ts, and handles --remote-url to configure api-host
and write ONECLI_URL to .env without running the full gateway install.
Absorbs battle-tested knowledge from the v2 skill into the upstream
add-signal: registration paths (new number + linked device), CAPTCHA
flow, VoIP SMS-first timing, Java prereq, config-lock warning, wiring
SQL for groups, not_member silent-drop fix, GroupV2 groupId extraction
note, and UUID-based platform ID format.
Corrects a factual error in the upstream: DM platform IDs are
signal:{UUID} (ACI), not phone numbers.
Removes the now-redundant add-signal-v2 skill.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
setup/register.ts had two bugs that prevented new channels from being
registered via `/manage-channels`:
1. createMessagingGroupAgent was called with the legacy field names
`trigger_rules` and `response_scope`. The SQL INSERT expects
`engage_mode` / `engage_pattern` / `sender_scope` / `ignored_message_policy`
(migration 010). Every register call failed with
`RangeError: Missing named parameter "engage_mode"` after the agent
and messaging group were partially created — leaving an orphaned pair.
Now mirrors scripts/init-first-agent.ts:wireIfMissing:
- Groups (is_group=1) default to engage_mode='mention' (bot only
responds when addressed).
- DMs (is_group=0) default to engage_mode='pattern' with '.' (respond
to every message).
- An explicit --trigger overrides the pattern regex.
2. The "normalize platform_id" block unconditionally prefixed
"<channel>:" even for native IDs like WhatsApp JIDs
("120363408974444974@g.us"), iMessage emails ("user@example.com"),
or Signal phones ("+15551234567") / Signal groups ("group:abc"). But
the router (src/router.ts:158) looks up messaging_groups by the raw
event.platformId from the adapter, which for these native adapters
never has a prefix. So the prefixed row was never matched — the
message was silently dropped with no "Message routed" log.
Extracted scripts/init-first-agent.ts:namespacedPlatformId into
src/platform-id.ts so both setup paths use the same heuristic (skip
the prefix for IDs containing '@', starting with '+', or starting
with 'group:'). Prevents future drift between the two paths.
Tested by: re-running `setup/index.ts --step register` for a WhatsApp
group JID, confirming the row is created with correct engage fields
and matching platform_id, then sending a test message and observing
"Message routed" with the right agent group.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds /add-gcal-tool — a sibling of /add-gmail-tool that installs
@cocal/google-calendar-mcp with the same OneCLI stub-file pattern. Skill
applies the Dockerfile + TOOL_ALLOWLIST changes at install time; trunk
stays clean so users who never run the skill don't carry the calendar
MCP in their image.
Dropped the Phase 5 dry-run section since it hardcoded a per-install
image tag slug and duplicated Phase 4's live agent test.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The upstream precedence fix (5845a5a) made agent_groups.agent_provider and
sessions.agent_provider authoritative for host-side provider contribution
(per-session mount, env passthrough), but those DB values don't propagate
into the group's container.json — and the in-container runner reads
`provider` from container.json, not from the DB. That caused a confusing
failure mode: flipping the DB column to 'codex', rebuilding, and
restarting still spawned a Claude runner because container.json had no
provider field. The old skill wording ("container receives AGENT_PROVIDER
from the resolved value") overstated the integration.
Update add-codex and add-opencode "Per group / per session" sections to
say: set `"provider": "<name>"` in the group's container.json — that's
the source the runner reads. Keep the DB columns documented for the
host-side contribution they actually drive, and spell out the
session → group → container.json → 'claude' fallback so the precedence
is still discoverable.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 2 of the SKILL.md already contains the Dockerfile + TOOL_ALLOWLIST
edit instructions with an "ALREADY APPLIED" short-circuit. Keeping those
edits out of trunk means users who never run /add-gmail-tool don't carry
the Gmail MCP package in their image.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
On a fork PR, GITHUB_TOKEN is demoted to read-only regardless of the
workflow's permissions: block, so issues.addLabels() returns 403. The
label workflow silently works for PRs that skip the template (no
checkboxes ticked → no API call) and fails for PRs that actually
follow it — a hostile incentive against contributors who do the right
thing.
pull_request_target runs in the context of the base branch with full
declared permissions, which is the documented fix for this case. Safe
here because the workflow is metadata-only: it reads
context.payload.pull_request.body and calls addLabels. No checkout,
no PR-supplied code executes. A SECURITY comment is added above the
trigger to keep it that way.
Refs:
- https://docs.github.com/en/actions/reference/events-that-trigger-workflows#pull_request_target
- https://securitylab.github.com/resources/github-actions-preventing-pwn-requests/
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two long-line violations introduced in d121cd1 (isGroup plumbing)
exceed the printWidth limit. CI format:check fails on every PR
opened against main until this is fixed; the fix is isolated here
so no behavior change is mixed in.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Filenames in forwardAttachedFiles arrived from the source agent's
messages_out content and were used directly in path.join on both
source outbox read and target inbox write. A value like `../evil.sh`
could escape `inbox/<a2a-id>/` on the target session (and similarly
the source outbox on read), breaking session isolation — an
adversarial or hallucinating sub-agent could overwrite files in
a sibling session.
Adds isSafeAttachmentName(name) — exported so it's unit-testable —
which rejects empty, `.`, `..`, anything containing `/`, `\`, or
NUL, and anything path.basename would strip. Guard runs before any
I/O. Unsafe names are dropped with a warning log, same pattern as
missing-source-file handling; a bad filename in one attachment
doesn't kill the whole route's text delivery.
Addresses Codex Review P1 on qwibitai/nanoclaw#1967.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Before: `send_file(to='parent')` from a sub-agent wrote the bytes to
the sub-agent's own session outbox, but agent-to-agent routing copied
only the content JSON — the target's inbound message referenced
`files: ['x.png']` but the bytes lived in a session directory the
target couldn't mount. Parent agents orchestrating sub-agents (e.g.
Design Team delegating illustration work to an Illustrator sub-agent
on Codex) received file-reference messages with nothing to forward.
Fix: on route, if the source's content has `files`, copy each referenced
file from `<source>/outbox/<src-msg-id>/` to
`<target>/inbox/<a2a-msg-id>/`, and emit `attachments` (the existing
formatter convention — see formatter.ts:223) with `localPath` relative
to `/workspace/`. The target formatter already renders these as
`[file: <name> — saved to /workspace/inbox/<a2a-id>/<name>]`, so the
target agent sees the path and can call `send_file(path=…, to=…)` to
forward onward.
Convention matches what session-manager.ts:256 already does for
base64-encoded channel-inbound attachments — same inbox layout, same
content shape. Nothing on the formatter/agent side needed to change.
## Scope
- `forwardAttachedFiles(source, target)` — pure-ish helper that copies
files and returns the attachments array.
- `forwardFileAttachments(msg, …)` — wraps the helper for the route
path: parses content, copies files if present, merges into any
existing `attachments`, re-serialises.
- `routeAgentMessage` — uses the rewritten content when writing the
target's inbound row.
- Log line now includes `forwardedFileCount` for observability.
Missing source files are skipped with a warning rather than killing
the route — a bad filename in a batch shouldn't drop the
accompanying text.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Before, every provider stored its opaque continuation id under the
single outbound.db key `sdk_session_id`. Flipping a session's
agent_provider (e.g. Codex → Claude) meant the new provider read the
old provider's id at wake, handed it to its own SDK, and got a
"No conversation found" error that cost the user one sacrificed
message before the stale-session recovery path cleared the id.
This reshapes session_state so continuations are keyed
`continuation:<provider>` instead. Consequences:
- Per-provider continuations coexist. Flipping Claude → Codex → Claude
resumes the Claude thread exactly where it left off, with the
intervening Codex thread also still on file.
- No provider ever reads another provider's id. Switching costs no
sacrificed message and emits no transient error.
- Legacy installs are migrated forward on first startup:
migrateLegacyContinuation() adopts any pre-existing `sdk_session_id`
row into the current provider's slot (best guess — it was whichever
provider ran last), then deletes the legacy row unconditionally so
it can't poison a future provider's read.
runPollLoop now takes providerName alongside the provider instance,
and threads it through processQuery to setContinuation on init.
Tests: 9 new tests covering set/get isolation across providers,
clear-specificity, legacy-adoption, legacy-always-deleted,
prefer-existing-slot-over-legacy, and idempotency of a second
migration call.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds /add-gmail-tool — a Utility skill that installs Gmail as an MCP tool
in NanoClaw v2 using OneCLI for credential injection. No raw OAuth tokens
ever reach the container; the gateway swaps the "onecli-managed" stub
bearer for the real token at request time.
Scope (3 files):
- container/Dockerfile: pnpm global-install of
@gongrzhe/server-gmail-autoauth-mcp@1.1.11, pinned behind GMAIL_MCP_VERSION.
Also pins zod-to-json-schema@3.22.5 to avoid an ERR_PACKAGE_PATH_NOT_EXPORTED
crash: the MCP server's loose zod range resolves zod@3.24.x while
zod-to-json-schema@3.25.x imports the zod/v3 subpath that only exists in
zod>=3.25.
- container/agent-runner/src/providers/claude.ts: adds 'mcp__gmail__*' to
TOOL_ALLOWLIST so the agent can invoke the server's tools.
- .claude/skills/add-gmail-tool/SKILL.md: pre-flight checks (OneCLI Gmail app
connected, stubs present, mount allowlist covers ~/.gmail-mcp, agent
secret-mode), per-group wiring in container.json (mount + mcpServers),
verification steps, troubleshooting, removal instructions. Credits to
gongrzhe for the MCP server and the add-atomic-chat-tool / add-vercel
skill patterns.
Addresses #1500 (proxy Gmail OAuth through credential proxy) on the Gmail
side. Overlaps in intent with #1810 but stays surgical — no bundled
unrelated changes.
Tested end-to-end on Linux/Docker: CLI and WhatsApp self-chat agents can
list labels, search/read/send mail via OneCLI-injected tokens.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Migration 010-engage-modes (replace trigger_rules + response_scope with
engage_mode/engage_pattern/sender_scope/ignored_message_policy) updated
the schema and the production code paths, but missed setup/register.ts.
The step still constructed a payload with the dropped columns. On any
fresh v2 install, attempting to register a channel via:
pnpm exec tsx setup/index.ts --step register -- --platform-id ...
fails with: `Missing named parameter "engage_mode"`. This affects every
flow that calls the register step — the /add-<channel> skills,
/manage-channels, and the setup auto driver.
Map old → new:
- trigger_rules.pattern (string) → engage_mode='pattern',
engage_pattern=<pattern>
- requiresTrigger=false (no pattern) → engage_mode='pattern',
engage_pattern='.' (the "always" sentinel from migration 010)
- requiresTrigger=true (no pattern) → engage_mode='mention'
- response_scope='all' → sender_scope='all',
ignored_message_policy='drop' (conservative default matching the
migration backfill rule)
Tested by registering three Telegram channels (one DM, two groups) on a
fresh v2 install — all succeeded.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
resolveProviderContribution read only containerConfig.provider (from each
group's container.json) and ignored both agent_groups.agent_provider and
sessions.agent_provider. The provider-install skills (opencode, codex)
and CLAUDE.md document those DB columns as the source of truth with
session-overrides-group precedence, but the code never consulted them —
so setting `agent_provider = 'codex'` on a group had no effect, and the
only way to route to a non-default provider was to edit the per-group
JSON directly. Discovered while wiring up Codex: DB update landed but
the spawned container kept running Claude.
Extract a pure `resolveProviderName(session, group, containerConfig)`
with the documented precedence:
sessions.agent_provider
→ agent_groups.agent_provider
→ container.json `provider`
→ 'claude'
`resolveProviderContribution` now calls it. The container.json fallback
stays so existing installs that only set provider in JSON keep working.
Empty strings treated as unset to avoid footguns when a DB-backed form
writes '' for "no override."
Added unit tests covering precedence, null-fallthrough, empty-string
fallthrough, and case normalization.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`bash nanoclaw.sh` can now offer Signal as a channel choice, scan the
signal-cli link QR in the terminal, and wire up the first agent end to
end — mirroring the WhatsApp and Telegram flows.
Pieces:
- setup/add-signal.sh — non-interactive installer. Fetches
src/channels/signal.ts + signal.test.ts from the channels branch,
appends the self-registration import, installs qrcode (for the
setup-flow QR render), and builds. Idempotent and standalone-runnable.
- setup/signal-auth.ts — step runner. Spawns `signal-cli link --name
NanoClaw`, watches stdout for the `sgnl://linkdevice?…` (or legacy
`tsdevice://`) URL, emits SIGNAL_AUTH_QR with it. On exit 0, runs
`signal-cli -o json listAccounts` and reports the new account via
SIGNAL_AUTH STATUS=success. Pre-check via listAccounts returns
STATUS=skipped if an account is already linked.
- setup/channels/signal.ts — interactive driver. Probes for signal-cli
(offering `brew install signal-cli` on macOS or linking GitHub
releases on Linux if missing), runs add-signal.sh, renders each
SIGNAL_AUTH_QR block as a terminal QR inside a clack spinner,
persists SIGNAL_ACCOUNT to .env + data/env/env, restarts the
service, then wires the first agent via init-first-agent.
- setup/index.ts: register `signal-auth` in the STEPS map.
- setup/auto.ts: add 'signal' to ChannelChoice, import the driver,
add it to the channel picker (after WhatsApp, hint "needs signal-cli
installed"), branch the dispatch, and map channelDmLabel.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signal adapter source (src/channels/signal.ts + signal.test.ts) now
lives on the `channels` branch alongside all other channel adapters,
per the trunk/channels split documented in CLAUDE.md and CONTRIBUTING.md
("Trunk does not ship any specific channel adapter"). The /add-signal
skill fetches the file from origin/channels like every other channel.
This PR to main therefore carries only:
- .claude/skills/add-signal/{SKILL,VERIFY,REMOVE}.md — the skill itself
- scripts/init-first-agent.ts — unrelated infra fix that benefits any
native-ID channel (Signal, WhatsApp) by skipping the channel-prefix
on platform IDs that already have their own format
The fixed adapter source + tests were pushed to the channels branch in
a parallel commit.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
getAskQuestionRender used to hardcode the card title and option labels
for pending_channel_approvals and pending_sender_approvals in the
DB-access layer, duplicating wording that already lived in the approval
modules. That caused a visible drift between the initial card title —
picked per event in channel-approval.ts ("📣 Bot mentioned in new chat"
vs. "💬 New direct message") — and the post-click render, which
always showed the constant "📣 Channel registration".
Mirror the pattern already used by pending_approvals: add title /
options_json columns on both pending_*_approvals tables via migration
013, have the approval modules write them at creation time, and let
getAskQuestionRender just SELECT.
- Migration 013 ALTERs the two tables to add title + options_json.
- PendingChannelApproval / PendingSenderApproval types and their
create functions grow the two fields.
- channel-approval.ts / sender-approval.ts normalize options once
and pass both title and options_json into the insert.
- getAskQuestionRender drops the hardcoded render objects and reads
the stored values.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Correctness fixes:
- parseSignalStyles now uses a recursive walker so nested styles (e.g.
**bold with `code` inside**) produce correct offsets against the final
plain text. Previous impl recorded styles against intermediate text and
didn't reindex when later passes stripped prefix characters.
- *single-asterisk* maps to ITALIC (was BOLD, divergent from standard
Markdown). _underscore_ also maps to ITALIC.
- EchoCache keys on (platformId, text) so an outbound "hi" to Alice no
longer drops a real "hi" inbound from Bob.
- On TCP socket close, flip adapter connected=false and log a warning so
operators see lost daemon connections instead of silently failing sends.
- signalTcpCheck clears its 5s timeout on success so successful checks
don't leak a setTimeout handle.
Config hygiene:
- Rename SIGNAL_HTTP_HOST/PORT to SIGNAL_TCP_HOST/PORT (transport is TCP
JSON-RPC, not HTTP) and add SIGNAL_CLI_PATH for non-PATH installs.
- Remove unused readFileSync import.
- Log a warning in deliver() when outbound files are dropped (native
adapter doesn't forward attachments to signal-cli yet).
Tests:
- Nested style offset correctness
- *italic* and _italic_ ITALIC mapping
- Cross-recipient echo isolation
- Same-recipient echo still suppressed
- isConnected() flips on socket close
- Outbound-files warn-and-drop path
SKILL.md realigned to the add-telegram / add-whatsapp template: fetches
from the `channels` branch (not a `skill/*` branch), lists pre-flight
idempotency checks, adds Features / Troubleshooting sections. Added
VERIFY.md and REMOVE.md siblings.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Native Signal adapter using signal-cli TCP JSON-RPC daemon. No Chat SDK
bridge or npm dependencies — uses only Node.js builtins.
Features:
- DM and group message support
- Voice message detection (placeholder text; transcription via
/add-voice-transcription skill)
- Typing indicators (DMs only)
- Mention detection via text match
- Managed daemon lifecycle (auto-start/stop signal-cli)
- Echo suppression for outbound messages
Also fixes init-first-agent.ts to skip channel-prefixing for phone
numbers (+...) and Signal group IDs (group:...), which are native
platform IDs that adapters send without a channel prefix.
Install via /add-signal skill. Uses /init-first-agent for channel wiring.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The prior instruction told users to append "@openai/codex@${CODEX_VERSION}" to
a single combined `pnpm install -g` block. That block no longer exists on
main — the Dockerfile splits each global CLI (vercel, agent-browser,
claude-code) into its own RUN layer for cache granularity. Update the skill
to add a standalone RUN block for Codex that matches the existing pattern.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mirrors the /add-opencode and /add-ollama-provider pattern. Copies the
add-codex SKILL.md from the providers branch onto trunk so the skill is
discoverable without a manual branch copy.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
createPendingQuestion and createPendingApproval both run before the
adapter delivery call. When delivery fails and the retry loop reinvokes
deliverMessage with the same questionId/approvalId, the second attempt
hit UNIQUE constraint on the pending_questions.question_id (or
pending_approvals.approval_id) and threw — so the retry never reached
the send step, and every subsequent retry failed the same way until
max-attempts marked the message permanently failed.
Switch both inserts to INSERT OR IGNORE. Return bool indicating whether
a new row was actually inserted so delivery.ts can avoid logging
"Pending question created" twice for the same card.
Symptom that surfaced this: a send-layer ValidationError on one attempt
followed by SqliteError on every subsequent attempt, with the user
seeing neither the card nor a follow-up. Seen in conjunction with the
Telegram 64-byte callback_data limit (fixed separately in
#1942/chat-sdk-bridge), but the idempotency gap applies to any
transient delivery failure — rate limits, network blips, adapter 5xx —
and is worth fixing on its own.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ask_question cards failed to deliver on Telegram whenever any option had
a non-trivial value (e.g. an ISO datetime, a URL, or a long token).
Telegram limits inline-keyboard callback_data to 64 bytes, and the
previous encoding embedded both the questionId and the full option
value in each button's actionId plus a second copy as value, producing
payloads well over the cap. The adapter threw ValidationError, delivery
was marked permanently failed, and the agent sat waiting on an answer
that never reached the user.
Fix:
- Button id is now `ncq:<questionId>:<index>` and button value is the
stringified index. Callback payloads shrink from ~100 bytes to ~40
and fit Telegram's cap for any option list with <100 items.
- Both callback-decode sites (Chat SDK `onAction` for Telegram/Slack/
etc., and the Discord Gateway interaction handler) resolve the
index back to the real option value via
`getAskQuestionRender(questionId)` before dispatching to the host's
onAction — so response handlers (pending_questions, pending_approvals)
are unchanged and still receive the canonical value.
- `resolveSelectedOption` helper has a backward-compat fallback:
non-numeric tails are treated as literal values so any card
delivered under the old encoding still resolves if the user clicks
it after deploy.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pre-commit prettier reformatted this in the working tree but didn't
re-stage. Keeping it in a separate commit to avoid amending a prior
commit.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When a container exits with an unresolved processing_ack claim, the
sweep's crashed-container cleanup would reset the matching inbound
message with tries++ and a future process_after. dueCount then dropped
to 0, so the wake step never fired — and the next sweep tick found the
same orphan claim, bumped tries again, and pushed process_after further
out. The message reached MAX_TRIES and was marked failed without any
container ever being spawned.
Two changes:
1. Reorder sweep so the wake step runs before crashed-container
cleanup. A fresh container clears orphan 'processing' rows on its
own startup (container/agent-runner/src/db/connection.ts), so once
we get it running the claim resolves itself.
2. Make resetStuckProcessingRows idempotent: if a message already has
process_after set to a future time, skip the retry bump. The wake
path will pick it up when the backoff elapses. Requires returning
process_after from getMessageForRetry.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
After a container exits, its .heartbeat file is left behind with the
mtime of its last SDK activity. When the same session spawns a new
container, the host sweep's ceiling check reads that stale mtime and
kills the freshly-spawned container within seconds — before the new
instance has had time to touch the file itself.
The sweep already has a carve-out for "no heartbeat file" (treated as a
fresh spawn, given grace), so simply removing the orphan at spawn time
restores the intended semantics.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Align the environment check with the v2 setup flow so existing wired agent groups are detected from data/v2.db instead of the retired v1 store. This prevents setup from reporting no registered groups on valid v2 installs and adds regression coverage for both v2 and pre-migration state.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The initial /add-atomic-chat-tool merge added src edits directly to main.
That conflicts with the utility-skill pattern used elsewhere (e.g. /claw):
the skill folder should ship the file and SKILL.md should instruct copy +
idempotent edits at install time, not a git merge that carries src diffs.
- Move container/agent-runner/src/atomic-chat-mcp-stdio.ts →
.claude/skills/add-atomic-chat-tool/atomic-chat-mcp-stdio.ts
- Revert the atomic_chat mcpServers entry in agent-runner index.ts
- Revert mcp__atomic_chat__* from TOOL_ALLOWLIST in providers/claude.ts
- Revert ATOMIC_CHAT_* env forwarding and [ATOMIC] log elevation in
src/container-runner.ts
- Empty .env.example back out
- Rewrite SKILL.md: copy the shipped file, then apply deterministic Edits
(index.ts, providers/claude.ts, container-runner.ts, .env.example)
with exact before/after snippets the installer agent can match.
Main is now back to its pre-PR state for the tool; /add-atomic-chat-tool
re-applies everything at install time.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Exposes local Atomic Chat models (OpenAI-compatible API at
127.0.0.1:1337/v1) as tools to the container agent. Adds
atomic_chat_list_models and atomic_chat_generate alongside
the existing Ollama skill.
Rebased on current main:
- MCP server registered in agent-runner index.ts using bun (no tsc
step in-image), sibling path to index.ts, env: {} with ATOMIC_CHAT_*
forwarded when set.
- allowedTools entry moved to providers/claude.ts TOOL_ALLOWLIST.
- SKILL.md: drop obsolete per-group copy step (single RO mount
supersedes it); use pnpm build.
Made-with: Cursor
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
v2's Chat SDK Discord adapter emits `platform_id` as
`discord:<guild_id>:<channel_id>` at runtime, but v1 only stored
`dc:<channel_id>` (no guild). Before this fix `migrate-db` wrote
`discord:<channel_id>` into `messaging_groups.platform_id`, which didn't
match what v2 saw on incoming messages — v2 treated every message as a
new channel and fired its channel-registration approval flow instead of
routing to the migrated agent_group.
Now `migrate-db` fetches the bot's guilds once per channel_type via
`GET /users/@me/guilds`. When the bot is in exactly one guild (the
common case), the guild id is spliced into every Discord platform_id at
seed time — matching v2's runtime format. Multi-guild bots fall back to
the v1-format id; v2's channel-registration flow repairs on first
message.
Cost: one extra Discord API call per migration run (not per channel).
No new failure modes — network/auth issues return null, fall through to
the existing behavior.
## Surface
- `v2PlatformId(channelType, jid, { guildId })` — new optional `extra`
parameter. Back-compat with existing callers.
- `fetchBotGuilds(channelType, lookup)` — new helper in `shared.ts`,
same pattern as `autoResolveV2Keys`. Handles Discord today; extending
to other channels is a case-by-case API check.
- `migrate-db` pre-loop: builds `v1EnvMap`, fetches guilds per channel
type, caches single-guild IDs for the row loop.
## Testing
Verified on a 300-channel Discord v1 install:
- Fresh run produced `discord:<guild>:<channel>` platform_ids from the
start
- Incoming messages now route to the migrated agent_group instead of
firing the unwire approval flow
Rate-limit note: `/users/@me/guilds` is a single call. Per-channel
`/guilds/<id>/channels` lookups for multi-guild bots would need proper
rate-limit handling — deferred.
`migrate-channel-auth` now tries to derive v2-required keys that v1 never
stored by calling the channel's API with the credential v1 did have. When
the gap can be closed automatically, the keys land in v2 `.env` before
the missing-required check, and the step reports `success` instead of
`partial`. When it can't, the existing followup fires unchanged.
## Discord
v1 used raw `discord.js` (bot token only). v2's Chat SDK needs
`DISCORD_APPLICATION_ID` + `DISCORD_PUBLIC_KEY`. Both can be fetched with
the bot token via:
GET /oauth2/applications/@me
Authorization: Bot <DISCORD_BOT_TOKEN>
→ { id, verify_key, … }
For a stock v1 Discord user, this means `bash nanoclaw.sh` now produces
a fully working v2 Discord adapter with zero manual key-setting — just
stop v1, and v2 takes over.
## Surface
- `autoResolveV2Keys(channelType, lookup)` in `setup/migrate-v1/shared.ts`
— pluggable per-channel resolver, returns a `{key: value}` map. Never
throws; returns `{}` on any failure (network, auth, unexpected shape).
Logs keys resolved, never values.
- `migrate-channel-auth` wiring: build a lookup over v1 + v2 .env, call
the resolver, append resolved keys to v2 .env (never overwriting), sync
to `data/env/env`, then re-check `requiredV2Keys` to compute the real
gap. Sidecar annotation `(auto-resolved)` on `env_keys_copied` in the
handoff so the skill can tell which came from v1 vs derived.
## Extending to other channels
Slack has `/auth.test` (bot token → team/app info), Telegram has `/getMe`,
Matrix has `/whoami`. Most don't cover the full required-key set v2 needs
(e.g. Slack's `SLACK_SIGNING_SECRET` lives only in app config and has no
API equivalent). Add resolvers case-by-case when the API supports it; the
registry's `requiredV2Keys` + followup fallback covers the rest.
## Testing
- Stripped `DISCORD_APPLICATION_ID` + `DISCORD_PUBLIC_KEY` from v2 `.env`
- Re-ran migration (wired-only, 301 groups): resolver populated both keys
via the API; `migrate-channel-auth: success` (was `partial`);
`overall_status: success`
- Restarted v2: Discord adapter booted clean, Gateway connected,
`GUILD_CREATE` received
- v1 stopped, v2 handling Discord traffic
`bash nanoclaw.sh` detects a v1 install before channel pairing and does a
best-effort automated port of operationally important state. Hands off to
a new `/migrate-from-v1` skill for owner seeding and fork customizations.
Between the timezone and channel steps, `setup/auto.ts` calls
`runMigrateV1()` which orchestrates these registered sub-steps (each a
separate entry in the progression log with its own raw log + status
block — failures never abort the chain):
- **migrate-detect** — scans siblings of the v2 checkout + common $HOME
locations; `$NANOCLAW_V1_PATH` overrides authoritatively. Relaxed
`package.json` check lets forks + partial installs still match; DB
presence is the strongest signal.
- **migrate-validate** — asserts v1 DB shape (tables + required
columns); writes `schema-mismatch.json` on failure. Subsequent steps
short-circuit their DB-dependent parts but still run.
- **migrate-db** — seeds `agent_groups` + `messaging_groups` +
`messaging_group_agents` from v1's `registered_groups`. JID
decomposition (`dc:123` → `channel_type='discord'`,
`platform_id='discord:123'`); `trigger_pattern` + `requires_trigger`
→ `engage_mode` + `engage_pattern` (mirrors migration 010 backfill).
Users + user_roles are NOT seeded — the skill does that with an owner
interview. Idempotent: existing rows reused, not duplicated.
- **migrate-groups** — rsync group folders. v1 `CLAUDE.md` → v2
`CLAUDE.local.md` (v2 composes `CLAUDE.md` at container spawn); v1
`container_config` JSON → `.v1-container-config.json` sidecar for the
skill to translate. Tight v1-pattern scan (`/workspace/ipc/tasks`,
`store/messages.db`, `[PR_CONTEXT:`, etc.) flags files referencing
v1-specific infrastructure — content is NOT modified, just flagged in
the handoff.
- **migrate-env** — merges v1 `.env` into v2 `.env`, never overwriting
existing v2 keys.
- **migrate-channel-auth** — per-channel registry tracks v1 env keys,
v2 required keys (with source-of-key instructions — e.g. Discord
needs `DISCORD_PUBLIC_KEY` which v1 never stored), and candidate
on-disk auth state paths (Baileys keystore, matrix sync state,
etc.). Missing required v2 keys surface as actionable followups and
flip the step to `partial`.
- **migrate-channels** — runs `setup/install-<channel>.sh` for each
detected channel in non-interactive mode. Install-script output is
captured to `logs/setup-migration/install-<channel>.log` sidecars
(silent under the parent spinner). Channels with no v2 adapter get
a `not_supported` followup but don't degrade status.
- **migrate-tasks** — v1 `scheduled_tasks` → `messages_in` rows with
`kind='task'` in each session's `inbound.db`. `schedule_type`
mapping (cron / interval / once → v2 cron). Idempotent: skips v1
task ids already present. Inactive rows dumped to
`inactive-tasks.json` for reference.
Everything writes to `logs/setup-migration/handoff.json` — the source
of truth the skill consumes.
`.claude/skills/migrate-from-v1/SKILL.md`:
- **Phase A** (always): owner seeding + v1 access policy flip
(`unknown_sender_policy` public/strict) via `AskUserQuestion`. Pulls
sender candidates from v1's `messages` table as hints.
- **Phase B** (if followups exist): walks
`handoff.followups` — translates `.v1-container-config.json`
sidecars, handles `not_supported` channels, fills in missing
required keys with instructions on where to get them.
- **Phase C** (fork-aware): `git log <upstream>..HEAD` in v1. Empty →
"no customizations to port." Non-empty → scope choice (mechanical /
full interview / reference-only). Portable categories
(`container/skills/*`, `.claude/skills/*`, docs) scan+copy with
`scanForV1Patterns`. Non-portable (`src/*`,
`container/agent-runner/src/*`) stash to `docs/v1-fork-reference/`
— explicit "don't translate v1 infra to v2" warning because v1's
IPC file queue / single DB don't exist in v2.
Clearly marked in README, CLAUDE.md, SKILL.md header, and via a `p.warn`
that fires once per run when v1 is detected. Users with no v1 install
see a silent skip — no prompts, no noise.
Verified end-to-end against a live v1 install (300 discord + 1
discord-supervisor groups, fork with ~15 commits of PR-factory work):
- Detect → validate → db (301 rows seeded) → groups (301 CLAUDE.local.md
+ 178 other files + 1 container_config sidecar) → env (4 keys copied)
→ channel-auth (flagged missing `DISCORD_APPLICATION_ID` +
`DISCORD_PUBLIC_KEY`) → channels (discord installed, discord-supervisor
→ not_supported) → tasks (0 rows, skipped)
- Idempotent re-run: 0 rows created, 903 rows reused; tasks skip if
id already present
- Fresh-user case: silent skip, no prompts, straight to "You're ready!"
- Schema-mismatch case: recorded to `schema-mismatch.json`, chain
continues
- Unit tests for the pure transforms (`parseJid`,
`inferChannelType`, `triggerToEngage`, `scanForV1Patterns`,
`looksLikeV1Install`)
- Validate `requiredV2Keys` for telegram/slack/matrix/teams/webex/
resend/linear against the actual Chat SDK packages (Discord was
verified from real error output)
- Widen candidate auth file paths for WhatsApp/Matrix/iMessage based
on real non-Discord v1 installs once we have some
See docs/v1-to-v2-changes.md for the v1 → v2 architecture diff.
getAskQuestionRender only checked pending_questions and
pending_approvals, missing the channel and sender approval tables.
Approval button clicks showed the raw value ("approve") instead of
the selectedLabel ("✅ Wired"). Extend the lookup to also check
pending_channel_approvals and pending_sender_approvals.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Channel and sender approval cards showed raw platform IDs
(e.g. discord:1475578393738219540:...) instead of readable context.
Extract sender name from the event content for channel approvals,
and use the channel type name for sender approvals.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The router hardcoded is_group=0 when auto-creating messaging groups,
causing channel mentions to be misclassified as DMs. The Chat SDK
bridge knows which handler fired (onDirectMessage vs onNewMention)
so thread the signal through InboundMessage → InboundEvent → router.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Discord puts the clicking user at interaction.member.user for guild
interactions but interaction.user for DM interactions. The Gateway
handler only checked interaction.member, so DM button clicks resolved
to an empty user ID and were silently rejected as unauthorized.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Slack: interactive driver walks through app creation, validates the
bot token via auth.test, installs the adapter, and prints a
post-install checklist for the webhook URL + Event Subscriptions
config. No welcome DM since Slack needs a public URL before inbound
events work — the driver's own "finish in Slack" note replaces the
outro "check your DMs" banner.
iMessage: picks local (macOS) vs remote (Photon) mode. Local mode
opens the node binary's directory in Finder so the user can drag it
into Full Disk Access. Remote mode prompts for Photon URL + API key.
Asks for the operator's phone/email, then wires the first agent
including a welcome iMessage.
Both marked "(experimental)" in the askChannelChoice picker.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two installs on the same host could trash each other's containers: the
reaper used `docker ps --filter name=nanoclaw-`, a substring match that
picked up every install's containers. A crash-looping peer (e.g. a legacy
v1 plist respawning ~6k times) would call cleanupOrphans on every boot and
kill the healthy install's session containers within seconds of spawn.
- Stamp `--label nanoclaw-install=<slug>` onto every spawned container.
- cleanupOrphans filters by that label; healthy peers are left alone.
- Setup preflight enumerates `com.nanoclaw*` launchd plists / nanoclaw
user systemd units, probes state/runs, and unloads any that are
crash-looping (state != running AND runs > 10) before installing
this install's service.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three UX tweaks after watching a user walk through setup:
1. Claude-assist "Run this command?" now defaults to Yes. After Claude has already been asked to diagnose + explained the fix, the vast majority of users want to run it — the No-default added friction without proportional safety.
2. claude-assist persists its session across failures in one setup run. First invocation captures session_id from the stream-json init event; subsequent invocations pass --resume <id>. Claude sees prior failures as conversation history instead of treating each hiccup as a blank-slate ticket.
3. First-chat flow no longer drops the user into a free-text chat loop by default. Instead: explain what the ping/pong check is doing, wait for the pong, then offer "Continue with setup" (recommended, default) or "Pause here and chat with your agent from the terminal" (opt-in). The free-text loop is still reachable, just not the default path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The upstream onecli.sh/cli/install script resolves the latest release via
api.github.com/repos/onecli/onecli-cli/releases/latest — anonymous callers
get throttled to 60 req/hour per IP, and once exhausted the installer dies
with "curl: (56) 403 / Error: could not determine latest release". Shared
IPs (corporate NAT, public Wi-Fi) hit this without ever running the
installer themselves. Reproduced locally: rate_limit remaining=0 → upstream
installer returns the exact user error.
Fallback path when upstream fails:
1. Resolve version via `curl -fsSL -o /dev/null -w '%{url_effective}' \
https://github.com/onecli/onecli-cli/releases/latest`. That endpoint
302s to /tag/vX.Y.Z — parses the version without an API call.
2. If the redirect probe also fails, install a pinned fallback version
(ONECLI_CLI_FALLBACK_VERSION, currently 1.3.0).
3. Download the archive from /releases/download/vX.Y.Z/… directly (the
CDN path isn't API-throttled), extract, and install to /usr/local/bin
or ~/.local/bin mirroring upstream's install-dir logic.
Gateway install (onecli.sh/install, docker-compose based) is untouched —
it doesn't hit the API.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Dimmed explanatory prose blocks were hard to read against dark terminals. Shift the weight ladder up a notch:
- dimWrap() no longer dims. Multi-line prose (the step-intro copy, etc.) renders at the terminal's regular weight.
- Spinner outcome labels (done/failed/skipped) are now bold via runUnderSpinner, so each step's headline reads stronger than the body copy around it.
- Un-dim two command-hint blocks in auto.ts (docker-group setfacl + service restart; the socket-error remediation commands) — those are commands the user may need to type.
Dim is still used where it helps — (Ns) spinner timings, URLs, short inline parentheticals — and for the preview/debug blocks dim is explicitly reserved for: dumpTranscriptOnFailure tail and claude-assist streams.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When corepack enable fails with EACCES (common when Node is installed to a system-writable prefix like /usr/local that the user doesn't own), we fall back to `npm install -g pnpm`. But npm's global prefix isn't always on the shell's PATH — users often set `npm config set prefix ~/.npm-global` to avoid sudo, and the resulting bin dir isn't picked up by `command -v`. Install succeeded, but pnpm "wasn't there" for the follow-up `pnpm install`.
Now after the npm fallback we query `npm config get prefix` and prepend `<prefix>/bin` to PATH. Mirror the same lookup in nanoclaw.sh right before `exec pnpm run setup:auto` — setup.sh's PATH mutation doesn't propagate back, and the hand-off needs pnpm visible too.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Container step: duration hint + 3-line rolling output window with
60s stall detector that offers "keep waiting" vs "ask Claude"
- First chat: reframed as a try-out with sandbox-model explainer
(wakes on message, sleeps when idle, context persists)
- Timezone: auto-detected non-UTC zones now get an explicit
confirm from the user instead of silent persist
- Outro: added always-on warning + prominent "check your DM" banner
when a channel was configured; directive last line
- Discord: always show token-location reminder even when user says
they have one; new "do you have a server?" branch walks through
server creation if not
- All select prompts: custom brightSelect renderer keeps inactive
option labels at full brightness (was dim gray); adds @clack/core
as a direct dep
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two NanoClaw installs on the same host used to fight over the shared `com.nanoclaw` launchd label / `nanoclaw.service` systemd unit and the `nanoclaw-agent:latest` docker tag — the second install silently rewrote the service pointer and rebuilt the image out from under the first. Introduces a deterministic per-checkout slug (sha1(projectRoot)[:8]) and namespaces everything off it:
- Service: `com.nanoclaw-v2-<slug>` / `nanoclaw-v2-<slug>.service`
- Image: `nanoclaw-agent-v2-<slug>:latest` (base), `nanoclaw-agent-v2-<slug>:<agentGroupId>` (per-group)
New shared helpers: src/install-slug.ts (host) + setup/lib/install-slug.sh (bash). Both compute the same slug so verify/probe/add-*.sh/build.sh/container-runner all agree. Any v1 `com.nanoclaw` service left on the host stays untouched and can coexist.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Before: setup/onecli.ts ran `curl -fsSL onecli.sh/install | sh` unconditionally. For users with OneCLI already running and bound to a specific listener (host-accessible, shared with other apps), re-running the installer rebound the gateway and broke those consumers.
Now: auto.ts probes for an existing install (`onecli version` + `onecli config get api-host`). If detected, clack asks: use the existing instance (recommended) or install a fresh one. The new --reuse flag in the onecli step skips the installer, reads the configured api-host, writes ONECLI_URL to .env, and moves on without touching the running gateway.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Forks that keep the upstream nanoclaw repo under a non-origin remote name (typically `upstream`, with `origin` pointing at the user's fork) hit "git fetch origin channels failed" when adding a channel, because the fork doesn't carry the channels branch. New setup/lib/channels-remote.sh walks `git remote -v` for a url matching qwibitai/nanoclaw, auto-adds `upstream` if none matches, and honors NANOCLAW_CHANNELS_REMOTE as an override. Wired into the four add-*.sh scripts that setup:auto invokes (discord, telegram, whatsapp, teams).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Some Node installs (older nvm, node@22 keg-only on brew, minimal distro packages) don't ship corepack, so the bootstrap was dying with "corepack: command not found" before pnpm could land on PATH. Now guards the corepack call and falls back to `npm install -g pnpm@<pinned>`, reading the version from package.json's packageManager field.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Deletes the Claude-orchestrated /setup and /new-setup flows. The scripted installer (bash nanoclaw.sh → setup:auto) now handles bootstrap, container, OneCLI, auth, service, first agent, and optional channel wiring end-to-end with inline Claude-assisted recovery on failure. Keeps /setup as a one-line redirect so the trigger still resolves. Drops the opt-out diagnostics files that belonged to the old flow and updates cross-refs in add-wechat, migrate-nanoclaw, and update-nanoclaw.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Ground-up v2 rewrite supersedes all conflicting files from main. The one main-side fix (ONECLI_API_KEY forwarding, 8b5b581) already landed on v2 as 3db66c0. README preview banner dropped since v2 is now main.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Trunk ships no channel adapters — /add-telegram installs the package on demand from the channels branch. This dependency was stale and pulled ~2 transitive packages into every fresh install.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Removes transient analysis/proposal/checklist docs whose purpose is served once v2 ships: REFACTOR.md, docs/v1-vs-v2/, docs/checklist.md, docs/shared-source.md, docs/claude-md-composition.md, docs/module-contract.md, docs/DEBUG_CHECKLIST.md. Updates CLAUDE.md and docs/README.md index rows accordingly.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Shared bash + node emitter in setup/lib/diagnostics.{sh,ts} reads/writes data/install-id so every event from a single install shares one distinct_id — bash-side setup_launched/setup_start, node-side auto_started, per-step started/completed, auth_method_chosen, channel_chosen, first_chat_ready/failed, setup_incomplete, setup_aborted, setup_completed. Opt-out via NANOCLAW_NO_DIAGNOSTICS=1.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a structured drip-feed of capabilities (memory, agents, scheduling, research, code, UI, files, self-customization) and explicit sections on approvals, access control, and natural interaction.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
signal-sdk launches signal-cli without --config, so attachments land at
~/.local/share/signal-cli/attachments/ (XDG default) rather than
data/signal/. Document this in the Channel Info section and add a
troubleshooting entry explaining the symptom (voice messages silently
skipped, no transcript), how to confirm (ps aux | grep signal-cli), and
the automatic fallback the adapter uses.
Adds the /add-signal-v2 skill: a native Signal channel adapter wrapping
signal-sdk (signal-cli under the hood). No bot API — NanoClaw registers
as a full Signal account on a dedicated number or as a linked device.
Features: text, group & DM routing, voice transcription via whisper.cpp,
attachments, emoji reactions, @mention detection, quote-reply detection.
Troubleshooting note updated: GroupV2 group ID lives at
envelope.dataMessage.groupV2.id — not groupInfo.groupId (GroupV1/legacy).
Adds a callout for the Chat SDK + approval dialogs preview with a
collapsible containing the fork/checkout instructions.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
description: Add Atomic Chat MCP server so the container agent can call local models served by the Atomic Chat desktop app via its OpenAI-compatible API.
---
# Add Atomic Chat Integration
This skill adds a stdio-based MCP server that exposes models running in the local [Atomic Chat](https://github.com/AtomicBot-ai/Atomic-Chat) desktop app as tools for the container agent. Claude remains the orchestrator but can offload work to local models served by Atomic Chat on `http://127.0.0.1:1337/v1` (OpenAI-compatible).
Tools exposed:
-`atomic_chat_list_models` — list models currently available in Atomic Chat (`GET /v1/models`)
-`atomic_chat_generate` — send a prompt to a specified model and return the response (`POST /v1/chat/completions`)
Model management (download, delete) is done through the **Atomic Chat desktop UI** — the app is a fork of Jan and manages its own model library.
The skill ships the MCP server source in this folder and copies it into the agent-runner tree at install time, then wires it up with small edits to `index.ts`, `providers/claude.ts`, and `container-runner.ts`. No branch merge — all edits are additive and idempotent.
## Phase 1: Pre-flight
### Check if already applied
Check if `container/agent-runner/src/atomic-chat-mcp-stdio.ts` exists. If it does, skip to Phase 3 (Configure).
### Check prerequisites
Verify Atomic Chat is installed and its local API server is running. On the host:
```bash
curl -s http://127.0.0.1:1337/v1/models | head
```
If the request fails:
1. Install Atomic Chat from the [latest release](https://github.com/AtomicBot-ai/Atomic-Chat/releases) (macOS only for now — `atomic-chat.dmg`).
2. Open the app.
3. Open **Settings → Local API Server** and make sure it's enabled on port `1337`.
4. Go to the **Hub** (or **Models**) tab and download at least one model (e.g. Llama 3.2 3B, Qwen 2.5 Coder 7B).
5. Load the model once by sending any message in Atomic Chat's UI to warm it up.
Edit `container/agent-runner/src/providers/claude.ts`. Find `'mcp__nanoclaw__*',` in the `TOOL_ALLOWLIST` array and add `'mcp__atomic_chat__*',` on the following line:
```ts
'mcp__nanoclaw__*',
'mcp__atomic_chat__*',
];
```
### Forward host env vars into the container
Edit `src/container-runner.ts` in `buildContainerArgs`. Find the `TZ` env line:
```ts
args.push('-e',`TZ=${TIMEZONE}`);
```
Add ATOMIC_CHAT forwarding right after it:
```ts
args.push('-e',`TZ=${TIMEZONE}`);
// Atomic Chat MCP tool: forward host overrides if set (default is host.docker.internal:1337).
By default, the MCP server connects to `http://host.docker.internal:1337` (Docker Desktop) with a fallback to `localhost`. To use a custom host, add to `.env`:
Atomic Chat does **not require authentication** when running locally — leave this unset. Only set it if you've put Atomic Chat behind a reverse proxy that enforces auth:
> Send a message like: "use atomic chat to tell me the capital of France"
>
> The agent should use `atomic_chat_list_models` to find available models, then `atomic_chat_generate` to get a response.
### Check logs if needed
```bash
tail -f logs/nanoclaw.log | grep -i atomic
```
Look for:
-`[ATOMIC] Listing models...` — list request started
-`[ATOMIC] Found N models` — models discovered
-`[ATOMIC] >>> Generating with <model>` — generation started
-`[ATOMIC] <<< Done: <model> | Xs | N tokens | M chars` — generation completed
## Troubleshooting
### Agent says "Atomic Chat is not installed" or tries to run a CLI
The agent is looking for a CLI that doesn't exist instead of using the MCP tools. This means:
1. The MCP server wasn't copied — check `container/agent-runner/src/atomic-chat-mcp-stdio.ts` exists
2. The MCP server wasn't registered — check `container/agent-runner/src/index.ts` has the `atomic_chat` entry in `mcpServers`
3. The allowlist wasn't updated — check `container/agent-runner/src/providers/claude.ts` includes `mcp__atomic_chat__*` in `TOOL_ALLOWLIST`
4. The container wasn't rebuilt — run `./container/build.sh`
### "Failed to connect to Atomic Chat"
1. Verify the host API is reachable: `curl http://127.0.0.1:1337/v1/models`
2. Confirm the Local API Server is enabled in Atomic Chat's settings
3. Check Docker can reach the host: `docker run --rm curlimages/curl curl -s http://host.docker.internal:1337/v1/models`
4. If using a custom host, check `ATOMIC_CHAT_HOST` in `.env`
### `model not found` / 404 on generate
The model ID passed to `atomic_chat_generate` must exactly match one of the IDs returned by `atomic_chat_list_models`. Ask the agent to list models first, then pick one from that list.
### Slow first response
Atomic Chat lazy-loads models into memory on first use. The initial call may take longer while the model warms up. Subsequent calls against the same model are fast.
### Agent doesn't use Atomic Chat tools
The agent may not know about the tools. Try being explicit: "use the atomic_chat_generate tool with llama3.2-3b-instruct to answer: ..."
### Context window or output size issues
Atomic Chat respects each model's native context length. If you hit limits, pass `max_tokens` explicitly when calling `atomic_chat_generate`, or switch to a model with a larger context window in the Atomic Chat UI.
text:`Failed to connect to Atomic Chat at ${ATOMIC_CHAT_HOST}: ${errinstanceofError?err.message : String(err)}`,
},
],
isError: true,
};
}
},
);
server.tool(
'atomic_chat_generate',
'Send a prompt to a local Atomic Chat model and get a response. Good for cheaper/faster tasks like summarization, translation, or general queries. Use atomic_chat_list_models first to see available models.',
{
model: z
.string()
.describe(
'The model ID as returned by atomic_chat_list_models (e.g. "llama3.2-3b-instruct")',
),
prompt: z.string().describe('The prompt to send to the model'),
system: z
.string()
.optional()
.describe('Optional system prompt to set model behavior'),
temperature: z
.number()
.optional()
.describe('Sampling temperature (0.0–2.0). Defaults to model default.'),
max_tokens: z
.number()
.optional()
.describe('Maximum number of tokens to generate in the response.'),
},
async(args)=>{
log(`>>> Generating with ${args.model} (${args.prompt.length} chars)...`);
writeStatus('generating',`Generating with ${args.model}`);
description: Use Codex (CLI + AppServer) as the full agent provider — planning, tool orchestration, native compaction, MCP tools, session resume — in place of the Claude Agent SDK. ChatGPT subscription or OPENAI_API_KEY. Per-group via agent_provider. Distinct from using OpenAI as an MCP tool (where Claude remains the planner).
---
# Codex agent provider
NanoClaw runs agents in a long-lived **poll loop** inside the container. The backend is selected with **`AGENT_PROVIDER`** (`claude` | `opencode` | `codex` | `mock`).
Trunk ships with only the `claude` provider baked in. This skill copies the Codex provider files in from the `providers` branch, wires them into the host and container barrels, updates the Dockerfile to install the Codex CLI, and rebuilds the image.
The Codex provider runs `codex app-server` as a child process and speaks JSON-RPC over stdio. That gives it native session resume, streaming events, MCP tool access, and `thread/compact/start` compaction — same feature bar as the Claude Agent SDK, without the Anthropic-only lock-in.
## Install
### Pre-flight
If all of the following are already present, skip to **Configuration**:
-`import './codex.js';` line in `src/providers/index.ts`
-`import './codex.js';` line in `container/agent-runner/src/providers/index.ts`
-`ARG CODEX_VERSION` and `"@openai/codex@${CODEX_VERSION}"` in the pnpm global-install block in `container/Dockerfile`
Missing pieces — continue below. All steps are idempotent; re-running is safe.
### 1. Fetch the providers branch
```bash
git fetch origin providers
```
### 2. Copy the Codex source files
Wholesale copies (owned entirely by this skill — user edits to these files won't survive a re-run, as designed):
```bash
git show origin/providers:src/providers/codex.ts > src/providers/codex.ts
git show origin/providers:container/agent-runner/src/providers/codex.ts > container/agent-runner/src/providers/codex.ts
git show origin/providers:container/agent-runner/src/providers/codex-app-server.ts > container/agent-runner/src/providers/codex-app-server.ts
git show origin/providers:container/agent-runner/src/providers/codex.factory.test.ts > container/agent-runner/src/providers/codex.factory.test.ts
```
### 3. Append the self-registration imports
Each barrel gets one line — alphabetical placement keeps diffs small.
`src/providers/index.ts`:
```typescript
import'./codex.js';
```
`container/agent-runner/src/providers/index.ts`:
```typescript
import'./codex.js';
```
### 4. Add the Codex CLI to the container Dockerfile
Two edits to `container/Dockerfile`, both idempotent (skip if already present):
**(a)** In the "Pin CLI versions" ARG block (around line 18), add after `ARG CLAUDE_CODE_VERSION=...`:
```dockerfile
ARGCODEX_VERSION=0.124.0
```
**(b)** Add a new standalone `RUN` block for the Codex CLI, after the existing per-CLI install blocks (around line 106, right after the `@anthropic-ai/claude-code` block). The Dockerfile splits each global CLI into its own layer for cache granularity — keep that pattern; do not collapse them into a single combined `pnpm install -g` call:
```dockerfile
RUN --mount=type=cache,target=/root/.cache/pnpm \
pnpm install -g "@openai/codex@${CODEX_VERSION}"
```
Note: **no agent-runner package dependency** — Codex is a CLI binary, not a library. Unlike OpenCode, there's nothing to add to `container/agent-runner/package.json`.
Codex supports two primary auth paths and one experimental BYO-endpoint path. Pick the one that matches your setup.
### Option A — ChatGPT subscription (recommended for individuals)
On the host (not inside the container), run Codex's OAuth login:
```bash
codex login
```
This writes `~/.codex/auth.json` with a subscription token. The host-side Codex provider ([src/providers/codex.ts](../../../src/providers/codex.ts)) copies `auth.json` into a per-session `~/.codex` directory mounted into the container — your host's own Codex CLI is never touched.
No `.env` variables required for this mode.
### Option B — API key (recommended for CI or API billing)
```env
OPENAI_API_KEY=sk-...
CODEX_MODEL=gpt-5.4-mini
```
The host forwards both variables into the container. If both subscription (`auth.json`) and `OPENAI_API_KEY` are present, Codex prefers the subscription.
### Option C — BYO OpenAI-compatible endpoint (experimental)
Codex's built-in `openai` provider honors the `OPENAI_BASE_URL` env var directly. Point it at any OpenAI-compatible endpoint — Groq, Together, self-hosted vLLM, an OpenAI proxy, etc.
```env
OPENAI_API_KEY=...
OPENAI_BASE_URL=https://api.groq.com/openai/v1
CODEX_MODEL=llama-3.3-70b-versatile
```
Codex also ships first-class local-runner flags — `codex --oss --local-provider ollama` or `--local-provider lmstudio` — that auto-detect a local server. To use those inside NanoClaw, set `CODEX_MODEL` to a model your local runner serves and add the corresponding base URL; see the Codex CLI docs for the full `model_provider = oss` configuration.
**Experimental caveat:** tool-calling quality depends on the model and endpoint. Not every OpenAI-compat provider implements the full function-calling spec, and smaller models (< 30B) often struggle with multi-step tool orchestration. Test before committing.
### Per group / per session
Set `"provider": "codex"` in the group's **`container.json`** (`groups/<folder>/container.json`) — the in-container runner reads `provider` from there, not from the DB. The DB columns **`agent_groups.agent_provider`** and **`sessions.agent_provider`** (session overrides group) only drive host-side provider contribution — per-session `~/.codex` mount, `OPENAI_*` / `CODEX_MODEL` env passthrough — and do not propagate into `container.json` at spawn time. Set both, or just edit `container.json`; if they disagree, the runner uses `container.json` and the host-side resolver falls back through session → group → `container.json` → `'claude'`.
`CODEX_MODEL` applies process-wide via `.env`; if you need different models for different groups, set them via `container_config.env` on the group.
Extra MCP servers still come from **`NANOCLAW_MCP_SERVERS`** / `container_config.mcpServers` on the host. The runner merges them into the same `mcpServers` object passed to all providers.
## Operational notes
- **Spawn-per-query:** Codex's app-server is spawned fresh per query invocation, matching the OpenCode pattern. No long-lived daemon to keep healthy across sessions.
- **Per-session `~/.codex` isolation:** each group gets its own copy of the host's `auth.json`. The container can rewrite `config.toml` freely on every wake without touching the host's Codex config.
- **Native compaction:** kicks in automatically at 40K cumulative input tokens between turns, via `thread/compact/start`. If compaction fails, the provider logs and continues uncompacted — no fatal error.
- **Approvals:** auto-accepted inside the container (the container is the sandbox; same posture as Claude/OpenCode).
- **Mid-turn input:** Codex turns don't accept mid-turn messages. Follow-up `push()` calls queue and drain between turns, matching the OpenCode pattern. The poll-loop only pushes between turns anyway, so no messages are dropped.
- **Stale thread recovery:** `isSessionInvalid` matches on stale-thread-ID errors (`thread not found`, `unknown thread`, etc.) so a cold-started app-server can recover cleanly when it sees a stored continuation it no longer has.
cd container/agent-runner && bun test src/providers/codex.factory.test.ts &&cd -
```
After the image rebuild, set `agent_provider = 'codex'` on a test group and send a message. Successful round-trip looks like:
-`init` event with a stable thread ID as continuation
- One or more `activity` / `progress` events during the turn
-`result` event with the model's reply
If the agent hangs or errors, check `~/.codex/auth.json` exists on the host (Option A) or that `OPENAI_API_KEY` is forwarding correctly (Option B) — `docker exec` into a running container and `env | grep -i openai` to confirm.
To fully remove all account data including DeltaChat encryption keys:
```bash
rm -rf dc-account/
```
> **Warning:** This deletes the Autocrypt keys. Contacts who have verified your bot's key will need to re-verify if the same email address is re-used with a new account.
To keep the account for later reinstall, leave `dc-account/` intact.
## 5. Remove the package (optional)
```bash
pnpm remove @deltachat/stdio-rpc-server
```
## Verification
After removal, confirm the adapter is no longer starting:
```bash
grep "deltachat" logs/nanoclaw.log | tail -5
```
Expected: no `Channel adapter started` entry after the last restart.
description: Add DeltaChat channel integration via @deltachat/stdio-rpc-server. Native adapter — no Chat SDK bridge. Email-based messaging with end-to-end encryption.
---
# Add DeltaChat Channel
The adapter drives the `@deltachat/stdio-rpc-server` JSON-RPC subprocess directly — pure Node.js against the DeltaChat core library. Messages are delivered over email with Autocrypt/OpenPGP encryption.
## Install
### Pre-flight (idempotent)
Skip to **Credentials** if all of these are already in place:
-`@deltachat/stdio-rpc-server` is listed in `package.json` dependencies
Otherwise continue. Every step below is safe to re-run.
### 1. Fetch the channels branch
```bash
git fetch origin channels
```
### 2. Copy the adapter
```bash
git show origin/channels:src/channels/deltachat.ts > src/channels/deltachat.ts
```
### 3. Append the self-registration import
Append to `src/channels/index.ts` (skip if already present):
```typescript
import'./deltachat.js';
```
### 4. Install the adapter package (pinned)
```bash
pnpm install @deltachat/stdio-rpc-server@2.49.0
```
### 5. Build
```bash
pnpm run build
```
## Account Setup
A dedicated email account is strongly recommended — it will accumulate DeltaChat-formatted messages and store encryption keys. Not all providers work well with DeltaChat; check https://providers.delta.chat/ before picking one.
**Default security modes:** IMAP uses SSL/TLS (port 993), SMTP uses STARTTLS (port 587). Both are configurable via `.env` — see Credentials below.
Security settings are applied on every startup, so changing them in `.env` and restarting takes effect without wiping the account.
Sync to container: `mkdir -p data/env && cp .env data/env/env`
### Optional settings
The following are read from the process environment (not `.env`). To override them, add `Environment=` lines to the systemd service unit or your launchd plist:
| Variable | Default | Description |
|----------|---------|-------------|
| `DC_ACCOUNT_DIR` | `dc-account` | Directory for DeltaChat account data (IMAP state, keys, blobs) |
| `DC_DISPLAY_NAME` | `NanoClaw` | Bot display name shown in DeltaChat |
| `DC_AVATAR_PATH` | _(none)_ | Absolute path to avatar image; set at startup only |
The `/set-avatar` command (send an image with that caption) is the easiest way to set the avatar at runtime without modifying the service file. Only users with `owner` or global `admin` role can use it.
On first start the adapter configures the email account (IMAP/SMTP credentials, calls `configure()`). Subsequent starts skip straight to `startIo()`. Account data is stored in `dc-account/` in the project root (or your `DC_ACCOUNT_DIR`).
## Wiring
### DMs
**DeltaChat contacts cannot be added by email alone** — to start a chat, the user must open the bot's invite link in their DeltaChat app or scan its QR code. This triggers the SecureJoin handshake.
#### Step 1 — Get the invite link
After the service starts, the adapter logs the invite URL and writes a QR SVG:
```bash
grep "invite link" logs/nanoclaw.log | tail -1
# url field contains the https://i.delta.chat/... invite link
# also written to dc-account/invite-qr.svg (or $DC_ACCOUNT_DIR/invite-qr.svg)
```
The invite URL is stable (tied to the bot's email and encryption keys) so it stays valid across restarts.
#### Step 2 — Add the bot in DeltaChat
Two options for the user to connect:
- **Link**: Copy the `https://i.delta.chat/...` URL and open it on the device running DeltaChat. The app recognises it and shows a "Start chat" prompt.
- **QR code**: Open `dc-account/invite-qr.svg` in a browser or image viewer, display it on screen, and scan it from the DeltaChat app using the QR-scan button on the new-chat screen.
After accepting, DeltaChat exchanges keys and creates the chat automatically.
#### Step 3 — Wire the chat to an agent
Once the first message arrives the router auto-creates a `messaging_groups` row. Look up the chat ID:
```bash
pnpm exec tsx scripts/q.ts data/v2.db \
"SELECT platform_id, name FROM messaging_groups WHERE channel_type='deltachat' AND is_group=0 ORDER BY created_at DESC LIMIT 5"
```
Then run `/init-first-agent` — it creates the agent group, grants the user owner access, and wires the messaging group in one step:
```bash
pnpm exec tsx scripts/init-first-agent.ts \
--channel deltachat \
--user-id deltachat:user@example.com \
--platform-id <platform_id from above> \
--display-name "Your Name"
```
### Groups
Add the bot email to a DeltaChat group. When any member sends a message, the router creates a `messaging_groups` row with `is_group = 1`. Run `/manage-channels` to wire it to an agent group.
## Next Steps
If you're in the middle of `/setup`, return to the setup flow now.
Otherwise, run `/init-first-agent` to create an agent and wire it to your DeltaChat DM (see Wiring above), or `/manage-channels` to wire this channel to an existing agent group.
## Channel Info
- **type**: `deltachat`
- **terminology**: DeltaChat calls them "chats" (1:1 DMs) and "groups"
- **supports-threads**: no — DeltaChat has no thread model
- **platform-id-format**: numeric chat ID as a string (e.g. `"12"`) — the DeltaChat core's internal chat identifier
- **user-id-format**: `deltachat:{email}` — the contact's email address
- **how-to-find-id**: Send a message from DeltaChat to the bot email, then query `messaging_groups` as shown above
- **typical-use**: Personal assistant over DeltaChat DMs; small groups where participants use DeltaChat
- **default-isolation**: One agent per bot identity. Multiple chats with the same operator can share an agent group; groups with other people should typically use `isolated` session mode
### Features
- File attachments — inbound and outbound; inbound waits up to 30 seconds for large-message download to complete
- Invite link logged on every startup — URL + QR SVG written to `dc-account/invite-qr.svg`; see Wiring for the bootstrap flow
-`/set-avatar` — send an image with this caption to change the bot's DeltaChat avatar (admin/owner only)
- Connectivity watchdog — restarts IO if IMAP goes quiet for 20 minutes or connectivity drops below threshold for two consecutive 5-minute checks
- Network nudge — `maybeNetwork()` called every 10 minutes to recover from prolonged idle
Not supported: DeltaChat reactions, message editing/deletion, read receipts.
### Connectivity model
`isConnected()` returns `true` when the internal connectivity value is ≥ 3000:
- App password not generated — Gmail and some others require this when 2FA is enabled
- Port/security mismatch — defaults are port 993 + SSL/TLS for IMAP and port 587 + STARTTLS for SMTP; override with `DC_IMAP_PORT`/`DC_IMAP_SECURITY` or `DC_SMTP_PORT`/`DC_SMTP_SECURITY` in `.env`
### Provider uses SMTP port 465 (SSL/TLS) instead of 587
Set `DC_SMTP_SECURITY=1` and `DC_SMTP_PORT=465` in `.env`, then restart.
### Messages not arriving
1. Check the service is running and the adapter started: `grep "Channel adapter started.*deltachat" logs/nanoclaw.log`
3. Check the sender has been granted access — run `/init-first-agent` to create their user record and wire the chat
4. Verify the messaging group is wired: `pnpm exec tsx scripts/q.ts data/v2.db "SELECT mg.platform_id, mga.agent_group_id FROM messaging_groups mg JOIN messaging_group_agents mga ON mg.id = mga.messaging_group_id WHERE mg.channel_type='deltachat'"`
"SELECT id, platform_id, name FROM messaging_groups WHERE channel_type='deltachat' ORDER BY created_at DESC LIMIT 5"
```
If a row appears, the inbound routing is working. If not, the adapter isn't receiving the message — check logs for `DeltaChat: error handling incoming message`.
## 5. Verify user access
If the message arrived but the agent didn't respond, the sender may not have access:
```bash
pnpm exec tsx scripts/q.ts data/v2.db "SELECT id, display_name FROM users WHERE id LIKE 'deltachat:%'"
```
Grant access as shown in the SKILL.md "Grant user access" section.
1. NanoClaw running: `launchctl list | grep nanoclaw` (macOS) / `systemctl --user status nanoclaw` (Linux)
2. Messaging group wired: `sqlite3 data/v2.db "SELECT mg.platform_id, ag.folder FROM messaging_groups mg JOIN messaging_group_agents mga ON mg.id = mga.messaging_group_id JOIN agent_groups ag ON ag.id = mga.agent_group_id WHERE mg.channel_type = 'emacs'"`
2. Messaging group wired: `pnpm exec tsx scripts/q.ts data/v2.db "SELECT mg.platform_id, ag.folder FROM messaging_groups mg JOIN messaging_group_agents mga ON mg.id = mga.messaging_group_id JOIN agent_groups ag ON ag.id = mga.agent_group_id WHERE mg.channel_type = 'emacs'"`
3. Logs show inbound: `grep 'channel_type=emacs\|Emacs' logs/nanoclaw.log | tail -20`
If no messaging group row exists, run the `register` command above.
@@ -282,15 +285,18 @@ If an agent outputs org-mode directly, markers get double-converted and render i
# systemctl --user restart $(systemd_unit) # Linux
# Remove the NanoClaw block from your Emacs config
# Optionally clean up the messaging group:
sqlite3 data/v2.db "DELETE FROM messaging_group_agents WHERE messaging_group_id IN (SELECT id FROM messaging_groups WHERE channel_type='emacs'); DELETE FROM messaging_groups WHERE channel_type='emacs';"
pnpm exec tsx scripts/q.ts data/v2.db "DELETE FROM messaging_group_agents WHERE messaging_group_id IN (SELECT id FROM messaging_groups WHERE channel_type='emacs'); DELETE FROM messaging_groups WHERE channel_type='emacs';"
description: Add Google Calendar as an MCP tool (list calendars, list/search/create events, free/busy queries) using OneCLI-managed OAuth. Multi-calendar and multi-account supported. Mirrors /add-gmail-tool's stub pattern — no raw credentials ever reach the container; OneCLI injects real tokens at request time.
---
# Add Google Calendar Tool (OneCLI-native)
This skill wires [`@cocal/google-calendar-mcp`](https://github.com/cocal-com/google-calendar-mcp) into selected agent groups. The MCP server reads stub credentials containing the `onecli-managed` placeholder; the OneCLI gateway intercepts outbound calls to `calendar.googleapis.com` / `oauth2.googleapis.com` and swaps the bearer for the real OAuth token from its vault.
**Why this package (and not gongrzhe's):**`@gongrzhe/server-calendar-autoauth-mcp` only supports the `primary` calendar and exposes 5 tools (no `list_calendars`). `@cocal/google-calendar-mcp` explicitly supports multi-calendar and multi-account, and is actively maintained.
Tools exposed (surfaced as `mcp__calendar__<name>`, exact set depends on version — run `tools/list` against the MCP server to enumerate): `list-calendars`, `list-events`, `search-events`, `create-event`, `update-event`, `delete-event`, `get-event`, `list-colors`, `get-freebusy`, `get-current-time`, plus multi-account management tools.
**Why this pattern:** v2's invariant is that containers never receive raw API keys (CHANGELOG 2.0.0). Same stub pattern `/add-gmail-tool` uses. This skill is deliberately a sibling, not a combined "Google Workspace" skill — installs independently and removes cleanly.
## Phase 1: Pre-flight
### Verify OneCLI has Google Calendar connected
```bash
onecli apps get --provider google-calendar
```
Expected: `"connection": { "status": "connected" }` with scopes including `calendar.readonly` and `calendar.events`.
If not connected, tell the user:
> Open the OneCLI web UI at http://127.0.0.1:10254, go to Apps → Google Calendar, and click Connect. Sign in with the Google account the agent should act as. `calendar.readonly` + `calendar.events` are the minimum useful scopes.
### Verify stub credentials exist
The stub lives at `~/.calendar-mcp/` by convention (shared with `/add-gmail-tool`'s sibling). cocal doesn't default to this path (it uses `~/.config/google-calendar-mcp/tokens.json`) — we override via env vars below so it reads our stubs instead.
```bash
ls -la ~/.calendar-mcp/gcp-oauth.keys.json ~/.calendar-mcp/credentials.json 2>&1
Edit `container/Dockerfile`. Find the pinned-version ARG block and add:
```dockerfile
ARGCALENDAR_MCP_VERSION=2.6.1
```
If `/add-gmail-tool` has already been applied, the pnpm global-install block already exists with its `zod-to-json-schema@3.22.5` pin. Just append the calendar package — **the calendar-mcp uses `zod@4.x` and does NOT need that pin**, but it's harmless to share the block:
**No `TOOL_ALLOWLIST` edit needed.**`container/agent-runner/src/providers/claude.ts` derives the allow-pattern dynamically from each group's `mcpServers` map (`Object.keys(this.mcpServers).map(mcpAllowPattern)`), so registering `calendar` in Phase 3 automatically allows `mcp__calendar__*`. Earlier versions of this skill instructed a static `TOOL_ALLOWLIST` edit — that's now redundant.
### Rebuild the container image
```bash
./container/build.sh
```
## Phase 3: Wire Per-Agent-Group
For each agent group, persist two changes to the **central DB** (`data/v2.db`): the `mcpServers.calendar` entry and an `additionalMounts` entry for `.calendar-mcp`. Both flow through `materializeContainerJson` on every spawn, so editing `groups/<folder>/container.json` by hand does **not** stick — that file is regenerated from the DB.
### Register the MCP server
For each chosen `<group-id>` (use `ncl groups list` to enumerate):
Approval behaviour depends on where you run it: from inside an agent's container `ncl` write verbs are approval-gated (admin approves before it lands); from a host operator shell with full scope, it executes immediately. Either way, the response tells you which path it took.
### Add the `.calendar-mcp` mount
There is no `ncl groups config add-mount` verb yet (tracked in [#2395](https://github.com/nanocoai/nanoclaw/issues/2395)). Until that ships, edit the DB directly via the in-tree wrapper (`scripts/q.ts` — `setup/verify.ts:5` codifies that NanoClaw avoids depending on the `sqlite3` CLI binary, so don't shell out to it):
```bash
GROUP_ID='<group-id>'
HOST_PATH="$HOME/.calendar-mcp"
MOUNT=$(jq -cn --arg h "$HOST_PATH"'{hostPath:$h, containerPath:".calendar-mcp", readonly:false}')
SET additional_mounts = json_insert(additional_mounts, '\$[#]', json('$MOUNT')), \
updated_at = datetime('now') \
WHERE agent_group_id = '$GROUP_ID';"
```
Run from your NanoClaw project root (where `data/v2.db` lives). The `$[#]` placeholder is SQLite JSON1's append-to-end notation; it's `\$`-escaped so bash doesn't arithmetic-expand it before sqlite sees it. `updated_at` is ISO-string everywhere else in the schema, so use `datetime('now')` — not `strftime('%s','now')`, which would silently mix epoch ints into a column of YYYY-MM-DD HH:MM:SS strings.
**Switch to `ncl groups config add-mount` once #2395 lands.** Update this skill at that time.
`containerPath` is relative (mount-security rejects absolute paths — additional mounts land at `/workspace/extra/<relative>`).
**Why this can't be `groups/<folder>/container.json`:** post-migration `014-container-configs`, `materializeContainerJson` in `src/container-config.ts` rewrites that file from the DB on every spawn. Anything hand-edited there is silently overwritten on next restart.
**Same-group-as-gmail tip:** if this group already has the gmail MCP + `.gmail-mcp` mount, both coexist — `ncl groups config add-mcp-server` only updates the named entry, and `json_insert` appends to `additional_mounts` without disturbing existing entries.
-`command not found: google-calendar-mcp` → image not rebuilt.
-`ENOENT ...credentials.json` → mount missing. Check the mount allowlist.
-`401 Unauthorized` from `*.googleapis.com` → OneCLI isn't injecting; verify agent's secret mode and that Google Calendar is connected.
- Agent says "I don't have calendar tools" → the `calendar` MCP server isn't registered in this group's `mcpServers` (re-run the `ncl groups config add-mcp-server` step in Phase 3 for that group and restart it), or the agent-runner image is stale (`./container/build.sh`, `--no-cache` if suspicious).
## Removal
1. For each group that had Calendar wired, remove the MCP server from the DB:
```bash
ncl groups config remove-mcp-server --id <group-id> --name calendar
```
2. Remove the `.calendar-mcp` mount from the DB (no `remove-mount` verb yet — same #2395 dependency):
- **Why not gongrzhe:** earlier versions of this skill used `@gongrzhe/server-calendar-autoauth-mcp@1.0.2` which only supports the primary calendar with 5 event-level tools. The cocal server supersedes it.
- **Skill pattern:** direct sibling of [`/add-gmail-tool`](../add-gmail-tool/SKILL.md); same OneCLI stub mechanism.
description: Add Gmail as an MCP tool (read, search, send, label, draft) using OneCLI-managed OAuth. The agent gets Gmail tools in every enabled group; OneCLI injects real tokens at request time so no raw credentials are ever in the container or on disk in usable form.
---
# Add Gmail Tool (OneCLI-native)
This skill wires the [`@gongrzhe/server-gmail-autoauth-mcp`](https://www.npmjs.com/package/@gongrzhe/server-gmail-autoauth-mcp) stdio MCP server into selected agent groups. The MCP server reads stub credentials containing the `onecli-managed` placeholder; the OneCLI gateway intercepts outbound calls to `gmail.googleapis.com` and injects the real OAuth bearer from its vault.
Tools exposed (from `gmail-mcp@1.1.11`, surfaced to the agent as `mcp__gmail__<name>`): `search_emails`, `read_email`, `send_email`, `draft_email`, `delete_email`, `modify_email`, `batch_modify_emails`, `batch_delete_emails`, `download_attachment`, `list_email_labels`, `create_label`, `update_label`, `delete_label`, `get_or_create_label`, `list_filters`, `get_filter`, `create_filter`, `create_filter_from_template`, `delete_filter`.
**Why this pattern:** v2's invariant is that containers never receive raw API keys — OneCLI is the sole credential path (see CHANGELOG v2.0.0). The stub-file pattern satisfies this: the container sees `"onecli-managed"` placeholders, the gateway swaps them in flight.
## Phase 1: Pre-flight
### Verify OneCLI has Gmail connected
```bash
onecli apps get --provider gmail
```
Expected: `"connection": { "status": "connected" }` with scopes including `gmail.readonly`, `gmail.modify`, `gmail.send`.
If not connected, tell the user:
> Open the OneCLI web UI at http://127.0.0.1:10254, go to Apps → Gmail, and click Connect. Sign in with the Google account you want the agent to act as.
### Verify stub credentials exist
```bash
ls -la ~/.gmail-mcp/gcp-oauth.keys.json ~/.gmail-mcp/credentials.json 2>&1
If either file exists but does **not** contain `onecli-managed`, **STOP** and tell the user — these are real OAuth credentials from a previous non-OneCLI install. Back them up, then delete before proceeding. The OneCLI migration normally handles this; if it didn't, something is wrong.
`~/.gmail-mcp` must sit under an `allowedRoots` entry (e.g. `/home/<user>`). If it doesn't, tell the user to run `/manage-mounts` first or add their home directory.
### Check agent secret-mode
For each target agent group, confirm OneCLI will inject Gmail secrets into its container. Find the OneCLI agent ID that matches the group's `agentGroupId`:
```bash
onecli agents list
```
If that agent's `secretMode` is `all`, you're done — Gmail secrets (identified by OneCLI's Gmail hostPattern) will auto-inject. If it's `selective`, explicitly assign the Gmail secrets using the safe merge pattern (`set-secrets` replaces the entire list — always read first):
Edit `container/Dockerfile`. Find the pinned-version ARG block:
```dockerfile
ARGCLAUDE_CODE_VERSION=2.1.116
ARGAGENT_BROWSER_VERSION=latest
ARGVERCEL_VERSION=latest
ARGBUN_VERSION=1.3.12
```
Add a new line:
```dockerfile
ARGGMAIL_MCP_VERSION=1.1.11
```
Then find the last pnpm global-install `RUN` block (the one that installs `@anthropic-ai/claude-code`) and add a new block after it, before `# ---- Entrypoint`:
Pinned version matters — `minimumReleaseAge` in `pnpm-workspace.yaml` gates trunk installs, and CLAUDE.md requires a fixed ARG version for all Node CLIs installed into the image.
**Why the `zod-to-json-schema` pin:**`@gongrzhe/server-gmail-autoauth-mcp@1.1.11` has loose deps (`zod-to-json-schema: ^3.22.1`, `zod: ^3.22.4`). pnpm resolves `zod-to-json-schema` to the latest 3.25.x, which imports `zod/v3` — a subpath that only exists in `zod>=3.25`. But `zod` resolves to `3.24.x` (highest satisfying `^3.22.4` without breaking peer ranges). Result: `ERR_PACKAGE_PATH_NOT_EXPORTED` at import time. Pinning `zod-to-json-schema` to a pre-v3-subpath version avoids it. Re-check if you bump `GMAIL_MCP_VERSION`.
**No `TOOL_ALLOWLIST` edit needed.**`container/agent-runner/src/providers/claude.ts` derives the allow-pattern dynamically from each group's `mcpServers` map (`Object.keys(this.mcpServers).map(mcpAllowPattern)`), so registering `gmail` in Phase 3 automatically allows `mcp__gmail__*`. Earlier versions of this skill instructed a static `TOOL_ALLOWLIST` edit — that's now redundant.
### Rebuild the container image
```bash
./container/build.sh
```
Must complete cleanly. The new `pnpm install -g` layer is ~60s first time (cached on rebuild).
## Phase 3: Wire Per-Agent-Group
For each agent group that should have Gmail (ask the user — typically their personal DM and CLI agents, sometimes shared household agents), persist two changes to the **central DB** (`data/v2.db`): the `mcpServers.gmail` entry and an `additionalMounts` entry for `.gmail-mcp`. Both flow through `materializeContainerJson` on every spawn, so editing `groups/<folder>/container.json` by hand does **not** stick — that file is regenerated from the DB.
Approval behaviour depends on where you run it: from inside an agent's container `ncl` write verbs are approval-gated (admin approves before it lands); from a host operator shell with full scope, it executes immediately. Either way, the response tells you which path it took.
### Add the `.gmail-mcp` mount
There is no `ncl groups config add-mount` verb yet (tracked in [#2395](https://github.com/nanocoai/nanoclaw/issues/2395)). Until that ships, edit the DB directly via the in-tree wrapper (`scripts/q.ts` — `setup/verify.ts:5` codifies that NanoClaw avoids depending on the `sqlite3` CLI binary, so don't shell out to it):
```bash
GROUP_ID='<group-id>'
HOST_PATH="$HOME/.gmail-mcp"
MOUNT=$(jq -cn --arg h "$HOST_PATH"'{hostPath:$h, containerPath:".gmail-mcp", readonly:false}')
SET additional_mounts = json_insert(additional_mounts, '\$[#]', json('$MOUNT')), \
updated_at = datetime('now') \
WHERE agent_group_id = '$GROUP_ID';"
```
Run from your NanoClaw project root (where `data/v2.db` lives). The `$[#]` placeholder is SQLite JSON1's append-to-end notation; it's `\$`-escaped so bash doesn't arithmetic-expand it before sqlite sees it. `updated_at` is ISO-string everywhere else in the schema, so use `datetime('now')` — not `strftime('%s','now')`, which would silently mix epoch ints into a column of YYYY-MM-DD HH:MM:SS strings.
**Switch to `ncl groups config add-mount` once #2395 lands.** Update this skill at that time.
**Why the container path is relative:**`mount-security` rejects absolute `containerPath` values. Additional mounts are prefixed with `/workspace/extra/`, so `containerPath: ".gmail-mcp"` lands at `/workspace/extra/.gmail-mcp`. The MCP server's `GMAIL_OAUTH_PATH` / `GMAIL_CREDENTIALS_PATH` env vars point at that absolute location inside the container.
**Why this can't be `groups/<folder>/container.json`:** post-migration `014-container-configs`, `materializeContainerJson` in `src/container-config.ts` rewrites that file from the DB on every spawn. Anything hand-edited there is silently overwritten on next restart.
> In your `<agent-name>` chat, send: **"list my gmail labels"** or **"search my inbox for invoices from last month"**.
>
> The agent should use `mcp__gmail__list_labels` / `mcp__gmail__search`. The first call may take a second or two while the MCP server starts and OneCLI does the token exchange.
-`command not found: gmail-mcp` → image wasn't rebuilt or PATH doesn't include `/pnpm` (should — `ENV PATH="$PNPM_HOME:$PATH"` in Dockerfile).
-`ENOENT: no such file or directory, open '/workspace/extra/.gmail-mcp/credentials.json'` → mount is missing. Check `~/.config/nanoclaw/mount-allowlist.json` includes a parent of `~/.gmail-mcp`.
-`401 Unauthorized` from `gmail.googleapis.com` → OneCLI isn't injecting. Check the agent's secret mode (`onecli agents secrets --id <agent-id>`) and that the Gmail app is connected (`onecli apps get --provider gmail`).
- Agent says "I don't have Gmail tools" → the `gmail` MCP server isn't registered in this group's `mcpServers` (re-run the `ncl groups config add-mcp-server` step in Phase 3 for that group and restart it), or the agent-runner image is stale (rebuild with `./container/build.sh`, with `--no-cache` if suspicious).
## Removal
1. For each group that had Gmail wired, remove the MCP server from the DB:
```bash
ncl groups config remove-mcp-server --id <group-id> --name gmail
```
2. Remove the `.gmail-mcp` mount from the DB (no `remove-mount` verb yet — same #2395 dependency):
No `TOOL_ALLOWLIST` removal step — Phase 2 no longer edits it.
## Notes
- **Stub format is OneCLI-prescribed.** The `access_token: "onecli-managed"` pattern with `expiry_date: 99999999999999` tells the Google auth client the token is valid; OneCLI intercepts the outgoing Gmail API call and rewrites `Authorization: Bearer onecli-managed` to the real token. `expiry_date: 0` (refresh-interception) is an alternative the OneCLI docs describe — both work but OneCLI's own `migrate` command writes the far-future variant, which is what this skill assumes.
- **Scopes are set at OAuth connect time.** If the agent needs scopes beyond what's currently connected (e.g. the user later wants `calendar.readonly` for combined email/calendar workflows), disconnect and reconnect Gmail in the OneCLI web UI with the expanded scope set.
- **This is tool-only.** Inbound email as a channel (emails trigger the agent) is a separate piece of work — it needs a `src/channels/gmail.ts` adapter that polls the inbox and routes to a messaging group. The pre-v2 qwibitai skill had this; it has not been ported to v2's channel architecture as of v2.0.0.
## Credits & references
- **MCP server:** [`@gongrzhe/server-gmail-autoauth-mcp`](https://github.com/GongRzhe/Gmail-MCP-Server) by GongRzhe — MIT-licensed.
- **OneCLI credential stubs:** pattern documented at `https://onecli.sh/docs/guides/credential-stubs/gmail.md`.
- **Skill pattern:** modeled on [`add-atomic-chat-tool`](../add-atomic-chat-tool/SKILL.md) and [`add-vercel`](../add-vercel/SKILL.md).
- **Addresses:** [issue #1500](https://github.com/nanocoai/nanoclaw/issues/1500) (proxy Gmail/Calendar OAuth tokens through credential proxy) for the Gmail side.
- **Related PRs:** [#1810](https://github.com/nanocoai/nanoclaw/pull/1810) (pre-install Gmail/Notion MCP) overlaps on the "install the MCP server in the image" idea but bundles many unrelated changes; this skill is the focused OneCLI-native version.
@@ -71,40 +71,16 @@ AskUserQuestion: "Want periodic wiki health checks?"
2.**Monthly**
3.**Skip** — lint manually
If yes, create a NanoClaw scheduled task that runs in the wiki group. This is NOT a Claude Code cron job — it's a NanoClaw group task that runs in the agent container. Insert it into the SQLite database:
If yes, ask the agent to schedule the lint task using the `schedule_task` MCP tool in conversation.
'Run a wiki lint pass per the wiki container skill. Check for contradictions, orphan pages, stale content, missing cross-references, and gaps. Report findings and offer to fix issues.',
'cron',
'<cron-expr>',
'group',
nextRun,
'active',
new Date().toISOString()
);
db.close();
"
```
Use the group's `folder` and `chat_jid` from the registered groups table. Cron expressions: `0 10 * * 0` (weekly Sunday 10am) or `0 10 1 * *` (monthly 1st at 10am).
description: Add persistent graph-based memory via mnemon. Agents recall past context before responding and remember insights after each turn.
---
# Add Mnemon — Persistent Memory
Installs [mnemon](https://github.com/mnemon-dev/mnemon) in the agent container image. On each container start, `mnemon setup` registers Claude Code hooks that surface relevant memory before the agent responds and store new insights after each turn. Memory is written to the per-agent-group `.claude/` mount and survives container restarts.
## Provider Compatibility
**mnemon hooks only work with `--target claude-code`.** If the agent group uses `AGENT_PROVIDER=opencode`, hooks registered by `mnemon setup` will never fire — OpenCode spawns its own process and doesn't invoke the `claude` CLI at all.
`>/dev/stderr 2>&1` routes all mnemon output to stderr (docker logs) so it doesn't interfere with the JSON stdin handshake between host and agent-runner.
### 3. Rebuild and smoke-test the image
```bash
./container/build.sh
docker run --rm --entrypoint mnemon nanoclaw-agent:latest --version
Have a conversation with the agent, then start a new session and reference something from the earlier one. Mnemon should surface the relevant context automatically without you restating it.
## Phase 2 (OpenCode path) — context injection
mnemon hooks don't fire under OpenCode. Instead, the agent-runner injects mnemon context directly into every prompt via `wrapPromptWithContext()` in `container/agent-runner/src/providers/opencode.ts`. This is already implemented in NanoClaw — no code changes needed if you're on current `ester`/`main`.
**How it works:** On each prompt, `readMnemonContext()` checks for `MNEMON_DATA_DIR` (set by the Dockerfile `ENV`). If the env var is present, it reads `$MNEMON_DATA_DIR/prompt/guide.md` (mnemon's custom prompt guide, written by `mnemon setup`) or falls back to an inline guide. The content is prepended as a `<system>` block, instructing the agent to run `mnemon recall` at the start of relevant tasks and `mnemon remember` after key decisions.
**What this means for the agent:** The agent (running inside OpenCode) can call `mnemon recall`, `mnemon remember`, `mnemon link`, and `mnemon status` via its bash tool. mnemon writes its graph to `$MNEMON_DATA_DIR`, which is in the per-agent-group `.claude/` mount — so memory persists across container restarts.
**Applying:** Only the Dockerfile step from Phase 2 is needed for OpenCode agents. Skip `container/entrypoint.sh` entirely.
Start a session and ask the agent to run `mnemon status`. It should report empty graphs (no error) on first run.
```bash
# Also confirm the binary is present in the image:
docker run --rm --entrypoint mnemon nanoclaw-agent:latest --version
```
## Memory Storage
Mnemon writes to `/home/node/.claude/mnemon/` inside the container, which maps to the per-agent-group `.claude/` directory on the host. To find the exact host path:
```bash
docker inspect $(docker ps --filter name=nanoclaw-v2 --format '{{.Names}}'| head -1)\
The image wasn't rebuilt after adding the Dockerfile layer. Run `./container/build.sh` and restart.
### Memory not persisting across restarts
Verify `MNEMON_DATA_DIR` resolves to a mounted path (not an in-container ephemeral directory):
```bash
docker exec <container> sh -c 'ls -la $MNEMON_DATA_DIR'
```
If the directory is empty after conversations, the mount is missing or the path is wrong. Check the host mount with the `docker inspect` command above.
### Agent not using past memory
`mnemon setup` writes hooks into `/home/node/.claude/settings.json`. Verify:
Schema: **`agent_groups.agent_provider`** and **`sessions.agent_provider`**. Set to `opencode` for groups or sessions that should use OpenCode. The container receives `AGENT_PROVIDER` from the resolved value (session overrides group).
Set `"provider": "opencode"` in the group's **`container.json`** (`groups/<folder>/container.json`) — the in-container runner reads `provider` from there, not from the DB. The DB columns **`agent_groups.agent_provider`** and **`sessions.agent_provider`** (session overrides group) only drive host-side provider contribution — per-session XDG mount, `OPENCODE_*` env passthrough — and do not propagate into `container.json` at spawn time. Set both, or just edit `container.json`; if they disagree, the runner uses `container.json` and the host-side resolver falls back through session → group → `container.json` → `'claude'`.
Extra MCP servers still come from **`NANOCLAW_MCP_SERVERS`** / `container_config.mcpServers` on the host; the runner merges them into the same `mcpServers` object passed to **both** Claude and OpenCode providers.
description: Add Signal channel integration via signal-cli TCP daemon. Native adapter — no Chat SDK bridge.
---
# Add Signal Channel
Adds Signal messaging support via a native adapter that speaks JSON-RPC to a [signal-cli](https://github.com/AsamK/signal-cli) TCP daemon. No Chat SDK bridge — only Node.js builtins (`node:net`, `node:child_process`, `node:fs`).
Unlike Telegram or Discord, Signal has no bot API. NanoClaw registers as a full Signal account on a dedicated phone number (recommended) or links as a secondary device on your existing number.
> The Linux native tarball extracts a single binary directly to `~/.local/signal-cli` (not into a subdirectory). The symlink above puts it on PATH.
## Registration
Two paths. The new-number path is recommended and battle-tested.
### Path A: Register a new number (recommended)
Use a dedicated SIM or VoIP number. NanoClaw owns it entirely.
> **VoIP numbers:** Signal requires SMS verification before voice. Some VoIP providers are blocked even for voice calls. If registration fails with an auth error, try a different provider or a physical SIM.
**Step 1: Solve the CAPTCHA**
Signal requires a CAPTCHA on first registration:
1. Open `https://signalcaptchas.org/registration/generate.html` in a browser
2. Solve the captcha
3. Right-click the **"Open Signal"** button → **Copy Link**
4. The link starts with `signalcaptcha://` — the token is everything after that prefix
**Step 2: Request SMS verification**
```bash
signal-cli -a +1YOURNUMBER register --captcha "PASTE_TOKEN_HERE"
```
**Step 3: Voice call fallback (if your number can't receive SMS)**
Wait ~60 seconds after the SMS request, then:
```bash
signal-cli -a +1YOURNUMBER register --voice --captcha "SAME_TOKEN"
```
Signal calls your number and reads a 6-digit code. The same captcha token is reusable — no need to solve a new one.
> You must request SMS first. Requesting voice immediately fails with `Invalid verification method: Before requesting voice verification…`
**Step 4: Verify**
```bash
signal-cli -a +1YOURNUMBER verify CODE
```
No output = success.
**Step 5: Set profile name (optional)**
> ⚠ Stop NanoClaw before running signal-cli commands — the daemon holds an exclusive lock on its data directory while running.
signal-cli -a +1YOURNUMBER updateProfile --name "YourBotName"
systemctl --user start $(systemd_unit)
```
### Path B: Link as secondary device
Joins an existing Signal account as a secondary device. Simpler, but NanoClaw shares your personal number.
```bash
signal-cli -a +1YOURNUMBER link --name "NanoClaw"
```
This prints a `tsdevice:` URI. Scan it as a QR code on your phone: **Settings → Linked Devices → Link New Device**. QR codes expire in ~30 seconds — re-run if it expires.
## Install
### Pre-flight (idempotent)
Skip to **Credentials** if all of these are already in place:
-`src/channels/signal.ts` and `src/channels/signal.test.ts` both exist
Otherwise continue. Every step below is safe to re-run.
### 1. Fetch the channels branch
```bash
git fetch origin channels
```
### 2. Copy the adapter and tests
```bash
git show origin/channels:src/channels/signal.ts > src/channels/signal.ts
git show origin/channels:src/channels/signal.test.ts > src/channels/signal.test.ts
```
### 3. Append the self-registration import
Append to `src/channels/index.ts` (skip if the line is already present):
```typescript
import'./signal.js';
```
### 4. Build
```bash
pnpm run build
```
No npm packages to install — the adapter uses only Node.js builtins.
## Credentials
Add to `.env`:
```bash
SIGNAL_ACCOUNT=+1YOURNUMBER
```
### Optional settings
```bash
# TCP daemon host and port (default: 127.0.0.1:7583)
SIGNAL_TCP_HOST=127.0.0.1
SIGNAL_TCP_PORT=7583
# Path to the signal-cli binary (default: resolved on PATH)
SIGNAL_CLI_PATH=/usr/local/bin/signal-cli
# Whether NanoClaw manages the daemon lifecycle (default: true).
# Set to false if you run signal-cli daemon externally.
SIGNAL_MANAGE_DAEMON=true
# signal-cli data directory (default: ~/.local/share/signal-cli)
SIGNAL_DATA_DIR=~/.local/share/signal-cli
```
**Security note:** keep the TCP host on `127.0.0.1`. The daemon has no auth — binding it to a public interface would expose your full Signal account to the network.
Sync to container: `mkdir -p data/env && cp .env data/env/env`
New Signal users (including the owner's Signal identity) are silently dropped with `not_member` until granted access. After the user's first message appears in `messaging_groups`:
```bash
NOW=$(date -u +"%Y-%m-%dT%H:%M:%S.000Z")
pnpm exec tsx scripts/q.ts data/v2.db "
INSERT OR REPLACE INTO user_roles (user_id, role, agent_group_id, granted_by, granted_at)
Find the UUID from `messaging_groups.platform_id` or the `users` table.
## Next Steps
If you're in the middle of `/setup`, return to the setup flow now.
Otherwise, run `/init-first-agent` to create an agent and wire it to your Signal DM, or `/manage-channels` to wire this channel to an existing agent group.
## Channel Info
- **type**: `signal`
- **terminology**: Signal has "chats" (1:1 DMs) and "groups"
- **supports-threads**: no
- **platform-id-format**:
- DM: `signal:{UUID}` — sender's Signal UUID (ACI), **not** their phone number
- Group: `signal:{base64GroupId}` — base64-encoded GroupV2 ID
- **how-to-find-id**: Send a message to the bot, then query `messaging_groups` as shown above
- **typical-use**: Personal assistant via Signal DMs or small group chats
- **default-isolation**: One agent per Signal account. Multiple chats with the same operator can share an agent group; groups with other people should typically use `isolated` session mode
- Quoted replies — `replyTo*` fields populated from Signal quotes
- Typing indicators — DMs only (Signal doesn't support group typing)
- Echo suppression — outbound messages matched on `(platformId, text)` within a 10 s TTL to avoid syncMessage loops
- Note to Self — messages you send to your own account from another device route to the agent as inbound with `isFromMe: true`
- Voice attachments — detected but not transcribed by default; the agent receives `[Voice Message]` placeholder text. Run `/add-voice-transcription` for local transcription via parakeet-mlx
Not supported yet: outbound file attachments (logged and dropped), edit/delete messages, reactions.
## Troubleshooting
### Daemon not reachable
```bash
grep "Signal" logs/nanoclaw.log | tail
```
If you see `Signal daemon failed to start. Is signal-cli installed and your account linked?`:
- Confirm `signal-cli` is on PATH (or set `SIGNAL_CLI_PATH`)
- Confirm the account is linked: `signal-cli -a +1YOURNUMBER listIdentities` should succeed without prompting
If you see `Signal daemon not reachable at 127.0.0.1:7583` and `SIGNAL_MANAGE_DAEMON=false`, start the daemon yourself: `signal-cli -a +1YOURNUMBER daemon --tcp 127.0.0.1:7583`.
2. Channel wired: `pnpm exec tsx scripts/q.ts data/v2.db "SELECT mg.platform_id, mg.name FROM messaging_groups mg JOIN messaging_group_agents mga ON mg.id = mga.messaging_group_id WHERE mg.channel_type='signal'"`
3. Service running: `launchctl print gui/$(id -u)/"$(. setup/lib/install-slug.sh && launchd_label)"` (macOS) / `systemctl --user status "$(. setup/lib/install-slug.sh && systemd_unit)"` (Linux)
4.**Check for duplicate service instances** — if `logs/nanoclaw.error.log` shows `No adapter for channel type channelType="signal"` despite the adapter starting, two NanoClaw processes are racing. See the `/debug` skill section "No adapter for channel type / Messages silently lost" for the full fix.
### Messages delivered but never arrive (null platformMsgId)
Signal responses show `platformMsgId=undefined` in the main log. This means the delivery poll ran but found no adapter — likely a duplicate service instance issue (see above). Affected messages cannot be retried; the user must resend.
### Lost connection mid-session
If you see `Signal channel lost TCP connection to signal-cli daemon` in the logs, the daemon dropped the connection. Restart the service to re-establish.
### Messages dropped with `not_member`
The Signal user hasn't been granted membership. See "Grant user access" above. This affects every new Signal user, including the owner's Signal identity — which is a separate user record from their identity on other channels even if it's the same person.
### Captcha required
Signal requires a captcha for new registrations. Go to `https://signalcaptchas.org/registration/generate.html`, solve it, right-click "Open Signal", copy the link, extract the token after `signalcaptcha://`.
### `Invalid verification method: Before requesting voice verification…`
You must request SMS first, wait ~60 seconds, then request voice. Both steps can use the same captcha token.
### Config file in use / daemon lock
signal-cli holds an exclusive lock on its data directory while the daemon is running. Stop NanoClaw before running any `signal-cli` commands directly, then restart afterward.
### Group replies going to DM instead of group
Modern Signal groups use GroupV2. The adapter must extract the group ID from `envelope?.dataMessage?.groupV2?.id` — not `groupInfo?.groupId`, which is GroupV1/legacy. If group messages are routing as DMs, check `src/channels/signal.ts` and confirm the groupId extraction falls through to `groupV2.id`.
### Java not found
Install Java 17+ — see the Prerequisites section above.
### QR code expired (Path B)
QR codes expire in ~30 seconds. Re-run the link command to generate a new one.
Send a message to your own Signal number (Note to Self) from another device, or have someone send your linked number a DM. The bot should respond within a few seconds.
If nothing happens, tail `logs/nanoclaw.log` for `Signal channel connected` and `Signal message received`.
OneCLI uses selective secret mode — secrets must be explicitly assigned to each agent. Get the Vercel secret ID from the output above, then assign it to every agent:
```bash
# For each agent, add the Vercel secret to its assigned secrets list.
# First get current assignments, then set them with the new secret appended.
VERCEL_SECRET_ID=$(onecli secrets list 2>/dev/null | grep -B2 "Vercel"| grep '"id"'| head -1 | sed 's/.*"id": "//;s/".*//')
for agent in $(onecli agents list 2>/dev/null | grep '"id"'| sed 's/.*"id": "//;s/".*//');do
CURRENT=$(onecli agents secrets --id "$agent" 2>/dev/null | grep'"'| grep -v hint | grep -v data | sed 's/.*"//;s/".*//'| tr '\n'','| sed 's/,$//')
The adapter will print a **QR URL** to the logs and save it to `data/wechat/qr.txt`:
@@ -167,4 +170,4 @@ Otherwise, restart the service to pick up the new channel and wiring.
- **supports-threads**: no (WeChat has no reply threads)
- **typical-use**: Long-poll — the adapter holds a persistent connection to Tencent's iLink API and receives messages in real time. No webhook URL needed.
- **default-isolation**: `shared` session mode per messaging group (DM or room). Use `strict` sender policy if you want only specific users to reach the agent; `public` opens it to anyone who messages the bot.
- **post-install-wiring**: Use the `wire-dm.ts` helper (see the "Wire your first DM" section above) if running this skill standalone. If running inside `/new-setup`, `init-first-agent.ts` handles wiring — just pass the `platform-id` and `admin-user-id` captured above.
- **post-install-wiring**: Use the `wire-dm.ts` helper (see the "Wire your first DM" section above) if running this skill standalone. If running as part of `bash nanoclaw.sh`, `init-first-agent.ts` handles wiring — just pass the `platform-id` and `admin-user-id` captured above.
@@ -200,7 +200,7 @@ Otherwise, run `/manage-channels` to wire this channel to an agent group.
- **type**: `whatsapp`
- **terminology**: WhatsApp calls them "groups" and "chats." A "chat" is a 1:1 DM; a "group" has multiple members.
- **how-to-find-id**: DMs use `<phone>@s.whatsapp.net` (e.g. `14155551234@s.whatsapp.net`). Groups use `<id>@g.us`. To find your number: `node -e "const c=JSON.parse(require('fs').readFileSync('store/auth/creds.json','utf-8'));console.log(c.me?.id?.split(':')[0]+'@s.whatsapp.net')"`. Groups are auto-discovered — check `sqlite3 data/v2.db "SELECT platform_id, name FROM messaging_groups WHERE channel_type='whatsapp' AND is_group=1"`.
- **how-to-find-id**: DMs use `<phone>@s.whatsapp.net` (e.g. `14155551234@s.whatsapp.net`). Groups use `<id>@g.us`. To find your number: `node -e "const c=JSON.parse(require('fs').readFileSync('store/auth/creds.json','utf-8'));console.log(c.me?.id?.split(':')[0]+'@s.whatsapp.net')"`. Groups are auto-discovered — check `pnpm exec tsx scripts/q.ts data/v2.db "SELECT platform_id, name FROM messaging_groups WHERE channel_type='whatsapp' AND is_group=1"`.
- **supports-threads**: no
- **typical-use**: Interactive chat — direct messages or small groups
- **default-isolation**: Same agent group if you're the only participant across multiple chats. Separate agent group if different people are in different groups.
Signal sessions corrupted from rapid restarts. Clear sessions:
Signal sessions corrupted from rapid restarts. Clear sessions.
Run from your NanoClaw project root:
```bash
systemctl --user stop nanoclaw
source setup/lib/install-slug.sh
systemctl --user stop $(systemd_unit)
rm store/auth/session-*.json
systemctl --user start nanoclaw
systemctl --user start $(systemd_unit)
```
### Bot not responding
1. Auth exists: `test -f store/auth/creds.json`
2. Connected: `grep "Connected to WhatsApp" logs/nanoclaw.log | tail -1`
3. Channel wired: `sqlite3 data/v2.db "SELECT mg.platform_id, mg.name FROM messaging_groups mg JOIN messaging_group_agents mga ON mg.id=mga.messaging_group_id WHERE mg.channel_type='whatsapp'"`
4. Service running: `systemctl --user status nanoclaw`
3. Channel wired: `pnpm exec tsx scripts/q.ts data/v2.db "SELECT mg.platform_id, mg.name FROM messaging_groups mg JOIN messaging_group_agents mga ON mg.id=mga.messaging_group_id WHERE mg.channel_type='whatsapp'"`
4. Service running: `systemctl --user status "$(. setup/lib/install-slug.sh && systemd_unit)"`
### 1. "No adapter for channel type" / Messages silently lost (null platformMsgId)
**Symptom:** The bot stops replying. `logs/nanoclaw.error.log` shows repeated:
```
WARN No adapter for channel type channelType="telegram"
WARN No adapter for channel type channelType="signal"
```
The main log shows "Message delivered" entries with `platformMsgId=undefined` — meaning the delivery poll ran, found no adapter, and permanently marked the message as delivered without sending it.
**Root cause: two NanoClaw service instances running simultaneously.**
When a second service instance (often `nanoclaw-v2-<id>.service` running alongside `nanoclaw.service`) is active with a stale binary, it has no channel adapters registered. Its delivery poll races against the working instance and wins — permanently marking outbound messages as delivered without ever sending them.
**Diagnosis:**
```bash
# Check for duplicate running instances
ps aux | grep 'nanoclaw/dist/index.js'| grep -v grep
# Check which services are active
systemctl --user list-units 'nanoclaw*' --all
# Confirm channel adapters registered by the current process
1. Identify which service has the correct binary and EnvironmentFile (the one showing `signal`, `telegram`, `cli` all started in the log).
2. Stop and disable the stale duplicate service:
```bash
systemctl --user stop nanoclaw.service # or whichever is the old one
systemctl --user disable nanoclaw.service
```
3. If the remaining service unit is missing `EnvironmentFile`, add it:
```bash
# Edit the service unit — add this line under [Service]:
# EnvironmentFile=/home/[user]/nanoclaw/.env
systemctl --user daemon-reload
systemctl --user restart nanoclaw-v2-<id>.service
```
4. Verify only one instance runs: `ps aux | grep nanoclaw/dist/index.js | grep -v grep`
**Note:** Messages that were marked delivered with a null `platform_message_id` cannot be automatically retried — they are permanently lost. The user must resend their message.
### 2. "Claude Code process exited with code 1"
**Check the container log file** in `groups/{folder}/logs/container-*.log`
@@ -279,7 +322,7 @@ rm -rf data/sessions/
rm -rf data/sessions/{groupFolder}/.claude/
# Also clear the session ID from NanoClaw's tracking (stored in SQLite)
sqlite3 store/messages.db "DELETE FROM sessions WHERE group_folder = '{groupFolder}'"
pnpm exec tsx scripts/q.ts store/messages.db "DELETE FROM sessions WHERE group_folder = '{groupFolder}'"
```
To verify session resumption is working, check the logs for the same session ID across messages:
@@ -9,7 +9,7 @@ Stand up the first NanoClaw agent for a channel and verify end-to-end delivery b
## Prerequisites
- **Service running.** Check: `launchctl list | grep nanoclaw` (macOS) or `systemctl --user status nanoclaw` (Linux). If stopped, tell the user to run `/setup` first.
- **Service running.** Check: `launchctl list | grep "$(. setup/lib/install-slug.sh && launchd_label)"` (macOS) or `systemctl --user status "$(. setup/lib/install-slug.sh && systemd_unit)"` (Linux). If stopped, tell the user to run `/setup` first.
- **Target channel installed.** At least one `/add-<channel>` skill has run, credentials are in `.env`, and the adapter is uncommented in `src/channels/index.ts`.
- **Adapter connected.** Tail `logs/nanoclaw.log` — look for a recent `channel setup` / `adapter connected` line for the target channel.
@@ -54,7 +54,7 @@ Tell the user:
Wait for the user's confirmation. Then look up the most recent DM messaging groups:
```bash
sqlite3 data/v2.db "SELECT id, platform_id, name, created_at FROM messaging_groups WHERE channel_type='${CHANNEL}' AND is_group=0 ORDER BY created_at DESC LIMIT 5"
pnpm exec tsx scripts/q.ts data/v2.db "SELECT id, platform_id, name, created_at FROM messaging_groups WHERE channel_type='${CHANNEL}' AND is_group=0 ORDER BY created_at DESC LIMIT 5"
```
Show the top rows to the user and confirm which `platform_id` is theirs (usually the most recent). Record as `PLATFORM_ID`. If none appeared, check `logs/nanoclaw.log` for `unknown_sender` drops — the adapter might be rejecting inbound due to connection or permission issues.
@@ -103,7 +103,7 @@ Wait for the user's reply. If they confirm receipt, the skill is done.
If they say it didn't arrive, then diagnose using the DB directly (no waiting loops required — the message either delivered or it didn't):
-`sqlite3 data/v2-sessions/<agent-group-id>/sessions/<session-id>/outbound.db "SELECT id, status, created_at FROM messages_out ORDER BY created_at DESC LIMIT 5"` — check for stuck `pending` rows. Replace `<agent-group-id>` and `<session-id>` with the values from the script's output.
-`pnpm exec tsx scripts/q.ts data/v2-sessions/<agent-group-id>/sessions/<session-id>/outbound.db "SELECT id, status, created_at FROM messages_out ORDER BY created_at DESC LIMIT 5"` — check for stuck `pending` rows. Replace `<agent-group-id>` and `<session-id>` with the values from the script's output.
-`grep -E 'Unauthorized channel destination|container.*exited|error' logs/nanoclaw.log | tail -20` — look for ACL rejections or container crashes.
-`ls data/v2-sessions/<agent-group-id>/sessions/*/outbound.db` — confirm the session exists.
- Linux (systemd): `systemctl --user restart "$(. setup/lib/install-slug.sh && systemd_unit)"`
- WSL/manual: stop and re-run `bash start-nanoclaw.sh`
## Phase 5: Verify
@@ -259,6 +262,41 @@ Tell the user:
- To manage secrets: `onecli secrets list`, or open ${ONECLI_URL}
- To add rate limits or policies: `onecli rules create --help`
## Granting secrets to agents (safe merge)
`set-secrets`**replaces** the agent's entire secret list — it never appends. Always read the current list first and merge before calling it. This pattern is canonical across all skills that assign secrets:
-`<agentGroupId>` — the `agentGroupId` field in `groups/<folder>/container.json`
-`<new-secret-id>` — the `id` from `onecli secrets list`
- Multiple new secrets: append them comma-separated before the `printf` step
### git over HTTPS
OneCLI's proxy injects credentials proactively — `injections_applied=1` appears in `docker logs onecli` even when git sends no auth header. However, OneCLI sets `SSL_CERT_FILE` for Node/Python/Deno but not `GIT_SSL_CAINFO`. Without it, git rejects the OneCLI MITM certificate.
**Auth format matters**: GitHub's git smart HTTP protocol (`github.com`) requires `Basic` auth, not `Bearer`. GitHub's REST API (`api.github.com`) accepts `Bearer`. These must be configured as separate secrets with different formats — see `/add-github` for the full setup.
If an agent uses `git` or `gh`, add to `data/v2-sessions/<agent-group-id>/.claude-shared/settings.json`:
```json
"GIT_SSL_CAINFO":"/tmp/onecli-combined-ca.pem",
"GIT_TERMINAL_PROMPT":"0",
"GIT_CONFIG_COUNT":"1",
"GIT_CONFIG_KEY_0":"credential.helper",
"GIT_CONFIG_VALUE_0":"",
"GH_TOKEN":"ghp_onecli_proxy_replaces_this"
```
**Debugging injection**: `docker logs onecli 2>&1 | grep "github.com"` shows every request with `injections_applied=N` and the HTTP status. If `injections_applied=1` but status is still 401, the injected credential value is wrong or uses the wrong auth format for that endpoint.
## Troubleshooting
**"OneCLI gateway not reachable" in logs:** The gateway isn't running. Check with `curl -sf ${ONECLI_URL}/health`. Start it with `onecli start` if needed.
@@ -11,7 +11,22 @@ Privilege is a **user-level** concept, not a channel-level one (see `src/db/user
## Assess Current State
Read the central DB (`data/v2.db`) — query `agent_groups`, `messaging_groups`, `messaging_group_agents`, `users`, and `user_roles` tables. Also check `.env` for channel tokens and `src/channels/index.ts` for uncommented imports.
Read the central DB (`data/v2.db`) using these canonical queries (column names match the schema, not the CLI flags — the `register` command's`--assistant-name` is stored in `agent_groups.name`).
Run each via the in-tree wrapper — the host setup deliberately ships no `sqlite3` CLI:
Also check `.env` for channel tokens and `src/channels/index.ts` for uncommented imports.
Categorize channels as: **wired** (has DB entities + messaging_group_agents row), **configured but unwired** (has credentials + barrel import, no DB entities), or **not configured**.
description: Finish migrating a NanoClaw v1 install into v2. Run after `bash migrate-v2.sh` completes. Seeds the owner, cleans up CLAUDE.local.md files, reconciles container configs, and helps port custom v1 code. Triggers on "migrate from v1", "finish migration", "v1 migration".
---
# Finish v1 → v2 migration
`bash migrate-v2.sh` already ran the deterministic migration. It handled:
- .env keys merged
- v2 DB seeded (agent_groups, messaging_groups, wiring)
- Group folders copied (v1 CLAUDE.md → v2 CLAUDE.local.md)
- Session data copied with conversation continuity (incl. Claude Code memory + JSONL transcripts)
- Scheduled tasks ported
- Channel code installed and auth state copied (incl. WhatsApp Baileys keystore)
- WhatsApp LIDs resolved from `store/auth` and aliased into `messaging_groups`
- Container skills copied
- Container image built
Your job is the parts that need human judgment: triage any failed steps, seed the owner, clean up CLAUDE.local.md files, reconcile configs, and port any fork customizations.
Read `logs/setup-migration/handoff.json` first — it has `overall_status`, per-step results in `steps`, and a `followups` list.
## Preflight: was the script run?
Before anything else, check that `logs/setup-migration/handoff.json` exists. If it doesn't, the user is invoking this skill before `migrate-v2.sh` ran. Stop and tell them, verbatim:
> This skill finishes a migration that `migrate-v2.sh` started. Run that first, in your terminal — not from inside Claude:
>
> ```bash
> bash migrate-v2.sh
> ```
>
> It needs interactive prompts (channel selection, service switchover) and runs Node/pnpm bootstrap, Docker, OneCLI setup, and a container build that don't fit inside a Claude session. When it finishes, it'll hand control back to Claude automatically — at which point this skill picks up.
Do not attempt to run the script yourself, simulate its effects, or pick up the migration mid-stream. The deterministic side has dependencies on a real interactive shell.
Once `handoff.json` exists, proceed to Phase 0.
## Phase 0: Get v2 routing real messages
Before any deeper migration work, prove v2 actually answers messages on the user's real channels. v1 is paused, not touched — flipping back is a service restart.
### 0a — Fix blockers only
Walk `handoff.steps`. Fix only the failures that would stop the bot from routing one message; defer the rest to its later phase.
### 0b — Smoke test, then continue
Tell the user the switch is non-destructive (v1 is paused, not modified; reverting is one command). Help them stop v1's service unit and start v2's, tail the host log for a clean boot, and have them send a real test message. Use `AskUserQuestion` to confirm the bot responded.
If yes, continue to Phase 1. If no, diagnose from `logs/nanoclaw.log` and re-test — don't proceed to deeper work on a broken router.
### Deferred failures
Re-visit anything you skipped in 0a before declaring the migration done. Most surface naturally in later phases (`1c-groups` ↔ Phase 2, `1e-tasks` ↔ task verification).
## Phase 1: Owner and access
v2 auto-creates a `users` row for every sender it sees (via `extractAndUpsertUser` in `src/modules/permissions/index.ts`). By the time this skill runs, the owner's row likely already exists — it just needs the `owner` role granted.
**User ID format**: always `<channel_type>:<platform_handle>`. Each channel populates this differently:
Use the DB helpers in `src/db/user-roles.ts` — they keep indexes correct. Init the DB first:
```ts
import { initDb } from '../src/db/connection.js';
import { runMigrations } from '../src/db/migrations/index.js';
import { DATA_DIR } from '../src/config.js';
import path from 'path';
const db = initDb(path.join(DATA_DIR, 'v2.db'));
runMigrations(db);
```
### Access policy
After seeding the owner, discuss the access policy. v2's `messaging_groups.unknown_sender_policy` controls who can interact with the bot. `migrate-v2.sh` set it to `public` so the bot would respond during the switchover test, but the user may want to tighten it.
Present the options via `AskUserQuestion`:
1. **Public** (current) — anyone can message the bot. Good for personal DM bots.
2. **Known users only** — only users in `agent_group_members` can trigger the bot. Others are silently dropped.
3. **Approval required** — unknown senders trigger an approval request to the owner. Good for group chats where you want to vet new members.
If the user picks option 2 or 3, seed the known users from v1's message history. The v1 database is at `<handoff.v1_path>/store/messages.db`. It has a `messages` table with `sender` and `sender_name` columns. For each group:
```sql
-- v1: unique senders per chat (excluding bot messages)
SELECT DISTINCT sender, sender_name
FROM messages
WHERE chat_jid = '<v1_jid>' AND is_from_me = 0 AND sender IS NOT NULL
```
The `sender` value is a platform handle (e.g. `6037840640` for Telegram). Build the v2 user ID by inferring the channel type from the chat JID prefix (use `parseJid` from `setup/migrate-v2/shared.ts`) and combining: `<channel_type>:<sender>`.
For each sender:
1. Upsert into `users(id, kind, display_name)` if not already present.
2. Insert into `agent_group_members(user_id, agent_group_id)` for each agent group wired to that messaging group.
Show the user the list of senders being imported and let them deselect any they don't want.
Then update the messaging groups:
```sql
UPDATE messaging_groups SET unknown_sender_policy = '<chosen_policy>'
WHERE id IN (SELECT id FROM messaging_groups WHERE channel_type IN (<migrated_channels>))
```
## Phase 2: Clean up CLAUDE.local.md
The migration copied v1's entire CLAUDE.md into CLAUDE.local.md for each group. This file now contains v1 boilerplate that v2 handles through its own composed fragments (`container/CLAUDE.md` + `.claude-fragments/module-*.md`). The user's customizations are buried inside.
For each group that has a `CLAUDE.local.md`:
1. Read the file.
2. Read the v1 template it was based on. Determine which template by checking the v1 install:
- If the group had `is_main=1` in v1's `registered_groups`, the template was `groups/main/CLAUDE.md`
- Otherwise, the template was `groups/global/CLAUDE.md`
- The v1 path is in `handoff.json` → `v1_path`
3. Diff the file against the template. Identify sections that are:
- `/workspace/project/` → these paths don't exist in v2; discuss with the user
- `/workspace/ipc/` → gone; remove references
- `/workspace/extra/` → v2 uses `container.json` `additionalMounts`; keep but note the path may change
6. Keep the `# Name` heading and first paragraph (identity) — this is the user's agent personality.
7. Show the user the proposed new CLAUDE.local.md before writing it. Use `AskUserQuestion`: "Here's what I'd keep — look right?" with options to approve, edit, or keep the original.
If a CLAUDE.local.md has no user customizations (pure template copy), write a minimal file with just the identity heading.
## Phase 3: Container config
`migrate-v2.sh` writes `container.json` directly from v1's `container_config` (the `additionalMounts` shape is identical). If the v1 config was unparseable, it falls back to a `.v1-container-config.json` sidecar.
For each group, check:
1. If `container.json` exists, read it and verify the `additionalMounts` host paths are still valid on this machine. Flag any that don't exist.
2. If `.v1-container-config.json` exists (parse failure fallback), read it, discuss with the user, and write a proper `container.json`. Then delete the sidecar.
3. Check for `env` or `packages` fields — `env` may overlap with OneCLI vault, `packages` (apt/npm) are portable.
## Phase 4: Fork customizations
Check whether the user's v1 install was a customized fork.
If no commits ahead of upstream: stock v1, skip this phase.
If there are commits:
1. Show the commit list to the user.
2. `AskUserQuestion`: "How do you want to handle your v1 customizations?"
- **Copy portable items** (recommended) — copy `container/skills/*`, `.claude/skills/*`, `docs/*`. Scan each with `scanForV1Patterns` from `setup/migrate-v2/shared.ts`.
- **Full walkthrough** — go commit by commit, decide together.
- **Reference only** — stash to `docs/v1-fork-reference/` for later.
3. Source code (`src/*`, `container/agent-runner/src/*`) is NOT portable — v2's architecture is fundamentally different. Stash to `docs/v1-fork-reference/` with a README explaining what each file did. Don't translate.
## Principles
- **v1 checkout is read-only.** Never modify files under `handoff.v1_path`.
- **Show before writing.** Show diffs/proposed content before modifying CLAUDE.local.md or container.json.
- **Mask credentials** when displaying (first 4 + `...` + last 4 characters).
- **`handoff.json` is the recovery point.** If context gets compacted, re-read it and `git status` to recover state.
## Setup steps you can run
The setup flow at `setup/index.ts` has individual steps you can invoke if something is missing or failed:
```bash
pnpm exec tsx setup/index.ts --step <name>
```
| Step | When to use |
|------|-------------|
| `onecli` | OneCLI not installed or not healthy |
| `auth` | No Anthropic credential in vault |
| `container` | Container image needs rebuild |
| `service` | Service not installed or not running |
| `mounts` | Mount allowlist missing |
| `verify` | End-to-end health check (run after everything else) |
| `environment` | System check (Node, dirs) |
## When done
1. Run the verify step to confirm everything works:
```bash
pnpm exec tsx setup/index.ts --step verify
```
2. Delete `logs/setup-migration/handoff.json` — offer to save as `docs/migration-<date>.md` first.
3. Restart the service if running so changes take effect:
@@ -34,7 +34,7 @@ Two phases: **Extract** (build the migration guide) and **Upgrade** (use it). If
Run `git status --porcelain`. If non-empty, offer to stash or commit for them (AskUserQuestion: "Stash changes" / "Commit changes" / "I'll handle it"). If they want to commit, stage and commit with a descriptive message. If they want to stash, run `git stash push -m "pre-migration stash"`.
Check remotes with `git remote -v`. If `upstream` is missing, ask for the URL (default: `https://github.com/qwibitai/nanoclaw.git`), add it, then `git fetch upstream --prune`.
Check remotes with `git remote -v`. If `upstream` is missing, ask for the URL (default: `https://github.com/nanocoai/nanoclaw.git`), add it, then `git fetch upstream --prune`.
Detect upstream branch: check `git branch -r | grep upstream/` for `main` or `master`. Store as UPSTREAM_BRANCH.
description: End-to-end NanoClaw setup for any user regardless of technical background — from zero to a named agent reachable on a real messaging channel, with sensible defaults and every post-verification step skippable.
Purpose of this skill is to take any user — technical or not — from zero to a named agent wired to a real messaging channel in the fewest steps possible.
The flow has two halves:
- **Steps 1–6 — required.** Prerequisites, credential, service start, end-to-end ping. These run straight through.
- **Steps 7–12 — skippable.** Naming, channel wiring, QoL. Every step here is skippable: if the user says "skip", "not now", "later", or similar, move on without complaint. If they say they're done at any point, stop cleanly — don't push the remaining steps.
Before each step, narrate to the user in your own words what's about to happen — one short, friendly sentence, no jargon. Don't read a scripted line; use the step context below to speak naturally.
Each step is invoked as `pnpm exec tsx setup/index.ts --step <name>` and emits a structured status block Claude parses to decide what to do next.
Start with a probe: a single upfront scan that snapshots every prerequisite and dependency. The rest of the flow reads this snapshot to decide what to run, skip, or ask about — no per-step re-checking. The probe is pure bash (`setup/probe.sh`) with no external deps so it runs correctly before Node has been installed.
## Current state
!`bash setup/probe.sh`
## Flow
Parse the probe block above. For each step below, consult the named probe fields and skip, ask, or run accordingly. The probe always returns a real snapshot — there is no "node not installed" fallback; `HOST_DEPS=missing` is how you know Node/pnpm haven't been bootstrapped yet.
## Ordering and parallelism
Run steps sequentially by default: invoke the step, wait for its status block, act on the result, move to the next.
One permitted parallelism:
- **Step 2 (container image build) and step 3 (OneCLI install)** are independent — they may start together in the background.
- **Step 4 (auth) must NOT start until step 3 has completed.** Auth writes the secret into the OneCLI vault; if OneCLI isn't installed and healthy yet, the user gets asked for a credential the system can't store. Do not open an `AskUserQuestion` for step 4 while OneCLI is still installing.
- Step 2's image build may continue running past step 4 — the image isn't consumed until step 6 (first CLI agent). Join before step 6.
### 1. Node bootstrap
Check probe results and skip if `HOST_DEPS=ok` — Node, pnpm, `node_modules`, and `better-sqlite3`'s native binding are already in place.
If `HOST_DEPS=missing` and `node --version` fails (Node isn't installed at all), run `bash setup/install-node.sh`**before**`bash setup.sh` — the script handles both macOS (via `brew`) and Linux/WSL (NodeSource + apt). It's idempotent and short-circuits when node is already on PATH.
Then run `bash setup.sh`. If Node is already present and only `HOST_DEPS=missing`, run `bash setup.sh` directly — deps just haven't been installed yet.
- `DEPS_OK=false` or `NATIVE_OK=false` → Read `logs/setup.log`, fix, re-run.
> **Loose command:**`bash setup.sh`. Justification: pre-Node bootstrap. Can't call the Node-based dispatcher before Node and `pnpm install` are in place.
### 2. Docker
Check probe results and skip if `DOCKER=running` AND `IMAGE_PRESENT=true`.
**Runtime:**
- `DOCKER=not_found` → Docker is missing — install it so agent containers have an isolated place to run. Run `bash setup/install-docker.sh` (handles macOS via `brew --cask` and Linux via the official get.docker.com script, and adds the user to the `docker` group on Linux). On Linux the user may need to log out/in for group membership to take effect. On macOS, launch Docker afterwards with `open -a Docker`.
- `DOCKER=installed_not_running` → Docker is installed but the daemon is down — start it.
- macOS: `open -a Docker`
- Linux: `sudo systemctl start docker`
Wait ~15s after either, then proceed.
> **Loose commands:**`open -a Docker`, `sudo systemctl start docker`. Justification: daemon-start is a one-liner per platform, not worth wrapping. The actual install (which had the unmatchable `curl | sh` pattern) is now inside `setup/install-docker.sh`.
**Image (run if `IMAGE_PRESENT=false`):** build the agent container image — takes a few minutes the first time, one-off cost.
Check probe results and skip if `ONECLI_STATUS=healthy`.
OneCLI is the local vault that holds API keys and only releases them to agents when they need them.
`pnpm exec tsx setup/index.ts --step onecli`
### 4. Anthropic credential
Check probe results and skip if `ANTHROPIC_SECRET=true`.
The credential never travels through chat — the user generates it, registers it with OneCLI themselves, and the skill verifies.
**4a. Pick the source.** `AskUserQuestion`:
1. **Claude subscription (Pro/Max)** — "Generate a token via `claude setup-token` in another terminal."
2. **Anthropic API key** — "Use a pay-per-use key from console.anthropic.com/settings/keys."
**4b. Wait for the user to obtain the credential.** For subscription, have them run `claude setup-token` in another terminal. For API key, point them to the console URL above. Either way, they keep the token — just confirm when they have it.
**4c. Pick the registration path.** `AskUserQuestion` — substitute `${ONECLI_URL}` from the probe (or `.env`):
1. **Dashboard** — "Open ${ONECLI_URL} in a browser; add a secret of type `anthropic`, value = the token, host-pattern `api.anthropic.com`."
2. **CLI** — "Run in another terminal: `onecli secrets create --name Anthropic --type anthropic --value YOUR_TOKEN --host-pattern api.anthropic.com`"
Wait for the user's confirmation. If their reply happens to include a token (starts with `sk-ant-`), register it for them: `pnpm exec tsx setup/index.ts --step auth -- --create --value <TOKEN>`.
If `ANTHROPIC_OK=false`, the secret isn't there yet — ask them to retry, then re-check.
### 5. Service
Check probe results and skip if `SERVICE_STATUS=running`.
Start the NanoClaw background service — it relays messages between the user and the agent.
`pnpm exec tsx setup/index.ts --step service`
### 6. Wire a scratch CLI agent and verify end-to-end
**Do not narrate this step.** No "creating your first agent…", no "sending a ping…" chatter. The user's experience here is: they finished the last visible step (service), then a single success line appears. Wiring is an implementation detail at this point, not a user-facing milestone.
If step 2's container build is still running in the background, join it here before proceeding — the agent needs the image.
Use `INFERRED_DISPLAY_NAME` from the probe silently. **Do not ask the user.** The CLI agent at this stage is a scratch agent whose only purpose is to verify the end-to-end pipeline (host → container → agent → back). The user's real name capture happens in step 7.
First container spin-up takes ~60s. When the agent's reply arrives, emit exactly one line to the user:
> Test Agent success, proceeding with setup
If `pnpm run chat ping` times out or errors, tail `logs/nanoclaw.log`, diagnose, and fix — don't surface a half-success.
> **Loose command:**`pnpm run chat ping`. Justification: this is the same command the user will keep using after setup, so verification and the real channel are one and the same.
### 7. What should the agent call you?
Plain-prose ask (do **not** use `AskUserQuestion`):
> What should your agent call you? (Default: `<INFERRED_DISPLAY_NAME>`)
Capture the answer into a local variable `OPERATOR_NAME`. **Don't persist yet** — this value is consumed by step 10's channel wiring. If the user skips or confirms the default, set `OPERATOR_NAME = INFERRED_DISPLAY_NAME`.
### 8. What's your agent's name?
Plain-prose ask:
> What would you like to call your agent? (Default: `<OPERATOR_NAME>`)
Capture as `AGENT_NAME`. If skipped, set `AGENT_NAME = OPERATOR_NAME`. Nothing persisted yet.
### 9. Timezone
Run `pnpm exec tsx setup/index.ts --step timezone` and parse the status block.
- **RESOLVED_TZ is `UTC` or `Etc/UTC`** — before leaving UTC in `.env`, confirm with `AskUserQuestion`:
- **Question**: "Your system reports UTC as the timezone. Is that right, or are you somewhere else?"
- **Header**: "Timezone"
- **Options**:
1. `Keep UTC` — "Leave timezone as UTC."
2. `I'm somewhere else` — "I'll name the IANA zone (e.g. `America/New_York`, `Europe/London`, `Asia/Tokyo`) via Other."
If they pick "I'm somewhere else" (or type an IANA zone via Other), re-run `pnpm exec tsx setup/index.ts --step timezone -- --tz <answer>` to overwrite `.env`. If they keep UTC or skip, leave UTC in place.
- **NEEDS_USER_INPUT=true** — autodetection failed. Use `AskUserQuestion` with the same two options above (reword the question to "Autodetection failed — what timezone are you in?"), then re-run `pnpm exec tsx setup/index.ts --step timezone -- --tz <answer>` if they supply one. If they skip, move on.
- Otherwise — timezone is already set; move on.
### 10. Pick a messaging channel
Print the list as a numbered plain-prose list (too many options for `AskUserQuestion`, which caps at 4). The user replies with a number or channel name. Preserve the numbering exactly:
> Which messaging channel should I wire your agent to?
1. **Install the adapter.** For **Telegram**, run `bash setup/install-telegram.sh` directly — it bundles the preflight + fetch + copy + register + `pnpm install` + build from `/add-telegram` into one idempotent call. Then handle Telegram credentials inline (below) — **do not** invoke `/add-telegram` afterward; its Credentials section would generate an unapprovable `grep && sed && rm` to write `.env`. For every other channel, invoke the matching `/add-<channel>` skill via the Skill tool; it copies the adapter source in from the `channels` branch, registers it, installs the pinned npm package, and handles credentials. Some channels also run a pairing step as part of their flow.
**Telegram credentials (inline):**
- Walk the user through BotFather: `/newbot` → pick name + username ending in `bot` → copy the token.
- Remind them: in `@BotFather` → `/mybots` → their bot → Bot Settings → Group Privacy → **Turn off** (only needed if the bot will live in groups; DM-only can skip).
- Persist the token and sync it to the container mount with the generic setter:
2. **Capture platform IDs.** After the `/add-<channel>` skill finishes (or after inline credentials for Telegram), you need two values: the operator's user-id on that platform, and the chat/channel platform-id. Each channel surfaces these differently — consult the **Channel Info** section at the bottom of that skill's `SKILL.md` for the exact path. For Telegram, run `pnpm exec tsx setup/index.ts --step pair-telegram -- --intent <main|wire-to:folder|new-agent:folder>` directly and follow its `PAIR_TELEGRAM_ISSUED`/`PAIR_TELEGRAM STATUS=success` blocks — `PLATFORM_ID` and `ADMIN_USER_ID` land in the success block.
3. **Wire the agent.** Run `init-first-agent.ts` in DM mode:
```
pnpm exec tsx scripts/init-first-agent.ts \
--channel <channel> \
--user-id "<platform-user-id>" \
--platform-id "<platform-chat-id>" \
--display-name "<OPERATOR_NAME>" \
--agent-name "<AGENT_NAME>"
```
4. **Announce.** On success, emit the encouragement line verbatim:
> Your agent is now available on {channel-name}, you can already start chatting — But I encourage you to continue and finish this setup, we're almost done!
Substitute `{channel-name}` with the friendly name (e.g. "Telegram", "WhatsApp", "Slack").
If the user skipped, move on to step 11.
### 11. Host directory access
By default, agent containers can only touch their own workspace. If the user wants the agent to read or write files in specific host directories, those paths need to go on the mount allowlist.
Use `AskUserQuestion`:
- **Question**: "Want your agent to read or write files in any host directories (e.g. a code project, `~/Documents`)?"
- **Header**: "Host mounts"
- **Options**:
1. `Keep isolated` — "Agent only touches its own workspace (Recommended)."
2. `Add host paths` — "I'll name the directories to allowlist via Other."
If they pick "Add host paths" (or name paths via Other), invoke `/manage-mounts` via the Skill tool to add them. If they keep it isolated or skip, move on.
### 12. Quality of life
Optional polish. Print the list; the user may pick zero, one, or several — invoke each chosen skill in sequence:
> Want to add any of these? Pick any that sound useful — or skip:
> - `/add-compact` — `/compact` slash command for managing long sessions
> - `/add-karpathy-llm-wiki` — persistent knowledge-base memory for the agent
If the probe reports `PLATFORM=darwin`, also offer:
> - `/add-macos-statusbar` — macOS menu bar indicator with Start/Stop/Restart controls
Do **not** list `/add-macos-statusbar` on Linux. If the user skips everything, just move on.
### 13. Done
Short wrap-up:
> Setup complete. You can chat with your agent on {channel-name} — or via CLI with `pnpm run chat <message>`.
Substitute `{channel-name}` with whatever was wired in step 10. If step 10 was skipped, drop the "on {channel-name} — or" clause entirely so the line just mentions the CLI form.
## If anything fails
Any step that reports `STATUS: failed` in its status block: read `logs/setup.log` (or `logs/nanoclaw.log` for runtime failures), diagnose, fix the underlying cause, re-run the same `--step`. Don't bypass errors to keep moving.
description: Run initial NanoClaw setup. Use when user wants to install dependencies, authenticate messaging channels, register their main channel, or start the background services. Triggers on "setup", "install", "configure nanoclaw", or first-time setup requests.
description: Run initial NanoClaw setup. Use when user wants to install NanoClaw, configure it, or go through first-time setup. Triggers on "setup", "install", "configure nanoclaw", or first-time setup requests.
---
# NanoClaw Setup
Welcome the user to NanoClaw. Introduce yourself — you'll be walking them through the entire setup process step by step, from installing dependencies to getting their first message through. Keep it warm and brief (2-3 sentences).
Tell the user to run `bash nanoclaw.sh` in their terminal. That script handles the full end-to-end setup — dependencies, container image, OneCLI vault, Anthropic credential, service, first agent, and optional channel wiring.
Then explain that setup involves running many shell commands (installing packages, building containers, starting services), and recommend pre-approving the standard setup commands so they don't have to confirm each one individually.
Use `AskUserQuestion` with these options:
1. **Pre-approve (recommended)** — description: "Pre-approve standard setup commands so you don't have to confirm each one. You can review the list first if you'd like."
2. **No thanks** — description: "I'll approve each command individually as it comes up."
3. **Show me the list first** — description: "Show me exactly which commands will be pre-approved before I decide."
If they pick option 1: read `.claude/skills/setup/setup-permissions.json`, then read the project settings file at `.claude/settings.json` (create it if it doesn't exist with `{}`), and directly edit it to add/merge the permissions into the `permissions.allow` array. Do NOT use the `update-config` skill.
If they pick option 3: read and display `.claude/skills/setup/setup-permissions.json`, then re-ask with just options 1 and 2.
If they decline, continue — they'll approve commands individually.
---
**Internal guidance (do not show to user):**
- Run setup steps automatically. Only pause when user action is required (channel authentication, configuration choices).
- Setup uses `bash setup.sh` for bootstrap, then `npx tsx setup/index.ts --step <name>` for all other steps. Steps emit structured status blocks to stdout. Verbose logs go to `logs/setup.log`.
- **Principle:** When something is broken or missing, fix it. Don't tell the user to go fix it themselves unless it genuinely requires their manual action (e.g. authenticating a channel, pasting a secret token). If a dependency is missing, install it. If a service won't start, diagnose and repair.
- **UX Note:** Use `AskUserQuestion` for multiple-choice questions only (e.g. "which credential method?"). Do NOT use it when free-text input is needed (e.g. phone numbers, tokens, paths) — just ask the question in plain text and wait for the user's reply.
- **Timeouts:** Use 5m timeouts for install and build steps.
- **Waiting on user:** When the user needs to do something (change a setting, get a token, open a browser, etc.), stop and wait. Give clear instructions, then say "Let me know when done or if you need help." Do NOT continue to the next step. If they ask for help, give more detail, ask where they got stuck, and try to assist.
## 0. Git Upstream
Ensure `upstream` remote points to `qwibitai/nanoclaw`. If missing, add it silently:
If "Migrate now": invoke `/migrate-from-openclaw`, then return here and continue at step 2a (Timezone).
## 2a. Timezone
Run `pnpm exec tsx setup/index.ts --step timezone` and parse the status block.
- If NEEDS_USER_INPUT=true → The system timezone could not be autodetected (e.g. POSIX-style TZ like `IST-2`). AskUserQuestion: "What is your timezone?" with common options (America/New_York, Europe/London, Asia/Jerusalem, Asia/Tokyo) and an "Other" escape. Then re-run: `pnpm exec tsx setup/index.ts --step timezone -- --tz <their-answer>`.
- If STATUS=success and RESOLVED_TZ is `UTC` or `Etc/UTC` → confirm with the user: "Your system timezone is UTC — is that correct, or are you on a remote server?" If wrong, ask for their actual timezone and re-run with `--tz`.
- If STATUS=success → Timezone is configured. Note RESOLVED_TZ for reference.
## 3. Container Runtime (Docker)
### 3a. Install Docker
- DOCKER=running → continue to step 4
- DOCKER=installed_not_running → start Docker: `open -a Docker` (macOS) or `sudo systemctl start docker` (Linux). Wait 15s, re-check with `docker info`.
- DOCKER=not_found → Use `AskUserQuestion: Docker is required for running agents. Would you like me to install it?` If confirmed:
- macOS: install via `brew install --cask docker`, then `open -a Docker` and wait for it to start. If brew not available, direct to Docker Desktop download at https://docker.com/products/docker-desktop
- Linux: install with `curl -fsSL https://get.docker.com | sh && sudo usermod -aG docker $USER`. Note: user may need to log out/in for group membership.
### 3b. CJK fonts
Agent containers skip CJK fonts by default (~200MB saved). Without them, Chromium-rendered screenshots and PDFs show tofu for Chinese/Japanese/Korean.
- **User writing to you in Chinese, Japanese, or Korean** → enable without asking. Mention it briefly.
- **Resolved timezone from step 2a is a CJK region** (`Asia/Tokyo`, `Asia/Shanghai`, `Asia/Hong_Kong`, `Asia/Taipei`, `Asia/Seoul`) or other signal short of active CJK use → ask: "Enable CJK fonts? Adds ~200MB, lets the agent render CJK in screenshots and PDFs."
- **Otherwise** → skip.
To enable, write `INSTALL_CJK_FONTS=true` to `.env`:
- Dockerfile syntax or missing files: diagnose from the log and fix, then retry.
**If TEST_OK=false but BUILD_OK=true:** The image built but won't run. Check logs — common cause is runtime not fully started. Wait a moment and retry the test.
## 4. Credential System
### 4a. OneCLI
Install OneCLI and its CLI tool:
```bash
curl -fsSL onecli.sh/install | sh
curl -fsSL onecli.sh/cli/install | sh
```
Verify both installed: `onecli version`. If the command is not found, the CLI was likely installed to `~/.local/bin/`. Add it to PATH for the current session and persist it:
```bash
export PATH="$HOME/.local/bin:$PATH"
# Persist for future sessions (append to shell profile if not already present)
If an Anthropic secret is listed, confirm with user: keep or reconfigure? If keeping, skip to step 5.
AskUserQuestion: Do you want to use your **Claude subscription** (Pro/Max) or an **Anthropic API key**?
1. **Claude subscription (Pro/Max)** — description: "Uses your existing Claude Pro or Max subscription. You'll run `claude setup-token` in another terminal to get your token."
2. **Anthropic API key** — description: "Pay-per-use API key from console.anthropic.com."
#### Subscription path
Tell the user:
> Run `claude setup-token` in another terminal. It will output a token — copy it but don't paste it here.
Then stop and wait for the user to confirm they have the token. Do NOT proceed until they respond.
Once they confirm, they register it with OneCLI. AskUserQuestion with two options:
1. **Dashboard** — description: "Best if you have a browser on this machine. Open ${ONECLI_URL} and add the secret in the UI. Use type 'anthropic' and paste your token as the value."
**If the user's response happens to contain a token or key** (starts with `sk-ant-`): handle it gracefully — run the `onecli secrets create` command with that value on their behalf.
**After user confirms:** verify with `onecli secrets list` that an Anthropic secret exists. If not, ask again.
## 5. Set Up Channels
Show the full list of available channels in plain text (do NOT use AskUserQuestion — it limits to 4 options). Ask which one they want to start with. They can add more later with `/customize`.
Channels where the agent gets its own identity (name and avatar) are marked as recommended.
1. Discord *(recommended — agent gets own identity)*
2. Slack *(recommended — agent gets own identity)*
3. Telegram *(recommended — agent gets own identity)*
4. Microsoft Teams *(recommended — agent gets own identity)*
5. Webex *(recommended — agent gets own identity)*
6. WhatsApp
7. WhatsApp Cloud API
8. iMessage
9. GitHub
10. Linear
11. Google Chat
12. Resend (email)
13. Matrix
**Delegate to the selected channel's skill.** Each channel skill handles its own package installation, authentication, registration, and configuration.
Run `pnpm exec tsx setup/index.ts --step service` and parse the status block.
**If FALLBACK=wsl_no_systemd:** WSL without systemd detected. Tell user they can either enable systemd in WSL (`echo -e "[boot]\nsystemd=true" | sudo tee /etc/wsl.conf` then restart WSL) or use the generated `start-nanoclaw.sh` wrapper.
**If DOCKER_GROUP_STALE=true:** The user was added to the docker group after their session started — the systemd service can't reach the Docker socket. Ask user to run these two commands:
Replace `USERNAME` with the actual username (from `whoami`). Run the two `sudo` commands separately — the `tee` heredoc first, then `daemon-reload`. After user confirms setfacl ran, re-run the service step.
**If SERVICE_LOADED=false:**
- Read `logs/setup.log` for the error.
- macOS: check `launchctl list | grep nanoclaw`. If PID=`-` and status non-zero, read `logs/nanoclaw.error.log`.
- Linux: check `systemctl --user status nanoclaw`.
- Re-run the service step after fixing.
## 7a. Wire Channels to Agent Groups
The service is now running, so polling-based adapters (Telegram) can observe inbound messages — required for pairing.
Invoke `/manage-channels` to wire the installed channels to agent groups. This step:
1. Creates the agent group(s) and assigns a name to the assistant
2. Resolves each channel's platform-specific ID (Telegram via pairing code; other channels via the platform's own ID lookup)
3. Decides the isolation level — whether channels share an agent, session, or are fully separate
The `/manage-channels` skill reads each channel's `## Channel Info` section from its SKILL.md for platform-specific guidance (terminology, how to find IDs, recommended isolation).
**This step is required.** Without it, channels are installed but not wired — messages will be silently dropped because the router has no agent group to route to.
## 7b. Dashboard & Web Applications
AskUserQuestion: Do you want to create a dashboard and build web applications?
1. **Yes (recommended)** — description: "Get a NanoClaw dashboard to monitor your agents and build custom websites however you want. Deploys to Vercel."
2. **Not now** — description: "You can add this later with `/add-vercel`."
If yes: invoke `/add-vercel`.
## 8. Verify
Run `pnpm exec tsx setup/index.ts --step verify` and parse the status block.
**If STATUS=failed, fix each:**
- SERVICE=stopped → `pnpm run build`, then restart: `launchctl kickstart -k gui/$(id -u)/com.nanoclaw` (macOS) or `systemctl --user restart nanoclaw` (Linux) or `bash start-nanoclaw.sh` (WSL nohup)
- CHANNEL_AUTH shows `not_found` for any channel → re-invoke that channel's skill (e.g. `/add-telegram`)
- REGISTERED_GROUPS=0 → re-invoke `/manage-channels` from step 7a
Tell user to test: send a message in their registered chat. Show: `tail -f logs/nanoclaw.log`
## Troubleshooting
**Service not starting:** Check `logs/nanoclaw.error.log`. Common: wrong Node path (re-run step 7), credential system not running (check `curl ${ONECLI_URL}/api/health`), missing channel credentials (re-invoke channel skill).
**Container agent fails ("Claude Code process exited with code 1"):** Ensure Docker is running — `open -a Docker` (macOS) or `sudo systemctl start docker` (Linux). Check container logs in `groups/main/logs/container-*.log`.
**No response to messages:** Check trigger pattern. Main channel doesn't need prefix. Check DB: `pnpm exec tsx setup/index.ts --step verify`. Check `logs/nanoclaw.log`.
**Channel not connecting:** Verify the channel's credentials are set in `.env`. Channels auto-enable when their credentials are present. For WhatsApp: check `store/auth/creds.json` exists. For token-based channels: check token values in `.env`. Restart the service after any `.env` change.
Check if the user migrated from OpenClaw during this setup session (i.e. `/migrate-from-openclaw` was invoked). If you're unsure (e.g. after context compaction), check for `migration-state.md` in the project root — it exists during and sometimes after migration.
Write `/tmp/nanoclaw-diagnostics.json`. No paths, usernames, hostnames, or IP addresses.
Show the entire JSON to the user and ask via AskUserQuestion: **Yes** / **No** / **Never ask again**
**Yes**:
```bash
curl -s -X POST https://us.i.posthog.com/capture/ -H 'Content-Type: application/json' -d @/tmp/nanoclaw-diagnostics.json
rm /tmp/nanoclaw-diagnostics.json
```
**No**: `rm /tmp/nanoclaw-diagnostics.json`
**Never ask again**:
1. Replace contents of `.claude/skills/setup/diagnostics.md` with `# Diagnostics — opted out`
2. Replace contents of `.claude/skills/update-nanoclaw/diagnostics.md` with `# Diagnostics — opted out`
3. Remove the `## 9. Diagnostics` section from `.claude/skills/setup/SKILL.md` and the `## Diagnostics` section from `.claude/skills/update-nanoclaw/SKILL.md`
@@ -11,14 +11,15 @@ Run `/update-nanoclaw` in Claude Code.
## How it works
**Preflight**: checks for clean working tree (`git status --porcelain`). If `upstream` remote is missing, asks you for the URL (defaults to `https://github.com/qwibitai/nanoclaw.git`) and adds it. Detects the upstream branch name (`main` or `master`).
**Preflight**: checks for clean working tree (`git status --porcelain`). If `upstream` remote is missing, asks you for the URL (defaults to `https://github.com/nanocoai/nanoclaw.git`) and adds it. Detects the upstream branch name (`main` or `master`).
**Backup**: creates a timestamped backup branch and tag (`backup/pre-update-<hash>-<timestamp>`, `pre-update-<hash>-<timestamp>`) before touching anything. Safe to run multiple times.
**Preview**: runs `git log` and `git diff` against the merge base to show upstream changes since your last sync. Groups changed files into categories:
- **Skills** (`.claude/skills/`): unlikely to conflict unless you edited an upstream skill
- **Source** (`src/`): may conflict if you modified the same files
- **Build/config** (`package.json`, `pnpm-lock.yaml`, `tsconfig*.json`): lockfile changes trigger dep install
**Update paths** (you pick one):
- `merge` (default): `git merge upstream/<branch>`. Resolves all conflicts in one pass.
@@ -30,7 +31,7 @@ Run `/update-nanoclaw` in Claude Code.
**Conflict resolution**: opens only conflicted files, resolves the conflict markers, keeps your local customizations intact.
**Validation**: runs `pnpm run build` and `pnpm test`.
**Validation**: runs `pnpm run build` and `pnpm test`. If container files changed, also runs the container typecheck and `./container/build.sh`.
**Breaking changes check**: after validation, reads CHANGELOG.md for any `[BREAKING]` entries introduced by the update. If found, shows each breaking change and offers to run the recommended skill to migrate.
@@ -68,7 +69,7 @@ If output is non-empty:
Confirm remotes:
- `git remote -v`
If `upstream` is missing:
- Ask the user for the upstream repo URL (default: `https://github.com/qwibitai/nanoclaw.git`).
- Ask the user for the upstream repo URL (default: `https://github.com/nanocoai/nanoclaw.git`).
- **Build/config** (`package.json`, `pnpm-lock.yaml`, `tsconfig*.json`): lockfile changes trigger dep install
- **Other**: docs, tests, setup scripts, misc
**Large drift check:** If the upstream commit count and age suggest the user has a lot of catching up to do, mention that `/migrate-nanoclaw` might be a better fit — it extracts customizations and reapplies them on clean upstream instead of merging. Offer it as an option but don't push.
@@ -173,11 +175,31 @@ If it gets messy (more than 3 rounds of conflicts):
- If this fails because bun types are missing (`Cannot find type definition file for 'bun'`), skip with a note — type errors will surface at container runtime instead
**Container image rebuild** (only if any `container/` files are in CHANGED_FILES):
- `./container/build.sh`
If build fails:
- Show the error.
- Only fix issues clearly caused by the merge (missing imports, type mismatches from merged code).
@@ -209,8 +231,10 @@ If one or more `[BREAKING]` lines are found:
- For each skill the user selects, invoke it using the Skill tool.
- After all selected skills complete (or if user chose Skip), proceed to Step 7 (skill updates check).
# Step 7: Check for skill updates
After the summary, check if skills are distributed as branches in this repo:
# Step 7: Check for skill and channel/provider updates
## 7a: Skill branches
Check if skills are distributed as branches in this repo:
- `git branch -r --list 'upstream/skill/*'`
If any `upstream/skill/*` branches exist:
@@ -218,7 +242,21 @@ If any `upstream/skill/*` branches exist:
- Option 1: "Yes, check for updates" (description: "Runs /update-skills to check for and apply skill branch updates")
- Option 2: "No, skip" (description: "You can run /update-skills later any time")
- If user selects yes, invoke `/update-skills` using the Skill tool.
- After the skill completes (or if user selected no), proceed to Step 8.
## 7b: Channel and provider updates
Detect installed channels by reading `src/channels/index.ts` and collecting all `import './<name>.js';` lines (excluding `cli`). For providers, check `src/providers/index.ts` the same way.
If any channels/providers are installed AND `upstream/channels` or `upstream/providers` branches exist:
- List the installed channels/providers.
- Use AskUserQuestion to ask: "Would you like to update your installed channels/providers? Re-running `/add-<name>` is safe — it only updates code files, credentials and wiring are untouched."
- One option per installed channel/provider (e.g., "Update Slack (/add-slack)")
- "Skip — I'll update them later"
- Set `multiSelect: true`
- For each selected option, invoke the corresponding `/add-<channel>` or `/add-<provider>` skill.
If no channels/providers are installed, skip silently.
Proceed to Step 8.
# Step 8: Summary + rollback instructions
Show:
@@ -232,9 +270,10 @@ Show:
Tell the user:
- To rollback: `git reset --hard <backup-tag-from-step-1>`
- Backup branch also exists: `backup/pre-update-<HASH>-<TIMESTAMP>`
- Restart the service to apply changes:
- If using launchd: `launchctl unload ~/Library/LaunchAgents/com.nanoclaw.plist && launchctl load ~/Library/LaunchAgents/com.nanoclaw.plist`
- If running manually: restart `pnpm run dev`
- Restart the service to apply changes. The unit/label names are per-install — derive them with `setup/lib/install-slug.sh`. Run from your NanoClaw project root:
- **Linux**: `source setup/lib/install-slug.sh && systemctl --user restart $(systemd_unit)` (or, if you want to confirm the unit name first: `systemctl --user list-units --type=service | grep "$(. setup/lib/install-slug.sh && systemd_unit)"`)
- **Manual** (no service found): restart `pnpm run dev`
1. Replace contents of `.claude/skills/setup/diagnostics.md` with `# Diagnostics — opted out`
2. Replace contents of `.claude/skills/update-nanoclaw/diagnostics.md` with `# Diagnostics — opted out`
3. Remove the `## 9. Diagnostics` section from `.claude/skills/setup/SKILL.md` and the `## Diagnostics` section from `.claude/skills/update-nanoclaw/SKILL.md`
4. `rm /tmp/nanoclaw-diagnostics.json`
1. Replace contents of `.claude/skills/update-nanoclaw/diagnostics.md` with `# Diagnostics — opted out`
2. Remove the `## Diagnostics` section from `.claude/skills/update-nanoclaw/SKILL.md`
All notable changes to NanoClaw will be documented in this file.
For detailed release notes, see the [full changelog on the documentation site](https://docs.nanoclaw.dev/changelog).
## [2.0.64] - 2026-05-18
- **`ncl destinations add` and `remove` through the approval flow now reach the receiver immediately.** Approved destinations weren't being projected into the receiving agent's local session state, so a freshly-added destination silently failed at `send_message` with `unknown destination`, and a removed destination stayed resolvable until the next container restart. Both now take effect the moment the approval executes. Direct (non-approval) calls were unaffected.
## [2.0.63] - 2026-05-15
Rollup release covering v2.0.55 through v2.0.63 — everything merged since the v2.0.54 tag. Starting with this release, the goal is to publish a GitHub Release for every `package.json` version bump that lands on `main`; see [RELEASING.md](RELEASING.md).
- [BREAKING] **Service names are now per-install.** On v2 installs the launchd label and systemd unit are slugged to your project root: `com.nanoclaw.<sha1(projectRoot)[:8]>` and `nanoclaw-<slug>.service`. The old `com.nanoclaw` / `nanoclaw.service` names no longer match a real service — update any copy-pasted restart or status commands. Find your install's names with `source setup/lib/install-slug.sh && launchd_label` (macOS) or `systemd_unit` (Linux). The `ncl` transport-error help text and 26 skill files now use the canonical helper-driven pattern; see [setup/lib/install-slug.sh](setup/lib/install-slug.sh).
- **Compaction destination reminder placement fixed.** The reminder injected after SDK auto-compaction now appears at the end of the compaction summary so it isn't stripped during truncation. Replaces the placement shipped in v2.0.54.
- **Stronger message-wrapping enforcement.** The poll loop nudges the agent when its output lacks `<message>` wrapping, and `CLAUDE.md` core instructions now require wrapping even for single-destination agents. The welcome flow no longer double-greets.
- **OneCLI credentials after MCP install.** MCP servers added through `add_mcp_server` now inherit OneCLI gateway routing — fixes the case where the agent kept asking for API keys after installing a new server.
- **CLI scope hardening.**`scopeField` now fails closed when scope is missing, and `sessions get` is guarded against cross-group oracle access from group-scoped agents.
- **gmail/gcal skills aligned with v2.**`/add-gmail-tool` and `/add-gcal-tool` now reflect the v2 container-config model — DB-backed mounts, no dead `TOOL_ALLOWLIST` edits, no `container.json` writes that get clobbered on next spawn. Manual sqlite3/JSON1 invocations corrected.
- **Repo-rename cleanup.** Remaining `qwibitai/nanoclaw` references swept to `nanocoai/nanoclaw` across code and docs; CI workflow guards updated so they no longer no-op after the rename.
- Slack scope checklist now includes `files:read` and `files:write` for skills that read or post attachments.
- The internal-tag description in destination instructions no longer mentions scratchpads (which confused agents into routing them incorrectly).
- Container startup is now graceful when the `on_wake` column is missing on older sessions DBs.
## [2.0.54] - 2026-05-10
- **Per-group model and effort overrides.** Agent groups can now run a specific Claude model and effort level, set via `ncl groups config update --model <model> --effort <level>`. Defaults to the host-configured model when unset.
- **Claude Code 2.1.128.** Container claude-code bumped from 2.1.116 to 2.1.128.
- CLI help text improvements for `ncl groups config` and `ncl groups restart`.
## [2.0.48] - 2026-05-09
- **Container config moved to DB.** Per-agent-group container runtime config (provider, model, packages, MCP servers, mounts, skills) now lives in the `container_configs` table instead of `groups/<folder>/container.json`. Existing filesystem configs are backfilled automatically on startup. Managed via `ncl groups config get/update` and `config add-mcp-server/remove-mcp-server/add-package/remove-package`.
- **Explicit restart with on-wake messages.** Config CLI operations no longer auto-kill containers. New `ncl groups restart` command with `--rebuild` and `--message` flags. On-wake messages (`on_wake` column on `messages_in`) are only picked up by a fresh container's first poll, preventing dying containers from stealing them during the SIGTERM grace period. Self-mod approval handlers (`install_packages`, `add_mcp_server`) use the same race-free mechanism.
- **Per-group CLI scope.** New `cli_scope` setting on container config (`disabled` / `group` / `global`, default `group`). Controls what the agent can access via `ncl` from inside the container. `disabled` excludes CLI instructions from CLAUDE.md and blocks all requests. `group` (default) restricts to own-group resources with auto-filled args. `global` gives unrestricted access (set automatically for owner agent groups). Includes post-handler result filtering to prevent cross-group data leaks and blocks `cli_scope` escalation from group-scoped agents.
## [2.0.45] - 2026-05-08
- **Admin CLI (`ncl`).** New `ncl` command for querying and modifying the central DB — agent groups, messaging groups, wirings, users, roles, members, destinations, sessions, approvals, and dropped messages. Host-side transport via Unix socket; container-side transport via session DB. Write operations from inside containers go through the approval flow. `list` supports column filtering and `--limit`. Run `ncl help` for usage.
- **v1 → v2 migration.** Run `bash migrate-v2.sh` from the v2 checkout. Finds your v1 install (sibling directory or `NANOCLAW_V1_PATH`), merges `.env`, seeds the v2 DB from `registered_groups`, copies group folders (`CLAUDE.md` → `CLAUDE.local.md`), copies session data with conversation continuity, ports scheduled tasks, interactively selects and installs channels (clack multiselect), copies container skills, builds the agent container, and offers a service switchover to test. Hands off to Claude (`/migrate-from-v1`) for owner seeding, access policy, CLAUDE.md cleanup, and fork customization porting. See [docs/migration-dev.md](docs/migration-dev.md) and [docs/v1-to-v2-changes.md](docs/v1-to-v2-changes.md).
@@ -6,8 +6,8 @@ This is NanoClaw v2, a ground-up rewrite with breaking changes throughout. It ca
**Do this instead:**
1. Abort the merge: `git merge --abort` (or `git reset --hard ORIG_HEAD` if the merge already completed).
2. Tell the user, verbatim: *"This is the v2 rewrite — it can't be merged into your existing install. Run `bash migrate-v2.sh`to migrate instead."*
3. Wait for the user to confirm before doing anything else. Do not attempt the migration yourself.
2. Tell the user, verbatim: *"This is the v2 rewrite — it can't be merged into your existing install. Exit Claude Code (or open a separate terminal) and run `bash migrate-v2.sh`from the shell."*
3. Wait for the user to confirm before doing anything else. Do not run the migration script yourself — it requires an interactive terminal and cannot be run from within Claude Code.
If you are a fresh install (you ran `git clone`, not `git pull`) and there are no conflicts, ignore this banner and continue below.
@@ -53,6 +53,8 @@ Exactly one writer per file — no cross-mount lock contention. Heartbeat is a f
`data/v2.db` holds everything that isn't per-session: users, user_roles, agent_groups, messaging_groups, wiring, pending_approvals, user_dms, chat_sdk_* (for the Chat SDK bridge), schema_version. Migrations live at `src/db/migrations/`.
For ad-hoc queries from skills or scripts, use the in-tree wrapper rather than the `sqlite3` CLI: `pnpm exec tsx scripts/q.ts <db> "<sql>"`. The host setup intentionally avoids depending on the `sqlite3` binary (`setup/verify.ts:5`); the wrapper goes through the `better-sqlite3` dep that setup already installs and verifies. Default-output format matches `sqlite3 -list` (pipe-separated, no header) so existing skill text reads identically.
## Key Files
| File | Purpose |
@@ -70,13 +72,43 @@ Exactly one writer per file — no cross-mount lock contention. Heartbeat is a f
| `scripts/init-first-agent.ts` | Bootstrap the first DM-wired agent (used by `/init-first-agent` skill) |
| `migrate-v2.sh` + `setup/migrate-v2/` | v1→v2 migration. Standalone script: `bash migrate-v2.sh`. Seeds DB, copies groups/sessions, installs channels, builds container, offers service switchover, then hands off to `/migrate-from-v1` skill for owner setup and CLAUDE.md cleanup. See [docs/migration-dev.md](docs/migration-dev.md). |
## Admin CLI (`ncl`)
`ncl` queries and modifies the central DB — agent groups, messaging groups, wirings, users, roles, and more. On the host it connects via Unix socket (`src/cli/socket-server.ts`); inside containers it uses the session DB transport (`container/agent-runner/src/cli/ncl.ts`).
@@ -91,13 +123,35 @@ Each `/add-<name>` skill is idempotent: `git fetch origin <branch>` → copy mod
One tier of agent self-modification today:
1. **`install_packages` / `add_mcp_server`** — changes to the per-agent-group container config only (apt/npm deps, wire an existing MCP server). Single admin approval per request; on approve, the handler in `src/modules/self-mod/apply.ts` rebuilds the image when needed (`install_packages` only) and restarts the container. `container/agent-runner/src/mcp-tools/self-mod.ts`.
1. **`install_packages` / `add_mcp_server`** — changes to the per-agent-group container config in the DB (apt/npm deps, wire an existing MCP server). Single admin approval per request; on approve, the handler in `src/modules/self-mod/apply.ts` rebuilds the image when needed (`install_packages` only), writes an `on_wake` message, kills the container, and respawns via `onExit` callback. The on-wake message is only picked up by the fresh container's first poll — dying containers can never steal it. `container/agent-runner/src/mcp-tools/self-mod.ts`.
A second tier (direct source-level self-edits via a draft/activate flow) is planned but not yet implemented.
## Container Config
Per-agent-group container runtime config (provider, model, packages, MCP servers, mounts, etc.) lives in the `container_configs` table in the central DB. Materialized to `groups/<folder>/container.json` at spawn time so the container runner can read it. Managed via `ncl groups config get/update` and the self-mod MCP tools.
**`cli_scope`** — controls what the agent can do with `ncl` from inside the container:
| Value | Behavior |
|-------|----------|
| `disabled` | Agent never learns about ncl (instructions excluded from CLAUDE.md). Host dispatch rejects any `cli_request`. |
| `group` (default) | Agent can access `groups`, `sessions`, `destinations`, `members` only, scoped to its own agent group. `--id` and group args are auto-filled. Cross-group access rejected. `cli_scope` changes blocked. |
| `global` | Unrestricted. Set automatically for owner agent groups via `init-first-agent`. |
`ncl groups restart --id <group-id> [--rebuild] [--message <text>]`. Kills running containers; if `--message` is provided, writes an `on_wake` message and respawns via `onExit` callback. Without `--message`, containers come back on the next user message. From inside a container, `--id` is auto-filled and only the calling session is restarted.
The `on_wake` column on `messages_in` ensures wake messages are only picked up by a fresh container's first poll iteration. This prevents the race where a dying container (still in its SIGTERM grace period) could steal the message. `killContainer` accepts an optional `onExit` callback that fires after the process exits, guaranteeing the old container is gone before the new one spawns.
API keys, OAuth tokens, and auth credentials are managed by the OneCLI gateway. Secrets are injected into per-agent containers at request time — none are passed in env vars or through chat context. `src/onecli-approvals.ts`, `ensureAgent()` in `container-runner.ts`. Run `onecli --help`.
API keys, OAuth tokens, and auth credentials are managed by the OneCLI gateway. Secrets are injected into per-agent containers at request time — none are passed in env vars or through chat context. The container agent sees this via the `onecli-gateway` container skill (`container/skills/onecli-gateway/SKILL.md`), which teaches it how the proxy works, how to handle auth errors, and to never ask for raw credentials. Host-side wiring:`src/onecli-approvals.ts`, `ensureAgent()` in `container-runner.ts`. Run `onecli --help`.
### Gotcha: auto-created agents start in `selective` secret mode
@@ -141,7 +195,7 @@ Four types of skills. See [CONTRIBUTING.md](CONTRIBUTING.md) for the full taxono
- **Channel/provider install skills** — copy the relevant module(s) in from the `channels` or `providers` branch, wire imports, install pinned deps (e.g. `/add-discord`, `/add-slack`, `/add-whatsapp`, `/add-opencode`).
@@ -157,6 +211,17 @@ Four types of skills. See [CONTRIBUTING.md](CONTRIBUTING.md) for the full taxono
Before creating a PR, adding a skill, or preparing any contribution, you MUST read [CONTRIBUTING.md](CONTRIBUTING.md). It covers accepted change types, the four skill types and their guidelines, `SKILL.md` format rules, and the pre-submission checklist.
## PR Hygiene
Before creating a PR, run these checks:
```bash
git diff upstream/main --stat HEAD
git log upstream/main..HEAD --oneline
```
Show the output and wait for approval. Installation-specific files (group files, .claude/settings.json, local configs) should not be included.
## Development
Run commands directly — don't tell the user to run them.
| Session DBs | `data/v2-sessions/<agent-group>/<session>/` — `inbound.db` (`messages_in`: did the message reach the container?), `outbound.db` (`messages_out`: did the agent produce a response?) |
Note: container logs are lost after the container exits (`--rm` flag). If the agent silently failed inside the container, there's no persistent log to inspect.
## Supply Chain Security (pnpm)
@@ -209,9 +284,10 @@ This project uses pnpm with `minimumReleaseAge: 4320` (3 days) in `pnpm-workspac
1. **Check for existing work.** Search open PRs and issues before starting:
```bash
gh pr list --repo qwibitai/nanoclaw --search "<your feature>"
gh issue list --repo qwibitai/nanoclaw --search "<your feature>"
gh pr list --repo nanocoai/nanoclaw --search "<your feature>"
gh issue list --repo nanocoai/nanoclaw --search "<your feature>"
```
If a related PR or issue exists, build on it rather than duplicating effort.
@@ -43,7 +43,7 @@ Add capabilities to NanoClaw by merging a git branch. The SKILL.md contains setu
3. Claude walks through interactive setup (env vars, bot creation, etc.)
**Contributing a feature skill:**
1. Fork `qwibitai/nanoclaw` and branch from `main`
1. Fork `nanocoai/nanoclaw` and branch from `main`
2. Make the code changes (new files, modified source, updated `package.json`, etc.)
3. Add a SKILL.md in `.claude/skills/<name>/` with setup instructions — step 1 should be merging the branch
4. Open a PR. We'll create the `skill/<name>` branch from your work
@@ -123,7 +123,8 @@ Test your contribution on a fresh clone before submitting. For skills, run the s
1. **Link related issues.** If your PR resolves an open issue, include `Closes #123` in the description so it's auto-closed on merge.
2. **Test thoroughly.** Run the feature yourself. For skills, test on a fresh clone.
3. **Check the right box** in the PR template. Labels are auto-applied based on your selection:
3. **Check for installation-specific files.** Before creating a PR, verify no installation-specific files are in your diff (see PR Hygiene in CLAUDE.md).
4. **Check the right box** in the PR template. Labels are auto-applied based on your selection:
`nanoclaw.sh` walks you from a fresh machine to a named agent you can message. It installs Node, pnpm, and Docker if missing, registers your Anthropic credential with OneCLI, builds the agent container, and pairs your first channel (Telegram, Discord, WhatsApp, or a local CLI). If a step fails, Claude Code is invoked automatically to diagnose and resume from where it broke.
<details>
<summary><strong>Migrating from NanoClaw v1?</strong></summary>
Run from a fresh v2 checkout next to your v1 install:
`migrate-v2.sh` finds your v1 install (sibling directory, or `NANOCLAW_V1_PATH=/path/to/nanoclaw`), migrates state into the v2 checkout, then `exec`s into Claude Code to finish the parts that need judgment (owner seeding, CLAUDE.local.md cleanup, fork-customisation replay).
Run the script directly, not from inside a Claude session — the deterministic side needs interactive prompts and real shell I/O for Node/pnpm bootstrap, Docker, OneCLI, and the container build.
**What it does:** merges `.env`, seeds the v2 DB from `registered_groups`, copies group folders + session data + scheduled tasks, installs the channel adapters you select, copies channel auth state (including Baileys keystore + LID mappings for WhatsApp), builds the agent container.
**What it doesn't:** flip the system service. Pick *"switch to v2"* at the prompt, or do it manually after testing — your v1 install is left untouched.
See [docs/v1-to-v2-changes.md](docs/v1-to-v2-changes.md) for what's different and [docs/migration-dev.md](docs/migration-dev.md) for development notes.
</details>
## Philosophy
**Small enough to understand.** One process, a few source files and no microservices. If you want to understand the full NanoClaw codebase, just ask Claude Code to walk you through it.
@@ -190,3 +215,5 @@ See [CHANGELOG.md](CHANGELOG.md) for breaking changes, or the [full release hist
Consolidates what's still relevant from `REFACTOR_PLAN.md` and `REFACTOR_EXECUTION.md`: open decisions, remaining work, operational patterns worth keeping. Historical PR timeline and phase framing have been dropped — the work is in the commit history.
---
## Architecture (still authoritative)
### Module tiers
Three categories, distinguished by shipping model and dependency direction:
| Tier | Where it lives | Loaded by default? | Removal cost |
| **Default modules** | `src/modules/<name>/` on main | yes — imported by `src/modules/index.ts` | edit core imports (intentional friction) |
| **Optional modules** | `src/modules/<name>/` on main (for now — see open q #7) | yes, via barrel import | delete files + barrel line + revert `MODULE-HOOK` edits |
| **Channel adapters** | `src/channels/<name>.ts` on `channels` branch | no — cherry-pick via `/add-<name>` | delete files + barrel line |
| **Providers** | on `providers` branch | no — cherry-pick via `/add-<provider>` | delete files + barrel line |
Dependency rule: **core ← default modules ← optional modules**. Optional modules must not depend on each other. Known transitional violation (flagged): `src/db/messaging-groups.ts` auto-wires `agent_destinations` when agent-to-agent is installed.
### The four registries
Full contract in [`docs/module-contract.md`](docs/module-contract.md). Summary:
- **`channels`** — fully loaded runnable branch with all channel adapters; skills cherry-pick from it.
- **`providers`** — same pattern for agent providers (OpenCode).
- **`modules` branch** — proposed but NOT created yet. See "Remaining work" below.
---
## Remaining work
### Phase 5: merge `v2` → `main`
Cut-over the refactor. Pre-reqs (already met): green build, green tests, green service boot, clean `channels` / `providers` syncs.
Open logistics:
- Release versioning: bump to `1.3.0` at merge time or cut a `v2-rc` tag first for internal testing? Non-blocking — decide at merge.
- Coordinate with anyone still running the old `main` (v1.2.53) — breaking change for them.
- Announce the new layout + the one shell command that changed (`pnpm run chat` is new default).
### `modules` branch — create, skip, or defer?
The original plan (PR #10) was to fork a `modules` branch and populate it with the 5 optional modules, so future `/add-<module>` skills pull via `git show origin/modules:path`. Three paths:
- **(a) Create it now.** Matches the `channels`/`providers` pattern for consistency. Extra surface to maintain: every core change must be merged into `modules` at phase boundaries (same cadence as channels/providers). Pays off if we ever want to make a module *truly* optional (not shipped on main).
- **(b) Skip it.** Leave all 5 optional modules shipped on main. No `modules` branch, no install skills, no cherry-picking. Simpler but loses the "opt-in" property for users who want a leaner install.
- **(c) Defer.** Ship main without the modules branch; create it later if someone actually wants to slim their install. No-cost option for now.
Recommendation leans toward (c) — we've already paid the architectural cost (tier boundary, dependency rule, registries) without needing the branch today.
### Per-module follow-ups (tracked as open questions below)
Each has a specific landing zone when we get to it:
- #19 (system vs user CLAUDE.md) — requires install-skill tooling.
---
## Operational patterns (keep using these)
### Standing checks for every PR
Non-negotiable; a unit test suite alone doesn't catch circular-import TDZ bugs:
1. `pnpm run build` clean.
2. `pnpm test` + `bun test` (in `container/agent-runner/`) all green.
3. **Service actually starts.**`gtimeout 5 node dist/index.js` (or `launchctl kickstart`) must reach `NanoClaw running`. Unit tests import individual files; only `main()` exercises the module-init order.
4. Expected boot log lines present (at least: `Central DB ready`, `Delivery polls started`, `Host sweep started`, `NanoClaw running`, plus any module lifecycle line like `OneCLI approval handler started` or `CLI channel listening`).
### Module architecture rule (TDZ bug, PR #3)
Any registry state a module writes to at import time must live in a file with **no back-edge to `src/index.ts`** — transitively. `src/index.ts` imports `src/modules/index.js` for side effects; if a module calls `registerX()` at top level and `X` lives in `src/index.ts`, the ES module loader hits a TDZ reference on the const declaration. Fix: registry state lives in its own dependency-free file (e.g. `src/response-registry.ts`). Any new registry follows the same pattern.
### Branch sync procedure
After every `v2` (or future `main`) sync into `channels` / `providers` / future `modules`:
1. **File-presence diff.** Enumerate files that existed pre-sync but are missing post-sync:
- **Branch-owned** (channels branch still needs it) → restore from pre-sync HEAD.
This has caught real losses on both `channels` (17 adapter files plus 3 setup scripts after PR #2's channel move) and `providers` (opencode files after PR #2).
2. **Cross-file consistency.** When restoring a file, check whether something *else* that also changed references it (e.g. `setup/index.ts`'s `STEPS` map).
3. **Run the standing checks** against the synced branch (not just v2).
### Prettier drift pattern
The `format:fix` pre-commit hook sometimes reformats peer files *after* the commit completes, leaving cosmetic-only diffs in the working tree. Discard with `git checkout -- <files>`. Do not re-commit the drift — it's trivial whitespace and noise.
---
## Open questions (curated)
### Design / architecture
1. **`NANOCLAW_ADMIN_USER_IDS` as the admin mechanism.** Host queries `user_roles` at container wake, collapses into env var, container compares sender IDs. Conflates identity-at-send with privilege-at-wake and forces the container to care about namespaced user IDs. Revisit during a container-runner audit.
2. **Host-side `src/providers/` registry.** One real consumer (OpenCode). A registry is probably overkill — the install skill could just edit `container-runner.ts` via `MODULE-HOOK`. Fold into the container-runner audit.
3. **Container-runner audit.**`src/container-runner.ts` has accreted wake/spawn/kill, mount assembly, OneCLI credential application, admin-ID env var, idle timers, image rebuild. Some pieces should pull apart or move into modules. Not blocking. Related to #1 and #2.
4. **Revisit destinations + A2A capability holistically.** The destination projection invariant, dual-purpose routing+ACL table, channel vs agent destination shapes, `createMessagingGroupAgent` auto-wire coupling — more machinery than the feature warrants. Phase 3 moved it out of core intact; a redesign is warranted but scoped post-refactor.
5. **Self-mod approach rethink.**_Partially addressed_ — the redundant `request_rebuild` tool was removed; approval of `install_packages` now bundles rebuild + container restart, and `add_mcp_server` approval restarts without rebuilding (bun runs TS directly). Still to consider: collapsing `install_packages` + `add_mcp_server` into a single "apply this container-config diff" approval primitive to reduce post-rebuild latency further.
6. **Per-agent-group source / per-group base image.** Self-mod today layers packages/MCP on a shared base. As groups diverge (different base images, provider configs, runtime toolchains), the shared-base assumption won't scale. Scope post-refactor.
### Distribution / operational
7. **Providers on a consolidated `modules` branch?** Staying separate for now. Revisit if a second optional provider appears.
8. **Per-group module enablement.** Modules are currently project-wide. If one agent group wants approvals and another doesn't, we'd need per-group feature flags. Flag if asked.
9. **Module removal UX.** We do not drop tables on uninstall. Is that the right default? (Alternative: `/remove-<module>` optionally runs a down migration. YAGNI until requested.)
10. **Cross-module ordering for the response dispatcher.** Registration order determines who claims a given `questionId`. IDs are disjoint in practice (`q-…` vs `appr-…`), so first-match-wins is safe. If a third response-consuming module arrives, we may need keyed dispatch.
11. **Versioned module migrations.** Reinstalls are idempotent (migrator skips anything already in `schema_version`). If a module ships a *new* migration in a later version, the install skill must append the new file + barrel entry without touching prior ones. Simplest rule: install skills are additive; content changes to an already-applied migration are a hard error.
12. **Telegram pairing imports from permissions (channels branch).**`src/channels/telegram.ts` reaches into `src/modules/permissions/db/*` for `grantRole`/`hasAnyOwner`/`upsertUser` in the pairing-bootstrap branch. Cross-branch tier violation. Fix: extract those writes into a pairing helper (e.g. `src/channels/telegram-pairing-accept.ts` or `setup/pair-telegram.ts`). Non-blocking.
### Core slotting (files not explicitly discussed)
13. **`state-sqlite.ts`, `webhook-server.ts`, `timezone.ts`.** state-sqlite is likely core (host tracker). Webhook-server likely core (channel infra). Timezone likely core utility. Confirm if any of them prove to be module-shaped during future audits.
14. **Chat SDK bridge location.**`src/channels/chat-sdk-bridge.ts` is channel infra that bridges adapters on the `channels` branch. Stays in `src/channels/` for now.
15. **OneCLI credential injection.** Lives in `container-runner.ts`. Every agent call uses it, no clean optional boundary. Stays core. Related: `onecli-approvals.ts` is bundled inside the `approvals` default module on the assumption OneCLI stays in core. If OneCLI later moves to its own module, `onecli-approvals` follows.
### Documentation
16. **CLAUDE.md content per module.** Every module ships with project.md + agent.md. Need a dedicated review pass: (a) write the missing agent-to-agent snippets, (b) audit other modules for accuracy/tone, (c) confirm `agent.md` files are actually tailored for the agent vs. copy-pastes of `project.md`.
17. **Split system CLAUDE.md from user CLAUDE.md.** Project `CLAUDE.md` and `groups/global/CLAUDE.md` mix system-authored content (module contracts, install-skill appends) with user customizations. Updates currently risk clobbering user intent. Look at a system-owned region (or separate file) that skills rewrite freely plus a user-owned one that's never touched. Related to #16.
This doc (`REFACTOR.md`) is transient — prune when open questions close; retire entirely once the refactor is fully behind us and the operational patterns have been absorbed into `CLAUDE.md` or `docs/`.
Starting with v2.0.63, the goal is to publish a GitHub Release for every `package.json` version bump that lands on `main`. Releases are cut manually by a maintainer, so there can be lag between a bump merging and its release being published. The intent is *timeliness*, not strict 1:1 correlation with every bump.
Each release ships:
- A tagged commit on `main` (`vX.Y.Z`).
- A `CHANGELOG.md` entry under `## [<version>] - <YYYY-MM-DD>`.
- A GitHub Release whose body mirrors the CHANGELOG entry plus a contributors section.
## When to cut a release
A release is cut by a maintainer publishing it. The trigger is a `package.json` bump on `main`, but the publish step is manual — there is no fixed schedule, and bumps that land back-to-back may be rolled into a single release (as v2.0.55 through v2.0.63 were). Cutting more frequently is preferable to batching: smaller releases are easier to read, pin, and revert.
## What goes in a release
`CHANGELOG.md` is the canonical record of user-visible change. The release body on GitHub mirrors it. Aim for:
- **Bold lead-ins** per major feature or fix, then a sentence-case prose explanation.
- **`[BREAKING]` prefix** for any change that requires user action. Always include the workaround inline — never link to a separate doc for the fix.
- **Doc links** for major features (relative paths into the repo, e.g. `[setup/lib/install-slug.sh](setup/lib/install-slug.sh)`).
- **Inline commands** for actionable steps, in backticks.
- **Minor items** as single plain bullets at the bottom of the entry, no bold lead-in.
- **No PR numbers** in the user-facing prose. PR references can live in the GitHub Release's `## Contributors` section.
## Publishing the release
1. Bump `package.json` and add a `CHANGELOG.md` entry in the same commit (commit message: `chore: bump version to vX.Y.Z`).
2. Once the bump commit lands on `main`, open a draft GitHub Release:
- **Tag:**`vX.Y.Z`, target `main`.
- **Title:**`vX.Y.Z` (bare version — descriptive content lives in the body, matching the CHANGELOG header pattern).
- **Body:** copy the CHANGELOG entry verbatim. Append a `## Contributors` section listing every PR author who landed work in the release window. Append a `**Full Changelog**: https://github.com/nanocoai/nanoclaw/compare/<prev-tag>...vX.Y.Z` line at the bottom.
3. If anyone in the window opened their first NanoClaw PR, add a `## New Contributors` section above `## Contributors`, with each first-timer's first PR link and an invite to Discord.
4. Publish (not just save draft).
## Rollup releases
If multiple `package.json` bumps land between two GitHub Releases (as happened between v2.0.54 and v2.0.63), the next release is a rollup: its CHANGELOG entry covers everything merged since the last released tag, and the body opens with a one-line "Rollup release covering vX.Y.Z through vX.Y.W." note. After the catchup, return to one release per bump.
## Channels and stability
NanoClaw currently ships a single channel: every published release is a stable release.
- **Latest** — the most recent release on `main`, shown as "Latest release" on the GitHub Releases page. Consumers that want auto-bump follow GitHub's `/releases/latest` pointer.
- **Stable** — currently identical to latest. NanoClaw has no separate stable branch and no pre-release/RC channel.
- **Pinned** — any tagged release. Reproducible and the recommended choice for packagers and forks; published tags are not moved or retracted.
If a pre-release channel is introduced later (e.g. `vX.Y.Z-rc.N`), those releases will be marked "Pre-release" on GitHub so they do not become the `latest` pointer, and this section will be updated to describe the promotion path.
The tag is the source of truth — a GitHub Release's `target_commitish` always points to a tagged commit.
lines.push('To send a message, wrap it in a `<message to="name">...</message>` block.');
lines.push('You can include multiple `<message>` blocks in one response to send to multiple destinations.');
lines.push('Text outside of `<message>` blocks is scratchpad — logged but not sent anywhere.');
lines.push('Use `<internal>...</internal>` to make scratchpad intent explicit.');
lines.push('');
lines.push(
'To send a message mid-response (e.g., an acknowledgment before a long task), call the `send_message` MCP tool with the `to` parameter set to a destination name.',
'Wrap each delivered message in a `<message to="name">…</message>` block; include several blocks in one response to address several destinations. `<internal>…</internal>` marks thinking you don\'t want sent.',
);
lines.push('');
lines.push(
'When replying to an incoming message, default to addressing the destination it came `from` (every inbound `<message>` tag carries a `from="name"` attribute). Pick a different destination when the request asks for it (e.g., "tell Laura that…").',
);
lines.push('');
lines.push(
'The `send_message` MCP tool is the same delivery, available mid-turn — handy for a quick acknowledgment ("on it") before a slow tool call. Each `send_message` call and each final-response `<message>` block lands as its own message in the conversation, so they read as a sequence rather than as one combined reply.',
The `ncl` command is available at `/usr/local/bin/ncl`. It lets you query and modify NanoClaw's central configuration.
### Usage
```
ncl <resource> <verb> [--flags]
ncl <resource> help
ncl help
```
### Scope
Your CLI access may be scoped. Run `ncl help` to see which resources are available and whether args are auto-filled. Under `group` scope (the default), `--id` and group-related args are auto-filled to your agent group — you don't need to pass them.
### Resources
Run `ncl help` for the full list. Common resources:
- **Looking up your own config** — `ncl groups get` or `ncl groups config get` to see your container config.
- **Restarting your container** — `ncl groups restart` (with optional `--rebuild` and `--message`).
- **Checking who's in your group** — `ncl members list`.
- **Seeing your destinations** — `ncl destinations list`.
- **Answering questions about the system** — query `ncl` rather than guessing.
### Access rules
Read commands (list, get) are open. Write commands (create, update, delete, restart, config update, add, remove) require admin approval — the request is held until an admin approves it.
### Approval flow
Write commands require admin approval. Here's what happens:
1. You run the command (e.g. `ncl groups config update --model claude-sonnet-4-5-20250514`).
2. The command returns immediately with an `approval-pending` response — it has **not** been executed yet.
3. An admin or owner gets a notification showing exactly what you requested, with approve/reject options.
4. Once the admin responds:
- **Approved:** the command executes and the result is delivered back to you as a system message in this conversation.
- **Rejected:** you get a system message saying the request was rejected.
You don't need to poll or retry — the result arrives automatically.
### Examples
```bash
# Read commands (no approval needed)
ncl groups get
ncl groups config get
ncl sessions list
ncl destinations list
ncl members list
# Write commands (approval required)
ncl groups restart
ncl groups restart --rebuild --message "Config updated."
ncl groups config update --model claude-sonnet-4-5-20250514
ncl groups config add-mcp-server --name rss --command npx --args '["some-rss-mcp"]'
ncl groups config add-package --npm some-package
ncl members add --user telegram:jane
```
### Important
Config changes via `ncl groups config update` do not take effect until `ncl groups restart`. Run `ncl groups config help` for details.
### Tips
- Use `ncl <resource> help` to see all available fields, types, enums, and which fields are auto-filled.
- Flags use `--hyphen-case` (e.g. `--agent-group-id`), mapped to `underscore_case` DB columns automatically.
- `list` supports filtering by any non-auto column. Default limit is 200 rows; override with `--limit N`.
- Write commands return `approval-pending` immediately — don't treat this as an error. Wait for the system message with the result.
Your final response is delivered via the `## Sending messages`rules in your runtime system prompt (single-destination: just write; multi-destination: use `<message to="name">...</message>` blocks). See that section for the current destination list.
**Every response** must be wrapped in `<message to="name">...</message>`blocks — even if you only have one destination. Bare text outside of `<message>` blocks is scratchpad (logged but never sent). See the `## Sending messages` section in your runtime system prompt for the current destination list and names.
Do not ask the user to give you credentials. Credentials are managed by the user in the OneCLI agent vault. Add a "placeholder" string instead of the credential, and ask the user to add the credential to the vault. You can make a test request before the secret is added and the vault proxy will respond with the local url of the vault dashboard on the user's machine and a link to a form for adding that specific credential.
Do not ask the user to give you credentials or tell them how to create credentials (OAuth, API keys, etc.) — NEVER fabricate credential setup instructions. Credentials are handled by the OneCLI gateway. Use `"onecli-managed"` as the placeholder value for any credential env vars or config fields. After the MCP server is installed and the container restarts, load `/onecli-gateway` for the full credential-handling flow (connect URLs, stubs, error recovery).
Your HTTP requests go through the OneCLI proxy, which injects real credentials automatically. Just call any API directly (Gmail, GitHub, Slack, etc.) — the proxy adds auth before it reaches the service.
Use any method: curl, Python, a CLI tool, whatever fits. If a tool checks for credentials locally, pass any placeholder value — the proxy replaces it with real credentials at request time.
If you get a `401`/`403`/`app_not_connected`, the error response contains a `connect_url` — you MUST show it to the user as a bare URL on its own line (no angle brackets, no markdown link syntax) so they can click to connect. Run `/onecli-gateway` for the full error-handling flow. Never ask the user for API keys or tokens.
description: Introduce yourself to a newly connected channel. Triggered automatically when a channel is first wired. Send a friendly greeting and brief overview of what you can do.
---
# /welcome — Channel Onboarding
# /welcome — Channel Onboarding (Updated)
You've just been connected to a new messaging channel. Introduce yourself to the user.
You've just been connected to a new user. This your time to shine and make a strong first impression. Introduce yourself and guide the user through what you can do. you got this!
## What to do
1. Send a short, warm greeting using `send_message`
2. Mention your name (from your CLAUDE.md)
3. Make it clear you can do a lot — but do NOT list your tools or skills upfront. Keep it open-ended and intriguing
4. End by asking: would they like to explore what you can do, or jump straight into building/creating something?
1. Send a short, warm greeting
2. State your name (from your system prompt / CLAUDE.md)
3. Signal that you're capable of a lot — but don't list everything upfront. Be intriguing, not encyclopedic
4. Ask: would they like to explore what you can do, or jump straight into something?
**If they want to explore:** show one skill or capability at a time. Briefly explain what it does, offer to demo it or let them try it, then ask if they want to see the next one or move on. Drip-feed — never dump a list.
**If they want to explore:** drip-feed one capability at a time. Briefly explain it, offer to demo a compelling example or let them try it. Never dump a full list.
**If they want to jump in:** just go. Help them with whatever they ask.
**If they want to jump in:** just go.
---
## Capabilities to reveal (in order)
Reveal these one at a time, in this sequence. Each should be 2–4 sentences max.
### 1. Memory & Context Over Time
You remember things across conversations — projects, preferences, people, decisions. Users don't have to re-explain context every session. The more they work with you, the more situationally aware you become.
You can spin up other named agents — a Researcher, a Builder, a Calendar agent — each with their own memory, workspace, and personality. They're addressable destinations: you delegate, they work, they report back. These aren't one-shot tasks; they accumulate context across sessions.
### 3. Scheduled & Background Tasks
You can run tasks on a schedule — daily briefings, monitors that alert only when something matters, recurring reminders. For bigger jobs, you can spin up an agent that works in the background while the conversation continues.
### 4. Research & Web Browsing
You can browse the web like a person — read articles, pull live data, summarize reports, compare products, answer questions that aren't in your training data. Ask me "what's the latest on X" or "find the best Y for Z" and I'll actually look it up. Very powerful when combined with scheduled tasks.
### 5. Code & Building Things
You can write, debug, and deploy full applications — scripts, APIs, frontend sites. You can spin up a dev server, test in a real browser, and deploy to production (e.g. Vercel). Concept to live URL.
### 6. Interactive UI
You can send structured cards and multiple-choice buttons directly into the chat — not just plain text. Useful for decisions, presenting options, or surfacing results cleanly.
### 7. Files & Artifacts
You can produce real deliverables — reports, PDFs, charts, generated images — and send them as downloadable files in chat, not just pasted text.
### 8. Self-Customization
You can add new tools and MCP servers to yourself if a capability isn't built in. You can extend your own toolkit when the task requires it.
---
## Trust & Control — always include these
After the capabilities tour (or woven in naturally), cover these two points. Frame them positively — users stay in control.
### Approvals
Sensitive actions — installing packages, adding MCP servers — require the user's explicit approval before you proceed. They'll get a prompt; nothing happens automatically. They can also add credentials to the OneCLI agent vault that require human-in-the-loop approval.
### Access Control
The user owns who can talk to you. Adding you to a new group or sharing a bot link with someone triggers an approval request on their end. Nobody interacts with you without their say-so.
---
## How to interact — always mention this
There are no special commands. Users just talk naturally. If they want something done, they say so. That's it.
---
## Wrapping up
After the tour, finish with an open invitation. Ask if they want help with something specific. Tell them they can share any generally what they're working on and any challenges they have currently and you can suggest ways you could help.
---
## Tone
Warm, confident, and inviting. Make the user feel like they just unlocked something powerful. Match the channel's vibe (casual for Telegram/Discord, slightly more professional for Slack/Teams/email).
Warm, confident, inviting. Make the user feel like they just unlocked something powerful. Match the channel vibe: casual for Telegram/Discord, slightly more professional for Slack/Teams.
## Important
- Scan your available MCP tools and skills so you know what you have — but keep that knowledge in your back pocket. Reveal capabilities naturally, one at a time, only when relevant or when the user asks to explore.
- Never overwhelm with a full list. Discovery should feel like unwrapping, not reading a manual.
- Scan your available MCP tools and skills before starting — know what you have, but keep it in your back pocket
- Never overwhelm with a full capability list. Discovery should feel like unwrapping, not reading a manual
- Confirmations and corrections from the user during onboarding are feedback — save them to memory for future sessions
**`qwibitai/nanoclaw`** (upstream) — core engine with skill definitions (`.claude/skills/`). No channel code on `main`.
**`nanocoai/nanoclaw`** (upstream) — core engine with skill definitions (`.claude/skills/`). No channel code on `main`.
**Channel forks** (`nanoclaw-whatsapp`, `nanoclaw-telegram`, `nanoclaw-slack`, etc.) — each fork = upstream + one channel's code applied. Users clone upstream, then merge a fork into their clone to add a channel.
### 1. [FIXED] Resume branches from stale tree position
When agent teams spawns subagent CLI processes, they write to the same session JSONL. On subsequent `query()` resumes, the CLI reads the JSONL but may pick a stale branch tip (from before the subagent activity), causing the agent's response to land on a branch the host never receives a `result` for. **Fix**: pass `resumeSessionAt` with the last assistant message UUID to explicitly anchor each resume.
Both timers fire at the same time, so containers always exit via hard SIGKILL (code 137) instead of graceful `_close` sentinel shutdown. The idle timeout should be shorter (e.g., 5 min) so containers wind down between messages, while container timeout stays at 30 min as a safety net for stuck agents.
### 3. Cursor advanced before agent succeeds
`processGroupMessages` advances `lastAgentTimestamp` before the agent runs. If the container times out, retries find no messages (cursor already past them). Messages are permanently lost on timeout.
**Symptoms**: `Container exited with code 125: pull access denied for nanoclaw-agent` — the container image disappears overnight or after a few hours, even though you just built it.
**Cause**: If your container runtime has Kubernetes enabled (Rancher Desktop enables it by default), the kubelet runs image garbage collection when disk usage exceeds 85%. NanoClaw containers are ephemeral (run and exit), so `nanoclaw-agent:latest` is never protected by a running container. The kubelet sees it as unused and deletes it — often overnight when no messages are being processed. Other images (docker-compose services) survive because they have long-running containers referencing them.
**Fix**: Disable Kubernetes if you don't need it:
```bash
# Rancher Desktop
rdctl set --kubernetes-enabled=false
# Then rebuild the container image
./container/build.sh
```
**Diagnosis**: Check the k3s log for image GC activity:
- [x] Setup flow wired to channels (channel skills + /manage-channels for registration + verify.ts checks all tokens)
- [x] Channel Info metadata in each channel skill (type, terminology, how-to-find-id, isolation defaults)
- [x] /manage-channels skill (wire channels to agent groups with three isolation levels)
- [x] /init-first-agent skill (standalone first-agent bootstrap; walks the operator through channel pick → identity lookup → DM platform_id resolution → wire → welcome DM; fallback to telegram pair-code or "DM the bot first" lookup for channels without cold DM)
- [x] Cold-DM infrastructure — `ChannelAdapter.openDM?(handle)` optional method, resolved via Chat SDK `chat.openDM` for resolution-required channels (Discord, Slack, Teams, Webex, gChat) and fall-through to the handle directly for direct-addressable channels (Telegram, WhatsApp, iMessage, Matrix, Resend). `src/user-dm.ts::ensureUserDm` caches every resolution in `user_dms` so subsequent cold DMs are a DB read.
**Goal:** get the user out of Claude Code and into their messaging app as quickly as possible, then enable every part of customization, configuration, and setup from inside the chat app. Claude Code is the bootstrap, not the home.
- [~] Minimum-viable bootstrap in Claude Code: install deps, pick one channel, authenticate it, wire it to a default agent group, hand off — nothing else required before the user can leave Claude Code. `/setup` handles deps/auth, `/init-first-agent` handles the first-agent wiring + welcome DM. Still TODO: single top-level entrypoint that composes both, and a true "nothing else required" handoff (today `/setup` still runs through `/manage-channels` for additional channels).
- [~] Post-handoff welcome message in the chat app guides the user through remaining setup (channels, skills, integrations, memory, scheduling, etc.) — `/init-first-agent` stages a `kind:'chat'` / `sender:'system'` welcome prompt that the agent DMs back to the operator via the normal delivery path. Current prompt just introduces the agent; TODO: expand the prompt (or follow-up flow) to walk through remaining setup tasks from within the chat.
- [ ] Add more channels from chat (currently requires returning to Claude Code to run `/add-*` skills)
- [ ] Self-register agent into a new chat room from chat: user gives the agent a channel/group name + approval, and the agent joins via the underlying adapter (e.g. Baileys for WhatsApp), wires the room to an agent group, and posts a first "hi, I'm here" message — no manual invite, no `/add-*` skill, no terminal
- [ ] Authenticate channels from chat (OAuth/token entry via cards, no terminal required)
- [ ] Add credentials / secrets to the OneCLI vault from chat via rich card (agent collects API keys, OAuth tokens, and other secrets through a card flow and writes them into the vault — no `.env` editing, no terminal)
- [ ] Wire channels to agent groups from chat (today lives in `/manage-channels` Claude Code skill — port to in-chat flow with isolation-level question cards)
- [ ] Create new agent groups from chat (`create_agent` exists — expose via user-facing flow, not just agent-called tool)
- [ ] Edit agent group CLAUDE.md / instructions from chat
- [ ] Install / uninstall / configure skills from chat (see Skills & Marketplace section)
- [ ] Install / configure MCP servers from chat (see Skills & Marketplace section)
- [ ] Install packages from chat (today agent can request install_packages — expose a direct user-facing "install X" flow)
- [ ] Manage destinations from chat (list, rename, revoke)
- [ ] Manage permissions from chat (admin list, role assignment, approval policies)
- [ ] Trigger /setup, /debug, /customize, /migrate-nanoclaw from chat (today all require Claude Code)
- [ ] View and edit memory from chat
- [ ] Visualize current setup from chat (ties into Container Skills: installation diagram)
- [ ] Export / share setup from chat (ties into Container Skills: end-of-setup diagram + share)
- [ ] Fallback to Claude Code only when a change requires a code edit the agent can't self-apply (and even then, agent should offer to open Claude Code on the user's behalf)
## Product Focus
**North star:** prioritize skills, flows, and custom setups. Platform work (channels, routing, session DBs, approval flows, MCP tools) is plumbing — it should reach a "boring and reliable" state and then stop absorbing attention. The interesting surface area is what users can *build on top* of that plumbing: skills that add capabilities, conversational flows that orchestrate those skills, and custom per-user setups that compose channels/agents/skills/memory into something personal.
- [ ] Every new feature request should be answered first with "is this a skill?" before being answered with "is this a platform change?"
- [ ] Skills should be the primary extension mechanism users and agents reach for — adding, removing, browsing, editing, debugging
- [ ] Flows (multi-step interactive sequences: setup, onboarding, migration, customize, debug) should be authorable as skills rather than hardcoded into the platform
- [ ] Custom setups (diverging from defaults: multiple agents, cross-channel routing, per-group memory, specialist sub-agents) should be composable from existing primitives without touching core platform code
- [ ] Platform-level work gets budgeted against the question: "does this unblock a class of skills/flows/setups that's otherwise impossible?"
## Routing
- [x] Inbound routing (platform ID + thread ID -> agent group -> session)
- [x] Auto-create messaging group on first message
- [x] Session resolution (shared vs per-thread modes)
- [x] Message writing to session DB with seq numbering
- [x] Host sweep picks up due messages and advances recurrence
- [x] Scheduled outbound messages (no container wake needed)
- [ ] Pre-agent scripts (formatter references scriptOutput but no execution logic)
## Permissions and Approval Flows
- [x] User-level privilege model — `users` + `user_roles` (owner / admin, global or scoped to an agent group). Replaces the old `agent_groups.is_admin` / `messaging_groups.admin_user_id` coupling. See `src/modules/permissions/db/users.ts`, `src/modules/permissions/db/user-roles.ts`, `src/modules/permissions/access.ts`.
- [x] Admin-only command filtering — gate runs host-side in `src/command-gate.ts`, querying `user_roles` directly. The container receives no admin identity (no env var, no fallback).
- [x] Approval routing — `pickApprover` (scoped admin → global admin → owner, dedup) + `pickApprovalDelivery` (first reachable, same-channel-kind tie-break); delivery lands in the approver's DM via `ensureUserDm` / `user_dms` cache. See `src/modules/approvals/primitive.ts`, `src/modules/approvals/onecli-approvals.ts`.
- [x] install_packages (apt/npm, admin approval, name validation both sides, max 20 per request; on approve → handler rebuilds the image, kills the container, schedules a verify-and-report follow-up prompt)
- [x] add_mcp_server (admin approval; on approve → handler updates `container.json`, kills the container — no image rebuild)
- [x] Fire-and-forget model (write request, return immediately; chat notification on approval; container killed so next wake picks up new config/image)
- [~] OneCLI integration for human-loop approvals on credentialed requests (agent touching a credentialed resource → OneCLI gates → approval card to admin → OneCLI releases credential) — SDK 0.3.1 `configureManualApproval` wired into host, routes to admin via existing `pending_approvals` infra
- [ ] Tunneled OneCLI dashboard for credential addition (Telegram Mini Apps aside, iMessage without Apple Business Register, Matrix, email). Signed short-lived URL → browser form served by OneCLI at 10254 → tunnel via cloudflare durable object. Value never touches the chat surface.
- [ ] Self-modification via direct source edits — planned draft/activate flow: RO baseline mount at `/app/src`, RW draft at `/workspace/src-draft`, atomic snapshot into `pending`, admin approval, `cp -a` into baseline, restart + deadman rollback. Unifies runner src, host src, migrations, package.json, container config through one edit path. Collapses the abandoned `create_dev_agent`/`request_swap` dev-agent-in-worktree approach.
- [x] Per-agent local-name routing map (channels and peer agents referenced by local names)
- [x] Destinations stored in inbound.db `destinations` table (moved from JSON file in `b591d7c`) — single source of truth, no separate file
- [x] Host writes the destination map into inbound.db before every container wake; container queries it live on every lookup so admin changes take effect mid-session
- [x] Host delivery to target agent's session DB (`channel_type='agent'` routing in `src/delivery.ts`)
- [x] Agent spawning a new sub-agent (`create_agent` MCP tool, available to any agent, path-traversal guarded)
- [x] Dynamic agent group creation (folder + optional CLAUDE.md at runtime)
- [x] Internal-only agents (agents created without a channel attached)
- [x] Permission delegation from parent to child (bidirectional destination rows inserted at creation)
- [x] Bidirectional routing via inherited routing context; sender info enriched on the target side
- [ ] Specialist sub-agents (browser agent, dev agent — user's agent delegates with request/approval)
- [ ] Browser agent with per-destination permissions between main agent and browser agent (main requests navigation/interaction; browser agent executes in isolated container)
- [ ] Sanitization of browser agent responses before handing back to main agent (strip scripts, inline images, untrusted HTML; prevent prompt injection from web content)
- [ ] Same permission + sanitization model for any sub-agent that accesses sensitive data sources (files, DBs, third-party APIs)
## In-Chat Agent Management
- [x] /clear (resets session)
- [x] /compact (triggers context compaction)
- [~] /context (passes through, no rich formatting)
- [~] /usage (passes through, no rich formatting)
- [~] /cost (passes through, no rich formatting)
- [ ] Smooth session transitions: load context into new sessions, solve cold start problem
- [ ] MCP server marketplace — discover, preview, install
- [ ] Browse skills / MCP marketplace from chat (cards with search, preview, install)
- [ ] Local voice transcription skill — "just works" install flow: when the user sends a voice message and no transcription backend is installed, the agent asks once ("Install local voice transcription?"), and on approval the skill installs a fully-local speech-to-text model (no cloud calls). Subsequent voice messages transcribe automatically.
- [ ] Fully local NanoClaw — OpenCode + Gemma 4 as the agent provider instead of Claude Code, so an entire install can run with zero cloud inference. Requires wiring OpenCode as an agent provider (see Agent Providers) and a setup path that picks local models, pulls weights, and verifies everything runs offline.
## Container Skills
Container skills live inside agent containers at runtime (`container/skills/`) and are loaded into every agent session. These are distinct from feature/operational skills that ship with the host.
- [ ] Customize container skill — agent-driven customization flow (add channel, integration, behavior change) usable from inside any agent session, not just the main repo
- [ ] Debug container skill — inspect logs, session DB, MCP server state, container env, recent errors from inside the agent
- [ ] Build-system container skills:
- [ ] Karpathy LLM Wiki builder (agent scaffolds a persistent wiki knowledge base for a group)
- [ ] Generic build-system framework for agent-authored sub-systems
- [ ] NanoClaw installation diagram skill — agent generates a visual diagram of the user's current setup (agent groups, channels, wirings, destinations, sub-agents, installed packages/MCP servers)
- [ ] Video replay skill — generate Remotion (or similar) videos that replay chat flows and sessions, referencing good UI patterns to produce shareable clips
- [ ] Excitement trigger skill — detects when the user expresses excitement about the agent's capabilities or their setup, and proactively encourages generating a diagram + sharing it
- [ ] End-of-migration diagram skill — at the end of `/migrate-nanoclaw` (or any migration flow), agent generates a visual diagram of the resulting setup and suggests sharing
- [ ] End-of-setup diagram skill — at the end of first-time `/setup`, agent generates a visual diagram and suggests sharing (merges the old "Generate visual diagram of customized instance at end of setup" line from Channel Adapters)
## Webhook Ingestion
- [ ] Generic webhook endpoint for external events
- [ ] GitHub webhook handling
- [ ] CI/CD notification handling
- [ ] Webhook -> messages_in routing
## System Actions
- [ ] register_group from inside agent
- [ ] reset_session from inside agent
- [ ] Delivery failures should round-trip back to the agent as system messages so it can decide how to recover (retry as plain text, simplify, give up), with a hard retry cap + poison-pill backstop in delivery.ts to keep the queue healthy
## Integrations
- [x] Vercel CLI integration in setup process
- [x] Skills for deploying and managing Vercel websites from chat
- [ ] Shared memory with approval flow (write to global memory requires admin approval)
- [ ] Agent memory system skills — skills for building and managing memory systems for an agent: archive/index large collections of files and data, then expose a memory interface the agent can query and update (e.g. QMD-style systems)
## Migration
- [ ] Custom skill/code porting
- [ ] OneCLI migration check — determine if existing installs need OneCLI re-init (credentials re-scoped to new `agent_group.id` identifier, new SDK version, approval handler registered). If needed, add a migration step to `/update-nanoclaw` or a dedicated skill.
Compose agent instructions from a shared base, skill/tool fragments, and per-group memory — replacing the current per-group CLAUDE.md with a host-regenerated entry point.
## Problem
Today each agent group has a single RW `groups/<folder>/CLAUDE.md`, written once at init and never updated. Consequences:
- Upstream improvements to shared agent guidance don't propagate to existing groups
- No way to ship tool-specific guidance with the tool itself (e.g., an agent-browser usage fragment)
- Human-authored identity and agent-accumulated memory live in the same file with no separation
- The `.claude-global.md` symlink + `groups/global/CLAUDE.md` pattern handled the shared base but not per-module fragments
## Design
**Principle: RW = per-group memory, RO = shared content.** Same rule that governs the shared-source refactor, applied to agent instructions.
- `groups/<folder>/.claude-fragments/<name>.md` → `/app/skills/<name>/instructions.md` (for each enabled skill that ships a fragment)
Claude Code auto-loads `CLAUDE.local.md` from cwd without an import line — native behavior. Agent memory works natively; composition only wraps around it.
### Module fragment contract
**Skills.** A skill optionally ships an `instructions.md` at the top of its directory:
When the skill is enabled for a group, the host imports `instructions.md` into the composed CLAUDE.md. `SKILL.md` semantics are unchanged — Claude Code still uses it for skill discovery and on-demand invocation. Most skills won't need an `instructions.md` (SKILL.md is sufficient for on-demand skills); it's only for guidance that should be in context at all times.
**MCP servers.** A `container.json` MCP server entry can contribute a fragment inline:
```jsonc
{
"mcpServers": {
"my-db": {
"command": "...",
"instructions": "Read-only access to the production DB. Never run UPDATE/DELETE without admin approval."
}
}
}
```
Host writes the inline content to `.claude-fragments/mcp-<server-name>.md` at spawn and imports it.
**Global CLIs baked into the image** (agent-browser, vercel, claude-code) have always-present guidance; it belongs in `container/CLAUDE.md`, not as a conditional fragment. Don't try to make universally-present tools dynamic.
### Identity vs memory
All per-group content — human-authored identity ("you are the research agent, be terse") and agent-accumulated memory (inventories, user preferences, learned patterns) — lives in a single `CLAUDE.local.md`. Both humans and agents can edit it.
If the distinction becomes operationally important later (agents confused about what they were told vs. what they learned), split into `identity.md` (human-authored, imported into composed CLAUDE.md) + `CLAUDE.local.md` (agent memory only). Starting with one file.
## Changes
### `container/CLAUDE.md` (new)
Write the shared base: general NanoClaw context, how to engage with users, output conventions, anything that should apply to every agent across every group. Seed from current `groups/global/CLAUDE.md`.
### `container/skills/<name>/instructions.md` (optional, per skill)
Add for any skill that warrants always-in-context guidance. Optional.
### `container.json` schema
Add optional `instructions` field (string) to each MCP server entry.
### `container-runner.ts` spawn-time sync
Extend the skill-symlink sync function (added in the shared-source refactor) to also compose CLAUDE.md. On every spawn:
1. Sync `.claude-shared/skills/<name>` symlinks from `container.json` skill selection.
3. For each enabled skill with an `instructions.md`, create `.claude-fragments/<name>.md` symlink → `/app/skills/<name>/instructions.md`.
4. For each `container.json` MCP server with an `instructions` field, write the inline content to `.claude-fragments/mcp-<server-name>.md`.
5. Write `groups/<folder>/CLAUDE.md` atomically (temp + rename) with import lines in a deterministic order: shared base → skill fragments (alphabetical) → MCP fragments (alphabetical).
6. Remove stale symlinks and fragment files for modules no longer enabled.
### `group-init.ts`
- Stop writing an initial `groups/<folder>/CLAUDE.md` at group creation — host regenerates at first spawn.
- Stop creating the `.claude-global.md` symlink — replaced by `.claude-shared.md` in the composition step.
- Optionally create an empty `groups/<folder>/CLAUDE.local.md` at init as a clear affordance for humans and agents.
### `groups/global/`
Eliminate. The shared base moves to `container/CLAUDE.md`. Any deployment-specific overrides live in the owner's customized `container/CLAUDE.md` (same pattern as any other codebase customization).
## Migration
Breaking change, one-time cutover:
- For every group, rename `groups/<folder>/CLAUDE.md` → `groups/<folder>/CLAUDE.local.md`. Preserves all existing per-group content as memory.
- Move content from `groups/global/CLAUDE.md` (beyond the default stub) into `container/CLAUDE.md`. Delete `groups/global/`.
- Delete stale `.claude-global.md` symlinks in each group dir — the spawn pass creates `.claude-shared.md` instead.
- First spawn after cutover regenerates `CLAUDE.md` with proper imports.
## Interaction with shared-source refactor
This refactor depends on the shared skills mount (`/app/skills/` RO) from the shared-source refactor landing first. It extends the spawn-time sync from "just skill symlinks" to "skill symlinks + CLAUDE.md composition" — both passes share the same helper.
After this refactor, the "Personality / instructions" row in the shared-source per-group customization table splits:
| Resource | Location | Mechanism |
|----------|----------|-----------|
| Agent memory | `groups/<folder>/CLAUDE.local.md` | RW at `/workspace/agent/`, auto-loaded by Claude Code |
| Composed entry | `groups/<folder>/CLAUDE.md` | Host-regenerated at every spawn |
## What triggers what
| Change | Action | Scope |
|--------|--------|-------|
| Edit `container/CLAUDE.md` | Kill running containers (next spawn recomposes) | All groups |
| Add/edit a skill's `instructions.md` | Kill running containers | All groups with the skill enabled |
| Enable/disable a skill in `container.json` | Kill that group's containers | One group |
| Add MCP server with `instructions` field | Kill that group's containers | One group |
| Edit `CLAUDE.local.md` | Nothing — live via RW mount; Claude Code re-reads at next prompt | One group |
| Add a new agent group | Spawn writes `CLAUDE.md` fresh from the composition pass | One group |
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.