mirror of https://github.com/qwibitai/nanoclaw.git synced 2026-06-04 10:14:47 +08:00

Files

T

gavrielc bfc626be82 docs: drop v2 framing across CLAUDE.md and 12 docs

Renamed 12 docs/v2-*.md → docs/*.md (already in index from earlier git mv).
Rewrote CLAUDE.md to describe the codebase as just "the codebase" rather
than "v2"; added a "Channels and Providers (skill-installed)" section
reflecting the new model and updated the docs index links.

Agent (general-purpose) cleaned the 12 doc bodies:
- Dropped "NanoClaw v2" / "v2 schema" / "(v2)" prose throughout
- Rewrote inter-doc cross-references docs/v2-X.md → docs/X.md
- Architecture, agent-runner-details: collapsed v1↔v2 comparison tables
  into present-tense facts; added notes that trunk only ships `claude`
  and that channel adapters are skill-installed from the `channels` branch
- Setup-wiring, checklist: dropped v1→v2 migration items that no longer
  apply
- Frozen runtime paths preserved: data/v2.db, data/v2-sessions/,
  container name nanoclaw-v2

git grep confirms remaining `\bv2\b` matches in docs/ are only those
runtime paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-04-17 14:53:21 +03:00

21 KiB

Raw Blame History

NanoClaw Checklist

Status: [x] done, [~] partial, [ ] not started

Core Architecture

Session DB replaces IPC (messages_in / messages_out as sole IO)
Central DB (agent groups, messaging groups, sessions, routing)
Host sweep (stale detection via heartbeat file, retry with backoff, recurrence scheduling)
Active delivery polling (1s for running sessions)
Sweep delivery polling (60s across all sessions)
Container runner with session DB mounting
Per-session container lifecycle and idle timeout
Replace hard Idle and Timeout with work aware prompts to user to kill stuck processes
Session resume (sessionId + resumeAt across queries)
Graceful shutdown (SIGTERM/SIGINT handlers)
Orphan container cleanup on startup

Agent Runner (Container)

Poll loop (pending messages, status transitions, idle detection)
Concurrent follow-up polling while agent is thinking
Message formatter (chat, task, webhook, system kinds)
Command categorization (admin, filtered, passthrough)
Transcript archiving (pre-compact hook)
XML message formatting with sender, timestamp
[~] Media handling inbound (native files support for claude)

Agent Providers

Claude provider (Agent SDK, tool allowlist, message stream, session resume)
Mock provider (testing)
Provider factory
Codex provider
OpenCode provider

Channel Adapters

Channel adapter interface (setup, deliver, teardown, typing)
Chat SDK bridge (generic, works with any Chat SDK adapter)
Chat SDK SQLite state adapter (KV, subscriptions, locks, lists)
Discord via Chat SDK
[~] Slack via Chat SDK (adapter + skill written, not tested)
Telegram via Chat SDK (E2E verified: inbound, routing, typing, delivery)
[~] Microsoft Teams via Chat SDK (adapter + skill written, not tested)
[~] Google Chat via Chat SDK (adapter + skill written, not tested)
[~] Linear via Chat SDK (adapter + skill written, not tested)
[~] GitHub via Chat SDK (adapter + skill written, not tested)
WhatsApp Cloud API via Chat SDK (adapter + skill written, not tested)
[~] Resend (email) via Chat SDK (adapter + skill written, not tested)
[~] Matrix via Chat SDK (adapter + skill written, not tested)
[~] Webex via Chat SDK (adapter + skill written, not tested)
[~] iMessage via Chat SDK (adapter + skill written, not tested)
Backward compatibility with native channels (old adapters still work)
Channel barrel wired (src/index.ts imports barrel, skills uncomment)
Setup flow wired to channels (channel skills + /manage-channels for registration + verify.ts checks all tokens)
Channel Info metadata in each channel skill (type, terminology, how-to-find-id, isolation defaults)
/manage-channels skill (wire channels to agent groups with three isolation levels)
/init-first-agent skill (standalone first-agent bootstrap; walks the operator through channel pick → identity lookup → DM platform_id resolution → wire → welcome DM; fallback to telegram pair-code or "DM the bot first" lookup for channels without cold DM)
Cold-DM infrastructure — ChannelAdapter.openDM?(handle) optional method, resolved via Chat SDK chat.openDM for resolution-required channels (Discord, Slack, Teams, Webex, gChat) and fall-through to the handle directly for direct-addressable channels (Telegram, WhatsApp, iMessage, Matrix, Resend). src/user-dm.ts::ensureUserDm caches every resolution in user_dms so subsequent cold DMs are a DB read.
Agent-shared session mode (cross-channel shared sessions, e.g. GitHub + Slack)
Auto-onboarding on channel registration (/welcome skill triggered on first wiring)
Wire different chat modes - mentions, whitelist, approve, etc

Chat-First Setup Flow

Goal: get the user out of Claude Code and into their messaging app as quickly as possible, then enable every part of customization, configuration, and setup from inside the chat app. Claude Code is the bootstrap, not the home.

[~] Minimum-viable bootstrap in Claude Code: install deps, pick one channel, authenticate it, wire it to a default agent group, hand off — nothing else required before the user can leave Claude Code. /setup handles deps/auth, /init-first-agent handles the first-agent wiring + welcome DM. Still TODO: single top-level entrypoint that composes both, and a true "nothing else required" handoff (today /setup still runs through /manage-channels for additional channels).
[~] Post-handoff welcome message in the chat app guides the user through remaining setup (channels, skills, integrations, memory, scheduling, etc.) — /init-first-agent stages a kind:'chat' / sender:'system' welcome prompt that the agent DMs back to the operator via the normal delivery path. Current prompt just introduces the agent; TODO: expand the prompt (or follow-up flow) to walk through remaining setup tasks from within the chat.
Add more channels from chat (currently requires returning to Claude Code to run /add-* skills)
Self-register agent into a new chat room from chat: user gives the agent a channel/group name + approval, and the agent joins via the underlying adapter (e.g. Baileys for WhatsApp), wires the room to an agent group, and posts a first "hi, I'm here" message — no manual invite, no /add-* skill, no terminal
Authenticate channels from chat (OAuth/token entry via cards, no terminal required)
Add credentials / secrets to the OneCLI vault from chat via rich card (agent collects API keys, OAuth tokens, and other secrets through a card flow and writes them into the vault — no .env editing, no terminal)
Wire channels to agent groups from chat (today lives in /manage-channels Claude Code skill — port to in-chat flow with isolation-level question cards)
Create new agent groups from chat (create_agent exists — expose via user-facing flow, not just agent-called tool)
Edit agent group CLAUDE.md / instructions from chat
Install / uninstall / configure skills from chat (see Skills & Marketplace section)
Install / configure MCP servers from chat (see Skills & Marketplace section)
Install packages from chat (today agent can request install_packages — expose a direct user-facing "install X" flow)
Manage scheduled tasks from chat (list, pause, cancel, edit recurrence)
Manage destinations from chat (list, rename, revoke)
Manage permissions from chat (admin list, role assignment, approval policies)
Trigger /setup, /debug, /customize, /migrate-nanoclaw from chat (today all require Claude Code)
View and edit memory from chat
Visualize current setup from chat (ties into Container Skills: installation diagram)
Export / share setup from chat (ties into Container Skills: end-of-setup diagram + share)
Fallback to Claude Code only when a change requires a code edit the agent can't self-apply (and even then, agent should offer to open Claude Code on the user's behalf)

Product Focus

North star: prioritize skills, flows, and custom setups. Platform work (channels, routing, session DBs, approval flows, MCP tools) is plumbing — it should reach a "boring and reliable" state and then stop absorbing attention. The interesting surface area is what users can build on top of that plumbing: skills that add capabilities, conversational flows that orchestrate those skills, and custom per-user setups that compose channels/agents/skills/memory into something personal.

Every new feature request should be answered first with "is this a skill?" before being answered with "is this a platform change?"
Skills should be the primary extension mechanism users and agents reach for — adding, removing, browsing, editing, debugging
Flows (multi-step interactive sequences: setup, onboarding, migration, customize, debug) should be authorable as skills rather than hardcoded into the platform
Custom setups (diverging from defaults: multiple agents, cross-channel routing, per-group memory, specialist sub-agents) should be composable from existing primitives without touching core platform code
Platform-level work gets budgeted against the question: "does this unblock a class of skills/flows/setups that's otherwise impossible?"

Routing

Inbound routing (platform ID + thread ID -> agent group -> session)
Auto-create messaging group on first message
Session resolution (shared vs per-thread modes)
Message writing to session DB with seq numbering
Container waking on new message
Typing indicator triggered on message route
[~] Trigger rule matching (router picks highest-priority agent, regex/mention matching TODO)

Rich Messaging

Interactive cards with buttons (ask_user_question)
Native platform rendering (Discord embeds, buttons)
Message editing
Emoji reactions
File sending from agent (outbox -> delivery)
File upload delivery (buffer-based via adapter)
Markdown formatting
[~] Formatted /usage, /context, /cost output (commands pass through, no rich card formatting)
Context window visibility: show position in context, approaching compaction, when compaction happens, post-compaction state
Threading and replies support
Auto-compact on idle before cache expires

MCP Tools (Container)

send_message (routes via named destinations; to field resolved against agent's local map)
send_file (copy to outbox, write messages_out)
edit_message (routed via destinations)
add_reaction (routed via destinations)
send_card
ask_user_question (blocking poll for response)
schedule_task (with process_after and recurrence)
list_tasks
cancel_task / pause_task / resume_task
create_agent (any agent, creates agent group + folder + bidirectional destinations; host re-normalizes the name, deduplicates folder, path-traversal guarded)
install_packages (apt/npm, owner/admin approval required via pickApprover, strict name validation)
add_mcp_server (owner/admin approval required via pickApprover)
request_rebuild (rebuilds per-agent-group Docker image)

Scheduling

One-shot scheduled messages (process_after / deliver_after)
Recurring tasks via cron expressions
Host sweep picks up due messages and advances recurrence
Scheduled outbound messages (no container wake needed)
Pre-agent scripts (formatter references scriptOutput but no execution logic)

Permissions and Approval Flows

User-level privilege model — users + user_roles (owner / admin, global or scoped to an agent group). Replaces the old agent_groups.is_admin / messaging_groups.admin_user_id coupling. See src/db/users.ts, src/db/user-roles.ts, src/access.ts.
Admin-only command filtering in container — host passes NANOCLAW_ADMIN_USER_IDS (owners + global admins + scoped admins for the agent group) to the agent-runner; poll-loop.ts gates slash commands against that set.
Approval routing — pickApprover (scoped admin → global admin → owner, dedup) + pickApprovalDelivery (first reachable, same-channel-kind tie-break); delivery lands in the approver's DM via ensureUserDm / user_dms cache. See src/access.ts, src/onecli-approvals.ts.
Per-messaging-group unknown-sender gating — messaging_groups.unknown_sender_policy (strict | request_approval | public), enforced in src/router.ts.
Approval flow (sensitive action -> card to admin -> approve/reject -> execute) — pending_approvals table, requestApproval() helper, reuses interactive card infra
Agent requests dependency/package install (install_packages, admin approval, rebuild on approval)
Self-modification — direct tools:
- install_packages (apt/npm, admin approval, name validation both sides, max 20 per request)
- add_mcp_server (admin approval)
- request_rebuild (builds per-agent-group Docker image with approved packages)
- Fire-and-forget model (write request, return immediately; chat notification on approval; container killed so next wake picks up new config/image)
[~] OneCLI integration for human-loop approvals on credentialed requests (agent touching a credentialed resource → OneCLI gates → approval card to admin → OneCLI releases credential) — SDK 0.3.1 configureManualApproval wired into host, routes to admin via existing pending_approvals infra
Tunneled OneCLI dashboard for credential addition (Telegram Mini Apps aside, iMessage without Apple Business Register, Matrix, email). Signed short-lived URL → browser form served by OneCLI at 10254 → tunnel via cloudflare durable object. Value never touches the chat surface.
Self-modification via direct source edits — planned draft/activate flow: RO baseline mount at /app/src, RW draft at /workspace/src-draft, atomic snapshot into pending, admin approval, cp -a into baseline, restart + deadman rollback. Unifies runner src, host src, migrations, package.json, container config through one edit path. Collapses the abandoned create_dev_agent/request_swap dev-agent-in-worktree approach.

Named Destinations + ACL

agent_destinations table (agent_group_id, local_name, target_type, target_id) — migration 004
Per-agent local-name routing map (channels and peer agents referenced by local names)
Destinations stored in inbound.db destinations table (moved from JSON file in b591d7c) — single source of truth, no separate file
Host writes the destination map into inbound.db before every container wake; container queries it live on every lookup so admin changes take effect mid-session
Container loads map at startup, appends system-prompt addendum listing destinations + <message to="name"> syntax
Agent main output parsed for <message to="..."> blocks; <internal>...</internal> treated as scratchpad
Host re-validates every outbound route via hasDestination() — unauthorized drops logged
Inbound formatter adds from="name" via reverse-lookup (consistent namespace both directions)
Single-destination shortcut — agents with one destination don't need <message> wrapping
Backfill from existing messaging_group_agents on migration
Removed NANOCLAW_PLATFORM_ID / CHANNEL_TYPE / THREAD_ID env-var routing entirely

Agent-to-Agent Communication

Host delivery to target agent's session DB (channel_type='agent' routing in src/delivery.ts)
Agent spawning a new sub-agent (create_agent MCP tool, available to any agent, path-traversal guarded)
Dynamic agent group creation (folder + optional CLAUDE.md at runtime)
Internal-only agents (agents created without a channel attached)
Permission delegation from parent to child (bidirectional destination rows inserted at creation)
Bidirectional routing via inherited routing context; sender info enriched on the target side
Specialist sub-agents (browser agent, dev agent — user's agent delegates with request/approval)
Browser agent with per-destination permissions between main agent and browser agent (main requests navigation/interaction; browser agent executes in isolated container)
Sanitization of browser agent responses before handing back to main agent (strip scripts, inline images, untrusted HTML; prevent prompt injection from web content)
Same permission + sanitization model for any sub-agent that accesses sensitive data sources (files, DBs, third-party APIs)

In-Chat Agent Management

/clear (resets session)
/compact (triggers context compaction)
[~] /context (passes through, no rich formatting)
[~] /usage (passes through, no rich formatting)
[~] /cost (passes through, no rich formatting)
Smooth session transitions: load context into new sessions, solve cold start problem
MCP/package installation from chat
Browse MCP marketplace / skills repository from chat

Skills & Marketplace

Install skills from chat (agent requests, admin approves, skill dropped into container skills dir)
Scan skills before install (lint SKILL.md, sandbox-check shell commands, require approval for network/FS-heavy skills)
Scan marketplace npm packages before install (supply-chain check, typo-squat detection, known-bad list)
MCP server marketplace — discover, preview, install
Browse skills / MCP marketplace from chat (cards with search, preview, install)
Local voice transcription skill — "just works" install flow: when the user sends a voice message and no transcription backend is installed, the agent asks once ("Install local voice transcription?"), and on approval the skill installs a fully-local speech-to-text model (no cloud calls). Subsequent voice messages transcribe automatically.
Fully local NanoClaw — OpenCode + Gemma 4 as the agent provider instead of Claude Code, so an entire install can run with zero cloud inference. Requires wiring OpenCode as an agent provider (see Agent Providers) and a setup path that picks local models, pulls weights, and verifies everything runs offline.

Container Skills

Container skills live inside agent containers at runtime (container/skills/) and are loaded into every agent session. These are distinct from feature/operational skills that ship with the host.

Customize container skill — agent-driven customization flow (add channel, integration, behavior change) usable from inside any agent session, not just the main repo
Debug container skill — inspect logs, session DB, MCP server state, container env, recent errors from inside the agent
Build-system container skills:
- Karpathy LLM Wiki builder (agent scaffolds a persistent wiki knowledge base for a group)
- Generic build-system framework for agent-authored sub-systems
NanoClaw installation diagram skill — agent generates a visual diagram of the user's current setup (agent groups, channels, wirings, destinations, sub-agents, installed packages/MCP servers)
Video replay skill — generate Remotion (or similar) videos that replay chat flows and sessions, referencing good UI patterns to produce shareable clips
Excitement trigger skill — detects when the user expresses excitement about the agent's capabilities or their setup, and proactively encourages generating a diagram + sharing it
End-of-migration diagram skill — at the end of /migrate-nanoclaw (or any migration flow), agent generates a visual diagram of the resulting setup and suggests sharing
End-of-setup diagram skill — at the end of first-time /setup, agent generates a visual diagram and suggests sharing (merges the old "Generate visual diagram of customized instance at end of setup" line from Channel Adapters)

Webhook Ingestion

Generic webhook endpoint for external events
GitHub webhook handling
CI/CD notification handling
Webhook -> messages_in routing

System Actions

register_group from inside agent
reset_session from inside agent
Delivery failures should round-trip back to the agent as system messages so it can decide how to recover (retry as plain text, simplify, give up), with a hard retry cap + poison-pill backstop in delivery.ts to keep the queue healthy

Integrations

Vercel CLI integration in setup process
Skills for deploying and managing Vercel websites from chat
Office 365 integration (create/edit documents with inline suggestions)

Memory

Shared memory with approval flow (write to global memory requires admin approval)
Agent memory system skills — skills for building and managing memory systems for an agent: archive/index large collections of files and data, then expose a memory interface the agent can query and update (e.g. QMD-style systems)

Migration

Custom skill/code porting
OneCLI migration check — determine if existing installs need OneCLI re-init (credentials re-scoped to new agent_group.id identifier, new SDK version, approval handler registered). If needed, add a migration step to /update-nanoclaw or a dedicated skill.

Testing

DB layer tests (agent groups, messaging groups, sessions, pending questions)
Channel registry tests
Poll loop / formatter tests
Integration test (container agent-runner)
Host core tests
End-to-end flow tests (message in -> agent -> message out -> delivery)
Delivery polling tests
Host sweep tests (stale detection, recurrence)
Multi-channel integration tests

Rollout

Internal testing across all channels
Migration skill built and tested
PR factory migrated as validation
Blog post / announcement
Video demos of key flows
Vercel coordination

21 KiB Raw Blame History