docs(v2): cross-mount invariants + diagrams; inline a2a routing

- session-manager.ts: shrink the cross-mount invariant header from 31
  lines to 12, keeping each invariant's cause and consequence inline.
- agent-runner/db/connection.ts: parallel cross-mount comment for the
  container-side reader (inbound.db must be journal_mode=DELETE).
- agent-runner/db/messages-out.ts: document that even/odd seq parity
  is load-bearing — seq is the agent-facing message ID returned by
  send_message and consumed by edit_message / add_reaction, looked
  up across both tables.
- v2-checklist.md: record the cross-mount invariants and seq parity
  under Core Architecture so future "simplifications" don't regress
  them.
- scripts/sanity-live-poll.ts: empirical validation harness for the
  three cross-mount invariants — flips each one and observes silent
  message loss / corruption.
- delivery.ts: inline routeAgentMessage at its single callsite (-17
  net lines). The wrapper added more boilerplate than it factored.
- docs/v2-architecture-diagram.{md,html}: rendered Mermaid diagrams
  of the v2 system, message flow, named destinations, entity model,
  and the two-DB split.
- channels/adapter.ts, chat-sdk-bridge.ts, credentials.ts,
  db/sessions.ts, db/db-v2.test.ts: prettier format pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
gavrielc
2026-04-12 00:21:12 +03:00
parent c9fa5cdbed
commit 9dda75bb21
13 changed files with 788 additions and 86 deletions
+406
@@ -0,0 +1,406 @@
<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width,initial-scale=1" />
<title>NanoClaw v2 Architecture</title>
<script src="https://cdn.jsdelivr.net/npm/mermaid@10/dist/mermaid.min.js"></script>
<style>
:root {
--bg: #0b0d12;
--panel: #141821;
--ink: #e7ecf3;
--muted: #8a94a6;
--accent: #7aa2ff;
--border: #232a38;
}
* { box-sizing: border-box; }
html, body {
margin: 0;
padding: 0;
background: var(--bg);
color: var(--ink);
font-family: -apple-system, BlinkMacSystemFont, "SF Pro Text", "Segoe UI", Helvetica, Arial, sans-serif;
font-size: 15px;
line-height: 1.55;
}
header {
padding: 32px 40px 16px;
border-bottom: 1px solid var(--border);
position: sticky;
top: 0;
background: rgba(11, 13, 18, 0.92);
backdrop-filter: saturate(180%) blur(10px);
z-index: 10;
}
header h1 {
margin: 0 0 4px;
font-size: 22px;
font-weight: 600;
letter-spacing: -0.01em;
}
header .sub {
color: var(--muted);
font-size: 13px;
}
nav {
display: flex;
flex-wrap: wrap;
gap: 8px;
margin-top: 14px;
}
nav a {
color: var(--accent);
text-decoration: none;
font-size: 12px;
padding: 4px 10px;
border: 1px solid var(--border);
border-radius: 999px;
background: var(--panel);
}
nav a:hover { border-color: var(--accent); }
main {
max-width: 1280px;
margin: 0 auto;
padding: 28px 40px 80px;
}
section {
margin-bottom: 48px;
}
section h2 {
font-size: 18px;
font-weight: 600;
margin: 0 0 6px;
letter-spacing: -0.005em;
}
section h2 .num {
color: var(--muted);
font-weight: 500;
margin-right: 8px;
}
section p.desc {
color: var(--muted);
margin: 0 0 16px;
max-width: 900px;
}
.diagram {
background: var(--panel);
border: 1px solid var(--border);
border-radius: 14px;
padding: 24px;
overflow-x: auto;
}
.diagram svg { max-width: 100%; height: auto; display: block; margin: 0 auto; }
table {
width: 100%;
border-collapse: collapse;
margin-top: 14px;
font-size: 13px;
}
th, td {
text-align: left;
padding: 10px 12px;
border-bottom: 1px solid var(--border);
}
th {
color: var(--muted);
font-weight: 500;
text-transform: uppercase;
font-size: 11px;
letter-spacing: 0.04em;
}
code {
font-family: "SF Mono", Menlo, Consolas, monospace;
font-size: 12px;
background: #1c2230;
padding: 1px 6px;
border-radius: 4px;
color: #c8d4ee;
}
footer {
color: var(--muted);
font-size: 12px;
text-align: center;
padding: 20px 0 0;
border-top: 1px solid var(--border);
}
</style>
</head>
<body>
<header>
<h1>NanoClaw v2 Architecture</h1>
<div class="sub">Session-DB messaging model · Chat SDK bridge · OneCLI credential gateway · per-session containers</div>
<nav>
<a href="#overview">1 · Overview</a>
<a href="#flow">2 · Message Flow</a>
<a href="#destinations">3 · Destinations &amp; A2A</a>
<a href="#entities">4 · Entity Model</a>
<a href="#twodb">5 · Two-DB Split</a>
</nav>
</header>
<main>
<section id="overview">
<h2><span class="num">1</span>System Overview</h2>
<p class="desc">
Inbound messages land at the Chat SDK bridge, which hands off to the
router. The router resolves the messaging group → agent group → session
and writes to the session's <code>inbound.db</code>. The container runner
spawns a per-session container (auth via OneCLI), and the agent-runner
polls its DB, calls Claude, and writes responses to <code>outbound.db</code>.
Delivery polls the outbound DB, re-validates destinations, and ships
messages back through the same bridge.
</p>
<div class="diagram">
<pre class="mermaid">
flowchart TB
subgraph Platforms["Messaging Platforms"]
P1[Discord]
P2[Telegram]
P3[Slack]
P4[GitHub / Linear]
P5[WhatsApp / iMessage / Teams / GChat / Matrix / Webex / Email]
end
subgraph Host["Host Process (Node)"]
direction TB
Bridge["Chat SDK Bridge<br/>src/channels/chat-sdk-bridge.ts"]
Router["Router<br/>src/router.ts<br/>platformId + threadId → session"]
SessMgr["Session Manager<br/>src/session-manager.ts"]
Runner["Container Runner<br/>src/container-runner.ts<br/>OneCLI ensureAgent + spawn"]
Delivery["Delivery Poller<br/>src/delivery.ts<br/>1s active / 60s sweep"]
Sweep["Host Sweep<br/>src/host-sweep.ts"]
Central[("Central DB · data/v2.db<br/>agent_groups · messaging_groups<br/>messaging_group_agents · sessions<br/>pending_approvals")]
end
subgraph OneCLI["OneCLI Gateway (0.3.1)"]
Vault["Agent Vault<br/>secrets + OAuth"]
Approvals["configureManualApproval"]
SecretsFacade["onecli-secrets.ts<br/>credential collection"]
end
subgraph Session["Per-Session Container"]
direction TB
PollLoop["Poll Loop<br/>container/agent-runner"]
Provider["Claude Agent SDK<br/>(codex / opencode planned)"]
MCP["MCP Tools<br/>send_message · send_file · edit_message<br/>send_card · ask_user_question · schedule_task<br/>create_agent · install_packages · add_mcp_server<br/>request_rebuild · trigger_credential_collection"]
InDB[("inbound.db<br/>host writes · even seq")]
OutDB[("outbound.db<br/>container writes · odd seq")]
end
Folder["Agent Group FS<br/>groups/*<br/>CLAUDE.md · memory · skills"]
P1 & P2 & P3 & P4 & P5 --> Bridge
Bridge --> Router
Router --> Central
Router --> SessMgr
SessMgr --> InDB
SessMgr --> Runner
Runner --> OneCLI
Runner --> PollLoop
PollLoop --> InDB
PollLoop --> Provider
Provider --> MCP
MCP --> OutDB
OutDB --> Delivery
Delivery --> Central
Delivery --> Bridge
Bridge --> P1 & P2 & P3 & P4 & P5
Sweep --> InDB
Sweep --> OutDB
Sweep --> Central
Runner -.mounts.-> Folder
MCP -.approval.-> Approvals
Approvals --> Central
MCP -.credential req.-> SecretsFacade
SecretsFacade --> Vault
Provider -.API calls.-> Vault
</pre>
</div>
</section>
<section id="flow">
<h2><span class="num">2</span>Message Flow</h2>
<p class="desc">
End-to-end path of a single message. The host and container never write
to the same SQLite file — the split between inbound and outbound DBs is
what makes this lock-free under concurrent activity.
</p>
<div class="diagram">
<pre class="mermaid">
sequenceDiagram
participant P as Platform (Telegram)
participant B as Chat SDK Bridge
participant R as Router
participant SM as Session Manager
participant IDB as inbound.db
participant C as Container (agent-runner)
participant ODB as outbound.db
participant D as Delivery Poller
P->>B: new message
B->>R: routeInbound(platformId, threadId, msg)
R->>R: resolve messaging_group → agent_group → session<br/>(agent-shared · shared · per-thread)
R->>SM: ensure session + DBs exist
R->>IDB: INSERT messages_in (even seq)
R->>C: wake container (spawn or signal)
C->>IDB: poll messages_in
C->>C: format xml → Claude SDK stream
C->>ODB: INSERT messages_out (odd seq)<br/>parse &lt;message to='name'&gt; blocks
D->>ODB: 1s active poll / 60s sweep
D->>D: hasDestination() re-validate
D->>B: deliver via adapter
B->>P: send · edit · react · file · card
</pre>
</div>
</section>
<section id="destinations">
<h2><span class="num">3</span>Named Destinations &amp; Agent-to-Agent</h2>
<p class="desc">
Agents address outputs by local name. The host looks up each name against
the agent's destinations table at delivery time — dropping anything
unauthorized. The same table routes agent-to-agent messages to a sibling
agent's <code>inbound.db</code> with bidirectional permission rows.
</p>
<div class="diagram">
<pre class="mermaid">
flowchart LR
subgraph AgentA["Agent Group A (main)"]
A_out["&lt;message to='slack'&gt;...&lt;/message&gt;<br/>&lt;message to='browser-agent'&gt;...&lt;/message&gt;<br/>&lt;internal&gt;scratchpad&lt;/internal&gt;"]
end
subgraph Dests["inbound.db.destinations (per agent)"]
D1["slack → messaging_group 42"]
D2["browser-agent → agent_group 7<br/>(bidirectional)"]
D3["github → messaging_group 13"]
end
subgraph AgentB["Agent Group B (browser sub-agent)"]
B_session["own inbound.db / outbound.db<br/>inherited destination back to A"]
end
Slack[Slack]
GitHub[GitHub PR]
A_out -->|parse + lookup| Dests
D1 -->|deliver| Slack
D2 -->|write to B's inbound.db| B_session
D3 -->|deliver| GitHub
B_session -.reply via 'parent'.-> Dests
</pre>
</div>
</section>
<section id="entities">
<h2><span class="num">4</span>Entity Model</h2>
<p class="desc">
Messaging groups and agent groups are many-to-many, joined via
<code>messaging_group_agents</code>. The <code>session_mode</code>
column selects one of three isolation levels.
</p>
<div class="diagram">
<pre class="mermaid">
erDiagram
agent_groups ||--o{ messaging_group_agents : wired
messaging_groups ||--o{ messaging_group_agents : wired
agent_groups ||--o{ sessions : runs
messaging_groups ||--o{ sessions : context
agent_groups ||--o{ agent_destinations : owns
agent_groups ||--o{ pending_approvals : requests
agent_groups {
int id
string name
string folder
bool is_admin
string agent_provider
json container_config
}
messaging_groups {
int id
string channel_type
string platform_id
string name
bool is_group
string admin_user_id
}
messaging_group_agents {
int messaging_group_id
int agent_group_id
string session_mode
json trigger_rules
int priority
}
sessions {
int id
int agent_group_id
int messaging_group_id
string sdk_session_id
string status
}
</pre>
</div>
<table>
<thead>
<tr><th>Level</th><th>session_mode</th><th>Shared</th><th>Example</th></tr>
</thead>
<tbody>
<tr><td>1 · Shared session</td><td><code>agent-shared</code></td><td>Workspace + memory + conversation</td><td>Slack + GitHub webhooks in one thread</td></tr>
<tr><td>2 · Same agent, separate sessions</td><td><code>shared</code> / <code>per-thread</code></td><td>Workspace + memory only</td><td>One agent across 3 Telegram chats</td></tr>
<tr><td>3 · Separate agent groups</td><td>— (different agent_group_id)</td><td>Nothing</td><td>Personal vs work channels</td></tr>
</tbody>
</table>
</section>
<section id="twodb">
<h2><span class="num">5</span>Two-DB Split</h2>
<p class="desc">
Each SQLite file has exactly one writer. The container touches a
heartbeat file instead of <code>UPDATE</code>-ing a liveness row, so host
sweep can detect staleness via <code>stat(mtime)</code> without opening the
DB. Host uses even seq numbers, container uses odd — collision-free.
</p>
<div class="diagram">
<pre class="mermaid">
flowchart LR
subgraph Mount["/workspace (volume mount)"]
In[("inbound.db")]
Out[("outbound.db")]
HB["/.heartbeat (file touch)"]
end
Host[Host process] -->|writes · even seq| In
Host -->|reads| Out
Container[agent-runner] -->|reads| In
Container -->|writes · odd seq| Out
Container -->|touch every poll| HB
HostSweep[Host sweep] -->|stat mtime| HB
HostSweep -->|reads processing_ack| In
</pre>
</div>
</section>
<footer>NanoClaw v2 · branch <code>v2</code> · generated from docs/v2-checklist.md, v2-architecture-draft.md, v2-isolation-model.md, v2-setup-wiring.md</footer>
</main>
<script>
mermaid.initialize({
startOnLoad: true,
theme: "dark",
securityLevel: "loose",
flowchart: { curve: "basis", padding: 18 },
themeVariables: {
background: "#141821",
primaryColor: "#1c2230",
primaryTextColor: "#e7ecf3",
primaryBorderColor: "#3a465e",
lineColor: "#6b7893",
secondaryColor: "#222a3a",
tertiaryColor: "#1a2030",
fontSize: "14px",
},
});
</script>
</body>
</html>
+200
@@ -0,0 +1,200 @@
# NanoClaw v2 Architecture Diagram
## System Overview
```mermaid
flowchart TB
subgraph Platforms["Messaging Platforms"]
P1[Discord]
P2[Telegram]
P3[Slack]
P4[GitHub / Linear]
P5[WhatsApp / iMessage / Teams / GChat / Matrix / Webex / Email]
end
subgraph Host["Host Process (Node)"]
direction TB
Bridge["Chat SDK Bridge<br/>(src/channels/chat-sdk-bridge.ts)"]
Router["Router<br/>(src/router.ts)<br/>platformId + threadId -> messaging_group -> agent_group -> session"]
SessMgr["Session Manager<br/>(src/session-manager.ts)<br/>creates inbound.db + outbound.db"]
Runner["Container Runner<br/>(src/container-runner.ts)<br/>OneCLI ensureAgent + spawn"]
Delivery["Delivery Poller<br/>(src/delivery.ts)<br/>1s active / 60s sweep"]
Sweep["Host Sweep<br/>(src/host-sweep.ts)<br/>heartbeat, retry, recurrence"]
Central[("Central DB<br/>data/v2.db<br/>agent_groups<br/>messaging_groups<br/>messaging_group_agents<br/>sessions<br/>pending_approvals")]
end
subgraph OneCLI["OneCLI Gateway (0.3.1)"]
Vault["Agent Vault<br/>secrets + OAuth"]
Approvals["configureManualApproval<br/>-> pending_approvals"]
SecretsFacade["src/onecli-secrets.ts<br/>credential collection"]
end
subgraph Session["Per-Session Container (Docker / Apple Container)"]
direction TB
PollLoop["Poll Loop<br/>(container/agent-runner)"]
Provider["Claude Agent SDK<br/>(providers: claude, mock, todo: codex/opencode)"]
MCP["MCP Tools<br/>send_message, send_file, edit_message,<br/>add_reaction, send_card, ask_user_question,<br/>schedule_task, create_agent,<br/>install_packages, add_mcp_server, request_rebuild,<br/>trigger_credential_collection"]
Skills["Container Skills<br/>(container/skills/)"]
InDB[("inbound.db<br/>host writes<br/>even seq<br/>messages_in<br/>destinations<br/>processing_ack")]
OutDB[("outbound.db<br/>container writes<br/>odd seq<br/>messages_out<br/>heartbeat file")]
end
subgraph Groups["Agent Group Filesystem (groups/*)"]
Folder["CLAUDE.md<br/>memory<br/>per-group skills<br/>container_config"]
end
P1 & P2 & P3 & P4 & P5 --> Bridge
Bridge --> Router
Router --> Central
Router --> SessMgr
SessMgr --> InDB
SessMgr --> Runner
Runner --> OneCLI
Runner --> PollLoop
PollLoop --> InDB
PollLoop --> Provider
Provider --> MCP
Provider --> Skills
MCP --> OutDB
OutDB --> Delivery
Delivery --> Central
Delivery --> Bridge
Bridge --> P1 & P2 & P3 & P4 & P5
Sweep --> InDB
Sweep --> OutDB
Sweep --> Central
Runner -.mounts.-> Folder
MCP -.approval.-> Approvals
Approvals --> Central
MCP -.credential req.-> SecretsFacade
SecretsFacade --> Vault
Provider -.API calls.-> Vault
```
## Message Flow (inbound -> agent -> outbound)
```mermaid
sequenceDiagram
participant P as Platform (e.g. Telegram)
participant B as Chat SDK Bridge
participant R as Router
participant SM as Session Manager
participant IDB as inbound.db
participant C as Container (agent-runner)
participant ODB as outbound.db
participant D as Delivery Poller
P->>B: new message
B->>R: routeInbound(platformId, threadId, msg)
R->>R: resolve messaging_group -> agent_group -> session<br/>(agent-shared | shared | per-thread)
R->>SM: ensure session + DBs exist
R->>IDB: INSERT messages_in (even seq)
R->>C: wake container (docker run / already running)
C->>IDB: poll messages_in
C->>C: format xml, stream to Claude SDK
C->>ODB: INSERT messages_out (odd seq)<br/>parse <message to="name"> blocks
D->>ODB: 1s poll (active) / 60s (sweep)
D->>D: hasDestination() re-validate
D->>B: deliver via adapter
B->>P: send message / edit / react / file / card
```
## Named Destinations + Agent-to-Agent
```mermaid
flowchart LR
subgraph AgentA["Agent Group A (main)"]
A_out["output:<br/>&lt;message to='slack'&gt;...&lt;/message&gt;<br/>&lt;message to='browser-agent'&gt;...&lt;/message&gt;<br/>&lt;internal&gt;scratchpad&lt;/internal&gt;"]
end
subgraph Dests["inbound.db.destinations (per agent)"]
D1["slack -> messaging_group 42"]
D2["browser-agent -> agent_group 7<br/>(bidirectional row)"]
D3["github -> messaging_group 13"]
end
subgraph AgentB["Agent Group B (browser sub-agent)"]
B_session["own inbound.db / outbound.db<br/>inherited destination back to A"]
end
Slack[Slack channel]
GitHub[GitHub PR thread]
A_out -->|parse + lookup| Dests
D1 -->|deliver| Slack
D2 -->|write to B's inbound.db| B_session
D3 -->|deliver| GitHub
B_session -.reply via 'parent'.-> Dests
```
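The parse-and-lookup step in the diagram above can be sketched as follows. This is a minimal illustration under stated assumptions: `parseMessageBlocks`, `resolveDeliverable`, and the `Destination` shape are hypothetical names, not functions from the repository, and the real host reads destinations from `inbound.db` rather than from an in-memory record.

```typescript
// Hypothetical sketch; types and function names are illustrative only.
interface OutboundBlock {
  to: string;
  body: string;
}

type Destination =
  | { kind: "messaging_group"; id: number }
  | { kind: "agent_group"; id: number };

interface Deliverable {
  block: OutboundBlock;
  destination: Destination;
}

// Extracts <message to="..."> blocks. Anything else (for example
// <internal> scratchpad text) is ignored because it never matches.
function parseMessageBlocks(output: string): OutboundBlock[] {
  const pattern = /<message\s+to=(["'])(.*?)\1\s*>([\s\S]*?)<\/message>/g;
  const blocks: OutboundBlock[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(output)) !== null) {
    blocks.push({ to: match[2] ?? "", body: (match[3] ?? "").trim() });
  }
  return blocks;
}

// Keeps only blocks whose name exists in the destinations record;
// blocks addressed to unknown names are dropped.
function resolveDeliverable(
  output: string,
  destinations: Record<string, Destination>,
): Deliverable[] {
  const deliverable: Deliverable[] = [];
  for (const block of parseMessageBlocks(output)) {
    const known = Object.prototype.hasOwnProperty.call(destinations, block.to);
    const destination = destinations[block.to];
    if (known && destination !== undefined) {
      deliverable.push({ block, destination });
    }
  }
  return deliverable;
}
```

Looking names up at delivery time, rather than trusting what the agent emitted, keeps the authorization decision on the host side.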
## Entity Model + Isolation Levels
```mermaid
erDiagram
agent_groups ||--o{ messaging_group_agents : wired
messaging_groups ||--o{ messaging_group_agents : wired
agent_groups ||--o{ sessions : runs
messaging_groups ||--o{ sessions : context
agent_groups ||--o{ agent_destinations : owns
agent_groups ||--o{ pending_approvals : requests
agent_groups {
int id
string name
string folder
bool is_admin
string agent_provider
json container_config
}
messaging_groups {
int id
string channel_type
string platform_id
string name
bool is_group
string admin_user_id
}
messaging_group_agents {
int messaging_group_id
int agent_group_id
string session_mode "agent-shared | shared | per-thread"
json trigger_rules
int priority
}
sessions {
int id
int agent_group_id
int messaging_group_id
string sdk_session_id
string status
}
```
### Isolation Level Cheatsheet
| Level | `session_mode` | What's shared | Example |
|---|---|---|---|
| 1. Shared session | `agent-shared` | Workspace + memory + conversation | Slack + GitHub webhooks in one thread |
| 2. Same agent, separate sessions | `shared` / `per-thread` | Workspace + memory only | One agent across 3 Telegram chats |
| 3. Separate agent groups | (different `agent_group_id`) | Nothing | Personal vs work channels |
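
One way to read the cheatsheet is as a rule for deriving a session key from the routing context. The sketch below is an assumption for illustration only: `sessionKey`, `RouteContext`, and the key format are hypothetical, and the distinction between `shared` (one session per messaging group) and `per-thread` (one session per thread) is inferred from the mode names rather than confirmed by the table.

```typescript
// Hypothetical sketch; none of these names come from the repository.
type SessionMode = "agent-shared" | "shared" | "per-thread";

interface RouteContext {
  agentGroupId: number;
  messagingGroupId: number;
  threadId: string | null;
}

// Level 1 ignores the messaging group, so every wired channel lands in the
// same conversation. Level 2 keeps the agent group (workspace + memory) but
// splits conversations by messaging group or by thread. Level 3 needs no
// mode at all, because a different agentGroupId already yields a different key.
function sessionKey(mode: SessionMode, ctx: RouteContext): string {
  switch (mode) {
    case "agent-shared":
      return `agent:${ctx.agentGroupId}`;
    case "shared":
      return `agent:${ctx.agentGroupId}/group:${ctx.messagingGroupId}`;
    case "per-thread":
      return `agent:${ctx.agentGroupId}/group:${ctx.messagingGroupId}/thread:${ctx.threadId ?? "root"}`;
  }
}
```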
## Two-DB Split (why)
```mermaid
flowchart LR
subgraph Mount["/workspace (volume mounted into container)"]
In[("inbound.db")]
Out[("outbound.db")]
HB["/.heartbeat (file touch)"]
end
Host[Host process] -->|"writes only<br/>(even seq)"| In
Host -->|reads| Out
Container[agent-runner] -->|reads| In
Container -->|"writes only<br/>(odd seq)"| Out
Container -->|touch every poll| HB
HostSweep[Host sweep] -->|stat mtime| HB
HostSweep -->|reads processing_ack| In
note1["Each file has exactly ONE writer.<br/>Eliminates SQLite cross-process write contention.<br/>Collision-free seq numbering."]
```
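
The even/odd numbering in the diagram above can be sketched as a pair of allocators, one per writer. This is a minimal illustration: `nextSeq`, `ownerOfSeq`, and `Writer` are hypothetical names, and the real code presumably derives the current maximum from the `messages_in` and `messages_out` tables instead of taking it as a parameter.

```typescript
// Hypothetical sketch of even/odd seq allocation; names are illustrative.
type Writer = "host" | "container";

// Returns the smallest seq greater than `lastSeq` with the writer's parity:
// even for the host (inbound.db), odd for the container (outbound.db).
// `lastSeq` is assumed to be a non-negative integer.
function nextSeq(writer: Writer, lastSeq: number): number {
  const wantOdd = writer === "container";
  let candidate = lastSeq + 1;
  if ((candidate % 2 === 1) !== wantOdd) {
    candidate += 1;
  }
  return candidate;
}

// Because the two parities never overlap, a seq alone identifies which
// table holds the row, so a lookup across both tables cannot be ambiguous.
function ownerOfSeq(seq: number): Writer {
  return seq % 2 === 0 ? "host" : "container";
}
```

With this scheme neither side needs to coordinate with the other to pick an ID, which is what keeps each file at a single writer.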
+3
@@ -8,6 +8,8 @@ Status: [x] done, [~] partial, [ ] not started
- [x] Session DB replaces IPC (messages_in / messages_out as sole IO)
- [x] Two-DB split: inbound.db (host-owned) + outbound.db (container-owned) — zero cross-process write contention
- **Cross-mount invariants (empirically validated, see `scripts/sanity-live-poll.ts`):** (1) `journal_mode=DELETE` on every session DB — WAL's `-shm` is memory-mapped and VirtioFS does not propagate mmap coherency host→guest, so WAL leaves the container's poll loop frozen on an early snapshot with no error; (2) host opens-writes-closes per operation — the close is what invalidates the container's VirtioFS page cache; (3) one writer per file — DELETE-mode with two writers corrupts because journal-unlink doesn't propagate atomically. Each invariant was individually confirmed by flipping it and observing silent message loss or corruption. Do not "simplify" by unifying the DBs, switching to WAL, or keeping a long-lived host connection.
- **Seq parity is load-bearing, not cleanup:** host writes even seqs, container writes odd seqs. The seq is the agent-facing message ID returned by `send_message` and consumed by `edit_message` / `add_reaction`, and `getMessageIdBySeq()` looks up by seq across both tables. Removing parity would let a single ID resolve to the wrong row.
- [x] Central DB (agent groups, messaging groups, sessions, routing)
- [x] Host sweep (stale detection via heartbeat file, retry with backoff, recurrence scheduling)
- [x] Active delivery polling (1s for running sessions)
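
Invariants (1) and (2) above amount to an open-write-close wrapper on the host side. The following is a sketch under stated assumptions, not the code in `session-manager.ts`: `SessionDb` and `withShortLivedConnection` are hypothetical names, and a real SQLite driver would need to be adapted to this minimal interface.

```typescript
// Hypothetical minimal DB surface; none of these names come from the repository.
interface SessionDb {
  exec(sql: string): void;
  close(): void;
}

// Opens a fresh connection, forces rollback-journal mode, runs one write,
// and always closes, so the host never holds a long-lived connection.
function withShortLivedConnection(
  open: () => SessionDb,
  write: (db: SessionDb) => void,
): void {
  const db = open();
  try {
    db.exec("PRAGMA journal_mode = DELETE");
    write(db);
  } finally {
    // Per invariant (2), the close is the step that matters for the
    // container's view of the file, so it runs even if the write throws.
    db.close();
  }
}
```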
@@ -166,6 +168,7 @@ Status: [x] done, [~] partial, [ ] not started
- [~] Credential collection from chat — `trigger_credential_collection` MCP tool; agent researches API config, card → modal → `onecli secrets create` via internal facade (`src/onecli-secrets.ts`); credential value never enters agent context
- [ ] Replace `src/onecli-secrets.ts` shell facade with SDK-native secret management when `@onecli-sh/sdk` adds it
- [ ] Per-agent-group secret scoping via OneCLI `agentId` (facade passes it today; CLI ignores it until upstream supports)
- [ ] **Attach newly created secrets to the calling agent:** `trigger_credential_collection` today runs `onecli secrets create` but leaves the secret unassigned, so the agent that requested the credential still gets zero injections. Fix options: (a) follow-up `onecli agents set-secrets` call in `src/onecli-secrets.ts` after create, (b) set the agent to `mode=all`, or (c) upstream ask — `onecli secrets create --assign-to-agent-ids <id,...>` so it's a one-shot and orphaned secrets are impossible. Prefer (c); use (a) as the interim.
- [ ] **Chat SDK input support beyond Slack (upstream ask)** — today only Slack's Modal surface works for secure input. The platforms themselves support it, but Chat SDK doesn't expose it:
- [ ] **Discord** — native modal (`InteractionResponseType.Modal` with `ActionRow([TextInput])`). Map `event.openModal(Modal(...))` to the Discord REST callback.
- [ ] **Microsoft Teams** — Adaptive Card with `Input.Text`, delivered as a regular message (inline, no modal-trigger needed).