mirror of
https://github.com/qwibitai/nanoclaw.git
synced 2026-06-04 10:14:47 +08:00
docs: v2 architecture design — session DB, channel adapters, agent provider
Three documents covering the complete v2 architecture:

- v2-architecture-draft.md: Core design (per-session SQLite as sole IO, two-level DB, entity model, channel adapters with Chat SDK bridge, container lifecycle, message flow, interactive operations, routing, flexibility model with PR Factory example)
- v2-api-details.md: Channel adapter interface definitions, Chat SDK bridge implementation, native channel example, message content format examples, host delivery logic
- v2-agent-runner-details.md: AgentProvider interface (stream-in/out), provider implementations (Claude, Codex, OpenCode), poll loop, MCP tool definitions, message formatting, media handling, container startup

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@@ -0,0 +1,763 @@
# NanoClaw v2 Agent-Runner Details

Implementation-level details for the agent-runner inside the container. See [v2-architecture-draft.md](v2-architecture-draft.md) for the high-level design.

## Separation of Concerns

The agent-runner has two layers:

1. **Agent-runner core** — owns the poll loop, message formatting, DB reads/writes, MCP tool implementations, routing, status management, media handling. This is NanoClaw-specific and shared across all providers.

2. **Agent provider** — owns the SDK interaction. Takes formatted prompts, pushes them to the SDK, yields events back. Each SDK (Claude, Codex, OpenCode) gets its own provider implementation.

The boundary: the agent-runner decides **what** to send and **what to do** with results. The provider decides **how** to talk to the SDK.

## AgentProvider Interface

```typescript
interface AgentProvider {
  /** Start a new query. Returns a handle for streaming input and output. */
  query(input: QueryInput): AgentQuery;
}

interface QueryInput {
  /** Initial prompt (already formatted by agent-runner).
   *  String for text-only. ContentBlock[] for multimodal (images, PDFs). */
  prompt: string | ContentBlock[];

  /** Session ID to resume, if any */
  sessionId?: string;

  /** Resume from a specific point in the session (provider-specific, may be ignored) */
  resumeAt?: string;

  /** Working directory inside the container */
  cwd: string;

  /** MCP server configurations (normalized format — provider translates) */
  mcpServers: Record<string, McpServerConfig>;

  /** System prompt / developer instructions */
  systemPrompt?: string;

  /** Environment variables for the SDK process */
  env: Record<string, string | undefined>;

  /** Additional directories the agent can access */
  additionalDirectories?: string[];
}

interface McpServerConfig {
  command: string;
  args: string[];
  env: Record<string, string>;
}

interface AgentQuery {
  /** Push a follow-up message into the active query */
  push(message: string): void;

  /** Signal that no more input will be sent */
  end(): void;

  /** Output event stream */
  events: AsyncIterable<ProviderEvent>;

  /** Force-stop the query (e.g., container shutting down) */
  abort(): void;
}

type ProviderEvent =
  | { type: 'init'; sessionId: string }
  | { type: 'result'; text: string | null }
  | { type: 'error'; message: string; retryable: boolean; classification?: string }
  | { type: 'progress'; message: string };
```

### What the interface does NOT include

- **Message formatting** — the agent-runner formats messages before passing to the provider. The provider receives a ready-to-send prompt (a string, or content blocks when media is attached).
- **Hooks** — Claude-specific. The Claude provider registers hooks internally (PreCompact, PreToolUse, etc.). Other providers don't need them.
- **Tool allowlists** — Claude uses `allowedTools`. Codex uses `approvalPolicy`. OpenCode uses `permission`. Each provider configures this internally based on the same intent: "allow everything, no prompting."
- **Session persistence** — Claude persists sessions to disk automatically. Codex and OpenCode manage their own session state. The agent-runner doesn't control this — it just passes `sessionId` and `resumeAt`.
- **Sandbox configuration** — provider-specific. Each provider configures its own sandbox internally.

### Provider event semantics

- **`init`** — emitted once per query when the provider establishes or resumes a session. The agent-runner captures `sessionId` for future resume.
- **`result`** — emitted when the agent produces a complete response. May be emitted multiple times per query (e.g., Claude's multi-turn with subagents). The agent-runner writes each result to messages_out.
- **`error`** — emitted on failure. `retryable` indicates whether the agent-runner should retry. `classification` is optional detail (e.g., 'quota', 'auth', 'transport').
- **`progress`** — optional, for logging. The agent-runner logs these but doesn't act on them.
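
The semantics above suggest a simple consumer loop on the agent-runner side. The sketch below is illustrative only: `EventSink`, `QueryOutcome` and `consumeEvents` are hypothetical names, and `writeResult` stands in for inserting a messages_out row. It captures the session ID for the next query, writes every non-null result, records errors so the caller can decide on a retry, and only logs progress.

```typescript
type ProviderEvent =
  | { type: 'init'; sessionId: string }
  | { type: 'result'; text: string | null }
  | { type: 'error'; message: string; retryable: boolean; classification?: string }
  | { type: 'progress'; message: string };

// Hypothetical callbacks; the real agent-runner writes to the session DB.
interface EventSink {
  writeResult(text: string): void;
  log(line: string): void;
}

interface QueryOutcome {
  sessionId: string | null; // passed back as QueryInput.sessionId next time
  error: { retryable: boolean; classification?: string } | null;
}

async function consumeEvents(events: AsyncIterable<ProviderEvent>, sink: EventSink): Promise<QueryOutcome> {
  const outcome: QueryOutcome = { sessionId: null, error: null };
  for await (const event of events) {
    switch (event.type) {
      case 'init':
        outcome.sessionId = event.sessionId;
        break;
      case 'result':
        // A query may produce several results; each one becomes its own outbound row.
        if (event.text !== null) sink.writeResult(event.text);
        break;
      case 'error':
        outcome.error = { retryable: event.retryable, classification: event.classification };
        sink.log(`[agent-runner] provider error: ${event.message}`);
        break;
      case 'progress':
        // Logged only, never acted on.
        sink.log(`[agent-runner] ${event.message}`);
        break;
    }
  }
  return outcome;
}
```

Keeping the outcome as a return value (rather than retrying inside the loop) leaves the retry decision with the caller, which matches the split between agent-runner and host described under Status Management.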

## Provider Implementations

### Claude Provider

Wraps `@anthropic-ai/claude-agent-sdk`'s `query()`.

```typescript
import { query } from '@anthropic-ai/claude-agent-sdk';

// MessageStream, NANOCLAW_TOOL_ALLOWLIST, preCompactHook, sanitizeBashHook and
// translateClaudeEvents are NanoClaw helpers defined elsewhere in the agent-runner.
class ClaudeProvider implements AgentProvider {
  query(input: QueryInput): AgentQuery {
    const stream = new MessageStream(); // AsyncIterable<SDKUserMessage>
    stream.push(input.prompt);

    const sdkQuery = query({
      prompt: stream,
      options: {
        cwd: input.cwd,
        resume: input.sessionId,
        resumeSessionAt: input.resumeAt,
        systemPrompt: input.systemPrompt
          ? { type: 'preset', preset: 'claude_code', append: input.systemPrompt }
          : undefined,
        mcpServers: input.mcpServers, // already the right shape
        additionalDirectories: input.additionalDirectories,
        env: input.env,
        allowedTools: NANOCLAW_TOOL_ALLOWLIST,
        permissionMode: 'bypassPermissions',
        allowDangerouslySkipPermissions: true,
        hooks: {
          PreCompact: [{ hooks: [preCompactHook] }],
          PreToolUse: [{ matcher: 'Bash', hooks: [sanitizeBashHook] }],
        },
      },
    });

    return {
      push: (msg) => stream.push(msg),
      end: () => stream.end(),
      abort: () => sdkQuery.close(),
      events: translateClaudeEvents(sdkQuery),
    };
  }
}
```

`translateClaudeEvents` is an async generator that maps SDK messages to `ProviderEvent`:

- `message.type === 'system' && message.subtype === 'init'` → `{ type: 'init', sessionId }`
- `message.type === 'result'` → `{ type: 'result', text }`
- `message.type === 'system' && message.subtype === 'api_retry'` → `{ type: 'error', retryable: true }`
- `message.type === 'system' && message.subtype === 'rate_limit_event'` → `{ type: 'error', retryable: false, classification: 'quota' }`
- `message.type === 'system' && message.subtype === 'task_notification'` → `{ type: 'progress', message }`
- Everything else → logged, not emitted
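
The mapping above could be implemented as an async generator along these lines. It is a sketch against a locally declared, simplified `SdkMessage` shape rather than the SDK's exported types; the `session_id`, `result` and `message` fields are assumptions to check against `@anthropic-ai/claude-agent-sdk` before relying on them.

```typescript
type ProviderEvent =
  | { type: 'init'; sessionId: string }
  | { type: 'result'; text: string | null }
  | { type: 'error'; message: string; retryable: boolean; classification?: string }
  | { type: 'progress'; message: string };

// Simplified stand-in for SDK messages (assumed field names, for illustration only).
interface SdkMessage {
  type: string;
  subtype?: string;
  session_id?: string; // assumed: present on system/init
  result?: string;     // assumed: final text on result messages
  message?: string;    // assumed: human-readable detail on system events
}

async function* translateClaudeEvents(
  messages: AsyncIterable<SdkMessage>,
  log: (line: string) => void = (line) => console.error(line),
): AsyncGenerator<ProviderEvent> {
  for await (const m of messages) {
    if (m.type === 'system' && m.subtype === 'init' && m.session_id) {
      yield { type: 'init', sessionId: m.session_id };
    } else if (m.type === 'result') {
      yield { type: 'result', text: m.result ?? null };
    } else if (m.type === 'system' && m.subtype === 'api_retry') {
      yield { type: 'error', message: m.message ?? 'API retry', retryable: true };
    } else if (m.type === 'system' && m.subtype === 'rate_limit_event') {
      yield { type: 'error', message: m.message ?? 'rate limited', retryable: false, classification: 'quota' };
    } else if (m.type === 'system' && m.subtype === 'task_notification') {
      yield { type: 'progress', message: m.message ?? '' };
    } else {
      // Everything else is logged, not emitted.
      log(`[agent-runner] unhandled SDK message: ${m.type}${m.subtype ? `/${m.subtype}` : ''}`);
    }
  }
}
```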

**Claude-specific features preserved inside the provider:**

- `MessageStream` for async iterable input (push-based)
- `resumeSessionAt` for resuming at a specific message UUID
- PreCompact hook for transcript archiving
- PreToolUse hook for sanitizing bash env vars
- Full tool allowlist
- `additionalDirectories` for multi-directory access
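
`MessageStream` is a NanoClaw helper, not an SDK export. A generic push-based async iterable with the same role might look like the following; `PushQueue` is a hypothetical name, and unlike the real helper it yields whatever is pushed instead of wrapping each prompt in an `SDKUserMessage`.

```typescript
/** Push-based AsyncIterable: push() enqueues an item, end() completes the iteration. */
class PushQueue<T> implements AsyncIterable<T> {
  private items: T[] = [];
  private waiters: Array<(result: IteratorResult<T>) => void> = [];
  private ended = false;

  push(item: T): void {
    if (this.ended) throw new Error('push() after end()');
    const waiter = this.waiters.shift();
    if (waiter) waiter({ value: item, done: false }); // hand straight to a waiting consumer
    else this.items.push(item);                      // otherwise buffer it
  }

  end(): void {
    this.ended = true;
    // Wake any consumer still waiting so its for-await loop can finish.
    for (const waiter of this.waiters.splice(0)) waiter({ value: undefined, done: true });
  }

  [Symbol.asyncIterator](): AsyncIterator<T> {
    return {
      next: (): Promise<IteratorResult<T>> => {
        if (this.items.length > 0) return Promise.resolve({ value: this.items.shift() as T, done: false });
        if (this.ended) return Promise.resolve({ value: undefined, done: true });
        return new Promise((resolve) => this.waiters.push(resolve));
      },
    };
  }
}
```

Because `next()` parks the consumer on a pending promise while the buffer is empty, the SDK's `for await` loop stays open between turns, which is what lets follow-ups flow into a live query.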

### Codex Provider

Wraps `@openai/codex-sdk`.

```typescript
import { Codex } from '@openai/codex-sdk';

type CodexThread = ReturnType<Codex['startThread']>;

/** State shared between the AgentQuery handle and run(). */
interface CodexRunState {
  controller: AbortController;    // replaced after a follow-up aborts a turn
  pendingFollowUp: string | null; // cleared once consumed
  stopped: boolean;               // set by abort(); no further turns
}

class CodexProvider implements AgentProvider {
  // buildOptions() and threadOptions() (not shown) map QueryInput and env vars to Codex options.
  query(input: QueryInput): AgentQuery {
    const codex = new Codex(this.buildOptions(input));
    const thread = input.sessionId
      ? codex.resumeThread(input.sessionId, this.threadOptions(input))
      : codex.startThread(this.threadOptions(input));

    const state: CodexRunState = { controller: new AbortController(), pendingFollowUp: null, stopped: false };

    return {
      push: (msg) => {
        // Codex doesn't support streaming input.
        // Queue the follow-up (batching with any already queued) and abort the current turn.
        state.pendingFollowUp = state.pendingFollowUp ? `${state.pendingFollowUp}\n\n${msg}` : msg;
        state.controller.abort();
      },
      end: () => { /* no-op — Codex turns end naturally */ },
      abort: () => {
        state.stopped = true;
        state.controller.abort();
      },
      // Text-only provider: the agent-runner passes a plain string prompt.
      events: this.run(thread, input.prompt as string, state),
    };
  }

  private async *run(thread: CodexThread, prompt: string, state: CodexRunState): AsyncIterable<ProviderEvent> {
    let currentPrompt = prompt;

    while (true) {
      try {
        const streamed = await thread.runStreamed(currentPrompt, {
          signal: state.controller.signal,
        });
        let resultText = '';

        for await (const event of streamed.events) {
          if (event.type === 'thread.started') {
            yield { type: 'init', sessionId: event.thread_id };
          }
          if (event.type === 'item.completed' && event.item.type === 'agent_message') {
            resultText = event.item.text || resultText;
          }
          if (event.type === 'turn.failed') {
            yield { type: 'error', message: event.error.message, retryable: false };
            return;
          }
        }

        yield { type: 'result', text: resultText || null };
      } catch (err) {
        if (state.stopped) return; // abort() was called: stop quietly
        // The only expected failure here is an abort caused by a queued follow-up.
        if (!(state.controller.signal.aborted && state.pendingFollowUp !== null)) throw err;
      }

      // Run the follow-up queued during this turn, if any.
      const followUp = state.pendingFollowUp;
      if (followUp === null || state.stopped) return;
      state.pendingFollowUp = null; // consume it so it runs exactly once
      currentPrompt = followUp;
      if (state.controller.signal.aborted) {
        state.controller = new AbortController(); // fresh signal; push()/abort() see it via state
      }
    }
  }
}
```

**Codex-specific behavior inside the provider:**

- `developer_instructions` for system prompt (loaded from CLAUDE.md)
- `git init` in workspace (Codex requires a git repo)
- Abort+restart pattern for follow-up messages
- `sandboxMode`, `approvalPolicy`, `networkAccessEnabled` from env vars
- Conversation archiving (Codex doesn't have PreCompact)

### OpenCode Provider

Wraps `@opencode-ai/sdk`.

```typescript
import { createOpencode } from '@opencode-ai/sdk';

type Opencode = Awaited<ReturnType<typeof createOpencode>>;

/** State shared between the AgentQuery handle and run(). */
interface OpenCodeRunState {
  sessionId: string | null;       // known once run() has a session
  pendingFollowUp: string | null; // cleared once sent
}

class OpenCodeProvider implements AgentProvider {
  // OpenCode runs a local server: create it once, reuse across queries.
  // buildConfig() and extractResult() are helpers not shown here.
  private opencode: Promise<Opencode> | null = null;

  private getOpencode(input: QueryInput): Promise<Opencode> {
    this.opencode ??= createOpencode({ config: this.buildConfig(input) });
    return this.opencode;
  }

  query(input: QueryInput): AgentQuery {
    // query() is synchronous, so everything that needs `await` happens in run().
    const state: OpenCodeRunState = { sessionId: null, pendingFollowUp: null };

    return {
      push: (msg) => {
        // Queue the follow-up and interrupt the current prompt; run() sends it once the session is idle.
        state.pendingFollowUp = state.pendingFollowUp ? `${state.pendingFollowUp}\n\n${msg}` : msg;
        const sessionId = state.sessionId;
        if (sessionId) {
          void this.getOpencode(input).then(({ client }) => client.session.abort({ path: { id: sessionId } }));
        }
      },
      end: () => { /* no-op */ },
      abort: () => {
        // Force-stop (e.g., container shutting down): close the shared server.
        void this.opencode?.then(({ server }) => server.close());
        this.opencode = null;
      },
      events: this.run(input, state),
    };
  }

  private async *run(input: QueryInput, state: OpenCodeRunState): AsyncIterable<ProviderEvent> {
    const { client } = await this.getOpencode(input);
    const { stream } = await client.event.subscribe();

    // Reuse an existing session by ID, otherwise create a new one.
    const sessionId = input.sessionId ?? (await client.session.create()).data.id;
    state.sessionId = sessionId;
    yield { type: 'init', sessionId };

    const send = (text: string) =>
      client.session.promptAsync({
        path: { id: sessionId },
        body: { parts: [{ type: 'text', text }] },
      });

    await send(input.prompt as string); // text-only provider: the prompt is a plain string

    for await (const event of stream) {
      // The event stream is server-wide: skip events that belong to other sessions.
      const eventSession = event.properties?.sessionID;
      if (eventSession && eventSession !== sessionId) continue;

      if (event.type === 'session.idle') {
        // Collect result text from accumulated message parts
        yield { type: 'result', text: this.extractResult(event) };

        const followUp = state.pendingFollowUp;
        if (followUp !== null) {
          state.pendingFollowUp = null; // consume it so it is sent exactly once
          await send(followUp);
          continue;
        }
        return;
      }

      if (event.type === 'session.error') {
        if (state.pendingFollowUp !== null) continue; // interrupted by push(): wait for idle
        yield { type: 'error', message: event.properties?.error?.data?.message ?? 'OpenCode session error', retryable: false };
        return;
      }
    }
  }
}
```

**OpenCode-specific behavior inside the provider:**

- Local HTTP server lifecycle (`server.close()`)
- SSE event stream for output
- Provider/model selection via config (`OPENCODE_PROVIDER`, `OPENCODE_MODEL`)
- MCP config format translation (`type: 'local'`, `command: [cmd, ...args]`, `environment`)
- System prompt injected via `<system>` prefix in prompt text
- Follow-ups interrupt the in-flight prompt (`session.abort`) and are sent once the session is idle
- No `resumeAt` support: a session is either created fresh or reused by ID
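
The MCP config translation can be a pure function over `QueryInput.mcpServers`. The target field names below (`type: 'local'`, `command` as an array, `environment`) come from the list above; `OpenCodeLocalMcp` and `toOpenCodeMcp` are local names for illustration, so check the shape against the OpenCode config schema before relying on it.

```typescript
// Normalized format used by QueryInput.mcpServers.
interface McpServerConfig {
  command: string;
  args: string[];
  env: Record<string, string>;
}

// Local shape for illustration, based on the fields named above.
interface OpenCodeLocalMcp {
  type: 'local';
  command: string[]; // executable followed by its arguments
  environment: Record<string, string>;
}

function toOpenCodeMcp(servers: Record<string, McpServerConfig>): Record<string, OpenCodeLocalMcp> {
  const out: Record<string, OpenCodeLocalMcp> = {};
  for (const [name, cfg] of Object.entries(servers)) {
    out[name] = {
      type: 'local',
      command: [cfg.command, ...cfg.args], // OpenCode takes one argv-style array
      environment: { ...cfg.env },         // copied so later mutation can't leak back
    };
  }
  return out;
}
```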

## Agent-Runner Core

Everything below is handled by the agent-runner, not the provider.

### Poll Loop

```
┌─────────────────────────────────────────┐
│                                         │
│  1. Query messages_in for pending rows  │
│     WHERE status = 'pending'            │
│       AND (process_after IS NULL        │
│            OR process_after <= now())   │
│                                         │
│  2. If rows found:                      │
│     a. Set status = 'processing'        │
│     b. Format messages by kind          │
│     c. Strip routing fields             │
│     d. Call provider.query(prompt)      │
│     e. Process provider events          │
│     f. Write results to messages_out    │
│     g. Set status = 'completed'         │
│                                         │
│  3. While query is active:              │
│     - Continue polling messages_in      │
│     - New messages → provider.push()    │
│                                         │
│  4. When query finishes:                │
│     - Back to step 1                    │
│     - If no messages, sleep + re-poll   │
│                                         │
└─────────────────────────────────────────┘
```

**Concurrent polling during active query:** While the provider is running a query, the agent-runner continues polling messages_in on a short interval (~500ms). New pending messages are formatted and pushed into the active query via `provider.push()`. This lets follow-up messages arrive while the agent is processing — Claude handles this natively, Codex/OpenCode handle it via abort+restart internally.

**Idle behavior:** When no messages are pending and no query is active, the agent-runner sleeps briefly (1s) and re-polls. The container stays warm until the host kills it (idle timeout).
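
Steps 1 and 2a of the loop, plus the two polling intervals, can be sketched over an in-memory array standing in for messages_in. `MessageInRow`, `pickUpDue` and `nextPollDelayMs` are hypothetical names, and timestamps are epoch milliseconds for simplicity; against SQLite, the select and the status update would run inside one transaction.

```typescript
type Status = 'pending' | 'processing' | 'completed' | 'failed' | 'paused';

interface MessageInRow {
  id: string;
  status: Status;
  processAfter: number | null; // epoch ms; null = process immediately
  statusChanged: number;       // epoch ms
  tries: number;
}

/** Select due pending rows and mark them as processing (steps 1 and 2a). */
function pickUpDue(rows: MessageInRow[], now: number): MessageInRow[] {
  const due = rows.filter(
    (r) => r.status === 'pending' && (r.processAfter === null || r.processAfter <= now),
  );
  for (const r of due) {
    r.status = 'processing';
    r.statusChanged = now;
    r.tries += 1; // mirrors `tries = tries + 1` in the pick-up UPDATE
  }
  return due;
}

/** Delay before the next poll: ~500ms while a query is active, 1s when idle. */
function nextPollDelayMs(queryActive: boolean): number {
  return queryActive ? 500 : 1000;
}
```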

**Idle detection exceptions:** The container should NOT be considered idle when:

- An `ask_user_question` tool call is pending (waiting for user response in messages_in)
- The agent is actively working (tool calls in progress, subagents running)

The agent-runner signals "busy" status to the host. How it detects that it is busy is provider-specific: for Claude, the query's async generator is still yielding events. The signal itself can be a heartbeat or status indicator that the agent-runner writes to the session DB and the host checks before killing the container.

### Message Formatting

The agent-runner transforms messages_in rows into a prompt. The provider receives it ready to send (a string, or content blocks when Claude receives media; see Media Handling) and doesn't know about message kinds or routing.

**Routing field stripping:** `platform_id`, `channel_type`, `thread_id` are never included in the prompt. They're stored as context for writing messages_out.

**Single message formatting by kind:**

- **`chat`** — format into message XML:

  ```xml
  <message sender="John" time="2024-01-01 10:00">
  Check this PR
  </message>
  ```

- **`chat-sdk`** — extract fields from serialized Chat SDK message:

  ```xml
  <message sender="John (john@slack)" time="2024-01-01 10:00">
  Check this PR
  [image: screenshot.png — https://signed-url...]
  </message>
  ```

  Attachments are listed inline. Images/PDFs that Claude handles natively are passed as content blocks (see Media Handling below).

- **`task`** — task prompt, optionally with script output:

  ```
  [SCHEDULED TASK]

  Script output:
  {"data": ...}

  Instructions:
  Review open PRs
  ```

- **`webhook`** — webhook payload:

  ```
  [WEBHOOK: github/pull_request]

  {"action": "opened", "pull_request": {...}}
  ```

- **`system`** — host action result (response to an earlier system request):

  ```
  [SYSTEM RESPONSE]

  Action: register_agent_group
  Status: success
  Result: {"agent_group_id": "ag-456"}
  ```

**Batch formatting:** Multiple pending messages are combined into one prompt:

```xml
<context timezone="America/Los_Angeles" />
<messages>
<message sender="John" time="10:00">Check this PR</message>
<message sender="Jane" time="10:01">Already on it</message>
</messages>
```

Mixed kinds (e.g., a chat message + a system response) are combined with clear delimiters. Each section is labeled by kind.
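
A sketch of per-kind and batch formatting that produces the shapes shown above. The `InboundRow` type and its field names (`sender`, `time`, `text`, `action`, and so on) are assumptions for illustration; the real content JSON is described in v2-api-details.md. Message text is XML-escaped so user input cannot break out of the `<message>` element, `<context>` is emitted as a self-closing element, and placing chat messages before the other labeled sections is an arbitrary choice of this sketch.

```typescript
type InboundRow =
  | { kind: 'chat'; sender: string; time: string; text: string }
  | { kind: 'task'; prompt: string; scriptOutput?: unknown }
  | { kind: 'webhook'; source: string; event: string; payload: unknown }
  | { kind: 'system'; action: string; status: string; result: unknown };

function escapeXml(s: string): string {
  return s.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;').replace(/"/g, '&quot;');
}

function formatChat(m: { sender: string; time: string; text: string }): string {
  return `<message sender="${escapeXml(m.sender)}" time="${escapeXml(m.time)}">${escapeXml(m.text)}</message>`;
}

function formatOther(m: Exclude<InboundRow, { kind: 'chat' }>): string {
  switch (m.kind) {
    case 'task': {
      const script = m.scriptOutput === undefined ? '' : `Script output:\n${JSON.stringify(m.scriptOutput)}\n\n`;
      return `[SCHEDULED TASK]\n\n${script}Instructions:\n${m.prompt}`;
    }
    case 'webhook':
      return `[WEBHOOK: ${m.source}/${m.event}]\n\n${JSON.stringify(m.payload)}`;
    case 'system':
      return `[SYSTEM RESPONSE]\n\nAction: ${m.action}\nStatus: ${m.status}\nResult: ${JSON.stringify(m.result)}`;
  }
}

/** Chat messages share one <messages> block; every other kind becomes its own labeled section. */
function formatBatch(rows: InboundRow[], timezone: string): string {
  const chats = rows.flatMap((r) => (r.kind === 'chat' ? [formatChat(r)] : []));
  const others = rows.flatMap((r) => (r.kind === 'chat' ? [] : [formatOther(r)]));
  const sections: string[] = [];
  if (chats.length > 0) {
    sections.push(`<context timezone="${escapeXml(timezone)}" />\n<messages>\n${chats.join('\n')}\n</messages>`);
  }
  return [...sections, ...others].join('\n\n');
}
```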

**Command detection:** Messages starting with `/` are checked against a command list. Recognized commands bypass formatting and are passed raw to the provider (for Claude's slash command handling) or intercepted by the agent-runner (for NanoClaw-level commands like session reset).
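
A sketch of the command check. The document does not list the commands, so both sets below are hypothetical placeholders: `/reset` stands in for a NanoClaw-level session reset and `/compact` for a command passed through to Claude. Treating unrecognized `/...` text as an ordinary message is also an assumption.

```typescript
// Hypothetical command lists, for illustration only.
const RUNNER_COMMANDS = new Set(['/reset']);     // intercepted by the agent-runner
const PROVIDER_COMMANDS = new Set(['/compact']); // passed raw to the provider

type CommandRoute =
  | { route: 'runner'; command: string; args: string }
  | { route: 'provider'; raw: string }
  | { route: 'format' }; // not a recognized command: format normally

function routeCommand(text: string): CommandRoute {
  const trimmed = text.trim();
  if (!trimmed.startsWith('/')) return { route: 'format' };
  const [command, ...rest] = trimmed.split(/\s+/);
  if (RUNNER_COMMANDS.has(command)) return { route: 'runner', command, args: rest.join(' ') };
  if (PROVIDER_COMMANDS.has(command)) return { route: 'provider', raw: trimmed };
  return { route: 'format' }; // unknown "/..." text is treated as a normal message
}
```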

### Routing

When the agent-runner picks up messages_in rows, it captures the routing fields from the batch:

```typescript
interface RoutingContext {
  platformId: string | null;
  channelType: string | null;
  threadId: string | null;
  inReplyTo: string | null; // messages_in.id of the triggering message
}
```

When writing messages_out (either from provider results or MCP tool calls), the agent-runner copies this routing context by default. The agent never sees routing fields — it just produces text. The routing is implicit: "respond to whoever sent the message."

MCP tools that target a different destination (e.g., `send_to_agent`, `send_message` with explicit channel) override the routing context for that specific messages_out row.
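
Building the routing columns of a messages_out row could look like this: copy the captured context unless the tool supplied an explicit destination. `DestinationOverride`, `OutboundRouting`, `routingFor` and the `in_reply_to` column name are illustrative assumptions, as is the rule that an explicit destination replaces the whole route (unset fields become `null`) and is not treated as a reply. `agentRouting` applies the `send_to_agent` column mapping given in that tool's implementation note.

```typescript
interface RoutingContext {
  platformId: string | null;
  channelType: string | null;
  threadId: string | null;
  inReplyTo: string | null; // messages_in.id of the triggering message
}

// Optional explicit destination from an MCP tool call (e.g. send_message params).
interface DestinationOverride {
  channelType?: string;
  platformId?: string;
  threadId?: string;
}

interface OutboundRouting {
  channel_type: string | null;
  platform_id: string | null;
  thread_id: string | null;
  in_reply_to: string | null; // assumed column name
}

function routingFor(ctx: RoutingContext, override: DestinationOverride = {}): OutboundRouting {
  const explicit =
    override.channelType !== undefined || override.platformId !== undefined || override.threadId !== undefined;
  if (!explicit) {
    // Default: respond to whoever sent the message.
    return { channel_type: ctx.channelType, platform_id: ctx.platformId, thread_id: ctx.threadId, in_reply_to: ctx.inReplyTo };
  }
  // Assumption: an explicit destination replaces the whole route and is not a reply.
  return {
    channel_type: override.channelType ?? null,
    platform_id: override.platformId ?? null,
    thread_id: override.threadId ?? null,
    in_reply_to: null,
  };
}

/** send_to_agent: channel_type 'agent', platform_id = agentGroupId, thread_id = sessionId. */
function agentRouting(ctx: RoutingContext, agentGroupId: string, sessionId?: string): OutboundRouting {
  return routingFor(ctx, { channelType: 'agent', platformId: agentGroupId, threadId: sessionId });
}
```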

### Status Management

The agent-runner manages the `status` and `status_changed` fields on messages_in:

```
pending → processing → completed
                     → failed (set by the host once max retries are exhausted)
```

- **Pick up:** `UPDATE messages_in SET status = 'processing', status_changed = now(), tries = tries + 1 WHERE id IN (...)`
- **Complete:** `UPDATE messages_in SET status = 'completed', status_changed = now() WHERE id IN (...)`
- **Error:** Agent-runner does NOT set `failed` — it leaves the message as `processing`. The host detects stale processing via `status_changed` and handles retry logic (reset to pending with backoff). This keeps retry policy on the host side.
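
On the host side, the stale-`processing` check might look like the sketch below. The staleness threshold, the retry cap and the doubling backoff are placeholder policy values (the document only says the host resets to pending with backoff); `RetryPolicy` and `sweepStale` are hypothetical names. This is also where `failed` gets set, once `tries` reaches the cap.

```typescript
interface MessageInRow {
  id: string;
  status: 'pending' | 'processing' | 'completed' | 'failed' | 'paused';
  statusChanged: number;       // epoch ms
  processAfter: number | null; // epoch ms
  tries: number;               // incremented by the agent-runner at pick-up
}

interface RetryPolicy {
  staleAfterMs: number;  // processing longer than this is considered stuck
  maxTries: number;      // give up after this many pick-ups
  baseBackoffMs: number; // first retry delay; doubles per attempt (assumed policy)
}

/** Reset stale processing rows to pending with backoff, or mark them failed. Returns changed IDs. */
function sweepStale(rows: MessageInRow[], now: number, policy: RetryPolicy): string[] {
  const changed: string[] = [];
  for (const row of rows) {
    if (row.status !== 'processing' || now - row.statusChanged < policy.staleAfterMs) continue;
    if (row.tries >= policy.maxTries) {
      row.status = 'failed';
    } else {
      row.status = 'pending';
      row.processAfter = now + policy.baseBackoffMs * 2 ** Math.max(0, row.tries - 1);
    }
    row.statusChanged = now;
    changed.push(row.id);
  }
  return changed;
}
```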

### MCP Tools

The agent-runner runs an MCP server (same as v1) that exposes NanoClaw tools to the agent. In v2, all tools write to the session DB instead of IPC files.

**DB path:** The MCP server receives the session DB path via environment variable. It opens a second connection to the same SQLite file (WAL mode allows concurrent access).

#### send_message

Send a chat message to the current conversation (or a specified destination).

```typescript
{
  name: 'send_message',
  params: {
    text: string,        // message content
    channel?: string,    // optional: target channel type (default: reply to origin)
    platformId?: string, // optional: target platform ID
    threadId?: string,   // optional: target thread ID
  }
}
```

Implementation: write a `messages_out` row with `kind: 'chat'`. If channel/platformId/threadId are provided, use those as routing. Otherwise, copy from the current routing context.

#### send_file

Send a file to the current conversation.

```typescript
{
  name: 'send_file',
  params: {
    path: string,      // file path (relative to /workspace/agent/ or absolute)
    text?: string,     // optional accompanying message
    filename?: string, // display name (default: basename of path)
  }
}
```

Implementation:

1. Generate a message ID
2. Create `outbox/{messageId}/` directory
3. Copy the file into the outbox directory
4. Write a `messages_out` row with `files: [filename]` in the content
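
The path handling in steps 1 to 3 can be sketched with Node's `path` and `fs` modules. The outbox root `/workspace/outbox` is an assumption (the document only names `outbox/{messageId}/`), and `randomUUID` stands in for whatever ID scheme the agent-runner uses; `planSendFile` and `executeSendFile` are hypothetical names. The sketch rejects display names containing a path separator so the copy always lands inside the message's outbox directory.

```typescript
import * as path from 'node:path';
import { randomUUID } from 'node:crypto';
import { copyFileSync, mkdirSync } from 'node:fs';

const AGENT_DIR = '/workspace/agent';
const OUTBOX_ROOT = '/workspace/outbox'; // assumed location of outbox/

interface SendFilePlan {
  messageId: string;
  source: string;      // absolute path of the file to copy
  destination: string; // outbox/{messageId}/{filename}
  filename: string;    // value written to the messages_out `files` list
}

function planSendFile(filePath: string, filename?: string, messageId: string = randomUUID()): SendFilePlan {
  // Relative paths are resolved against the agent group folder; absolute paths are kept.
  const source = path.posix.resolve(AGENT_DIR, filePath);
  const name = filename ?? path.posix.basename(source);
  if (name === '' || name === '.' || name === '..' || name.includes('/')) {
    throw new Error(`invalid filename: ${name}`);
  }
  const destination = path.posix.join(OUTBOX_ROOT, messageId, name);
  return { messageId, source, destination, filename: name };
}

/** Steps 2 and 3: create outbox/{messageId}/ and copy the file into it. */
function executeSendFile(plan: SendFilePlan): void {
  mkdirSync(path.posix.dirname(plan.destination), { recursive: true });
  copyFileSync(plan.source, plan.destination);
}
```

Step 4 then writes the messages_out row with `files: [plan.filename]`; doing it last means the host never sees a row whose file has not been copied yet.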

#### send_card

Send a structured card (interactive or display-only).

```typescript
{
  name: 'send_card',
  params: {
    card: CardElement,     // card structure (title, children, actions)
    fallbackText?: string, // text fallback for platforms without card support
  }
}
```

Implementation: write a `messages_out` row with `kind: 'chat-sdk'` and the card structure in content.

#### ask_user_question

Send an interactive question and wait for the user's response. This is a **blocking tool call** — the tool doesn't return until the user responds.

```typescript
{
  name: 'ask_user_question',
  params: {
    question: string,
    options: string[], // button labels
    timeout?: number,  // seconds (default: 300)
  }
}
```

Implementation:

1. Generate a `questionId`
2. Write a `messages_out` row with `operation: 'ask_question'`, the question, options, and questionId
3. Poll `messages_in` for a row with matching `questionId` in content
4. When found, return the `selectedOption` as the tool result
5. If the timeout expires, return a timeout error as the tool result

The agent's execution is paused at this tool call. The provider's query keeps running (Claude holds the tool call open). The agent-runner polls for the response in a separate loop.
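
Steps 3 to 5 amount to a poll-with-deadline loop. In this sketch `findAnswer` is a hypothetical lookup that would query messages_in for a row whose content carries the matching `questionId`; the 500ms poll interval and the `ToolResult` shape are assumptions.

```typescript
interface QuestionAnswer {
  questionId: string;
  selectedOption: string;
}

type ToolResult = { ok: true; selectedOption: string } | { ok: false; error: string };

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function waitForAnswer(
  findAnswer: (questionId: string) => QuestionAnswer | undefined, // hypothetical messages_in lookup
  questionId: string,
  timeoutSeconds = 300,
  pollIntervalMs = 500,
): Promise<ToolResult> {
  const deadline = Date.now() + timeoutSeconds * 1000;
  while (Date.now() < deadline) {
    const answer = findAnswer(questionId);
    if (answer) return { ok: true, selectedOption: answer.selectedOption };
    // Never sleep past the deadline.
    await sleep(Math.min(pollIntervalMs, Math.max(0, deadline - Date.now())));
  }
  // The timeout is returned as a tool result rather than thrown, so the agent can react to it.
  return { ok: false, error: `No answer to question ${questionId} within ${timeoutSeconds}s` };
}
```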

#### edit_message

Edit a previously sent message.

```typescript
{
  name: 'edit_message',
  params: {
    messageId: string, // integer ID as shown to the agent
    text: string,      // new content
  }
}
```

Implementation: write a `messages_out` row with `operation: 'edit'`, the message ID, and new text.

#### add_reaction

Add an emoji reaction to a message.

```typescript
{
  name: 'add_reaction',
  params: {
    messageId: string, // integer ID as shown to the agent
    emoji: string,     // emoji name (e.g., 'thumbs_up')
  }
}
```

Implementation: write a `messages_out` row with `operation: 'reaction'`.

#### send_to_agent

Send a message to another agent group.

```typescript
{
  name: 'send_to_agent',
  params: {
    agentGroupId: string, // target agent group
    text: string,         // message content
    sessionId?: string,   // optional: target specific session
  }
}
```

Implementation: write a `messages_out` row with `channel_type: 'agent'`, `platform_id: agentGroupId`, `thread_id: sessionId`.

#### schedule_task

Schedule a one-shot or recurring task.

```typescript
{
  name: 'schedule_task',
  params: {
    prompt: string,       // task prompt
    processAfter: string, // ISO timestamp for first run
    recurrence?: string,  // cron expression (optional)
    script?: string,      // pre-agent script (optional)
  }
}
```

Implementation: write a `messages_in` row (to self) with `kind: 'task'`, `process_after`, and optionally `recurrence`. The host sweep picks it up when due.

#### list_tasks

List active scheduled/recurring tasks.

```typescript
{
  name: 'list_tasks',
  params: {}
}
```

Implementation: query `messages_in WHERE recurrence IS NOT NULL AND status != 'failed'`.

#### cancel_task / pause_task / resume_task

Modify a scheduled task.

```typescript
{
  name: 'cancel_task',
  params: { taskId: string }
}
// pause_task: set status = 'paused' (new status value for recurring tasks)
// resume_task: set status = 'pending'
```

Implementation: update the messages_in row directly.

#### register_agent_group

Register a new agent group (admin only).

```typescript
{
  name: 'register_agent_group',
  params: {
    name: string,
    folder: string,
    platformId: string, // messaging group to wire to
    channelType: string,
    triggerRules?: object,
    sessionMode?: 'shared' | 'per-thread',
  }
}
```

Implementation: write a `messages_out` row with `kind: 'system'`, `action: 'register_agent_group'`. The host reads, validates admin permission, creates the entity rows in the central DB, and writes a `system` messages_in response.

### Media Handling

#### Inbound (messages_in → agent prompt)

The agent-runner inspects attachments in chat/chat-sdk messages and handles them based on type and provider capability:

**Provider-native content blocks:**

| Type | Claude | Codex / OpenCode |
|------|--------|------------------|
| Images (JPEG, PNG, GIF, WebP) | Native image content block | Save to disk |
| PDFs | Native document content block | Save to disk |
| Audio | Save to disk (the Messages API has no audio input block) | Save to disk |
| Other files (code, data, video, archives) | Save to disk | Save to disk |

**"Save to disk"** means: download to `/workspace/downloads/{messageId}/`, reference in the prompt text:

```
<message sender="John" time="10:00">
Check this spreadsheet
[file available at: /workspace/downloads/msg-123/data.xlsx]
</message>
```

The agent can use tools (Read, Bash) to access saved files.

For channels where direct download isn't possible (e.g., WhatsApp buffered streams), the channel adapter serves the media via a local URL. The agent-runner downloads from that URL.

**Content block construction (Claude):** The agent-runner builds multi-part `MessageParam` content: `[{ type: 'image', source: { type: 'base64', media_type, data } }, { type: 'text', text: '...' }]`. PDFs use `type: 'document'` with the same base64 `source` shape. In this case the prompt is not a plain string, which is why `QueryInput.prompt` also accepts `ContentBlock[]`. The provider's `query()` method handles the format-specific construction.

**Content block construction (Codex/OpenCode):** Everything is text. File references are inlined in the prompt string. The provider receives a plain string prompt.
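
A sketch of the per-attachment decision for Claude, following the table above. The `Attachment` type and `buildClaudeContent` are hypothetical; the image and document block shapes follow the Anthropic Messages API. For brevity the sketch appends file references after the text, whereas the real formatter places them inside the `<message>` element, and it assumes the download has already happened.

```typescript
interface Attachment {
  filename: string;
  mimeType: string;
  data: Uint8Array;  // already-downloaded bytes
  savedPath: string; // where the file was saved under /workspace/downloads/{messageId}/
}

type ContentBlock =
  | { type: 'text'; text: string }
  | { type: 'image'; source: { type: 'base64'; media_type: string; data: string } }
  | { type: 'document'; source: { type: 'base64'; media_type: 'application/pdf'; data: string } };

const CLAUDE_IMAGE_TYPES = new Set(['image/jpeg', 'image/png', 'image/gif', 'image/webp']);

const toBase64 = (bytes: Uint8Array): string => Buffer.from(bytes).toString('base64');

function buildClaudeContent(text: string, attachments: Attachment[]): string | ContentBlock[] {
  const blocks: ContentBlock[] = [];
  const fileRefs: string[] = [];
  for (const a of attachments) {
    if (CLAUDE_IMAGE_TYPES.has(a.mimeType)) {
      blocks.push({ type: 'image', source: { type: 'base64', media_type: a.mimeType, data: toBase64(a.data) } });
    } else if (a.mimeType === 'application/pdf') {
      blocks.push({ type: 'document', source: { type: 'base64', media_type: 'application/pdf', data: toBase64(a.data) } });
    } else {
      // Everything else is only referenced by its saved path.
      fileRefs.push(`[file available at: ${a.savedPath}]`);
    }
  }
  const fullText = [text, ...fileRefs].join('\n');
  // Text-only prompts stay plain strings, which QueryInput.prompt also accepts.
  return blocks.length === 0 ? fullText : [...blocks, { type: 'text', text: fullText }];
}
```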

#### Outbound (agent → messages_out)

Handled via the `send_file` MCP tool (see above). The agent explicitly decides to send a file — the agent-runner doesn't scan output for file references.

### Pre-Agent Scripts (Tasks)

For `task` kind messages with a `script` field in the content:

1. Agent-runner writes the script to a temp file
2. Executes with `bash` (30s timeout)
3. Parses last line of stdout as JSON: `{ wakeAgent: boolean, data?: unknown }`
4. If `wakeAgent === false`: mark message as completed, don't invoke the provider
5. If `wakeAgent === true`: enrich the prompt with script output, then invoke the provider

Same as v1 behavior.
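
The steps above can be sketched with Node's `child_process`, `fs` and `os` modules. `parseScriptResult` and `runPreAgentScript` are hypothetical names; reading the last non-empty line (so a trailing newline doesn't hide the JSON) and treating empty, malformed or failed output as "don't wake the agent" are assumptions, since the document does not say how those cases are handled.

```typescript
import { spawnSync } from 'node:child_process';
import { mkdtempSync, writeFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

interface ScriptResult {
  wakeAgent: boolean;
  data?: unknown;
}

/** Parse the last non-empty stdout line as `{ wakeAgent, data? }`. */
function parseScriptResult(stdout: string): ScriptResult {
  const lines = stdout.split('\n').map((l) => l.trim()).filter((l) => l.length > 0);
  if (lines.length === 0) return { wakeAgent: false }; // assumption: no output means no wake
  try {
    const parsed: unknown = JSON.parse(lines[lines.length - 1]);
    if (typeof parsed === 'object' && parsed !== null && typeof (parsed as ScriptResult).wakeAgent === 'boolean') {
      return parsed as ScriptResult;
    }
  } catch {
    // Malformed JSON: fall through.
  }
  return { wakeAgent: false }; // assumption: malformed output means no wake
}

/** Steps 1 and 2: write the script to a temp file and run it with bash (30s timeout). */
function runPreAgentScript(script: string): ScriptResult {
  const file = join(mkdtempSync(join(tmpdir(), 'nanoclaw-task-')), 'script.sh');
  writeFileSync(file, script);
  const proc = spawnSync('bash', [file], { encoding: 'utf8', timeout: 30000 });
  if (proc.error || proc.status !== 0) return { wakeAgent: false }; // assumption: failures don't wake
  return parseScriptResult(proc.stdout);
}
```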

### Transcript Archiving

The agent-runner archives conversation transcripts before context compaction. For Claude, this is handled via the PreCompact hook (provider-internal). For other providers that don't have hooks, the agent-runner archives after each query completes based on the provider's output.

Archive location: `/workspace/agent/conversations/{date}-{summary}.md`

### Session Resume

The agent-runner tracks `sessionId` and `resumeAt` across queries:

- `sessionId` — captured from `ProviderEvent { type: 'init' }`. Passed back to `QueryInput.sessionId` on the next query.
- `resumeAt` — Claude-specific (last assistant message UUID). Stored by the agent-runner, passed to `QueryInput.resumeAt`. Providers that don't support this ignore it.

These are ephemeral to the container's lifetime. When the container is killed and restarted, the host passes the stored `sessionId` from the central DB's sessions table. `resumeAt` is lost on container restart (the provider resumes from the end of the session).

### Container Startup

The agent-runner receives configuration via:

- **Environment variables:** `AGENT_PROVIDER` (claude/codex/opencode), `NANOCLAW_ADMIN_USER_ID`, provider-specific vars (API keys, model overrides), `TZ`
- **Fixed mount paths:** Session DB at `/workspace/session.db`. Agent group folder at `/workspace/agent/`. System prompt from `/workspace/agent/CLAUDE.md` and `/workspace/global/CLAUDE.md`.
- **Optional startup config:** Some config may be passed as a JSON file at a fixed path (e.g., `/workspace/config.json`) for things like the session ID to resume, assistant name, and admin user ID. This avoids overloading environment variables.

The agent-runner reads config, creates the provider, and enters the poll loop. No stdin, no initial prompt — messages are already in the session DB.

### Provider Factory

```typescript
type ProviderName = 'claude' | 'codex' | 'opencode';

function createProvider(name: ProviderName, config: ProviderConfig): AgentProvider {
  switch (name) {
    case 'claude': return new ClaudeProvider(config);
    case 'codex': return new CodexProvider(config);
    case 'opencode': return new OpenCodeProvider(config);
    default: throw new Error(`Unknown provider: ${name}`);
  }
}
```

The provider name comes from the container's environment (`AGENT_PROVIDER` env var), set by the host based on `agent_groups.agent_provider` or `sessions.agent_provider`.

`ProviderConfig` holds provider-specific settings (API keys, model overrides, etc.). These come from environment variables rather than from `QueryInput`; each provider reads what it needs from `env`.

## What Stays From v1

- MCP server is a separate Node process spawned by the provider (via `mcpServers` config)
- The MCP server binary is shared across providers — same tools, same DB access
- CLAUDE.md loading (global + per-group) — agent-runner reads and passes as `systemPrompt`
- Additional directories discovery (`/workspace/extra/*`)
- Logging via stderr (`[agent-runner] ...`)

## What Changes From v1

| v1 | v2 |
|----|----|
| stdin JSON envelope | Poll session DB |
| IPC input files for follow-ups | Same DB poll + `provider.push()` |
| stdout markers for output | Write messages_out rows |
| MCP tools write IPC files | MCP tools write DB rows |
| `_close` sentinel for shutdown | Host kills container externally |
| `runQuery()` function with inline Claude SDK | `AgentProvider` interface + per-SDK implementations |
| Single provider (Claude) | Pluggable providers (Claude, Codex, OpenCode, future) |
| `ContainerInput` via stdin | Provider config via env vars + session DB for messages |
| IPC polling for follow-ups | DB polling + provider.push() |

## Related Documents

- **[v2-architecture-draft.md](v2-architecture-draft.md)** — High-level architecture (session DB schema, central DB, channel adapters, message flow)
- **[v2-api-details.md](v2-api-details.md)** — Channel adapter interface, message content examples, host delivery logic

@@ -0,0 +1,360 @@
# NanoClaw v2 API Details

Implementation-level details for the v2 architecture. See [v2-architecture-draft.md](v2-architecture-draft.md) for the high-level design.

## Channel Adapter Interface

### NanoClaw Channel Interface (v2)

```typescript
interface ChannelSetup {
  // Conversation configs from the central DB: passed in at setup, never queried by the adapter
  conversations: ConversationConfig[];

  // Host callbacks
  onInbound(platformId: string, threadId: string | null, message: InboundMessage): void;
  onMetadata(platformId: string, name?: string, isGroup?: boolean): void;
}

interface ConversationConfig {
  platformId: string;
  agentGroupId: string;
  triggerPattern?: string; // regex source string (used by native channels)
  requiresTrigger: boolean;
  sessionMode: 'shared' | 'per-thread';
}

interface ChannelAdapter {
  name: string;
  channelType: string;

  // Lifecycle
  setup(config: ChannelSetup): Promise<void>;
  teardown(): Promise<void>;
  isConnected(): boolean;

  // Outbound delivery
  deliver(platformId: string, threadId: string | null, message: OutboundMessage): Promise<void>;

  // Optional
  setTyping?(platformId: string, threadId: string | null): Promise<void>;
  syncConversations?(): Promise<ConversationInfo[]>;
  updateConversations?(conversations: ConversationConfig[]): void;
}

// Inbound message from adapter to host
interface InboundMessage {
  id: string;
  kind: 'chat' | 'chat-sdk';
  content: unknown; // JSON blob: NanoClaw chat format or Chat SDK SerializedMessage
  timestamp: string;
}

// Outbound message from host to adapter
interface OutboundMessage {
  kind: 'chat' | 'chat-sdk';
  content: unknown; // JSON blob whose shape matches `kind`
}
```

### Chat SDK Bridge

Wraps a Chat SDK adapter and its Chat instance so that together they conform to the NanoClaw `ChannelAdapter` interface.

```typescript
function createChatSdkBridge(
  adapter: Adapter,
  chatConfig: { concurrency?: ConcurrencyStrategy }
): ChannelAdapter {
  let chat: Chat;
  let hostCallbacks: ChannelSetup;

  return {
    name: adapter.name,
    channelType: adapter.name,

    async setup(config) {
      hostCallbacks = config;

      chat = new Chat({
        adapters: { [adapter.name]: adapter },
        state: new SqliteStateAdapter(),
        concurrency: chatConfig.concurrency ?? 'concurrent',
      });

      // Subscribe registered conversations
      for (const conv of config.conversations) {
        if (conv.agentGroupId) {
          await chat.state.subscribe(conv.platformId);
        }
      }

      // Subscribed threads → forward all messages
      chat.onSubscribedMessage(async (thread, message) => {
        const channelId = adapter.channelIdFromThreadId(thread.id);
        config.onInbound(channelId, thread.id, {
          id: message.id,
          kind: 'chat-sdk',
          content: message.toJSON(),
          timestamp: message.metadata.dateSent.toISOString(),
        });
      });

      // @mention in unsubscribed thread → discovery
      chat.onNewMention(async (thread, message) => {
        const channelId = adapter.channelIdFromThreadId(thread.id);
        config.onInbound(channelId, thread.id, {
          id: message.id,
          kind: 'chat-sdk',
          content: message.toJSON(),
          timestamp: message.metadata.dateSent.toISOString(),
        });
        // Subscribe so future messages in this thread are received
        await thread.subscribe();
      });

      // DMs → always forward
      chat.onDirectMessage(async (thread, message) => {
        config.onInbound(thread.id, null, {
          id: message.id,
          kind: 'chat-sdk',
          content: message.toJSON(),
          timestamp: message.metadata.dateSent.toISOString(),
        });
        await thread.subscribe();
      });

      await chat.initialize();
    },

    async deliver(platformId, threadId, message) {
      const tid = threadId ?? platformId;
      if (message.kind === 'chat-sdk') {
        const content = message.content as Record<string, unknown>;
        if (content.operation === 'edit') {
          await adapter.editMessage(tid, content.messageId as string,
            { markdown: content.text as string });
        } else if (content.operation === 'reaction') {
          await adapter.addReaction(tid, content.messageId as string,
            content.emoji as string);
        } else {
          await adapter.postMessage(tid, content as AdapterPostableMessage);
        }
      } else {
        const content = message.content as { text: string };
        await adapter.postMessage(tid, { markdown: content.text });
      }
    },

    async setTyping(platformId, threadId) {
      await adapter.startTyping(threadId ?? platformId);
    },

    async teardown() {
      await chat.shutdown();
    },

    // Placeholder: a real bridge should report the adapter's actual connection state
    isConnected() { return true; },

    updateConversations(conversations) {
      // Subscribe new conversations (removed ones could be unsubscribed here too).
      // This interface method is synchronous, so log async failures instead of dropping them.
      for (const conv of conversations) {
        if (conv.agentGroupId) {
          chat.state.subscribe(conv.platformId).catch((err) => {
            console.error(`[bridge] failed to subscribe ${conv.platformId}`, err);
          });
        }
      }
    },
  };
}
```

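The bridge's `updateConversations` only subscribes, and its comment notes that removed conversations could be unsubscribed. Diffing the previous config against the new one yields both sets. `diffConversations` in the sketch below is a hypothetical helper. To use it, the bridge would have to keep the last config it received. This document does not say whether Chat SDK's state adapter exposes an unsubscribe call.

```typescript
// Minimal shape; mirrors the ConversationConfig fields used here.
interface ConversationRef {
  platformId: string;
  agentGroupId: string;
}

interface SubscriptionDiff {
  subscribe: string[];
  unsubscribe: string[];
}

// Compare the previously registered conversations with the new set from the host.
// As in the bridge, only conversations with an agent group count as registered.
function diffConversations(prev: ConversationRef[], next: ConversationRef[]): SubscriptionDiff {
  const registered = (list: ConversationRef[]): Set<string> =>
    new Set(list.filter((c) => c.agentGroupId).map((c) => c.platformId));
  const before = registered(prev);
  const after = registered(next);
  return {
    subscribe: Array.from(after).filter((id) => !before.has(id)),
    unsubscribe: Array.from(before).filter((id) => !after.has(id)),
  };
}
```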
### Native NanoClaw Channel (no Chat SDK)

Native channels implement the `ChannelAdapter` interface directly. Example structure for WhatsApp via Baileys:

```typescript
function createWhatsAppChannel(): ChannelAdapter {
  let socket: WASocket;
  let config: ChannelSetup;

  return {
    name: 'whatsapp',
    channelType: 'whatsapp',

    async setup(setup) {
      config = setup;
      socket = await connectBaileys(); // helper that loads auth state and creates the socket

      // Baileys emits events on socket.ev, not on the socket itself
      socket.ev.on('messages.upsert', (event) => {
        for (const msg of event.messages) {
          const jid = msg.key.remoteJid;
          if (!jid || !msg.key.id) continue;
          const conv = config.conversations.find(c => c.platformId === jid);
          // Plain texts arrive as `conversation`; replies and link previews as `extendedTextMessage`
          const text = msg.message?.conversation ?? msg.message?.extendedTextMessage?.text ?? '';

          // Trigger check (native: the adapter does this, not the host)
          if (conv?.requiresTrigger && conv.triggerPattern) {
            if (!new RegExp(conv.triggerPattern).test(text)) {
              continue; // Doesn't match trigger; keep processing the rest of the batch
            }
          }

          config.onInbound(jid, null, {
            id: msg.key.id,
            kind: 'chat',
            content: {
              sender: msg.pushName || msg.key.participant,
              senderId: msg.key.participant || jid,
              text,
              attachments: [],
              isFromMe: msg.key.fromMe,
            },
            // messageTimestamp is in seconds and may be a Long, so coerce before scaling
            timestamp: new Date(Number(msg.messageTimestamp) * 1000).toISOString(),
          });
        }
      });
    },

    async deliver(platformId, threadId, message) {
      const content = message.content as { text: string };
      await socket.sendMessage(platformId, { text: content.text });
    },

    async setTyping(platformId) {
      await socket.sendPresenceUpdate('composing', platformId);
    },

    async teardown() {
      // end() closes the connection; logout() would also unlink the device from the account
      socket.end(undefined);
    },

    isConnected() { return !!socket; },
  };
}
```

## Session DB Schema Details

### messages_in content examples

**`chat`** (simple NanoClaw format):
```json
{
  "sender": "John",
  "senderId": "user123",
  "text": "Check this PR",
  "attachments": [{ "type": "image", "url": "https://signed-url..." }],
  "isFromMe": false
}
```

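`content` is stored as an untyped JSON blob, so a reader of `chat` rows needs a runtime check after `JSON.parse`. A sketch follows. The field set mirrors the example above. Treating `attachments` as optional is an assumption, made because the question-response example omits it.

```typescript
interface ChatAttachment {
  type: string;
  url: string;
  name?: string;
}

interface ChatContent {
  sender: string;
  senderId: string;
  text: string;
  attachments?: ChatAttachment[];
  isFromMe: boolean;
}

// Runtime type guard for a parsed `chat` content blob.
function isChatContent(value: unknown): value is ChatContent {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as Record<string, unknown>;
  const attachmentsOk =
    v.attachments === undefined ||
    (Array.isArray(v.attachments) &&
      v.attachments.every(
        (a) =>
          typeof a === 'object' && a !== null &&
          typeof (a as Record<string, unknown>).type === 'string' &&
          typeof (a as Record<string, unknown>).url === 'string',
      ));
  return (
    typeof v.sender === 'string' &&
    typeof v.senderId === 'string' &&
    typeof v.text === 'string' &&
    typeof v.isFromMe === 'boolean' &&
    attachmentsOk
  );
}
```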
**`chat-sdk`** (full Chat SDK `SerializedMessage`):
```json
{
  "_type": "chat:Message",
  "id": "msg-1",
  "threadId": "slack:C123:1234.5678",
  "text": "Check this PR",
  "formatted": { "type": "root", "children": [...] },
  "author": { "userId": "U123", "userName": "john", "fullName": "John", "isBot": false, "isMe": false },
  "metadata": { "dateSent": "2024-01-01T00:00:00Z", "edited": false },
  "attachments": [{ "type": "image", "url": "https://...", "name": "screenshot.png" }],
  "isMention": true,
  "links": []
}
```

**Question response** (from a user clicking an interactive card):
```json
{
  "sender": "John",
  "senderId": "user123",
  "text": "Yes",
  "questionId": "q-123",
  "selectedOption": "Yes",
  "isFromMe": false
}
```

### messages_out content examples

**Normal chat message:**
```json
{ "text": "LGTM, merging now" }
```

**Chat SDK markdown:**
```json
{ "markdown": "## Review Summary\n**Status**: Approved\n\nNo issues found." }
```

**Card:**
```json
{
  "card": {
    "type": "card",
    "title": "Deployment Approval",
    "children": [
      { "type": "text", "content": "Deploy v2.1.0 to production?" },
      { "type": "actions", "children": [
        { "type": "button", "id": "approve", "label": "Approve", "style": "primary" },
        { "type": "button", "id": "reject", "label": "Reject", "style": "danger" }
      ]}
    ]
  },
  "fallbackText": "Deployment Approval: Deploy v2.1.0 to production? [Approve] [Reject]"
}
```

**Ask user question:**
```json
{
  "operation": "ask_question",
  "questionId": "q-123",
  "question": "How should we handle the failing test?",
  "options": ["Skip it", "Fix and retry", "Abort deployment"]
}
```

**Edit message:**
```json
{ "operation": "edit", "messageId": "3", "text": "Updated: LGTM with minor comments on line 42" }
```

**Reaction:**
```json
{ "operation": "reaction", "messageId": "5", "emoji": "thumbs_up" }
```

**System action:**
```json
{ "action": "reset_session", "payload": { "session_id": "sess-123", "reason": "Skills updated" } }
```

## Host Delivery Logic

The host reads messages_out and dispatches based on `kind` and `operation`:

```typescript
async function deliverMessage(row: MessagesOutRow, adapter: ChannelAdapter) {
  const content = JSON.parse(row.content);

  // System actions: the host handles these internally
  if (row.kind === 'system') {
    await handleSystemAction(content);
    return;
  }

  // Agent-to-agent: write to the target session's DB
  if (isAgentDestination(row)) {
    await writeToAgentSession(row);
    return;
  }

  // Channel delivery: delegate to the adapter
  await adapter.deliver(row.platform_id, row.thread_id, {
    kind: row.kind,
    content,
  });
}
```

The adapter's `deliver()` method handles operation dispatch internally (post vs. edit vs. reaction).

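The operation dispatch inside `deliver()` can be isolated as a pure function that turns parsed content into a typed operation, which makes it unit-testable. A sketch follows. The `OutboundOp` names are hypothetical. An adapter would then call `editMessage`, `addReaction`, or `postMessage` per case, as the bridge does. Rejecting unknown operations, rather than posting them as-is, is a design choice made here.

```typescript
// Discriminated union of the outbound content shapes shown in the examples.
type OutboundOp =
  | { op: 'edit'; messageId: string; text: string }
  | { op: 'reaction'; messageId: string; emoji: string }
  | { op: 'ask_question'; questionId?: string; question: string; options: string[] }
  | { op: 'post'; content: Record<string, unknown> };

// No `operation` field means a normal post; an unknown operation throws.
function classifyOutbound(content: Record<string, unknown>): OutboundOp {
  const str = (key: string): string => {
    const v = content[key];
    if (typeof v !== 'string') throw new Error(`expected string field "${key}"`);
    return v;
  };
  switch (content.operation) {
    case undefined:
      return { op: 'post', content };
    case 'edit':
      return { op: 'edit', messageId: str('messageId'), text: str('text') };
    case 'reaction':
      return { op: 'reaction', messageId: str('messageId'), emoji: str('emoji') };
    case 'ask_question': {
      const options = content.options;
      if (!Array.isArray(options) || !options.every((o) => typeof o === 'string')) {
        throw new Error('ask_question needs a string[] "options" field');
      }
      return {
        op: 'ask_question',
        questionId: typeof content.questionId === 'string' ? content.questionId : undefined,
        question: str('question'),
        options,
      };
    }
    default:
      throw new Error(`unknown operation: ${String(content.operation)}`);
  }
}
```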
@@ -0,0 +1,792 @@
# NanoClaw v2 Architecture (Draft)

## Core Idea

Each agent session has a mounted SQLite DB. The DB is the one and only IO mechanism between host and container: no IPC files, no stdin piping. There are two tables, messages_in (host → agent-runner) and messages_out (agent-runner → host), and everything is a message.

## Two-Level DB

**Central DB (host process):**
- Agent groups, conversations, routing tables
- Maps platform IDs → agent groups → sessions
- Channel adapters don't touch this directly; the host does the lookup

**Per-session DB (mounted into container):**
- messages_in (written by host, read by agent-runner)
- messages_out (written by agent-runner, read by host)
- Everything is a message: chat, tasks, webhooks, system actions, and agent-to-agent traffic all use these two tables
- One DB per session, not per agent group

## Agent Groups vs Sessions

An agent group has its own filesystem: folder, CLAUDE.md, skills, container config. Multiple sessions can share the same agent group (same filesystem, same skills), but each session gets its own DB mounted at a known path. Each session therefore runs in a separate container that has the agent group's filesystem but a different session DB.

## Message Flow

```
Platform event
  → Channel adapter (trigger check, ID extraction)
  → Returns: { platformChannelId, platformThreadId, triggered }
  → Host maps platformChannelId + platformThreadId → agent group + session
  → Host writes message to session's DB
  → Host calls wakeUpAgent(session)
  → Container spins up (or is already running)
  → Agent-runner polls its session DB, finds new messages
  → Agent-runner processes them with the configured agent provider (e.g., Claude)
  → Agent-runner writes response to session DB
  → Host polls active session DBs for responses
  → Host reads response, looks up conversation, delivers through channel adapter
```

## Channel Adapters

Channel adapters are responsible for:
1. Receiving platform events (webhooks, polling, websockets; the transport is platform-specific)
2. **Filtering**: deciding which messages to forward to the host for processing. This can be stateless (regex trigger match) or stateful (e.g., "was the bot mentioned in this thread at some point? If so, forward all subsequent messages"). The adapter receives a stream of unfiltered platform messages and decides which ones to pass on. How it decides is an implementation detail that NanoClaw neither knows nor cares about.
3. Extracting and standardizing two IDs:
   - **Platform channel ID**: identifies the conversation (WhatsApp group, Slack channel, email thread)
   - **Platform thread ID**: optional sub-context (Slack thread, GitHub PR comment thread)
4. Outbound delivery: sending responses back to the platform

The channel adapter does NOT know about agent group IDs or session IDs. It returns platform-level identifiers, and the host maps those to the entity model.

The two-level ID scheme (channel ID + thread ID) gives flexibility:
- Want every Slack thread to be a separate session? Return unique thread IDs.
- Want all messages in a Slack channel to share a session? Return the same thread ID (or null).
- This is configured per channel, not globally.

### Channel Adapter Configuration

Adapters hold no configuration of their own: they receive config from the host at setup time rather than reading the DB directly.

**What lives in code (per channel type, doesn't change at runtime):**
- Auto-registration behavior (enabled/disabled, how it works)
- Sender allowlist rules
- Whether allowlisted senders can auto-register groups
- Platform-specific connection and message handling

These decisions are made when the channel adapter is set up. Changing them means changing the code.

**What lives in the DB (per group, varies from group to group):**
- Which agent group handles it
- Trigger / filter rules (regex, @mention-only, exclude certain senders, etc.)
- Response scope (respond to all messages vs. only triggered/allowlisted ones)
- Session mode (shared vs. per-thread)

The host reads per-group config from the DB and passes it to the adapter at setup. If config changes at runtime (e.g., the admin agent registers a new group or changes a trigger), the host calls the adapter's update method (`updateConversations`).

### Auto-Registration

When the adapter forwards a message from an unknown group, the host needs to decide whether to create the group and a session for it.

**The adapter controls whether to forward unknown messages**, based on its code-level auto-registration rules (sender allowlist, group-add detection, etc.). If the adapter forwards a message, the host creates the group and a session.

**Session creation for known groups:**
- Shared session mode: the host finds the existing session, or creates one if this is the first message
- Per-thread session mode: the host looks up the session by threadId; if none exists for this thread, it auto-creates one with the same agent group

**The code-level rules are channel-specific:**
- WhatsApp: if an allowlisted number adds the bot to a group → auto-register. If an unknown number DMs → depends on the adapter's configuration.
- Email: if the sender is known → auto-register the thread. If unknown → drop.
- Slack: if someone @mentions the bot in a new channel → the adapter decides whether to forward based on its rules.

There is no `channel_configs` table: channel-type-level behavior is baked into the adapter code.

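Both session modes reduce to a find-or-create keyed on the agent group. Per-thread mode adds the thread ID to the key. The sketch below uses `SessionIndex`, a hypothetical in-memory stand-in for the central DB's sessions table. The session ID format is illustrative only.

```typescript
type SessionMode = 'shared' | 'per-thread';

interface SessionRecord {
  id: string;
  agentGroupId: string;
  threadId: string | null; // null for the shared session
}

class SessionIndex {
  private sessions = new Map<string, SessionRecord>();
  private nextId = 1;

  // Shared mode collapses every thread onto one key; per-thread mode keys on threadId.
  private key(agentGroupId: string, mode: SessionMode, threadId: string | null): string {
    const scope = mode === 'shared' ? '' : threadId ?? '';
    return `${agentGroupId}\u0000${scope}`;
  }

  // Find the session for an inbound message, creating it on first contact.
  findOrCreate(agentGroupId: string, mode: SessionMode, threadId: string | null): SessionRecord {
    const k = this.key(agentGroupId, mode, threadId);
    let session = this.sessions.get(k);
    if (!session) {
      session = {
        id: `sess-${this.nextId++}`,
        agentGroupId,
        threadId: mode === 'shared' ? null : threadId,
      };
      this.sessions.set(k, session);
    }
    return session;
  }
}
```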
### Chat SDK Integration

Chat SDK adapters are wrapped per channel:
- Each Chat SDK adapter gets its own Chat instance
- Concurrency mode is configured per channel (concurrent for chat, queue for tasks, debounce for webhooks)
- A bridge wraps the Chat instance + adapter to conform to NanoClaw's standard channel interface
- Chat SDK handles: webhook parsing, dedup, message history, platform API calls, rich content delivery
- NanoClaw handles: routing, agent lifecycle, session management

**Chat SDK's subscription model:**

Chat SDK has its own thread-level subscription concept (distinct from NanoClaw's channel-level registration):
- `onNewMention` / `onNewMessage(regex)`: fires on first contact (e.g., an @mention in a Slack thread)
- `thread.subscribe()`: opts into all future messages in that thread
- `onSubscribedMessage`: fires for all messages in subscribed threads

This is sub-channel granularity. NanoClaw registers at the channel level ("listen to this Discord channel"), while Chat SDK subscribes at the thread level ("track this specific Slack thread"). The bridge lets Chat SDK manage its own subscriptions internally; NanoClaw neither interferes with nor replicates them.

**Platform capability differences:**

Capabilities vary significantly across adapters (see [Chat SDK adapter docs](https://chat-sdk.dev/docs/adapters)):
- **Slack**: Full rich content (Block Kit cards, modals, streaming, reactions, ephemeral messages)
- **Discord**: Embeds, buttons, streaming via post+edit
- **WhatsApp (Cloud API)**: DMs only, interactive reply buttons, no streaming, no reactions
- **GitHub/Linear**: Markdown comments, no interactive elements
- **Telegram**: Inline keyboard buttons, streaming via post+edit

The host/bridge handles graceful degradation: if an agent posts a card on a platform that doesn't support cards, it falls back to text.

Non-Chat-SDK channels (WhatsApp via Baileys, Gmail, custom integrations) implement the NanoClaw channel interface directly, with no bridge and no Chat SDK types.

## Container Lifecycle

The host is an orchestrator:
1. **Spawn**: when wakeUpAgent is called and no container exists for the session
2. **Idle kill**: when a container has had no unprocessed messages for some timeout period
3. **Limits**: MAX_CONCURRENT_CONTAINERS caps the number of active containers

When a container spins up, the agent-runner immediately starts polling its session DB. Messages are already there waiting.

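The three lifecycle rules can be captured in a small bookkeeping class. This is a sketch, not the host's implementation. `ContainerPool` and `ContainerRuntime` are hypothetical names, and time is passed in explicitly so the logic is testable. The behavior at the cap is an assumption: `wake` returns `false` and the messages stay pending until a later sweep finds a free slot.

```typescript
// Starting and stopping real containers is abstracted behind these callbacks.
interface ContainerRuntime {
  start(sessionId: string): void;
  stop(sessionId: string): void;
}

class ContainerPool {
  private lastActivity = new Map<string, number>(); // sessionId → ms timestamp

  constructor(
    private runtime: ContainerRuntime,
    private maxConcurrent: number, // MAX_CONCURRENT_CONTAINERS
    private idleTimeoutMs: number,
  ) {}

  // wakeUpAgent: spawn if needed; returns false when the concurrency cap is reached.
  wake(sessionId: string, now: number): boolean {
    if (this.lastActivity.has(sessionId)) {
      this.lastActivity.set(sessionId, now);
      return true;
    }
    if (this.lastActivity.size >= this.maxConcurrent) return false;
    this.runtime.start(sessionId);
    this.lastActivity.set(sessionId, now);
    return true;
  }

  // Idle kill: stop containers with no unprocessed messages that have been
  // quiet for longer than the timeout. Deleting during Map iteration is safe.
  reapIdle(now: number, hasPending: (sessionId: string) => boolean): string[] {
    const stopped: string[] = [];
    for (const [sessionId, last] of this.lastActivity) {
      if (!hasPending(sessionId) && now - last > this.idleTimeoutMs) {
        this.runtime.stop(sessionId);
        this.lastActivity.delete(sessionId);
        stopped.push(sessionId);
      }
    }
    return stopped;
  }

  get activeCount(): number {
    return this.lastActivity.size;
  }
}
```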
## Media Handling

### Inbound

Media is not downloaded by the host. Instead:
- Messages include download URLs (signed URLs where possible)
- The agent-runner downloads and processes media inside the container
- For channels where signed URLs don't work (e.g., WhatsApp with buffered streams), the channel adapter downloads the media and serves it via a local URL/server that the container can access

**Native content blocks (provider-dependent):**

The agent-runner detects file types and, where the provider supports it, passes supported types as native content blocks:

| Type | Claude | Codex | OpenCode |
|------|--------|-------|----------|
| Images (JPEG, PNG, GIF, WebP) | Native image content block | Save to disk, reference in prompt | Save to disk, reference in prompt |
| PDFs | Native document content block | Save to disk | Save to disk |
| Audio | Native audio content block | Save to disk | Save to disk |
| Other files (code, data, video, archives) | Save to disk | Save to disk | Save to disk |

"Save to disk" means the file is downloaded to `/workspace/downloads/{messageId}/` and referenced in the prompt text as an available file path. The agent can use tools (Read, Bash) to access it.

The agent-runner builds the prompt differently per provider. For Claude, it constructs multi-part `MessageParam` content with image/document blocks. For Codex/OpenCode, everything is text with file path references.

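The provider table above can be encoded as a lookup. This minimal sketch assumes file types are detected from MIME types; the design only says the agent-runner "detects file types". `planAttachment` and the plan names are hypothetical.

```typescript
type Provider = 'claude' | 'codex' | 'opencode';
type AttachmentPlan = 'image-block' | 'document-block' | 'audio-block' | 'save-to-disk';

const IMAGE_TYPES = new Set(['image/jpeg', 'image/png', 'image/gif', 'image/webp']);

// Only the Claude provider gets native blocks; everything else is saved
// under /workspace/downloads/{messageId}/ and referenced by path.
function planAttachment(provider: Provider, mimeType: string): AttachmentPlan {
  if (provider !== 'claude') return 'save-to-disk';
  const mime = mimeType.toLowerCase();
  if (IMAGE_TYPES.has(mime)) return 'image-block';
  if (mime === 'application/pdf') return 'document-block';
  if (mime.startsWith('audio/')) return 'audio-block';
  return 'save-to-disk';
}
```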
### Outbound

Outbound file delivery is tool-based. The agent calls a tool (e.g., `send_file`) with a file path. The agent-runner moves the file to the outbox and writes the messages_out row.

```
/workspace/
  outbox/
    {message_id}/          ← one dir per messages_out row
      chart.png
      report.pdf
```

messages_out content references filenames only:

```json
{ "text": "Here's the chart", "files": ["chart.png", "report.pdf"] }
```

No paths are stored in the DB; the convention is the contract. The host reads files from `outbox/{message_id}/` in the mounted session folder and delivers them via the adapter (Chat SDK `FileUpload` with buffer data, or a platform-specific upload for native channels). The host cleans up the outbox directory after successful delivery.

Outbound files use a dedicated `send_file` MCP tool (separate from `send_message`). See [v2-agent-runner-details.md](v2-agent-runner-details.md) for the tool interface.

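Because only filenames are stored, the host joins each name onto `outbox/{message_id}/` itself. The sketch below assumes `sessionRoot` is the host path of the mounted session folder. `resolveOutboxFiles` is a hypothetical helper. Rejecting anything that is not a bare filename is a defensive choice added here, because the content is written from inside the container.

```typescript
import * as path from 'node:path';

const posix = path.posix;

// Resolve filenames from a messages_out row to paths under outbox/{message_id}/.
// Names containing separators or dot-segments are rejected rather than resolved.
function resolveOutboxFiles(sessionRoot: string, messageId: string, files: string[]): string[] {
  const dir = posix.join(sessionRoot, 'outbox', messageId);
  return files.map((name) => {
    const bare = name !== '' && name !== '.' && name !== '..' &&
      name === posix.basename(name) && !name.includes('\\');
    if (!bare) throw new Error(`invalid outbox filename: ${JSON.stringify(name)}`);
    return posix.join(dir, name);
  });
}
```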
### Message Deduplication

Dedup is the channel adapter's responsibility. Chat SDK handles this internally, and native adapters track platform message IDs as needed. The host does not deduplicate: if the adapter forwards a message, the host writes it.

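For a native adapter, "tracking platform message IDs" can be as simple as a bounded set of recently seen IDs. The sketch below assumes in-memory tracking, which is lost on restart, and a caller-chosen capacity. `RecentIds` is a hypothetical name.

```typescript
// Map preserves insertion order, so the first key is always the oldest entry.
class RecentIds {
  private seen = new Map<string, true>();

  constructor(private capacity: number) {}

  // Returns true the first time an ID is seen and false for repeats.
  markIfNew(id: string): boolean {
    if (this.seen.has(id)) return false;
    this.seen.set(id, true);
    if (this.seen.size > this.capacity) {
      const oldest = this.seen.keys().next().value as string;
      this.seen.delete(oldest); // evict the oldest ID to stay within capacity
    }
    return true;
  }
}
```

An adapter would call `markIfNew(msg.key.id)` (or the platform's equivalent ID) before `onInbound` and drop the message when it returns `false`.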
## Session DB Schema

Two tables. Content is stored as JSON blobs: schema-free, with a format that varies by `kind`.

```sql
-- Host writes, agent-runner reads
CREATE TABLE messages_in (
  id              TEXT PRIMARY KEY,
  kind            TEXT NOT NULL,           -- 'chat' | 'chat-sdk' | 'task' | 'webhook' | 'system'
  timestamp       TEXT NOT NULL,
  status          TEXT DEFAULT 'pending',  -- 'pending' | 'processing' | 'completed' | 'failed'
  status_changed  TEXT,                    -- ISO timestamp of last status change
  process_after   TEXT,                    -- ISO timestamp. NULL = process immediately.
  recurrence      TEXT,                    -- cron expression. NULL = one-shot.
  tries           INTEGER DEFAULT 0,       -- number of processing attempts

  -- routing (agent-runner copies to messages_out; agent never sees these)
  platform_id     TEXT,
  channel_type    TEXT,
  thread_id       TEXT,

  -- payload (structure depends on kind)
  content         TEXT NOT NULL            -- JSON blob
);

-- Agent-runner writes, host reads
CREATE TABLE messages_out (
  id              TEXT PRIMARY KEY,
  in_reply_to     TEXT,                    -- references messages_in.id (optional)
  timestamp       TEXT NOT NULL,
  delivered       INTEGER DEFAULT 0,
  deliver_after   TEXT,                    -- ISO timestamp. NULL = deliver immediately.
  recurrence      TEXT,                    -- cron expression. NULL = one-shot.

  -- routing (default: copied from messages_in by agent-runner)
  kind            TEXT NOT NULL,           -- 'chat' | 'chat-sdk' | 'task' | 'webhook' | 'system'
  platform_id     TEXT,
  channel_type    TEXT,
  thread_id       TEXT,

  -- payload (format matches kind)
  content         TEXT NOT NULL            -- JSON blob
);
```

### Scheduling

One-shot and recurring tasks use the same tables; there is no separate scheduler.

**One-shot:** `process_after` (inbound) or `deliver_after` (outbound) with `recurrence = NULL`.

**Recurring:** Same, plus a `recurrence` cron expression. After the host marks a row as handled/delivered, if `recurrence` is set, it inserts a new row with `process_after`/`deliver_after` advanced to the next cron occurrence. The next time is computed from the scheduled time (not the wall clock) to prevent drift.

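The drift argument is easiest to see with a fixed interval. The sketch below substitutes a fixed `intervalMs` for the cron expression as an illustrative simplification. With cron, a library would compute the next match after the scheduled time. Both function names are hypothetical.

```typescript
// Drift-free: the next run is anchored on when the row was *scheduled*,
// so handling it late does not shift the rest of the series.
function nextOccurrence(scheduledAt: Date, intervalMs: number): Date {
  return new Date(scheduledAt.getTime() + intervalMs);
}

// For contrast: anchoring on when the row was *handled* lets delays accumulate.
function driftingNextOccurrence(handledAt: Date, intervalMs: number): Date {
  return new Date(handledAt.getTime() + intervalMs);
}
```

Take an hourly task scheduled for 09:00 and handled at 09:00:45. `nextOccurrence` yields 10:00, while `driftingNextOccurrence` yields 10:00:45, and that lag compounds on every run.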
**Host sweep** (every ~60s across all session DBs):
- `messages_in WHERE status = 'pending' AND (process_after IS NULL OR process_after <= now())` → wake agent
- `messages_in WHERE status = 'processing' AND status_changed < (now - stale_threshold)` → stale detection: increment tries and reset to pending with backoff
- `messages_out WHERE delivered = 0 AND (deliver_after IS NULL OR deliver_after <= now())` → deliver
- After completing/delivering a row with `recurrence`, insert the next occurrence

**Active container poll** (~1s) checks the same conditions, but only for sessions with running containers.

**The agent-runner creates schedules** by writing messages_in rows (to itself) with `process_after`, or messages_out rows (reminders/notifications) with `deliver_after`, and optionally `recurrence`.

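The sweep conditions for messages_in can be mirrored as in-memory predicates. This makes the stale-reset step concrete. A real host would run the SQL against each session DB. This sketch compares timestamps via `Date.parse` rather than string order. The base delay, cap, and doubling in `resetStale` are assumptions; the design only says "with backoff".

```typescript
interface InRow {
  id: string;
  status: 'pending' | 'processing' | 'completed' | 'failed';
  status_changed: string | null; // ISO timestamp
  process_after: string | null;  // ISO timestamp
  tries: number;
}

const ms = (iso: string): number => Date.parse(iso);

// pending AND (process_after IS NULL OR process_after <= now) → wake agent
function isDue(row: InRow, now: number): boolean {
  return row.status === 'pending' && (row.process_after === null || ms(row.process_after) <= now);
}

// processing AND status_changed < now - staleThresholdMs → stale
function isStale(row: InRow, now: number, staleThresholdMs: number): boolean {
  return row.status === 'processing' && row.status_changed !== null &&
    ms(row.status_changed) < now - staleThresholdMs;
}

// Count the attempt and push process_after out exponentially (capped).
function resetStale(row: InRow, now: number, baseDelayMs = 30_000, maxDelayMs = 30 * 60_000): InRow {
  const tries = row.tries + 1;
  const delay = Math.min(baseDelayMs * 2 ** (tries - 1), maxDelayMs);
  return {
    ...row,
    status: 'pending',
    status_changed: new Date(now).toISOString(),
    process_after: new Date(now + delay).toISOString(),
    tries,
  };
}
```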
### messages_in content by kind

**`chat`**: simple NanoClaw format. Any channel can produce this.
```json
{
  "sender": "John",
  "senderId": "user123",
  "text": "Check this PR",
  "attachments": [{ "type": "image", "url": "https://signed-url..." }],
  "isFromMe": false
}
```

**`chat-sdk`**: full Chat SDK `SerializedMessage`, passed through from the bridge adapter. Includes `author`, `text`, `formatted` (mdast AST), `attachments`, `isMention`, `links`, `metadata`.

**`task`**: scheduled task firing.
```json
{ "prompt": "Review open PRs", "script": "scripts/review.sh" }
```

**`webhook`**: raw webhook payload.
```json
{ "source": "github", "event": "pull_request", "payload": { ... } }
```

**`system`**: host action result (response to a system action the agent requested).
```json
{ "action": "register_group", "status": "success", "result": { "agent_group_id": "ag-456" } }
```

### messages_out content by kind

Output `kind` determines the format and the delivery adapter. By default, the agent-runner copies `kind` and the routing fields from the messages_in row it is responding to.

**`chat`**: simple NanoClaw format. Delivered through the adapter's `deliver()` method (native channels send it as plain text; the Chat SDK bridge posts it as markdown).
```json
{ "text": "LGTM, merging now" }
```

**`chat-sdk`**: Chat SDK `AdapterPostableMessage`. The bridge delivers it via the Chat SDK adapter's `postMessage()`. It can be markdown, a card, or raw; the adapter handles platform conversion.
```json
{ "markdown": "## Review\n**LGTM**", "attachments": [...] }
```

```json
{ "card": { "type": "card", "title": "Review", "children": [...] }, "fallbackText": "..." }
```

**`task`**: task result. The host logs it and optionally notifies.
```json
{ "result": "3 PRs reviewed", "status": "success" }
```

**`webhook`**: webhook response. The host sends an HTTP response or notifies.
```json
{ "response": { "status": 200, "body": { ... } } }
```

**`system`**: host action request (register group, reset session, etc.). The host reads it, validates permissions, executes it, and writes the result back as a `system` messages_in row.
```json
{ "action": "reset_session", "payload": { "session_id": "sess-123" } }
```

### Interactive Operations (Cards, Reactions, Edits)

All interactive operations flow through messages_in/out, because the DB is the only IO boundary for the container. The agent uses MCP tools; the agent-runner translates tool calls into structured messages_out rows; the host delivers them through the appropriate adapter method.

**Cards with user interaction (e.g., "Ask User Question"):**

1. Agent calls the `ask_user_question` tool with a question and options
2. Agent-runner writes a messages_out row with the question card
3. Host delivers it as an interactive card through the adapter (e.g., Slack Block Kit buttons)
4. User clicks an option
5. Platform sends the event back to the adapter → host writes a messages_in row with the response
6. Agent-runner reads messages_in, matches the response to the pending tool call, and returns the selection to the agent as the tool result

The agent-runner holds the tool call open while waiting for the user's response in messages_in. The round trip goes: agent → messages_out → host → platform → user clicks → platform → host → messages_in → agent-runner → agent.

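Holding the tool call open maps naturally onto a promise keyed by `questionId`, settled when the matching messages_in row arrives. A sketch follows. `PendingQuestions` is a hypothetical name. The design does not specify two behaviors shown here: that a matched row is consumed rather than also fed to the agent as a new prompt, and that timeouts are handled. This sketch adds no timeout.

```typescript
interface QuestionResponse {
  questionId: string;
  selectedOption: string;
}

class PendingQuestions {
  private waiting = new Map<string, (answer: string) => void>();

  // Called by the ask_user_question tool after writing the question to messages_out.
  wait(questionId: string): Promise<string> {
    if (this.waiting.has(questionId)) {
      return Promise.reject(new Error(`duplicate questionId: ${questionId}`));
    }
    return new Promise<string>((resolve) => {
      this.waiting.set(questionId, resolve);
    });
  }

  // Called by the poll loop for each new messages_in row carrying a questionId.
  // Returns true when the row answered an open question.
  deliver(response: QuestionResponse): boolean {
    const resolve = this.waiting.get(response.questionId);
    if (!resolve) return false;
    this.waiting.delete(response.questionId);
    resolve(response.selectedOption);
    return true;
  }
}
```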

**Approvals:**

Two patterns, both handled at the host level:

- **Implicit**: Agent calls a tool that requires approval. Host intercepts, sends approval card to admin, waits for response, then executes or rejects. The agent doesn't know about the approval step.
- **Explicit**: Agent explicitly requests approval via a tool. Agent-runner writes the approval request to messages_out. Same flow as "ask user question": the response comes back through messages_in.

In both cases, the approval and action execution happen on the host side, not the agent side.

**Approval routing:** Each messaging group has a designated admin stored in the central DB (`messaging_groups.admin_user_id`). It defaults to whoever set up the group and can be reassigned. When an action requires approval, the host sends an approval card to the admin's DM conversation (not the channel the agent is operating in). The admin responds there, and the host relays the result back to the agent's session. Approval cards are host-generated (not agent-initiated) and have a standardized format.

> **TODO: flesh out** — How does the host find the admin's DM conversation? What happens if the admin hasn't set up a DM channel? Is the approval list configurable per agent group or global?

**Editing a sent message:**

Agent calls an `edit_message` tool with the message ID and new content. Agent-runner writes messages_out with an edit operation. Host calls `adapter.editMessage()`. Messages in the agent's context include integer IDs so the agent can reference them.

**Reactions:**

Agent calls `add_reaction` tool with message ID and emoji. Agent-runner writes messages_out with a reaction operation. Host calls `adapter.addReaction()`.

**Operations in messages_out content:**

```jsonc
// Normal message (default)
{ "text": "LGTM" }

// Interactive card
{ "operation": "ask_question", "question": "Approve deployment?", "options": ["Yes", "No", "Defer"] }

// Edit existing message
{ "operation": "edit", "messageId": "3", "text": "Updated: LGTM with minor comments" }

// Reaction
{ "operation": "reaction", "messageId": "5", "emoji": "thumbs_up" }
```

The host reads the `operation` field (if present) and calls the matching adapter method. No operation field means normal message delivery. Platform capabilities vary, so the host/bridge handles graceful degradation (e.g., a reaction on a platform that doesn't support reactions is skipped or sent as text).
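
A minimal sketch of that host-side dispatch, assuming a hypothetical adapter surface: `editMessage`/`addReaction` are named in this document, while `sendMessage`, `sendQuestion`, and the `supportsEdit`/`supportsReactions` flags are invented for illustration. The degradation choices (edit falls back to a new message, unsupported reaction is skipped) are one reasonable reading of the paragraph above, not a specified behavior.

```typescript
interface PlainMessage { operation?: undefined; text: string }
interface AskQuestion { operation: 'ask_question'; question: string; options: string[] }
interface EditMessage { operation: 'edit'; messageId: string; text: string }
interface Reaction { operation: 'reaction'; messageId: string; emoji: string }
type OutboundContent = PlainMessage | AskQuestion | EditMessage | Reaction;

// Hypothetical adapter surface with capability flags for degradation.
interface DeliveryAdapter {
  supportsEdit: boolean;
  supportsReactions: boolean;
  sendMessage(text: string): void;
  sendQuestion(question: string, options: string[]): void;
  editMessage(messageId: string, text: string): void;
  addReaction(messageId: string, emoji: string): void;
}

function isPlainMessage(c: OutboundContent): c is PlainMessage {
  return c.operation === undefined;
}

function deliver(content: OutboundContent, adapter: DeliveryAdapter): void {
  if (isPlainMessage(content)) {
    adapter.sendMessage(content.text); // no operation field = normal message
    return;
  }
  switch (content.operation) {
    case 'ask_question':
      adapter.sendQuestion(content.question, content.options);
      break;
    case 'edit':
      if (adapter.supportsEdit) adapter.editMessage(content.messageId, content.text);
      else adapter.sendMessage(content.text); // degrade: post the new text as a fresh message
      break;
    case 'reaction':
      if (adapter.supportsReactions) adapter.addReaction(content.messageId, content.emoji);
      // degrade: otherwise skip (sending the emoji name as text is the other option)
      break;
  }
}
```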

### Agent-to-Agent Communication

Sending a message to another agent uses the same routing fields as channel delivery. The agent-runner sets `channel_type: 'agent'` and `platform_id` to the target agent group ID. Optionally, `thread_id` can target a specific session (null = find or create the default session).

From the sending agent's perspective, it's the same mechanism as sending to Slack or WhatsApp — just a messages_out row with different routing. The host reads it, checks that this agent group has permission to message the target, resolves the target session, and writes a messages_in row to that session's DB.

```jsonc
// messages_out routing fields
{ "kind": "chat", "channel_type": "agent", "platform_id": "pr-worker", "thread_id": null }
// messages_out content
{ "text": "Reset your session and re-review", "sender": "Supervisor", "senderId": "agent:pr-admin" }
```

The receiving agent gets a normal chat message. It doesn't need to know the source is another agent unless that's relevant context.
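
A sketch of the host-side resolution step for an agent-channel row, under stated assumptions: permissions are a simple sender-to-targets map, sessions live in memory rather than the central DB, and a non-null `thread_id` is treated as a session ID in the target group. All class and field names are hypothetical. Writing the resulting messages_in row into the target session's DB is left out.

```typescript
interface RoutingFields {
  kind: string;
  channel_type: string;
  platform_id: string;      // for channel_type 'agent': target agent group ID
  thread_id: string | null; // assumption: for 'agent', a target session ID; null = default session
}

interface SessionRef {
  id: string;
  agentGroupId: string;
  threadId: string | null;
}

class AgentRouter {
  private readonly allowed: Map<string, Set<string>>; // sender group -> permitted target groups
  private readonly sessions: SessionRef[] = [];        // stand-in for the central sessions table
  private nextId = 1;

  constructor(allowed: Map<string, Set<string>>) {
    this.allowed = allowed;
  }

  /** Validate an agent-channel row and return the session whose messages_in should receive it. */
  resolve(senderGroupId: string, r: RoutingFields): SessionRef {
    if (r.channel_type !== 'agent') throw new Error('not an agent-channel message');
    if (!this.allowed.get(senderGroupId)?.has(r.platform_id)) {
      throw new Error(`${senderGroupId} may not message ${r.platform_id}`);
    }
    if (r.thread_id !== null) {
      const target = this.sessions.find((s) => s.id === r.thread_id && s.agentGroupId === r.platform_id);
      if (!target) throw new Error(`no session ${r.thread_id} in ${r.platform_id}`);
      return target;
    }
    // null thread_id: find or create the target group's default session
    let def = this.sessions.find((s) => s.agentGroupId === r.platform_id && s.threadId === null);
    if (!def) {
      def = { id: `session-${this.nextId++}`, agentGroupId: r.platform_id, threadId: null };
      this.sessions.push(def);
    }
    return def;
  }
}
```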

### Routing

**Default behavior:** Agent-runner copies routing fields (`kind`, `platform_id`, `channel_type`, `thread_id`) from the messages_in row to messages_out. Response goes back where it came from.

**Host validation:** Before delivering, the host checks that this agent group is permitted to send to the destination. The agent-runner copies routing; the host validates.
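
The split of responsibilities can be sketched in two small functions. The allowlist keyed as `channel_type:platform_id` is a hypothetical permission model; this document does not specify how permitted destinations are stored.

```typescript
interface Routing {
  kind: string;
  platform_id: string;
  channel_type: string;
  thread_id: string | null;
}

interface InboundMessage extends Routing { id: number; content: string }
interface OutboundMessage extends Routing { content: string }

/** Agent-runner: the reply inherits the inbound row's routing fields unchanged. */
function replyTo(inbound: InboundMessage, content: string): OutboundMessage {
  const { kind, platform_id, channel_type, thread_id } = inbound;
  return { kind, platform_id, channel_type, thread_id, content };
}

/** Host: deliver only if the destination is on the agent group's allowlist (hypothetical model). */
function mayDeliver(allowedDestinations: Set<string>, out: OutboundMessage): boolean {
  return allowedDestinations.has(`${out.channel_type}:${out.platform_id}`);
}
```

Because the agent-runner never invents routing, a compromised or confused agent can at worst reply to where the message came from; anything else has to pass the host check.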

**Multi-destination pattern (customization):** An agent may need to send to a different channel than the origin (e.g., a webhook triggers a Slack notification). This is supported via custom code, not built into the core:

1. Add a `destinations` table to the session DB mapping logical names to routing fields
2. Populate it from the host when setting up the session
3. Modify the agent's prompt to list available destinations
4. Agent chooses a destination by name; agent-runner resolves to routing fields
5. Host validates as usual

This is documented as a pattern, not a built-in feature.
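
A sketch of steps 3 and 4, assuming a hypothetical `destinations` row shape (name plus the three routing columns) and that named destinations always produce `kind: 'chat'` rows. Step 5 is unchanged: the host still validates whatever this returns.

```typescript
// Hypothetical row shape for the custom `destinations` table.
interface DestinationRow {
  name: string;             // logical name shown to the agent, e.g. "ops-alerts"
  channel_type: string;
  platform_id: string;
  thread_id: string | null;
}

/** Step 3: text the agent-runner could append to the agent's prompt. */
function describeDestinations(rows: DestinationRow[]): string {
  return ['Available destinations:', ...rows.map((r) => `- ${r.name}`)].join('\n');
}

/** Step 4: resolve a name chosen by the agent to messages_out routing fields. */
function resolveDestination(rows: DestinationRow[], name: string) {
  const row = rows.find((r) => r.name === name);
  if (!row) throw new Error(`unknown destination: ${name}`); // surfaced to the agent as a tool error
  return { kind: 'chat', channel_type: row.channel_type, platform_id: row.platform_id, thread_id: row.thread_id };
}
```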

## What Stays the Same

- Container isolation via filesystem mounts
- Credential proxy (OneCLI)
- Per-agent-group workspace (folder, CLAUDE.md, skills)
- Polling-based (not event-driven)
- Per-agent-group agent-runner recompilation on container startup (agent can modify its own source, request rebuild/restart, changes persist across teardowns)

## What Changes

| Component | v1 | v2 |
|-----------|----|----|
| Host ↔ container IO | stdin + IPC files | Mounted session DB (messages_in / messages_out) |
| Container input | Prompt string piped to stdin | Agent-runner polls messages_in |
| Container output | stdout markers | Agent-runner writes to messages_out |
| Agent commands | IPC JSON files | messages_out with `kind: 'system'` |
| Agent-to-agent | Not supported | messages_out with target agent routing |
| Scheduling | Separate scheduler + task table | `process_after` / `deliver_after` + `recurrence` on messages |
| Media | Not supported | Signed URLs, downloaded in container |
| Channel adapters | Custom per-platform | Chat SDK bridge + standard interface |
| Routing | Host checks registeredGroups map | Channel adapter extracts IDs, host maps to entities |
| Concurrency | GroupQueue (in-memory) | Chat SDK per-channel + container limits |
| Session scoping | One session per agent group folder | Per-session DB, multiple sessions per agent group |

## Design Decisions

**Session DB location:** Not in the agent group folder. Separate directory (e.g., `sessions/{agent_group_id}/{session_id}/`). Each session gets its own folder containing `session.db` and the Claude SDK's `.claude/` directory. The session identity IS the folder, so there is no need to track Claude SDK session IDs.

**Container mount structure:**

```
/workspace/           ← mount: session folder (read-write)
  .claude/            ← Claude SDK session data (auto-created)
  session.db          ← session SQLite DB
  outbox/             ← agent-runner writes outbound files here
  agent/              ← mount: agent group folder (nested, read-write)
    CLAUDE.md         ← agent instructions
    skills/           ← agent skills
    ...               ← working files
```

Two directory mounts: the session folder at `/workspace` and the agent group folder at `/workspace/agent/`. The agent-runner uses `/workspace/agent/` as the agent's working directory. The Claude SDK writes `.claude/` at `/workspace/.claude/` (root of the workspace). The session DB is at `/workspace/session.db`.

This works on both Docker (nested bind mounts) and Apple Container (directory mounts only; no file-level mounts, but nested directory mounts are supported).

**Session DB concurrent access:** The host writes messages_in rows; the agent-runner writes messages_out rows and updates the status of messages_in rows. Both processes open the same SQLite file at the same time. WAL mode makes this workable: readers do not block the writer and the writer does not block readers. SQLite still allows only one writer at a time per database file (locking is per database, not per table), so both sides should keep write transactions short and set a busy timeout so an overlapping write waits instead of failing. WAL also relies on shared memory (the `-shm` file), which SQLite only supports when all connections are on the same host; this needs to be verified for VM-backed runtimes such as Apple Container. The host enables WAL mode when creating the session DB.

**Session management:** Host-managed. The host creates session folders and mounts them. The container only sees its own session folder.

**Session creation (no race condition):**

1. Message arrives, host checks central DB for a session matching this group + thread
2. No session exists → host atomically creates session row in central DB, creates the session folder, creates the session DB, writes the message
3. More messages arrive before container starts → host finds the existing session, writes to the same session DB
4. Container starts, mounts the folder, agent-runner finds messages waiting

The central DB session row creation is the serialization point. There is no Claude SDK session ID to coordinate: the SDK discovers its own session data in `.claude/` when the agent runs.
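
A sketch of the get-or-create step, assuming a single Node.js host process and an in-memory map standing in for the central `sessions` table. The check and insert run with no `await` between them, so no other message handler can interleave. In SQL the equivalent needs care: the schema below only has a non-unique index on `(messaging_group_id, thread_id)`, and SQLite treats NULLs as distinct in UNIQUE indexes, so a plain UNIQUE constraint would not deduplicate shared-mode sessions where `thread_id` is NULL. The `initSessionFolder` callback is hypothetical.

```typescript
interface SessionRow {
  id: string;
  agentGroupId: string;
  messagingGroupId: string;
  threadId: string | null; // null = shared session mode
}

class SessionRegistry {
  private readonly byKey = new Map<string, SessionRow>();
  private counter = 0;

  // JSON keeps null distinct from any real thread ID, so shared sessions dedupe correctly.
  private key(agentGroupId: string, messagingGroupId: string, threadId: string | null): string {
    return JSON.stringify([agentGroupId, messagingGroupId, threadId]);
  }

  // Synchronous check-and-insert: within one host process this runs to completion
  // before any other handler, so two messages for the same thread share one session.
  getOrCreate(
    agentGroupId: string,
    messagingGroupId: string,
    threadId: string | null,
    initSessionFolder: (s: SessionRow) => void, // hypothetical: mkdir, create session.db, enable WAL
  ): { session: SessionRow; created: boolean } {
    const k = this.key(agentGroupId, messagingGroupId, threadId);
    const existing = this.byKey.get(k);
    if (existing) return { session: existing, created: false };
    const session: SessionRow = { id: `s${++this.counter}`, agentGroupId, messagingGroupId, threadId };
    this.byKey.set(k, session);
    initSessionFolder(session);
    return { session, created: true };
  }
}
```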

**System actions:** The agent uses MCP tools (register group, reset session, schedule task, etc.). The agent-runner handles these tool calls and writes a structured, deterministic messages_out row with `kind: 'system'`. This is not natural language; it's a programmatic, structured payload that the host processes deterministically. The host validates permissions, executes, and writes the result back as a `system` messages_in row.

**Container lifecycle:** No warm pool. Containers are spawned on demand (`wakeUpAgent`) and torn down from the outside by the host when idle. The existing idle detection and teardown mechanism carries over.

## Operational Behavior

### Output Delivery

NanoClaw does not stream tokens to users. The Claude Agent SDK's `query()` yields complete results. The agent-runner writes one complete message to messages_out per result. The host delivers complete messages to channels.

Message editing is supported as an explicit operation (agent calls an `edit_message` tool), not as a streaming mechanism.

Typing indicators: the host sets typing when a container is active for a session, and clears it when the container exits or a response appears in messages_out.

### Message Batching

When multiple messages arrive while the container is down, they accumulate as `status = 'pending'` rows in messages_in. When the container wakes up, the agent-runner queries all pending, due messages and processes them as a batch, the same as v1, where multiple messages are formatted into a single `<messages>` XML block.

### Message Lifecycle

```
pending → processing → completed
                     → failed (after max retries)
```

- **pending**: Written by host. Ready to be picked up (if `process_after` is null or past).
- **processing**: Agent-runner sets this when it picks up the message. `status_changed` is set to now. Prevents other polls from re-picking the same message.
- **completed**: Agent-runner sets this after successful processing.
- **failed**: Set after max retries exhausted.

**Stale detection**: If a message is `processing` but `status_changed` is too old (e.g., >10 minutes), the host assumes the container crashed. It resets the message to `pending`, increments `tries`, and sets `process_after` with exponential backoff.

### Error Handling and Retries

Retries use `process_after` with exponential backoff. Each retry increments `tries` and pushes `process_after` further out:

- Try 1: immediate
- Try 2: +5s
- Try 3: +10s
- Try 4: +20s
- Try 5: +40s
- After max retries: status set to `failed`

The host computes this, not the agent-runner. When the host detects a stale `processing` message or the container exits with an error, it increments `tries`, computes the next `process_after`, and resets status to `pending`.

**Output-sent protection**: If messages_out already has delivered rows for a batch, the host does not retry it (prevents duplicate messages to the user).
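
The backoff schedule above is `5s × 2^(attempt − 2)` for attempts 2 through 5. A host-side sketch follows, assuming epoch-millisecond timestamps, `tries` counting failed attempts so far, a maximum of five attempts (matching the list), and that a batch with output already sent is marked `completed`; that last choice is an assumption, since the document only says "don't retry".

```typescript
const BASE_DELAY_MS = 5_000;
const MAX_TRIES = 5;                 // assumption: matches the five attempts listed above
const STALE_AFTER_MS = 10 * 60_000;  // the ">10 minutes" example

interface InboundRow {
  status: 'pending' | 'processing' | 'completed' | 'failed';
  status_changed: number;      // epoch ms (the real column format is not specified)
  tries: number;               // failed attempts so far
  process_after: number | null;
}

/** Delay before attempt number `attempt` (1-based): 0, 5s, 10s, 20s, 40s. */
function backoffMs(attempt: number): number {
  return attempt <= 1 ? 0 : BASE_DELAY_MS * 2 ** (attempt - 2);
}

function isStale(row: InboundRow, now: number): boolean {
  return row.status === 'processing' && now - row.status_changed > STALE_AFTER_MS;
}

/** Called by the host for a stale row or after a container error exit. */
function scheduleRetry(row: InboundRow, now: number, outputAlreadySent: boolean): void {
  row.status_changed = now;
  if (outputAlreadySent) {
    row.status = 'completed'; // output-sent protection (terminal status is an assumption)
    return;
  }
  row.tries += 1;
  if (row.tries >= MAX_TRIES) {
    row.status = 'failed';
    return;
  }
  row.status = 'pending';
  row.process_after = now + backoffMs(row.tries + 1); // next attempt number = tries + 1
}
```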

### Host Polling

Two tiers:

- **Active containers (~1s)**: Poll session DBs for new messages_out rows to deliver
- **All sessions (~60s)**: Sweep all session DBs for due `process_after` / `deliver_after` timestamps, handle recurrence

## Flexibility Model

The architecture is **flexible for code changes, not configurable for everything**. Advanced setups (like the PR Factory below) use custom routing logic and host-side hooks, not database config columns.

### What the base architecture must support primitively

These are the building blocks. None require special abstractions; they fall out of per-session DBs, host-managed routing, and messages_out with `kind: 'system'`:

1. **Multiple agent groups on the same channel with content-based routing.** Different messages in the same thread can route to different agent groups based on content (e.g., @mention routes to supervisor, normal messages route to worker). The channel adapter's routing logic (custom code) decides.

2. **Per-thread sessions from a shared agent group.** Multiple sessions share the same agent group (filesystem, skills, CLAUDE.md) but each gets its own session DB. Standard for worker pools.

3. **Session reset and replay.** Create a new session for the same thread. Mark old messages as unhandled so the poll picks them up again. Old output stays visible in the platform (e.g., Discord thread) for comparison. This is an action an agent can request, not something automatic.

4. **Cross-session read access.** Some agents can query other sessions' data. Different access levels: manager sees messages_in/messages_out (review content). Supervisor sees full internals (agent logs, tool calls, debug traces). This is just filesystem/DB access: mount or query the right paths.

5. **Context duplication into new sessions.** When a supervisor is invoked in a worker's thread, a new session is created with relevant messages copied in. Custom host-side code handles this.

6. **Agent-initiated host actions.** The agent uses MCP tools (reset session, update skills, etc.). The agent-runner handles the tool call and writes a structured `system` messages_out row. The host reads and executes with permission checks. The agent can request, but the host decides.

### Example: PR Factory

Three agent groups, one Discord channel (PR Factory), plus an admin channel:

| Role | Agent Group | Where | Session model |
|------|-------------|-------|---------------|
| **Worker** | pr-worker | PR Factory threads | One session per thread (per PR) |
| **Manager** | pr-manager | PR Factory channel | Single session, queries across worker sessions |
| **Supervisor** | pr-admin | Admin channel + PR Factory (when @tagged) | Main session in admin channel; per-thread session when invoked in worker threads |

**Worker flow:** GitHub PR → Discord thread → worker agent reviews (triage, review, test plan). Each thread gets a session from the shared pr-worker group.

**Feedback flow:** User @tags supervisor in worker threads → custom routing sends to supervisor with a new session containing the thread's messages (duplicated). Supervisor collects feedback to filesystem. Worker doesn't see supervisor messages.

**Iteration flow:** User discusses feedback with supervisor in admin channel → supervisor suggests skill changes (shown as rich card with diff) → user approves → supervisor applies changes via host action → supervisor requests session reset + replay → workers re-review same PRs with updated skills in same threads but fresh sessions → user compares reviews side by side.

**Manager flow:** User talks to manager in PR Factory main channel (not in threads). Manager can search across all worker session DBs (messages_in/messages_out) to answer questions like "how many PRs today?" or "what topics are trending?" Can request actions (close PR, re-open).

**What's custom code vs. base architecture:**

| Capability | Base architecture | Custom code (PR Factory) |
|-----------|-------------------|-------------------------|
| Per-thread sessions | ✓ platformThreadId → session | |
| Shared agent group across sessions | ✓ Multiple sessions, one group | |
| Writing messages to session DB | ✓ Standard flow | |
| @mention routing to different agent | | ✓ Channel adapter routing logic |
| Context duplication into supervisor session | | ✓ Host-side hook on supervisor invocation |
| Session reset + replay | ✓ Primitives (new session, mark unhandled) | ✓ Supervisor action triggers it |
| Skill updates | ✓ Filesystem writes | ✓ Supervisor action applies changes |
| Cross-session queries | ✓ DB/filesystem access | ✓ Manager's tools know where to look |
| Rich card output | ✓ Structured output in messages_out | |

## Central DB Schema

The central DB handles routing and entity management. All content and execution state lives in per-session DBs.

```sql
-- Agent workspaces: folder, skills, CLAUDE.md, container config
CREATE TABLE agent_groups (
  id               TEXT PRIMARY KEY,
  name             TEXT NOT NULL,
  folder           TEXT NOT NULL UNIQUE,
  is_admin         INTEGER DEFAULT 0,
  agent_provider   TEXT,     -- default for sessions (null = system default)
  container_config TEXT,     -- JSON: { additionalMounts, timeout }
  created_at       TEXT NOT NULL
);

-- Platform groups/channels (WhatsApp group, Slack channel, Discord channel, email thread, etc.)
CREATE TABLE messaging_groups (
  id            TEXT PRIMARY KEY,
  channel_type  TEXT NOT NULL,  -- 'whatsapp', 'slack', 'discord', 'telegram', 'email'
  platform_id   TEXT NOT NULL,  -- platform-specific ID (JID, channel ID, etc.)
  name          TEXT,
  is_group      INTEGER DEFAULT 0,
  admin_user_id TEXT,           -- platform user ID of the group admin (default: whoever set it up)
  created_at    TEXT NOT NULL,
  UNIQUE(channel_type, platform_id)
);

-- Which agent groups handle which messaging groups, with what rules
CREATE TABLE messaging_group_agents (
  id                 TEXT PRIMARY KEY,
  messaging_group_id TEXT NOT NULL REFERENCES messaging_groups(id),
  agent_group_id     TEXT NOT NULL REFERENCES agent_groups(id),
  trigger_rules      TEXT,                  -- JSON: { pattern, mentionOnly, excludeSenders, includeSenders }
  response_scope     TEXT DEFAULT 'all',    -- 'all' | 'triggered' | 'allowlisted'
  session_mode       TEXT DEFAULT 'shared', -- 'shared' | 'per-thread'
  priority           INTEGER DEFAULT 0,     -- higher = checked first when multiple agents match
  created_at         TEXT NOT NULL,
  UNIQUE(messaging_group_id, agent_group_id)
);

-- Sessions: one folder = one session = one container when running
-- Folder path is derived: sessions/{agent_group_id}/{session_id}/
CREATE TABLE sessions (
  id                 TEXT PRIMARY KEY,
  agent_group_id     TEXT NOT NULL REFERENCES agent_groups(id),
  messaging_group_id TEXT REFERENCES messaging_groups(id), -- null for internal/spawned sessions
  thread_id          TEXT,                      -- platform thread ID (null for shared session mode)
  agent_provider     TEXT,                      -- override per session (null = inherit from agent_group)
  status             TEXT DEFAULT 'active',     -- 'active' | 'closed'
  container_status   TEXT DEFAULT 'stopped',    -- 'running' | 'idle' | 'stopped'
  last_active        TEXT,                      -- last message activity timestamp
  created_at         TEXT NOT NULL
);
CREATE INDEX idx_sessions_agent_group ON sessions(agent_group_id);
CREATE INDEX idx_sessions_lookup ON sessions(messaging_group_id, thread_id);

-- Pending interactive questions (cards waiting for user response)
-- Host writes when delivering a question card, deletes when response received
CREATE TABLE pending_questions (
  question_id    TEXT PRIMARY KEY,
  session_id     TEXT NOT NULL REFERENCES sessions(id),
  message_out_id TEXT NOT NULL,  -- the messages_out row that sent the card
  platform_id    TEXT,           -- where the card was delivered
  channel_type   TEXT,
  thread_id      TEXT,
  created_at     TEXT NOT NULL
);
```

### Pending Question Flow

When the host delivers a messages_out row with `operation: 'ask_question'`:

1. Host delivers the card via the channel adapter
2. Host writes a `pending_questions` row mapping `question_id` → `session_id`

When a Chat SDK `ActionEvent` (button click) arrives:

1. Bridge extracts `actionId` from the event
2. Host looks up `pending_questions` by `question_id` (derived from actionId; the bridge maintains the mapping)
3. Host finds the target session, writes a messages_in row with `questionId` + `selectedOption`
4. Host deletes the `pending_questions` row
5. Agent-runner picks up the messages_in row, matches to the pending tool call, returns the selection

This avoids scanning session DBs. The central DB is the routing lookup, the same pattern as message routing.

Also used for host-generated approval cards: when the host sends an approval request to the admin's DM, it writes a `pending_questions` row. The admin's response is routed back to the originating session.
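
A sketch of the host-side bookkeeping for the two flows above, with an in-memory map standing in for the `pending_questions` table. It assumes the bridge has already turned the `actionId` into a `questionId`, and `writeMessagesIn` is a hypothetical callback that inserts into the target session's DB. Deleting the row after writing follows steps 3 and 4, and also makes duplicate clicks harmless.

```typescript
interface PendingQuestion {
  questionId: string;
  sessionId: string;
  messageOutId: string;
}

type AnswerContent = { questionId: string; selectedOption: string };

class PendingQuestions {
  private readonly byId = new Map<string, PendingQuestion>(); // stand-in for pending_questions

  /** After the adapter has delivered the card. */
  recordDelivered(q: PendingQuestion): void {
    this.byId.set(q.questionId, q);
  }

  /** On a button click. Returns false for unknown or already-answered questions. */
  handleAction(
    questionId: string,
    selectedOption: string,
    writeMessagesIn: (sessionId: string, content: AnswerContent) => void, // hypothetical
  ): boolean {
    const q = this.byId.get(questionId);
    if (!q) return false;
    writeMessagesIn(q.sessionId, { questionId, selectedOption }); // step 3
    this.byId.delete(questionId);                                 // step 4
    return true;
  }
}
```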

### Container lifecycle states

```
stopped → running → idle → stopped (after idle timeout)
             ↑       │
             └───────┘ new message while warm
```

- **stopped**: No container. Swept at 60s for due scheduled messages.
- **running**: Actively processing. Polled at 1s for messages_out.
- **idle**: Done processing, container still warm (up to 30 min timeout). Polled at 1s so new messages are picked up quickly.
- After idle timeout → host kills container → stopped.
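
The states and polling tiers above can be expressed as a small transition function. The event names (`wake`, `work_done`, `new_message`, `tick`) are hypothetical labels for the triggers described in the list; the 1s, 60s, and 30-minute values come from this document.

```typescript
type ContainerStatus = 'stopped' | 'running' | 'idle';
type ContainerEvent = 'wake' | 'work_done' | 'new_message' | 'tick';

const ACTIVE_POLL_MS = 1_000;
const SWEEP_POLL_MS = 60_000;
const IDLE_TIMEOUT_MS = 30 * 60_000;

/** Running and idle sessions are polled at 1s; stopped ones only by the 60s sweep. */
function pollIntervalMs(status: ContainerStatus): number {
  return status === 'stopped' ? SWEEP_POLL_MS : ACTIVE_POLL_MS;
}

function nextStatus(
  status: ContainerStatus,
  event: ContainerEvent,
  idleSinceMs: number,
  nowMs: number,
): ContainerStatus {
  if (status === 'stopped') {
    // wakeUpAgent: an inbound message or a due scheduled message starts a container
    return event === 'wake' || event === 'new_message' ? 'running' : 'stopped';
  }
  if (status === 'running') return event === 'work_done' ? 'idle' : 'running';
  // idle: warm container
  if (event === 'new_message') return 'running';
  if (event === 'tick' && nowMs - idleSinceMs >= IDLE_TIMEOUT_MS) return 'stopped'; // host kills it
  return 'idle';
}
```

Per the open question on idle detection, a real host would also suppress the `tick` timeout while an `ask_user_question` call is outstanding.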

### Migration from v1

| v1 table | v2 |
|----------|-----|
| `registered_groups` | Split into `agent_groups` + `messaging_groups` + `messaging_group_agents` |
| `chats` | Absorbed into `messaging_groups` |
| `messages` | Content moves to per-session DBs (messages_in) |
| `sessions` (folder → sdk_session_id) | New `sessions` table (folder derived from ID) |
| `scheduled_tasks` | Moved to per-session DBs (messages_in with recurrence) |
| `task_run_logs` | Dropped; results are in session DB messages_out |
| `router_state` | Dropped; replaced by message status in session DBs |

## Agent-Runner Architecture

The agent-runner is the process inside the container. It mediates between the session DB and the Claude SDK: polling for work, formatting messages for the agent, translating tool calls into DB rows, and managing the agent lifecycle.

### IO Model

All IO goes through the session DB. No stdin, no stdout markers, no IPC files.

| v1 | v2 |
|----|----|
| Initial input from stdin (JSON envelope) | Poll `messages_in` |
| Follow-up messages from IPC files | Same poll; new rows appear |
| Output via stdout markers | Write `messages_out` rows |
| MCP tools write IPC files | MCP tools write DB rows |
| `_close` sentinel signals shutdown | Host kills container (idle timeout) or agent-runner exits when no pending work |

### Poll Loop

1. Query `messages_in WHERE status = 'pending' AND (process_after IS NULL OR process_after <= now())`
2. If rows found: set `status = 'processing'`, `status_changed = now()` on each
3. Batch messages into a single prompt (strip routing fields, format by kind)
4. Push into Claude SDK's MessageStream
5. Process agent output → write `messages_out` rows
6. Set processed messages to `status = 'completed'`
7. Back to step 1. If no messages found, sleep briefly and re-poll (container stays warm for idle timeout)
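
A minimal sketch of steps 1 to 7 against an in-memory array standing in for `messages_in`. The `PollDeps` hooks are hypothetical, and `runAgent` is synchronous here for brevity; the real provider call streams asynchronously. If `runAgent` throws, the rows stay `processing`, which is exactly what the host's stale detection recovers from.

```typescript
type Status = 'pending' | 'processing' | 'completed' | 'failed';

interface MessageIn {
  id: number;
  kind: string;
  status: Status;
  status_changed: number | null;
  process_after: number | null; // epoch ms; null = due immediately
  content: string;
}

// Hypothetical hooks; formatting and the provider call are covered elsewhere.
interface PollDeps {
  rows: MessageIn[];                   // stand-in for the messages_in table
  format(batch: MessageIn[]): string;  // step 3
  runAgent(prompt: string): string[];  // steps 4-5: returns outbound message texts
  writeOut(text: string): void;        // step 5: INSERT INTO messages_out
  now(): number;
}

/** One pass of steps 1-6. Returns how many messages were processed (0 = sleep and re-poll). */
function pollOnce(d: PollDeps): number {
  const now = d.now();
  const due = d.rows.filter(
    (r) => r.status === 'pending' && (r.process_after === null || r.process_after <= now),
  );
  if (due.length === 0) return 0;
  for (const r of due) { r.status = 'processing'; r.status_changed = now; }
  for (const text of d.runAgent(d.format(due))) d.writeOut(text);
  for (const r of due) { r.status = 'completed'; r.status_changed = d.now(); }
  return due.length;
}

/** Step 7: keep polling, sleeping briefly when nothing is due. */
async function pollLoop(d: PollDeps, shouldStop: () => boolean, idleSleepMs = 500): Promise<void> {
  while (!shouldStop()) {
    if (pollOnce(d) === 0) await new Promise((resolve) => setTimeout(resolve, idleSleepMs));
  }
}
```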

### Message Formatting by Kind

Agent-runner strips routing fields (`platform_id`, `channel_type`, `thread_id`) before formatting. The agent never sees routing info; it only sees content.

- **`chat`** — format into `<messages>` XML block (same as v1)
- **`chat-sdk`** — extract text, author, attachments from serialized message; format into `<messages>` XML
- **`task`** — format as `[SCHEDULED TASK]` prefix + prompt. Run pre-script if present (same as v1).
- **`webhook`** — format as `[WEBHOOK: source/event]` + JSON payload
- **`system`** — host action results (e.g., "register_group succeeded"). Format as system context, not chat.

Mixed batches (e.g., a chat message + a system result both pending) are combined into one prompt with clear delimiters.
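
A sketch of a batch formatter, under several assumptions: the `<message sender="…">` element shape is illustrative (the exact v1 XML format is not reproduced here), content field names (`sender`, `text`, `prompt`, `source`, `event`, `payload`) are hypothetical, `---` is an arbitrary delimiter for mixed batches, and `chat-sdk` rows are omitted. Rows arrive with routing fields already stripped, so only `kind` and `content` are visible.

```typescript
type Kind = 'chat' | 'task' | 'webhook' | 'system';

interface Row {
  kind: Kind;
  content: Record<string, unknown>; // routing fields already stripped
}

function escapeXml(s: string): string {
  return s.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;').replace(/"/g, '&quot;');
}

function formatChat(rows: Row[]): string {
  const items = rows.map((r) => {
    const sender = escapeXml(String(r.content.sender ?? 'unknown'));
    const text = escapeXml(String(r.content.text ?? ''));
    return `  <message sender="${sender}">${text}</message>`;
  });
  return ['<messages>', ...items, '</messages>'].join('\n');
}

/** Combine a mixed batch into one prompt: chats grouped in one block, other kinds as sections. */
function formatBatch(rows: Row[]): string {
  const sections: string[] = [];
  const chats = rows.filter((r) => r.kind === 'chat');
  if (chats.length > 0) sections.push(formatChat(chats));
  for (const r of rows) {
    if (r.kind === 'task') {
      sections.push(`[SCHEDULED TASK] ${String(r.content.prompt ?? '')}`);
    } else if (r.kind === 'webhook') {
      const header = `[WEBHOOK: ${String(r.content.source)}/${String(r.content.event)}]`;
      sections.push(`${header}\n${JSON.stringify(r.content.payload ?? {}, null, 2)}`);
    } else if (r.kind === 'system') {
      sections.push(`[SYSTEM] ${String(r.content.text ?? '')}`);
    }
  }
  return sections.join('\n\n---\n\n');
}
```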

### MCP Tools

All v1 IPC-file-based tools are replaced with direct DB writes.

**Carried over (new implementation):**

| Tool | What it does |
|------|-------------|
| `send_message` | Write `messages_out` row, `kind: 'chat'` |
| `send_file` | Move file to `outbox/{msg_id}/`, write `messages_out` with filenames |
| `schedule_task` | Write `messages_in` row (to self) with `process_after` + `recurrence`. Or `messages_out` with `deliver_after` for outbound reminders. |
| `list_tasks` | Query `messages_in WHERE recurrence IS NOT NULL` |
| `pause_task` / `resume_task` / `cancel_task` | Modify `messages_in` rows (update status, clear/set recurrence) |
| `register_agent_group` | Write `messages_out`, `kind: 'system'`, `action: 'register_agent_group'` |

**New tools:**

| Tool | What it does |
|------|-------------|
| `ask_user_question` | Write `messages_out` with question card. Hold tool call open, poll `messages_in` for response matching `questionId`. Return selection as tool result. |
| `edit_message` | Write `messages_out` with `operation: 'edit'` |
| `add_reaction` | Write `messages_out` with `operation: 'reaction'` |
| `send_to_agent` | Write `messages_out` with `channel_type: 'agent'`, `platform_id: '{target}'` |
| `send_card` | Write `messages_out` with card structure |

See [v2-agent-runner-details.md](v2-agent-runner-details.md) for full MCP tool parameter definitions.

### Cards

**Agent-initiated (outbound):** Tool-based. Agent calls `ask_user_question` (interactive card with options) or `send_card` (structured card). Agent-runner writes the card structure to messages_out. Host/adapter handles platform-specific rendering (Slack Block Kit, Discord embeds, Telegram inline keyboard, text fallback).

**Host-initiated (approval cards):** When an action requires approval, the host generates a standardized approval card and sends it to the admin's DM. These are not agent-initiated; the agent doesn't know about the approval step. The card format is fixed (action description + approve/deny buttons).

**Inbound (card responses):** Not a card: it's a messages_in row with `questionId` + `selectedOption` in the content. Agent-runner matches it to the pending `ask_user_question` tool call and returns the selection as the tool result.

### Commands

Messages starting with `/` are checked against three lists:

**Whitelisted commands (pass-through to agent):**

- Standard slash commands that the agent provider handles natively (e.g., Claude's built-in commands)
- Passed raw, no `<messages>` XML wrapping

**Admin-only commands (require admin sender):**

- `/remote-control` — remote control session
- `/clear` — clear session context
- `/compact` — force context compaction
- If sent by a non-admin user, the command is rejected with an error message. Not forwarded to the agent.

**Filtered commands (dropped entirely):**

- Commands that don't make sense in the NanoClaw context or could cause issues
- Silently dropped: no error, no forwarding

The command lists are hardcoded in the agent-runner. Admin verification: the agent-runner checks the `senderId` in the message content against the messaging group's `admin_user_id` (passed to the container as config).
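
A sketch of the three-list check. Only the admin-only list comes from this document; `PASS_THROUGH` and `FILTERED` hold hypothetical placeholders. What the runner does with an admin-approved command (forward it raw or act on it itself) is not specified, so the sketch only returns a decision. Treating unlisted slash text as an ordinary message is also an assumption.

```typescript
type CommandDecision =
  | { action: 'not_command' }                // format as a normal chat message
  | { action: 'pass_raw' }                   // whitelisted: forward without <messages> wrapping
  | { action: 'run_admin' }                  // admin-only command from the admin
  | { action: 'reject'; reason: string }     // admin-only command from anyone else
  | { action: 'drop' };                      // filtered: silently ignored

const ADMIN_ONLY = new Set(['/remote-control', '/clear', '/compact']);
const PASS_THROUGH = new Set(['/example-native']); // hypothetical placeholder
const FILTERED = new Set(['/example-filtered']);   // hypothetical placeholder

function classifyCommand(text: string, senderId: string, adminUserId: string | null): CommandDecision {
  if (!text.startsWith('/')) return { action: 'not_command' };
  const name = text.trim().split(/\s+/)[0];
  if (ADMIN_ONLY.has(name)) {
    if (adminUserId !== null && senderId === adminUserId) return { action: 'run_admin' };
    return { action: 'reject', reason: `${name} is restricted to the group admin` };
  }
  if (PASS_THROUGH.has(name)) return { action: 'pass_raw' };
  if (FILTERED.has(name)) return { action: 'drop' };
  return { action: 'not_command' }; // assumption: unlisted slash text is an ordinary message
}
```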

### Recurring Tasks

The agent-runner processes recurring task messages like any other messages_in row. After the agent-runner marks a recurring message as `completed`, the **host** handles inserting the next occurrence (new messages_in row with `process_after` advanced to next cron time). The agent-runner doesn't manage recurrence; it just processes what it finds.

Pre-scripts work the same as v1: if a task message has a `script` field, run it first. If `wakeAgent = false`, mark completed without invoking Claude.
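
A sketch of the host's "insert the next occurrence" step. To stay self-contained it swaps the cron expression for a fixed interval (`everyMs`); a real host would compute the next cron time with a cron parser. The `successor_id` marker is hypothetical, added so the 60s sweep inserts each successor only once. Occurrences missed while the host was down are skipped rather than replayed.

```typescript
// Assumption: recurrence as a fixed interval instead of a cron expression.
interface TaskRow {
  id: number;
  kind: 'task';
  status: 'pending' | 'processing' | 'completed' | 'failed';
  process_after: number;                 // epoch ms when this occurrence was due
  recurrence: { everyMs: number } | null;
  content: string;
  successor_id?: number;                 // hypothetical: set once the next run is inserted
}

/** First occurrence strictly after `now`, skipping any that were missed. */
function nextRunAfter(prevDue: number, everyMs: number, now: number): number {
  const steps = Math.max(1, Math.floor((now - prevDue) / everyMs) + 1);
  return prevDue + steps * everyMs;
}

/** Host sweep: for each completed recurring row without a successor, insert the next one. */
function scheduleNext(rows: TaskRow[], now: number, newId: () => number): TaskRow[] {
  const inserted: TaskRow[] = [];
  for (const r of rows) {
    if (r.status !== 'completed' || !r.recurrence || r.successor_id !== undefined) continue;
    const next: TaskRow = {
      id: newId(),
      kind: 'task',
      status: 'pending',
      process_after: nextRunAfter(r.process_after, r.recurrence.everyMs, now),
      recurrence: r.recurrence,
      content: r.content,
    };
    r.successor_id = next.id;
    inserted.push(next);
  }
  rows.push(...inserted);
  return inserted;
}
```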

### Agent-to-Agent Messaging

**Outbound:** Agent calls `send_to_agent` tool → agent-runner writes messages_out with `channel_type: 'agent'`, `platform_id` = target agent group ID. Host validates permissions and writes to target session's messages_in.

**Inbound:** Messages from other agents arrive as normal `chat` messages_in rows. The content includes `sender` and `senderId` (e.g., `"senderId": "agent:pr-admin"`). No special formatting: the agent sees it as a chat message.

### What Stays From v1

- AgentProvider interface wraps SDK-specific query logic (Claude, Codex, OpenCode)
- Session resume via provider-specific mechanisms
- System prompt loading from CLAUDE.md files
- PreCompact hook for transcript archiving (Claude provider)
- Script execution for task-kind messages

## Open Questions

- **Approval routing** — how does the host find the admin's DM conversation? What if no DM channel exists? Is the approval list configurable per agent group or global?
- **MCP server lifecycle** — does the MCP server process persist across multiple queries in the same container, or restart each time?
- **Container startup config** — what config (if any) is passed to the container at launch beyond env vars? The session DB is at a fixed mount path. System prompt comes from CLAUDE.md. Provider name comes from env. What else?
- **Idle detection with pending questions** — when `ask_user_question` is waiting for a response, the container should not be considered idle. Also need to detect when the agent is still working (active tool calls, subagents) and avoid killing the container even if no messages_out have been written recently.

## Related Documents

- **[v2-api-details.md](v2-api-details.md)** — Channel adapter interface (NanoClaw + Chat SDK bridge), message content examples, host delivery logic
- **[v2-agent-runner-details.md](v2-agent-runner-details.md)** — AgentProvider interface, MCP tools, message formatting, media handling, provider implementations (Claude, Codex, OpenCode)