docs: update token count to 196k tokens · 98% of context window

chore: bump version to 2.1.17
Merge pull request #2759 from assapin/fix/budget-error-surfaced-to-user
2026-06-18 18:29:35 +08:00 · 2026-06-16 11:15:10 +00:00 · 2026-06-16 11:15:04 +00:00 · 2026-06-16 14:14:48 +03:00 · 2026-06-16 11:35:45 +03:00 · 2026-06-16 09:55:25 +03:00
14 changed files with 264 additions and 100 deletions
@@ -37,7 +37,7 @@ rm -f src/providers/codex.ts \
      container/agent-runner/src/providers/codex.factory.test.ts \
      container/agent-runner/src/providers/codex.turns.test.ts \
      container/agent-runner/src/providers/codex-app-server.test.ts \
-      container/agent-runner/src/providers/codex-dockerfile.test.ts \
+      container/agent-runner/src/providers/codex-cli-tools.test.ts \
      setup/providers/codex.ts \
      setup/providers/codex.test.ts \
      setup/providers/codex-registration.test.ts
@@ -47,9 +47,19 @@ This skill itself (`.claude/skills/add-codex/`) stays — it ships with trunk so

 `container/AGENTS.md` stays only if another installed provider uses agent surfaces; otherwise remove it too.

-## 4. Revert the Dockerfile
+## 4. Remove the CLI manifest entry

-Delete the `ARG CODEX_VERSION=...` line and the `RUN pnpm install -g "@openai/codex@${CODEX_VERSION}"` line from `container/Dockerfile`.
+Delete the `@openai/codex` entry from `container/cli-tools.json`:
+
+```bash
+node -e '
+  const fs = require("fs");
+  const file = "container/cli-tools.json";
+  const tools = JSON.parse(fs.readFileSync(file, "utf8")).filter((t) => t.name !== "@openai/codex");
+  const fmt = (t) => "  { " + Object.entries(t).map(([k, v]) => JSON.stringify(k) + ": " + JSON.stringify(v)).join(", ") + " }";
+  fs.writeFileSync(file, "[\n" + tools.map(fmt).join(",\n") + "\n]\n");
+'
+```

 ## 5. Vault secret (optional)

@@ -5,9 +5,9 @@ description: Use Codex (OpenAI's codex app-server) as a full agent provider —

 # Codex agent provider

-> Shortcut: `pnpm exec tsx setup/index.ts --step provider-auth codex` performs this whole install (manifest-driven from the providers branch: files, barrels, Dockerfile pin, image rebuild) plus auth in one command. The steps below are the same operations, for agent-driven or manual application.
+> Shortcut: `pnpm exec tsx setup/index.ts --step provider-auth codex` performs this whole install (manifest-driven from the providers branch: files, barrels, CLI manifest entry, image rebuild) plus auth in one command. The steps below are the same operations, for agent-driven or manual application.

-NanoClaw selects each group's agent backend from `container_configs.provider` (default `claude`). This skill installs the Codex provider: copy the payload from the `providers` branch, append one import to each of the three provider barrels, add the pinned Codex CLI to the Dockerfile, rebuild, then run the vault auth walk-through.
+NanoClaw selects each group's agent backend from `container_configs.provider` (default `claude`). This skill installs the Codex provider: copy the payload from the `providers` branch, append one import to each of the three provider barrels, add the pinned Codex CLI to the container manifest (`container/cli-tools.json`), rebuild, then run the vault auth walk-through.

 The provider runs `codex app-server` as a child process speaking JSON-RPC over stdio: native streaming, MCP tools, server-side conversation history (the continuation is a thread id, no on-disk transcript). Credentials are **vault-only**: OneCLI serves a sentinel `auth.json` stub into the container and swaps the real ChatGPT token or API key on the wire — no key in `.env`, nothing readable in the container.

@@ -21,7 +21,7 @@ Check whether the payload is already wired (a prior apply, or a trunk that still
 - `container/agent-runner/src/providers/codex.ts` and `codex-app-server.ts`
 - `setup/providers/codex.ts`
 - `import './codex.js';` in `src/providers/index.ts`, `container/agent-runner/src/providers/index.ts`, and `setup/providers/index.ts`
-- `ARG CODEX_VERSION=` in `container/Dockerfile`
+- an `@openai/codex` entry in `container/cli-tools.json`

 ### Fetch and copy

@@ -45,7 +45,7 @@ Container (`container/agent-runner/src/providers/`):
 - `exchange-archive.test.ts` — writer behavior
 - `codex-registration.test.ts` — barrel-driven container registration guard
 - `codex.factory.test.ts`, `codex.turns.test.ts`, `codex-app-server.test.ts` — provider behavior
-- `codex-dockerfile.test.ts` — structural guard for the Dockerfile install
+- `codex-cli-tools.test.ts` — structural guard for the Codex entry in `container/cli-tools.json`

 Setup (`setup/providers/`):
 - `codex.ts` — picker entry self-registration + the vault auth walk-through + install check
@@ -62,15 +62,24 @@ Append `import './codex.js';` to each of:
 - `container/agent-runner/src/providers/index.ts`
 - `setup/providers/index.ts`

-### Dockerfile
+### CLI manifest

-Copy the two Codex lines verbatim from the branch (the branch's Dockerfile is the canonical pin — do not hand-type a version):
+The agent's global Node CLIs install from `container/cli-tools.json` (a json-merge seam), not hand-edited Dockerfile layers. Add Codex by appending one entry — `@openai/codex` has no native postinstall, so no `onlyBuilt`:

 ```bash
-git show origin/providers:container/Dockerfile | grep -A1 'ARG CODEX_VERSION'
+node -e '
+  const fs = require("fs");
+  const file = "container/cli-tools.json";
+  const tools = JSON.parse(fs.readFileSync(file, "utf8"));
+  if (!tools.some((t) => t.name === "@openai/codex")) {
+    tools.push({ name: "@openai/codex", version: "0.138.0" });
+    const fmt = (t) => "  { " + Object.entries(t).map(([k, v]) => JSON.stringify(k) + ": " + JSON.stringify(v)).join(", ") + " }";
+    fs.writeFileSync(file, "[\n" + tools.map(fmt).join(",\n") + "\n]\n");
+  }
+'
 ```

-Add the `ARG CODEX_VERSION=<pinned>` line to the version-args block and the `RUN pnpm install -g "@openai/codex@${CODEX_VERSION}"` line to the global-install block (its own layer).
+The version (`0.138.0`) is the canonical pin — keep it in sync with `setup/add-codex.sh`. The Dockerfile already installs every manifest entry via pinned `pnpm install -g`; no Dockerfile edit is needed.

 ### Build

@@ -80,6 +89,22 @@ pnpm exec tsc -p container/agent-runner/tsconfig.json --noEmit
 ./container/build.sh
 ```

+### Restart the host
+
+The image rebuild does not reload the **host**. Codex's host contribution
+(`src/providers/codex.ts`) registers the `/home/node/.codex` bind mount + env
+passthrough, and the running host only picks it up on restart. Skip this and the
+first Codex turn fails with `EACCES` writing `/home/node/.codex/config.toml` —
+with no mount, Docker auto-creates the dir root-owned and the non-root container
+user can't write to it.
+
+```bash
+# macOS (launchd)
+launchctl kickstart -k gui/$(id -u)/com.nanoclaw
+# Linux (systemd)
+systemctl --user restart nanoclaw
+```
+
 ### Validate

 ```bash
@@ -91,6 +116,8 @@ The registration tests import only the real barrels — they go red if a barrel

 ## Authenticate

+> **Run this in a separate, real terminal — it is interactive.** It prompts for ChatGPT-subscription vs OpenAI-API-key and then drives a browser/device login, so it needs a TTY to answer prompts.
+
 ```bash
 pnpm exec tsx setup/index.ts --step provider-auth codex
 ```
@@ -113,5 +140,5 @@ There is no install-wide default provider. Setup's provider picker sets codex on
 ## Troubleshooting

 - **Container dies at boot, channel silent:** `grep 'Container exited non-zero' logs/nanoclaw.error.log` — the `stderrTail` carries the reason (e.g. `Unknown provider: codex. Registered: claude` means the barrels aren't wired in the running build).
-- **In-channel `Error: spawn codex ENOENT` on every message:** the image predates the Dockerfile edit — re-run `./container/build.sh`.
+- **In-channel `Error: spawn codex ENOENT` on every message:** the image predates the manifest entry — re-run `./container/build.sh`.
 - **Auth errors mid-conversation:** the vault secret is missing or stale — re-run `pnpm exec tsx setup/index.ts --step provider-auth codex` (subscription re-login updates the vault copy).
@@ -0,0 +1,39 @@
+// Structural guard for the Codex CLI install in container/cli-tools.json.
+//
+// @openai/codex is a CLI *binary* installed from the global-CLI manifest (a
+// json-merge seam), not an importable package, so the barrel-driven
+// registration tests cannot see it. This test reads the real cli-tools.json
+// and asserts the @openai/codex entry is present and pinned to an exact
+// version. It goes red if the manifest entry is dropped or unpins.
+//
+// Runs under bun (same suite as the container registration test):
+//   cd container/agent-runner && bun test src/providers/codex-cli-tools.test.ts
+
+import { existsSync, readFileSync } from 'fs';
+import path from 'path';
+
+import { describe, it, expect } from 'bun:test';
+
+// container/agent-runner/src/providers/ -> container/cli-tools.json
+const MANIFEST = path.join(import.meta.dir, '..', '..', '..', 'cli-tools.json');
+const manifestPresent = existsSync(MANIFEST);
+
+// Read lazily — `describe.skipIf` still runs the body to register tests, so the
+// read has to be guarded for the bare-branch (no manifest) case.
+const tools: Array<{ name: string; version: string }> = manifestPresent
+  ? JSON.parse(readFileSync(MANIFEST, 'utf8'))
+  : [];
+const codex = tools.find((t) => t.name === '@openai/codex');
+
+// cli-tools.json is a trunk file; on the bare providers branch it isn't present,
+// so skip there. In an installed tree (trunk + this payload) it must carry the
+// pinned @openai/codex entry.
+describe.skipIf(!manifestPresent)('container/cli-tools.json codex CLI install', () => {
+  it('includes the @openai/codex entry', () => {
+    expect(codex).toBeDefined();
+  });
+
+  it('pins it to an exact semver (no latest, no ranges)', () => {
+    expect(codex?.version).toMatch(/^\d+\.\d+\.\d+(?:[-+][0-9A-Za-z.-]+)?$/);
+  });
+});
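A minimal sketch of what the pin assertion above accepts and rejects, assuming the same regular expression as the test. The helper name `isExactPin` and the sample version strings are hypothetical and exist only for illustration; they are not part of the payload.

```typescript
// Same pattern as the guard test: MAJOR.MINOR.PATCH with an optional
// prerelease or build suffix, and nothing else (no ranges, no tags).
const EXACT_SEMVER = /^\d+\.\d+\.\d+(?:[-+][0-9A-Za-z.-]+)?$/;

// Hypothetical helper: true only for an exact version string.
function isExactPin(version: string | undefined): boolean {
  return version !== undefined && EXACT_SEMVER.test(version);
}

// Illustrative inputs only; '0.138.0' is the pin used in this change.
const samples: Array<string | undefined> = [
  '0.138.0',
  '1.2.3-beta.1',
  'latest',
  '^0.138.0',
  '0.138',
  undefined,
];

for (const v of samples) {
  console.log(`${String(v)} -> ${isExactPin(v) ? 'pinned' : 'rejected'}`);
}
```

A missing entry (`undefined`) is rejected here as well, which corresponds to the case where the manifest entry was dropped.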
@@ -1,30 +0,0 @@
-// Structural guard for the Codex CLI install in container/Dockerfile.
-//
-// @openai/codex is a CLI *binary* installed via the Dockerfile, not an
-// importable package, so the barrel-driven registration tests cannot see it.
-// This test reads the real Dockerfile and asserts the version ARG and the
-// `pnpm install -g` line for @openai/codex are both present. It goes red if
-// either Dockerfile edit is dropped or drifts.
-//
-// Runs under bun (same suite as the container registration test):
-//   cd container/agent-runner && bun test src/providers/codex-dockerfile.test.ts
-
-import { readFileSync } from 'fs';
-import path from 'path';
-
-import { describe, it, expect } from 'bun:test';
-
-// container/agent-runner/src/providers/ -> container/Dockerfile
-const DOCKERFILE = path.join(import.meta.dir, '..', '..', '..', 'Dockerfile');
-
-describe('container/Dockerfile codex CLI install', () => {
-  const dockerfile = readFileSync(DOCKERFILE, 'utf8');
-
-  it('declares the CODEX_VERSION ARG', () => {
-    expect(dockerfile).toMatch(/ARG\s+CODEX_VERSION=/);
-  });
-
-  it('installs the @openai/codex CLI pinned to that ARG', () => {
-    expect(dockerfile).toMatch(/pnpm install -g\s+"@openai\/codex@\$\{CODEX_VERSION\}"/);
-  });
-});
@@ -121,6 +121,7 @@ Bucket the upstream changed files:
 - **Host source** (`src/`): may conflict if user modified the same files
 - **Container** (`container/`): triggers container rebuild (+ typecheck if `agent-runner/src/` changed)
 - **Build/config** (`package.json`, `pnpm-lock.yaml`, `tsconfig*.json`): lockfile changes trigger dep install
+- **Version pins** (`versions.json`): a changed `onecli-gateway` / `onecli-cli` value requires upgrading the OneCLI gateway/CLI to match — see Step 5.5
 - **Other**: docs, tests, setup scripts, misc

 **Large drift check:** If the upstream commit count and age suggest the user has a lot of catching up to do, mention that `/migrate-nanoclaw` might be a better fit — it extracts customizations and reapplies them on clean upstream instead of merging. Offer it as an option but don't push.
@@ -215,6 +216,11 @@ If build fails:
 - Do not refactor unrelated code.
 - If unclear, ask the user before making changes.

+# Step 5.5: OneCLI upgrade (if pins moved)
+The OneCLI gateway and CLI are external components pinned in `versions.json`; when a pin moves, the running version must be upgraded to match or the new code may fail against it.
+
+If `git diff <backup-tag-from-step-1>..HEAD -- versions.json` shows the `onecli-gateway` or `onecli-cli` value changed, follow `docs/onecli-upgrades.md` before the service restart (Step 8). Otherwise skip.
+
 # Step 6: Breaking changes check
 After validation succeeds, check if the update introduced any breaking changes.

@@ -4,7 +4,8 @@ All notable changes to NanoClaw will be documented in this file.

 ## [Unreleased]

-- [BREAKING] **`@onecli-sh/sdk` 0.5.0 -> 2.2.1 — requires a OneCLI server with the `/v1` API** (older servers 404 every SDK call). The sanctioned gateway and CLI versions are pinned in `versions.json`; the `onecli` setup step enforces them. **Migration:** [docs/onecli-upgrades.md](docs/onecli-upgrades.md).
+- **Budget/billing-exhausted LLM turns now reach the user instead of being silently dropped.** When a turn ends in a non-retryable provider error (e.g. an Anthropic `403 billing_error`) with no `<message>` wrapping, the agent-runner delivers the provider's notice to the originating channel and stops re-nudging the failing gateway. `providers/claude.ts` now surfaces the SDK's `is_error` flag (and the error subtype's `errors[]` text); `poll-loop.ts` delivers that text and skips the re-wrap retry. Fixes the case where a spend-limit notice produced silence plus a turn-after-turn retry loop.
+- [BREAKING] **`@onecli-sh/sdk` 0.5.0 -> 2.2.1 — requires a OneCLI server with the `/v1` API** (older servers 404 every SDK call). The sanctioned gateway and CLI versions are pinned in `versions.json`. **The gateway is a separate component — updating NanoClaw does not upgrade it for you:** `/update-nanoclaw` upgrades it when the pin moves, otherwise upgrade manually. **Migration:** [docs/onecli-upgrades.md](docs/onecli-upgrades.md).
 - **New agent provider: Codex (OpenAI) — run `/add-codex`.** Full runtime via `codex app-server` (planning, MCP tools, server-side history, resume). Trunk ships the seams and the skill; the payload installs from the `providers` branch (the skill, the setup picker, or `--step provider-auth codex`). Auth is vault-only — no credential ever enters a container.
 - **Setup can now select, install, and authenticate a non-default agent provider.** A provider registry feeds the setup picker, an installer pulls the provider's payload from its branch, a vault auth walkthrough runs (`--step provider-auth`), and the picked provider is set on the first agent (a DB property) before its first spawn. Default (Claude) installs are unaffected — picking Claude changes nothing.
 - **Provider choice is explicit per group — no install-wide default.** Provider is a DB property set via `ncl groups config update --provider` + restart; creation is provider-agnostic.
@@ -69,8 +69,8 @@ For ad-hoc queries from skills or scripts, use the in-tree wrapper rather than t
 | `src/modules/permissions/access.ts` | `canAccessAgentGroup` — owner / global admin / scoped admin / member resolution against `user_roles` + `agent_group_members` |
 | `src/modules/approvals/primitive.ts` | `pickApprover`, `pickApprovalDelivery`, `requestApproval`, approval-handler registry |
 | `src/command-gate.ts` | Router-side admin command gate — queries `user_roles` directly (no env var, no container-side check) |
-| `src/onecli-approvals.ts` | OneCLI credentialed-action approval bridge |
-| `src/user-dm.ts` | Cold-DM resolution + `user_dms` cache |
+| `src/modules/approvals/onecli-approvals.ts` | OneCLI credentialed-action approval bridge |
+| `src/modules/permissions/user-dm.ts` | Cold-DM resolution + `user_dms` cache |
 | `src/group-init.ts` | Per-agent-group filesystem scaffold (CLAUDE.md, skills, agent-runner-src overlay) |
 | `src/db/container-configs.ts` | CRUD for `container_configs` table (per-group container runtime config) |
 | `src/backfill-container-configs.ts` | Migrates legacy `container.json` files into the DB on startup |
@@ -152,7 +152,7 @@ Key files: `src/container-restart.ts`, `src/container-runner.ts` (`killContainer

 ## Secrets / Credentials / OneCLI

-API keys, OAuth tokens, and auth credentials are managed by the OneCLI gateway. Secrets are injected into per-agent containers at request time — none are passed in env vars or through chat context. The container agent sees this via the `onecli-gateway` container skill (`container/skills/onecli-gateway/SKILL.md`), which teaches it how the proxy works, how to handle auth errors, and to never ask for raw credentials. Host-side wiring: `src/onecli-approvals.ts`, `ensureAgent()` in `container-runner.ts`. Run `onecli --help`.
+API keys, OAuth tokens, and auth credentials are managed by the OneCLI gateway. Secrets are injected into per-agent containers at request time — none are passed in env vars or through chat context. The container agent sees this via the `onecli-gateway` container skill (`container/skills/onecli-gateway/SKILL.md`), which teaches it how the proxy works, how to handle auth errors, and to never ask for raw credentials. Host-side wiring: `src/modules/approvals/onecli-approvals.ts`, `ensureAgent()` in `container-runner.ts`. Run `onecli --help`.

 ### Secret modes

@@ -4,8 +4,9 @@ import { initTestSessionDb, closeSessionDb, getInboundDb, getOutboundDb } from '
 import { getPendingMessages, markCompleted } from './db/messages-in.js';
 import { getUndeliveredMessages } from './db/messages-out.js';
 import { formatMessages, extractRouting } from './formatter.js';
-import { isCorruptionError } from './poll-loop.js';
+import { isCorruptionError, processQuery } from './poll-loop.js';
 import { MockProvider } from './providers/mock.js';
+import type { AgentQuery, ProviderEvent } from './providers/types.js';

 beforeEach(() => {
  initTestSessionDb();
@@ -379,6 +380,64 @@ describe('end-to-end with mock provider', () => {
  });
 });

+/**
+ * Build a one-shot stub query that yields init + a single result event, then
+ * ends. `pushes` records any follow-ups the loop tried to inject (e.g. the
+ * re-wrap nudge), so a test can assert the loop did NOT re-hammer.
+ */
+function makeResultQuery(result: ProviderEvent): { query: AgentQuery; pushes: string[] } {
+  const pushes: string[] = [];
+  async function* events(): AsyncGenerator<ProviderEvent> {
+    yield { type: 'init', continuation: 'sess-1' };
+    yield result;
+  }
+  return {
+    pushes,
+    query: {
+      push: (m: string) => {
+        pushes.push(m);
+      },
+      end: () => {},
+      events: events(),
+      abort: () => {},
+    },
+  };
+}
+
+const ERR_ROUTING = {
+  platformId: 'chan-1',
+  channelType: 'discord',
+  threadId: null,
+  inReplyTo: 'm1',
+};
+
+describe('error result with no <message> envelope', () => {
+  it('delivers a budget/billing error to the triggering channel and does not nudge', async () => {
+    const budgetText = 'Spending limit reached. Add your own key at https://example.com/keys';
+    const { query, pushes } = makeResultQuery({ type: 'result', text: budgetText, isError: true });
+
+    await processQuery(query, ERR_ROUTING, ['m1'], 'claude', undefined, 'prompt', undefined);
+
+    const out = getUndeliveredMessages();
+    expect(out).toHaveLength(1);
+    expect(JSON.parse(out[0].content).text).toBe(budgetText);
+    expect(out[0].platform_id).toBe('chan-1');
+    expect(out[0].channel_type).toBe('discord');
+    // No re-wrap nudge — an error result must not re-hammer the gateway.
+    expect(pushes).toHaveLength(0);
+  });
+
+  it('still nudges (and does not deliver) a normal unwrapped result', async () => {
+    const { query, pushes } = makeResultQuery({ type: 'result', text: 'bare text, no envelope' });
+
+    await processQuery(query, ERR_ROUTING, ['m1'], 'claude', undefined, 'prompt', undefined);
+
+    expect(getUndeliveredMessages()).toHaveLength(0);
+    expect(pushes).toHaveLength(1);
+    expect(pushes[0]).toContain('was not delivered');
+  });
+});
+
 describe('isCorruptionError', () => {
  it('matches the Docker Desktop macOS torn-read symptom', () => {
    expect(isCorruptionError('database disk image is malformed')).toBe(true);
@@ -323,7 +323,7 @@ interface QueryResult {
  continuation?: string;
 }

-async function processQuery(
+export async function processQuery(
  query: AgentQuery,
  routing: RoutingContext,
  initialBatchIds: string[],
@@ -482,28 +482,43 @@ async function processQuery(
        // at all — either way the turn is finished.
        markCompleted(initialBatchIds);
        if (event.text) {
-          const { hasUnwrapped } = dispatchResultText(event.text, routing);
-          const willRetryWrapping = hasUnwrapped && !unwrappedNudged;
-          notifyExchangeComplete(onExchangeComplete, {
-            prompt: archivePrompts[0] ?? initialPrompt,
-            result: event.text,
-            continuation: queryContinuation ?? initialContinuation,
-            status: hasUnwrapped ? 'undelivered' : 'completed',
-          });
-          if (willRetryWrapping) {
-            unwrappedNudged = true;
-            const destinations = getAllDestinations();
-            const names = destinations.map((d) => d.name).join(', ');
-            query.push(
-              `<system>Your response was not delivered — it was not wrapped in <message to="name">...</message> blocks. ` +
-                `All output must be wrapped: use <message to="name"> for content to send, or <internal> for scratchpad. ` +
-                `Your destinations: ${names}. ` +
-                `Please re-send your response with the correct wrapping.</system>`,
-            );
+          const { sent, hasUnwrapped } = dispatchResultText(event.text, routing);
+          if (sent === 0 && event.isError === true) {
+            // Non-retryable error turn (e.g. a 403 billing_error) with no
+            // <message> envelope: deliver the notice instead of dropping it as
+            // scratchpad, and skip the re-wrap nudge — it would just re-hammer
+            // the failing gateway turn after turn.
+            deliverErrorResult(event.text, routing);
+            notifyExchangeComplete(onExchangeComplete, {
+              prompt: archivePrompts[0] ?? initialPrompt,
+              result: event.text,
+              continuation: queryContinuation ?? initialContinuation,
+              status: 'error',
+            });
+            archivePrompts.shift();
+          } else {
+            const willRetryWrapping = hasUnwrapped && !unwrappedNudged;
+            notifyExchangeComplete(onExchangeComplete, {
+              prompt: archivePrompts[0] ?? initialPrompt,
+              result: event.text,
+              continuation: queryContinuation ?? initialContinuation,
+              status: hasUnwrapped ? 'undelivered' : 'completed',
+            });
+            if (willRetryWrapping) {
+              unwrappedNudged = true;
+              const destinations = getAllDestinations();
+              const names = destinations.map((d) => d.name).join(', ');
+              query.push(
+                `<system>Your response was not delivered — it was not wrapped in <message to="name">...</message> blocks. ` +
+                  `All output must be wrapped: use <message to="name"> for content to send, or <internal> for scratchpad. ` +
+                  `Your destinations: ${names}. ` +
+                  `Please re-send your response with the correct wrapping.</system>`,
+              );
+            }
+            // The wrapping-retry result answers the SAME user prompt — keep it
+            // queued so the retry archives against it, not the nudge text.
+            if (!willRetryWrapping) archivePrompts.shift();
          }
-          // The wrapping-retry result answers the SAME user prompt — keep it
-          // queued so the retry archives against it, not the nudge text.
-          if (!willRetryWrapping) archivePrompts.shift();
        } else {
          archivePrompts.shift();
        }
@@ -557,6 +572,26 @@ function handleEvent(event: ProviderEvent, _routing: RoutingContext): void {
  }
 }

+/**
+ * Deliver a turn's text straight to the channel the batch arrived on. Used when
+ * a turn ends in a provider error (e.g. a non-retryable 403 billing_error) with
+ * no <message> envelope: the notice would otherwise be dropped as scratchpad.
+ * This is the same user-facing write the outer catch block does, minus the
+ * `Error:` prefix — the provider's text is already a user-facing message.
+ */
+function deliverErrorResult(text: string, routing: RoutingContext): void {
+  log('Error result with no <message> envelope — delivering to channel');
+  writeMessageOut({
+    id: generateId(),
+    in_reply_to: routing.inReplyTo,
+    kind: 'chat',
+    platform_id: routing.platformId,
+    channel_type: routing.channelType,
+    thread_id: routing.threadId,
+    content: JSON.stringify({ text }),
+  });
+}
+
 /**
 * Parse the agent's final text for <message to="name">...</message> blocks
 * and dispatch each one to its resolved destination. Text outside of blocks
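The branching that the `poll-loop.ts` hunks above add to the result handler can be summarized as a small pure function. This is a sketch under assumptions, not the real code: `decide`, `TurnOutcome`, and `ResultAction` are hypothetical names, and the field comments reflect a reading of the diff (for example, that `sent` counts messages dispatched from `<message>` blocks).

```typescript
// Hypothetical summary of the three outcomes for a result event with text.
type ResultAction = 'deliver-error' | 'nudge-rewrap' | 'archive-only';

interface TurnOutcome {
  sent: number; // assumed: messages dispatched from <message> blocks
  hasUnwrapped: boolean; // assumed: text was found outside any block
  isError: boolean; // the provider flagged the turn as an error
  alreadyNudged: boolean; // the re-wrap nudge was already sent once
}

function decide(o: TurnOutcome): ResultAction {
  // Error turn with nothing delivered: surface the text, never nudge.
  if (o.sent === 0 && o.isError) return 'deliver-error';
  // Ordinary unwrapped output: ask the agent to re-send, at most once.
  if (o.hasUnwrapped && !o.alreadyNudged) return 'nudge-rewrap';
  // Everything else just completes and archives the exchange.
  return 'archive-only';
}

console.log(decide({ sent: 0, hasUnwrapped: true, isError: true, alreadyNudged: false }));
console.log(decide({ sent: 0, hasUnwrapped: true, isError: false, alreadyNudged: false }));
console.log(decide({ sent: 1, hasUnwrapped: false, isError: false, alreadyNudged: false }));
```

The ordering matters: checking the error case first is what prevents an error turn from also triggering the re-wrap nudge.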
@@ -440,8 +440,13 @@ export class ClaudeProvider implements AgentProvider {
        if (message.type === 'system' && message.subtype === 'init') {
          yield { type: 'init', continuation: message.session_id };
        } else if (message.type === 'result') {
-          const text = 'result' in message ? (message as { result?: string }).result ?? null : null;
-          yield { type: 'result', text };
+          // `result` text exists only on subtype:"success"; error subtypes
+          // (e.g. a non-retryable 403 billing_error) carry their message in
+          // `errors[]` instead. Surface either so the poll-loop can deliver a
+          // billing/quota notice to the user rather than dropping the turn.
+          const m = message as { result?: string; is_error?: boolean; errors?: string[] };
+          const text = m.result ?? (m.errors && m.errors.length > 0 ? m.errors.join('\n') : null);
+          yield { type: 'result', text, isError: m.is_error === true };
        } else if (message.type === 'system' && (message as { subtype?: string }).subtype === 'api_retry') {
          yield { type: 'error', message: 'API retry', retryable: true };
        } else if (message.type === 'system' && (message as { subtype?: string }).subtype === 'rate_limit_event') {
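The result-mapping change in `ClaudeProvider` above can be exercised in isolation. The sketch below assumes only the three fields the diff reads (`result`, `is_error`, `errors`); `SdkResultLike` and `toResultEvent` are hypothetical names, and the sample messages are made up for illustration rather than captured from the SDK.

```typescript
// Hypothetical shape: just the fields the diff reads off a result message.
interface SdkResultLike {
  result?: string;
  is_error?: boolean;
  errors?: string[];
}

interface ResultEvent {
  type: 'result';
  text: string | null;
  isError: boolean;
}

// Prefer `result`; fall back to the joined `errors[]`; otherwise null.
function toResultEvent(m: SdkResultLike): ResultEvent {
  const text = m.result ?? (m.errors && m.errors.length > 0 ? m.errors.join('\n') : null);
  return { type: 'result', text, isError: m.is_error === true };
}

// Illustrative inputs only.
console.log(toResultEvent({ result: 'done' }));
console.log(toResultEvent({ is_error: true, errors: ['billing_error', 'limit reached'] }));
console.log(toResultEvent({ is_error: true, errors: [] }));
```

Because `??` only falls through on `null` or `undefined`, an empty-string `result` is kept as-is and the `errors[]` fallback is not consulted.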
@@ -125,7 +125,13 @@ export interface AgentQuery {

 export type ProviderEvent =
  | { type: 'init'; continuation: string }
-  | { type: 'result'; text: string | null }
+  /**
+   * A completed turn. `isError` is set when the underlying SDK flagged the
+   * turn as an error (e.g. a non-retryable Anthropic 403 billing_error). The
+   * poll-loop uses it to surface the result text to the user instead of
+   * dropping it as un-wrapped scratchpad, and to skip the re-wrap nudge.
+   */
+  | { type: 'result'; text: string | null; isError?: boolean }
  | { type: 'error'; message: string; retryable: boolean; classification?: string }
  | { type: 'progress'; message: string }
  /**
@@ -1,6 +1,6 @@
 {
  "name": "nanoclaw",
-  "version": "2.1.15",
+  "version": "2.1.17",
  "description": "Personal Claude assistant. Lightweight, secure, customizable.",
  "type": "module",
  "packageManager": "pnpm@10.33.0",
@@ -1,5 +1,5 @@
-<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="90" height="20" role="img" aria-label="194k tokens, 97% of context window">
-  <title>194k tokens, 97% of context window</title>
+<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="90" height="20" role="img" aria-label="196k tokens, 98% of context window">
+  <title>196k tokens, 98% of context window</title>
  <linearGradient id="s" x2="0" y2="100%">
    <stop offset="0" stop-color="#bbb" stop-opacity=".1"/>
    <stop offset="1" stop-opacity=".1"/>
@@ -15,8 +15,8 @@
      <g fill="#fff" text-anchor="middle" font-family="Verdana,Geneva,DejaVu Sans,sans-serif" font-size="11">
        <text aria-hidden="true" x="26" y="15" fill="#010101" fill-opacity=".3">tokens</text>
        <text x="26" y="14">tokens</text>
-        <text aria-hidden="true" x="71" y="15" fill="#010101" fill-opacity=".3">194k</text>
-        <text x="71" y="14">194k</text>
+        <text aria-hidden="true" x="71" y="15" fill="#010101" fill-opacity=".3">196k</text>
+        <text x="71" y="14">196k</text>
      </g>
    </g>
  </a>
@@ -1,9 +1,9 @@
 #!/usr/bin/env bash
 #
 # Install the Codex agent provider non-interactively: copy the payload from the
-# `providers` branch, wire the three provider barrels, and pin the Codex CLI in
-# the Dockerfile. The image rebuild is the caller's job (the setup container
-# step / `./container/build.sh`).
+# `providers` branch, wire the three provider barrels, and add the Codex CLI to
+# the container manifest (container/cli-tools.json). The image rebuild is the
+# caller's job (the setup container step / `./container/build.sh`).
 #
 # Emits exactly one status block on stdout (ADD_CODEX); all chatty progress
 # goes to stderr. Keep in sync with .claude/skills/add-codex/SKILL.md.
@@ -12,7 +12,8 @@ set -euo pipefail
 PROJECT_ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
 cd "$PROJECT_ROOT"

-# Keep in sync with the providers-branch Dockerfile and add-codex SKILL.md.
+# Keep in sync with add-codex SKILL.md. This is the canonical Codex CLI pin —
+# it lands in container/cli-tools.json (the global-CLI manifest), not the Dockerfile.
 CODEX_VERSION="0.138.0"

 # Resolve the remote carrying the providers branch (same nanoclaw remote that
@@ -38,7 +39,7 @@ PAYLOAD_FILES=(
  container/agent-runner/src/providers/codex.factory.test.ts
  container/agent-runner/src/providers/codex.turns.test.ts
  container/agent-runner/src/providers/codex-app-server.test.ts
-  container/agent-runner/src/providers/codex-dockerfile.test.ts
+  container/agent-runner/src/providers/codex-cli-tools.test.ts
  setup/providers/codex.ts
  setup/providers/codex.test.ts
  setup/providers/codex-registration.test.ts
@@ -63,11 +64,11 @@ emit_status() {
 log() { echo "[add-codex] $*" >&2; }

 # Idempotent: a complete install has the host provider file, the host barrel
-# import, and the Dockerfile pin. Any missing → (re)install.
+# import, and the Codex CLI in the container manifest. Any missing → (re)install.
 need_install() {
  [ ! -f src/providers/codex.ts ] && return 0
  ! grep -q "^import './codex.js';" src/providers/index.ts 2>/dev/null && return 0
-  ! grep -q '@openai/codex@' container/Dockerfile 2>/dev/null && return 0
+  ! grep -q '@openai/codex' container/cli-tools.json 2>/dev/null && return 0
  return 1
 }

@@ -94,22 +95,27 @@ if need_install; then
    grep -q "^import './codex.js';" "$b" || printf "import './codex.js';\n" >> "$b"
  done

-  log "Pinning Codex CLI in the Dockerfile…"
-  DF=container/Dockerfile
-  if ! grep -q "^ARG CODEX_VERSION=" "$DF"; then
-    # Version ARG ahead of the first ARG in the version-args block.
-    awk -v ins="ARG CODEX_VERSION=${CODEX_VERSION}" \
-      'add!=1 && /^ARG /{print ins; add=1} {print}' "$DF" > "$DF.tmp" && mv "$DF.tmp" "$DF"
-  fi
-  if ! grep -q '@openai/codex@' "$DF"; then
-    # Install RUN block (its own cache layer) before the ncl CLI wrapper anchor.
-    awk 'add!=1 && /# ---- ncl CLI wrapper/ {
-           print "RUN --mount=type=cache,target=/root/.cache/pnpm \\"
-           print "    pnpm install -g \"@openai/codex@${CODEX_VERSION}\""
-           print ""
-           add=1
-         } {print}' "$DF" > "$DF.tmp" && mv "$DF.tmp" "$DF"
-  fi
+  log "Adding the Codex CLI to the container manifest (cli-tools.json)…"
+  # A json-merge: append { name, version } if absent. The Dockerfile installs
+  # every manifest entry via pinned `pnpm install -g` — no Dockerfile edit, no
+  # awk surgery. @openai/codex has no native postinstall, so no "onlyBuilt".
+  MANIFEST=container/cli-tools.json
+  node -e '
+    const fs = require("fs");
+    const [file, name, version] = process.argv.slice(1);
+    const tools = JSON.parse(fs.readFileSync(file, "utf8"));
+    if (!tools.some((t) => t.name === name)) {
+      tools.push({ name, version });
+      const fmt = (t) =>
+        "  { " +
+        Object.entries(t).map(([k, v]) => JSON.stringify(k) + ": " + JSON.stringify(v)).join(", ") +
+        " }";
+      fs.writeFileSync(file, "[\n" + tools.map(fmt).join(",\n") + "\n]\n");
+    }
+  ' "$MANIFEST" "@openai/codex" "${CODEX_VERSION}" || {
+    emit_status failed "failed to add @openai/codex to ${MANIFEST}"
+    exit 1
+  }
 fi

 emit_status ok
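
The json-merge added in the hunk above can be read as two small pieces: an append that is skipped when the entry already exists, and a serializer that writes one entry per line. The following is a minimal sketch of that logic, assuming `container/cli-tools.json` holds a JSON array of `{ name, version }` objects, as the script implies. The helper names `mergeTool` and `formatManifest`, the `example-cli` entry, and the `0.0.0` version are hypothetical placeholders, not names or values from the repository.

```javascript
// Hypothetical helpers illustrating the idempotent manifest append.
// Assumption: the manifest is a JSON array of { name, version } objects.

// Return the same array if `name` is already present; otherwise return a
// new array with { name, version } appended.
function mergeTool(tools, name, version) {
  if (tools.some((t) => t.name === name)) return tools;
  return [...tools, { name, version }];
}

// Serialize one entry per line, mirroring the layout the script writes.
function formatManifest(tools) {
  const fmt = (t) =>
    "  { " +
    Object.entries(t)
      .map(([k, v]) => JSON.stringify(k) + ": " + JSON.stringify(v))
      .join(", ") +
    " }";
  return "[\n" + tools.map(fmt).join(",\n") + "\n]\n";
}

// Placeholder data: the entry name and versions below are illustrative only.
const before = [{ name: "example-cli", version: "1.0.0" }];
const once = mergeTool(before, "@openai/codex", "0.0.0");
const twice = mergeTool(once, "@openai/codex", "0.0.0"); // no-op on re-run
console.log(formatManifest(twice));
```

Unlike the script, which pushes onto the parsed array and rewrites the file only when something changed, this sketch returns a new array so the no-op case is easy to check: a second call with the same name returns its input unchanged.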
Author	SHA1	Message	Date
github-actions[bot]	ee7f891698	docs: update token count to 196k tokens · 98% of context window	2026-06-16 11:15:10 +00:00
github-actions[bot]	7fde348e2b	chore: bump version to 2.1.17	2026-06-16 11:15:04 +00:00
Gabi Simons	122135e6dc	Merge pull request #2759 from assapin/fix/budget-error-surfaced-to-user fix(agent-runner): deliver budget/billing error turns instead of dropping them	2026-06-16 14:14:48 +03:00
Gabi Simons	8563fb0681	Merge remote-tracking branch 'origin/main' into fix/budget-error-surfaced-to-user # Conflicts: # CHANGELOG.md	2026-06-16 11:35:45 +03:00
omri-maya	0155ab1943	Merge pull request #2775 from nanocoai/docs/onecli-gateway-upgrade-notice docs(changelog): clarify the OneCLI gateway is a separate, operator-driven upgrade	2026-06-16 09:55:25 +03:00
Koshkoshinsk	d1f94fcd24	docs(changelog): clarify the OneCLI gateway is a separate, operator-driven upgrade The breaking notice said the onecli setup step enforces the pinned versions, which is only true for fresh installs — on an existing install, updating does not upgrade the running gateway. Clarify that the gateway is separate: /update-nanoclaw upgrades it when the pin moves, otherwise upgrade manually per docs/onecli-upgrades.md.	2026-06-15 20:25:42 +03:00
gavrielc	dd60983f7f	Merge pull request #2774 from nanocoai/feat/update-nanoclaw-onecli-pin feat(update-nanoclaw): upgrade OneCLI gateway when its pinned version moves	2026-06-15 20:09:01 +03:00
Koshkoshinsk	096b8bf589	feat(update-nanoclaw): upgrade OneCLI gateway when its pinned version moves When an update moves the onecli-gateway/onecli-cli pin in versions.json, the running gateway must be upgraded to match — otherwise the new code's @onecli-sh/sdk calls fail (404 on /v1/agents) and agents can't spawn. update-nanoclaw never detected this, so the upgrade was silently skipped. Add a conditional step that follows docs/onecli-upgrades.md before restart when the pin moves.	2026-06-15 19:37:23 +03:00
Gabi Simons	59c4d33adc	Merge branch 'main' into fix/budget-error-surfaced-to-user	2026-06-15 17:42:01 +03:00
omri-maya	5f5c28d18d	Merge pull request #2773 from nanocoai/docs/codex-fix-docs docs(add-codex): drop redundant TTY warning in auth note	2026-06-15 16:04:28 +03:00
Koshkoshinsk	b92d1f9343	docs(add-codex): drop redundant TTY warning in auth note The 'don't run via `!` prefix or Bash tool' sentence was redundant with the leading 'Run this in a separate, real terminal — it is interactive.' Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-15 15:32:04 +03:00
Gabi Simons	e03c5c194a	Merge branch 'main' into fix/budget-error-surfaced-to-user	2026-06-15 12:17:20 +03:00
Daniel M	acbb1144b7	Merge pull request #2769 from nanocoai/docs/codex-interactive-host-restart docs(add-codex): flag interactive auth step + add host-restart step	2026-06-15 02:24:06 +03:00
Koshkoshinsk	028897f38f	docs(add-codex): flag interactive auth step + add host-restart step - Authenticate: run in a separate real terminal, not Claude Code's `!` prefix or an agent Bash tool — the provider-auth picker + browser/device login need an interactive TTY, so those prompts stall otherwise (CDX-002). - add a "Restart the host" step after the image rebuild so the host reloads Codex's /home/node/.codex mount + env; skipping it left the dir root-owned and the container hit EACCES writing config.toml (CDX-003). Refs CDX-002, CDX-003. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-15 01:58:30 +03:00
gavrielc	ac0a799cbf	refactor(add-codex): install Codex CLI via cli-tools.json, not the Dockerfile `adfae67` moved the agent's global Node CLIs into container/cli-tools.json so a skill adds one with a json-merge instead of editing the Dockerfile. The Codex provider install was left behind — add-codex.sh still awk'd an ARG + RUN into the Dockerfile and its test guarded that shape. Migrate add-codex to the seam: - add-codex.sh appends { name: "@openai/codex", version } to cli-tools.json (idempotent json-merge); install/idempotency gates read the manifest. - SKILL.md / REMOVE.md document the manifest append/removal, not Dockerfile edits. - codex-dockerfile.test.ts -> codex-cli-tools.test.ts, asserting the manifest entry (skips when the manifest is absent, e.g. the bare providers branch). Pairs with the providers-branch commit that drops the codex Dockerfile lines, renames the payload test, and points the setup install-check at the manifest. Verified end-to-end: full add-codex install into a clean worktree leaves the Dockerfile codex-free, the manifest correctly appended and idempotent; vitest cli-tools.test.ts (6) and bun codex-cli-tools.test.ts (2) green; host tsc clean. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-14 21:40:44 +03:00
github-actions[bot]	e3986eb58c	chore: bump version to 2.1.16	2026-06-14 18:29:28 +00:00
github-actions[bot]	6d0d48d585	docs: update token count to 195k tokens · 98% of context window	2026-06-14 18:29:25 +00:00
gavrielc	a142c496f7	Merge pull request #2756 from nanocoai/provider-selection feat(providers): operator-driven provider selection, switching, and memory migration	2026-06-14 21:29:12 +03:00
Daniel M	ed8b4149e7	Merge pull request #2764 from glifocat/docs/fix-claude-md-relocated-paths docs(CLAUDE.md): fix two relocated Key Files paths	2026-06-14 18:13:31 +03:00
glifocat	d5ce02d1b8	docs(CLAUDE.md): fix two relocated Key Files paths The Key Files table and the Secrets/OneCLI section referenced src/onecli-approvals.ts and src/user-dm.ts, but both files were moved under src/modules/ (src/modules/approvals/onecli-approvals.ts and src/modules/permissions/user-dm.ts). onecli-approvals.ts is already cited at its correct new path elsewhere in the same doc, so this was a partial-rename miss. Docs only — no code changes. Closes #2763 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-14 17:01:40 +02:00
assafpin	01433bae32	fix(agent-runner): deliver budget/billing error turns instead of dropping them A turn that ends in a non-retryable provider error (e.g. an Anthropic 403 billing_error) comes back from the streaming SDK as a result with is_error=true and no <message> envelope. dispatchResultText treated it as scratchpad and dropped it, then the poll-loop pushed a re-wrap nudge -> new turn -> same error, re-hammering the gateway until idle-kill. The user saw silence. - providers/claude.ts: surface is_error on the result event, and fall back to errors[] for the message text (error subtypes carry no result). - poll-loop.ts: when a result has no <message> blocks and is_error, deliver the notice verbatim to the originating channel and skip the nudge. Verified live (real agent image + SDK, 403 mock): the notice is delivered to the channel and the retry loop is gone. Refs #2751	2026-06-14 12:56:02 +03:00