Files
nanoclaw/container/Dockerfile
T
Chip Tonkin 1e7cb8b8c8 feat(providers): add codex provider via app-server JSON-RPC
Adds `codex` as a third AgentProvider alongside `claude` and `opencode`.

## Goal

Bring Codex up to the same feature bar as the Claude Agent SDK integration:
persistent sessions, streaming output, MCP tool access, and native
compaction — all driven through the AgentProvider interface without
polluting the poll-loop with per-SDK quirks.

## Why not the @openai/codex-sdk

The SDK-wrapping approach (sketched in agent-runner-details.md) was
tried first but fell short of Claude Agent SDK's feature set on
several fronts. The `codex app-server` JSON-RPC protocol closes most
of the gap:

| Capability                  | Claude SDK | Codex SDK | Codex app-server |
|-----------------------------|:----------:|:---------:|:----------------:|
| Session resume              |     ✓      |     ✓     |        ✓         |
| Streaming event output      |     ✓      |     ✓     |        ✓         |
| Mid-turn input push         |     ✓      |     ✗     | queue + drain*   |
| MCP servers                 |     ✓      |  config   |     config       |
| Approval hooks              |    hooks   |  limited  |  server reqs     |
| Native compaction trigger   |    auto    |     ✗     |        ✓         |
| Multi-turn subagents        |    Task    |     ✗     |   spawnAgent     |
| Config overrides            |     API    |     JS    |   -c flags       |

*Codex turns don't accept mid-turn input. The provider queues push()
 and drains between turns — matches the opencode provider pattern and
 the poll-loop doesn't dispatch new messages mid-turn anyway.

## Known shortcomings (both tractable in follow-ups)

**No native file/web tools.** Claude ships Read/Write/Edit/Glob/Grep/
WebFetch/WebSearch in-SDK. Codex (both SDK and app-server) leaves the
agent to shell out via bash — slower, messier output. Mitigation: a
follow-up can expose these as MCP tools on the container side, reusing
the existing `send_message` / `send_file` tool plumbing.

**Higher tool-call latency.** Every tool call round-trips through
JSON-RPC over stdio (~10-50ms per hop vs Claude's in-process dispatch).
Mitigation: batch MCP calls, pipeline commands — same playbook as
opencode.

Neither blocks feature parity for the common conversational + MCP
cases this PR targets.

## Files

- container/agent-runner/src/providers/codex-app-server.ts
    JSON-RPC transport: spawn, request/response, notification dispatch,
    auto-approval, thread lifecycle, MCP config.toml writer. Kept
    separate so it can be unit-tested without the full provider.
- container/agent-runner/src/providers/codex.ts
    CodexProvider implementing AgentProvider. Emits `activity` on every
    notification so idle timer stays honest. isSessionInvalid matches
    on stale-thread-ID errors for clean recovery.
- container/agent-runner/src/providers/codex.factory.test.ts
    Factory + isSessionInvalid + supportsNativeSlashCommands coverage.
- src/providers/codex.ts
    Host-side container contribution. Per-session ~/.codex copy with
    auth.json copied from host, so the container rewrites config.toml
    on every wake without touching the host's. Forwards OPENAI_API_KEY
    / CODEX_MODEL / OPENAI_BASE_URL.
- container/Dockerfile
    Pins CODEX_VERSION=0.121.0, adds @openai/codex to the pnpm global
    install block alongside claude-code/agent-browser/vercel.

## Validation

Smoke-tested end-to-end against real codex-cli 0.118 on macOS: observed
init → activity → progress → result events with a stable thread ID as
continuation, result text matched expected output. Unit tests: 26 pass
in agent-runner, 137 in host. Typecheck + prettier clean on all added
files.
2026-04-19 10:30:47 -04:00

116 lines
4.4 KiB
Docker

# syntax=docker/dockerfile:1.7
# NanoClaw Agent Container
# Runs Claude Agent SDK in isolated Linux VM with browser automation.
#
# Runtime split:
# - agent-runner (our TypeScript code): Bun
# - globally-installed Node CLIs (claude-code, agent-browser, vercel): pnpm + Node
FROM node:22-slim
# ---- Build-time arguments ----------------------------------------------------
# CJK fonts add ~200MB. Opt in only if you render Chinese/Japanese/Korean text.
ARG INSTALL_CJK_FONTS=false
# Pin CLI versions for reproducibility. Bump deliberately — unpinned installs
# mean every rebuild silently picks up the latest and can break in lockstep
# across all users.
ARG CLAUDE_CODE_VERSION=2.1.112
ARG CODEX_VERSION=0.121.0
ARG AGENT_BROWSER_VERSION=latest
ARG VERCEL_VERSION=latest
ARG BUN_VERSION=1.3.12
# ---- System dependencies -----------------------------------------------------
# tini: correct PID 1 / signal forwarding so outbound.db writes finalize on
# SIGTERM instead of being orphaned by the shell entrypoint.
RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
--mount=type=cache,target=/var/lib/apt,sharing=locked \
apt-get update && apt-get install -y --no-install-recommends \
chromium \
fonts-liberation \
fonts-noto-color-emoji \
libgbm1 \
libnss3 \
libatk-bridge2.0-0 \
libgtk-3-0 \
libx11-xcb1 \
libxcomposite1 \
libxdamage1 \
libxrandr2 \
libasound2 \
libpangocairo-1.0-0 \
libcups2 \
libdrm2 \
libxshmfence1 \
ca-certificates \
curl \
git \
tini \
unzip \
&& if [ "$INSTALL_CJK_FONTS" = "true" ]; then \
apt-get install -y --no-install-recommends fonts-noto-cjk; \
fi \
&& rm -rf /var/lib/apt/lists/*
# Chromium path for agent-browser / Playwright consumers
ENV AGENT_BROWSER_EXECUTABLE_PATH=/usr/bin/chromium
ENV PLAYWRIGHT_CHROMIUM_EXECUTABLE_PATH=/usr/bin/chromium
# Belt-and-braces: prevent Playwright's postinstall from downloading its own
# ~300MB Chromium. We've already installed the system one above.
ENV PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD=1
# ---- Bun runtime -------------------------------------------------------------
# Install via the official script (handles multi-arch detection), then move
# the binary to /usr/local/bin so the non-root `node` user can execute it.
RUN curl -fsSL https://bun.sh/install | bash -s "bun-v${BUN_VERSION}" && \
install -m 0755 /root/.bun/bin/bun /usr/local/bin/bun && \
rm -rf /root/.bun
# ---- pnpm + global Node CLIs -------------------------------------------------
ENV PNPM_HOME="/pnpm"
ENV PATH="$PNPM_HOME:$PATH"
RUN corepack enable
# agent-browser has a postinstall build script — pnpm skips these by default.
# Allowlist it via .npmrc so the install doesn't silently produce a broken
# package. Pinned versions so every rebuild is reproducible.
RUN --mount=type=cache,target=/root/.cache/pnpm \
echo "only-built-dependencies[]=agent-browser" > /root/.npmrc && \
pnpm install -g \
"@anthropic-ai/claude-code@${CLAUDE_CODE_VERSION}" \
"@openai/codex@${CODEX_VERSION}" \
"agent-browser@${AGENT_BROWSER_VERSION}" \
"vercel@${VERCEL_VERSION}"
# ---- agent-runner ------------------------------------------------------------
WORKDIR /app
# Copy manifest + lockfile first so the install layer caches independently of
# source edits.
COPY agent-runner/package.json agent-runner/bun.lock ./
RUN --mount=type=cache,target=/root/.bun/install/cache \
bun install --frozen-lockfile
# Source. Bun runs TS directly — no tsc build step. The host remounts this
# path at runtime via `src/container-runner.ts` so source edits on the host
# take effect without rebuilding the image; the baked copy is the fallback.
COPY agent-runner/ ./
# ---- Entrypoint --------------------------------------------------------------
COPY entrypoint.sh /app/entrypoint.sh
RUN chmod +x /app/entrypoint.sh
# ---- Workspace + permissions -------------------------------------------------
RUN mkdir -p /workspace/group /workspace/global /workspace/extra && \
chown -R node:node /workspace && \
chmod 755 /home/node
USER node
WORKDIR /workspace/group
# tini is PID 1, reaps zombies, forwards signals cleanly. entrypoint.sh does
# `exec bun ...` so bun runs as tini's direct child.
ENTRYPOINT ["/usr/bin/tini", "--", "/app/entrypoint.sh"]