Merge branch 'main' into container-limits

feat(container): per-container CPU/memory limits (opt-in)

Pass CONTAINER_CPU_LIMIT / CONTAINER_MEMORY_LIMIT through to `docker run` as --cpus / --memory in buildContainerArgs. Both default to empty, so spawn args are byte-identical to today unless an operator opts in; there is no risk of OOM-ing existing workloads.

Setting them caps an agent container's CPU and memory so one agent can't monopolize the host. Swap is a deployment concern and is not managed here: on a swapless host, --memory is a hard cap.

Structural tests assert each flag is pushed and guarded by its env knob, matching the existing buildContainerArgs structural-test convention.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 18:34:58 +08:00 · 2026-06-25 21:34:00 +03:00 · 2026-06-25 15:39:16 +03:00 · 2026-06-25 11:27:54 +03:00 · 2026-06-23 15:41:05 +03:00 · 2026-06-23 15:26:50 +03:00
10 changed files with 338 additions and 22 deletions
@@ -246,30 +246,40 @@ If one or more `[BREAKING]` lines are found:
 - For each skill the user selects, invoke it using the Skill tool.
 - After all selected skills complete (or if user chose Skip), proceed to Step 7 (skill updates check).

-# Step 7: Check for skill and channel/provider updates
+# Step 7: Skill updates (part of updating NanoClaw)

-## 7a: Skill branches
-Check if skills are distributed as branches in this repo:
- `git branch -r --list 'upstream/skill/*'`
+Updating your installed skills is **part of** updating NanoClaw, not an optional
+extra. Channel and provider code ships on long-lived branches (`channels`,
+`providers`) that the host merge above doesn't touch — so stopping here leaves
+that code on whatever version you installed, which is how an important upstream
+fix gets silently left behind. The default is to continue into `/update-skills`,
+which re-applies your installed channels/providers to pull their latest code.

-If any `upstream/skill/*` branches exist:
- Use AskUserQuestion to ask: "Upstream has skill branches. Would you like to check for skill updates?"
-  - Option 1: "Yes, check for updates" (description: "Runs /update-skills to check for and apply skill branch updates")
-  - Option 2: "No, skip" (description: "You can run /update-skills later any time")
- If user selects yes, invoke `/update-skills` using the Skill tool.
+Detect whether anything is installed: read `src/channels/index.ts` and
+`src/providers/index.ts`, collecting `import './<name>.js';` lines (excluding
+`cli`).

-## 7b: Channel and provider updates
-Detect installed channels by reading `src/channels/index.ts` and collecting all `import './<name>.js';` lines (excluding `cli`). For providers, check `src/providers/index.ts` the same way.
+- If nothing is installed: skip silently and proceed to Step 7.9.
+- If one or more are installed: continue into skill updates.

-If any channels/providers are installed AND `upstream/channels` or `upstream/providers` branches exist:
- List the installed channels/providers.
- Use AskUserQuestion to ask: "Would you like to update your installed channels/providers? Re-running `/add-<name>` is safe — it only updates code files, credentials and wiring are untouched."
-  - One option per installed channel/provider (e.g., "Update Slack (/add-slack)")
-  - "Skip — I'll update them later"
-  - Set `multiSelect: true`
- For each selected option, invoke the corresponding `/add-<channel>` or `/add-<provider>` skill.
+**Hand-off — default in, minimal opt-out.** Use AskUserQuestion (single-select).
+Name the installed skills in the question so the choice is concrete:
+- Question: "Skill updates are part of this NanoClaw update — your installed
+  channels/providers (<list the detected ones>) ride separate branches the host
+  update didn't touch. Continue into `/update-skills` to bring them up to date?"
+- Option 1 (Recommended): "Continue into skill updates" — description: "Runs
+  `/update-skills`, which re-applies your installed channels/providers to pull
+  their latest upstream code. You pick which ones there."
+- Option 2: "Skip — I'll run `/update-skills` myself later" — description: "Your
+  installed skill code stays as-is and may be behind upstream."

-If no channels/providers are installed, skip silently.
+Keep it to these two options — the per-skill selection lives inside
+`/update-skills`, not here.
+
+- On "Continue": invoke `/update-skills` using the Skill tool. (If the re-apply
+  touches container code, `/update-skills` rebuilds the agent image itself — see
+  its Step 4 — so nothing container-related is owed back here.)
+- On "Skip": note that `/update-skills` can be run anytime, then proceed.

 Proceed to Step 7.9.
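The detection rule in Step 7 (collect `import './<name>.js';` lines from the two barrels, excluding `cli`) can be sketched as a pure function. This is a hypothetical helper, not code from the repo. It assumes each registration is a single-quoted side-effect import on its own line, exactly as the step describes, and it takes the file contents as strings so it runs without touching disk.

```typescript
// Hypothetical helper (not from the repo): list installed channel/provider
// names from the text of src/channels/index.ts and src/providers/index.ts.
// Assumes each registration is a single-quoted side-effect import on its own
// line, e.g. `import './slack.js';`, as Step 7 describes.
const REGISTRATION = /^import '\.\/([^']+)\.js';$/;

function detectInstalled(...barrelSources: string[]): string[] {
  const names = new Set<string>();
  for (const src of barrelSources) {
    for (const line of src.split('\n')) {
      const match = REGISTRATION.exec(line.trim());
      // `cli` is always registered and is not an installable skill.
      if (match && match[1] !== 'cli') names.add(match[1]);
    }
  }
  return Array.from(names).sort();
}

// Example: Slack and Telegram channels registered, nothing in the providers barrel.
const channelsBarrel = "import './cli.js';\nimport './slack.js';\nimport './telegram.js';\n";
const providersBarrel = '// no providers registered\n';
const installed = detectInstalled(channelsBarrel, providersBarrel);
// An empty list means Step 7 skips silently to Step 7.9.
console.log(installed.length === 0 ? 'skip to Step 7.9' : `ask about: ${installed.join(', ')}`);
```

The detected names are what the AskUserQuestion prompt lists in place of `<list the detected ones>`.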

@@ -85,6 +85,7 @@ For each selected skill (process one at a time):
 After all selected skills are re-applied:
 - `pnpm run build`
 - `pnpm test` (do not fail the flow if tests are not configured)
+- If the re-apply changed any files under `container/` (`git diff --name-only -- container/` is non-empty), rebuild the agent image so new sessions pick up the new code: `./container/build.sh`. Skill code that lives in the container (e.g. a provider's runtime) keeps running the old image until this is done — the rebuild is what makes the fix live, not the file copy. If nothing under `container/` changed (e.g. only a channel adapter was re-applied), skip it.

 Each channel/provider skill copies in its own registration test; those run as part of `pnpm test` and assert the barrel still registers the adapter against the freshly fetched code.
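The rebuild condition reduces to "did the path-restricted diff print anything". A minimal TypeScript sketch, assuming the caller has already captured the stdout of `git diff --name-only -- container/`; the function name and the sample path are illustrative, not from the repo. One caveat: plain `git diff` compares the working tree against the index, so staged changes and brand-new untracked files under `container/` do not show up in its output. If a re-apply can produce either, the output of `git status --porcelain -- container/` could be fed to the same check instead.

```typescript
// Hypothetical helper restating the Step 4 rule: rebuild the agent image only
// when the name-only diff restricted to container/ printed at least one path.
function needsContainerRebuild(nameOnlyOutput: string): boolean {
  return nameOnlyOutput.split('\n').some((line) => line.trim().length > 0);
}

// Only a channel adapter was re-applied: the container/ diff is empty.
const channelOnly = '';
// A provider re-apply touched code that runs inside the container (illustrative path).
const providerReapply = 'container/agent-runner/src/providers/example.ts\n';

for (const output of [channelOnly, providerReapply]) {
  console.log(needsContainerRebuild(output) ? 'run ./container/build.sh' : 'skip rebuild');
}
```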

@@ -4,6 +4,7 @@ All notable changes to NanoClaw will be documented in this file.

 ## [Unreleased]

+- **Optional per-container resource caps.** `CONTAINER_CPU_LIMIT` and `CONTAINER_MEMORY_LIMIT` pass through to `docker run` as `--cpus` / `--memory` (`container-runner.ts`). Both empty by default — no flag added, spawn args byte-identical to today — so existing installs are unaffected. Set them to cap an agent container's CPU/memory so one agent can't monopolize the host (e.g. `CONTAINER_CPU_LIMIT=2`, `CONTAINER_MEMORY_LIMIT=8g`). Swap is intentionally not managed here: `--memory` is a hard cap on a swapless host.
 - [BREAKING] **Chat SDK pinned to `4.29.0` (was `4.26.0` via `^4.24.0`).** `chat` and the `@chat-adapter/*` channel adapters are version-locked — the adapter's `ChatInstance` must match the bridge's, so a mismatched pair fails to typecheck at `createChatSdkBridge(...)`. `chat` is therefore pinned exactly, and the channel-adapter install pins move with it — the `/add-<channel>` SKILL.md steps and `setup/*.sh` scripts on `main`, plus the adapter code on the `channels` branch. Core installs with no channel (only `cli`) are unaffected. **Migration:** if any channel is installed (Slack, Discord, Telegram, Teams, …), re-run its `/add-<channel>` skill to pull the matching `4.29.0` adapter.
 - **Budget/billing-exhausted LLM turns now reach the user instead of being silently dropped.** When a turn ends in a non-retryable provider error (e.g. an Anthropic `403 billing_error`) with no `<message>` wrapping, the agent-runner delivers the provider's notice to the originating channel and stops re-nudging the failing gateway. `providers/claude.ts` now surfaces the SDK's `is_error` flag (and the error subtype's `errors[]` text); `poll-loop.ts` delivers that text and skips the re-wrap retry. Fixes the case where a spend-limit notice produced silence plus a turn-after-turn retry loop.
 - [BREAKING] **`@onecli-sh/sdk` 0.5.0 -> 2.2.1 — requires a OneCLI server with the `/v1` API** (older servers 404 every SDK call). The sanctioned gateway and CLI versions are pinned in `versions.json`. **The gateway is a separate component — updating NanoClaw does not upgrade it for you:** `/update-nanoclaw` upgrades it when the pin moves, otherwise upgrade manually. **Migration:** [docs/onecli-upgrades.md](docs/onecli-upgrades.md).
@@ -341,6 +341,12 @@ export const CONTAINER_IMAGE = process.env.CONTAINER_IMAGE || 'nanoclaw-agent:la
 export const CONTAINER_TIMEOUT = parseInt(process.env.CONTAINER_TIMEOUT || '1800000', 10); // 30min default
 export const IDLE_TIMEOUT = parseInt(process.env.IDLE_TIMEOUT || '1800000', 10); // 30min — keep container alive after last result
 export const MAX_CONCURRENT_CONTAINERS = Math.max(1, parseInt(process.env.MAX_CONCURRENT_CONTAINERS || '5', 10) || 5);
+// Per-container resource caps → `docker run --cpus/--memory`. Empty default =
+// no flag = unbounded (today's behavior). Opt in to bound a fleet sharing one
+// host: CONTAINER_CPU_LIMIT=2, CONTAINER_MEMORY_LIMIT=8g. Swap is a host concern
+// (run the host swapless to make --memory a hard cap); not managed here.
+export const CONTAINER_CPU_LIMIT = process.env.CONTAINER_CPU_LIMIT || '';
+export const CONTAINER_MEMORY_LIMIT = process.env.CONTAINER_MEMORY_LIMIT || '';

 export const TRIGGER_PATTERN = new RegExp(`^@${ASSISTANT_NAME}\\b`, 'i');
 ```
@@ -0,0 +1,138 @@
+import fs from 'fs';
+import os from 'os';
+import path from 'path';
+
+import { afterEach, describe, expect, it, vi } from 'vitest';
+
+import { getLaunchdLabel, getSystemdUnit } from '../src/install-slug.js';
+import { cleanupUnhealthyPeers } from './peer-cleanup.js';
+
+// The reaper deletes config files from ~/Library/LaunchAgents (or the systemd
+// user dir). We point HOME at a throwaway temp dir so real registrations are
+// never touched, and force os.platform() so the launchd/systemd branch runs
+// regardless of the host running the suite. The best-effort unload inside the
+// reaper (launchctl/systemctl) is swallowed when the binary is absent, so these
+// tests are deterministic on both macOS and Linux CI.
+
+function tempHome(): string {
+  return fs.mkdtempSync(path.join(os.tmpdir(), 'peer-cleanup-'));
+}
+
+function writePlist(filePath: string, target: string): void {
+  fs.writeFileSync(
+    filePath,
+    `<?xml version="1.0" encoding="UTF-8"?>
+<plist version="1.0"><dict>
+  <key>ProgramArguments</key>
+  <array><string>/usr/bin/node</string><string>${target}</string></array>
+</dict></plist>`,
+  );
+}
+
+function writeUnit(filePath: string, target: string): void {
+  fs.writeFileSync(filePath, `[Service]\nExecStart=/usr/bin/node ${target}\n`);
+}
+
+const created: string[] = [];
+
+afterEach(() => {
+  vi.restoreAllMocks();
+  for (const dir of created.splice(0)) {
+    fs.rmSync(dir, { recursive: true, force: true });
+  }
+});
+
+describe('cleanupUnhealthyPeers — dead launchd registrations', () => {
+  function setup(): { home: string; agentsDir: string; projectRoot: string } {
+    const home = tempHome();
+    created.push(home);
+    const agentsDir = path.join(home, 'Library', 'LaunchAgents');
+    fs.mkdirSync(agentsDir, { recursive: true });
+    vi.spyOn(os, 'homedir').mockReturnValue(home);
+    vi.spyOn(os, 'platform').mockReturnValue('darwin');
+    return { home, agentsDir, projectRoot: path.join(home, 'install') };
+  }
+
+  it('removes a plist whose target binary is gone', () => {
+    const { agentsDir, projectRoot } = setup();
+    const dead = path.join(agentsDir, 'com.nanoclaw-v2-dead.plist');
+    writePlist(dead, path.join(agentsDir, 'gone', 'dist', 'index.js'));
+
+    const result = cleanupUnhealthyPeers(projectRoot);
+
+    expect(fs.existsSync(dead)).toBe(false);
+    expect(result.removed.map((r) => r.label)).toContain('com.nanoclaw-v2-dead');
+  });
+
+  it('leaves a plist whose target still exists', () => {
+    const { agentsDir, projectRoot } = setup();
+    const liveTarget = path.join(agentsDir, 'live', 'dist', 'index.js');
+    fs.mkdirSync(path.dirname(liveTarget), { recursive: true });
+    fs.writeFileSync(liveTarget, '// host entry');
+    const live = path.join(agentsDir, 'com.nanoclaw-v2-live.plist');
+    writePlist(live, liveTarget);
+
+    const result = cleanupUnhealthyPeers(projectRoot);
+
+    expect(fs.existsSync(live)).toBe(true);
+    expect(result.removed).toHaveLength(0);
+  });
+
+  it("never reaps this install's own plist, even with a missing target", () => {
+    const { agentsDir, projectRoot } = setup();
+    const ownLabel = getLaunchdLabel(projectRoot);
+    const own = path.join(agentsDir, `${ownLabel}.plist`);
+    writePlist(own, path.join(agentsDir, 'gone', 'dist', 'index.js'));
+
+    const result = cleanupUnhealthyPeers(projectRoot);
+
+    expect(fs.existsSync(own)).toBe(true);
+    expect(result.removed).toHaveLength(0);
+  });
+
+  it('ignores an unrecognized plist (no dist/index.js target)', () => {
+    const { agentsDir, projectRoot } = setup();
+    const weird = path.join(agentsDir, 'com.nanoclaw-v2-weird.plist');
+    fs.writeFileSync(weird, '<plist><dict></dict></plist>');
+
+    const result = cleanupUnhealthyPeers(projectRoot);
+
+    expect(fs.existsSync(weird)).toBe(true);
+    expect(result.removed).toHaveLength(0);
+  });
+});
+
+describe('cleanupUnhealthyPeers — dead systemd registrations', () => {
+  function setup(): { unitDir: string; projectRoot: string } {
+    const home = tempHome();
+    created.push(home);
+    const unitDir = path.join(home, '.config', 'systemd', 'user');
+    fs.mkdirSync(unitDir, { recursive: true });
+    vi.spyOn(os, 'homedir').mockReturnValue(home);
+    vi.spyOn(os, 'platform').mockReturnValue('linux');
+    return { unitDir, projectRoot: path.join(home, 'install') };
+  }
+
+  it('removes a unit whose target binary is gone', () => {
+    const { unitDir, projectRoot } = setup();
+    const dead = path.join(unitDir, 'nanoclaw-v2-dead.service');
+    writeUnit(dead, path.join(unitDir, 'gone', 'dist', 'index.js'));
+
+    const result = cleanupUnhealthyPeers(projectRoot);
+
+    expect(fs.existsSync(dead)).toBe(false);
+    expect(result.removed.map((r) => r.label)).toContain('nanoclaw-v2-dead');
+  });
+
+  it("never reaps this install's own unit", () => {
+    const { unitDir, projectRoot } = setup();
+    const ownUnit = getSystemdUnit(projectRoot);
+    const own = path.join(unitDir, `${ownUnit}.service`);
+    writeUnit(own, path.join(unitDir, 'gone', 'dist', 'index.js'));
+
+    const result = cleanupUnhealthyPeers(projectRoot);
+
+    expect(fs.existsSync(own)).toBe(true);
+    expect(result.removed).toHaveLength(0);
+  });
+});
@@ -11,6 +11,14 @@
 *   - launchd: `state != running` AND `runs > UNHEALTHY_RUNS_THRESHOLD`
 *   - systemd: unit is in `failed` state, OR `activating` with many restarts
 *
+ * Separately, a peer registration is "dead" when the program it launches no
+ * longer exists on disk — almost always a deleted test checkout or worktree.
+ * The service manager keeps retrying the missing binary forever, and the
+ * health probes can't see it because an unloaded/inactive job doesn't report
+ * via `launchctl print` / `systemctl show`. Deleting an install's folder
+ * without running the uninstaller leaves these behind, so they accumulate. We
+ * unload and delete the orphaned config file outright.
+ *
 * Healthy peers are left alone — multiple installs can coexist fine now that
 * container-reaper is label-scoped.
 */
@@ -35,6 +43,7 @@ export interface PeerStatus {
 export interface PeerCleanupResult {
  checked: PeerStatus[];
  unloaded: PeerStatus[];
+  removed: Array<{ label: string; configPath: string }>;
  failures: Array<{ label: string; err: string }>;
 }

@@ -50,7 +59,39 @@ export function cleanupUnhealthyPeers(projectRoot: string = process.cwd()): Peer
  if (platform === 'linux') {
    return cleanupSystemdPeers(projectRoot);
  }
-  return { checked: [], unloaded: [], failures: [] };
+  return { checked: [], unloaded: [], removed: [], failures: [] };
+}
+
+/**
+ * Unload a dead peer's job (best-effort) and delete its orphaned config file.
+ * `unload` runs first and may throw harmlessly when the job isn't loaded or the
+ * service-manager binary is absent (e.g. exercising launchd cleanup on Linux).
+ */
+function reapDeadPeer(
+  result: PeerCleanupResult,
+  peer: { label: string; configPath: string },
+  unload: () => void,
+  kind: string,
+  missingTarget: string,
+): void {
+  try {
+    unload();
+  } catch {
+    /* job not loaded — nothing to unload */
+  }
+  try {
+    fs.rmSync(peer.configPath, { force: true });
+    log.info(`Removed dead peer ${kind}`, {
+      label: peer.label,
+      configPath: peer.configPath,
+      missingTarget,
+    });
+    result.removed.push(peer);
+  } catch (err) {
+    const message = err instanceof Error ? err.message : String(err);
+    log.warn(`Failed to remove dead peer ${kind}`, { label: peer.label, err: message });
+    result.failures.push({ label: peer.label, err: message });
+  }
 }
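The contract of `reapDeadPeer` is that an unload failure is expected and swallowed, while a failure to delete the config file is the only thing recorded under `failures`. A dependency-injected sketch of that ordering follows; `Peer`, `ReapResult`, `reap`, and the `remove` callback are simplified stand-ins for the real types and `fs.rmSync`, and logging is omitted.

```typescript
interface Peer {
  label: string;
  configPath: string;
}

interface ReapResult {
  removed: Peer[];
  failures: Array<{ label: string; err: string }>;
}

// Simplified stand-in for reapDeadPeer: unload first (best-effort), then delete.
function reap(result: ReapResult, peer: Peer, unload: () => void, remove: (path: string) => void): void {
  try {
    unload();
  } catch {
    // Expected when the job isn't loaded or launchctl/systemctl is absent.
  }
  try {
    remove(peer.configPath);
    result.removed.push(peer);
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    result.failures.push({ label: peer.label, err: message });
  }
}

const result: ReapResult = { removed: [], failures: [] };
// Unload throws (e.g. no service manager on this OS), but the file is still deleted.
reap(
  result,
  { label: 'peer-a', configPath: '/tmp/peer-a.plist' },
  () => {
    throw new Error('spawn launchctl ENOENT');
  },
  () => {},
);
// Deletion throws: recorded under failures, not under removed.
reap(
  result,
  { label: 'peer-b', configPath: '/tmp/peer-b.plist' },
  () => {},
  () => {
    throw new Error('EACCES: permission denied');
  },
);
console.log(result.removed.map((p) => p.label), result.failures.map((f) => f.label));
```

Because the real code passes `force: true` to `fs.rmSync`, a config file that has already vanished does not throw, so a concurrent cleanup is not reported as a failure.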

 // ---- launchd (macOS) --------------------------------------------------------
@@ -58,7 +99,7 @@ export function cleanupUnhealthyPeers(projectRoot: string = process.cwd()): Peer
 function cleanupLaunchdPeers(projectRoot: string): PeerCleanupResult {
  const ownLabel = getLaunchdLabel(projectRoot);
  const agentsDir = path.join(os.homedir(), 'Library', 'LaunchAgents');
-  const result: PeerCleanupResult = { checked: [], unloaded: [], failures: [] };
+  const result: PeerCleanupResult = { checked: [], unloaded: [], removed: [], failures: [] };

  let plists: string[];
  try {
@@ -76,6 +117,20 @@ function cleanupLaunchdPeers(projectRoot: string): PeerCleanupResult {
    const label = path.basename(plistPath, '.plist');
    if (label === ownLabel) continue;

+    const missingTarget = deadLaunchdTarget(plistPath);
+    if (missingTarget) {
+      reapDeadPeer(
+        result,
+        { label, configPath: plistPath },
+        // Best-effort unload in case launchd still has it registered; throwing
+        // (not loaded, or launchctl absent off-macOS) is expected and ignored.
+        () => execFileSync('launchctl', ['unload', plistPath], { stdio: 'pipe' }),
+        'launchd plist',
+        missingTarget,
+      );
+      continue;
+    }
+
    const status = probeLaunchdPeer(label, plistPath, uid);
    if (!status) continue;
    result.checked.push(status);
@@ -121,12 +176,32 @@ function probeLaunchdPeer(label: string, plistPath: string, uid: number): PeerSt
  return { label, configPath: plistPath, state, runs, unhealthy };
 }

+/**
+ * Returns the program path a launchd plist launches when that program no longer
+ * exists on disk (a dead registration), or undefined when the plist is
+ * unreadable, has an unrecognized shape, or its target still exists — in which
+ * case the plist must not be touched.
+ */
+function deadLaunchdTarget(plistPath: string): string | undefined {
+  let xml: string;
+  try {
+    xml = fs.readFileSync(plistPath, 'utf-8');
+  } catch {
+    return undefined;
+  }
+  // ProgramArguments is [nodePath, "<projectRoot>/dist/index.js"]; the host
+  // entry point is the stable marker to match on.
+  const target = /<string>([^<]*\/dist\/index\.js)<\/string>/.exec(xml)?.[1];
+  if (!target) return undefined;
+  return fs.existsSync(target) ? undefined : target;
+}
+
 // ---- systemd (Linux) --------------------------------------------------------

 function cleanupSystemdPeers(projectRoot: string): PeerCleanupResult {
  const ownUnit = getSystemdUnit(projectRoot);
  const unitDir = path.join(os.homedir(), '.config', 'systemd', 'user');
-  const result: PeerCleanupResult = { checked: [], unloaded: [], failures: [] };
+  const result: PeerCleanupResult = { checked: [], unloaded: [], removed: [], failures: [] };

  let units: string[];
  try {
@@ -141,6 +216,22 @@ function cleanupSystemdPeers(projectRoot: string): PeerCleanupResult {
  for (const unit of units) {
    if (unit === ownUnit) continue;

+    const unitPath = path.join(unitDir, `${unit}.service`);
+    const missingTarget = deadSystemdTarget(unitPath);
+    if (missingTarget) {
+      reapDeadPeer(
+        result,
+        { label: unit, configPath: unitPath },
+        () => {
+          execFileSync('systemctl', ['--user', 'disable', '--now', `${unit}.service`], { stdio: 'pipe' });
+          execFileSync('systemctl', ['--user', 'daemon-reload'], { stdio: 'pipe' });
+        },
+        'systemd unit',
+        missingTarget,
+      );
+      continue;
+    }
+
    const status = probeSystemdPeer(unit);
    if (!status) continue;
    result.checked.push(status);
@@ -184,3 +275,21 @@ function probeSystemdPeer(unit: string): PeerStatus | null {
    return null;
  }
 }
+
+/**
+ * Returns the program path a systemd unit launches when that program no longer
+ * exists on disk (a dead registration), or undefined when the unit is
+ * unreadable, has an unrecognized shape, or its target still exists.
+ */
+function deadSystemdTarget(unitPath: string): string | undefined {
+  let unit: string;
+  try {
+    unit = fs.readFileSync(unitPath, 'utf-8');
+  } catch {
+    return undefined;
+  }
+  // ExecStart=<nodePath> <projectRoot>/dist/index.js
+  const target = /^ExecStart=\S+\s+(\S+\/dist\/index\.js)\s*$/m.exec(unit)?.[1];
+  if (!target) return undefined;
+  return fs.existsSync(target) ? undefined : target;
+}
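Both detectors reduce to "extract the `dist/index.js` argument, then check whether it exists". The sketch below reuses the two patterns from the diff with the existence check injected, so the matching rules can be exercised without a filesystem; `deadTarget` and the sample paths are illustrative. One consequence of the anchored systemd pattern is worth noting: an `ExecStart=` line with anything after the `dist/index.js` argument does not match, so such a unit is treated as unrecognized and left alone rather than reaped.

```typescript
// Patterns copied from deadLaunchdTarget / deadSystemdTarget.
const LAUNCHD_TARGET = /<string>([^<]*\/dist\/index\.js)<\/string>/;
const SYSTEMD_TARGET = /^ExecStart=\S+\s+(\S+\/dist\/index\.js)\s*$/m;

// Returns the missing target (a dead registration), or undefined when the
// config has no recognizable target or the target still exists.
function deadTarget(config: string, pattern: RegExp, exists: (path: string) => boolean): string | undefined {
  const match = pattern.exec(config);
  const target = match ? match[1] : undefined;
  if (!target) return undefined; // unrecognized shape: never touch it
  return exists(target) ? undefined : target;
}

const gone = (_path: string): boolean => false;
const present = (_path: string): boolean => true;

const plist =
  '<array><string>/usr/bin/node</string><string>/home/u/old/dist/index.js</string></array>';
const unit = '[Service]\nExecStart=/usr/bin/node /home/u/old/dist/index.js\n';
const unitWithFlag = '[Service]\nExecStart=/usr/bin/node /home/u/old/dist/index.js --verbose\n';

console.log(deadTarget(plist, LAUNCHD_TARGET, gone)); // dead plist target
console.log(deadTarget(plist, LAUNCHD_TARGET, present)); // target exists
console.log(deadTarget(unit, SYSTEMD_TARGET, gone)); // dead unit target
console.log(deadTarget(unitWithFlag, SYSTEMD_TARGET, gone)); // extra argument: unrecognized
```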
@@ -72,6 +72,12 @@ export async function run(_args: string[]): Promise<void> {
      labels: peerReport.unloaded.map((p) => p.label),
    });
  }
+  if (peerReport.removed.length > 0) {
+    log.warn('Removed dead peer NanoClaw registrations (target binary missing)', {
+      count: peerReport.removed.length,
+      labels: peerReport.removed.map((p) => p.label),
+    });
+  }

  if (platform === 'macos') {
    setupLaunchd(projectRoot, nodePath, homeDir);
@@ -38,6 +38,11 @@ export const ONECLI_API_KEY = process.env.ONECLI_API_KEY || envConfig.ONECLI_API
 export const MAX_MESSAGES_PER_PROMPT = Math.max(1, parseInt(process.env.MAX_MESSAGES_PER_PROMPT || '10', 10) || 10);
 export const IDLE_TIMEOUT = parseInt(process.env.IDLE_TIMEOUT || '1800000', 10); // 30min default — how long to keep container alive after last result
 export const MAX_CONCURRENT_CONTAINERS = Math.max(1, parseInt(process.env.MAX_CONCURRENT_CONTAINERS || '5', 10) || 5);
+// Per-container resource caps, passed through to `docker run`. Default empty =
+// no flag added = today's unbounded behavior (don't OOM existing OSS workloads).
+// Operators opt in: CONTAINER_CPU_LIMIT=2, CONTAINER_MEMORY_LIMIT=8g.
+export const CONTAINER_CPU_LIMIT = process.env.CONTAINER_CPU_LIMIT || '';
+export const CONTAINER_MEMORY_LIMIT = process.env.CONTAINER_MEMORY_LIMIT || '';

 function escapeRegex(str: string): string {
  return str.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
@@ -47,6 +47,37 @@ describe('buildContainerArgs ordering invariant (structural)', () => {
  });
 });

+describe('per-container resource limits (structural)', () => {
+  // CONTAINER_CPU_LIMIT / CONTAINER_MEMORY_LIMIT pass through to `docker run` as
+  // --cpus / --memory, but only when set. The default is empty string → no flag →
+  // today's unbounded behavior (don't OOM existing OSS workloads). Swap is not
+  // managed here (a swapless host makes --memory a hard cap). buildContainerArgs
+  // needs a live gateway to drive, so guard the wiring structurally: the flags
+  // must be pushed, and each must be guarded by its env knob so empty emits nothing.
+  it('reads both limit knobs from config', () => {
+    const src = fs.readFileSync(path.join(process.cwd(), 'src', 'container-runner.ts'), 'utf-8');
+    expect(src).toContain('CONTAINER_CPU_LIMIT');
+    expect(src).toContain('CONTAINER_MEMORY_LIMIT');
+  });
+
+  it('guards --cpus behind a truthy CONTAINER_CPU_LIMIT', () => {
+    const src = fs.readFileSync(path.join(process.cwd(), 'src', 'container-runner.ts'), 'utf-8');
+    expect(src).toMatch(/if \(CONTAINER_CPU_LIMIT\)[\s\S]*?args\.push\('--cpus', CONTAINER_CPU_LIMIT\)/);
+  });
+
+  it('guards --memory behind a truthy CONTAINER_MEMORY_LIMIT (and sets no swap flag)', () => {
+    const src = fs.readFileSync(path.join(process.cwd(), 'src', 'container-runner.ts'), 'utf-8');
+    expect(src).toMatch(/if \(CONTAINER_MEMORY_LIMIT\) args\.push\('--memory', CONTAINER_MEMORY_LIMIT\)/);
+    expect(src).not.toContain('--memory-swap');
+  });
+
+  it('defaults both knobs to empty string in config (no flag = unbounded)', () => {
+    const cfg = fs.readFileSync(path.join(process.cwd(), 'src', 'config.ts'), 'utf-8');
+    expect(cfg).toContain("CONTAINER_CPU_LIMIT = process.env.CONTAINER_CPU_LIMIT || ''");
+    expect(cfg).toContain("CONTAINER_MEMORY_LIMIT = process.env.CONTAINER_MEMORY_LIMIT || ''");
+  });
+});
+
 describe('container boot-failure tripwire (structural)', () => {
  // A container that dies at boot (unknown provider, missing CLI binary, bad
  // config) explains itself only on stderr — which logs at debug, below the
@@ -10,9 +10,11 @@ import path from 'path';
 import { OneCLI } from '@onecli-sh/sdk';

 import {
+  CONTAINER_CPU_LIMIT,
  CONTAINER_IMAGE,
  CONTAINER_IMAGE_BASE,
  CONTAINER_INSTALL_LABEL,
+  CONTAINER_MEMORY_LIMIT,
  DATA_DIR,
  GROUPS_DIR,
  ONECLI_API_KEY,
@@ -434,6 +436,13 @@ async function buildContainerArgs(
 ): Promise<string[]> {
  const args: string[] = ['run', '--rm', '--name', containerName, '--label', CONTAINER_INSTALL_LABEL];

+  // Per-container resource caps (opt-in; empty = unbounded, today's behavior).
+  // Only --memory is set. Whether that's a hard cap depends on the host having no
+  // swap (a deployment concern) — on a swapless host --memory is hard and a runaway
+  // is OOM-killed; we don't manage swap from here.
+  if (CONTAINER_CPU_LIMIT) args.push('--cpus', CONTAINER_CPU_LIMIT);
+  if (CONTAINER_MEMORY_LIMIT) args.push('--memory', CONTAINER_MEMORY_LIMIT);
+
  // Environment — only vars read by code we don't own.
  // Everything NanoClaw-specific is in container.json (read by runner at startup).
  args.push('-e', `TZ=${TIMEZONE}`);
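The structural tests exist because `buildContainerArgs` needs a live gateway to run. One possible alternative, sketched here as a hypothetical refactor rather than what this commit does, is to pull the two opt-in lines into a pure helper that can be tested behaviorally; `resourceLimitArgs` and the sample base args are illustrative.

```typescript
// Hypothetical helper mirroring the two opt-in lines in buildContainerArgs.
// Empty string is falsy, so an unset knob adds no flag (today's unbounded behavior).
function resourceLimitArgs(cpuLimit: string, memoryLimit: string): string[] {
  const args: string[] = [];
  if (cpuLimit) args.push('--cpus', cpuLimit);
  // Deliberately no swap flag: swap is left to the host.
  if (memoryLimit) args.push('--memory', memoryLimit);
  return args;
}

const baseArgs = ['run', '--rm', '--name', 'example-agent'];
const unbounded = baseArgs.concat(resourceLimitArgs('', ''));
const capped = baseArgs.concat(resourceLimitArgs('2', '8g'));

console.log(['docker', ...unbounded].join(' '));
console.log(['docker', ...capped].join(' '));
```

In Docker terms, `--cpus 2` limits the container to roughly two CPUs' worth of time. With `--memory 8g` and no `--memory-swap`, Docker lets the container use swap up to the same amount again when the host has swap configured, which is why the commit treats a swapless host as the condition for a hard cap.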
Author	SHA1	Message	Date
gavrielc	8d3eca7027	Merge branch 'main' into container-limits	2026-06-25 21:34:00 +03:00
Omri Maya	1d6bba4d3f	feat(container): per-container CPU/memory limits (opt-in) Pass CONTAINER_CPU_LIMIT / CONTAINER_MEMORY_LIMIT through to `docker run` as --cpus / --memory in buildContainerArgs. Both default to empty, so spawn args are byte-identical to today unless an operator opts in — no risk of OOM-ing existing workloads. Caps an agent container's CPU/memory so one agent can't monopolize the host. Swap is a deployment concern (--memory is a hard cap on a swapless host); not managed here. Structural tests assert each flag is pushed and guarded by its env knob, matching the existing buildContainerArgs structural-test convention. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-25 15:39:16 +03:00
amit-shafnir	9bb69c0e50	Merge pull request #2830 from amit-shafnir/fix/peer-dead-plist-reaper fix(setup): reap dead peer service registrations whose binary is gone	2026-06-25 11:27:54 +03:00
gavrielc	add6145f1c	Merge pull request #2826 from nanocoai/fix/skill-updates-nudge-and-container-rebuild fix(update-skills): nudge into skill updates, rebuild container on re-apply	2026-06-23 15:41:05 +03:00
gavrielc	4e14d08173	Merge pull request #2834 from nanocoai/chore/bump-chat-sdk-4.29.0 chore(deps): move chat SDK + channel-adapter pins to 4.29.0	2026-06-23 15:26:50 +03:00
Amit Shafnir	15292ae76c	fix(setup): reap dead peer service registrations whose binary is gone The setup preflight unloads crash-looping peers but ignores a more common leftover: a launchd plist (or systemd unit) whose program no longer exists, left behind when a NanoClaw checkout is deleted without running the uninstaller. The health probe can't see these because an unloaded/inactive job doesn't report via `launchctl print` / `systemctl show`, so they accumulate — the OS keeps retrying a missing binary forever. Detect a registration as dead when its `dist/index.js` target is absent on disk, then unload (best-effort) and delete the orphaned config file. Own-label and still-valid registrations are never touched. Adds peer-cleanup.test.ts (the file previously had no tests) covering both platforms: dead target removed, live target kept, own registration spared, unrecognized config ignored. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-21 22:58:51 +03:00
Koshkoshinsk	055cf49bd5	fix(update-skills): nudge into skill updates, rebuild container on re-apply /update-nanoclaw Step 7 framed skill updates as an optional, "safe to skip" extra, so an important channel/provider fix — shipped on the channels/providers branches the host merge never touches — could be silently missed. Reframe it as part of the update: default into /update-skills, name the installed skills, and leave one minimal opt-out. Move the container image rebuild into /update-skills Step 4: when a re-apply changes files under container/ (e.g. a provider's runtime), rebuild so new sessions actually run the new code. Living in update-skills covers both the standalone and via-update-nanoclaw paths; the update-nanoclaw Step 7.5 that briefly owned this is removed. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01YHaa6bp25E62AuUJyW1V5J	2026-06-21 17:08:51 +03:00