---
name: braiin
description: Guide to integrating the `braiin` npm library (TypeScript LLM orchestrator). Covers the three primitives (Tool, Agent, Orchestrator), the 4-action LLM protocol (describe / call / finish / abort), OrchestratorConfig options, task result shape, toolTraces persistence, streaming the final answer via the onToken callback, both backends (OpenAI-compatible HTTP API and local Claude Code CLI), and best practices. Use when writing code that imports `braiin`, designing tools/agents, streaming responses, or integrating an LLM orchestration layer.
---

# braiin

## What it is

`braiin` (Behavioral Reasoning AI for Intelligent Navigation) is a TypeScript orchestrator that lets an LLM route between specialized **agents** and their **tools** to accomplish a task. Two backends are supported:

- **OpenAI-compatible HTTP API** (default) — works with any provider exposing `/chat/completions`: OpenAI, Anthropic (compat), OpenRouter, Together, Groq, Ollama, vLLM, Azure, etc.
- **Local Claude Code CLI** — spawns the `claude` binary as a child process; the CLI maintains its own session memory so `history` does not need to be re-sent every turn.

- Package: `braiin` on npm. Peer dependency: `openai ^6.0.0`. For the Claude Code backend, `claude` must be installed and on `PATH`.
- The orchestrator and the LLM exchange a strict JSON protocol; free-form text replies are rejected.
- Source of truth: the library is small (~300 LOC core); when in doubt, read the type declarations in `node_modules/braiin/dist`.

## Mental model

Three primitives, nested:

```
Orchestrator ── has ──► Agent[] ── has ──► Tool[]
```

- **Tool**: one atomic capability (e.g. `read-file`, `fetch-user`). Has a `tag`, description, input schema (string or structured), and a `call(input)` that returns a string.
- **Agent**: a themed bundle of tools (e.g. "file-agent", "user-agent") with a name and description.
- **Orchestrator**: owns the agents and runs the reasoning loop. `executeTask(prompt)` returns a `TaskResult`.

## Public API

```ts
import {
  createAgent,
  createOrchestrator,
  Tool,
  Agent,
  Orchestrator,
  TaskResult,
  ToolTrace,
  LLMMessage,
  LLMMessageRole
} from 'braiin'
```

`createAgent` and `createOrchestrator` are the only factories exported. Types are exported for annotation, not runtime use.

## Defining a Tool

### Simple string input

```ts
import { Tool } from 'braiin'

// Placeholder data source for this example; swap in your own lookup.
const users = [{ name: 'User 1', birthYear: 2001 }]

export const userRetrieverTool: Tool = {
  tag: 'user-retriever',
  description: "Retrieve a user's information from their name",
  input: "The user's name",
  output: "A JSON object with the user's info, or an empty string if not found",
  call: async (userName) => {
    const user = users.find(u => u.name === userName)
    return user ? JSON.stringify(user) : ''
  }
}
```

### Structured object input

```ts
import { promises as fs } from 'node:fs'
import { Tool } from 'braiin'

export const writeFileTool: Tool = {
  tag: 'write-file',
  description: 'Write content to a file',
  input: [
    { name: 'path',    description: 'File path',    required: true },
    { name: 'content', description: 'File content', required: true }
  ],
  output: 'Confirmation message',
  call: async (input) => {
    const { path, content } = input as Record<string, string>
    await fs.writeFile(path, content)
    return `Wrote ${content.length} chars to ${path}`
  }
}
```

**Rules for Tool authors:**
- `call` MUST return a `string`. Serialize objects with `JSON.stringify`.
- `call` MAY throw — the orchestrator catches and returns `status:'error'`.
- `tag` must be unique **per agent** (the lookup key).
- `description`, `input`, `output` are all read by the LLM — write them as short instructions, not prose.
- When `input` is an array, `call` receives an object `{ [name]: value }`. When it's a string, `call` receives a plain string.
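
A small helper can make the string-return rule harder to violate. The sketch below is not part of `braiin`: `toToolResult` is a hypothetical name, and mapping `null`/`undefined` to an empty string is an assumption that simply mirrors the "empty string if not found" convention used in the first example.

```ts
// Hypothetical helper (not a braiin export): normalize any value into the
// string that a tool's `call` is expected to return.
function toToolResult(value: unknown): string {
  // Assumed convention: "nothing found" becomes an empty string.
  if (value === undefined || value === null) return ''
  // Strings pass through untouched.
  if (typeof value === 'string') return value
  // Objects, arrays, numbers and booleans are serialized as JSON.
  // JSON.stringify yields undefined for functions/symbols, hence the fallback.
  return JSON.stringify(value) ?? ''
}
```

Inside a tool, `call: async (name) => toToolResult(await lookup(name))` would then cover both the found and not-found cases, where `lookup` stands for your own data access function.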

## Creating an Agent

```ts
import { createAgent } from 'braiin'

const userAgent = createAgent(
  'user-agent',
  'You are a useful assistant that answers questions about users.',
  [userRetrieverTool, userBirthYearTool]
)
```

Agent names must be unique across the orchestrator. Pick kebab-case names that the LLM can recognize without ambiguity.
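
The naming rules (unique agent names, unique tool tags per agent, kebab-case as a recommendation) can be checked before anything is handed to `createAgent`. This is a minimal sketch under assumptions: `AgentSpec` and `findNamingProblems` are hypothetical names, and the spec is a plain description you build yourself, not a `braiin` type.

```ts
// Hypothetical pre-flight description of an agent: its name and tool tags.
interface AgentSpec {
  name: string
  toolTags: string[]
}

// Lowercase words separated by single hyphens, e.g. "user-agent".
const KEBAB = /^[a-z0-9]+(-[a-z0-9]+)*$/

// Returns a list of human-readable problems; an empty list means all good.
function findNamingProblems(specs: AgentSpec[]): string[] {
  const problems: string[] = []
  const seenAgents = new Set<string>()
  for (const spec of specs) {
    if (!KEBAB.test(spec.name)) {
      problems.push(`agent "${spec.name}" is not kebab-case`)
    }
    if (seenAgents.has(spec.name)) {
      problems.push(`duplicate agent name "${spec.name}"`)
    }
    seenAgents.add(spec.name)

    // Tags only need to be unique inside one agent, so the set is per agent.
    const seenTags = new Set<string>()
    for (const tag of spec.toolTags) {
      if (seenTags.has(tag)) {
        problems.push(`duplicate tool tag "${tag}" in agent "${spec.name}"`)
      }
      seenTags.add(tag)
    }
  }
  return problems
}
```

Running this in a unit test or at startup surfaces collisions before the LLM ever sees the agent list.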

## Creating the Orchestrator

```ts
import { createOrchestrator } from 'braiin'

const orchestrator = createOrchestrator(
  [userAgent, fileAgent],
  {
    apiKey: process.env.OPENAI_API_KEY!,
    model: 'gpt-4o',
    temperature: 0
  }
)

const result = await orchestrator.executeTask('When was User 1 born?')
```

## OrchestratorConfig — all options

| Option | Default | When to set |
| --- | --- | --- |
| `backend` | `'openai'` | `'openai'` for OpenAI-compatible HTTP API, `'claude-code'` for the local Claude Code CLI. |
| `optionalPrompt` | `undefined` | Extra instructions appended to the system prompt. For tone, domain facts, output constraints, etc. |
| `maxSteps` | `50` | Hard cap on reasoning iterations. Lower it aggressively (5–15) for bounded tasks to fail fast. |
| `stepsInterval` | `undefined` | ms to wait between steps. Useful for rate-limited providers. |
| `timeoutMs` | `60000` | Per-LLM-call timeout. Combined with `signal` via `AbortSignal.any`. |
| `signal` | `undefined` | User-provided `AbortSignal` to cancel in-flight calls. |
| `llmService` | *internal* | Inject a mock `LLMService` for testing — see "Testing". |
| **OpenAI backend** | | (used when `backend` is `'openai'` or omitted) |
| `apiKey` | *required* | API key for the provider. |
| `model` | `'gpt-4o'` | Model ID. Anything the endpoint accepts. |
| `serverUrl` | `'https://api.openai.com/v1'` | Any OpenAI-compatible base URL. |
| `temperature` | `0` | Keep at `0` for reliable protocol adherence. Raise only for creative final answers. |
| `maxTokens` | `8192` | Per-call cap on completion tokens. |
| `enablePromptCaching` | `false` | Enable when the system prompt is large. See "Prompt caching" below. |
| `enforceJsonOutput` | `false` | Passes `response_format: { type: 'json_object' }` to the API. Forces the LLM to emit strict JSON (no prose preamble). Supported by OpenAI, Anthropic (compat), OpenRouter, Groq. **Stays active during streaming** (`onToken`) — the answer is streamed out of the JSON `answer` field, so enforcement and live streaming work together. See "Enforcing JSON output" below. |
| **Claude Code backend** | | (used when `backend` is `'claude-code'`) |
| `sessionId` | *required* | UUID identifying the Claude Code session. Create-or-resume: if the UUID is unknown, a new session is created with that exact ID; if it exists, the session resumes silently. |
| `cliPath` | `'claude'` | Path to the `claude` binary. Override only if not on `PATH`. |

## Local Claude Code backend

Set `backend: 'claude-code'` and pass a `sessionId` (UUID) to use the locally-installed `claude` CLI as the LLM backend instead of an HTTP API:

```ts
import { randomUUID } from 'node:crypto'

const orchestrator = createOrchestrator(
  [userAgent],
  {
    backend: 'claude-code',
    sessionId: randomUUID()
  }
)
```

Each `executeTask` call spawns `claude -p --session-id <uuid> --system-prompt <braiin-prompt> --tools "" --output-format json <prompt>` as a child process and parses the JSON result.
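
Written as an argument vector instead of a shell string, the command above would look roughly like the sketch below. It only restates the flags listed in this section; `buildClaudeArgs` is a hypothetical name, and how the library actually assembles and spawns the command is not shown in this guide.

```ts
// Illustrative only: rebuilds the argument list described above.
// Not a braiin export.
function buildClaudeArgs(
  sessionId: string,
  systemPrompt: string,
  prompt: string
): string[] {
  return [
    '-p',
    '--session-id', sessionId,
    '--system-prompt', systemPrompt,
    '--tools', '',              // empty string: no built-in Claude Code tools
    '--output-format', 'json',
    prompt                      // the user prompt goes last
  ]
}
```

Passing the values as separate array entries (for example to Node's `child_process.spawn`) avoids shell quoting problems with the empty `--tools` value and with prompts that contain quotes.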

Key properties of this backend:

- **Session memory is owned by the CLI.** The `history` argument passed to `LLMService.ask` is intentionally ignored — Claude Code maintains the conversation transcript on disk under the given session ID. This saves tokens (no history re-injection) and means the same `sessionId` reused across orchestrators picks up where the previous task left off.
- **Tool use is fully disabled** for security (`--tools ""`). Claude Code acts as a pure LLM; the only tool execution path is through your BRAIIN tools.
- **No true token streaming.** The CLI returns the full result at once. Both callbacks still fire but in a single emit (plus the `[[END]]` marker): `logCallback` gets the whole step response, and `onToken` gets the whole final answer at the end — not incremental chunks.
- **`enforceJsonOutput` does not apply.** The CLI has no `response_format` equivalent. The strict JSON discipline relies entirely on the BRAIIN system prompt and the tolerant `extractJson` parser.
- **`enablePromptCaching` does not apply.** Claude Code already caches its session prefix internally.
- **Errors map to `LLMResponse.error`** (not thrown): missing CLI binary, non-zero exit, invalid JSON, timeout/abort. The orchestrator surfaces them as `TaskResult { status: 'error', answer: '...' }` like the HTTP backend.
- **Cancellation**: `signal` and `timeoutMs` work — they kill the spawned process with SIGTERM. The session state is persisted before exit, so the next call with the same `sessionId` resumes cleanly.

When to use it:
- You want to avoid managing API keys and rate limits during local development.
- The user already has a Claude Code subscription and you don't want to bill API tokens for orchestration.
- You want long-running multi-turn conversations where re-injecting `history` would be wasteful.

When not to use it:
- Production / multi-tenant environments where you cannot guarantee `claude` is on `PATH` everywhere.
- Throughput-sensitive workloads — child-process spawn per turn is significantly slower than an HTTP request.
- Cases where `enforceJsonOutput` is critical (very small/old models that hallucinate prose preambles).

## The LLM protocol (4 actions)

The orchestrator and the LLM exchange one JSON object per turn. Every reply MUST include an `action` field.

```jsonc
// 1. Ask an agent for the schemas of its tools
{ "action": "describe", "agent": "user-agent" }

// 2. Call one of an agent's tools
{ "action": "call", "agent": "user-agent", "tool": "user-retriever", "input": "User 1" }
// or with structured input:
{ "action": "call", "agent": "file-agent", "tool": "write-file",
  "input": { "path": "/tmp/x", "content": "..." } }

// 3. Return the final answer
{ "action": "finish", "answer": "User 1 was born in 2001." }

// 4. Give up with a reason
{ "action": "abort", "reason": "Cannot find any tool that returns birth dates." }
```

A typical chain: `describe` → `call` → (more calls) → `finish`. The orchestrator also accepts **legacy shapes** (`{"tool":"finished","input":"..."}`, `{"tool":"none","input":"..."}`, `{"agent":"...","tool":"...","input":"..."}`) for backward compatibility, but new integrations should emit the canonical `action` form.

The parser (`extractJson`) tolerates markdown fences and surrounding prose — the LLM does not have to emit pure JSON, but it must contain a valid balanced JSON object somewhere.
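
To make the tolerance concrete, here is a minimal sketch of a parser in the same spirit. It is not the library's `extractJson`: `firstJsonObject` and `parseAction` are hypothetical names, the `Action` type is inferred from the examples above, and the sketch only tries the first `{` it finds, so prose containing a stray brace before the real object would make it return `null`.

```ts
// The four canonical actions as a discriminated union (shapes taken from
// the protocol examples above).
type Action =
  | { action: 'describe'; agent: string }
  | { action: 'call'; agent: string; tool: string; input: string | Record<string, unknown> }
  | { action: 'finish'; answer: string }
  | { action: 'abort'; reason: string }

// Return the first balanced {...} span in `text`, ignoring braces that
// appear inside JSON strings. Returns null if no balanced object is found.
function firstJsonObject(text: string): string | null {
  const start = text.indexOf('{')
  if (start === -1) return null
  let depth = 0
  let inString = false
  let escaped = false
  for (let i = start; i < text.length; i++) {
    const ch = text.charAt(i)
    if (inString) {
      if (escaped) escaped = false
      else if (ch === '\\') escaped = true
      else if (ch === '"') inString = false
    } else if (ch === '"') inString = true
    else if (ch === '{') depth++
    else if (ch === '}' && --depth === 0) return text.slice(start, i + 1)
  }
  return null
}

// Parse a raw LLM reply into a known action, or null if it is not one.
function parseAction(reply: string): Action | null {
  const json = firstJsonObject(reply)
  if (json === null) return null
  try {
    const value = JSON.parse(json)
    const known = ['describe', 'call', 'finish', 'abort']
    return known.indexOf(value.action) !== -1 ? (value as Action) : null
  } catch {
    return null // balanced braces, but not valid JSON
  }
}
```

Because `Action` is a discriminated union, a `switch` on `action` narrows the type, which is handy when asserting on scripted replies in tests.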

**Streaming the `finish` answer.** The `finish` action is always plain JSON — `{"action":"finish","answer":"..."}` — whether or not you stream. When an `onToken` callback is passed to `executeTask` (see "Observability & streaming"), the orchestrator streams the **value of the `answer` field** as the model writes it, decoding JSON string escapes on the fly. No special wire format and no marker: the protocol is identical to the non-streaming case, so `enforceJsonOutput` can stay on.

## TaskResult

```ts
interface TaskResult {
  status: 'success' | 'error'
  answer: string              // the final answer OR the error message
  toolTraces: ToolTrace[]     // every tool call made during this task
}

interface ToolTrace {
  tool: string                        // tool tag
  input: string | Record<string, any> // whatever was passed
  result: string                      // whatever `call` returned
}
```

**Always check `status` before using `answer`** — an error message lives in the same field.
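
One way to make the check impossible to forget is to convert the result into either a string or a thrown error at the boundary of your code. A minimal sketch, assuming only the `TaskResult` shape shown above; `TaskResultLike` is a local mirror and `answerOrThrow` is a hypothetical helper, not a `braiin` export.

```ts
// Local mirror of the TaskResult shape shown above.
interface TaskResultLike {
  status: 'success' | 'error'
  answer: string
  toolTraces: unknown[]
}

// Return the answer on success; throw with the error message otherwise.
function answerOrThrow(result: TaskResultLike): string {
  if (result.status === 'error') {
    // On failure, `answer` carries the error message, not an answer.
    throw new Error(result.answer)
  }
  return result.answer
}
```

Callers that prefer not to throw can branch on `result.status` directly; the point is that `answer` is never read without looking at `status` first.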

## Persistence via toolTraces (follow-up questions)

Pass prior `toolTraces` to re-use their results without re-fetching:

```ts
const first = await orchestrator.executeTask("Tell me about User 1's wife")
// first.toolTraces holds ONLY the traces produced by this call

const second = await orchestrator.executeTask(
  'Has User 1 been married before?',
  [],                  // history (conversation-level LLM messages, optional)
  first.toolTraces     // inject as known context (re-used, not re-fetched)
)
// second.toolTraces holds ONLY the second call's traces.
// Accumulate across turns yourself if you need a running history:
const allTraces = [...first.toolTraces, ...second.toolTraces]
```

`TaskResult.toolTraces` contains **only the traces produced during that call** (since v0.5.0). The `toolTraces` you pass in are injected as a synthetic `system` message (`"Known context from previous interactions: ..."`) so the model reuses their results instead of re-fetching — but they are **not echoed back** into the next result. The caller owns accumulation. (Before v0.5.0 the passed traces were prepended to the returned `toolTraces`, conflating "context in" with "produced out".)

The `history` parameter is distinct: it is a raw `LLMMessage[]` of prior user/assistant exchanges for multi-turn conversations.
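
Since the caller owns accumulation, a small container keeps that bookkeeping in one place. The sketch below is an assumption-laden illustration: `TraceLog` is a hypothetical class, `ToolTraceLike` mirrors the `ToolTrace` shape shown earlier, and the optional cap on retained traces is a design choice of this example (to bound the injected context), not a library feature.

```ts
// Local mirror of the ToolTrace shape shown earlier.
interface ToolTraceLike {
  tool: string
  input: string | Record<string, unknown>
  result: string
}

// Hypothetical caller-side accumulator for traces across turns.
class TraceLog {
  private traces: ToolTraceLike[] = []
  private maxTraces: number

  // maxTraces caps how many of the most recent traces are kept.
  constructor(maxTraces: number = Infinity) {
    this.maxTraces = maxTraces
  }

  // Record the traces produced by one executeTask call.
  add(produced: ToolTraceLike[]): void {
    this.traces = this.traces.concat(produced)
    if (this.traces.length > this.maxTraces) {
      // Drop the oldest entries, keep the newest maxTraces.
      this.traces = this.traces.slice(this.traces.length - this.maxTraces)
    }
  }

  // Copy of the accumulated traces, suitable as the third executeTask argument.
  context(): ToolTraceLike[] {
    return this.traces.slice()
  }
}
```

After each turn, call `log.add(result.toolTraces)`, then pass `log.context()` as the `toolTraces` argument of the next `executeTask` call.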

## Observability & streaming

`executeTask` takes two **independent** callbacks with different jobs — pass either, both, or neither:

```ts
await orchestrator.executeTask(
  prompt,
  history,      // LLMMessage[]  (optional)
  toolTraces,   // ToolTrace[]   (optional)
  logCallback,  // (log: string)   => void — debug: raw protocol, every step
  onToken       // (token: string) => void — clean final answer, streamed live
)
```

### `logCallback` (4th arg) — debug stream

Providing `logCallback` switches the LLM call to **streaming mode**. It receives the **raw response of every step** — `describe`, `call`, *and* `finish` — chunk by chunk (plus a `[[END]]` marker per step). This is the *whole JSON protocol*, not a clean answer: use it for tracing/debugging, not for showing to an end user. The orchestrator still receives a full `LLMResponse` once each step's stream closes.
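
For tracing, it is often more useful to see one complete raw response per step than a stream of fragments. The sketch below regroups chunks on the `[[END]]` marker. `createStepCollector` is a hypothetical helper, and it assumes the marker arrives inline in the same chunk stream; whether it comes as its own chunk is not specified here, so the code buffers and handles both cases, including a marker split across two chunks.

```ts
const END = '[[END]]'

// Returns a function usable as `logCallback`. It buffers chunks and calls
// `onStep` once per completed step with that step's full raw response.
function createStepCollector(onStep: (raw: string) => void): (chunk: string) => void {
  let buffer = ''
  return (chunk) => {
    buffer += chunk
    let idx = buffer.indexOf(END)
    while (idx !== -1) {
      onStep(buffer.slice(0, idx))            // everything before the marker
      buffer = buffer.slice(idx + END.length) // keep what follows it
      idx = buffer.indexOf(END)
    }
  }
}
```

Wired in as the 4th argument, for example `createStepCollector((raw) => console.debug(raw))`, this prints one line of protocol JSON per step.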

### `onToken` (5th arg) — final-answer stream

Provide `onToken` to stream **only the final answer**, token by token, exactly as the model writes it — no JSON, no intermediate `describe`/`call` steps. This is what you wire to a chat UI.

```ts
await orchestrator.executeTask('Explain X', [], [], undefined, (t) => process.stdout.write(t))
```

It is **opt-in and costs no extra LLM call**. When `onToken` is set, the orchestrator runs a stateful filter over the streamed deltas that parses the JSON incrementally and forwards **only the characters inside the top-level `answer` field** (escapes decoded on the fly). The `describe`/`call`/`abort` steps have no `answer` key, so they emit nothing to `onToken` — the filter is self-gating across the whole chain. The only buffered part is the tiny `{"action":"finish","answer":"` envelope prefix.

Notes:
- The same final answer is still returned in `TaskResult.answer` after the chain completes — `onToken` is purely additive.
- `onToken` is **fully compatible with `enforceJsonOutput`** (and they are recommended together): the model keeps emitting strict JSON, and only the `answer` value is streamed. Protocol JSON and model reasoning can never leak into the stream.
- On the **Claude Code backend** (non-streaming), `onToken` still fires but delivers the answer in a **single emit** at the end rather than token-by-token. Same for `logCallback`.
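
The filter idea described above can be illustrated with a small state machine. This is a sketch of the technique, not the library's implementation: `createAnswerFilter` is a hypothetical name, and it assumes the stream is a bare JSON object (as with `enforceJsonOutput`), so surrounding prose or markdown fences are out of scope.

```ts
// JSON escapes that decode to a different character than the one written.
const ESCAPES: Record<string, string> = { n: '\n', t: '\t', r: '\r', b: '\b', f: '\f' }

// Feed streamed deltas in; `emit` receives only the decoded characters of
// the top-level "answer" string value.
function createAnswerFilter(emit: (text: string) => void): (delta: string) => void {
  let depth = 0                  // current object/array nesting
  let inString = false
  let escaped = false            // previous character was a backslash
  let hex: string | null = null  // digits collected for a \uXXXX escape
  let readingKey = false         // the open string is a top-level key
  let key = ''                   // most recent top-level key
  let afterColon = false         // a top-level value is expected next
  let streaming = false          // the open string is the "answer" value

  return (delta) => {
    let out = ''
    const take = (ch: string) => {
      if (streaming) out += ch
      else if (readingKey) key += ch
    }
    for (let i = 0; i < delta.length; i++) {
      const ch = delta.charAt(i)
      if (inString) {
        if (hex !== null) {
          hex += ch
          if (hex.length === 4) { take(String.fromCharCode(parseInt(hex, 16))); hex = null }
        } else if (escaped) {
          escaped = false
          if (ch === 'u') hex = ''
          else take(ESCAPES[ch] ?? ch) // \" \\ \/ decode to the character itself
        } else if (ch === '\\') escaped = true
        else if (ch === '"') { inString = false; readingKey = false; streaming = false }
        else take(ch)
      } else if (ch === '"') {
        inString = true
        if (depth === 1 && afterColon) streaming = key === 'answer'
        else if (depth === 1) { readingKey = true; key = '' }
        afterColon = false
      } else if (ch === '{' || ch === '[') { depth++; afterColon = false }
      else if (ch === '}' || ch === ']') depth--
      else if (ch === ':' && depth === 1) afterColon = true
      else if (ch === ',' && depth === 1) afterColon = false
    }
    if (out !== '') emit(out) // one emit per delta, only when there is text
  }
}
```

All state lives outside the per-delta function, so an escape sequence split across two deltas still decodes correctly, and a `call` step whose `input` happens to contain a nested `answer` key emits nothing because only depth 1 is considered.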

## Enforcing JSON output (`enforceJsonOutput`)

Some LLMs occasionally prepend prose to their JSON reply ("Sure, to answer your question, I'll first… {…}"). The orchestrator's parser tolerates this, but the preamble still leaks to `logCallback` / any UI that surfaces raw responses.

Turn on `enforceJsonOutput` and the orchestrator passes `response_format: { type: 'json_object' }` to the API. The provider will refuse to emit anything that isn't a valid JSON object — no preamble possible. It applies on every step, **including while streaming** (`onToken`): the answer is read out of the JSON `answer` field, so enforcement and live streaming compose cleanly.

Support per provider:
- **OpenAI**: native, stable on `gpt-4o`, `gpt-4o-mini`, `gpt-4-turbo`, `gpt-3.5-turbo-0125+`.
- **Anthropic (OpenAI-compat endpoint)**: accepted, mapped to their JSON mode on Claude 3.5+.
- **OpenRouter, Groq, Together**: forward the flag to the underlying model when supported.
- **Local/old providers (vLLM, Ollama, older endpoints)**: may reject the unknown field — keep the flag off in that case.

Caveat: some providers (OpenAI among them) reject JSON mode unless the word "JSON" appears somewhere in the messages. BRAIIN's system prompt already mentions it repeatedly, so this is satisfied.

## Prompt caching (`enablePromptCaching`)

The system prompt (agent descriptions + protocol) is identical across every step of a chain. Turn on `enablePromptCaching` and the orchestrator will send the system message as a content-block with `cache_control: { type: 'ephemeral' }`:

- **Anthropic (native or OpenAI-compat)**: reads the marker and caches the prefix when ≥1024 tokens (2048 on Haiku).
- **OpenAI**: ignores the marker but does automatic prefix caching at the same threshold — same savings, no side-effect.
- **Other OpenAI-compat providers**: the marker is usually ignored silently.

Rule of thumb: enable it whenever your agents/tools descriptions push the system prompt over ~1k tokens or when chains routinely exceed 5 steps.
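
The rule of thumb can be turned into a quick check. The sketch below is deliberately rough: it assumes about 4 characters per token, which is a common approximation for English text and not a real tokenizer, and it estimates the prompt size from the agent and tool description strings you supply, since the final system prompt is assembled by the library. Both function names are hypothetical.

```ts
// Very rough token estimate: ~4 characters per token (approximation only).
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4)
}

// Mirrors the rule of thumb: large prompt (> ~1k tokens) or long chains (> 5 steps).
function shouldEnablePromptCaching(descriptions: string[], typicalSteps: number): boolean {
  const approxTokens = estimateTokens(descriptions.join('\n'))
  return approxTokens > 1000 || typicalSteps > 5
}
```

Treat the result as a hint for setting `enablePromptCaching`; provider-side billing reports remain the reliable way to confirm that caching pays off.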

## Best practices

- **Start `temperature` at `0`.** The protocol is strict; creativity hurts it. Raise only if the final answer needs flair.
- **Lower `maxSteps` aggressively.** The default `50` is a ceiling. Use `10` or fewer for focused tasks so errors surface faster.
- **Make tool `description`/`input`/`output` LLM-facing, not dev-facing.** These strings ARE the prompt. Specify exact input shape, return shape (JSON? plain text?), and failure modes.
- **Keep tool outputs short.** The full output goes back into the history on the next step. Summarize or truncate oversized payloads (e.g. paginate, return counts + samples).
- **One agent per domain.** Avoid one mega-agent with 30 tools — narrow bundles help the LLM pick correctly.
- **Unique tool tags per agent.** Duplicates across different agents are fine; duplicates inside one agent are a bug.
- **Always handle `status === 'error'`.** Never assume success.
- **For long-running tasks, wire `signal`** (via `AbortController`) to cancellation UI.
- **Use `optionalPrompt` for global constraints.** Output language, forbidden topics, JSON schemas for the final answer, etc.
- **Point `serverUrl` at any OpenAI-compatible endpoint** — don't pull in a second SDK for Anthropic, OpenRouter, Ollama, vLLM, etc.
- **Stream the final answer to UIs with `onToken`** (5th arg), not `logCallback`. `logCallback` is the raw multi-step protocol for debugging; `onToken` is the clean final answer.
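
For the "keep tool outputs short" advice, a truncation helper applied at the end of `call` is often enough. A minimal sketch: `truncateOutput` is a hypothetical name and the 2000-character default is an arbitrary assumption to tune per model and task.

```ts
// Cap a tool result and tell the LLM that something was cut, so it can
// decide to ask for a narrower query instead of trusting partial data.
function truncateOutput(text: string, maxChars: number = 2000): string {
  if (text.length <= maxChars) return text
  const omitted = text.length - maxChars
  return `${text.slice(0, maxChars)}\n[truncated ${omitted} more characters]`
}
```

The explicit marker matters: silently cut JSON can look complete to the model, whereas a visible note gives it a reason to paginate or refine the call.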

## Testing

Inject a mock `LLMService` via `config.llmService` to bypass the real API:

```ts
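import { createOrchestrator } from 'braiin'
// The real interface lives at an internal path that may move between versions:
// import { LLMService } from 'braiin/dist/service/llm.service'
// so this example re-creates a minimal version locally: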
interface MockLLMService {
  ask: (sys: string, prompt: string, history: any[]) => Promise<any>
}

const mockLLM: MockLLMService = {
  ask: async () => ({
    id: 't', object: 'chat.completion', created: 0, model: 'mock',
    choices: [{ index: 0, message: { role: 'assistant',
      content: '{"action":"finish","answer":"ok"}' }, finish_reason: 'stop' }]
  })
}

const o = createOrchestrator([agent], { apiKey: '', llmService: mockLLM as any })
```

Script the `ask` function to return successive JSON responses to drive the chain through every branch you want to test.
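
A scripted mock might look like the sketch below. It reuses the `chat.completion` response shape from the example above; `completion` and `scriptedLLM` are hypothetical helpers, and replying with an `abort` action once the script runs out is a choice of this sketch so that a runaway chain ends instead of looping on the mock.

```ts
// Wrap a protocol JSON string in the chat.completion shape used above.
function completion(content: string) {
  return {
    id: 't', object: 'chat.completion', created: 0, model: 'mock',
    choices: [{ index: 0, message: { role: 'assistant', content }, finish_reason: 'stop' }]
  }
}

// Hypothetical factory: answers with each scripted reply in order.
function scriptedLLM(script: string[]) {
  let step = 0
  return {
    // Number of times `ask` has been called so far.
    calls: () => step,
    ask: async (_sys: string, _prompt: string, _history: unknown[]) => {
      const content = step < script.length
        ? script[step]
        : '{"action":"abort","reason":"mock script exhausted"}'
      step++
      return completion(content)
    }
  }
}
```

A test would pass `scriptedLLM([...]) as any` as `llmService`, with a script such as a `describe` reply, a `call` reply, then a `finish` reply, and assert on the resulting `TaskResult` and on `calls()`.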

## Common pitfalls

- **Tool returns non-string.** `call` must stringify. Returning an object silently breaks the next step's LLM input.
- **Tool description written for humans.** The LLM needs imperative schemas, not marketing copy.
- **Forgetting `status` check.** Errors surface in `answer`, not via `throw`.
- **Very long tool outputs.** They compound across steps and blow up token bills. Summarize.
- **Setting `temperature > 0.3`.** The protocol breaks quickly as temperature rises.
- **Passing `history` when you meant `toolTraces`.** Remember the signature: `executeTask(prompt, history?, toolTraces?, logCallback?, onToken?)`.
- **Confusing the two callbacks.** `logCallback` (4th) streams the raw JSON of *every* step; `onToken` (5th) streams *only* the clean final answer. For a chat UI you want `onToken`.
- **Expecting a clean answer stream from `logCallback`.** It emits the whole protocol (describe/call/finish), not just the answer — use `onToken` for that.
