# Debugg AI: MCP Server

AI-powered browser testing via the [Model Context Protocol](https://modelcontextprotocol.io). Point it at any URL (or localhost) and describe what to test. An AI agent browses your app and returns pass/fail with screenshots.

## Setup

**Requires Node.js 20.20.0 or later** (transitive requirement from `posthog-node@^5.26.0`).

Get an API key at [debugg.ai](https://debugg.ai), then add to your MCP client config:

```json
{
  "mcpServers": {
    "debugg-ai": {
      "command": "npx",
      "args": ["-y", "@debugg-ai/debugg-ai-mcp"],
      "env": {
        "DEBUGGAI_API_KEY": "your_api_key_here"
      }
    }
  }
}
```

Or with Docker:

```bash
docker run -i --rm --init -e DEBUGGAI_API_KEY=your_api_key quinnosha/debugg-ai-mcp
```

## Tools

The server exposes **8** tools: three **Browser** tools plus one **action-based** tool per managed entity. The headline tools are `check_app_in_browser` (full AI agent) and `probe_page` (lightweight no-LLM page probe). The rest (`project`, `environment`, `test_suite`, `test_case`, `executions`) each take an `action` discriminator (e.g. `{"action":"list"}`) that selects the operation. Destructive `delete` actions require confirmation (an elicitation prompt where supported, otherwise `confirm: true`).

### Browser

#### `check_app_in_browser`

Runs an AI browser agent against your app. The agent navigates, interacts, and reports back with screenshots. Localhost URLs are auto-tunneled via ngrok.

| Parameter | Type | Description |
|-----------|------|-------------|
| `description` | string **required** | What to test (natural language) |
| `url` | string **required** | Target URL; `http://localhost:3000` is auto-tunneled |
| `environmentId` | string | UUID of a specific environment |
| `credentialId` | string | UUID of a specific credential |
| `credentialRole` | string | Pick a credential by role (e.g. `admin`, `guest`) |
| `username` | string | Username for login (ephemeral, not persisted) |
| `password` | string | Password for login (ephemeral, not persisted) |
| `repoName` | string | Override auto-detected git repo name (e.g. `my-org/my-repo`) |

One focused check per call. The agent has a ~25-step internal budget; split broader suites across multiple calls.

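
As a rough illustration of what one focused call looks like on the wire, the Python sketch below builds the MCP `tools/call` JSON-RPC request for `check_app_in_browser`. The helper `build_check_request` and the sample description are hypothetical and not part of this package; most MCP clients build this envelope for you, so treat it as a way to see the argument shape rather than as required code.

```python
import json

# Optional parameter names taken from the check_app_in_browser table.
OPTIONAL_PARAMS = {
    "environmentId", "credentialId", "credentialRole",
    "username", "password", "repoName",
}


def build_check_request(description: str, url: str, request_id: int = 1, **optional: str) -> dict:
    """Hypothetical helper: build an MCP `tools/call` request for one check."""
    if not description or not url:
        raise ValueError("description and url are both required")
    unknown = set(optional) - OPTIONAL_PARAMS
    if unknown:
        raise ValueError(f"unsupported parameters: {sorted(unknown)}")
    arguments = {"description": description, "url": url, **optional}
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "check_app_in_browser", "arguments": arguments},
    }


# One focused check: a single scenario against a local dev server.
request = build_check_request(
    "Log in and confirm the dashboard loads",
    "http://localhost:3000",
    credentialRole="admin",
)
print(json.dumps(request, indent=2))
```
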
Every successful run returns a `browserSession` block alongside the screenshot: presigned S3 URLs for the captured **HAR** (full network trace) and **console log** (every JS console message). Use them to detect refetch loops, hydration errors, and other runtime issues that pass type-checks and unit tests:

```json
"browserSession": {
  "harUrl": "https://...session_18139.har?X-Amz-...",
  "consoleLogUrl": "https://...session_18139_console.json?X-Amz-...",
  "recordingUrl": "https://...session_18139_recording.webm?X-Amz-...",
  "harStatus": "downloaded",
  "consoleLogStatus": "downloaded",
  "harRedactionStatus": "redacted",
  "consoleLogRedactionStatus": "redacted"
}
```

The URLs are short-lived presigned S3 links; refetch the parent execution via `executions {action:"get", uuid}` to renew them. `harStatus` / `consoleLogStatus` distinguish `'downloaded'` (URL fetchable), `'not_available'` (page emitted nothing), and `'failed'` (capture broke). On a fresh run the URLs are commonly `null` because capture uploads asynchronously after the agent finishes; poll `executions {action:"get", uuid: executionId}` until the status reaches `'downloaded'`. Authorization / Cookie / `token`/`secret`/`api_key` headers are scrubbed server-side before the artifacts are persisted.

#### `trigger_crawl`

Fires a server-side browser-agent crawl to populate the project's knowledge graph. Localhost URLs tunnel automatically. Returns `{executionId, status, targetUrl, durationMs, outcome?, crawlSummary?, knowledgeGraph?, browserSession?}` with `knowledgeGraph.imported === true` on successful ingestion. The `browserSession` block (HAR + console-log URLs, same shape as above) is also present on completed crawls.

#### `probe_page`

**Lightweight no-LLM batch page probe.** Pass 1-20 URLs; each navigates, waits for load, and returns rendered state: screenshot + page metadata + structured console errors + network summary. No agent loop, no LLM cost, no scenario assertions.

Use it for "did I just break /settings?", multi-route smoke after a refactor, CI per-PR sweeps, and quick is-it-up checks where `check_app_in_browser`'s 60-150s agent loop is overkill.

| Parameter | Type | Description |
|-----------|------|-------------|
| `targets` | array **required** | 1-20 entries: `[{url, waitForSelector?, waitForLoadState?, timeoutMs?}]` |
| `targets[].url` | string **required** | Public URL or localhost (auto-tunneled) |
| `targets[].waitForLoadState` | enum | `'load'` (default) / `'domcontentloaded'` / `'networkidle'` |
| `targets[].waitForSelector` | string | Optional CSS selector to wait for after navigation |
| `targets[].timeoutMs` | number | Per-URL timeout, 1000-30000 (default 10000) |
| `includeHtml` | boolean | Return raw HTML in each result (default false) |
| `captureScreenshots` | boolean | Return one PNG per target (default true) |

The whole batch shares a single backend execution + browser session + tunnel, so 5 URLs in one call is dramatically faster than 5 parallel single-URL calls. A per-URL `error` field keeps the batch resilient: a single failed target doesn't fail the others.

**`networkSummary` aggregation key is `origin + pathname`.** Refetch loops (`?n=0..4` repeatedly hitting the same endpoint) collapse into a single entry with the count, so `/api/poll` showing up with `count: 47` is the actionable "infinite refetch loop" signal.

Performance budget: <10s for 1 URL, <25s for 20. A dead localhost port returns `LocalServerUnreachable` in <2s without burning a workflow execution.

### `project`

| Action | Params | Result |
|--------|--------|--------|
| `get` | `{uuid}` | Curated project detail |
| `list` | `{q?, page?, pageSize?}` | Paginated summaries |
| `create` | `{name, platform, (teamUuid\|teamName), (repoUuid\|repoName)}` | Created project |

Team and repo resolve by **either** uuid **or** name (case-insensitive exact match; `NotFound` if none, `AmbiguousMatch` if multiple).

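
The name-resolution rule above (case-insensitive exact match, `NotFound` on zero matches, `AmbiguousMatch` on more than one) can be sketched in a few lines of Python. This is only an illustration of the rule as stated, not the server's actual implementation: the exception classes mirror the error names, and the candidate dict shape (`uuid`, `name`) is an assumption.

```python
class NotFound(Exception):
    """No candidate has the requested name."""


class AmbiguousMatch(Exception):
    """More than one candidate has the requested name."""


def resolve_by_name(name: str, candidates: list[dict]) -> dict:
    """Return the single candidate whose name equals `name`, ignoring case."""
    wanted = name.casefold()
    # Exact match only: "platform" does not match "platform-tools".
    matches = [c for c in candidates if c["name"].casefold() == wanted]
    if not matches:
        raise NotFound(name)
    if len(matches) > 1:
        raise AmbiguousMatch(f"{name}: {[c['uuid'] for c in matches]}")
    return matches[0]


# Hypothetical sample data for illustration.
teams = [
    {"uuid": "t-1", "name": "Platform"},
    {"uuid": "t-2", "name": "platform-tools"},
    {"uuid": "t-3", "name": "Growth"},
    {"uuid": "t-4", "name": "growth"},
]
print(resolve_by_name("PLATFORM", teams)["uuid"])
```
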
There is **no** `update`/`delete`; rename or delete a project from the DebuggAI web app.

### `environment`

| Action | Params | Result |
|--------|--------|--------|
| `get` | `{uuid, projectUuid?}` | Env with credentials inlined (passwords never returned) |
| `list` | `{projectUuid?, q?, page?, pageSize?}` | Paginated envs, each with a credentials array |
| `create` | `{name, url, description?, projectUuid?, credentials?}` | Created env (optionally seeds credentials) |
| `update` | `{uuid, name?, url?, description?, addCredentials?, updateCredentials?, removeCredentialIds?}` | Patched env; credential ops run **remove → update → add** |
| `delete` | `{uuid, projectUuid?, confirm?}` | Deletes env (cascades credentials); **requires confirmation** |

`projectUuid` auto-resolves from the git repo when omitted. Per-cred failures surface in `credentialWarnings[]` without blocking the env op.

### `test_suite`

| Action | Params | Result |
|--------|--------|--------|
| `list` | `{projectUuid\|projectName, search?, page?, pageSize?}` | Paginated suites with status + pass rate |
| `create` | `{name, description, projectUuid\|projectName}` | Created suite |
| `run` | `{suiteUuid\|(suiteName+project), targetUrl?}` | Triggers all tests async |
| `results` | `{suiteUuid\|(suiteName+project)}` | Suite + per-test outcomes |
| `delete` | `{suiteUuid\|(suiteName+project), confirm?}` | Soft-delete; **requires confirmation** |

### `test_case`

| Action | Params | Result |
|--------|--------|--------|
| `create` | `{name, description, agentTaskDescription, suiteUuid\|(suiteName+project), relativeUrl?, maxSteps?}` | Created test case (not auto-run) |
| `update` | `{testUuid, name?, description?, agentTaskDescription?}` | Patched test case |
| `delete` | `{testUuid, confirm?}` | Soft-delete; **requires confirmation** |

### `executions`

| Action | Params | Result |
|--------|--------|--------|
| `get` | `{uuid}` | Full detail (`nodeExecutions` + state + errorInfo) + screenshot/gif artifacts |
| `list` | `{status?, projectUuid?, page?, pageSize?}` | Paginated summaries |

404 from the backend surfaces as `isError: true` with `{error: 'NotFound', message, uuid}`. Credentials are **always** returned without passwords.

### Pagination

Every filter-mode response is paginated. Response shape:

```json
{
  "filter": { "...echoed query params..." },
  "pageInfo": { "page": 1, "pageSize": 20, "totalCount": 47, "totalPages": 3, "hasMore": true },
  "": [ ... ]
}
```

Pass optional `page` (1-indexed, default 1) and `pageSize` (default 20, max 200; oversized values are clamped). No response is ever silently truncated.

## Resources

Alongside tools, the server exposes the read-only entities as MCP **resources** so clients can browse and @-mention them as context:

| URI | What |
|---|---|
| `debugg-ai://projects` | All projects (first page) |
| `debugg-ai://environments` | Environments for the auto-detected project |
| `debugg-ai://executions` | Recent executions (first page) |
| `debugg-ai://project/{uuid}` | One project, full detail |
| `debugg-ai://environment/{uuid}` | One environment (credentials inline, passwords redacted) |
| `debugg-ai://execution/{uuid}` | One execution, full node detail + artifact links |

Reads dispatch to the same handlers as the `project` / `environment` / `executions` tools, so the data and auth are identical. Resources are additive: clients without resource support keep using the tools.

### Security invariants

- Passwords are write-only. They never appear in any response body from any tool.
- Tunnel URLs (`*.ngrok.debugg.ai`) are stripped from all browser-agent responses, including agent-authored text.
- 404s from the backend surface as `isError: true` with `{error: 'NotFound', ...}`, never as thrown exceptions.
- Missing `DEBUGGAI_API_KEY` surfaces as a structured tool error on first invocation; the server still registers and lists tools normally.

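
Because 404s arrive as structured results rather than exceptions, client code has to check `isError` itself. The Python sketch below shows one way to do that. It assumes the `{error, message, uuid}` payload is serialized as JSON in the first text content item of the MCP tool result; the README does not confirm that encoding, and `parse_tool_result` is a hypothetical helper.

```python
import json


def parse_tool_result(result: dict) -> dict:
    """Decode a tool result, raising LookupError on a structured NotFound."""
    texts = [c["text"] for c in result.get("content", []) if c.get("type") == "text"]
    try:
        # Assumption: the payload is JSON in the first text content item.
        payload = json.loads(texts[0]) if texts else {}
    except json.JSONDecodeError:
        payload = {"message": texts[0]}
    if result.get("isError"):
        if payload.get("error") == "NotFound":
            raise LookupError(f"{payload.get('uuid')}: {payload.get('message')}")
        raise RuntimeError(payload.get("message", "tool call failed"))
    return payload


# Hypothetical results for illustration.
ok_result = {
    "content": [{"type": "text", "text": json.dumps({"uuid": "abc", "status": "completed"})}],
}
not_found_result = {
    "isError": True,
    "content": [{"type": "text", "text": json.dumps(
        {"error": "NotFound", "message": "Execution not found", "uuid": "abc"}
    )}],
}

print(parse_tool_result(ok_result)["status"])
```

Mapping `NotFound` to `LookupError` is a design choice for this sketch; any exception type your client already uses for missing resources would do.
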
## Migration to v3.0.0 (action-based tools)

v3 consolidated the 20 per-verb tools into 8 action-based tools. Old tool → new `tool {action}`:

| Removed | Replacement |
|---------|-------------|
| `search_projects` | `project {action:"get"}` / `project {action:"list"}` |
| `create_project` | `project {action:"create"}` |
| `update_project`, `delete_project` | **Dropped**; use the DebuggAI web app |
| `search_environments` | `environment {action:"get"}` / `{action:"list"}` |
| `create_environment` / `update_environment` / `delete_environment` | `environment {action:"create"\|"update"\|"delete"}` |
| `create_test_suite` / `search_test_suites` / `run_test_suite` / `get_test_suite_results` / `delete_test_suite` | `test_suite {action:"create"\|"list"\|"run"\|"results"\|"delete"}` |
| `create_test_case` / `update_test_case` / `delete_test_case` | `test_case {action:"create"\|"update"\|"delete"}` |
| `search_executions` | `executions {action:"get"\|"list"}` |
| `trigger_crawl` `headless` param | **Dropped**; always headless |

`delete` actions now require confirmation (elicitation prompt, or `confirm: true`). Clients pick up the new surface on MCP restart.

## Migration from v1.x (breaking change in v2.0.0)

v2 collapsed a 22-tool surface to 11.
Old-tool → new-tool mapping:

| Removed | Replacement |
|---------|-------------|
| `list_projects`, `get_project` | `search_projects` (uuid mode vs filter mode) |
| `list_environments`, `get_environment` | `search_environments` |
| `list_credentials`, `get_credential` | `search_environments` (credentials inline on each env) |
| `create_credential` | `create_environment({credentials: [...]})` seed, or `update_environment({addCredentials: [...]})` |
| `update_credential` | `update_environment({updateCredentials: [{uuid, ...patch}]})` |
| `delete_credential` | `update_environment({removeCredentialIds: [uuid]})` |
| `list_teams`, `list_repos` | `create_project({teamName, repoName})` (name resolution with ambiguity handling) |
| `list_executions`, `get_execution` | `search_executions` |
| `cancel_execution` | **Dropped**; backend spin-down is automatic |

Response-shape changes: the bare `count` field on list responses is gone; use `pageInfo.totalCount`.

## Configuration

| Env var | Required | Purpose |
|---|---|---|
| `DEBUGGAI_API_KEY` | yes | Backend API key. Aliases: `DEBUGGAI_API_TOKEN`, `DEBUGGAI_JWT_TOKEN`. |
| `DEBUGGAI_API_URL` | no | Backend base URL. Defaults to `https://api.debugg.ai`. |
| `DEBUGGAI_TOKEN_TYPE` | no | `token` (default) or `bearer`. |
| `LOG_LEVEL` | no | `error` / `warn` / `info` (default) / `debug`. |
| `POSTHOG_API_KEY` | no | Override the embedded telemetry project key (e.g. private fork). |
| `DEBUGGAI_TELEMETRY_DISABLED` | no | Set to `1` / `true` / `yes` / `on` to disable telemetry entirely. |

```bash
DEBUGGAI_API_KEY=your_api_key
```

## Remote / HTTP transport (optional)

By default the server speaks **stdio** (local `npx`).
It can instead run as a hosted, multi-user remote MCP over **stateless Streamable HTTP** + OAuth:

```bash
DEBUGGAI_MCP_TRANSPORT=http PORT=3000 DEBUGGAI_TOKEN_TYPE=bearer npx -y @debugg-ai/debugg-ai-mcp@latest
```

It is an OAuth **Resource Server**: every `POST /mcp` needs `Authorization: Bearer `; missing/invalid tokens get a `401` with a `WWW-Authenticate` pointing at the RFC 9728 metadata, and clients run the OAuth flow against the advertised authorization server. The bearer is request-scoped; `api.debugg.ai` validates it.

| Endpoint | Purpose |
|---|---|
| `POST /mcp` | MCP Streamable HTTP (bearer-protected) |
| `GET /.well-known/oauth-protected-resource` | RFC 9728 metadata (authorization server discovery) |
| `GET /health` | Load-balancer / ECS health check |

| Env var | Default | Purpose |
|---|---|---|
| `DEBUGGAI_MCP_TRANSPORT` | `stdio` | Set to `http` for the remote transport |
| `PORT` | `3000` | HTTP listen port |
| `DEBUGGAI_MCP_PUBLIC_URL` | `https://mcp.debugg.ai` | This server's public resource URL (RFC 9728 `resource`) |
| `DEBUGGAI_OAUTH_ISSUER` | `https://auth.debugg.ai` | Authorization server advertised to clients |
| `DEBUGGAI_TOKEN_TYPE` | `token` | Set to `bearer` so OAuth tokens forward as `Authorization: Bearer` |

stdio installs need none of these.

## Telemetry

The MCP server ships with telemetry enabled by default: an embedded write-only PostHog project key (`phc_*`) so the team can observe cache hit rates, poll cadence, tunnel reliability, and other operational metrics across the install base.

Captured events:

| Event | When |
|---|---|
| `tool.executed` / `tool.failed` | Per tool call |
| `workflow.executed` | Per browser-agent execution (carries `pollCount`, `durationMs`, `finalIntervalMs`) |
| `tunnel.provisioned` / `tunnel.provision_retry` / `tunnel.stopped` | Per tunnel lifecycle event |
| `template.lookup` / `project.lookup` | Cache hit/miss with `durationMs` on cold-call |

Privacy posture:

- The distinct ID is `SHA-256(api_key).slice(0, 16)`: never the raw key, no PII.
- `phc_*` keys are write-only by PostHog convention; safe to embed in source.
- Set `DEBUGGAI_TELEMETRY_DISABLED=1` to opt out entirely (resolves to a no-op provider; no events leave the process).

The active mode is logged at boot:

```
Telemetry enabled (PostHog, DebuggAI default project). Set DEBUGGAI_TELEMETRY_DISABLED=1 to opt out.
Telemetry enabled (PostHog, custom POSTHOG_API_KEY)
Telemetry disabled (DEBUGGAI_TELEMETRY_DISABLED is set)
```

## Local Development

```bash
npm install
npm run build
npm run test:e2e   # real end-to-end evals against the backend
```

The eval suite spawns the built MCP server as a subprocess, exercises every tool against a real backend, and writes per-flow artifacts to `scripts/evals/artifacts//`. See `scripts/evals/flows/` for the individual scenarios.

### MCP registration: `debugg-ai-local` vs `debugg-ai`

This repo ships a `.mcp.json` that registers a **project-scoped** server named `debugg-ai-local` pointing at `node dist/index.js`, the freshly built local code. It only activates when Claude Code's working directory is this repo.

Your other projects should use the **user-scoped** `debugg-ai` registration that pulls from the published npm package:

```bash
npm run mcp:global   # registers debugg-ai in ~/.claude.json to npx -y @debugg-ai/debugg-ai-mcp
```

After editing code here, run `npm run mcp:local` (which just rebuilds) so the next invocation of `debugg-ai-local` picks up your changes.

## Links

[Dashboard](https://app.debugg.ai) · [Docs](https://debugg.ai/docs) · [Issues](https://github.com/debugg-ai/debugg-ai-mcp/issues) · [Discord](https://debugg.ai/discord)

---

Apache-2.0 License © 2025 DebuggAI