---
name: cost-aware-execution
description: |
  Estimate token / API cost BEFORE executing expensive operations and ask
  for confirmation if estimated cost exceeds a threshold. Stops silent
  bill burn from "scan whole codebase" or "summarize 200 files" requests.
  Pairs with cost-watchdog (which monitors spend) and learning-loop
  (which records cost-vs-value for future calibration).

  Use when:
  - Operation might iterate over many files / many requests
  - Planning a multi-agent dispatch (each agent = $)
  - Web search / scrape / large fetch
  - Recursive operations on large directories
  - Loops that LLM-process per-item without batching
---

# Cost-Aware Execution — No Silent Burns

LLM tokens, API calls, and cloud compute all have meters. This skill
forces a brief estimate-and-confirm step before operations that could
burn through budget invisibly. The goal is not to be cheap — it's to be
deliberate.

## When to Estimate (cost gates)

```
ALWAYS estimate before:
  - Iterating LLM calls over a list (N items × cost-per-call)
  - "Summarize all", "review all", "scan all" patterns
  - Recursive directory operations involving content reads
  - Multi-agent parallel dispatches with > 3 voters
  - External API operations with per-call pricing (search, scrape, image gen)

ESTIMATE without confirm gate when:
  - Estimated cost < soft threshold (default $0.50)

CONFIRM REQUIRED when:
  - Estimated cost ≥ soft threshold ($0.50) and ≤ hard threshold ($10)

HARD STOP when:
  - Estimated cost > hard threshold (default $10) without explicit user override
  - Loop has no upper bound on iterations
```
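
The gates above reduce to a small decision function. The following is a minimal sketch, not a reference implementation: `Gate`, `classify_gate`, and the `max_iterations=None` convention for an unbounded loop are hypothetical names chosen for illustration, and treating an unbounded loop as a stop even with an override is an assumption.

```python
from enum import Enum
from typing import Optional

SOFT_THRESHOLD_USD = 0.50   # default soft threshold from the gates above
HARD_THRESHOLD_USD = 10.00  # default hard threshold from the gates above


class Gate(Enum):
    PROCEED = "proceed"      # report the estimate, no confirmation needed
    CONFIRM = "confirm"      # ask the user before running
    HARD_STOP = "hard_stop"  # refuse unless the user explicitly overrides


def classify_gate(
    estimated_usd: float,
    max_iterations: Optional[int],
    user_override: bool = False,
    soft: float = SOFT_THRESHOLD_USD,
    hard: float = HARD_THRESHOLD_USD,
) -> Gate:
    """Map an estimate to a gate. max_iterations=None means the loop is unbounded."""
    if max_iterations is None:
        # Assumption: an unbounded loop cannot be estimated, so it always stops.
        return Gate.HARD_STOP
    if estimated_usd > hard:
        return Gate.PROCEED if user_override else Gate.HARD_STOP
    if estimated_usd >= soft:
        return Gate.CONFIRM
    return Gate.PROCEED
```

For example, `classify_gate(0.85, 47)` lands in the confirm gate, while `classify_gate(14.20, 600)` is a hard stop until the caller passes `user_override=True`.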

## Estimation Workflow

```
STEP 1 — IDENTIFY the cost driver
  - LLM calls? → tokens × $/1M tokens (input and output priced separately)
  - External API? → calls × $/call
  - Compute? → seconds × $/sec
  - Egress? → GB × $/GB

STEP 2 — COUNT the units
  - If iterating: how many items?
  - If LLM: estimate tokens per call (input + output)
  - If recursive: walk the tree first, count files (don't read yet)

STEP 3 — COMPUTE expected cost
  Formula: units × per-unit price = $estimate

  Use cost-watchdog's pricing heuristics. When uncertain, multiply by 1.5
  for safety margin.

STEP 4 — COMPARE to threshold
  If user previously set a budget for this session, use it.
  Otherwise defaults:
    Below soft ($0.50)  → "FYI, this will cost approx $X. Proceeding."
    Soft: $0.50         → "This is approx $X. Confirm before proceeding."
    Hard: $10.00        → halt; proceed only after an explicit user override.

STEP 5 — RECOMMEND cheaper alternatives if applicable
  - Use Haiku instead of Sonnet for simple summaries (~3x cheaper at the cheatsheet prices)
  - Batch related items into one LLM call
  - Cache prior responses if recurring
  - Sample N items instead of processing all
  - Use grep/regex first to filter before LLM
```
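
Steps 2, 3, and 5 are plain arithmetic, so they are easy to sketch in code. This is an illustrative sketch only: the `Estimate` dataclass and the `sampled` helper are hypothetical names, and the per-unit price must come from whatever pricing source the session uses.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    units: float            # STEP 2: number of tokens, calls, seconds, or GB
    unit_price_usd: float   # STEP 1: price of one unit of the cost driver
    uncertain: bool = False  # set when the unit count or price is a guess

    @property
    def usd(self) -> float:
        # STEP 3: units × per-unit price, with the 1.5x margin when uncertain
        margin = 1.5 if self.uncertain else 1.0
        return self.units * self.unit_price_usd * margin


def sampled(est: Estimate, sample_units: float) -> Estimate:
    """STEP 5 alternative: process a sample instead of every item."""
    return Estimate(min(sample_units, est.units), est.unit_price_usd, est.uncertain)


full = Estimate(units=47, unit_price_usd=0.018)  # 47 file reviews
print(f"full run: ${full.usd:.2f}")
print(f"8-file sample: ${sampled(full, 8).usd:.2f}")
```

Keeping the margin inside the estimate, rather than applying it at the comparison step, means every number shown to the user already includes it.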

## Cost Estimation Cheatsheet

```
LLM call cost (rough, per 1M tokens, verify current):
  Sonnet 4.5:  in $3 / out $15
  Haiku 4.5:   in $1 / out $5
  GPT-4o:      in $2.5 / out $10
  GPT-4o-mini: in $0.15 / out $0.60
  Embedding:   ~$0.10/M (order of magnitude; varies by provider and model)

Per-call quick math (avg case):
  Code review of 1 file (~2k input, 500 output):
    Sonnet: ~$0.013
    Haiku:  ~$0.005
  Whole-codebase summary (50 files, 100k input, 5k output):
    Sonnet: ~$0.38
    Haiku:  ~$0.13
  Multi-agent vote (4 agents × ~$0.05 each):
    ~$0.20

External APIs:
  Exa search:        $0.005 / query
  Context7 lookup:   typically free tier
  Z.ai web-search:   varies by plan
  Image generation:  $0.04 - $0.10 / image (Stable Diffusion-class)

Cloud compute (per hour, on-demand):
  EC2 t3.medium:     ~$0.04/hr
  EC2 m6i.4xlarge:   ~$0.77/hr
  GPU g5.xlarge:     ~$1.00/hr
```
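
The per-call quick math above follows directly from the per-1M-token prices. Here is a small sketch of that calculation; the dictionary keys are informal labels rather than real API model identifiers, and the prices are the rough cheatsheet figures, which should be verified before relying on them.

```python
# USD per 1M tokens as (input, output), copied from the cheatsheet above.
# Rough figures only; verify against current provider pricing.
PRICES_PER_MTOK = {
    "sonnet-4.5": (3.00, 15.00),
    "haiku-4.5": (1.00, 5.00),
    "gpt-4o": (2.50, 10.00),
    "gpt-4o-mini": (0.15, 0.60),
}


def llm_call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one call: tokens × price per token, for each direction."""
    price_in, price_out = PRICES_PER_MTOK[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


review = llm_call_cost("sonnet-4.5", 2_000, 500)      # one-file code review
summary = llm_call_cost("haiku-4.5", 100_000, 5_000)  # whole-codebase summary
print(f"review: ${review:.4f}, summary: ${summary:.3f}")
# prints "review: $0.0135, summary: $0.125"
```

Output tokens cost five times as much as input tokens for the Anthropic models listed, so a short prompt that produces long output can cost more than the input size suggests.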

## Output Format

Below soft threshold:
```
💰 EST: ~$0.08 — proceeding (N=12 items, model=Haiku)
```

Above soft threshold:
```
💰 ESTIMATE: $0.85
   Driver:  47 file reviews × Sonnet ($0.018 each)
   Cheaper alts:
     - Use Haiku → est $0.30  (~65% cheaper, suitable for first pass)
     - Filter to test/ files only (8 files) → est $0.14
   Proceed with original plan? [y / haiku / filter / abort]
```

Above hard threshold:
```
🛑 HALT — ESTIMATE: $14.20
   This exceeds the hard threshold ($10). Explicit confirmation required.

   Driver:  3 voters × 200 files × Sonnet
   Recommendations to reduce:
     1. Sample 50 files first (statistical confidence): $3.55
     2. Use Haiku for first 2 voters: est $7.90
     3. Run on changed files only (git diff): est $0.40

   Reply "yes proceed at $14.20" to override, or pick option above.
```

## Token Counting Tips

```
Quick eyeball:
  - 1 token ≈ 0.75 English words ≈ 4 characters
  - 1 page of code ≈ 600-1000 tokens
  - A typical SKILL.md file ≈ 2k-3k tokens
  - A 10k LOC project ≈ 100k-150k tokens to fully ingest

When budgets are tight, count exactly: tiktoken for OpenAI models,
Anthropic's token-counting API for Claude models.
```
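
The 4-characters-per-token rule makes it possible to size a directory before reading any file contents, which is the "walk the tree first" step in the workflow. The sketch below uses only file metadata. The function name `estimate_tokens_in_tree` and the choice to skip hidden directories are assumptions for illustration, and the result is an eyeball figure, not a tokenizer count.

```python
import os
from typing import Tuple


def estimate_tokens_in_tree(root: str, chars_per_token: float = 4.0) -> Tuple[int, int]:
    """Return (file_count, estimated_tokens) using file sizes only.

    Walks the tree and reads metadata, never file contents. Treats one
    byte as one character, which is only exact for ASCII text.
    """
    file_count = 0
    total_bytes = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Assumption: skip hidden directories such as .git; adjust as needed.
        dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # avoid counting link targets or broken links
            file_count += 1
            total_bytes += os.path.getsize(path)
    return file_count, round(total_bytes / chars_per_token)
```

A tree holding about 400 kB of source comes out near 100k tokens, which at the Sonnet input price in the cheatsheet is roughly $0.30 before any output tokens.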

## Anti-Patterns

- Estimating after running (defeats the point — it's a pre-check)
- Always saying "yes proceed" to skip the gate (the habit of checking atrophies)
- Estimating only in dollars when the real trade-off is engineer time
  (sometimes $5 of LLM spend saves an hour of human work, which is a bargain)
- Hard-stopping on legitimately useful work because it's "expensive"
  (the gate is for awareness, not denial)
- Ignoring caching opportunities ("we'll cache later")
- Per-item LLM calls when batch would work fine

## Integration

- Auto-invoked when proactive-mode plans iteration over large collections
- Pairs with `cost-watchdog`: this skill prevents new burns;
  cost-watchdog catches existing waste
- Feeds `learning-loop`: track when estimates were accurate vs not, to
  improve future calibration per-project
