# Executable Feedback Loop Guide

Practical guide for implementing the execute-before-return pattern in code-generating agents. Based on REF-013 MetaGPT research findings.

## Overview

The executable feedback loop ensures code-generating agents **test their output before returning it**. Research shows this pattern yields +4.2% improvement on HumanEval benchmarks and reduces human revision cost by 63% (from 2.25 revision cycles down to 0.83).

### Core Principle

```
Generate code → Execute tests → Pass? → Return to user
                              → Fail? → Analyze → Fix → Re-execute
```

Never return untested code. Every code artifact must pass through the feedback loop before delivery.

## Quick Start

### Minimal Feedback Loop

For agents generating code, the minimum viable loop:

```yaml
# 1. Generate code
code_artifact:
  path: "src/utils/validate.ts"
  language: typescript
  code_type: new_function

# 2. Generate tests
test_files:
  - "test/unit/utils/validate.test.ts"

# 3. Execute
execution_config:
  test_framework: jest
  test_command: "npx jest test/unit/utils/validate.test.ts"

# 4. Retry on failure (max 3 attempts)
retry_policy:
  max_attempts: 3
  escalation_on_max: true
```

### Integration with Agent Workflow

```
Agent receives task
    │
    ├─ 1. Check debug memory for similar past work
    │     └─ Load patterns from .aiwg/ralph/debug-memory/
    │
    ├─ 2. Generate code
    │     └─ Apply learnings from debug memory
    │
    ├─ 3. Generate tests (if not present)
    │     ├─ Happy path tests
    │     ├─ Edge case tests
    │     └─ Error handling tests
    │
    ├─ 4. Execute tests
    │     ├─ Capture all output
    │     └─ Record in debug memory
    │
    ├─ 5. Analyze results
    │     ├─ PASS → Record success, return code
    │     └─ FAIL → Analyze, fix, re-execute (up to max_attempts)
    │
    └─ 6. Finalize
          ├─ Update debug memory with learnings
          └─ Return code (or escalate if max attempts reached)
```

## Phase-by-Phase Walkthrough

### Phase 1: Pre-Generation (Check Debug Memory)

Before writing code, check for relevant past experience:

```yaml
pre_generation:
  check_history: true
  lookback_window: 10  # Recent executions
  patterns_to_check:
    - similar_file_edits
    - same_test_failures
    - recurring_error_types
  apply_learnings: true
```

**What to look for:**
- Has this file been modified before? What went wrong?
- Are there known patterns for this type of code?
- What error types are common in this module?

**Example:**
```
Debug memory shows: src/auth/*.ts has had 3 past sessions
  - Pattern: "Missing null check" occurred 2 times
  - Fix template: "Add null/undefined guard at function entry"
  - Preemptive action: Include null checks in initial generation
```

### Phase 2: Code Generation

Generate code with awareness of past failures:

```markdown
Agent Thought (Goal): Generate validateInput function for user registration
Agent Thought (Extraction): Debug memory shows null-check pattern for this module
Agent Thought (Reasoning): I'll include null guards upfront to avoid known failure
```

### Phase 3: Test Generation

Generate tests based on code type:

| Code Type | Required Tests |
|-----------|---------------|
| New function | Happy path + edge cases + error handling |
| Bug fix | Regression test for the specific bug |
| Refactor | All existing tests must still pass |
| API endpoint | Integration + error cases + validation |

**Coverage Requirements:**

| Code Type | Minimum Coverage |
|-----------|-----------------|
| New function | 80% |
| Bug fix | 100% of fix |
| Refactor | Match original |
| API endpoint | 90% |

### Phase 4: Test Execution

Execute tests and capture results:

```yaml
execution:
  command: "npx jest test/unit/utils/validate.test.ts --verbose"
  timeout: 120s
  capture:
    - stdout
    - stderr
    - exit_code
    - coverage_report
```

**Record everything** — full output is needed for failure analysis.

### Phase 5: Failure Analysis

When tests fail, perform structured analysis:

```yaml
failure_analysis:
  - test: "test_null_input"
    error_type: "TypeError"
    error_message: "Cannot read property 'length' of null"
    stack_trace_snippet: "at validateInput (src/utils/validate.ts:42)"
    root_cause: "Missing null check in validateInput()"
    fix_strategy: "Add null/undefined guard at line 42"
    confidence: 0.95
```

**Analysis protocol:**

1. **Parse error message** — What type of error? Where did it occur?
2. **Check debug memory** — Has this pattern occurred before?
3. **Identify root cause** — Why did the test fail?
4. **Design fix** — What specific change will resolve it?
5. **Assess confidence** — How sure are we this fix is correct?

**Do NOT:**
- Retry with random changes
- Skip failure analysis
- Apply fixes without understanding root cause
- Ignore patterns from debug memory

### Phase 6: Fix and Verify

Apply targeted fix and re-execute:

```yaml
fix_applied:
  description: "Added null check: if (!input) return { valid: false }"
  diff_summary: "+3/-0 lines"
  files_modified: ["src/utils/validate.ts"]

verification:
  command: "npx jest test/unit/utils/validate.test.ts --verbose"
  result: all_passing
  regression_check: no_new_failures
```

**Regression guard:** If previously passing tests start failing after a fix, ABORT. The fix introduced a regression.

### Phase 7: Completion or Escalation

**On success:**
```yaml
completion:
  status: passed
  attempts_used: 2
  coverage_achieved: 92%
  debug_memory_updated: true
  learnings_recorded:
    - pattern: "Null check at module boundary"
      fix_template: "if (!input) return default"
```

**On max attempts reached:**
```yaml
escalation:
  status: escalated
  attempts_used: 3
  include_in_report:
    - original_code
    - all_test_results
    - failure_analyses
    - fix_attempts
    - debug_memory_summary
  notification:
    channel: issue_comment
    message: |
      ## Execution Feedback Escalation

      **File**: src/utils/validate.ts
      **Attempts**: 3/3

      ### Failures
      [Summary of persistent failures]

      ### Analysis
      [Root cause analysis across attempts]

      ### Attempted Fixes
      [What was tried and why it didn't work]

      **Human review required**
```

## Debug Memory

### Structure

Debug memory persists across sessions in `.aiwg/ralph/debug-memory/`:

```
.aiwg/ralph/debug-memory/
├── session-abc123.yaml      # Individual session records
├── session-def456.yaml
└── patterns/
    └── learned-patterns.yaml # Cross-session learnings
```

### Session Record

Each session records the full execution history:

```yaml
session_id: "abc123"
file_path: "src/utils/validate.ts"
status: passed
created_at: "2026-01-25T10:00:00Z"

executions:
  - attempt: 1
    test_results:
      passed: 6
      failed: 2
    failures:
      - test: "test_null_input"
        error_type: "TypeError"
        root_cause: "Missing null check"
    fix_applied: "Added null guard"

  - attempt: 2
    test_results:
      passed: 8
      failed: 0

learnings:
  patterns_identified:
    - pattern: "Null check missing at module boundary"
      frequency: 1
      fix_template: "Add null/undefined guard at function entry"
```

### Cross-Session Learning

Agents should query debug memory before generating code:

```yaml
pre_generation_query:
  file: "src/utils/validate.ts"
  module: "src/utils/*"

  results:
    past_sessions: 3
    common_patterns:
      - "Null check missing" (frequency: 2)
      - "Edge case not handled" (frequency: 1)
    recommended_preemptive_actions:
      - "Include null/undefined guards"
      - "Add edge case tests for empty string and whitespace"
```

## Integration with Agent Loop

When the executable feedback loop runs inside an agent loop:

```yaml
ralph_integration:
  # Every Al iteration includes code execution
  execution_gate:
    require_passing_tests: true
    allow_skip: false

  # Debug memory persists across Al iterations
  debug_memory:
    persist_per_iteration: true
    cross_iteration_learning: true

  # Test pass rate contributes to Al progress metric
  progress_metric:
    include_test_pass_rate: true
    weight: 0.3
```

**Al + Executable Feedback flow:**

```
Al Iteration 1:
  ├─ Generate code
  ├─ Run executable feedback loop (up to 3 attempts)
  ├─ Tests pass? → Al marks progress
  └─ Tests fail after 3 attempts? → Al escalates

Al Iteration 2:
  ├─ Load debug memory from iteration 1
  ├─ Generate improved code (using learnings)
  ├─ Run executable feedback loop
  └─ Continue...
```

## Escalation Scenarios

### Scenario 1: Simple Fix (Attempt 2 Success)

```
Attempt 1: 6/8 tests pass
  Analysis: Missing null check → Fix: Add guard clause
Attempt 2: 8/8 tests pass ✓
  Result: Return code to user
```

### Scenario 2: Complex Bug (Escalation After 3 Attempts)

```
Attempt 1: 4/10 tests pass
  Analysis: Race condition in async handler → Fix: Add mutex
Attempt 2: 6/10 tests pass (improved but not fixed)
  Analysis: Mutex scope too narrow → Fix: Widen lock scope
Attempt 3: 7/10 tests pass (still failing)
  Analysis: Underlying architecture issue
  Result: ESCALATE to human with full context
```

### Scenario 3: Regression Detected (Immediate Abort)

```
Attempt 1: 8/10 tests pass
  Analysis: Edge case not handled → Fix: Add boundary check
Attempt 2: 7/10 tests pass (REGRESSION: test_basic_flow now failing)
  Result: ABORT — fix introduced regression
  Action: Revert to attempt 1 code, escalate
```

## Metrics Dashboard

Track these metrics to monitor feedback loop effectiveness:

| Metric | Target | Current | Status |
|--------|--------|---------|--------|
| First-attempt pass rate | >70% | — | — |
| Average attempts to pass | <2.0 | — | — |
| Escalation rate | <10% | — | — |
| Debug memory reuse rate | >30% | — | — |
| Coverage met rate | >90% | — | — |

### Metric Definitions

- **First-attempt pass rate**: % of code that passes all tests on first try
- **Average attempts to pass**: Mean attempts before success (excluding escalations)
- **Escalation rate**: % of workflows that exhaust all attempts
- **Debug memory reuse rate**: % of sessions that benefit from past learnings
- **Coverage met rate**: % of code meeting minimum coverage requirements

## Troubleshooting

### "Tests keep failing with the same error"

1. Check if the root cause analysis is correct
2. Review debug memory for similar patterns
3. Verify the fix actually addresses the root cause (not a symptom)
4. If 3 attempts fail: escalate — the issue may be architectural

### "Coverage requirement not met"

1. Check if test generation is comprehensive enough
2. Verify coverage tool is configured correctly
3. Add tests for uncovered branches/paths
4. For refactors: compare against original coverage baseline

### "Regression detected after fix"

1. ABORT immediately — do not attempt further fixes
2. Revert to pre-fix code state
3. Analyze what the fix broke
4. Escalate with both the original failure and regression context

### "Debug memory not finding relevant patterns"

1. Check file path matching (exact vs module-level)
2. Widen search to module or directory level
3. Check error type matching (exact vs category)
4. Debug memory may not have enough history yet

## Anti-Patterns

### 1. Skipping Execution

```
# WRONG — returning code without testing
Generate code → Return to user

# RIGHT — always execute before return
Generate code → Execute tests → Verify → Return to user
```

### 2. Random Retry

```
# WRONG — no analysis before retry
Test failed → Change something random → Retry

# RIGHT — structured analysis then targeted fix
Test failed → Analyze root cause → Design targeted fix → Retry
```

### 3. Ignoring Debug Memory

```
# WRONG — starting fresh every time
Generate code → Hit same null-check bug → Fix → Next session → Same bug

# RIGHT — learn from history
Check debug memory → Known null-check pattern → Include guard → Tests pass first try
```

### 4. Exceeding Retry Limit

```
# WRONG — infinite retries
max_attempts: 999  # Will waste tokens on unfixable issues

# RIGHT — bounded retries with escalation
max_attempts: 3
escalation_on_max: true  # Human takes over
```

## Schema Reference

The executable feedback loop conforms to:
- `@$AIWG_ROOT/agentic/code/frameworks/sdlc-complete/schemas/flows/executable-feedback.yaml` — Workflow schema
- `@$AIWG_ROOT/agentic/code/addons/ralph/schemas/debug-memory.yaml` — Debug memory schema
- `@$AIWG_ROOT/agentic/code/addons/ralph/schemas/actionable-feedback.yaml` — Feedback schema
- `@$AIWG_ROOT/agentic/code/addons/ralph/schemas/iteration-analytics.yaml` — Analytics schema

## References

- @.aiwg/research/findings/REF-013-metagpt.md — MetaGPT research findings
- @$AIWG_ROOT/agentic/code/frameworks/sdlc-complete/rules/executable-feedback.md — Executable feedback rules
- @$AIWG_ROOT/agentic/code/frameworks/sdlc-complete/schemas/flows/executable-feedback.yaml — Workflow schema
- @$AIWG_ROOT/agentic/code/addons/ralph/schemas/debug-memory.yaml — Debug memory schema
- @$AIWG_ROOT/agentic/code/addons/ralph/docs/reflection-memory-guide.md — Related: Reflexion memory guide
- #101 — Implementation issue