---
goat-flow-reference-version: "1.12.1"
---
# goat-review Reference Examples

Extended Excuse/Reality table, finding-format examples, and constraint rationale for `/goat-review`.

## Worked Example - Refuted Template Suspicion

Use this shape when Pass 1 raises a plausible template or output-format suspicion and Pass 2 disproves it. This example uses real `goat-review` files and anchors.

**Review surface:** `SKILL.md`, `references/automated-review.md`, `references/refuter-spec.md`

**Pass 1 suspicion (diff-only):**
- `SKILL.md` (search: `Review Integrity`) may omit the automated-review and refuter integrity lines even though the references require them.

**Pass 2 actions:**
1. Open `SKILL.md` and re-read `Review Integrity`.
2. Search for `Automated-reviewer overlap`.
3. Search for `Refuter pass`.
4. Open `references/automated-review.md` (search: `Automated-reviewer overlap`) and `references/refuter-spec.md` (search: `Review Integrity Extension`) to compare the reference contract with the main output template.

**Expected outcome:**
- Mark the suspicion `REFUTED` when `SKILL.md` contains both output-template lines.
- Do not surface a final finding.
- Write a refutation ledger entry:
  - Original suspicion: `SKILL.md` may omit automated-review and refuter integrity lines.
  - Refuting evidence: `SKILL.md` (search: `Automated-reviewer overlap`); `SKILL.md` (search: `Refuter pass`).
  - Rationale: the main template now exposes both conditional integrity extensions, so the references are reachable during normal review output.

**Zero-finding final note:** "Checked Review Integrity against both optional references; no issue surfaced because the output template includes the required conditional lines."

## Worked Example - Full Output Block (end-to-end)

The refuted-suspicion example above walks one Pass 1 -> Pass 2 transition. This example shows the **entire Output Format filled in**, so a cold reader sees how every section is populated with real values rather than the empty template in `SKILL.md` (search: `## Output Format`). Following the convention in `## Finding Format Examples` below, it uses real `goat-review` anchors: treat it as the block to emit when a reviewed diff *weakens* the cited rules. The illustrative review surface is a small PR that (a) replaces the Blast Radius Rule's external call-site grep with a `// TODO` comment and (b) deletes the refuter authentication pre-flight. Proof class is `STATIC` for both findings because the review surface is documentation: the reviewer verifies by re-reading the live file, not by executing it.

**Review surface:** PR mode (`gh pr view 412`); base `origin/main`, head `a1b2c3d`; 2 files, 86 changed lines; under the 20-file / 3000-line chunking threshold, so no chunking. Copilot reviewed the PR. Pass 3 auto-triggered by the `[MUST:needs-decision]` finding (trigger 3); the refuter received the full findings list and upheld both.

```markdown
## TL;DR
Reviewed PR #412 (2 files, 86 lines): the diff removes the contract-change call-site search and the refuter auth gate. One MUST and one SHOULD surfaced; both confirmed in Pass 2 and upheld cross-model. Do not merge as-is.

## Review Integrity
- Scope snapshot: source=PR, base=origin/main, head=a1b2c3d, uncommitted=no, chunking=no
- Files opened in Pass 2: 2/2  (diff-only: none)
- Evidence: 2 OBSERVED / 0 INFERRED
- Refutations logged: 1
- Size: 2 files, 86 lines  (chunked: no)
- Automated-reviewer overlap: 1 overlap with copilot-pull-request-reviewer, 1 net-new
- Refuter pass: yes; confirmed=2, refuted=0, unresolved=0, leads-verified=0, model=codex
- Degradation flags: spec-drift-skipped
- Conclusion: confident

## Findings

### MUST
- [MUST:needs-decision] [CONFIRMED-CROSS-MODEL] [overlap:copilot-pull-request-reviewer] **Blast Radius Rule no longer forces a call-site search on contract changes** `SKILL.md` (search: `Blast Radius Rule`) - the diff replaces the `rg`/`grep` consumer search with a `// TODO`, so a signature, return-type, or event-shape change can now ship without a single consumer verified; downstream callers break at runtime with no review signal. Needs-decision because relaxing the rule is a policy call the author may have intended. | Footgun: none | Evidence: OBSERVED | Proof: STATIC

### SHOULD
- [SHOULD:patch] [CONFIRMED-CROSS-MODEL] [new] **Refuter pre-flight auth check deleted** `references/refuter-spec.md` (search: `Pre-flight Check`) - removing the `codex login status` / `claude auth status` gate lets Pass 3 spawn an unauthenticated refuter that fails silently and is recorded as a clean `confirmed=0` instead of `cross-model-refuter-failed`; reviews then read as cross-verified when they were not. | Footgun: none | Evidence: OBSERVED | Proof: STATIC

## Systemic Patterns
<!-- 2 findings with distinct root causes and distinct fixes - no systemic parent emitted -->

## Pre-existing Nearby
- None.

## Pre-existing Issues
- None in scope for this diff.

## Breaking Changes
- None to the emitted skill contract; both findings remove safeguards rather than change a public interface.

## Top 5 Risks (cross-tier)
1. [MUST:needs-decision] **Contract changes ship without call-site verification** `SKILL.md` (search: `Blast Radius Rule`) - highest-harm regression: silent runtime breakage in unverified consumers.
2. [SHOULD:patch] **Unauthenticated refuter passes as success** `references/refuter-spec.md` (search: `Pre-flight Check`) - false cross-model confidence on every Pass 3.

## Ship Verdict
Decision: **NO**
Reasoning: The MUST finding (Blast Radius Rule removal) remains unaddressed by the diff, which forces NO per the Ship Verdict rule (unresolved MUST -> NO); the cross-model refuter upheld both findings. The SHOULD compounds the risk by masking refuter failures. Review Integrity is `confident`, so the verdict is not downgraded further.
Confidence: HIGH

## What's Good
- The diff keeps the two-pass discipline and the Refutation Ledger path intact.

## What I Didn't Examine
- The CI workflow that invokes the refuter (out of diff scope); flagged for the author.
```

## Finding Format Examples

Use concrete harm and proof class. These examples use real anchors from this skill surface; apply them when a reviewed diff removes, bypasses, or contradicts the cited rule.

**Systemic pattern:**

```markdown
## Systemic Patterns
- [SHOULD:patch] **Group repeated output-contract drift under one parent** - affected anchors: `SKILL.md` (search: `MUST group 3+ related findings as systemic patterns`), `SKILL.md` (search: `## Systemic Patterns`); repeated failure: three related findings share one output-contract root cause; harm: reviewers scatter one root cause across separate bullets, making the required fix easy to under-scope. | Evidence: OBSERVED | Proof: STATIC
```

**PR automated-review overlap:**

```markdown
- [SHOULD:patch] [overlap:copilot-pull-request-reviewer] **Report PR metadata ingestion failure explicitly** `references/automated-review.md` (search: `automated-review-uningested`) - If `gh pr view` returns `reviews,comments` but parsing fails, the review must degrade explicitly instead of reporting no bot findings; otherwise duplicated findings look net-new. | Footgun: none | Evidence: OBSERVED | Proof: STATIC
```

## Excuse/Reality Table (Full)

| Excuse | Reality |
|--------|---------|
| "Trusted author wrote it, Pass 2 will just refute everything - skip it" | In-group trust has historically produced the worst misses in auth/signing/rate-limit code. Open the files. |
| "CI is green, so boundary and signing edges are already covered" | CI tests what was thought of. Review looks for what wasn't. Green CI raises, not answers, the Pass-2 question. |
| "Tight window + demo tomorrow - MAY-only cosmetic pass is proportionate" | An incomplete review merged into a demo window is worse than a `coverage-degraded` conclusion returned on time. |
| "Findings would be zero anyway, so Review Integrity is paperwork" | Review Integrity IS the zero-findings signal. `files-not-opened` tells the reader you stopped early. |
| "The symbol is unique enough that grep is overkill" | Unique symbols still need external verification because the bug is in the consumer, not the emitter. |
| "Refuted suspicions are noise - logging them wastes tokens" | The ledger is the integrity surface. Without it, REFUTED is indistinguishable from "didn't bother to check." |
