# Committee

## Overview

This framework lets you assemble precisely targeted context and build chained LLM workflows from templated, iterative prompts.

## Core Example: Service Analysis

Let's illustrate the core workflow with an example designed to analyze different microservices based on their specific documentation and source code.

**1. `workflow.yaml`:**

Defines file collections, global context, and the structured `services` object intended for iteration.

```yaml
name: "service-analysis-workflow"
description: "Analyze multiple services using their specific docs and code"
outputPath: "_output/service-analysis"

# Define file collections
files:
  # General Docs
  architectureDoc: "docs/ARCHITECTURE.md"
  # Auth Service Files
  authConfigDoc: "docs/AUTH-CONFIG.md"
  authCode: ["src/auth/**/*.js", "!src/auth/legacy/**"]
  # Data Service Files
  dataModelsDoc: "docs/DATA-MODELS.md"
  dataCode: "src/data/**/*.js"

# Define universally accessible global variables
global_variables:
  # General context available to all tasks
  overallArchitecture: "{{ files.architectureDoc }}"

# Define data structures for set iteration
iterable_objects:
  # Structured object containing service-specific context
  services: # Target for 'for_each: services' in a set
    auth: # Key becomes 'item.key' during iteration
      # Value becomes 'item.value'
      description: "Authentication and Authorization Service"
      contact: "auth-team@example.com"
      # Embed CONTENT of auth-specific files
      configDocContent: "{{ files.authConfigDoc }}"
      codeContent: "{{ files.authCode }}"
    data: # Key becomes 'item.key'
      # Value becomes 'item.value'
      description: "Data Processing and Storage Service"
      contact: "data-team@example.com"
      # Embed CONTENT of data-specific files
      modelsDocContent: "{{ files.dataModelsDoc }}"
      codeContent: "{{ files.dataCode }}"

# Define the sequence of sets
sets:
  - useSet: analyze-service
    for_each: services # Iterate over 'services' defined in iterable_objects
```

*(**Note:** The `{{ files.collectionName }}` syntax within `global_variables` or `iterable_objects` embeds the **formatted content** of the files.)*

**2. `sets/analyze-service.set.yaml`:**

Defines a set that iterates over the `services` object defined in the workflow's `iterable_objects`.

```yaml
name: "analyze-service"
description: "Run analysis tasks for each service defined in the context"
# Iterates over the 'services' object from workflow.yaml's iterable_objects
# Each item will be { key: serviceName, value: serviceObject }
for_each: services

tasks:
  # These tasks run in parallel for each service
  - useTask: identify-service-patterns
    # Task context automatically includes 'item', 'item.key', 'item.value'
    # and variables from 'global_variables' like 'overallArchitecture'
  - useTask: suggest-service-improvements
```

**3. `tasks/analyze-service.md`:**

A task template showing how to access the context provided by the iteration and global variables.

````markdown
Analyze the service: **{{ item.key }}**

**Service Description:** {{ item.value.description }}
**Contact:** {{ item.value.contact }}

**Overall Architecture Context:**
```
{{ overallArchitecture }} # Accessing a global_variable
```

**Service-Specific Configuration Documentation:**
```
{{ item.value.configDocContent }} # Accessing data from item.value
```

**Service-Specific Code:**
```
{{ item.value.codeContent }} # Accessing data from item.value
```

**Analysis Request:**

Based on the overall architecture and the specific documentation and code for the `{{ item.key }}` service, please perform the analysis requested by the calling task (e.g., identify patterns, suggest improvements).

````

This example demonstrates how to:
- Define multiple file sources.
- Define `global_variables` accessible everywhere.
- Structure data for iteration under `iterable_objects`.
- Iterate over this structured data using `for_each`.
- Access the iteration key (`item.key`), iteration value (`item.value.*`), and global variables within a task template.

---

## Key Concepts

Now let's dive deeper into the core components illustrated above.

### Workflows

A workflow is the top-level container defined in `workflow.yaml`, as seen in the Core Example. It specifies:
- `global_variables` accessible throughout the workflow. These form the base context.
- `iterable_objects` defining data structures (arrays/objects) intended for set iteration via `for_each`.
- Named file collections (`files:`) to gather context using glob patterns. File *content* is typically embedded into `global_variables` or `iterable_objects`.
- An ordered sequence of `sets` to be executed.

### Sets

Sets group related tasks. Sets defined in the workflow's `sets:` list are executed **sequentially** in the order they appear.

Within a single set, the tasks listed are executed **in parallel**. Sets can optionally iterate over arrays or objects defined in the workflow's `iterable_objects` using `for_each`.
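
As a minimal sketch of this ordering, the workflow fragment below lists two sets. It reuses the `useSet` and `for_each` keys from the Core Example; the second set name, `summarize-findings`, is hypothetical and only illustrates a non-iterated set placed after an iterated one.

```yaml
# workflow.yaml (fragment)
sets:
  # First set: iterated once per entry in iterable_objects.services.
  # The tasks listed inside the set run in parallel.
  - useSet: analyze-service
    for_each: services
  # Second set (hypothetical name): not iterated. Because sets run
  # sequentially in list order, it starts after the first set.
  - useSet: summarize-findings
```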

### Tasks

Tasks are templated prompts (stored as `.md` files) that perform a specific action using an LLM, like the `analyze-service.md` template in the Core Example. Each task runs with a context including:
- `global_variables` (from `workflow.yaml`).
- Iteration variables (`item`, `item.key`, `item.value` if the set uses `for_each`).
- Outputs from tasks in *previous* sets, accessed via `prior_outputs` defined in the set file.

**Important:** Due to parallel execution within a set, a task cannot access the output of *another task running in the same set*. Input/output dependencies must be managed by sequencing tasks across different sets.

**Escaping Template Syntax:** If you need to include literal `{{` or `}}` characters in your template without them being interpreted as variables, you can escape them with a backslash: `\{{` will render as `{{`, and `\}}` will render as `}}`.

### Referencing Output from Iterated Sets

When dealing with outputs from previous iterated sets, there are two main scenarios:

1.  **Accessing Corresponding Iteration Output:** When both the previous set (e.g., `set1`) and the current set (e.g., `set2`) iterate over the **same `for_each` target**, you often need to access the output from the previous set's task corresponding to the *current item* being processed in the current set.
    *   **Syntax:** `setName.taskName[this].output`
    *   **Use Case:** An iterated set needs the *specific* output from the *same iteration* of a previous iterated set.
    *   **Result:** Resolves to the single output value for the current iteration.

    **Example (`set2` iterated, needs corresponding output from iterated `set1`):**
    ```yaml
    # In sets/set2.set.yaml (for_each: services)
    prior_outputs:
      # Get the analyze-service output for the current service
      analysis_result: "{{ set1.analyze-service[this].output }}"
    ```

2.  **Collecting All Iteration Outputs:** When a subsequent set (often a non-iterated set, e.g., `setB`) needs to gather **all** the individual outputs generated by a task within a previous *iterated* set (e.g., `setA`).
    *   **Syntax:** `setName.taskName[*].output`
    *   **Use Case:** A later set needs to aggregate or process the results from *all* iterations of a previous iterated task.
    *   **Result:** Resolves to an **array** containing all the output values generated across all iterations of the specified task.

    **Example (`setB` non-iterated, needs all outputs from iterated `setA`):**
    ```yaml
    # In sets/setB.set.yaml (NOT iterated)
    prior_outputs:
      # Gather all results from setA's analyze-item task into an array
      all_analysis_results: "{{ setA.analyze-item[*].output }}"
    ```
    *(**Note:** The task template using `{{ all_analysis_results }}` will receive these outputs as a newline-separated string by default. Handle accordingly in your prompt.)*
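
To show where these references might sit in complete set files, here is a minimal sketch covering both scenarios. It assumes `prior_outputs` is nested under the consuming task, as the description of task outputs in this document states. The set name `summary` and the task names `refine-analysis` and `write-summary` are hypothetical, and the two YAML documents stand for two separate files.

```yaml
# sets/set2.set.yaml (iterated over the same target as set1)
name: "set2"
for_each: services
tasks:
  - useTask: refine-analysis          # hypothetical task name
    prior_outputs:
      # Output of set1's analyze-service task for the current service only
      analysis_result: "{{ set1.analyze-service[this].output }}"
---
# sets/summary.set.yaml (hypothetical set, NOT iterated)
name: "summary"
tasks:
  - useTask: write-summary            # hypothetical task name
    prior_outputs:
      # Outputs of set1's analyze-service task across every iteration
      all_analysis_results: "{{ set1.analyze-service[*].output }}"
```

In `tasks/write-summary.md`, `{{ all_analysis_results }}` would then render as the newline-separated string described in the note above.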

### Referencing Output from Non-Iterated Sets

If the previous set was **not** iterated, you simply reference its task output directly:

- `setName.taskName.output`: Output from a **non-iterated** task in a previous set. The `taskName` used here **must** match the `useTask` value from the task definition in the previous set's YAML file.

**Example (`set2` non-iterated, needs output from non-iterated `set1`):**
```yaml
# In sets/set2.set.yaml (NOT iterated)
prior_outputs:
  taskA_result: "{{ set1.taskA.output }}"
```

**Why other syntaxes fail when the previous set is iterated:**
- `"{{ set1.analyze-service.output }}"`: Refers to the *entire array* of outputs from the iterated task, not the specific one needed.
- `"{{ set1.analyze-service[item.key].output }}"`: The `prior_outputs` resolver doesn't evaluate `{{item.key}}` within the reference string; it looks for a literal key `item.key`.

**Important Convention:** Task outputs are *always* stored and referenced using the exact name specified in the `useTask` field. There is no option to rename outputs.

**Example Set Configuration (`*.set.yml`):**

If `set1` (non-iterated) contains a task `useTask: taskA`, and `set2` (non-iterated) needs its output:

```yaml
name: set2
tasks:
  - useTask: process-output
    prior_outputs:
      # Map the reference to a local variable name for use in the task template
      taskA_result: "{{ set1.taskA.output }}" # Reference uses the original task name 'taskA'
```

**Example Task Template (`tasks/process-output.md`):**

```markdown
Process the output of a previous task.

Result from Task A in Set 1:
{{ taskA_result }} # Access the output via the name defined in prior_outputs
```

**Note:** Because tasks within a set run in parallel, a task cannot reliably read the output of another task in the same set. Put dependent tasks in a later set so that sequential set execution resolves the dependency.

## Where Data Comes From: Defining Your Context

Understanding where different types of data are defined and accessed is important for using Committee. The framework uses the following structure:

1.  **Global Variables:** Defined in the top-level `global_variables:` block of your `workflow.yaml`. These are accessible to all sets and tasks throughout the workflow execution.
2.  **File Collections & Content:** File sources are defined in the `files:` block of `workflow.yaml`. To make file *content* available for LLM analysis, embed it into variables within the `workflow.yaml` `global_variables:` or `iterable_objects:` blocks using `{{ files.collectionName }}`. Task templates (`.md`) can reference `{{ files.collectionName }}` to get a list of *paths*.
3.  **Iteration Data (`item`):** Data structures (arrays or objects) intended for iteration using `for_each` are defined in the `iterable_objects:` block of `workflow.yaml`. The `for_each: objectName` directive within a `*.set.yml` file targets one of these workflow iterable objects. Tasks within that set then access the current iteration's data via the `item` object (or `item.key` / `item.value` for object iteration).
4.  **Task Outputs (via Prior Outputs):** Outputs from previous tasks are made available to a subsequent task via the `prior_outputs:` block defined under that task in its `*.set.yml` file. This block maps a local name (used in the task template) to the structured output reference string (e.g., `setName.taskName[iterationKey].output`).

Essentially, **`workflow.yaml` is the primary location for defining the initial context (`global_variables`), data sources (`files`), and data for iteration (`iterable_objects`)**, while `*.set.yml` files orchestrate the execution flow and manage dependencies on previously generated task outputs via `prior_outputs`.
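
The Core Example iterates over an object; the sketch below shows the array case. The `reviewTopics` list and the `review-topic` set are hypothetical, and the sketch assumes each array element is exposed to the task template as `item` directly, as the description of iteration data above suggests.

```yaml
# workflow.yaml (fragment)
files:
  architectureDoc: "docs/ARCHITECTURE.md"

global_variables:
  # Available to every set and task
  overallArchitecture: "{{ files.architectureDoc }}"

iterable_objects:
  # Hypothetical array: each element is assumed to become 'item'
  reviewTopics:
    - "error handling"
    - "logging"

sets:
  - useSet: review-topic   # hypothetical set name
    for_each: reviewTopics
```

A task template in that set could then reference `{{ item }}` alongside `{{ overallArchitecture }}`.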

## File Collection Handling

You define named file collections in `workflow.yaml` as a single path or glob pattern, a list of them, or an object with `include` and `exclude` pattern lists:

```yaml
# workflow.yaml
name: "code-review-workflow"
files:
  sourceCode:
    include: ["src/**/*.js"]
    exclude: ["src/vendor/**"]
  testFiles: "test/**/*.test.js"
  docs: ["README.md", "CONTRIBUTING.md"]
# ... global_variables, iterable_objects, and sets follow ...
```

These collections are primarily used to inject context into your workflow. The way you reference a collection using `{{ files.collectionName }}` has **two behaviors depending on where it is used**:

1.  **In `workflow.yaml` (`global_variables:` or `iterable_objects:`):**
    *   **Behavior:** Embeds the **full content** of each file within the collection directly into the variable's string value. Each file's content is automatically prefixed with a Markdown header indicating its path (e.g., `# path/to/file.js`).
    *   **Purpose:** This is the primary mechanism for **injecting substantial file content** (like source code, documentation) into the context, making it available to subsequent sets and tasks for direct LLM analysis.
    *   **Example (`workflow.yaml`):**
        ```yaml
        global_variables:
          # Embeds the content of all files matching src/**/*.js,
          # each block prefixed with '# filepath'
          sourceContext: "{{ files.sourceCode }}"
          # Embeds content of README.md and CONTRIBUTING.md
          docsContext: "{{ files.docs }}"
        ```

2.  **In Task Templates (`*.md` files):**
    *   **Behavior:** Renders a **newline-separated list of the file paths** belonging to that collection. It does **not** embed the file content here.
    *   **Purpose:** Useful for providing informational context within a task prompt, such as listing related files for the LLM's reference, *without* including their potentially large content directly in that specific prompt.
    *   **Example (`tasks/review-code.md`):**
        ````markdown
        Review the following source code file `{{ item.path }}`:

        ```javascript
        {{ item.content }} # Assuming iteration over a file collection
        ```

        Consider related test files (paths listed below):
        {{ files.testFiles }} # Lists paths from the 'testFiles' collection
        ````

**Key Distinction:** Use `{{ files.collectionName }}` in `workflow.yaml` (`global_variables` or `iterable_objects`) to provide the *content* needed for LLM analysis. Use it in task templates (`.md`) when you only need to reference the *paths* of the files.

*(Note: Advanced pattern filtering within the template tag like `{{ files.collectionName:*.js }}` is not currently implemented.)*

## Two-Phase Thinking

Tasks can optionally perform a preliminary "thinking" step before generating the final response. This is useful for complex analysis or reasoning tasks. Configure this using YAML frontmatter at the top of your task's `.md` file:

```yaml
---
name: "complex-analysis-task" # Optional: Task name for clarity
thinking: true                 # REQUIRED: Enables the thinking phase
thinking_prompt: "path/to/thinking-prompt.md" # Optional: Use a separate prompt file for the thinking phase
thinking_instruction: "Analyze the input step-by-step..." # Optional: Specific instruction for the thinking phase
thinking_params:
  temperature: 0.2             # Optional: LLM parameters specifically for the thinking phase
---

# Main Task Prompt

Based on the preceding analysis, provide the final answer.

Context:
{{ context }}
```

- If `thinking: true`, the framework first runs the thinking phase (using the main prompt or `thinking_prompt` if provided, potentially guided by `thinking_instruction`).
- The output of the thinking phase is then automatically prepended to the context provided to the main task prompt for generating the final response.
- You can control LLM parameters specifically for the thinking step using `thinking_params`.
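
Since only `thinking: true` is marked as required, a minimal configuration might look like the sketch below. The instruction text is a made-up example, and the sketch assumes the omitted optional keys fall back to the framework's defaults.

```yaml
---
thinking: true   # enables the thinking phase
# Optional guidance for the thinking phase (example text only)
thinking_instruction: "List the main risks before drafting the answer."
---
```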

## Using the Framework

### Installation

```bash
# Navigate to the project root directory

# Install globally (recommended for CLI use)
npm install -g .

# Or install locally
npm install .
```

### Basic Usage

1. Create a workflow directory (e.g., `my-workflow/`) containing:
   - `workflow.yaml` (workflow definition)
   - `sets/` directory (with `.set.yaml` or `.set.yml` set definitions)
   - `tasks/` directory (with `.md` task prompt files)
2. Configure your environment variables (e.g., in a `.env` file in your project or system):
   ```dotenv
   # Required for using Anthropic API (if not using --local)
   ANTHROPIC_API_KEY=your_api_key_here
   # Optional: Specify default model (defaults exist, e.g., Claude 3 Haiku for --lite, Sonnet otherwise)
   # DEFAULT_MODEL=claude-3-sonnet-20240229 

   # Optional: Set maximum tokens for LLM responses (default: 10000)
   # MAX_TOKENS=100000

   # Optional: Set maximum number of concurrent API requests (default: 10)
   # MAX_PARALLEL_REQUESTS=15

   # Optional: Set minimum delay between starting parallel API requests (in seconds, default: 0.1)
   # Useful for proactively avoiding rate limits based on request frequency.
   # REQUEST_DELAY_SECONDS=0.5

   # Optional: Set maximum number of retries for failed API calls (default: 20)
   # LLM_MAX_RETRIES=10

   # Optional: For using a local LLM (requires --local flag)
   # Needs a running server compatible with OpenAI API spec (e.g., Ollama, LM Studio)
   # The value below is the default Ollama URL, shown as an example
   LOCAL_LLM_URL=http://localhost:11434
   # Optional: Specify model served by local URL (required if server hosts multiple)
   # LOCAL_LLM_MODEL=llama3
   ```
3. Run the workflow from your terminal:

```