# Evals with memory

Agents that use memory in `thread` scope — including observational memory — require a thread ID at run time. When an eval invokes the agent without one, you'll see:

```text
ObservationalMemory (scope: 'thread') requires a threadId, but none was found in RequestContext or MessageList.
```

This page covers the three working patterns for running Mastra evals against memory-enabled agents, what each path supports, and which one to pick. A complete runnable repro for all three approaches lives in [`examples/evals-with-memory`](https://github.com/mastra-ai/mastra/tree/main/examples/evals-with-memory).

## When to use which approach

| Goal                                            | Approach                                                                                  |
| ----------------------------------------------- | ----------------------------------------------------------------------------------------- |
| One shared conversation across every item       | [`runEvals` with global `targetOptions.memory`](#shared-thread-with-runevals)             |
| One independent thread per item, simple CI loop | [`runEvals` per item](#per-item-threads-with-runevals)                                    |
| Per-item threads driven by a stored `Dataset`   | [`dataset.startExperiment` with an inline task](#dataset-experiments-with-an-inline-task) |

Pre-seeding `RequestContext` with `MastraMemory` is **not** a supported way to drive memory into an agent. Thread resolution reads `args.memory.thread` — `RequestContext.MastraMemory` is populated by `prepare-memory-step` after the agent has already resolved its thread.

## Shared thread with `runEvals`

`runEvals` accepts `targetOptions`, which is forwarded to `agent.generate()`. Passing `memory: { thread, resource }` runs every data item against the same thread — useful for testing recall across a multi-turn conversation.

```typescript
import { runEvals } from '@mastra/core/evals'
import { supportAgent } from './support-agent'
import { recallScorer } from '../scorers/recall-scorer'

const memory = await supportAgent.getMemory()
await memory!.createThread({ threadId: 'eval-thread', resourceId: 'ci-user' })

const result = await runEvals({
  target: supportAgent,
  scorers: [recallScorer],
  targetOptions: {
    memory: { thread: 'eval-thread', resource: 'ci-user' },
  },
  data: [
    { input: 'My order number is 12345' },
    { input: 'What is my order number?', groundTruth: '12345' },
  ],
})
```

`targetOptions` is **global per call**. There is no per-item override on `RunEvalsDataItem` today.

## Per-item threads with `runEvals`

When each data item needs its own thread (the common CI shape), call `runEvals` once per item with a unique `targetOptions.memory` and aggregate the scores yourself.

```typescript
import { randomUUID } from 'node:crypto'
import { runEvals } from '@mastra/core/evals'
import { supportAgent } from './support-agent'
import { recallScorer } from '../scorers/recall-scorer'

const memory = await supportAgent.getMemory()
const resourceId = 'ci-user'

const items = [
  { input: 'Cats are mammals', groundTruth: 'mammals' },
  { input: 'Dogs are mammals too', groundTruth: 'mammals' },
]

// `runEvals` returns `{ scores: Record<string, number>; summary: { totalItems } }`.
const scores: number[] = []
for (const item of items) {
  const threadId = `eval-${randomUUID()}`
  await memory!.createThread({ threadId, resourceId, title: item.input })

  const result = await runEvals({
    target: supportAgent,
    scorers: [recallScorer],
    targetOptions: { memory: { thread: threadId, resource: resourceId } },
    data: [item],
  })

  scores.push(result.scores[recallScorer.id])
}

const average = scores.reduce((a, b) => a + b, 0) / scores.length
```

> **Note:** Create the thread before running the eval. Observational memory in `thread` scope reads from a record that must already exist.

## Dataset experiments with an inline task

`dataset.startExperiment({ target: agent })` does **not** forward a `memory` option to the agent — only `requestContext`. To run a stored dataset against a memory-enabled agent, use an inline `task` function and stash `{ threadId, resourceId }` in each item's `metadata`. The scorer pipeline still runs as normal.

```typescript
import { randomUUID } from 'node:crypto'
import { mastra } from '../index'
import { supportAgent } from '../agents/support-agent'
import { recallScorer } from '../scorers/recall-scorer'

const memory = await supportAgent.getMemory()
const resourceId = 'ci-user'

const items = [
  { input: 'Cats are mammals', groundTruth: 'mammals', thread: `ds-${randomUUID()}` },
  { input: 'Dogs are mammals too', groundTruth: 'mammals', thread: `ds-${randomUUID()}` },
]

for (const it of items) {
  await memory!.createThread({ threadId: it.thread, resourceId, title: it.input })
}

const dataset = await mastra.datasets.create({
  name: 'support-recall',
  description: 'Per-item memory via inline task + item metadata',
})

await dataset.addItems({
  items: items.map(it => ({
    input: it.input,
    groundTruth: it.groundTruth,
    metadata: { threadId: it.thread, resourceId },
  })),
})

const summary = await dataset.startExperiment({
  scorers: [recallScorer],
  task: async ({ input, metadata }) => {
    const { threadId, resourceId: rid } = (metadata ?? {}) as {
      threadId: string
      resourceId: string
    }
    const result = await supportAgent.generate(input as string, {
      memory: { thread: threadId, resource: rid },
    })
    return result.text
  },
})
```

The inline `task` receives the item's `metadata`, so each row can drive its own thread without changing the agent or any scorer.

> **Note:** Visit [runEvals reference](https://mastra.ai/reference/evals/run-evals) and [Dataset reference](https://mastra.ai/reference/datasets/dataset) for full configuration.

## Related

- [Running scorers in CI](https://mastra.ai/docs/evals/running-in-ci)
- [Running experiments](https://mastra.ai/docs/evals/datasets/running-experiments)
- [Observational memory](https://mastra.ai/docs/memory/observational-memory)
- [runEvals API reference](https://mastra.ai/reference/evals/run-evals)