# dataset.startExperiment()

**Added in:** `@mastra/core@1.4.0`

Runs an experiment on the dataset and waits for completion. Executes all items against a target (agent, workflow, or scorer) with optional scoring.

## Usage example

```typescript
import { Mastra } from '@mastra/core'

const mastra = new Mastra({
  /* storage config */
})

const dataset = await mastra.datasets.get({ id: 'dataset-id' })

// Run against a registered agent with a flat scorer list
const summary = await dataset.startExperiment({
  targetType: 'agent',
  targetId: 'my-agent',
  scorers: ['accuracy', 'relevancy'],
  maxConcurrency: 10,
})

// Or pass the same categorised shape accepted by runEvals
const summary2 = await dataset.startExperiment({
  targetType: 'agent',
  targetId: 'my-agent',
  scorers: {
    agent: [accuracyScorer],
    trajectory: [toolOrderScorer],
  },
})

// For workflow targets, score individual steps with their own scorers
const summary3 = await dataset.startExperiment({
  targetType: 'workflow',
  targetId: 'my-workflow',
  scorers: {
    workflow: [overallScorer],
    steps: {
      'fetch-data': [fetchScorer],
      transform: [transformScorer],
    },
    trajectory: [executionPathScorer],
  },
})

console.log(`${summary.succeededCount}/${summary.totalItems} succeeded`)
console.log(`Status: ${summary.status}`)
console.log(`${summary2.succeededCount}/${summary2.totalItems} succeeded`)
console.log(`Status: ${summary2.status}`)
```

## Parameters

**targetType** (`'agent' | 'workflow' | 'scorer'`): Type of registered target to run items against. Use with \`targetId\`.

**targetId** (`string`): ID of the registered target. Use with \`targetType\`.

**scorers** (`(MastraScorer | string)[] | AgentScorerConfig | WorkflowScorerConfig`): Scorers to evaluate each result. Accepts a flat array of \`MastraScorer\` instances or registered scorer IDs, or the same categorised config shape used by \`runEvals\` (\`AgentScorerConfig\` / \`WorkflowScorerConfig\`). Trajectory scorers (\`type: "trajectory"\`) automatically receive a pre-extracted \`Trajectory\` as their output regardless of which form is used. For workflow targets, per-step scorers can be passed via \`scorers: { steps: { stepId: \[...] } }\` and run against each step's output; their results carry the originating \`stepId\` and keep \`targetScope: "span"\` (matching \`runEvals\`).

**name** (`string`): Display name for the experiment.

**description** (`string`): Description of the experiment.

**metadata** (`Record<string, unknown>`): Arbitrary metadata for the experiment.

**version** (`number`): Pin to a specific dataset version. Defaults to the latest version.

**maxConcurrency** (`number`): Maximum concurrent item executions. Defaults to \`5\`.

**signal** (`AbortSignal`): AbortSignal for cancelling the experiment.

**itemTimeout** (`number`): Per-item execution timeout in milliseconds.

**maxRetries** (`number`): Maximum retries per item on failure. Defaults to \`0\` (no retries). Abort errors are never retried.

## Returns

**result** (`Promise<ExperimentSummary>`): Summary of the completed experiment.

**result.experimentId** (`string`): Unique ID of the experiment.

**result.status** (`'pending' | 'running' | 'completed' | 'failed'`): Final status of the experiment.

**result.totalItems** (`number`): Total number of items in the dataset.

**result.succeededCount** (`number`): Number of items that succeeded.

**result.failedCount** (`number`): Number of items that failed.

**result.skippedCount** (`number`): Number of items skipped (e.g., due to abort).

**result.completedWithErrors** (`boolean`): \`true\` if the run completed but some items failed.

**result.startedAt** (`Date`): When the experiment started.

**result.completedAt** (`Date`): When the experiment completed.

**result.results** (`ItemWithScores[]`): All item results with their scores.

**result.results.itemId** (`string`): ID of the dataset item.

**result.results.itemVersion** (`number`): Dataset version of the item when executed.

**result.results.input** (`unknown`): Input data passed to the target.

**result.results.output** (`unknown | null`): Output from the target, or \`null\` if failed.

**result.results.groundTruth** (`unknown | null`): Expected output from the dataset item.

**result.results.error** (`{ message: string; stack?: string; code?: string } | null`): Structured error if execution failed.

**result.results.startedAt** (`Date`): When item execution started.

**result.results.completedAt** (`Date`): When item execution completed.

**result.results.retryCount** (`number`): Number of retry attempts.

**result.results.scores** (`ScorerResult[]`): Results from all scorers for this item.

**result.results.scores.scorerId** (`string`): ID of the scorer.

**result.results.scores.scorerName** (`string`): Display name of the scorer.

**result.results.scores.score** (`number | null`): Computed score, or \`null\` if the scorer failed.

**result.results.scores.reason** (`string | null`): Reason/explanation for the score.

**result.results.scores.error** (`string | null`): Error message if the scorer failed.

## Related

- [dataset.startExperimentAsync()](https://mastra.ai/reference/datasets/startExperimentAsync)
- [dataset.listExperiments()](https://mastra.ai/reference/datasets/listExperiments)
- [DatasetsManager.compareExperiments()](https://mastra.ai/reference/datasets/compareExperiments)