Datasets

Collections of test data for evaluating your AI pipelines. Create once, run experiments against them repeatedly, and trust that past results stay reproducible.

How It Works

A dataset is a named collection of test items within an organization. Each item has an input (the data your AI pipeline will process) and an optional expected output (ground truth for scoring).

The workflow is: create dataset → add items → link to experiment → run → compare results.

Dataset Modality

Datasets are organized by input modality. Today only text is supported — items hold text-based input (plain strings or structured JSON with text fields) and JSON expected output. Image, audio, and multimodal datasets are reserved for future releases.

Because the modality is just text, the shape of input and expectedOutput is flexible JSON: use plain strings for simple prompts, or structured objects when you need fields like document text, messages, context, or tags.
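
As a rough illustration of the two shapes described above, the sketch below builds one item with a plain-string input and one with a structured input. The field names documentText and context, and the helper to_json_body, are placeholders chosen for this example; they are not names the API requires.

```python
import json

# A simple prompt: input is a plain JSON string.
simple_item = {
    "input": "Summarize the following paragraph in one sentence.",
}

# A structured item: input and expectedOutput are JSON objects.
# "documentText" and "context" are illustrative field names only.
structured_item = {
    "input": {
        "documentText": "Invoice #INV-2026-001\nVendor: Acme Corp",
        "context": "extraction",
    },
    "expectedOutput": {"vendor": "Acme Corp"},
}


def to_json_body(item: dict) -> str:
    """Serialize an item to a JSON body, rejecting items that lack input."""
    if "input" not in item:
        raise ValueError("input is required")
    return json.dumps(item)


for item in (simple_item, structured_item):
    print(to_json_body(item))
```

Both items serialize to valid JSON; only the presence of input is checked here, since expectedOutput is optional.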

Creating a Dataset

Create a dataset within your organization. Specify a name and a modality (the type field); the first version is created automatically, so you can start adding items right away.

Request

// POST /v1/datasets
{
  "name": "Invoice Samples",
  "description": "Representative invoices for extraction testing",
  "type": "text"
}

Response

{
  "id": "ds_456",
  "name": "Invoice Samples",
  "type": "text",
  "latestVersionId": "dsv_789",
  "createdAt": "2026-04-08T10:00:00Z"
}
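
A minimal sketch of issuing this call from Python with only the standard library. The base URL and the bearer-token header are assumptions made for the example (this page does not specify host or authentication), and build_create_dataset_request is a hypothetical helper name. The request is built but not sent, so the snippet runs without network access.

```python
import json
import urllib.request

# Placeholder host; substitute the one your deployment uses.
BASE_URL = "https://api.example.com"


def build_create_dataset_request(
    name: str, description: str, token: str
) -> urllib.request.Request:
    """Build (but do not send) the POST /v1/datasets request."""
    body = {"name": name, "description": description, "type": "text"}
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/datasets",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumed auth scheme
        },
        method="POST",
    )


req = build_create_dataset_request(
    "Invoice Samples",
    "Representative invoices for extraction testing",
    "YOUR_TOKEN",
)
print(req.get_method(), req.full_url)
```

To send it, pass req to urllib.request.urlopen and parse the JSON body; latestVersionId in the response is the version ID to use when adding items.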

Adding Items

Add test items to the latest version. Each item has a required input and an optional expectedOutput, both stored as JSON. Items can also carry tags, as in the example below.

Request

// POST /v1/datasets/{datasetId}/versions/{versionId}/items
{
  "input": {
    "inputText": "Invoice #INV-2026-001\nVendor: Acme Corp\nDate: 2026-03-15\nTotal: $1,250.00"
  },
  "expectedOutput": {
    "invoiceNumber": "INV-2026-001",
    "vendor": "Acme Corp",
    "date": "2026-03-15",
    "total": 1250.00
  },
  "tags": ["simple", "english"]
}

Response

{
  "id": "dsi_101",
  "datasetVersionId": "dsv_789",
  "input": { "inputText": "Invoice #INV-2026-001..." },
  "expectedOutput": { "invoiceNumber": "INV-2026-001", ... },
  "tags": ["simple", "english"],
  "createdAt": "2026-04-08T10:05:00Z"
}
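
The request above could be assembled along these lines. This is a sketch under assumptions: the base URL is a placeholder, build_add_item_request is a hypothetical helper, and treating tags as optional is inferred from the example rather than stated by the API. The request is constructed but not sent.

```python
import json
import urllib.request
from typing import Any, List, Optional

BASE_URL = "https://api.example.com"  # placeholder host


def build_add_item_request(
    dataset_id: str,
    version_id: str,
    item_input: Any,
    expected_output: Optional[Any] = None,
    tags: Optional[List[str]] = None,
) -> urllib.request.Request:
    """Build (but do not send) the add-item request for one dataset version."""
    if item_input is None:
        raise ValueError("input is required")
    body = {"input": item_input}
    # expectedOutput is optional: leave it out until ground truth is known.
    if expected_output is not None:
        body["expectedOutput"] = expected_output
    if tags:
        body["tags"] = tags
    url = f"{BASE_URL}/v1/datasets/{dataset_id}/versions/{version_id}/items"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


unlabeled = build_add_item_request(
    "ds_456", "dsv_789", {"inputText": "Invoice #INV-2026-001\nVendor: Acme Corp"}
)
print(unlabeled.full_url)
print(unlabeled.data.decode("utf-8"))
```

Omitting expectedOutput at creation time fits the labeling flow described under Editing Items, where ground truth is filled in later.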

Editing Items

Dataset items have a simple mutability rule designed to keep experiment history trustworthy:

input is immutable. Once an item is created, its input cannot be changed. This guarantees that any experiment run against the item was evaluated against exactly the data you see today. To change the test data, delete the item and add a new one.
expectedOutput is editable. Ground truth is often added or refined over time — you may label an item's expected output days or weeks after creating it, or correct it once you learn what the right answer should be. Update it whenever you need to; subsequent experiment runs will score against the new ground truth.
Deletion is soft. Deleted items disappear from listings and are excluded from new experiment runs, but they are not removed from the database. Past experiment results still reference the original item IDs and inputs, so historical reports remain intact.
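
The three rules above can be modeled in a few lines. The following is an in-memory illustration only, not the service's implementation; the class and method names (DatasetItem, DatasetVersion, soft_delete, and so on) are invented for this sketch.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class DatasetItem:
    id: str
    input: Any
    expected_output: Optional[Any] = None
    deleted: bool = False

    def __setattr__(self, name: str, value: Any) -> None:
        # input may be assigned once (by __init__) and never reassigned.
        # Only reassignment is blocked here; a real store would also
        # copy or freeze the value itself.
        if name == "input" and "input" in self.__dict__:
            raise AttributeError("input is immutable; delete the item and add a new one")
        super().__setattr__(name, value)


class DatasetVersion:
    def __init__(self) -> None:
        self._items: Dict[str, DatasetItem] = {}

    def add(self, item: DatasetItem) -> None:
        self._items[item.id] = item

    def update_expected_output(self, item_id: str, value: Any) -> None:
        # Ground truth can be added or corrected at any time.
        self._items[item_id].expected_output = value

    def soft_delete(self, item_id: str) -> None:
        # The row stays in storage; it is only flagged as deleted.
        self._items[item_id].deleted = True

    def list_items(self) -> List[DatasetItem]:
        # Listings (and new experiment runs) skip soft-deleted items.
        return [i for i in self._items.values() if not i.deleted]

    def get(self, item_id: str) -> DatasetItem:
        # Historical reports can still resolve deleted items by ID.
        return self._items[item_id]


version = DatasetVersion()
version.add(DatasetItem(id="dsi_101", input={"inputText": "Invoice #INV-2026-001"}))
version.update_expected_output("dsi_101", {"invoiceNumber": "INV-2026-001"})
version.soft_delete("dsi_101")
print(len(version.list_items()), version.get("dsi_101").deleted)
```

Keeping deletion as a flag rather than a removal is what lets past results keep pointing at the original item ID and input.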

Why inputs are locked

Locking input guarantees that every past experiment result refers to exactly the data the item holds today. If you need to evolve a dataset over time (for example, to snapshot a known-good set of items before expanding it), use the dataset version API to create a manual snapshot. New versions are not created automatically when you edit ground truth or delete items.

Listing Items

Retrieve items for a specific dataset version with pagination. Soft-deleted items are excluded:

GET /v1/datasets/{datasetId}/versions/{versionId}/items?page=0&size=25
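
One way to walk every page is sketched below. It rests on assumptions this page does not confirm: that pages are zero-based (as the page=0 example suggests), that each page yields a list of items, and that a page shorter than size marks the end. The fetch_page callable is a stand-in for whatever function performs the actual GET and extracts the items.

```python
from typing import Callable, Iterator, List


def iter_items(
    fetch_page: Callable[[int, int], List[dict]], size: int = 25
) -> Iterator[dict]:
    """Yield items page by page until a short (or empty) page is returned."""
    page = 0  # assumed zero-based, matching ?page=0 in the example URL
    while True:
        items = fetch_page(page, size)
        yield from items
        if len(items) < size:
            return
        page += 1


# Stand-in for the HTTP call: serves 60 fake items in slices.
FAKE_ITEMS = [{"id": f"dsi_{n}"} for n in range(60)]


def fake_fetch_page(page: int, size: int) -> List[dict]:
    start = page * size
    return FAKE_ITEMS[start:start + size]


all_items = list(iter_items(fake_fetch_page, size=25))
print(len(all_items))
```

If the real response includes a total count or a last-page flag, using that to stop is preferable to the short-page heuristic, which costs one extra request when the total is an exact multiple of size.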

API Reference

Method	Endpoint	Description
`POST`	`/v1/datasets`	Create a dataset
`GET`	`/v1/datasets`	List datasets (paginated)
`GET`	`/v1/datasets/{id}`	Get a dataset
`PUT`	`/v1/datasets/{id}`	Update a dataset
`DELETE`	`/v1/datasets/{id}`	Delete a dataset
`POST`	`/v1/datasets/{id}/versions`	Create a new version
`GET`	`/v1/datasets/{id}/versions`	List versions
`GET`	`/v1/datasets/{id}/versions/latest`	Get latest version
`POST`	`.../versions/{versionId}/items`	Add an item
`POST`	`.../versions/{versionId}/items/bulk`	Add items in bulk
`GET`	`.../versions/{versionId}/items`	List items (paginated)
`PATCH`	`.../versions/{versionId}/items/{itemId}/expected-output`	Update an item's `expectedOutput` (input is immutable)
`DELETE`	`.../versions/{versionId}/items/{itemId}`	Soft-delete an item