Datasets

Collections of test data for evaluating your AI pipelines. Create once, run experiments against them repeatedly, and trust that past results stay reproducible.

How It Works

A dataset is a named collection of test items within an organization. Each item has an input (the data your AI pipeline will process) and an optional expected output (ground truth for scoring).

The workflow is: create dataset → add items → link to experiment → run → compare results.

Dataset Modality

Datasets are organized by input modality. Today only text is supported: items hold text-based input (plain strings or structured JSON with text fields) and a JSON expected output. Image, audio, and multimodal datasets are reserved for future releases.

Because the only modality is text, input and expectedOutput accept flexible JSON: use plain strings for simple prompts, or structured objects when you need fields such as document text, messages, context, or tags.
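
As an illustration of the two shapes described above, the following Python sketch builds one item with a plain-string input and one with a structured input. The prompt text and the context field value are made up for the example; only the input and expectedOutput keys come from this page.

```python
import json

# Simple prompt: the input is a plain string.
simple_item = {"input": "Summarize this invoice in one sentence."}

# Structured input: an object with text fields, plus optional ground truth.
structured_item = {
    "input": {
        "inputText": "Invoice #INV-2026-001\nVendor: Acme Corp",
        "context": "Extract the vendor name.",  # illustrative field value
    },
    "expectedOutput": {"vendor": "Acme Corp"},
}

# Both shapes serialize to valid JSON bodies.
bodies = [json.dumps(item) for item in (simple_item, structured_item)]
```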

Creating a Dataset

Create a dataset within your organization. Specify a name and a modality (the type field in the request). The first version is created automatically, so you can start adding items right away.

Request

POST /v1/datasets

{
  "name": "Invoice Samples",
  "description": "Representative invoices for extraction testing",
  "type": "text"
}

Response

{
  "id": "ds_456",
  "name": "Invoice Samples",
  "type": "text",
  "latestVersionId": "dsv_789",
  "createdAt": "2026-04-08T10:00:00Z"
}
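
A minimal sketch of building this request in Python with only the standard library. The host https://api.example.com and the bearer-token header are placeholders assumed for illustration; this page does not state the base URL or the authentication scheme.

```python
import json
import urllib.request

# Hypothetical values: the real host and auth scheme are not given on this page.
BASE_URL = "https://api.example.com"
API_TOKEN = "YOUR_TOKEN"


def build_create_dataset_request(name, description, modality="text"):
    """Build (but do not send) the POST /v1/datasets request."""
    payload = {"name": name, "description": description, "type": modality}
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/datasets",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumed bearer-token auth; adjust to your deployment.
            "Authorization": f"Bearer {API_TOKEN}",
        },
    )


request = build_create_dataset_request(
    "Invoice Samples", "Representative invoices for extraction testing"
)
```

Sending it with urllib.request.urlopen(request) should return a body like the response above; latestVersionId is the value to use when adding items.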

Adding Items

Add test items to the latest version. Each item has a required input and an optional expectedOutput, both stored as JSON. Items can also carry tags, as in the example below.

Request

POST /v1/datasets/{datasetId}/versions/{versionId}/items

{
  "input": {
    "inputText": "Invoice #INV-2026-001\nVendor: Acme Corp\nDate: 2026-03-15\nTotal: $1,250.00"
  },
  "expectedOutput": {
    "invoiceNumber": "INV-2026-001",
    "vendor": "Acme Corp",
    "date": "2026-03-15",
    "total": 1250.00
  },
  "tags": ["simple", "english"]
}

Response

{
  "id": "dsi_101",
  "datasetVersionId": "dsv_789",
  "input": { "inputText": "Invoice #INV-2026-001..." },
  "expectedOutput": { "invoiceNumber": "INV-2026-001", ... },
  "tags": ["simple", "english"],
  "createdAt": "2026-04-08T10:05:00Z"
}
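
The request body above could be assembled with a small helper like the one below. This is a sketch: build_item_payload and item_path are hypothetical helper names, and omitting absent optional fields (rather than sending null) is an assumption about what the API accepts.

```python
def build_item_payload(input_data, expected_output=None, tags=None):
    """Return the JSON body for adding one item to a dataset version."""
    if input_data is None:
        raise ValueError("input is required")
    payload = {"input": input_data}
    # Optional fields are left out entirely when not provided (assumption).
    if expected_output is not None:
        payload["expectedOutput"] = expected_output
    if tags:
        payload["tags"] = list(tags)
    return payload


def item_path(dataset_id, version_id):
    """Path of the add-item endpoint for a given dataset version."""
    return f"/v1/datasets/{dataset_id}/versions/{version_id}/items"


payload = build_item_payload(
    {"inputText": "Invoice #INV-2026-001\nVendor: Acme Corp"},
    expected_output={"invoiceNumber": "INV-2026-001", "vendor": "Acme Corp"},
    tags=["simple", "english"],
)
path = item_path("ds_456", "dsv_789")
```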

Editing Items

Dataset items have a simple mutability rule designed to keep experiment history trustworthy:

  • input is immutable. Once an item is created, its input cannot be changed. This guarantees that any experiment run against the item was evaluated against exactly the data you see today. To change the test data, delete the item and add a new one.
  • expectedOutput is editable. Ground truth is often added or refined over time — you may label an item's expected output days or weeks after creating it, or correct it once you learn what the right answer should be. Update it whenever you need to; subsequent experiment runs will score against the new ground truth.
  • Deletion is soft. Deleted items disappear from listings and are excluded from new experiment runs, but they are not removed from the database. Past experiment results still reference the original item IDs and inputs, so historical reports remain intact.
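
The three rules can be modeled with a small in-memory sketch. It is purely illustrative and says nothing about how the service is implemented; the class and method names are hypothetical.

```python
import copy


class DatasetItem:
    def __init__(self, item_id, input_data, expected_output=None):
        self.id = item_id
        self._input = copy.deepcopy(input_data)
        self.expected_output = expected_output  # editable at any time
        self.deleted = False

    @property
    def input(self):
        # Read-only: there is no setter, so reassigning raises AttributeError.
        # A copy is returned so callers cannot mutate the stored value.
        return copy.deepcopy(self._input)


class DatasetVersion:
    def __init__(self):
        self._items = {}

    def add(self, item):
        self._items[item.id] = item

    def update_expected_output(self, item_id, expected_output):
        self._items[item_id].expected_output = expected_output

    def soft_delete(self, item_id):
        # The item is flagged, not removed, so past results can still find it.
        self._items[item_id].deleted = True

    def list_items(self):
        return [item for item in self._items.values() if not item.deleted]

    def get(self, item_id):
        return self._items[item_id]


version = DatasetVersion()
version.add(DatasetItem("dsi_101", {"inputText": "Invoice #INV-2026-001"}))
version.update_expected_output("dsi_101", {"invoiceNumber": "INV-2026-001"})
version.soft_delete("dsi_101")
```

After the soft delete, list_items() no longer returns the item, while get("dsi_101") still does, with its original input and its updated expected output.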

Listing Items

Retrieve items for a specific dataset version with pagination. Soft-deleted items are excluded:

GET /v1/datasets/{datasetId}/versions/{versionId}/items?page=0&size=25
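
A sketch of walking through every page in Python. It assumes page numbering is zero-based (as page=0 in the example suggests) and that a page shorter than size marks the end; the response envelope is not documented here, so fetch_page is a hypothetical callable that performs the GET and returns that page's items as a list. A fake in-memory fetcher stands in for the HTTP call.

```python
def iter_items(fetch_page, size=25):
    """Yield items from pages 0, 1, 2, ... until a page comes back short."""
    page = 0
    while True:
        items = fetch_page(page, size)
        yield from items
        if len(items) < size:
            break
        page += 1


# Stand-in for the real HTTP call: 60 fake items served page by page.
FAKE_ITEMS = [{"id": f"dsi_{n}"} for n in range(60)]
calls = []


def fake_fetch(page, size):
    calls.append(page)
    start = page * size
    return FAKE_ITEMS[start:start + size]


all_items = list(iter_items(fake_fetch, size=25))
```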

API Reference

Method  Endpoint                                                 Description
POST    /v1/datasets                                             Create a dataset
GET     /v1/datasets                                             List datasets (paginated)
GET     /v1/datasets/{id}                                        Get a dataset
PUT     /v1/datasets/{id}                                        Update a dataset
DELETE  /v1/datasets/{id}                                        Delete a dataset
POST    /v1/datasets/{id}/versions                               Create a new version
GET     /v1/datasets/{id}/versions                               List versions
GET     /v1/datasets/{id}/versions/latest                        Get latest version
POST    .../versions/{versionId}/items                           Add an item
POST    .../versions/{versionId}/items/bulk                      Add items in bulk
GET     .../versions/{versionId}/items                           List items (paginated)
PATCH   .../versions/{versionId}/items/{itemId}/expected-output  Update an item's expectedOutput (input is immutable)
DELETE  .../versions/{versionId}/items/{itemId}                  Soft-delete an item
