Managing Testsets

This guide covers how to create, upsert, list, and retrieve testsets using the Agenta SDK or REST API.


Creating a Testset

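Use ag.testsets.acreate() to create a new testset with data. The SDK examples in this guide use top-level await, which works in a notebook; in a script, call these methods from an async function and run it with asyncio.run().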

Python SDK:

import agenta as ag

ag.init()

# Create a testset with simple data
testset = await ag.testsets.acreate(
    data=[
        {"country": "Germany", "capital": "Berlin"},
        {"country": "France", "capital": "Paris"},
        {"country": "Spain", "capital": "Madrid"}
    ],
    name="Country Capitals",
)

testset_id = testset.testset_id or testset.id
print(f"Testset ID: {testset_id}")
print(f"Revision ID: {testset.id}")
print(f"Name: {testset.name}")
print(f"Slug: {testset.slug}")

REST API:

curl -X POST "https://cloud.agenta.ai/api/preview/simple/testsets/" \
  -H "Content-Type: application/json" \
  -H "Authorization: ApiKey YOUR_API_KEY" \
  -d '{
    "testset": {
      "name": "Country Capitals",
      "slug": "country-capitals",
      "data": {
        "testcases": [
          {"data": {"country": "Germany", "capital": "Berlin"}},
          {"data": {"country": "France", "capital": "Paris"}},
          {"data": {"country": "Spain", "capital": "Madrid"}}
        ]
      }
    }
  }'

Parameters:

data: A list of dictionaries containing your test data. Each dictionary represents one testcase.
name: The name of your testset.

Returns: A TestsetRevision object containing:

id: The UUID of the created testset revision
testset_id: The parent testset UUID (stable across revisions)
name: The testset name
slug: The revision slug
version: The revision version string (e.g. "1")
data: The test data (with testcases structure)

Sample Output:

{
    "id": "01963413-3d39-7650-80ce-3ad5d688da6c",
    "name": "Country Capitals",
    "slug": "3ad5d688da6c",
    "data": {
        "testcases": [
            {"data": {"country": "Germany", "capital": "Berlin"}},
            {"data": {"country": "France", "capital": "Paris"}},
            {"data": {"country": "Spain", "capital": "Madrid"}}
        ]
    }
}

Tip: The data parameter accepts a simple list of dictionaries. The SDK automatically converts this to the structured TestsetRevisionData format internally.
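
The conversion mentioned in the tip can be pictured with a few lines of Python. This is only an illustration of the shape change, not the SDK's actual code: to_testcases_payload is a hypothetical helper, and the target structure is the one shown in the REST request body earlier in this section.

```python
from typing import Any


def to_testcases_payload(rows: list[dict[str, Any]]) -> dict[str, Any]:
    """Wrap plain row dictionaries in the {"testcases": [{"data": ...}]} shape.

    Hypothetical helper for illustration; the SDK performs its own conversion.
    """
    return {"testcases": [{"data": dict(row)} for row in rows]}


rows = [
    {"country": "Germany", "capital": "Berlin"},
    {"country": "France", "capital": "Paris"},
]
payload = to_testcases_payload(rows)
print(payload["testcases"][0])
```

Each input dictionary becomes the data field of one testcase, which is why the REST examples nest every row under "data".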

Upserting a Testset

Use ag.testsets.aupsert() to create a testset or replace an existing one with the same name.

The function first searches for a testset matching the provided name (or testset_id if given). If it finds one, it replaces all testcases with your new data and creates a new revision. If no match exists, it creates a new testset.

Each update creates a new revision while keeping the same testset_id. This allows you to track changes over time and reference specific versions of your test data.

Warning: Upsert performs a full replacement. All existing testcases are removed and replaced with the data you provide. The operation does not merge or append testcases.
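
Because upsert replaces everything, appending rows means sending the old rows together with the new ones. The following is a minimal sketch of that merge step, assuming you have already read the existing rows (for example from testset.data.testcases after aretrieve()) and that one column uniquely identifies a row. merge_rows and the choice of key are hypothetical and not part of the SDK.

```python
from typing import Any


def merge_rows(
    existing: list[dict[str, Any]],
    new: list[dict[str, Any]],
    key: str,
) -> list[dict[str, Any]]:
    """Return existing rows updated with new rows, matched on `key`.

    Rows in `new` overwrite rows with the same key value; unmatched rows are
    appended. Insertion order is preserved.
    """
    merged = {row[key]: row for row in existing}
    for row in new:
        merged[row[key]] = row
    return list(merged.values())


existing = [
    {"country": "Germany", "capital": "Berlin"},
    {"country": "France", "capital": "Paris"},
]
new = [
    {"country": "Spain", "capital": "Madrid"},
    {"country": "Italy", "capital": "Rome"},
]
combined = merge_rows(existing, new, key="country")
print([row["country"] for row in combined])
# ['Germany', 'France', 'Spain', 'Italy']
```

Passing combined as the data argument of aupsert() would then keep France while adding Spain and Italy.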

Python SDK:

import agenta as ag

ag.init()

# First call creates a testset with 2 testcases
testset = await ag.testsets.aupsert(
    name="Country Capitals",
    data=[
        {"country": "Germany", "capital": "Berlin"},
        {"country": "France", "capital": "Paris"},
    ],
)

# Second call replaces all testcases with these 3
# France is removed because it's not in the new data
testset = await ag.testsets.aupsert(
    name="Country Capitals",
    data=[
        {"country": "Germany", "capital": "Berlin"},
        {"country": "Spain", "capital": "Madrid"},
        {"country": "Italy", "capital": "Rome"},
    ],
)
# Result: testset now contains Germany, Spain, Italy (not France)

REST API:

Use the PUT endpoint with the testset ID to replace all testcases:

curl -X PUT "https://cloud.agenta.ai/api/preview/simple/testsets/{testset_id}" \
  -H "Content-Type: application/json" \
  -H "Authorization: ApiKey YOUR_API_KEY" \
  -d '{
    "testset": {
      "id": "YOUR_TESTSET_ID",
      "name": "Country Capitals",
      "data": {
        "testcases": [
          {"data": {"country": "Germany", "capital": "Berlin"}},
          {"data": {"country": "Spain", "capital": "Madrid"}},
          {"data": {"country": "Italy", "capital": "Rome"}}
        ]
      }
    }
  }'

Parameters:

name: The testset name. Used to find an existing testset when testset_id is not provided.
data (required): The testcases that will replace all existing data.
testset_id (optional): Updates this specific testset directly, skipping the name lookup.

Returns: A TestsetRevision object containing the created or updated testset.

When to use each method

Use aupsert() when you want to keep a testset synchronized with your data source. This works well in CI/CD pipelines where you regenerate test data on each run. Use acreate() when you need a new testset every time.
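
In a CI/CD pipeline the test data often lives in a file rather than in code. As a rough sketch, assuming a CSV file with a header row, the rows can be turned into the list of dictionaries that the data parameter expects. rows_from_csv is a hypothetical helper, and the CSV text is inlined here only to keep the example self-contained.

```python
import csv
import io


def rows_from_csv(text: str) -> list[dict[str, str]]:
    """Parse CSV text with a header row into a list of row dictionaries."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


csv_text = "country,capital\nGermany,Berlin\nSpain,Madrid\n"
data = rows_from_csv(csv_text)
print(data)
```

The resulting data list can be passed to aupsert() on every pipeline run so the testset mirrors the file.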

Listing Testsets

To list all testsets in your project, use ag.testsets.alist():

Python SDK:

import agenta as ag

ag.init()

# List all testsets
testsets = await ag.testsets.alist()

print(f"Found {len(testsets)} testsets:")
for testset in testsets:
    testset_id = testset.testset_id or testset.id
    print(f"  - {testset.name} (testset_id: {testset_id})")

REST API:

curl -X POST "https://cloud.agenta.ai/api/preview/simple/testsets/query" \
  -H "Content-Type: application/json" \
  -H "Authorization: ApiKey YOUR_API_KEY" \
  -d '{}'

Parameters: None required.

Returns: A list of TestsetRevision objects. For each item:

id: The latest revision UUID
testset_id: The parent testset UUID
name: The testset name
slug: The revision slug
Additional metadata fields

Sample Output:

[
    {
        "id": "01963413-3d39-7650-80ce-3ad5d688da6c",
        "name": "Country Capitals",
        "slug": "country-capitals"
    },
    {
        "id": "01963520-4e4a-8761-91df-4be6e799eb7d",
        "name": "Math Problems",
        "slug": "math-problems"
    }
]
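
If you prefer to call the query endpoint from Python without the SDK, the curl command above translates roughly as follows. This sketch uses only the standard library and builds the request without sending it. The URL, headers, and empty JSON body come from the curl example; build_query_request is a hypothetical helper name.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "https://cloud.agenta.ai/api/preview/simple/testsets"


def build_query_request(api_key: str, body: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) the POST request for the testset query endpoint."""
    return urllib.request.Request(
        url=f"{BASE_URL}/query",
        data=json.dumps(body or {}).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"ApiKey {api_key}",
        },
        method="POST",
    )


request = build_query_request("YOUR_API_KEY")
print(request.get_method(), request.full_url)
# To send it: urllib.request.urlopen(request), then json.load() the response.
```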

Retrieving a Testset

Use ag.testsets.aretrieve() to fetch a testset. You can retrieve either the latest revision or a specific historical revision.

Retrieving the Latest Revision

Pass the testset_id to get the most recent version of a testset:

Python SDK:

import agenta as ag

ag.init()

# Retrieve the latest revision
testset = await ag.testsets.aretrieve(testset_id=testset_id)

if testset:
    print(f"Testcases: {len(testset.data.testcases)}")

REST API:

curl -X GET "https://cloud.agenta.ai/api/preview/simple/testsets/{testset_id}" \
  -H "Authorization: ApiKey YOUR_API_KEY"

Retrieving a Specific Revision

Pass the testset_revision_id to get an exact historical version. This is useful when you need to reproduce an evaluation or compare different versions of your test data.

Python SDK:

import agenta as ag

ag.init()

# Retrieve a specific revision
testset = await ag.testsets.aretrieve(testset_revision_id=revision_id)

if testset:
    print(f"Version: {testset.version}")
    print(f"Testcases: {len(testset.data.testcases)}")

REST API:

curl -X POST "https://cloud.agenta.ai/api/preview/testsets/revisions/retrieve" \
  -H "Content-Type: application/json" \
  -H "Authorization: ApiKey YOUR_API_KEY" \
  -d '{
    "testset_revision_ref": {
      "id": "YOUR_REVISION_ID"
    }
  }'

Parameters:

testset_id: Retrieves the latest revision of this testset.
testset_revision_id: Retrieves this exact revision.

Returns: A TestsetRevision object containing:

id: The revision UUID
testset_id: The parent testset UUID (stable across revisions)
version: The revision version string (e.g. "1")
data: The testcases for this revision

Info: Each update creates a new revision. The testset_id stays the same, but the revision ID (the id field) changes. Store revision IDs when you need to reference exact versions later (for example, when logging which test data was used in an evaluation).
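
One way to store those IDs is to write them into whatever record you keep per evaluation run. The sketch below assumes a JSON log line; evaluation_record, the field names, and the placeholder IDs are all hypothetical, and a SimpleNamespace stands in for the TestsetRevision object returned by aretrieve().

```python
import json
from types import SimpleNamespace


def evaluation_record(testset, run_name: str) -> str:
    """Serialize the IDs that pin an evaluation run to exact test data."""
    return json.dumps(
        {
            "run": run_name,
            "testset_id": str(testset.testset_id),
            "testset_revision_id": str(testset.id),
        },
        sort_keys=True,
    )


# Stand-in for the object returned by aretrieve(); the IDs are made up.
revision = SimpleNamespace(id="rev-0002", testset_id="ts-0001")
record = evaluation_record(revision, run_name="nightly")
print(record)
```

The stored testset_revision_id can later be passed to aretrieve(testset_revision_id=...) to load exactly the data that run used.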

Retrieving a Testset by Name

You can find a testset by name. The SDK example filters the results of alist() on the client side; the REST example passes the name to the query endpoint:

Python SDK:

import agenta as ag

ag.init()

async def get_testset_by_name(name: str):
    """Helper function to find a testset by name."""
    testsets = await ag.testsets.alist()

    if not testsets:
        return None

    for testset in testsets:
        if testset.name == name:
            return testset

    return None

# Usage
testset = await get_testset_by_name("Country Capitals")

if testset:
    testset_id = testset.testset_id or testset.id
    print(f"Found testset: {testset.name} (testset_id: {testset_id}, revision_id: {testset.id})")
else:
    print("Testset not found")

REST API:

curl -X POST "https://cloud.agenta.ai/api/preview/simple/testsets/query" \
  -H "Content-Type: application/json" \
  -H "Authorization: ApiKey YOUR_API_KEY" \
  -d '{
    "testset": {
      "name": "Country Capitals"
    }
  }'

Helper Pattern

This pattern shows how you can implement your own helper functions to filter and find testsets based on custom criteria. You can extend this to search by tags or other metadata fields.
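
A more general version of the helper takes the matching rule as a function. This is a sketch under assumptions: find_first is a hypothetical name, and SimpleNamespace objects stand in for the results of alist() so the example runs without the SDK.

```python
from types import SimpleNamespace
from typing import Callable, Iterable, Optional


def find_first(items: Iterable, predicate: Callable[[object], bool]) -> Optional[object]:
    """Return the first item for which predicate(item) is true, else None."""
    return next((item for item in items if predicate(item)), None)


# Stand-ins for the objects returned by alist(); real results come from the SDK.
testsets = [
    SimpleNamespace(name="Country Capitals", slug="country-capitals"),
    SimpleNamespace(name="Math Problems", slug="math-problems"),
]

match = find_first(testsets, lambda t: t.slug.startswith("math"))
print(match.name if match else "Testset not found")
```

Swapping the lambda is enough to search on any other attribute the returned objects expose.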

Working with Test Data

Once you have a testset, you can access the testcases within it:

Python SDK:

import agenta as ag

ag.init()

# Retrieve a testset
testset = await ag.testsets.aretrieve(testset_id=testset_id)

# Access testcases
if testset and testset.data and testset.data.testcases:
    for testcase in testset.data.testcases:
        print(f"Testcase: {testcase.data}")
        # Use testcase.data in your evaluation

REST API:

When you retrieve a testset via the API, the response includes the testcases in the data.testcases array:

{
  "testset": {
    "id": "01963413-3d39-7650-80ce-3ad5d688da6c",
    "name": "Country Capitals",
    "data": {
      "testcases": [
        {
          "id": "bf2de79d-bcd0-569e-92aa-735bbdd0b447",
          "data": {"country": "Germany", "capital": "Berlin"}
        },
        {
          "id": "f54345c8-939c-5d03-9950-b62b876b10bd",
          "data": {"country": "France", "capital": "Paris"}
        }
      ]
    }
  }
}

Each testcase contains a data field with the dictionary you provided during creation. You can use these testcases directly in your evaluations.
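
As a rough illustration of using testcases in an evaluation, the sketch below parses a response with the shape shown above and scores a placeholder function against it. guess_capital is a hypothetical stand-in for your application, and the exact-match scoring is an assumption chosen for simplicity.

```python
import json

response_text = """
{
  "testset": {
    "id": "01963413-3d39-7650-80ce-3ad5d688da6c",
    "name": "Country Capitals",
    "data": {
      "testcases": [
        {"id": "bf2de79d-bcd0-569e-92aa-735bbdd0b447",
         "data": {"country": "Germany", "capital": "Berlin"}},
        {"id": "f54345c8-939c-5d03-9950-b62b876b10bd",
         "data": {"country": "France", "capital": "Paris"}}
      ]
    }
  }
}
"""


def guess_capital(country: str) -> str:
    """Placeholder for the application under test (hypothetical)."""
    return {"Germany": "Berlin", "France": "Lyon"}.get(country, "")


testcases = json.loads(response_text)["testset"]["data"]["testcases"]
# Count testcases where the application's answer matches the expected capital.
correct = sum(
    guess_capital(tc["data"]["country"]) == tc["data"]["capital"] for tc in testcases
)
accuracy = correct / len(testcases)
print(f"{correct}/{len(testcases)} correct ({accuracy:.0%})")
# 1/2 correct (50%)
```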
