Managing Testsets

This guide covers how to create, upsert, list, and retrieve testsets using the Agenta SDK or REST API.

Creating a Testset

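The examples in this guide use top-level await, which works in a notebook or other async context. In a regular script, wrap the calls in an async function and run it with asyncio.run().

Use ag.testsets.acreate() to create a new testset with data: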

import agenta as ag

ag.init()

# Create a testset with simple data
testset = await ag.testsets.acreate(
    data=[
        {"country": "Germany", "capital": "Berlin"},
        {"country": "France", "capital": "Paris"},
        {"country": "Spain", "capital": "Madrid"},
    ],
    name="Country Capitals",
)

testset_id = testset.testset_id or testset.id
print(f"Testset ID: {testset_id}")
print(f"Revision ID: {testset.id}")
print(f"Name: {testset.name}")
print(f"Slug: {testset.slug}")

Parameters:

  • data: A list of dictionaries containing your test data. Each dictionary represents one testcase.
  • name: The name of your testset.

Returns: A TestsetRevision object containing:

  • id: The UUID of the created testset revision
  • testset_id: The parent testset UUID (stable across revisions)
  • name: The testset name
  • slug: The revision slug
  • version: The revision version string (e.g. "1")
  • data: The test data (with testcases structure)

Sample Output:

{
    "id": "01963413-3d39-7650-80ce-3ad5d688da6c",
    "name": "Country Capitals",
    "slug": "3ad5d688da6c",
    "data": {
        "testcases": [
            {"data": {"country": "Germany", "capital": "Berlin"}},
            {"data": {"country": "France", "capital": "Paris"}},
            {"data": {"country": "Spain", "capital": "Madrid"}}
        ]
    }
}

Tip: The data parameter accepts a simple list of dictionaries. The SDK automatically converts this to the structured TestsetRevisionData format internally.
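
To make the conversion concrete, the sketch below wraps a plain list of rows into the nested shape shown in the sample output. The name `to_testcases_payload` is hypothetical and this is not the SDK's internal code; it only mirrors the structure visible above.

```python
from typing import Any


def to_testcases_payload(rows: list[dict[str, Any]]) -> dict[str, Any]:
    """Wrap plain rows in the nested shape shown in the sample output.

    Hypothetical helper for illustration only; the SDK does this for you.
    """
    return {"testcases": [{"data": dict(row)} for row in rows]}


rows = [
    {"country": "Germany", "capital": "Berlin"},
    {"country": "France", "capital": "Paris"},
]
payload = to_testcases_payload(rows)
print(payload["testcases"][0])
# {'data': {'country': 'Germany', 'capital': 'Berlin'}}
```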

Upserting a Testset

Use ag.testsets.aupsert() to create a testset or replace an existing one with the same name.

The function first searches for a testset matching the provided name (or testset_id if given). If it finds one, it replaces all testcases with your new data and creates a new revision. If no match exists, it creates a new testset.

Each update creates a new revision while keeping the same testset_id. This allows you to track changes over time and reference specific versions of your test data.

Warning: Upsert performs a full replacement. All existing testcases are removed and replaced with the data you provide. The operation does not merge or append testcases.

import agenta as ag

ag.init()

# First call creates a testset with 2 testcases
testset = await ag.testsets.aupsert(
    name="Country Capitals",
    data=[
        {"country": "Germany", "capital": "Berlin"},
        {"country": "France", "capital": "Paris"},
    ],
)

# Second call replaces all testcases with these 3
# France is removed because it's not in the new data
testset = await ag.testsets.aupsert(
    name="Country Capitals",
    data=[
        {"country": "Germany", "capital": "Berlin"},
        {"country": "Spain", "capital": "Madrid"},
        {"country": "Italy", "capital": "Rome"},
    ],
)
# Result: testset now contains Germany, Spain, Italy (not France)

Parameters:

  • name: The testset name. Used to find an existing testset when testset_id is not provided.
  • data (required): The testcases that will replace all existing data.
  • testset_id (optional): Updates this specific testset directly, skipping the name lookup.

Returns: A TestsetRevision object containing the created or updated testset.
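
Because upsert replaces every testcase, appending rows means merging on your side first. Below is a minimal sketch, assuming each row carries a field that identifies it (here `country`). `merge_rows` is a hypothetical helper, not part of the SDK; in practice `existing` would be built from the `data` of each testcase in the latest revision.

```python
from typing import Any


def merge_rows(
    existing: list[dict[str, Any]],
    new: list[dict[str, Any]],
    key: str,
) -> list[dict[str, Any]]:
    """Merge two row lists; rows in `new` override rows with the same key."""
    merged = {row[key]: row for row in existing}
    merged.update({row[key]: row for row in new})
    return list(merged.values())


existing = [
    {"country": "Germany", "capital": "Berlin"},
    {"country": "France", "capital": "Paris"},
]
new = [
    {"country": "Spain", "capital": "Madrid"},
    {"country": "France", "capital": "Paris", "continent": "Europe"},
]
combined = merge_rows(existing, new, key="country")
print([row["country"] for row in combined])
# ['Germany', 'France', 'Spain']
```

Passing `combined` as data to ag.testsets.aupsert() would then store the merged rows as a new revision.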

When to use each method

Use aupsert() when you want to keep a testset synchronized with your data source. This works well in CI/CD pipelines where you regenerate test data on each run. Use acreate() when you need a new testset every time.
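
In a pipeline, the rows often come from a file checked into the repository. The sketch below parses CSV text into the list of dictionaries that the data parameter expects, using only the standard library. `rows_from_csv` is a hypothetical helper, and the CSV layout (header row first, one testcase per line) is an assumption.

```python
import csv
import io


def rows_from_csv(csv_text: str) -> list[dict[str, str]]:
    """Parse CSV text (header row first) into one dict per testcase."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [dict(row) for row in reader]


csv_text = "country,capital\nGermany,Berlin\nSpain,Madrid\n"
rows = rows_from_csv(csv_text)
print(rows)
# [{'country': 'Germany', 'capital': 'Berlin'}, {'country': 'Spain', 'capital': 'Madrid'}]
```

Each pipeline run can then pass `rows` to ag.testsets.aupsert() under the same name, so the testset tracks the file.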

Listing Testsets

To list all testsets in your project, use ag.testsets.alist():

import agenta as ag

ag.init()

# List all testsets
testsets = await ag.testsets.alist()

print(f"Found {len(testsets)} testsets:")
for testset in testsets:
    testset_id = testset.testset_id or testset.id
    print(f" - {testset.name} (testset_id: {testset_id})")

Parameters: None required.

Returns: A list of TestsetRevision objects. For each item:

  • id: The latest revision UUID
  • testset_id: The parent testset UUID
  • name: The testset name
  • slug: The revision slug
  • Additional metadata fields

Sample Output:

[
    {
        "id": "01963413-3d39-7650-80ce-3ad5d688da6c",
        "name": "Country Capitals",
        "slug": "country-capitals"
    },
    {
        "id": "01963520-4e4a-8761-91df-4be6e799eb7d",
        "name": "Math Problems",
        "slug": "math-problems"
    }
]

Retrieving a Testset

Use ag.testsets.aretrieve() to fetch a testset. You can retrieve either the latest revision or a specific historical revision.

Retrieving the Latest Revision

Pass the testset_id to get the most recent version of a testset:

import agenta as ag

ag.init()

# Retrieve the latest revision
# testset_id: the testset UUID, e.g. from acreate() or alist()
testset = await ag.testsets.aretrieve(testset_id=testset_id)

if testset:
    print(f"Testcases: {len(testset.data.testcases)}")

Retrieving a Specific Revision

Pass the testset_revision_id to get an exact historical version. This is useful when you need to reproduce an evaluation or compare different versions of your test data.

import agenta as ag

ag.init()

# Retrieve a specific revision
# revision_id: a revision UUID saved earlier (the id field of a TestsetRevision)
testset = await ag.testsets.aretrieve(testset_revision_id=revision_id)

if testset:
    print(f"Version: {testset.version}")
    print(f"Testcases: {len(testset.data.testcases)}")

Parameters:

  • testset_id: Retrieves the latest revision of this testset.
  • testset_revision_id: Retrieves this exact revision.

Returns: A TestsetRevision object containing:

  • id: The revision UUID
  • testset_id: The parent testset UUID (stable across revisions)
  • version: The revision version string (e.g. "1")
  • data: The testcases for this revision

Info: Each update creates a new revision. The testset_id stays the same, but the revision ID (the id field) changes. Store revision IDs when you need to reference exact versions later (for example, when logging which test data was used in an evaluation).
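
One way to keep that reference is to write both IDs into whatever record you store for each evaluation run. This is a minimal sketch, assuming run metadata is kept as JSON; the function name, the field names, and the placeholder IDs are all hypothetical.

```python
import json


def evaluation_record(testset_id: str, revision_id: str, score: float) -> str:
    """Serialize the IDs of the test data next to the result they produced."""
    record = {
        "testset_id": testset_id,  # stable across revisions
        "testset_revision_id": revision_id,  # pins the exact data used
        "score": score,
    }
    return json.dumps(record, sort_keys=True)


line = evaluation_record("testset-uuid", "revision-uuid", 0.92)
print(line)
# {"score": 0.92, "testset_id": "testset-uuid", "testset_revision_id": "revision-uuid"}
```

Later, the stored revision ID can be passed as testset_revision_id to ag.testsets.aretrieve() to load the same data again.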

Retrieving a Testset by Name

You can find a testset by name by filtering the results of ag.testsets.alist():

import agenta as ag

ag.init()

async def get_testset_by_name(name: str):
    """Helper function to find a testset by name."""
    testsets = await ag.testsets.alist()

    if not testsets:
        return None

    for testset in testsets:
        if testset.name == name:
            return testset

    return None

# Usage
testset = await get_testset_by_name("Country Capitals")

if testset:
    testset_id = testset.testset_id or testset.id
    print(f"Found testset: {testset.name} (testset_id: {testset_id}, revision_id: {testset.id})")
else:
    print("Testset not found")

Helper Pattern

This pattern shows how you can implement your own helper functions to filter and find testsets based on custom criteria. You can extend this to search by tags or other metadata fields.
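
A more general version of the helper takes a predicate instead of a name. This is a sketch only: `filter_testsets` is hypothetical, and the stand-in objects below mimic just the name and slug attributes documented above. Real input would be the list returned by ag.testsets.alist().

```python
from types import SimpleNamespace
from typing import Any, Callable, Iterable


def filter_testsets(
    testsets: Iterable[Any],
    predicate: Callable[[Any], bool],
) -> list[Any]:
    """Return every testset for which predicate(testset) is true."""
    return [t for t in testsets if predicate(t)]


# Stand-ins for the objects returned by ag.testsets.alist()
testsets = [
    SimpleNamespace(name="Country Capitals", slug="country-capitals"),
    SimpleNamespace(name="Math Problems", slug="math-problems"),
]

matches = filter_testsets(testsets, lambda t: t.name.lower().startswith("math"))
print([t.name for t in matches])
# ['Math Problems']
```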

Working with Test Data

Once you have a testset, you can access the testcases within it:

import agenta as ag

ag.init()

# Retrieve a testset
testset = await ag.testsets.aretrieve(testset_id=testset_id)

# Access testcases
if testset and testset.data and testset.data.testcases:
    for testcase in testset.data.testcases:
        print(f"Testcase: {testcase.data}")
        # Use testcase.data in your evaluation

Each testcase contains a data field with the dictionary you provided during creation. You can use these testcases directly in your evaluations.
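
As an illustration of using testcases in an evaluation, the sketch below scores a function against them by exact match. `accuracy` and `predict_capital` are hypothetical names, and the SimpleNamespace objects stand in for the items of testset.data.testcases, which expose the same data attribute.

```python
from types import SimpleNamespace
from typing import Any, Callable, Iterable


def accuracy(
    testcases: Iterable[Any],
    predict: Callable[[dict], str],
    expected_key: str,
) -> float:
    """Fraction of testcases where predict(testcase.data) equals the expected value."""
    results = [predict(tc.data) == tc.data[expected_key] for tc in testcases]
    return sum(results) / len(results) if results else 0.0


def predict_capital(row: dict) -> str:
    """Toy stand-in for the application under test."""
    known = {"Germany": "Berlin", "France": "Paris"}
    return known.get(row["country"], "unknown")


# Stand-ins for testset.data.testcases
testcases = [
    SimpleNamespace(data={"country": "Germany", "capital": "Berlin"}),
    SimpleNamespace(data={"country": "France", "capital": "Paris"}),
    SimpleNamespace(data={"country": "Spain", "capital": "Madrid"}),
    SimpleNamespace(data={"country": "Italy", "capital": "Rome"}),
]
print(accuracy(testcases, predict_capital, expected_key="capital"))
# 0.5
```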