Eval Files

Evaluation files define the test cases, targets, and evaluators for an evaluation run. AgentV supports two formats: YAML and JSONL.

YAML is the primary format. A single file contains metadata, execution config, and tests:

```yaml
description: Math problem solving evaluation
execution:
  target: default
assertions:
  - name: correctness
    type: llm-grader
    prompt: ./graders/correctness.md
tests:
  - id: addition
    criteria: Correctly calculates 15 + 27 = 42
    input: What is 15 + 27?
    expected_output: "42"
```
| Field | Description |
| --- | --- |
| `description` | Human-readable description of the evaluation |
| `dataset` | Optional dataset identifier |
| `execution` | Default execution config (`target`, `fail_on_error`, etc.) |
| `workspace` | Suite-level workspace config: an inline object or a string path to an external workspace file |
| `tests` | Array of individual tests, or a string path to an external file |
| `assert` | Suite-level evaluators appended to each test unless `execution.skip_defaults: true` is set on the test |
| `input` | Suite-level input messages prepended to each test's input unless `execution.skip_defaults: true` is set on the test |

You can add structured metadata to your eval file using these optional top-level fields. Metadata is parsed when the name field is present:

| Field | Description |
| --- | --- |
| `name` | Machine-readable identifier (lowercase, hyphens, max 64 chars). Triggers metadata parsing. |
| `description` | Human-readable description (max 1024 chars) |
| `version` | Eval version string (e.g., `"1.0"`) |
| `author` | Author or team identifier |
| `tags` | Array of string tags for categorization |
| `license` | License identifier (e.g., `"MIT"`, `"Apache-2.0"`) |
| `requires` | Dependency constraints (e.g., `agentv: ">=0.30.0"`) |
```yaml
name: export-screening
description: Evaluates export control screening accuracy
version: "1.0"
author: acme-compliance
tags: [compliance, agents]
license: Apache-2.0
requires:
  agentv: ">=0.30.0"
tests:
  - id: denied-party
    criteria: Identifies denied parties correctly
    input: Screen "Acme Corp" against denied parties list
```

The assert field is the canonical way to define suite-level evaluators. Suite-level assertions are appended to every test’s evaluators unless a test sets execution.skip_defaults: true.

```yaml
description: API response validation
assert:
  - type: is-json
    required: true
  - type: contains
    value: "status"
tests:
  - id: health-check
    criteria: Returns health status
    input: Check API health
```

assert supports all evaluator types, including deterministic assertion types (contains, regex, is-json, equals) and rubrics. See Tests for per-test assert usage.
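The append-unless-skipped rule can be sketched as follows. This is an illustrative model of the documented behavior, not AgentV's actual implementation; the function name and dict shapes are assumptions:

```python
# Illustrative model of suite-level evaluator merging (not AgentV source).
# Suite-level assert entries are appended after each test's own evaluators,
# unless the test opts out with execution.skip_defaults: true.
def effective_evaluators(suite_assert, test):
    if test.get("execution", {}).get("skip_defaults"):
        return list(test.get("assert", []))
    return list(test.get("assert", [])) + list(suite_assert)
```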

The input field defines messages that are prepended to every test’s input. This avoids repeating the same prompt or system context in each test case — following the same pattern as suite-level assert.

```yaml
description: Travel assistant evaluation
input:
  - role: user
    content:
      - type: file
        value: ./system-prompt.md
tests: ./cases.yaml
```

Each test in cases.yaml only needs its own query:

```yaml
- id: japan-spring
  criteria: Recommends spring for cherry blossoms
  input: When is the best time to visit Japan?
```

The effective input at runtime becomes [...suite input, ...test input].

Suite-level input accepts the same formats as test-level input:

  • String — wrapped as [{ role: "user", content: "..." }]
  • Message array — used as-is, including file references

To opt out for a specific test, set execution.skip_defaults: true (same flag that skips suite-level assert).
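Taken together, the runtime merge can be modeled roughly as below. This is an illustrative sketch of the documented semantics, not AgentV internals; the function names are assumptions:

```python
# Illustrative model of suite-level input merging (not AgentV source).
def normalize(inp):
    # A string input is wrapped as a single user message.
    if isinstance(inp, str):
        return [{"role": "user", "content": inp}]
    return list(inp)  # a message array is used as-is

def effective_input(suite_input, test):
    messages = normalize(test["input"])
    if test.get("execution", {}).get("skip_defaults"):
        return messages  # opted out of suite-level input
    return normalize(suite_input) + messages  # [...suite input, ...test input]
```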

The input_files field provides a shorthand for attaching shared file references to every test. When a test has a string input, the suite-level files are prepended as type: file content blocks in a single user message — the same shape produced by per-test input_files.

```yaml
description: Schema review evaluation
input_files:
  - ./shared-context.md
  - ./schema.json
tests:
  - id: summarize
    criteria: Summarizes the important constraints
    input: Summarize the important constraints.
  - id: validate
    criteria: Identifies validation gaps
    input: What validation is missing?
```

Each test’s effective input becomes a single user message with [file blocks..., text block].

Per-test input_files overrides the suite-level value (it does not merge). To opt out, set execution.skip_defaults: true on the test.
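The expansion described above can be sketched as a small helper. This is an assumed model of the documented message shape, not AgentV's code:

```python
# Illustrative model of the documented input_files expansion (not AgentV
# source): shared files become type: file blocks, followed by the test's
# string input as a text block, all inside a single user message.
def expand_input_files(files, text_input):
    blocks = [{"type": "file", "value": path} for path in files]
    blocks.append({"type": "text", "value": text_input})
    return {"role": "user", "content": blocks}
```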

Instead of inlining tests in the same file, you can point tests to an external YAML or JSONL file. This is the inverse of the sidecar pattern — the metadata file references the test data:

```yaml
name: my-eval
description: My evaluation suite
execution:
  target: default
tests: ./cases.yaml
```

The path is resolved relative to the eval file's directory. The external file should contain either a YAML array of test objects or, for JSONL, one JSON test object per line.

All string fields in eval files support ${{ VAR }} syntax for environment variable interpolation. This enables portable eval configs that work across machines and CI environments without hardcoded paths.

```yaml
workspace:
  repos:
    - path: ./RepoA
      source:
        type: local
        path: "${{ REPO_A_PATH }}"
tests:
  - id: test-1
    input: "Evaluate the code in ${{ PROJECT_NAME }}"
    criteria: "${{ EVAL_CRITERIA }}"
```
  • Syntax: ${{ VARIABLE_NAME }} with optional whitespace around the name
  • Missing variables resolve to an empty string
  • Partial interpolation is supported: ${{ HOME }}/repos/${{ PROJECT }} becomes /home/user/repos/myproject
  • Non-string values (numbers, booleans) are not affected
  • Interpolation is applied recursively to all nested objects and arrays
  • Works in YAML eval files, external YAML/JSONL case files, and external workspace config files
  • .env files in the directory hierarchy are loaded automatically before interpolation
```yaml
# workspace.yaml - works on any machine
repos:
  - path: ./my-repo
    source:
      type: local
      path: "${{ MY_REPO_LOCAL_PATH }}"
```

.env:

```
MY_REPO_LOCAL_PATH=/home/dev/repos/my-repo
```
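The interpolation rules above can be sketched in a few lines of Python. This is an assumed model of the documented behavior, not AgentV's implementation:

```python
import os
import re

# Minimal sketch of ${{ VAR }} interpolation with the properties listed
# above (assumed semantics, not AgentV source): missing variables become
# empty strings, several variables may appear in one string, interpolation
# recurses through nested objects and arrays, and non-string values pass
# through untouched.
PATTERN = re.compile(r"\$\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")

def interpolate(value, env=None):
    env = os.environ if env is None else env
    if isinstance(value, str):
        return PATTERN.sub(lambda m: env.get(m.group(1), ""), value)
    if isinstance(value, list):
        return [interpolate(v, env) for v in value]
    if isinstance(value, dict):
        return {k: interpolate(v, env) for k, v in value.items()}
    return value  # numbers, booleans, None are left as-is
```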

For large-scale evaluations, AgentV supports JSONL (JSON Lines) format. Each line is a single test:

```jsonl
{"id": "test-1", "criteria": "Calculates correctly", "input": "What is 2+2?"}
{"id": "test-2", "criteria": "Provides explanation", "input": "Explain variables"}
```

An optional YAML sidecar file provides metadata and execution config. Place it alongside the JSONL file with the same base name:

dataset.jsonl + dataset.eval.yaml:

```yaml
description: Math evaluation dataset
dataset: math-tests
execution:
  target: azure-base
assertions:
  - name: correctness
    type: llm-grader
    prompt: ./graders/correctness.md
```
  • Streaming-friendly — process line by line
  • Git-friendly — diffs show individual case changes
  • Programmatic generation — easy to create from scripts
  • Industry standard — compatible with DeepEval, LangWatch, Hugging Face datasets
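Programmatic generation takes only a few lines; here is a minimal Python sketch (the filename and test fields mirror the examples above):

```python
import json

# Generate a JSONL eval dataset from a script: one JSON test object per line.
tests = [
    {"id": "test-1", "criteria": "Calculates correctly", "input": "What is 2+2?"},
    {"id": "test-2", "criteria": "Provides explanation", "input": "Explain variables"},
]
jsonl = "\n".join(json.dumps(t) for t in tests) + "\n"
with open("dataset.jsonl", "w", encoding="utf-8") as f:
    f.write(jsonl)
```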

Use the convert command to switch between YAML and JSONL:

```sh
agentv convert evals/dataset.eval.yaml --format jsonl
agentv convert evals/dataset.jsonl --format yaml
```