Convert skills to Tessl tiles and create eval scenarios to measure skill effectiveness.
Generate evaluation scenarios that measure whether agents follow instructions from skills.
Skills must be packaged in a Tessl tile (a directory with tile.json plus skill folders). If the skill is not yet in a tile, use the converting-skill-to-tessl-tile skill first. Ask the user where to put the tile if it's not specified.
A tile can contain multiple skills. In that case, first split it into multiple tiles, one per skill.
Read references/scenario-generation.md before starting. It will guide you through the workflow of researching the tile and creating all the expected files in the correct formats.
<tile>/evals/
├── instructions.json # JSON list of all instructions in the skill
├── summary.json # Feasible scenarios
├── summary_infeasible.json # Infeasible capabilities (no folders)
└── scenario-N/
    ├── task.md # Goal description (may include inlined inputs)
    ├── criteria.json # Scoring rubric (must sum to 100)
    └── capability.txt # Single line: capability being tested

The eval is one-shot and file-based:
Mark capabilities as infeasible if they won't work in this sandbox.
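Before triggering a run, it can help to confirm that the files in the layout above are all present. The sketch below is one way to do that and is not part of the Tessl tooling: `check_evals_layout` is a hypothetical helper name, and it checks only what the layout states (file names, criteria.json being parseable JSON, capability.txt being a single line). It does not check that the rubric sums to 100, because the criteria.json schema is not shown here.

```python
import json
from pathlib import Path

# File names taken from the evals/ layout described above.
TOP_LEVEL_FILES = ["instructions.json", "summary.json", "summary_infeasible.json"]
SCENARIO_FILES = ["task.md", "criteria.json", "capability.txt"]


def check_evals_layout(evals_dir: Path) -> list[str]:
    """Return a list of problems found; an empty list means nothing is missing.

    Hypothetical helper for illustration, not a Tessl API.
    """
    problems = []

    # Top-level files that sit directly under <tile>/evals/.
    for name in TOP_LEVEL_FILES:
        if not (evals_dir / name).is_file():
            problems.append(f"missing {name}")

    # Each scenario-N folder must hold the three per-scenario files.
    for scenario in sorted(p for p in evals_dir.glob("scenario-*") if p.is_dir()):
        for name in SCENARIO_FILES:
            if not (scenario / name).is_file():
                problems.append(f"{scenario.name}: missing {name}")

        # criteria.json should at least parse as JSON.
        criteria = scenario / "criteria.json"
        if criteria.is_file():
            try:
                json.loads(criteria.read_text(encoding="utf-8"))
            except json.JSONDecodeError as exc:
                problems.append(
                    f"{scenario.name}: criteria.json is not valid JSON ({exc.msg})"
                )

        # capability.txt is described as a single line.
        capability = scenario / "capability.txt"
        if capability.is_file():
            lines = capability.read_text(encoding="utf-8").strip().splitlines()
            if len(lines) != 1:
                problems.append(
                    f"{scenario.name}: capability.txt should be a single line"
                )

    return problems
```

Calling `check_evals_layout(Path("<tile>/evals"))` with the real tile path returns the list of problems, so an empty list suggests the folder is ready to run.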
Once the scenarios are ready, trigger the eval run on the Tessl platform:
tessl eval run <path/to/tile>
tessl eval view-status <status_id> --json
tessl eval list