A curated collection of Agent Skills for working with dbt, to help AI agents understand and execute dbt workflows more effectively.
This document covers conventions and patterns for working on the skill-eval CLI tool.
src/skill_eval/
├── cli.py # Typer CLI commands (run, grade, report, review)
├── models.py # Data models: Scenario, SkillSet, load_scenario()
├── runner.py # Execution: Runner class, RunResult, RunTask
├── grader.py # Auto-grading with Claude CLI
├── reporter.py # Report generation from grades
├── selector.py # Interactive TUI selectors for runs/scenarios
└── logging.py # Loguru configuration with context support

Data flow: cli.py loads scenarios via models.py, executes via runner.py, grades via grader.py, and reports via reporter.py.
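A minimal sketch of that flow, with stub implementations standing in for the real modules (grade_result and build_report are hypothetical names; the actual signatures in models.py, runner.py, grader.py, and reporter.py differ):

```python
# Stubs illustrating the load -> run -> grade -> report pipeline.
def load_scenario(path: str) -> dict:          # models.py
    return {"name": path, "prompt": "..."}

def run_scenario(scenario: dict) -> dict:      # runner.py
    return {"scenario": scenario["name"], "success": True}

def grade_result(result: dict) -> dict:        # grader.py (hypothetical name)
    return {"scenario": result["scenario"], "score": 1.0}

def build_report(grades: list[dict]) -> str:   # reporter.py (hypothetical name)
    return "\n".join(f"{g['scenario']}: {g['score']}" for g in grades)

grade = grade_result(run_scenario(load_scenario("example")))
report = build_report([grade])
```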
We use Typer for the CLI.
Use the appropriate output method based on context:
User-facing CLI output (command results, prompts): Use typer.echo()
typer.echo(f"Run directory: {run_dir}")
typer.echo("Error: file not found", err=True)

Progress logging (during execution): Use logger from logging.py
from skill_eval.logging import logger
logger.info("Starting scenario")
logger.debug("Tool called: Read")
logger.warning("Timeout reached")
logger.success("Completed")
# With context (for parallel runs)
ctx_logger = logger.bind(scenario="my-scenario", skill_set="with-skill")
ctx_logger.info("Starting")  # Shows: [T0/my-scenario/with-skill] Starting

Never use print() for output.
New commands go in cli.py:
@app.command()
def mycommand(
    arg: str = typer.Argument(..., help="Required argument"),
    flag: bool = typer.Option(False, "--flag", "-f", help="Optional flag"),
) -> None:
    """Command description shown in --help."""
    typer.echo(f"Running with {arg}")

When modifying CLI commands that work with scenarios or skill sets, check if the underlying dataclasses need updates:
In models.py:
Grade - grading result (success, score, tool_usage, notes, etc.)
SkillSet - skills, mcp_servers, allowed_tools
Scenario - name, path, prompt, skill_sets, description

In runner.py:
RunResult - scenario results with output, success, tools_used, skills_invoked, etc.
RunTask - task definition for parallel execution (scenario, skill_set, run_dir)

Use dataclasses.asdict() to convert dataclasses to dicts for YAML serialization.
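As a sketch, serializing a result this way might look like the following (the field names are an illustrative subset of RunResult, and the resulting dict would be handed to yaml.safe_dump):

```python
from dataclasses import asdict, dataclass, field

# Illustrative subset of RunResult; the real dataclass in runner.py has more fields.
@dataclass
class RunResult:
    scenario: str
    success: bool
    tools_used: list[str] = field(default_factory=list)

result = RunResult(scenario="example-yaml-error", success=True, tools_used=["Read", "Bash"])
data = asdict(result)  # plain nested dict, ready for yaml.safe_dump(data)
```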
When modifying grading, update:

Grade dataclass in models.py if adding new fields
GRADING_PROMPT_TEMPLATE in grader.py if changing what Claude evaluates
parse_grade_response() to extract new fields into Grade
reporter.py to display new fields

When adding a new CLI command, update:

cli.py with @app.command()
typer.echo() for all output
tests/
README.md usage section
uv run ty check src/

When adding a new scenario or skill-set field, update:

SkillSet dataclass in models.py
load_scenario() to parse the new field
Runner.run_scenario() if it affects execution

When changing what data is captured from runs, update:

RunResult dataclass in runner.py
_parse_json_output() to extract new data from Claude's output
run_scenario()

When adding a new grade field, update:

GRADING_PROMPT_TEMPLATE in grader.py
parse_grade_response() to handle new fields
init_grades_file() for manual grading template
reporter.py to show new fields in reports

We use concurrent.futures.ThreadPoolExecutor for parallel runs:
from concurrent.futures import ThreadPoolExecutor, as_completed
with ThreadPoolExecutor(max_workers=max_workers) as executor:
    future_to_task = {executor.submit(fn, task): task for task in tasks}
    for future in as_completed(future_to_task):
        result = future.result()

See Runner.run_parallel() in runner.py for the full implementation.
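A self-contained version of the same pattern, with per-task error capture added so one failing task does not abort the rest (run_one and the task list are made up for illustration):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_one(task: str) -> str:
    # Stand-in for the real per-task work.
    if task == "bad":
        raise ValueError(f"failed: {task}")
    return f"ok: {task}"

tasks = ["a", "b", "bad"]
results: dict[str, str] = {}
with ThreadPoolExecutor(max_workers=2) as executor:
    future_to_task = {executor.submit(run_one, t): t for t in tasks}
    for future in as_completed(future_to_task):
        task = future_to_task[future]
        try:
            results[task] = future.result()
        except Exception as exc:  # record the failure instead of raising
            results[task] = f"error: {exc}"
```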
Run the ty type checker before committing:
uv run ty check src/

Fix any type errors. Common issues:
Missing | None on optional types
list[str] not List[str] (Python 3.11+)

Tests live in tests/ and use pytest.
uv run pytest # all tests
uv run pytest tests/test_cli.py # specific file
uv run pytest -k "test_grade" # by name pattern

Every new feature needs tests.
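A minimal sketch of what such a test can look like, using a stub parse_grade_response (the real function in grader.py may parse and return different fields):

```python
import json

# Stub standing in for grader.parse_grade_response(); assumed here to take
# Claude's raw JSON reply and return the fields that feed the Grade dataclass.
def parse_grade_response(raw: str) -> dict:
    grade = json.loads(raw)
    return {"success": bool(grade["success"]), "score": float(grade["score"])}

def test_parse_grade_response() -> None:
    grade = parse_grade_response('{"success": true, "score": 0.8}')
    assert grade["success"] is True
    assert grade["score"] == 0.8
```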
Key dependencies in pyproject.toml:
typer - CLI framework
pyyaml - YAML parsing
claude-code-transcripts - HTML transcript generation
loguru - Logging with context support
textual - TUI for interactive selection

Dev dependencies:
pytest - testing
ty - type checking

Install with Tessl CLI
npx tessl i dbt-labs/dbt-agent-skills@1.1.0

evals/
├── scenarios/
│   ├── dbt-docs-arguments
│   ├── dbt-docs-unit-test-fixtures
│   ├── dbt-job-failure
│   ├── dbt-unit-test-format-choice
│   ├── example-yaml-error
│   ├── fusion-migration-triage-basic
│   ├── fusion-migration-triage-blocked
│   ├── fusion-triage-cat-a-static-analysis
│   ├── fusion-triage-cat-b-dict-meta-get
│   ├── fusion-triage-cat-b-unexpected-config
│   ├── fusion-triage-cat-b-unused-schema
│   ├── fusion-triage-cat-b-yaml-syntax
│   └── fusion-triage-cat-c-hardcoded-fqn
├── tests/
└── scripts/
skills/
├── dbt/
│   └── skills/
│       ├── adding-dbt-unit-test/
│       │   └── references/
│       ├── answering-natural-language-questions-with-dbt
│       ├── building-dbt-semantic-layer
│       ├── configuring-dbt-mcp-server
│       ├── fetching-dbt-docs/
│       │   └── scripts/
│       ├── running-dbt-commands
│       ├── troubleshooting-dbt-job-errors/
│       │   └── references/
│       └── using-dbt-for-analytics-engineering
└── dbt-migration/
    └── skills/
        ├── migrating-dbt-core-to-fusion
        └── migrating-dbt-project-across-platforms