A curated collection of Agent Skills for working with dbt, to help AI agents understand and execute dbt workflows more effectively.
This document covers conventions and patterns for working on the skill-eval CLI tool.
```
src/skill_eval/
├── cli.py       # Typer CLI commands (run, grade, report, review)
├── models.py    # Data models: Scenario, SkillSet, load_scenario()
├── runner.py    # Execution: Runner class, RunResult, RunTask
├── grader.py    # Auto-grading with Claude CLI
├── reporter.py  # Report generation from grades
├── selector.py  # Interactive TUI selectors for runs/scenarios
└── logging.py   # Loguru configuration with context support
```

Data flow: cli.py loads scenarios via models.py, executes via runner.py, grades via grader.py, and reports via reporter.py.
We use Typer for the CLI.
Use the appropriate output method based on context.

User-facing CLI output (command results, prompts): use typer.echo()

```python
typer.echo(f"Run directory: {run_dir}")
typer.echo("Error: file not found", err=True)
```

Progress logging (during execution): use the logger from logging.py

```python
from skill_eval.logging import logger

logger.info("Starting scenario")
logger.debug("Tool called: Read")
logger.warning("Timeout reached")
logger.success("Completed")

# With context (for parallel runs)
ctx_logger = logger.bind(scenario="my-scenario", skill_set="with-skill")
ctx_logger.info("Starting")  # Shows: [T0/my-scenario/with-skill] Starting
```

Never use print() for output.
New commands go in cli.py:

```python
@app.command()
def mycommand(
    arg: str = typer.Argument(..., help="Required argument"),
    flag: bool = typer.Option(False, "--flag", "-f", help="Optional flag"),
) -> None:
    """Command description shown in --help."""
    typer.echo(f"Running with {arg}")
```

When modifying CLI commands that work with scenarios or skill sets, check whether the underlying dataclasses need updates.
In models.py:

- Grade - grading result (success, score, tool_usage, notes, etc.)
- SkillSet - skills, mcp_servers, allowed_tools
- Scenario - name, path, prompt, skill_sets, description

In runner.py:

- RunResult - scenario results with output, success, tools_used, skills_invoked, etc.
- RunTask - task definition for parallel execution (scenario, skill_set, run_dir)

Use dataclasses.asdict() to convert dataclasses to dicts for YAML serialization.
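As a sketch of that serialization path (the Grade fields below are illustrative stand-ins, not the exact dataclass from models.py):

```python
import dataclasses

import yaml  # pyyaml, a project dependency

# Illustrative only -- field names approximate the Grade dataclass shape.
@dataclasses.dataclass
class Grade:
    success: bool
    score: int
    notes: str = ""

grade = Grade(success=True, score=5, notes="Used the skill correctly")

# asdict() recurses into nested dataclasses, producing plain dicts/lists
# that yaml.safe_dump can serialize without custom representers.
print(yaml.safe_dump(dataclasses.asdict(grade), sort_keys=False))
```

The reverse direction is manual: YAML loads back as a plain dict, which you pass to the dataclass constructor yourself.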
When modifying grading, update:

- Grade dataclass in models.py if adding new fields
- GRADING_PROMPT_TEMPLATE in grader.py if changing what Claude evaluates
- parse_grade_response() to extract new fields into Grade
- reporter.py to display new fields

When adding a CLI command, update:

- cli.py with @app.command()
- typer.echo() for all output
- tests/
- README.md usage section
- uv run ty check src/

When adding a scenario or skill set field, update:

- SkillSet dataclass in models.py
- load_scenario() to parse the new field
- Runner.run_scenario() if it affects execution

When adding data to run results, update:

- RunResult dataclass in runner.py
- _parse_json_output() to extract new data from Claude's output
- run_scenario()

When changing grade output, update:

- GRADING_PROMPT_TEMPLATE in grader.py
- parse_grade_response() to handle new fields
- init_grades_file() for manual grading template
- reporter.py to show new fields in reports

We use concurrent.futures.ThreadPoolExecutor for parallel runs:
```python
from concurrent.futures import ThreadPoolExecutor, as_completed

with ThreadPoolExecutor(max_workers=max_workers) as executor:
    future_to_task = {executor.submit(fn, task): task for task in tasks}
    for future in as_completed(future_to_task):
        result = future.result()
```

See Runner.run_parallel() in runner.py for the full implementation.
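A self-contained version of the same pattern, with simplified stand-ins for RunTask and the run function (the real versions live in runner.py):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from dataclasses import dataclass

# Stand-in for the real RunTask; frozen so it is hashable and can key a dict.
@dataclass(frozen=True)
class RunTask:
    scenario: str
    skill_set: str

def run_task(task: RunTask) -> str:
    # Stand-in for the actual scenario execution.
    return f"{task.scenario}/{task.skill_set}: ok"

tasks = [
    RunTask("dbt-job-failure", "with-skill"),
    RunTask("dbt-job-failure", "no-skill"),
]

results: dict[RunTask, str] = {}
with ThreadPoolExecutor(max_workers=2) as executor:
    # Map each future back to its task so failures can be attributed.
    future_to_task = {executor.submit(run_task, t): t for t in tasks}
    for future in as_completed(future_to_task):
        task = future_to_task[future]
        try:
            results[task] = future.result()
        except Exception as exc:  # one failed run should not abort the rest
            results[task] = f"failed: {exc}"

for task in tasks:
    print(results[task])
```

Keeping the future-to-task dict is what lets as_completed() report results as they finish while still knowing which scenario and skill set each one belongs to.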
Run the ty type checker before committing:

```shell
uv run ty check src/
```

Fix any type errors. Common issues:

- Missing | None on optional values
- list[str], not List[str] (Python 3.11+)

Tests live in tests/ and use pytest.
```shell
uv run pytest                       # all tests
uv run pytest tests/test_cli.py     # specific file
uv run pytest -k "test_grade"       # by name pattern
```

Every new feature needs tests.
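The shape of a typical test module is sketched below; slugify_run_name is a hypothetical helper used only for illustration (real tests import from skill_eval instead):

```python
# tests/test_example.py

def slugify_run_name(name: str) -> str:
    """Hypothetical helper: normalize a run name for use in paths."""
    return name.strip().lower().replace(" ", "-")

def test_slugify_run_name() -> None:
    # pytest auto-discovers test_*-named functions; plain asserts give
    # rich failure output under pytest with no extra imports.
    assert slugify_run_name("My Scenario") == "my-scenario"

def test_slugify_strips_whitespace() -> None:
    assert slugify_run_name("  Padded Name ") == "padded-name"
```

Run a single module while iterating with uv run pytest tests/test_example.py, then the full suite before committing.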
Key dependencies in pyproject.toml:

- typer - CLI framework
- pyyaml - YAML parsing
- claude-code-transcripts - HTML transcript generation
- loguru - Logging with context support
- textual - TUI for interactive selection

Dev dependencies:

- pytest - testing
- ty - type checking

Repository layout:

```
evals/
├── scenarios/
│   ├── dbt-docs-arguments/
│   ├── dbt-docs-unit-test-fixtures/
│   ├── dbt-job-failure/
│   ├── dbt-unit-test-format-choice/
│   ├── example-yaml-error/
│   ├── fusion-migration-triage-basic/
│   ├── fusion-migration-triage-blocked/
│   ├── fusion-triage-cat-a-static-analysis/
│   ├── fusion-triage-cat-b-dict-meta-get/
│   ├── fusion-triage-cat-b-unexpected-config/
│   ├── fusion-triage-cat-b-unused-schema/
│   ├── fusion-triage-cat-b-yaml-syntax/
│   └── fusion-triage-cat-c-hardcoded-fqn/
├── src/
├── tests/
└── scripts/
skills/
├── dbt/
│   └── skills/
│       ├── adding-dbt-unit-test/
│       │   └── references/
│       ├── answering-natural-language-questions-with-dbt/
│       ├── building-dbt-semantic-layer/
│       ├── configuring-dbt-mcp-server/
│       ├── fetching-dbt-docs/
│       │   └── scripts/
│       ├── running-dbt-commands/
│       ├── troubleshooting-dbt-job-errors/
│       │   └── references/
│       ├── using-dbt-for-analytics-engineering/
│       └── working-with-dbt-mesh/
├── dbt-extras/
│   └── skills/
│       └── creating-mermaid-dbt-dag/
└── dbt-migration/
    └── skills/
        ├── migrating-dbt-core-to-fusion/
        └── migrating-dbt-project-across-platforms/
```