A curated collection of Agent Skills for working with dbt, to help AI agents understand and execute dbt workflows more effectively.
A/B testing tool for comparing LLM skill variations against recorded scenarios.
```bash
cd evals
uv sync
```

Or install once as a standalone tool:

```bash
# One-time install
uv tool install 'skill-eval @ git+https://github.com/dbt-labs/dbt-agent-skills.git#subdirectory=evals'

# Then use from any directory
skill-eval run --all
skill-eval new my-scenario
```

Or run it as a one-off with uvx:

```bash
uvx --from 'skill-eval @ git+https://github.com/dbt-labs/dbt-agent-skills.git#subdirectory=evals' skill-eval --help
```

```bash
# Create a new scenario
skill-eval new my-scenario

# Create with context files
skill-eval new my-scenario --context models/ --context seeds/data.csv

# Create in a specific directory
skill-eval new my-scenario --base-dir /path/to/evals

# Run a scenario
uv run skill-eval run <scenario-name>

# Run all scenarios
uv run skill-eval run --all

# Run in parallel (runs all skill-sets concurrently)
uv run skill-eval run <scenario-name> --parallel    # single scenario, parallel skill-sets
uv run skill-eval run --all --parallel              # all scenarios, all skill-sets in parallel
uv run skill-eval run --all --parallel --workers 8  # custom worker count (default: 4)

# Verbose mode (shows tool calls and skill invocations)
uv run skill-eval run <scenario-name> --verbose     # or -v

# Review transcripts in browser (opens HTML files)
uv run skill-eval review            # latest run
uv run skill-eval review <run-id>   # specific run

# Grade outputs from a run (creates grades.yaml for manual review)
uv run skill-eval grade <run-id>

# Auto-grade using Claude (calls Claude CLI to evaluate each output)
uv run skill-eval grade <run-id> --auto

# Generate comparison report
uv run skill-eval report <run-id>
```

Directory layout:

```
evals/
├── scenarios/                     # Test scenarios
│   └── example-yaml-error/
│       ├── scenario.md            # Description and grading criteria
│       ├── prompt.txt             # User message to send
│       ├── skill-sets.yaml        # Skills, MCP servers, allowed tools
│       ├── context/               # Files Claude needs (copied to temp env)
│       └── .env                   # Environment variables (setup commands + MCP servers)
├── runs/                          # Output from runs (timestamped, gitignored)
│   └── 2026-01-15-153633/
│       └── example-yaml-error/
│           └── debug-baseline/
│               ├── output.md      # Full conversation text
│               ├── metadata.yaml  # Run metrics and tool usage
│               ├── raw.jsonl      # Complete NDJSON stream
│               ├── changes/       # Files modified during the run
│               └── transcript/    # HTML conversation viewer
├── reports/                       # Generated comparison reports
└── src/skill_eval/                # CLI source code
```

Define skill combinations, MCP servers, tool permissions, and prompt variations:
```yaml
sets:
  # Baseline with no skills
  - name: no-skills
    skills: []

  # With specific allowed tools (safer than allowing all)
  - name: restricted-tools
    skills:
      - skills/debugging-dbt-errors
    allowed_tools:
      - Read
      - Glob
      - Grep
      - Edit
      - Bash(dbt:*)
      - Skill

  # With MCP server
  - name: with-mcp
    skills:
      - skills/troubleshooting-dbt-job-errors
    mcp_servers:
      dbt:
        command: uvx
        args:
          - --env-file
          - .env
          - dbt-mcp@latest
    allowed_tools:
      - Read
      - Glob
      - mcp__dbt__*
      - Skill

  # Allow all tools (uses --dangerously-skip-permissions)
  - name: all-tools
    skills:
      - skills/fetching-dbt-docs
    # No allowed_tools = allows everything

  # With extra instructions appended to the prompt
  - name: with-skill-hint
    skills:
      - skills/debugging-dbt-errors
    extra_prompt: Check if any skill can help with this task.
    allowed_tools:
      - Read
      - Glob
      - Skill
```

Skills can be referenced in three ways:
Local file path (relative to repo root):

```yaml
skills:
  - skills/debugging-dbt-errors
```

Local folder path (copies entire folder including supporting files):

```yaml
skills:
  - skills/add-unit-test
```

HTTP URL (downloads skill from remote server):

```yaml
skills:
  # GitHub blob URL (automatically converted to raw)
  - https://github.com/org/repo/blob/main/skills/my-skill/SKILL.md

  # Works with branches, tags, and commit SHAs
  - https://github.com/org/repo/blob/v1.2.3/skills/my-skill/SKILL.md
  - https://github.com/org/repo/blob/abc123def/skills/my-skill/SKILL.md

  # Or use raw URL directly
  - https://raw.githubusercontent.com/org/repo/main/skills/my-skill/SKILL.md
```

You can mix local and remote skills in the same skill set:

```yaml
skills:
  - skills/debugging-dbt-errors
  - https://github.com/org/repo/blob/main/skills/external-skill/SKILL.md
```

Note: The URL must point to a `SKILL.md` file. GitHub blob URLs are automatically converted to raw URLs. Directory URLs are not supported.
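The blob-to-raw conversion is essentially a string rewrite. A minimal sketch (an illustrative approximation, not skill-eval's actual implementation; it assumes the standard `github.com/<org>/<repo>/blob/<ref>/<path>` shape):

```python
def blob_to_raw(url: str) -> str:
    """Rewrite a GitHub blob URL to its raw.githubusercontent.com form.

    Sketch only: assumes the standard
    github.com/<org>/<repo>/blob/<ref>/<path> shape.
    """
    prefix = "https://github.com/"
    if url.startswith(prefix) and "/blob/" in url:
        org_repo, path = url[len(prefix):].split("/blob/", 1)
        return f"https://raw.githubusercontent.com/{org_repo}/{path}"
    return url  # already raw (or not a GitHub blob URL): leave untouched

print(blob_to_raw("https://github.com/org/repo/blob/main/skills/my-skill/SKILL.md"))
# https://raw.githubusercontent.com/org/repo/main/skills/my-skill/SKILL.md
```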
Each scenario gets a `.env` file (created by `skill-eval new`, gitignored). Variables are loaded automatically for setup commands and passed to Claude:

```bash
# scenarios/dbt-job-failure/.env
DO_NOT_TRACK=1
DBT_HOST=https://cloud.getdbt.com
DBT_TOKEN=your_token_here
```

MCP servers use the standard mcpServers format:
```yaml
mcp_servers:
  dbt:
    command: uvx
    args:
      - --env-file
      - .env
      - dbt-mcp@latest
```

Restrict which tools Claude can use (instead of `--dangerously-skip-permissions`):
```yaml
allowed_tools:
  - Read
  - Glob
  - Grep
  - Edit
  - Bash(dbt:*)  # Only dbt commands in bash
  - Skill        # Allow skill invocation
  - mcp__dbt__*  # All tools from dbt MCP server
```

If `allowed_tools` is omitted, all tools are allowed.
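A rough mental model of how such entries select tools is glob matching against tool names. The sketch below uses `fnmatch`-style globbing as an illustration only; Claude Code's actual permission matching is richer (for example, `Bash(dbt:*)` also constrains the command string, not just the tool name):

```python
from fnmatch import fnmatch

def tool_allowed(tool_name: str, allowed: list[str]) -> bool:
    """Illustrative sketch: treat each allowed_tools entry as a glob
    over the tool name. Real matching in Claude Code is richer
    (e.g. Bash(dbt:*) also inspects the bash command itself)."""
    return any(fnmatch(tool_name, pattern) for pattern in allowed)

allowed = ["Read", "Glob", "mcp__dbt__*", "Skill"]
print(tool_allowed("mcp__dbt__list_jobs", allowed))  # True
print(tool_allowed("Edit", allowed))                 # False
```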
Append additional instructions to the base prompt for specific skill sets:

```yaml
sets:
  # Baseline - just the prompt.txt content
  - name: no-hint
    skills:
      - skills/debugging-dbt-errors
    allowed_tools: [Read, Glob, Skill]

  # With hint - prompt.txt + extra_prompt
  - name: with-hint
    skills:
      - skills/debugging-dbt-errors
    extra_prompt: Check if any skill can help with this task.
    allowed_tools: [Read, Glob, Skill]
```

Use this to test whether additional instructions affect skill invocation or behavior.
Multiline prompts are supported using YAML block scalars:

```yaml
extra_prompt: |
  Before starting:
  1. Check if any skill can help
  2. Use the MCP server if available
```

Run commands before Claude starts (e.g., installing skills via CLI):
```yaml
sets:
  - name: with-remote-skill
    setup:
      - npx skills add https://github.com/dbt-labs/dbt-agent-skills -a claude-code -y
    skills: []
    allowed_tools: [Read, Glob, Grep, Skill]
```

Setup commands run in the isolated temp environment with `.env` variables loaded. If any command fails, the run stops immediately with an error.
Use cases:

```bash
npx skills add <url> -a claude-code -y
```

Each run produces:
**`output.md`** — Full conversation text from all assistant messages (not just the final result).

**`metadata.yaml`** — Run metrics and tool usage:

```yaml
success: true
skills_invoked:
  - debugging-dbt-errors
skills_available:
  - debugging-dbt-errors
tools_used:
  - Read
  - Edit
  - Glob
  - Skill
mcp_servers: []
model: claude-opus-4-5-20251101
duration_ms: 31476
num_turns: 10
total_cost_usd: 0.1425935
input_tokens: 125241
output_tokens: 1177
```

**`changes/`** — Files that were modified or created during the run. Only includes files that differ from the original context (excluding `.claude/`). Useful for verifying what changes Claude made.
**`raw.jsonl`** — Complete NDJSON (newline-delimited JSON) stream from Claude for debugging.
**`transcript/`** — HTML files for viewing the conversation in a browser. Open `index.html` to view, with paginated content in `page-XXX.html` files. In VS Code, you can use the Live Preview extension to view these directly in the editor.
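Since each skill set writes its own `metadata.yaml`, comparing variants often comes down to aggregating a few numeric fields. A small sketch (field names follow the metadata example above; the YAML parsing itself is left to your library of choice):

```python
def summarize(runs: dict[str, dict]) -> dict[str, dict]:
    """Given parsed metadata.yaml contents keyed by skill-set name,
    pull out the fields most useful for a head-to-head comparison.
    Field names follow the metadata.yaml example above."""
    return {
        name: {
            "success": meta["success"],
            "cost_usd": meta["total_cost_usd"],
            "turns": meta["num_turns"],
            "skills_invoked": meta.get("skills_invoked", []),
        }
        for name, meta in runs.items()
    }

runs = {
    "no-skills": {"success": False, "total_cost_usd": 0.09, "num_turns": 14},
    "with-skill": {"success": True, "total_cost_usd": 0.14, "num_turns": 10,
                   "skills_invoked": ["debugging-dbt-errors"]},
}
print(summarize(runs))
```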
Each run has built-in safeguards. Stall detection helps catch runs that get stuck waiting for tool approval when using `allowed_tools` restrictions.
1. `skill-eval new <name>` scaffolds the directory structure
2. Edit `skill-sets.yaml` to specify skills, MCP servers, and tool permissions
3. `skill-eval run <scenario>` executes Claude with each configuration
4. `skill-eval review` opens HTML transcripts in the browser
5. `skill-eval grade <run-id>` (manual) or `--auto` (Claude-graded)
6. `skill-eval report <run-id>` shows a comparison summary

Use `--auto` to have Claude grade the outputs:
```bash
uv run skill-eval grade <run-id> --auto
```

Auto-grading evaluates each output on three dimensions.

The grader receives:

- the user prompt (`prompt.txt`)
- the scenario description and grading criteria (`scenario.md`)
- the full conversation text (`output.md`)
- run metrics and tool usage (`metadata.yaml`)
- files modified during the run (`changes/`)

Output grades include:
- `success`: true/false
- `score`: 1-5
- `tool_usage`: appropriate/partial/inappropriate
- `notes`: explanation

```yaml
# Does the skill help Claude solve the problem better?
sets:
  - name: without-skill
    skills: []
    allowed_tools: [Read, Glob, Grep, Edit, Bash(dbt:*)]

  - name: with-skill
    skills:
      - skills/debugging-dbt-errors
    allowed_tools: [Read, Glob, Grep, Edit, Bash(dbt:*), Skill]
```

```yaml
# Does the MCP server provide better results?
sets:
  - name: skill-only
    skills:
      - skills/troubleshooting-dbt-job-errors
    allowed_tools: [Read, Glob, Grep, Skill]

  - name: skill-plus-mcp
    skills:
      - skills/troubleshooting-dbt-job-errors
    mcp_servers:
      dbt:
        command: uvx
        args: [--env-file, .env, dbt-mcp@latest]
    allowed_tools: [Read, Glob, Grep, Skill, mcp__dbt__*]
```

```yaml
# Compare a local skill against a remote version
sets:
  - name: local-skill
    skills:
      - skills/debugging-dbt-errors
    allowed_tools: [Read, Glob, Grep, Edit, Skill]

  - name: remote-skill
    skills:
      # GitHub blob URL - automatically converted to raw
      - https://github.com/org/repo/blob/main/skills/debugging-dbt-errors/SKILL.md
    allowed_tools: [Read, Glob, Grep, Edit, Skill]
```

If run logs mention needing to log in, authenticate Claude Code first:
```bash
claude
/login
```

Then exit and re-run the evaluation.