A curated collection of Agent Skills for working with dbt, to help AI agents understand and execute dbt workflows more effectively.
A/B testing tool for comparing LLM skill variations against recorded scenarios.
cd evals
uv sync

# Run a scenario
uv run skill-eval run <scenario-name>
# Run all scenarios
uv run skill-eval run --all
# Run in parallel (runs all skill-sets concurrently)
uv run skill-eval run <scenario-name> --parallel # single scenario, parallel skill-sets
uv run skill-eval run --all --parallel # all scenarios, all skill-sets in parallel
uv run skill-eval run --all --parallel --workers 8 # custom worker count (default: 4)
# Verbose mode (shows tool calls and skill invocations)
uv run skill-eval run <scenario-name> --verbose # or -v
# Review transcripts in browser (opens HTML files)
uv run skill-eval review # latest run
uv run skill-eval review <run-id> # specific run
# Grade outputs from a run (creates grades.yaml for manual review)
uv run skill-eval grade <run-id>
# Auto-grade using Claude (calls Claude CLI to evaluate each output)
uv run skill-eval grade <run-id> --auto
# Generate comparison report
uv run skill-eval report <run-id>

evals/
├── scenarios/                  # Test scenarios
│   └── example-yaml-error/
│       ├── scenario.md         # Description and grading criteria
│       ├── prompt.txt          # User message to send
│       ├── skill-sets.yaml     # Skills, MCP servers, allowed tools
│       ├── context/            # Files Claude needs (copied to temp env)
│       └── .env                # Environment variables for MCP servers
├── runs/                       # Output from runs (timestamped, gitignored)
│   └── 2026-01-15-153633/
│       └── example-yaml-error/
│           └── debug-baseline/
│               ├── output.md       # Full conversation text
│               ├── metadata.yaml   # Run metrics and tool usage
│               ├── raw.jsonl       # Complete NDJSON stream
│               ├── changes/        # Files modified during the run
│               └── transcript/     # HTML conversation viewer
├── reports/                    # Generated comparison reports
└── src/skill_eval/             # CLI source code

Define skill combinations, MCP servers, tool permissions, and prompt variations:
sets:
  # Baseline with no skills
  - name: no-skills
    skills: []

  # With specific allowed tools (safer than allowing all)
  - name: restricted-tools
    skills:
      - skills/debugging-dbt-errors
    allowed_tools:
      - Read
      - Glob
      - Grep
      - Edit
      - Bash(dbt:*)
      - Skill

  # With MCP server
  - name: with-mcp
    skills:
      - skills/troubleshooting-dbt-job-errors
    mcp_servers:
      dbt:
        command: uvx
        args:
          - --env-file
          - .env
          - dbt-mcp@latest
    allowed_tools:
      - Read
      - Glob
      - mcp__dbt__*
      - Skill

  # Allow all tools (uses --dangerously-skip-permissions)
  - name: all-tools
    skills:
      - skills/fetching-dbt-docs
    # No allowed_tools = allows everything

  # With extra instructions appended to the prompt
  - name: with-skill-hint
    skills:
      - skills/debugging-dbt-errors
    extra_prompt: Check if any skill can help with this task.
    allowed_tools:
      - Read
      - Glob
      - Skill

Skills can be referenced in three ways:
Local file path (relative to repo root):
skills:
  - skills/debugging-dbt-errors

Local folder path (copies entire folder including supporting files):
skills:
  - skills/add-unit-test

HTTP URL (downloads skill from remote server):
skills:
  # GitHub blob URL (automatically converted to raw)
  - https://github.com/org/repo/blob/main/skills/my-skill/SKILL.md
  # Works with branches, tags, and commit SHAs
  - https://github.com/org/repo/blob/v1.2.3/skills/my-skill/SKILL.md
  - https://github.com/org/repo/blob/abc123def/skills/my-skill/SKILL.md
  # Or use raw URL directly
  - https://raw.githubusercontent.com/org/repo/main/skills/my-skill/SKILL.md

You can mix local and remote skills in the same skill set:
skills:
  - skills/debugging-dbt-errors
  - https://github.com/org/repo/blob/main/skills/external-skill/SKILL.md

Note: The URL must point to a SKILL.md file. GitHub blob URLs are automatically converted to raw URLs. Directory URLs are not supported.
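The blob-to-raw conversion described in the note can be sketched roughly as follows. This is an illustration only, not the code skill-eval actually runs: the function name `to_raw_url` is hypothetical, and the rule that the path must end in SKILL.md is inferred from the note above.

```python
from urllib.parse import urlsplit


def to_raw_url(url: str) -> str:
    """Convert a GitHub blob URL to its raw.githubusercontent.com form.

    Hypothetical helper: a sketch of the documented behavior, not the
    tool's real implementation.
    """
    parts = urlsplit(url)
    if not parts.path.endswith("/SKILL.md"):
        # Directory URLs are not supported; the URL must name a SKILL.md file.
        raise ValueError(f"URL must point to a SKILL.md file: {url}")
    if parts.netloc == "raw.githubusercontent.com":
        return url  # already a raw URL, use as-is
    if parts.netloc != "github.com":
        raise ValueError(f"not a GitHub URL: {url}")

    # Expected path layout: /<org>/<repo>/blob/<ref>/<path...>/SKILL.md
    segments = parts.path.strip("/").split("/")
    if len(segments) < 5 or segments[2] != "blob":
        raise ValueError(f"not a GitHub blob URL: {url}")
    org, repo, _blob, *rest = segments
    return "https://raw.githubusercontent.com/" + "/".join([org, repo, *rest])


print(to_raw_url("https://github.com/org/repo/blob/v1.2.3/skills/my-skill/SKILL.md"))
```

Dropping only the `blob` segment keeps the ref and file path untouched, so the same logic covers branches, tags, and commit SHAs.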
MCP servers use the standard mcpServers format. Environment variables are loaded from .env in the scenario directory:
mcp_servers:
  dbt:
    command: uvx
    args:
      - --env-file
      - .env
      - dbt-mcp@latest

Create a .env file in your scenario directory (gitignored):
# scenarios/dbt-job-failure/.env
DBT_HOST=https://cloud.getdbt.com
DBT_TOKEN=your_token_here

Restrict which tools Claude can use (instead of --dangerously-skip-permissions):
allowed_tools:
  - Read
  - Glob
  - Grep
  - Edit
  - Bash(dbt:*)   # Only dbt commands in bash
  - Skill         # Allow skill invocation
  - mcp__dbt__*   # All tools from dbt MCP server

If allowed_tools is omitted, all tools are allowed.
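When reviewing a run, it can help to check the tools recorded in metadata.yaml against the allowlist of the skill set. The sketch below is a rough audit helper and does not reproduce Claude's own permission logic. It assumes that `*` acts as a simple wildcard and that an entry such as `Bash(dbt:*)` shows up in `tools_used` under its base name `Bash`. The function name and the tool name `mcp__dbt__list_jobs` are hypothetical.

```python
from fnmatch import fnmatchcase


def tools_outside_allowlist(tools_used, allowed_tools):
    """Return the tools from tools_used that match no allowed_tools entry.

    Assumption: an omitted allowlist (None) means every tool is allowed,
    mirroring the rule stated above.
    """
    if allowed_tools is None:
        return []
    # Keep only the tool name of scoped entries: "Bash(dbt:*)" -> "Bash".
    patterns = [entry.split("(", 1)[0] for entry in allowed_tools]
    return [
        tool
        for tool in tools_used
        if not any(fnmatchcase(tool, pattern) for pattern in patterns)
    ]


allowed = ["Read", "Glob", "Bash(dbt:*)", "mcp__dbt__*"]
used = ["Read", "Bash", "mcp__dbt__list_jobs", "Edit"]
print(tools_outside_allowlist(used, allowed))
```

A non-empty result would suggest either a permissions gap or a tool name that this simplified matching does not model.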
Append additional instructions to the base prompt for specific skill sets:
sets:
  # Baseline - just the prompt.txt content
  - name: no-hint
    skills:
      - skills/debugging-dbt-errors
    allowed_tools: [Read, Glob, Skill]

  # With hint - prompt.txt + extra_prompt
  - name: with-hint
    skills:
      - skills/debugging-dbt-errors
    extra_prompt: Check if any skill can help with this task.
    allowed_tools: [Read, Glob, Skill]

Use this to test whether additional instructions affect skill invocation or behavior.

Multiline prompts are supported using YAML block scalars:
extra_prompt: |
  Before starting:
  1. Check if any skill can help
  2. Use the MCP server if available

Each run produces:
output.md: Full conversation text from all assistant messages (not just the final result).

metadata.yaml: Run metrics and tool usage, for example:
success: true
skills_invoked:
  - debugging-dbt-errors
skills_available:
  - debugging-dbt-errors
tools_used:
  - Read
  - Edit
  - Glob
  - Skill
mcp_servers: []
model: claude-opus-4-5-20251101
duration_ms: 31476
num_turns: 10
total_cost_usd: 0.1425935
input_tokens: 125241
output_tokens: 1177

changes/: Files that were modified or created during the run. Only includes files that differ from the original context (excluding .claude/). Useful for verifying what changes Claude made.
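The "differs from the original context" rule can be pictured as a byte-level comparison between the scenario's context files and the working copy after the run. The sketch below works under that assumption and is not the tool's actual implementation. The function name `changed_files` and its two directory arguments are hypothetical.

```python
from pathlib import Path


def changed_files(context_dir: Path, workdir: Path) -> list[str]:
    """List files in workdir that are new or differ from context_dir.

    Paths under .claude/ are skipped, matching the exclusion described above.
    """
    changed = []
    for path in sorted(workdir.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(workdir)
        if rel.parts[0] == ".claude":
            continue
        original = context_dir / rel
        # A file counts as changed if it is new or its bytes differ.
        if not original.is_file() or original.read_bytes() != path.read_bytes():
            changed.append(rel.as_posix())
    return changed
```

Note that this sketch only detects created and modified files; files deleted during a run would need a second pass over `context_dir`.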
raw.jsonl: Complete NDJSON (newline-delimited JSON) stream from Claude for debugging.
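Because NDJSON holds one JSON document per line, the stream can be inspected with the standard library alone. The sketch below tallies records by a `type` field. That each record is a JSON object carrying such a field is an assumption about the stream, and the function name is hypothetical.

```python
import json
from collections import Counter


def count_event_types(lines) -> Counter:
    """Count NDJSON records by their "type" field (assumed to exist)."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        event = json.loads(line)
        counts[event.get("type", "unknown")] += 1
    return counts


sample = ['{"type": "assistant"}', "", '{"type": "result"}', '{"type": "assistant"}']
print(count_event_types(sample))
```

For a real run, pass an open file handle for raw.jsonl instead of the sample list, since iterating a file yields one line at a time.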
transcript/: HTML files for viewing the conversation in a browser. Open index.html to start; the content is paginated across page-XXX.html files. In VS Code, the Live Preview extension can display these files directly in the editor.
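The metadata.yaml fields shown earlier lend themselves to quick comparisons between skill sets, for example whether an available skill was ever invoked and what a run cost per turn. The sketch below takes the metadata as an already-parsed dict, so no YAML library is assumed. The function name and the derived keys are hypothetical.

```python
def summarize(metadata: dict) -> dict:
    """Derive a few comparison metrics from one run's metadata."""
    available = metadata.get("skills_available", [])
    invoked = set(metadata.get("skills_invoked", []))
    turns = metadata.get("num_turns", 0)
    cost = metadata.get("total_cost_usd", 0.0)
    return {
        # Skills that were offered to Claude but never invoked.
        "unused_skills": [s for s in available if s not in invoked],
        "cost_per_turn_usd": cost / turns if turns else None,
        "total_tokens": metadata.get("input_tokens", 0)
        + metadata.get("output_tokens", 0),
    }


example = {
    "skills_invoked": ["debugging-dbt-errors"],
    "skills_available": ["debugging-dbt-errors"],
    "num_turns": 10,
    "total_cost_usd": 0.1425935,
    "input_tokens": 125241,
    "output_tokens": 1177,
}
print(summarize(example))
```

A non-empty `unused_skills` list for a skill set is a hint that the prompt or skill description did not trigger invocation, which is the kind of difference `extra_prompt` variations are meant to surface.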
Each run has built-in safeguards:
Stall detection helps catch runs that get stuck waiting for tool approval when using allowed_tools restrictions.
1. skill-eval run <scenario> executes Claude with each configuration
2. skill-eval review opens HTML transcripts in browser
3. skill-eval grade <run-id> (manual) or --auto (Claude-graded)
4. skill-eval report <run-id> shows comparison summary

Use --auto to have Claude grade the outputs:
uv run skill-eval grade <run-id> --auto

Auto-grading evaluates each output on three dimensions:
The grader receives:
- prompt.txt
- scenario.md
- output.md
- metadata.yaml
- changes/

Output grades include:
- success: true/false
- score: 1-5
- tool_usage: appropriate/partial/inappropriate
- notes: explanation

# Does the skill help Claude solve the problem better?
sets:
  - name: without-skill
    skills: []
    allowed_tools: [Read, Glob, Grep, Edit, Bash(dbt:*)]
  - name: with-skill
    skills:
      - skills/debugging-dbt-errors
    allowed_tools: [Read, Glob, Grep, Edit, Bash(dbt:*), Skill]

# Does the MCP server provide better results?
sets:
  - name: skill-only
    skills:
      - skills/troubleshooting-dbt-job-errors
    allowed_tools: [Read, Glob, Grep, Skill]
  - name: skill-plus-mcp
    skills:
      - skills/troubleshooting-dbt-job-errors
    mcp_servers:
      dbt:
        command: uvx
        args: [--env-file, .env, dbt-mcp@latest]
    allowed_tools: [Read, Glob, Grep, Skill, mcp__dbt__*]

# Compare a local skill against a remote version
sets:
  - name: local-skill
    skills:
      - skills/debugging-dbt-errors
    allowed_tools: [Read, Glob, Grep, Edit, Skill]
  - name: remote-skill
    skills:
      # GitHub blob URL - automatically converted to raw
      - https://github.com/org/repo/blob/main/skills/debugging-dbt-errors/SKILL.md
    allowed_tools: [Read, Glob, Grep, Edit, Skill]

Install with Tessl CLI
npx tessl i dbt-labs/dbt-agent-skills@1.1.0