A curated collection of Agent Skills for working with dbt, to help AI agents understand and execute dbt workflows more effectively.
65
82%
Does it follow best practices?
Impact
—
No eval scenarios have been run
Risky
Do not use without reviewing
Use dbt show to interactively explore raw data, understand table structures, and document findings for downstream model development.
Complete all 6 steps for every table you will build models on.
| You're Thinking... | Reality |
|---|---|
| "I don't have time for full discovery" | You don't have time for wrong models. |
| "It's just a quick stakeholder briefing" | Quick briefings become "can you build a model from this?" You need to do full discovery before building anything. |
| "I'll do proper discovery later" | You won't. Document now or create technical debt someone else inherits. |
| "This is technical debt I'm accepting" | You're not accepting it - you're passing it to your future self or teammates. |
| "47 tables is too many for full methodology" | Then prioritize which tables you'll actually use and do full discovery on those. Don't half-discover everything. |
| "I'll just do the critical tables thoroughly" | ALL tables you build on are critical. If it's not worth full discovery, don't build models on it yet. |
| "Standard patterns, I know this data" | You know the pattern. This instance's data might vary. Verify. |
Stop if you catch yourself:
SELECT * without grain analysisAll of these mean: slow down, follow all 6 steps.
When facing many tables (20+), the answer is NOT abbreviated discovery. The answer is:
Wrong approach: "I'll do light discovery on all 47 tables" Right approach: "I'll do full discovery on the 8 tables needed for this project"
When discovering new raw data, list all tables from the new source. E.g. listing all ecom source tables:
# quoting is critical when selecting sources
dbt ls --select "source:ecom.*" --output jsonReview the existing YAML file at original_file_path to understand what's already documented.
When previewing existing models, use standard node selection syntax:
# quoting is critical when selecting multiple nodes
dbt ls --select "my_first_model my_second_model" --output jsonReview existing YAML files (normally colocated with the model's original_file_path) to understand what's already documented.
Preview rows from each source table:
dbt show --inline "SELECT * FROM {{ source('source_name', 'table_name') }}" --limit 50 --output jsonDocument immediately:
Continue to use dbt show to run standard exploratory data analysis queries such as:
Create a discovery report that other agents can consume. Place in a data_discovery.md file alongside the SQL/YAML files. Do not use Jinja in these discovery files to avoid them being mistaken for doc blocks.
## Source: {source_name}.{table_name}
### Overview
- **Row count**: X
- **Grain**: One row per [entity] per [time period]
- **Primary key**: column_name (verified unique)
### Column Analysis
| Column | Type | Nulls | Notes |
|--------|------|-------|-------|
| id | integer | 0% | Primary key |
| status | string | 2% | Values: active, inactive, pending |
| created_at | timestamp | 0% | UTC timezone |
### Data Quality Issues
- [ ] `status` has 15 rows with value "unknown" - clarify with stakeholder
- [ ] `amount` has negative values - confirm if valid or error
### Relationships
- `user_id` → `users.id` (5 orphan records found)
- `product_id` → `products.id` (clean join)
### Recommended Staging Transformations
1. Filter out `status = 'unknown'` rows or map to valid value
2. Cast `created_at` to consistent timezone
3. Add surrogate key if natural key unreliableWhen using dbt show --inline to preview data, push LIMIT clauses as early as possible in CTEs to minimize data scanning. Never add a LIMIT at the end of the query - dbt show always adds an additional limit and you will cause a syntax error.
-- ✅ GOOD: Limit pushed early, minimizes scanning
with orders as (
select * from {{ source('ecom', 'orders') }} limit 100
),
customers as (
select * from {{ source('ecom', 'customers') }} limit 100
)
select ... from orders join customers ...
-- ❌ BAD: Full table scan before limit applied
with orders as (
select * from {{ source('ecom', 'orders') }}
),
customers as (
select * from {{ source('ecom', 'customers') }}
)
select ... from orders join customers ...
limit 100 -- Too late, and redundant with --limit flagAssuming column names reflect content. Always verify with sample data; customer_id might contain account IDs
Not documenting findings. Discovery without documentation wastes effort; write it down immediately
Testing relationships on sampled data only. Orphan records may exist outside your sample; run full counts
Ignoring soft deletes. Check for deleted_at, is_active, or status columns that filter valid records
evals
skills
adding-dbt-unit-test
references
answering-natural-language-questions-with-dbt
building-dbt-semantic-layer
configuring-dbt-mcp-server
fetching-dbt-docs
scripts
migrating-dbt-core-to-fusion
running-dbt-commands
troubleshooting-dbt-job-errors
using-dbt-for-analytics-engineering