Implement data quality validation with Great Expectations, dbt tests, and data contracts. Use when building data quality pipelines, implementing validation rules, or establishing data contracts.
52
56%
Does it follow best practices?
Impact
—
No eval scenarios have been run
Passed
No known issues
Optimize this skill with Tessl
npx tessl skill review --optimize ./plugins/data-engineering/skills/data-quality-frameworks/SKILL.mdProduction patterns for implementing data quality with Great Expectations, dbt tests, and data contracts to ensure reliable data pipelines.
| Dimension | Description | Example Check |
|---|---|---|
| Completeness | No missing values | expect_column_values_to_not_be_null |
| Uniqueness | No duplicates | expect_column_values_to_be_unique |
| Validity | Values in expected range | expect_column_values_to_be_in_set |
| Accuracy | Data matches reality | Cross-reference validation |
| Consistency | No contradictions | expect_column_pair_values_A_to_be_greater_than_B |
| Timeliness | Data is recent | expect_column_max_to_be_between |
/\
/ \ Integration Tests (cross-table)
/────\
/ \ Unit Tests (single column)
/────────\
/ \ Schema Tests (structure)
/────────────\# Install
pip install great_expectations
# Initialize project
great_expectations init
# Create datasource
great_expectations datasource new# great_expectations/checkpoints/daily_validation.yml
import great_expectations as gx
# Create context
context = gx.get_context()
# Create expectation suite
suite = context.add_expectation_suite("orders_suite")
# Add expectations
suite.add_expectation(
gx.expectations.ExpectColumnValuesToNotBeNull(column="order_id")
)
suite.add_expectation(
gx.expectations.ExpectColumnValuesToBeUnique(column="order_id")
)
# Validate
results = context.run_checkpoint(checkpoint_name="daily_orders")Detailed pattern documentation lives in references/details.md. Read that file when the navigation tier above is insufficient.
report.append("")
for table, result in results.items():
status = "✅" if result.passed else "❌"
report.append(f"### {status} {table}")
report.append(f"- Expectations: {result.total_expectations}")
report.append(f"- Failed: {result.failed_expectations}")
if not result.passed:
report.append("- Failed checks:")
for detail in result.details:
if not detail["success"]:
report.append(f" - {detail['expectation']}: {detail['observed_value']}")
report.append("")
return "\n".join(report)context = gx.get_context() pipeline = DataQualityPipeline(context)
tables_to_validate = { "orders": "orders_suite", "customers": "customers_suite", "products": "products_suite", }
results = pipeline.run_all(tables_to_validate) report = pipeline.generate_report(results)
if not all(r.passed for r in results.values()): print(report) raise ValueError("Data quality checks failed!")
## Best Practices
### Do's
- **Test early** - Validate source data before transformations
- **Test incrementally** - Add tests as you find issues
- **Document expectations** - Clear descriptions for each test
- **Alert on failures** - Integrate with monitoring
- **Version contracts** - Track schema changes
### Don'ts
- **Don't test everything** - Focus on critical columns
- **Don't ignore warnings** - They often precede failures
- **Don't skip freshness** - Stale data is bad data
- **Don't hardcode thresholds** - Use dynamic baselines
- **Don't test in isolation** - Test relationships toocf6059d
If you maintain this skill, you can claim it as your own. Once claimed, you can manage eval scenarios, bundle related skills, attach documentation or rules, and ensure cross-agent compatibility.