Compares deployed CloudFormation templates with locally synthesized CDK templates to detect drift, validate changes, and ensure consistency before deployment. Use when the user wants to compare CDK output with a deployed stack, check for infrastructure drift, run a pre-deployment validation, audit IAM or security changes, investigate a failing deployment, or perform a 'cdk diff'-style review. Triggered by phrases like 'compare templates', 'check for drift', 'cfn drift', 'stack comparison', 'infrastructure drift detection', 'safe to deploy', or 'what changed in my CDK stack'.
95
93%
Does it follow best practices?
Impact
100%
1.08x
Average score across 5 eval scenarios
Passed
No known issues
{
"context": "Tests whether the agent knows to switch from line diff to hierarchical comparison for large templates and understands the threshold for this decision.",
"type": "weighted_checklist",
"checklist": [
{
"name": "Problem threshold identified",
"description": "large-template-strategy.md states a specific line-count threshold (e.g., >5000 lines) beyond which a line diff becomes unmanageable",
"max_score": 15
},
{
"name": "Hierarchical approach recommended",
"description": "Document explicitly recommends hierarchical comparison instead of line diff for large templates",
"max_score": 15
},
{
"name": "Structure comparison first",
"description": "Comparison steps include checking top-level structure (keys) first",
"max_score": 10
},
{
"name": "Resource count comparison",
"description": "Comparison steps include checking resource counts (jq '.Resources | length')",
"max_score": 10
},
{
"name": "Added/removed resources",
"description": "Comparison steps include identifying which resources were added/removed",
"max_score": 10
},
{
"name": "Avoid line diff",
"description": "large-template-compare.sh uses jq, comm, or diff with process substitution rather than raw 'diff file1 file2'",
"max_score": 12
},
{
"name": "Summarized output",
"description": "Script commands produce concise output (counts, lists) rather than full template diffs",
"max_score": 10
},
{
"name": "Decision criteria clear",
"description": "Document explains when to use hierarchical vs line diff (based on size/complexity)",
"max_score": 10
},
{
"name": "Security focused subset",
"description": "Strategy mentions focusing on security-sensitive changes (IAM, CDK Nag) rather than all resources",
"max_score": 8
}
]
}
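
The checklist above describes a top-down comparison: choose a strategy by template size, compare top-level keys, compare resource counts, list added and removed resources, then narrow to security-sensitive changes. A minimal Python sketch of that flow follows. It is an illustration only, not the skill's `large-template-compare.sh` (which the rubric expects to use jq, comm, or diff). The function names, the 5000-line cutoff (taken from the rubric's "e.g."), the "changed" list, and the `AWS::IAM::` type-prefix filter are all assumptions made for the example.

```python
import json

# Assumed cutoff: the rubric only offers ">5000 lines" as an example threshold.
LINE_THRESHOLD = 5000


def choose_strategy(template_text, threshold=LINE_THRESHOLD):
    """Return 'hierarchical' for large templates, 'line-diff' otherwise."""
    line_count = len(template_text.splitlines())
    return "hierarchical" if line_count > threshold else "line-diff"


def resource_type(name, *templates):
    """Look up a resource's Type in whichever template defines it."""
    for template in templates:
        resource = template.get("Resources", {}).get(name)
        if resource is not None:
            return resource.get("Type", "")
    return ""


def summarize(deployed, local):
    """Compare two parsed templates top-down and return a compact summary."""
    dep_res = deployed.get("Resources", {})
    loc_res = local.get("Resources", {})
    added = sorted(set(loc_res) - set(dep_res))
    removed = sorted(set(dep_res) - set(loc_res))
    changed = sorted(
        name for name in set(dep_res) & set(loc_res)
        if dep_res[name] != loc_res[name]
    )
    # Hypothetical security filter: IAM resources only, matched by type prefix.
    iam_changes = [
        name for name in added + removed + changed
        if resource_type(name, local, deployed).startswith("AWS::IAM::")
    ]
    return {
        "top_level_only_deployed": sorted(set(deployed) - set(local)),
        "top_level_only_local": sorted(set(local) - set(deployed)),
        "resource_counts": {"deployed": len(dep_res), "local": len(loc_res)},
        "added": added,
        "removed": removed,
        "changed": changed,
        "iam_changes": iam_changes,
    }


# Tiny made-up templates standing in for the deployed and synthesized stacks.
deployed = {
    "Resources": {
        "Bucket": {"Type": "AWS::S3::Bucket"},
        "Role": {"Type": "AWS::IAM::Role", "Properties": {"Path": "/"}},
        "OldQueue": {"Type": "AWS::SQS::Queue"},
    }
}
local = {
    "Resources": {
        "Bucket": {"Type": "AWS::S3::Bucket"},
        "Role": {"Type": "AWS::IAM::Role", "Properties": {"Path": "/app/"}},
        "Topic": {"Type": "AWS::SNS::Topic"},
    },
    "Outputs": {},
}

summary = summarize(deployed, local)
print(json.dumps(summary, indent=2))
```

The output is a short summary of counts and logical IDs rather than a full template diff, which is the property the "Summarized output" criterion asks for. In real use the two dictionaries would come from `json.load` on the deployed template and on the synthesized template file.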