Analyze Python code for correctness using symbolic execution and SMT solving to automatically find counterexamples for functions with type annotations and contracts.
86
Does it follow best practices? Pending
Impact: 86% (1.24x average score across 10 eval scenarios)
Risk profile of this skill: Pending
{
"context": "These criteria evaluate how well the engineer uses the crosshair-tool package to implement behavioral-difference detection between two Python functions. The focus is on correctly using CrossHair's diff_behavior functionality and related APIs for symbolic-execution-based comparison.",
"type": "weighted_checklist",
"checklist": [
{
"name": "Import diff_behavior",
"description": "Imports the diff_behavior function from the crosshair.diff_behavior module to access the behavioral comparison functionality",
"max_score": 15
},
{
"name": "Call diff_behavior",
"description": "Invokes the diff_behavior() function to compare two functions symbolically, passing both function references as arguments (fn1 and fn2 parameters)",
"max_score": 30
},
{
"name": "Iterate over results",
"description": "Iterates over the generator returned by diff_behavior() to retrieve BehaviorDiff instances representing discovered differences",
"max_score": 20
},
{
"name": "Extract input arguments",
"description": "Accesses the .args property of BehaviorDiff objects to extract the input arguments that caused the behavioral difference",
"max_score": 20
},
{
"name": "Configure analysis options",
"description": "Passes an AnalysisOptions object as the options argument of diff_behavior() to control timeouts such as per_path_timeout or per_condition_timeout",
"max_score": 15
}
]
}
evals
scenario-1
scenario-2
scenario-3
scenario-4
scenario-5
scenario-6
scenario-7
scenario-8
scenario-9
scenario-10
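The checklist above describes one workflow: hand two functions to CrossHair's diff_behavior, iterate over the differences it reports, read each difference's .args, and cap the search with AnalysisOptions timeouts. To stay self-contained and avoid guessing CrossHair's exact signatures, the sketch below does not import CrossHair. It is a hypothetical brute-force stand-in (`find_behavior_diffs`, `DiffSketch`, and the two `is_leap_*` functions are illustrative names) that enumerates finite input domains instead of exploring paths symbolically. It keeps the same shape, though: a generator of difference records, each carrying the triggering inputs in `.args`.

```python
import inspect
import itertools
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, Iterator, Sequence, Tuple

# ("return", value) or ("raise", exception class name)
Outcome = Tuple[str, Any]


@dataclass
class DiffSketch:
    """Hypothetical stand-in for a behavior-difference record; not CrossHair's BehaviorDiff."""
    args: Dict[str, Any]  # parameter name -> input value that exposed the difference
    result1: Outcome
    result2: Outcome


def _outcome(fn: Callable[..., Any], args: Tuple[Any, ...]) -> Outcome:
    """Run fn, treating a raised exception as observable behavior, like a return value."""
    try:
        return ("return", fn(*args))
    except Exception as exc:
        return ("raise", type(exc).__name__)


def find_behavior_diffs(
    fn1: Callable[..., Any],
    fn2: Callable[..., Any],
    domains: Sequence[Iterable[Any]],
) -> Iterator[DiffSketch]:
    """Yield a DiffSketch for each input tuple (one value per domain) where fn1 and fn2 disagree."""
    names = list(inspect.signature(fn1).parameters)
    for combo in itertools.product(*domains):
        r1, r2 = _outcome(fn1, combo), _outcome(fn2, combo)
        if r1 != r2:
            yield DiffSketch(dict(zip(names, combo)), r1, r2)


def is_leap_v1(year: int) -> bool:
    return year % 4 == 0


def is_leap_v2(year: int) -> bool:
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


# Consume the generator and read .args from each difference record.
for diff in find_behavior_diffs(is_leap_v1, is_leap_v2, [range(1, 401)]):
    print(diff.args, diff.result1, diff.result2)
```

Because enumeration covers only the domains it is given, it can miss differences that CrossHair's solver would find; in the real tool the search is bounded by time (for example `per_path_timeout`) rather than by a fixed input set. From the command line, CrossHair offers the same comparison through its `crosshair diffbehavior` subcommand, which takes the dotted names of the two functions.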