Analyze Python code for correctness using symbolic execution and SMT solving to automatically find counterexamples for functions with type annotations and contracts.
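For illustration, here is a hypothetical function (not part of the skill) written in the PEP 316 docstring contract style that CrossHair understands. `pre:` lines are preconditions, `post:` lines are postconditions, and `__return__` names the return value. The body deliberately violates the postcondition for values below `low`. That is the kind of counterexample symbolic execution is meant to surface.

```python
def clamp(value: int, low: int, high: int) -> int:
    """
    Restrict value to the closed interval [low, high].

    pre: low <= high
    post: low <= __return__ <= high
    """
    if value > high:
        return high
    # Deliberate bug: values below `low` are returned unchanged,
    # so the postcondition fails for inputs such as clamp(-1, 0, 3).
    return value
```

If you save this as, say, `clamp.py`, running `crosshair check clamp.py` should report the failing postcondition as a `filename:line: error: message` diagnostic. The exact counterexample and message wording depend on the CrossHair version.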
Does it follow best practices? Pending
Impact: 86% (1.24x average score across 10 eval scenarios)
The risk profile of this skill: Pending
{
"context": "These criteria evaluate how well the engineer uses crosshair-tool's file-watching and continuous-analysis capabilities to implement a contract monitoring service. The focus is on proper integration with CrossHair's watch functionality, its contract-checking APIs, and result handling.",
"type": "weighted_checklist",
"checklist": [
{
"name": "Uses watch command",
"description": "The implementation uses CrossHair's `watch` command or an equivalent API (e.g., a subprocess call to `crosshair watch`, or imports from the `crosshair.watcher` module) to monitor a directory for file changes.",
"max_score": 25
},
{
"name": "Integrates contract checking",
"description": "The implementation correctly invokes CrossHair's contract-checking functionality (e.g., the `crosshair check` command, or analysis functions imported from the `crosshair.core_and_libs` or `crosshair.statespace` modules) to analyze Python files for contract violations.",
"max_score": 25
},
{
"name": "Handles watch results",
"description": "The implementation correctly captures and processes the output of CrossHair's watch/check operations, parsing the standard diagnostic format (`filename:line: error: message`) to extract violation details.",
"max_score": 20
},
{
"name": "Implements graceful shutdown",
"description": "The implementation handles interrupt signals (e.g., SIGINT) so that the file-watching process shuts down gracefully, using signal handlers or context managers to clean up any watch processes.",
"max_score": 15
},
{
"name": "Structured output format",
"description": "The implementation emits analysis results in the JSON structure required by the spec, including file paths, violation details (line numbers and messages), and timestamps.",
"max_score": 15
}
]
}

evals
scenario-1
scenario-2
scenario-3
scenario-4
scenario-5
scenario-6
scenario-7
scenario-8
scenario-9
scenario-10
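Below is a minimal sketch of the kind of service the checklist above describes. It assumes the `crosshair` CLI is installed and on PATH. It does not drive `crosshair watch`, which is oriented toward an interactive terminal display. Instead, it approximates watching by polling file modification times and rerunning `crosshair check` on changed files. That substitution is a design choice of this sketch, and a submission scored on the first criterion might prefer `crosshair watch` itself. The JSON field names (`file`, `timestamp`, `violations`, `line`, `message`) and the helper names are hypothetical, because the spec's exact structure is not shown here.

```python
import json
import re
import signal
import subprocess
import time
from datetime import datetime, timezone
from pathlib import Path

# Matches CrossHair diagnostics of the form "filename:line: error: message".
ERROR_LINE = re.compile(r"^(?P<file>.+?):(?P<line>\d+): error: (?P<message>.*)$")


def parse_output(text: str) -> list[dict]:
    """Extract violation records from CrossHair's textual output."""
    violations = []
    for raw in text.splitlines():
        match = ERROR_LINE.match(raw.strip())
        if match:  # skip status lines that are not diagnostics
            violations.append({
                "file": match["file"],
                "line": int(match["line"]),
                "message": match["message"],
            })
    return violations


def build_report(path: str, violations: list[dict]) -> dict:
    """Hypothetical report shape; the spec's real field names may differ."""
    return {
        "file": path,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "violations": [
            {"line": v["line"], "message": v["message"]} for v in violations
        ],
    }


def check_file(path: Path) -> dict:
    """Run `crosshair check` on one file and wrap its diagnostics in a report."""
    # No check=True: a non-zero exit status is expected when violations exist.
    result = subprocess.run(
        ["crosshair", "check", str(path)], capture_output=True, text=True
    )
    return build_report(str(path), parse_output(result.stdout + result.stderr))


def watch(directory: str, interval: float = 1.0) -> None:
    """Poll `directory` for changed .py files until SIGINT or SIGTERM arrives."""
    stop = False

    def request_stop(signum, frame):
        # Only set a flag; the loop exits cleanly at its next check.
        nonlocal stop
        stop = True

    previous = {}
    for sig in (signal.SIGINT, signal.SIGTERM):
        previous[sig] = signal.signal(sig, request_stop)
    seen: dict[Path, float] = {}
    try:
        while not stop:
            for path in Path(directory).rglob("*.py"):
                if stop:
                    break
                try:
                    mtime = path.stat().st_mtime
                except FileNotFoundError:
                    continue  # file was deleted between listing and stat
                if seen.get(path) != mtime:
                    seen[path] = mtime
                    report = check_file(path)
                    if not stop:  # drop results from a check cut short by shutdown
                        print(json.dumps(report), flush=True)
            time.sleep(interval)
    finally:
        # Restore the handlers that were installed before watching started.
        for sig, handler in previous.items():
            if handler is not None:
                signal.signal(sig, handler)
```

A small entry script would call, for example, `watch("src")` from the main thread. When the service runs from a terminal, Ctrl+C typically delivers SIGINT to both the service and any in-flight `crosshair check` child. The stop flag therefore discards that partial result instead of printing it as a clean report.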