Manage Things 3 tasks on macOS via AppleScript. Full CRUD: view, create, complete, move, and delete tasks and projects across all Things 3 lists.
98
100%
Does it follow best practices?
Impact
96%
2.04x
Average score across 5 eval scenarios
Passed
No known issues
{
"context": "Tests whether the agent shows matched tasks before completing them, uses things-create.sh complete, verifies completion by re-querying the logbook, and surfaces ambiguous name matches to the user rather than silently proceeding.",
"type": "weighted_checklist",
"checklist": [
{
"name": "Search before completing",
"description": "Script calls things-query.sh search (or another read command) BEFORE any things-create.sh complete calls, to show the user what tasks exist",
"max_score": 11
},
{
"name": "Tasks shown as markdown",
"description": "Tasks found by the search are presented as a markdown list or table (not raw JSON) before any completion is attempted",
"max_score": 8
},
{
"name": "Complete via things-create.sh",
"description": "Uses things-create.sh with the 'complete' subcommand to mark tasks done (not raw osascript)",
"max_score": 11
},
{
"name": "Return value inspected",
"description": "Script reads and inspects the JSON returned by things-create.sh complete (checking for 'completed' or 'error' key) rather than ignoring the output",
"max_score": 9
},
{
"name": "Ambiguity error surfaced",
"description": "When things-create.sh returns a JSON error indicating multiple matches, the script parses that error and displays the matching task names to the user rather than silently proceeding or aborting without explanation",
"max_score": 14
},
{
"name": "Logbook re-query",
"description": "Script calls things-query.sh logbook after completions to verify completed tasks appear there",
"max_score": 11
},
{
"name": "Logbook shown as markdown",
"description": "The logbook contents displayed in the report are formatted as a markdown list or table (not raw JSON)",
"max_score": 8
},
{
"name": "Completion confirmed in report",
"description": "The sprint_closure_report.md includes an explicit section noting which tasks were successfully completed and which had errors, based on the JSON return values",
"max_score": 9
},
{
"name": "Query subcommand used correctly",
"description": "Uses a valid things-query.sh subcommand for the initial task lookup (e.g., 'search' with a term, 'today', 'anytime', 'inbox')",
"max_score": 9
},
{
"name": "Logbook queried after completions",
"description": "The things-query.sh logbook call appears AFTER the completion attempts in the script (verifying the completed state, rather than checking the logbook first)",
"max_score": 10
}
]
}
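
The ordering this checklist rewards (search, show matches as markdown, complete, inspect the returned JSON, then re-query the logbook) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the skill's actual implementation: the `run` callable, the positional argument form passed to `things-query.sh` and `things-create.sh`, the `name` field on task objects, and the `matches` key in an ambiguity error are all hypothetical. Only the subcommand names (`search`, `complete`, `logbook`) and the `completed` / `error` keys come from the rubric above.

```python
import json


def to_markdown(tasks):
    """Render task dicts as a markdown bullet list instead of raw JSON."""
    if not tasks:
        return "_No tasks found._"
    # ASSUMPTION: each task object carries a "name" field.
    return "\n".join(f"- {task.get('name', '(unnamed)')}" for task in tasks)


def classify_completion(raw):
    """Inspect the JSON text returned by a `complete` call.

    Returns (status, detail); status is 'completed', 'ambiguous', or 'error'.
    """
    try:
        result = json.loads(raw)
    except json.JSONDecodeError:
        return ("error", "unparseable output")
    if not isinstance(result, dict):
        return ("error", "unexpected response")
    if "error" in result:
        # ASSUMPTION: an ambiguity error lists candidates under "matches".
        matches = result.get("matches", [])
        if matches:
            return ("ambiguous", matches)
        return ("error", str(result["error"]))
    if "completed" in result:
        return ("completed", result["completed"])
    return ("error", "unexpected response")


def close_tasks(names, run):
    """Search, show, complete, inspect, then verify against the logbook.

    `run(script, *args)` is a hypothetical callable returning the script's
    stdout as a string; the argument layout is assumed, not documented.
    """
    sections, completed, failed = [], [], []
    for name in names:
        # 1. Read before writing, and show the user what matched.
        found = json.loads(run("things-query.sh", "search", name))
        sections.append(f"Matches for '{name}':\n{to_markdown(found)}")
        # 2. Complete via the wrapper script and inspect its return value.
        status, detail = classify_completion(
            run("things-create.sh", "complete", name)
        )
        if status == "completed":
            completed.append(name)
        elif status == "ambiguous":
            # 3. Surface ambiguous matches rather than proceeding silently.
            candidates = ", ".join(map(str, detail))
            failed.append(f"{name}: multiple matches ({candidates})")
        else:
            failed.append(f"{name}: {detail}")
    # 4. Query the logbook only AFTER the completion attempts.
    logbook = json.loads(run("things-query.sh", "logbook"))
    sections.append("Logbook:\n" + to_markdown(logbook))
    done = "\n".join(f"- {n}" for n in completed) or "_None._"
    errors = "\n".join(f"- {f}" for f in failed) or "_None._"
    sections.append("Completed:\n" + done)
    sections.append("Errors:\n" + errors)
    return "\n\n".join(sections)
```

In practice `run` could wrap `subprocess.run([script, *args], capture_output=True, text=True).stdout`, with `script` resolved to wherever the skill's scripts actually live. Passing the runner in as a parameter keeps the ordering logic testable without a Things 3 installation.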