Manage Things 3 tasks on macOS via AppleScript. Full CRUD: view, create, complete, move, and delete tasks and projects across all Things 3 lists.
98
100%
Does it follow best practices?
Impact
96%
2.04xAverage score across 5 eval scenarios
Passed
No known issues
{
"context": "Tests whether the agent queries relevant lists before making changes (show-before-modify), uses the helper scripts correctly for both reading and writing, formats output as markdown, verifies changes by re-querying after mutations, and presents before/after states.",
"type": "weighted_checklist",
"checklist": [
{
"name": "Query before modify",
"description": "Script calls things-query.sh to read list contents BEFORE calling things-create.sh to make any changes — query commands appear earlier in the script than write commands",
"max_score": 14
},
{
"name": "Multiple list queries (before)",
"description": "Script calls things-query.sh for at least two different lists (e.g., today, inbox, someday) as part of the before-state capture",
"max_score": 10
},
{
"name": "Correct query subcommands",
"description": "Uses valid things-query.sh subcommands: today, inbox, upcoming, anytime, someday, tomorrow, logbook, projects, project, tags, areas, or search",
"max_score": 8
},
{
"name": "Write via things-create.sh move",
"description": "Uses things-create.sh with the 'move' subcommand for the reorganization step, not raw osascript or other methods",
"max_score": 10
},
{
"name": "Valid list targets",
"description": "The --list flag in things-create.sh move uses only valid targets: Today, Someday, Anytime, or Tomorrow (not Inbox or Upcoming)",
"max_score": 8
},
{
"name": "Move result checked",
"description": "Script inspects the JSON returned by things-create.sh move (checking for success or error fields) rather than ignoring the output",
"max_score": 8
},
{
"name": "Post-change re-query",
"description": "Script calls things-query.sh again AFTER making changes to capture the updated state of affected lists",
"max_score": 14
},
{
"name": "Markdown formatting",
"description": "The review_report.md presents task data as markdown tables or bullet lists, NOT as raw JSON",
"max_score": 10
},
{
"name": "Before section in report",
"description": "The review_report.md contains a clearly labelled Before (or initial state) section showing list contents prior to any moves",
"max_score": 9
},
{
"name": "After section in report",
"description": "The review_report.md contains a clearly labelled After (or updated state) section showing list contents after moves are complete",
"max_score": 9
}
]
}