Manage Things 3 tasks on macOS via AppleScript. Full CRUD: view, create, complete, move, and delete tasks and projects across all Things 3 lists.
98
100%
Does it follow best practices?
Impact
96%
2.04xAverage score across 5 eval scenarios
Passed
No known issues
{
"context": "Tests whether the agent queries the inbox before making changes (show-before-modify), uses things-create.sh move for reorganization, properly handles ambiguous name errors from the move command, re-queries to verify the updated state, and formats output as markdown.",
"type": "weighted_checklist",
"checklist": [
{
"name": "Inbox query first",
"description": "Script calls things-query.sh inbox BEFORE calling things-create.sh move — the read command appears earlier in execution than the write commands",
"max_score": 12
},
{
"name": "Inbox shown as markdown",
"description": "The inbox_report.md presents the initial inbox contents as a markdown table or bullet list, not raw JSON",
"max_score": 8
},
{
"name": "Move via things-create.sh",
"description": "Uses things-create.sh with the 'move' subcommand (with --list or --project flag) to reorganize tasks, not raw osascript",
"max_score": 12
},
{
"name": "Valid list flag value",
"description": "The --list flag uses only valid list targets: Today, Someday, Anytime, or Tomorrow (not Inbox or Upcoming)",
"max_score": 8
},
{
"name": "Ambiguity error displayed",
"description": "When things-create.sh move returns a JSON error with multiple matches, the script parses and displays the matching task names to the user rather than silently ignoring or crashing",
"max_score": 16
},
{
"name": "Move result inspected",
"description": "Script reads and checks the JSON returned by things-create.sh move (for 'error' or success) rather than ignoring the return value",
"max_score": 10
},
{
"name": "Post-move re-query",
"description": "Script calls things-query.sh again after moves (on inbox and/or destination lists) to capture and display the updated state",
"max_score": 12
},
{
"name": "Destination list queried",
"description": "After moving tasks, script queries the destination list(s) (e.g., today, anytime, or a project) to confirm tasks arrived there",
"max_score": 8
},
{
"name": "Before section in report",
"description": "The inbox_report.md contains a clearly labelled Before (or initial inbox) section showing inbox contents prior to any moves",
"max_score": 7
},
{
"name": "After section in report",
"description": "The inbox_report.md contains a clearly labelled After (or updated state) section showing the inbox and destination lists after all moves",
"max_score": 7
}
]
}