Syncs TripIt travel itineraries to Reclaim.ai timezone segments and Google Calendar OOO blocks.
91
97%
Does it follow best practices?
Impact
80%
1.31x
Average score across 4 eval scenarios
Advisory
Suggest reviewing before use
{
  "context": "Tests whether the agent correctly handles credential onboarding: reading from a non-standard input path, writing to the correct destination using file-level filtering (not reproducing values), filtering notification variables, detecting incomplete Google OAuth credentials, and validating via dry-run.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "Reads from input path",
      "description": "Agent reads credentials from /tmp/tripit-creds.txt (the provided location)",
      "max_score": 8
    },
    {
      "name": "Writes to correct env file path",
      "description": "Filtered credentials are written to $HOME/.tripit-reclaim/.env (user-space, not /opt/)",
      "max_score": 10
    },
    {
      "name": "File permissions set to 600",
      "description": "chmod 600 is applied to the written env file",
      "max_score": 10
    },
    {
      "name": "Telegram vars excluded",
      "description": "TELEGRAM_BOT_TOKEN and TELEGRAM_CHAT_ID are NOT written to the output env file",
      "max_score": 10
    },
    {
      "name": "SNS var excluded",
      "description": "SNS_TOPIC_ARN is NOT written to the output env file",
      "max_score": 10
    },
    {
      "name": "Mandatory vars included",
      "description": "Both TRIPIT_ICAL_URL and RECLAIM_API_TOKEN are present in the written env file",
      "max_score": 10
    },
    {
      "name": "Trip filtering var included",
      "description": "TRIPIT_IGNORE_TRIPS is preserved in the written env file (it's a valid optional variable)",
      "max_score": 8
    },
    {
      "name": "Incomplete Google OAuth detected",
      "description": "Agent warns that OOO blocks won't work because GOOGLE_REFRESH_TOKEN is missing — without the task having spelled out which credential is missing",
      "max_score": 14
    },
    {
      "name": "Dry-run attempted",
      "description": "Agent runs or attempts to run the dry-run command (node sync.mjs dry-run --output=json) to validate the setup",
      "max_score": 10
    },
    {
      "name": "Dry-run failure handled",
      "description": "Agent acknowledges the dry-run failed (fake credentials) and explains that real credentials are needed to fully validate, rather than silently ignoring the failure",
      "max_score": 10
    }
  ]
}
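
The filtering behavior the checklist rewards can be sketched roughly as follows. This is an illustrative Python sketch, not the skill's actual implementation: the function names `filter_env` and `check_setup` are hypothetical, the input is assumed to be plain `KEY=VALUE` lines, and treating any `GOOGLE_`-prefixed key as evidence of a partial OAuth setup is an assumption (the rubric names only `GOOGLE_REFRESH_TOKEN`). Lines are copied at file level, so no credential value is printed or returned.

```python
import os
from pathlib import Path

# Keys the rubric says must NOT be written to the output env file.
EXCLUDED = {"TELEGRAM_BOT_TOKEN", "TELEGRAM_CHAT_ID", "SNS_TOPIC_ARN"}
# Keys the rubric says must be present in the output env file.
MANDATORY = {"TRIPIT_ICAL_URL", "RECLAIM_API_TOKEN"}


def filter_env(src: Path, dest: Path) -> set[str]:
    """Copy KEY=VALUE lines from src to dest, skipping excluded keys.

    Returns only the key names that were written, never the values.
    Assumes plain KEY=VALUE lines (no 'export' prefix, no multi-line values).
    """
    kept_lines, kept_keys = [], set()
    for line in src.read_text().splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#") or "=" not in stripped:
            continue  # skip blanks, comments, and malformed lines
        key = stripped.split("=", 1)[0].strip()
        if key in EXCLUDED:
            continue
        kept_lines.append(stripped)
        kept_keys.add(key)

    dest.parent.mkdir(parents=True, exist_ok=True)
    # Create the file owner-only, then chmod in case it already existed.
    fd = os.open(dest, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write("\n".join(kept_lines) + "\n")
    os.chmod(dest, 0o600)
    return kept_keys


def check_setup(keys: set[str]) -> list[str]:
    """Return human-readable warnings based on key names only."""
    warnings = []
    for name in sorted(MANDATORY - keys):
        warnings.append(f"Missing mandatory variable: {name}")
    # Assumption: any GOOGLE_* key signals an attempted OAuth setup.
    has_google = any(k.startswith("GOOGLE_") for k in keys)
    if has_google and "GOOGLE_REFRESH_TOKEN" not in keys:
        warnings.append(
            "GOOGLE_REFRESH_TOKEN is missing; OOO blocks will not work"
        )
    return warnings
```

In the scenario above, `src` would be `/tmp/tripit-creds.txt` and `dest` would be `Path.home() / ".tripit-reclaim" / ".env"`. The dry-run (`node sync.mjs dry-run --output=json`) would follow as a separate step, and the rubric expects its failure with fake credentials to be reported, not ignored.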