jbaruch/reclaim-tripit-sync

Syncs TripIt travel itineraries to Reclaim.ai timezone segments and Google Calendar OOO blocks.

91 · 1.31x
Quality: 97% (Does it follow best practices?)
Impact: 80% · 1.31x (average score across 4 eval scenarios)
Security by Snyk: Advisory (suggest reviewing before use)

evals/scenario-1/criteria.json

{
  "context": "Tests the critical 'stay silent on no-change runs' behavior. Both JSON inputs have noChanges: true, empty conflicts, and empty errors. The correct behavior is to NOT generate a notification. This also tests whether the agent is tempted to report on the segments that exist (which are not changes) or to fabricate differences between the two runs.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "Run 1 identified as no-change",
      "description": "Agent correctly identifies run-1-output.json as a no-change run (noChanges is true, no errors, no conflicts)",
      "max_score": 10
    },
    {
      "name": "Run 2 identified as no-change",
      "description": "Agent correctly identifies run-2-output.json as a no-change run (noChanges is true, no errors, no conflicts)",
      "max_score": 10
    },
    {
      "name": "Decision is silence",
      "description": "notification-decision.md states that no notification should be sent for either run",
      "max_score": 16
    },
    {
      "name": "Does not report existing segments as changes",
      "description": "Agent does NOT treat the segments array as 'changes' — segments are the current state, not new activity. Does not say 'KubeCon timezone was created' or similar",
      "max_score": 14
    },
    {
      "name": "Does not fabricate diff between runs",
      "description": "Agent does NOT claim that the difference in segment count between run 1 (2 segments) and run 2 (1 segment) represents a change — both runs independently report noChanges: true",
      "max_score": 14
    },
    {
      "name": "Does not report null OOO as a problem",
      "description": "Agent does NOT flag run 2's null OOO field as an error or issue — null means OOO was not configured, which is a valid state",
      "max_score": 10
    },
    {
      "name": "Reasoning is provided",
      "description": "Agent explains WHY silence is correct (e.g., references noChanges being true, empty errors/conflicts) rather than just saying 'nothing to report'",
      "max_score": 8
    },
    {
      "name": "Does not run the sync",
      "description": "Agent does NOT attempt to run sync.mjs or make API calls — the task is to interpret provided output, not to re-run the sync",
      "max_score": 8
    },
    {
      "name": "Both runs processed",
      "description": "Agent processes both JSON files, not just one",
      "max_score": 5
    },
    {
      "name": "Concise output",
      "description": "notification-decision.md is brief and to the point — a few sentences, not a multi-page analysis of nothing",
      "max_score": 5
    }
  ]
}
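
The silence rule that this checklist grades can be sketched as a small predicate. This is only an illustration under stated assumptions: the field names `noChanges`, `conflicts`, and `errors` are taken from the scenario description, while `shouldNotify`, the `ooo` key, and the placeholder segment objects are hypothetical and are not confirmed to match the actual output of `sync.mjs`.

```javascript
// Hypothetical helper, not part of the tile: decides whether a sync run
// warrants a notification, based only on the run's own flags.
function shouldNotify(run) {
  const hasConflicts = Array.isArray(run.conflicts) && run.conflicts.length > 0;
  const hasErrors = Array.isArray(run.errors) && run.errors.length > 0;
  // `segments` and the OOO field are deliberately ignored: they describe
  // current state, not new activity, and a null OOO is a valid state.
  return run.noChanges !== true || hasConflicts || hasErrors;
}

// Placeholder inputs shaped like the two runs described in the scenario
// (2 segments vs. 1 segment, null OOO in run 2). Segment contents are made up.
const run1 = {
  noChanges: true,
  conflicts: [],
  errors: [],
  segments: [{ id: "segment-a" }, { id: "segment-b" }],
  ooo: { id: "ooo-a" }, // assumed key name for the OOO field
};
const run2 = {
  noChanges: true,
  conflicts: [],
  errors: [],
  segments: [{ id: "segment-a" }],
  ooo: null,
};

// Each run is judged independently; the runs are never diffed against each other.
const decisions = [run1, run2].map(shouldNotify);
console.log(decisions);
```

Because each run is evaluated on its own, the differing segment counts never enter the decision, which mirrors the "Does not fabricate diff between runs" item in the checklist.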
