CtrlK
BlogDocsLog inGet started
Tessl Logo

jbaruch/frequent-flyer-advocate

Write professional, persuasive complaint letters to US airlines emphasizing loyalty status, DOT regulations, and airline commitments.

93

1.38x
Quality

94%

Does it follow best practices?

Impact

93%

1.38x

Average score across 10 eval scenarios

SecuritybySnyk

Advisory

Suggest reviewing before use

Overview
Quality
Evals
Security
Files

criteria.jsonevals/scenario-9/

{
  "context": "Tests whether the skill checks for pending complaints at the start of a session, correctly resolves them based on user responses, logs credits to the travel credits inventory when the resolution includes miles/vouchers, and uses CLOSED for no-response cases to prevent re-asking.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "Pending complaints checked first",
      "description": "session-log.md shows that pending complaints were checked before starting intake for the new complaint — the pending check happens first, not after",
      "max_score": 12
    },
    {
      "name": "DL891 resolved as PARTIAL",
      "description": "Records DL891 resolution as PARTIAL (or RESOLVED) — the user accepted 15K miles and a $50 voucher, which is less than the 25K miles requested",
      "max_score": 12
    },
    {
      "name": "DL891 credits logged",
      "description": "Shows the credits-tracker.py add command (or describes logging) for the 15,000 miles AND the $50 voucher from the DL891 resolution — both should be tracked in the travel credits inventory",
      "max_score": 15
    },
    {
      "name": "DL2044 resolved as CLOSED",
      "description": "Records DL2044 resolution as CLOSED (not PENDING, not DENIED) — the user said 'complete silence' and 'done waiting', so CLOSED is correct to prevent the skill from asking again",
      "max_score": 15
    },
    {
      "name": "DL2044 no credits logged",
      "description": "Does NOT attempt to log credits for DL2044 since the airline never responded — there is nothing to log",
      "max_score": 8
    },
    {
      "name": "DOT escalation suggested for DL2044",
      "description": "Suggests or notes that the DL2044 downgrade with no response after 30+ days may warrant a DOT complaint filing, since the user has given up on the airline's direct response",
      "max_score": 10
    },
    {
      "name": "New complaint intake proceeds",
      "description": "After resolving pending complaints, proceeds to intake for the new Spirit NK412 complaint — asks relevant follow-up questions",
      "max_score": 10
    },
    {
      "name": "Voucher expiry captured",
      "description": "The credits-tracker command for the $50 voucher includes the December 2026 expiry date that the user mentioned",
      "max_score": 8
    },
    {
      "name": "Resolve commands shown",
      "description": "Shows or describes the complaints-bank.py resolve commands for both complaints with appropriate --resolution and --note arguments",
      "max_score": 10
    }
  ]
}

evals

tile.json