Write professional, persuasive complaint letters to US airlines emphasizing loyalty status, DOT regulations, and airline commitments.
93
94%
Does it follow best practices?
Impact
93%
1.38xAverage score across 10 eval scenarios
Advisory
Suggest reviewing before use
{
"context": "Tests whether the skill checks for pending complaints at the start of a session, correctly resolves them based on user responses, logs credits to the travel credits inventory when the resolution includes miles/vouchers, and uses CLOSED for no-response cases to prevent re-asking.",
"type": "weighted_checklist",
"checklist": [
{
"name": "Pending complaints checked first",
"description": "session-log.md shows that pending complaints were checked before starting intake for the new complaint — the pending check happens first, not after",
"max_score": 12
},
{
"name": "DL891 resolved as PARTIAL",
"description": "Records DL891 resolution as PARTIAL (or RESOLVED) — the user accepted 15K miles and a $50 voucher, which is less than the 25K miles requested",
"max_score": 12
},
{
"name": "DL891 credits logged",
"description": "Shows the credits-tracker.py add command (or describes logging) for the 15,000 miles AND the $50 voucher from the DL891 resolution — both should be tracked in the travel credits inventory",
"max_score": 15
},
{
"name": "DL2044 resolved as CLOSED",
"description": "Records DL2044 resolution as CLOSED (not PENDING, not DENIED) — the user said 'complete silence' and 'done waiting', so CLOSED is correct to prevent the skill from asking again",
"max_score": 15
},
{
"name": "DL2044 no credits logged",
"description": "Does NOT attempt to log credits for DL2044 since the airline never responded — there is nothing to log",
"max_score": 8
},
{
"name": "DOT escalation suggested for DL2044",
"description": "Suggests or notes that the DL2044 downgrade with no response after 30+ days may warrant a DOT complaint filing, since the user has given up on the airline's direct response",
"max_score": 10
},
{
"name": "New complaint intake proceeds",
"description": "After resolving pending complaints, proceeds to intake for the new Spirit NK412 complaint — asks relevant follow-up questions",
"max_score": 10
},
{
"name": "Voucher expiry captured",
"description": "The credits-tracker command for the $50 voucher includes the December 2026 expiry date that the user mentioned",
"max_score": 8
},
{
"name": "Resolve commands shown",
"description": "Shows or describes the complaints-bank.py resolve commands for both complaints with appropriate --resolution and --note arguments",
"max_score": 10
}
]
}evals
scenario-1
scenario-2
scenario-3
scenario-4
scenario-5
scenario-6
scenario-7
scenario-8
scenario-9
scenario-10
skills
frequent-flyer-advocate