Write professional, persuasive complaint letters to US airlines emphasizing loyalty status, DOT regulations, and airline commitments.
93
94%
Does it follow best practices?
Impact
93%
1.38x
Average score across 10 eval scenarios
Advisory
Suggest reviewing before use
{
  "context": "Tests whether the agent follows the Phase 1 intake methodology: identifying the essential questions about flight details, delay specifics, consequential damages, loyalty tier, prior contact history, and documentation, while organizing them into logical categories rather than firing off a flat numbered list, and providing a realistic preliminary case assessment.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "Flight details requested",
      "description": "Asks for specific flight details including at minimum the flight number and/or date",
      "max_score": 8
    },
    {
      "name": "Delay specifics requested",
      "description": "Asks about the delay duration and/or the reason given by the airline",
      "max_score": 8
    },
    {
      "name": "Consequential damages asked",
      "description": "Asks about missed connections, financial losses, or other downstream impacts caused by the delay",
      "max_score": 10
    },
    {
      "name": "Loyalty tier asked",
      "description": "Asks for the specific AAdvantage tier/status level (not just whether they are a member)",
      "max_score": 10
    },
    {
      "name": "Documentation asked",
      "description": "Asks what documentation the client has (receipts, airline communications, boarding passes, etc.)",
      "max_score": 7
    },
    {
      "name": "Severity amplifiers identified",
      "description": "Identifies or asks about at least one severity amplifier (e.g., tarmac confinement, no communication from the airline, denied meal vouchers, medical situation)",
      "max_score": 9
    },
    {
      "name": "Desired outcome asked",
      "description": "Asks what outcome the client is seeking (e.g., miles credit, voucher, cash refund, written apology)",
      "max_score": 7
    },
    {
      "name": "Prior contact asked",
      "description": "Asks whether the client has already contacted American Airlines about this issue and whether any offers have been made or accepted (per SKILL.md Phase 1: 'prior customer service contacts and any offers already made or accepted')",
      "max_score": 10
    },
    {
      "name": "Questions organized in categories",
      "description": "Questions are grouped into logical categories or phases rather than presented as a flat numbered list of 10+ items all at once",
      "max_score": 10
    },
    {
      "name": "Essential questions first",
      "description": "Identifies or separates the essential must-have questions (flight details, what happened, loyalty tier) from context-dependent follow-ups",
      "max_score": 10
    },
    {
      "name": "Case strength factors named",
      "description": "Names at least two specific factors that would strengthen or weaken the case (e.g., elite status level, presence of consequential damages, delay duration, documentation quality)",
      "max_score": 11
    }
  ]
}
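The rubric above is a `weighted_checklist` whose `max_score` values sum to 100. The page does not say how a scenario score is derived from it, so the following is only a minimal sketch under one plausible assumption: the score is the points earned divided by the total points available. The function names `total_max` and `score_percent`, the `awarded` mapping, and the clamping behaviour are hypothetical and not taken from the evaluation tool.

```python
# Names and max_score values copied from the checklist above;
# descriptions are omitted for brevity.
RUBRIC = {
    "type": "weighted_checklist",
    "checklist": [
        {"name": "Flight details requested", "max_score": 8},
        {"name": "Delay specifics requested", "max_score": 8},
        {"name": "Consequential damages asked", "max_score": 10},
        {"name": "Loyalty tier asked", "max_score": 10},
        {"name": "Documentation asked", "max_score": 7},
        {"name": "Severity amplifiers identified", "max_score": 9},
        {"name": "Desired outcome asked", "max_score": 7},
        {"name": "Prior contact asked", "max_score": 10},
        {"name": "Questions organized in categories", "max_score": 10},
        {"name": "Essential questions first", "max_score": 10},
        {"name": "Case strength factors named", "max_score": 11},
    ],
}


def total_max(rubric):
    """Sum of all max_score values in the checklist."""
    return sum(item["max_score"] for item in rubric["checklist"])


def score_percent(rubric, awarded):
    """Assumed scoring rule: earned points / available points, as a percentage.

    `awarded` maps criterion name -> points given by a grader (hypothetical
    input format). Missing criteria count as 0, and each value is clamped
    to the range [0, max_score].
    """
    earned = 0
    for item in rubric["checklist"]:
        points = awarded.get(item["name"], 0)
        earned += max(0, min(points, item["max_score"]))
    return 100 * earned / total_max(rubric)


if __name__ == "__main__":
    # Example: full marks on everything except "Prior contact asked".
    full = {item["name"]: item["max_score"] for item in RUBRIC["checklist"]}
    partial = dict(full)
    partial.pop("Prior contact asked")
    print(total_max(RUBRIC))               # 100
    print(score_percent(RUBRIC, partial))  # 90.0
```

Because the weights total 100, each criterion's `max_score` can be read directly as its percentage contribution under this assumed rule. For example, skipping the prior-contact question alone would cost 10 points.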