WebSocket vs SSE vs polling, reconnection with backoff and jitter, heartbeats, backpressure, message ordering, connection state UI, auth on upgrade, graceful degradation
94
98%
Does it follow best practices?
Impact
90%
1.87x
Average score across 5 eval scenarios
Passed
No known issues
{
"context": "Tests whether the agent uses WebSocket for this bidirectional collaboration use case, implements room-based scoping per list, and proactively handles reconnection, state recovery, message ordering, connection state, and authentication. The task says nothing about reconnection, heartbeats, ordering, auth, or connection state.",
"type": "weighted_checklist",
"checklist": [
{
"name": "websocket-chosen-for-bidirectional",
"description": "WebSocket or Socket.IO is chosen for this bidirectional use case (users both send and receive), not SSE (which is one-way). The agent was NOT asked to justify the transport choice.",
"max_score": 10
},
{
"name": "room-scoped-per-list",
"description": "Updates are scoped to the specific todo list using rooms/channels (e.g., socket.join('list:' + listId)) rather than broadcasting all changes to all connected clients",
"max_score": 10
},
{
"name": "reconnection-with-backoff",
"description": "Client reconnects automatically with exponential backoff and jitter (Socket.IO config or manual implementation). The agent was NOT asked about reconnection.",
"max_score": 12
},
{
"name": "state-recovery-on-reconnect",
"description": "After reconnection, client re-joins the list room and re-fetches the current todo list state to fill any gaps. The agent was NOT asked about state recovery.",
"max_score": 10
},
{
"name": "connection-state-ui",
"description": "The UI shows when the connection is lost or reconnecting. Uses role='status' or aria-live. The agent was NOT asked about connection state.",
"max_score": 8
},
{
"name": "message-ordering-or-conflict-handling",
"description": "Messages include timestamps or sequence numbers, or the system handles conflicting edits (e.g., two users toggling the same item). The agent was NOT asked about ordering or conflicts.",
"max_score": 8
},
{
"name": "auth-during-handshake",
"description": "WebSocket connection authenticates during handshake, not after. The agent was NOT asked about authentication.",
"max_score": 6
},
{
"name": "heartbeat-configured",
"description": "Heartbeat/ping is configured to detect dead connections. The agent was NOT asked about heartbeats.",
"max_score": 6
},
{
"name": "optimistic-updates",
"description": "Client applies changes optimistically (updates UI immediately) and reconciles if the server rejects the change",
"max_score": 6
},
{
"name": "deduplication",
"description": "Client deduplicates messages to avoid showing the same change twice (e.g., receiving back its own emitted event). The agent was NOT asked about deduplication.",
"max_score": 6
},
{
"name": "client-cleanup-on-disconnect",
"description": "Server cleans up on disconnect. Client hook returns a cleanup function for unmount.",
"max_score": 4
},
{
"name": "typescript-types-defined",
"description": "Todo and list types are defined with TypeScript interfaces.",
"max_score": 4
}
]
}
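
The "reconnection-with-backoff" item in the checklist allows either Socket.IO configuration or a manual implementation. As a rough illustration of the manual route, the sketch below computes a retry delay using exponential backoff with "full jitter" (a uniformly random delay below an exponentially growing ceiling). The names `BackoffOptions` and `backoffDelay` and the 1 s / 30 s values are hypothetical and not part of the rubric. It is one way to meet the criterion, not the expected answer.

```typescript
// Hypothetical helper for illustration; names and defaults are assumptions.
interface BackoffOptions {
  baseMs: number;          // ceiling for the first retry
  maxMs: number;           // upper bound on any ceiling
  random?: () => number;   // injectable for tests; defaults to Math.random
}

// Exponential backoff with full jitter:
// the ceiling doubles per attempt (capped at maxMs), and the actual
// delay is drawn uniformly from [0, ceiling) so that many clients
// reconnecting at once do not retry in lockstep.
function backoffDelay(attempt: number, opts: BackoffOptions): number {
  const rand = opts.random ?? Math.random;
  const ceiling = Math.min(opts.maxMs, opts.baseMs * 2 ** attempt);
  return Math.floor(rand() * ceiling);
}

// Example: delays for the first five attempts with a 1 s base and 30 s cap.
const delays = [0, 1, 2, 3, 4].map((attempt) =>
  backoffDelay(attempt, { baseMs: 1000, maxMs: 30000 })
);
console.log(delays);
```

A reconnect loop would call `backoffDelay` with an attempt counter that resets to 0 after a successful connection. With Socket.IO, the client options `reconnectionDelay`, `reconnectionDelayMax`, and `randomizationFactor` cover similar ground without custom code.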