Analyze eval results, diagnose low-scoring criteria, fix tile content, and re-run evals — the full improvement loop automated
Does it follow best practices?
Evaluation — 100%
↑ 1.02xAgent success when using this tile
Validation for skill structure
Your platform team maintains a tile for webhook processing that multiple agent-driven services rely on. Recently, during a traffic spike, several services started hammering the downstream API with rapid retry bursts when webhooks failed — causing cascading failures and rate-limit errors. After reviewing the incidents, the team agreed the tile's retry guidance was too vague, leaving agents to implement whatever retry strategy seemed reasonable.
You've been given the tile along with the eval rubric that's currently used to grade agents on their retry implementations. Your job is to update the tile so agents have unambiguous, specific guidance on how to retry failed webhook deliveries — without disrupting the parts of the tile that are already working well.
Update the inputs/SKILL.md file in place with the improvements. Do not rewrite sections that aren't related to retry behavior. Save a brief change_summary.md documenting: which sections you changed, what you added or modified, and why.
The following files are provided as inputs. Extract them before beginning.
Configure your webhook endpoint to accept HTTPS connections with TLS 1.2 or higher. Use HMAC-SHA256 to validate incoming signatures against your webhook secret before processing any payload.
If processing fails after acknowledgement, retry the operation up to 3 times. Wait between retries to avoid overwhelming the downstream service. Log each retry attempt including the attempt number and error received.
Emit a structured log entry for every webhook received, including:
=============== FILE: inputs/eval_criteria.json =============== { "context": "Tests whether agents implement robust retry logic for failed webhook processing", "type": "weighted_checklist", "checklist": [ { "name": "Exponential backoff timing", "description": "Retry delays follow exponential backoff with intervals of 1s, 2s, then 4s", "max_score": 25 }, { "name": "Max retry count", "description": "Maximum of 3 retry attempts before marking the webhook as failed", "max_score": 15 }, { "name": "HMAC-SHA256 validation", "description": "Validates incoming webhook signature using HMAC-SHA256 algorithm", "max_score": 20 }, { "name": "Retry logging", "description": "Each retry attempt is logged with attempt number and error details", "max_score": 20 }, { "name": "Fast acknowledgement", "description": "Returns 200 acknowledgement before heavy processing begins", "max_score": 20 } ] }