Analyze eval results, diagnose low-scoring criteria, fix tile content, and re-run evals — the full improvement loop automated
Your team has been iterating on a data pipeline tile over the past few months. Three different engineers have contributed to different parts of it — one owns the main skill file, another maintains the operational rules, and a third recently added a reference doc for error handling. No one has done a cross-file review since the initial version.
Recent eval scores on the retry logic scenario have been inconsistent, with agents implementing wildly different retry counts. This suggests there may be conflicting guidance somewhere in the tile. Before the next release, you need a full consistency audit to find any statements that contradict each other across all tile files — not just the retry-related ones.
Produce a file called contradiction_report.md that documents each contradiction you find, quoting the conflicting statements and naming the files they come from.
The following files are provided as inputs. Extract them before beginning.
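Before reading the files by hand, it can help to pull every retry-, abort-, or credential-related statement into one place so conflicting guidance sits side by side. The sketch below is one possible starting point, not a required approach; the keyword list and the assumption that the inputs live under `inputs/` as Markdown files are illustrative.

```python
import re
from pathlib import Path

# Hypothetical first pass for the audit: collect every sentence that
# mentions retries, aborting, or credential handling, tagged with its
# source file, so contradictory statements can be compared directly.
KEYWORDS = re.compile(r"\b(retry|retries|abort|credential|refresh)\b", re.IGNORECASE)

def collect_claims(root: str = "inputs") -> list[tuple[str, str]]:
    """Return (file, sentence) pairs for every keyword-bearing sentence."""
    claims = []
    for path in sorted(Path(root).rglob("*.md")):
        text = path.read_text(encoding="utf-8")
        # Naive sentence split on terminal punctuation; good enough for a scan.
        for sentence in re.split(r"(?<=[.!?])\s+", text):
            if KEYWORDS.search(sentence):
                claims.append((str(path), sentence.strip()))
    return claims

if __name__ == "__main__":
    for file, claim in collect_claims():
        print(f"{file}: {claim}")
```

The output is raw material for contradiction_report.md, not the report itself; you still have to judge which pairs of statements actually conflict.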
This skill handles data ingestion, transformation, and delivery between upstream producers and downstream consumers.
All connections to data sources must use service account credentials. If authentication fails at connection time, abort the pipeline run immediately and raise an alert.
Apply the configured transform functions in the order specified in the pipeline manifest. If a transform raises an exception, retry the transform up to 5 times before marking the record as failed.
Push transformed records to the downstream sink in the same batch order they were received. Confirm delivery acknowledgement from the sink before marking records as successfully delivered.
On any unrecoverable error, write a failure record to the dead-letter queue and continue processing remaining records.
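Taken at face value, the transform rule above amounts to the following sketch. The function and queue names are hypothetical, introduced only for illustration; the tile itself does not define them.

```python
MAX_RETRIES = 5  # per the transform rule: retry up to 5 times before failing

def process_record(record, transform, dead_letter_queue):
    """Apply one transform, retrying up to MAX_RETRIES before failing the record."""
    for attempt in range(1 + MAX_RETRIES):  # one initial try plus 5 retries
        try:
            return transform(record)
        except Exception as exc:
            last_error = exc
    # Unrecoverable for this record: park it and keep processing the rest,
    # per the dead-letter-queue rule above.
    dead_letter_queue.append({"record": record, "error": str(last_error)})
    return None
```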
=============== FILE: inputs/rules/retry-policy.md ===============
All retry operations in this tile follow a conservative retry policy to protect downstream services:
This policy applies to all operations in the tile unless explicitly overridden by a specific step.
=============== FILE: inputs/docs/error-handling.md ===============
The pipeline is designed to be resilient and self-healing. Failures in individual records or connections should not halt the entire pipeline unless they are truly unrecoverable.
If an authentication failure occurs mid-pipeline (e.g., a token expires while the pipeline is running), the pipeline should attempt to refresh credentials and continue. If the refresh also fails, skip the affected batch and log the error — do not abort the full pipeline run.
For records that cannot be processed after all retries are exhausted, write them to the dead-letter queue. The pipeline continues processing all remaining records regardless of individual record failures.
Transform failures are typically transient. The pipeline should retry failed transforms with a brief pause between attempts. After retries are exhausted, the record is moved to the dead-letter queue.
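The retry-with-pause behavior described above can be sketched as follows. The attempt count, pause length, and function name are illustrative assumptions; this file does not prescribe them.

```python
import time

def retry_with_pause(operation, attempts: int, pause_seconds: float = 0.1):
    """Run operation(), pausing between failed attempts; re-raise after the last."""
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            if attempt == attempts - 1:
                raise  # retries exhausted; caller routes the record to the DLQ
            time.sleep(pause_seconds)
```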