Skills are the new Code by Guy Podjarny
89
90%
Does it follow best practices?
Impact
87%
1.38x
Average score across 4 eval scenarios
Passed
No known issues
{
"context": "Tests whether the agent follows the lookup protocol (outline.md first, then transcript.md), quotes transcript.md verbatim with line citations, preserves transcription artifacts, says 'talk doesn't address this' for out-of-scope questions, and does not fabricate Guy's words.",
"type": "weighted_checklist",
"checklist": [
{
"name": "Verbatim quotes present",
"description": "For questions 1, 2, 3, and 5 (which are answered in the transcript), research-brief.md includes at least one quoted passage per answer that is copied character-for-character from transcript.md",
"max_score": 20
},
{
"name": "Line range citations",
"description": "Each quoted passage is accompanied by a transcript line citation in the format 'transcript.md L<n>–L<n>' (or equivalent) indicating which lines were read",
"max_score": 15
},
{
"name": "OpenClaw preserved verbatim",
"description": "The answer to question 2 retains the word 'OpenClaw' exactly as it appears in the transcript rather than silently replacing it with a corrected term",
"max_score": 10
},
{
"name": "Transcription artifact annotation",
"description": "At least one answer includes a parenthetical note on a likely transcription artifact, e.g., '(likely means ...)' or '(probably ...)' — the artifact is quoted but a probable intended word is offered",
"max_score": 10
},
{
"name": "Quality metrics: talk doesn't address",
"description": "The answer to question 4 (specific quality score thresholds/metrics) explicitly states that the talk does not address this, rather than providing fabricated or externally-sourced metrics attributed to Guy",
"max_score": 20
},
{
"name": "No fabricated attribution",
"description": "No answer presents information inside quotation marks that cannot be verified in transcript.md — paraphrased content is NOT placed in quotation marks",
"max_score": 15
},
{
"name": "Outline-guided lookup evidence",
"description": "At least one answer references a section heading, line range, or term from outline.md (e.g., a section name or glossary entry), indicating the agent navigated via outline.md before opening transcript.md",
"max_score": 10
}
]
}
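As a rough illustration of how a `weighted_checklist` rubric like the one above might be turned into a single percentage, here is a minimal Python sketch. The criterion names and `max_score` weights are copied from the checklist. Everything else is an assumption made for illustration: the `Criterion` class, the `score_checklist` function, the clamping rule, and the sample grader output are hypothetical and do not describe how the evaluation platform actually computes its scores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    """One checklist entry: a name and the most points it can earn."""
    name: str
    max_score: int


# Names and weights copied from the checklist above (they sum to 100).
CHECKLIST = [
    Criterion("Verbatim quotes present", 20),
    Criterion("Line range citations", 15),
    Criterion("OpenClaw preserved verbatim", 10),
    Criterion("Transcription artifact annotation", 10),
    Criterion("Quality metrics: talk doesn't address", 20),
    Criterion("No fabricated attribution", 15),
    Criterion("Outline-guided lookup evidence", 10),
]


def score_checklist(checklist, awarded):
    """Return the earned points as a percentage of the maximum.

    `awarded` maps a criterion name to the points a grader granted.
    Assumption: missing criteria count as 0, and each value is clamped
    to the range [0, max_score] for that criterion.
    """
    total_max = sum(c.max_score for c in checklist)
    if total_max == 0:
        raise ValueError("checklist has no weight")
    earned = sum(
        min(max(awarded.get(c.name, 0), 0), c.max_score) for c in checklist
    )
    return 100.0 * earned / total_max


# Hypothetical grader output, for illustration only.
example_awarded = {
    "Verbatim quotes present": 20,
    "Line range citations": 15,
    "OpenClaw preserved verbatim": 10,
    "Transcription artifact annotation": 0,
    "Quality metrics: talk doesn't address": 20,
    "No fabricated attribution": 15,
    "Outline-guided lookup evidence": 5,
}

if __name__ == "__main__":
    print(score_checklist(CHECKLIST, example_awarded))
```

Because the weights sum to 100, the percentage equals the raw points earned. Normalizing by the total anyway keeps the function usable if criteria are added or reweighted.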