
ainativedevcon2026/talk-podjarny-skills-are-the-new-code-aindc

Skills are the new Code by Guy Podjarny

Score: 89

Quality: 90% (Does it follow best practices?)
Impact: 87% (1.38x; average score across 4 eval scenarios)
Security by Snyk: Passed (no known issues)


evals/scenario-2/criteria.json

{
  "context": "Tests whether the agent follows the lookup protocol (outline.md first, then transcript.md), quotes transcript.md verbatim with line citations, preserves transcription artifacts, says 'talk doesn't address this' for out-of-scope questions, and does not fabricate Guy's words.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "Verbatim quotes present",
      "description": "For questions 1, 2, 3, and 5 (which are answered in the transcript), research-brief.md includes at least one quoted passage per answer that is copied character-for-character from transcript.md",
      "max_score": 20
    },
    {
      "name": "Line range citations",
      "description": "Each quoted passage is accompanied by a transcript line citation in the format 'transcript.md L<n>–L<n>' (or equivalent) indicating which lines were read",
      "max_score": 15
    },
    {
      "name": "OpenClaw preserved verbatim",
      "description": "The answer to question 2 retains the word 'OpenClaw' exactly as it appears in the transcript rather than silently replacing it with a corrected term",
      "max_score": 10
    },
    {
      "name": "Transcription artifact annotation",
      "description": "At least one answer includes a parenthetical note on a likely transcription artifact, e.g., '(likely means ...)' or '(probably ...)' — the artifact is quoted but a probable intended word is offered",
      "max_score": 10
    },
    {
      "name": "Quality metrics: talk doesn't address",
      "description": "The answer to question 4 (specific quality score thresholds/metrics) explicitly states that the talk does not address this, rather than providing fabricated or externally-sourced metrics attributed to Guy",
      "max_score": 20
    },
    {
      "name": "No fabricated attribution",
      "description": "No answer presents information inside quotation marks that cannot be verified in transcript.md — paraphrased content is NOT placed in quotation marks",
      "max_score": 15
    },
    {
      "name": "Outline-guided lookup evidence",
      "description": "At least one answer references a section heading, line range, or term from outline.md (e.g., a section name or glossary entry), indicating the agent navigated via outline.md before opening transcript.md",
      "max_score": 10
    }
  ]
}
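The checklist above is a weighted rubric: the seven `max_score` values sum to 100, so a grader's awarded points read directly as a percentage. The sketch below shows one way such a rubric could be tallied, plus a rough automated check for the "No fabricated attribution" and "Line range citations" items. It is an illustration under stated assumptions, not Tessl's actual grader. The function names `weighted_score` and `unverified_quotes`, the `CITATION` pattern, and the choice to clamp points to `[0, max_score]` are all hypothetical. Quotes are matched only inside straight double quotes, and the match is exact, with no whitespace or curly-quote normalization.

```python
import json
import re

# Hypothetical inline copy of the checklist: names and max_score values are taken
# from criteria.json; the descriptions are omitted for brevity.
CRITERIA_JSON = """
{"type": "weighted_checklist",
 "checklist": [
  {"name": "Verbatim quotes present", "max_score": 20},
  {"name": "Line range citations", "max_score": 15},
  {"name": "OpenClaw preserved verbatim", "max_score": 10},
  {"name": "Transcription artifact annotation", "max_score": 10},
  {"name": "Quality metrics: talk doesn't address", "max_score": 20},
  {"name": "No fabricated attribution", "max_score": 15},
  {"name": "Outline-guided lookup evidence", "max_score": 10}
 ]}
"""


def weighted_score(criteria: dict, awarded: dict[str, float]) -> float:
    """Return the percentage of available points earned.

    `awarded` maps checklist item name -> points given by a grader (assumption).
    Items missing from `awarded` count as 0.
    """
    total = 0.0
    possible = 0
    for item in criteria["checklist"]:
        cap = item["max_score"]
        possible += cap
        points = awarded.get(item["name"], 0.0)
        total += min(max(points, 0.0), cap)  # clamp each item to [0, max_score]
    return 100.0 * total / possible


# Assumed citation shape from the rubric, "transcript.md L<n>–L<n>";
# accepts either an en dash or a hyphen between the two line numbers.
CITATION = re.compile(r"transcript\.md L(\d+)[–-]L(\d+)")


def unverified_quotes(brief: str, transcript: str) -> list[str]:
    """Return double-quoted passages in `brief` that do not occur verbatim in `transcript`."""
    quotes = re.findall(r'"([^"]+)"', brief)
    return [q for q in quotes if q not in transcript]
```

For example, a response that earns full marks on every item except "Outline-guided lookup evidence" would score 90.0 under this tally. An exact substring test is deliberately strict because the rubric asks for character-for-character quotes. A production grader would likely still need to handle typographic quotes and line wrapping.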

Files: evals/, outline.md, quotes.md, SKILL.md, tessl.json, tile.json, transcript.md