
jbaruch/speaker-toolkit

Four-skill presentation system: ingest talks into a rhetoric vault, run interactive clarification, generate a speaker profile, then create new presentations that match your documented patterns. Includes an 88-entry Presentation Patterns taxonomy for scoring, brainstorming, and go-live preparation.

Score: 96 (1.21x)

- Quality: 93% — Does it follow best practices?
- Impact: 97% (1.21x) — Average score across 30 eval scenarios

Security (by Snyk): Advisory — suggest reviewing before use


evals/scenario-29/criteria.json

{
  "context": "Tests whether the agent implements the skill's humor post-mortem, blind spot detection, and recency-adapted clarification session — grounding questions in specific talk observations rather than asking generic questions, grading humor effectiveness, capturing spontaneous moments, and compressing questioning for older talks.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "Per-joke grading",
      "description": "Each identified humor beat in the debrief report receives an effectiveness grade from a defined set (e.g., 'hit', 'nod', 'flat', or similar landing-quality labels) — not a numeric score or pass/fail",
      "max_score": 10
    },
    {
      "name": "Joke questions grounded in observations",
      "description": "The debrief questions reference SPECIFIC humor beats from the analysis (verbatim quotes, slide numbers, or meme descriptions) — not generic 'did your jokes land?' questions",
      "max_score": 12
    },
    {
      "name": "Meme slide questions",
      "description": "For meme-only or meme-with-text slides identified in the analysis, the questionnaire asks specifically whether the audience visibly reacted or whether the meme was talked over",
      "max_score": 8
    },
    {
      "name": "Spontaneous humor capture",
      "description": "The questionnaire includes at least one question specifically asking about humor that happened OUTSIDE the transcript — improvised callbacks, audience riffs, recovery humor from failures, or unrecorded moments. Must be a distinct question, not folded into a generic 'anything else?' prompt",
      "max_score": 10
    },
    {
      "name": "Promotion prompt",
      "description": "The report or questionnaire includes a field or question about whether well-received spontaneous humor should be incorporated into future versions of the talk — e.g., a 'promote to planned' flag, a 'reuse recommendation' field, or an explicit question like 'should this become a planned beat?'",
      "max_score": 8
    },
    {
      "name": "Blind spot questions",
      "description": "The questionnaire asks about at least 2 things the analysis CANNOT observe: room energy, audience size, technical failures, physical stage moments, or audience reactions not captured in transcript",
      "max_score": 10
    },
    {
      "name": "Demo reaction questions",
      "description": "For demo sections identified in the analysis, the questionnaire asks about audience engagement or reactions during the demo — whether the audience was engaged, reactions to demo outcomes, any failures or recovery moments. Can be a dedicated demo question or included within blind spot / audience reaction questions",
      "max_score": 8
    },
    {
      "name": "Recency adaptation — recent",
      "description": "For the recent talk (within past week), the debrief contains detailed per-joke and per-interaction questions — at least 5 specific questions about individual humor beats",
      "max_score": 10
    },
    {
      "name": "Recency adaptation — old",
      "description": "For the old talk (2+ years ago), the debrief is COMPRESSED — fewer questions, broader scope (e.g., 'any jokes you remember landing well or badly?' rather than per-joke grading). Demonstrably shorter than the recent talk debrief",
      "max_score": 12
    },
    {
      "name": "Structured output format",
      "description": "The debrief report stores humor grades and blind spot observations in a structured format (JSON with named fields) — not just free-text prose",
      "max_score": 12
    }
  ]
}
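A `weighted_checklist` like the one above is straightforward to score: sum the points awarded per item, cap each at its `max_score`, and divide by the total possible. The sketch below is a hypothetical scoring helper, not part of the toolkit — only the `checklist`, `name`, and `max_score` field names come from the JSON; the `score_checklist` function and `awarded` mapping are assumptions for illustration.

```python
def score_checklist(criteria: dict, awarded: dict) -> float:
    """Percentage score for one eval scenario.

    `awarded` maps checklist item names to the points a grader gave;
    each is capped at that item's max_score, and missing items count as 0.
    (Hypothetical helper — the real grader's interface is not shown here.)
    """
    items = criteria["checklist"]
    total_max = sum(item["max_score"] for item in items)
    total = sum(
        min(awarded.get(item["name"], 0), item["max_score"])
        for item in items
    )
    return 100.0 * total / total_max


# Example with two of the items from criteria.json above:
criteria = {
    "type": "weighted_checklist",
    "checklist": [
        {"name": "Per-joke grading", "max_score": 10},
        {"name": "Structured output format", "max_score": 12},
    ],
}
full = score_checklist(
    criteria,
    {"Per-joke grading": 10, "Structured output format": 12},
)
print(full)  # → 100.0
```

With all ten items from the real file (total max 100), a scenario's percentage equals its raw point sum, which is presumably why the weights were chosen to add to 100.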

Files: evals/ · README.md · tile.json