**WORKFLOW SKILL** — Iteratively improve skill frontmatter compliance using the Ralph loop pattern. WHEN: "run sensei", "sensei help", "improve skill", "fix frontmatter", "skill compliance", "frontmatter audit", "score skill", "check skill tokens". INVOKES: token counting tools, test runners, git commands. FOR SINGLE OPERATIONS: use token CLI directly for counts/checks.
> "A true master teaches not by telling, but by refining." - The Skill Sensei
Automates skill frontmatter improvement using the Ralph loop pattern - iteratively improving skills until they reach Medium-High compliance with passing tests, then checking token usage and prompting for action.
When user says "sensei help" or asks how to use sensei, show this:
╔══════════════════════════════════════════════════════════════════╗
║ SENSEI - Skill Frontmatter Compliance Improver ║
╠══════════════════════════════════════════════════════════════════╣
║ ║
║ USAGE: ║
║ Run sensei on <skill-name> # Single skill ║
║ Run sensei on <skill-name> --skip-integration # Fast mode ║
║ Run sensei on <skill1>, <skill2>, ... # Multiple skills ║
║ Run sensei on all Low-adherence skills # Batch by score ║
║ Run sensei on all skills # All skills ║
║ ║
║ EXAMPLES: ║
║ Run sensei on appinsights-instrumentation ║
║ Run sensei on azure-security --skip-integration ║
║ Run sensei on azure-security, azure-observability ║
║ Run sensei on all Low-adherence skills ║
║ ║
║ WHAT IT DOES: ║
║ 1. READ - Load skill's SKILL.md, tests, and token count ║
║ 2. SCORE - Check compliance (Low/Medium/Medium-High/High) ║
║ 3. SCAFFOLD - Create tests from template if missing ║
║ 4. IMPROVE - Add WHEN: triggers (cross-model optimized) ║
║ 5. TEST - Run tests, fix if needed ║
║ 6. REFERENCES- Validate markdown links ║
║ 7. TOKENS - Check token budget, gather suggestions ║
║ 8. SUMMARY - Show before/after with suggestions ║
║ 9. PROMPT - Ask: Commit, Create Issue, or Skip? ║
║ 10. REPEAT - Until Medium-High score + tests pass ║
║ ║
║ TARGET SCORE: Medium-High ║
║ ✓ Description > 150 chars, ≤ 60 words ║
║ ✓ Has "WHEN:" trigger phrases (preferred) ║
║ ✓ No "DO NOT USE FOR:" (unless disambiguation-critical) ║
║ ✓ SKILL.md < 500 tokens (soft limit) ║
║ ║
║ MORE INFO: ║
║ See .github/skills/sensei/README.md for full documentation ║
║ ║
╚══════════════════════════════════════════════════════════════════╝

Example invocations:

```
Run sensei on azure-deploy
Run sensei on azure-security, azure-observability
Run sensei on all Low-adherence skills
Run sensei on all skills
```

GEPA mode:

```
Run sensei on my-skill --gepa
Run sensei on my-skill --gepa --skip-integration
Run sensei on all skills --gepa
```

When `--gepa` is used, Step 5 (IMPROVE) is replaced with GEPA evolutionary optimization.
Instead of template-based improvements, GEPA parses trigger prompt arrays from the existing
test harness and combines them with content quality heuristics to build a fitness function.
An LLM proposes and evaluates many candidate improvements automatically. Note: GEPA does not
execute Jest tests directly — it uses the test data (prompts) as evaluation inputs.
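The fitness function described above can be illustrated with a minimal sketch. This is an assumption about its general shape, not the actual `auto_evaluator.py` implementation: it combines coverage of the harvested trigger prompts with the compliance heuristics this document defines (has `WHEN:`, description > 150 chars, ≤ 60 words). The function name and weights are invented for illustration.

```typescript
// Hypothetical GEPA-style fitness function: score a candidate description by
// how many harvested trigger prompts it covers, plus content-quality heuristics.
function fitness(description: string, triggerPrompts: string[]): number {
  const desc = description.toLowerCase();

  // Coverage: fraction of trigger prompts sharing at least one keyword (> 3 chars)
  const covered = triggerPrompts.filter((p) =>
    p.toLowerCase().split(/\s+/).some((w) => w.length > 3 && desc.includes(w))
  ).length;
  const coverage = triggerPrompts.length ? covered / triggerPrompts.length : 0;

  // Heuristics mirroring the compliance rubric in this document
  const hasWhen = description.includes("WHEN:") ? 1 : 0;
  const longEnough = description.length > 150 ? 1 : 0;
  const concise = description.trim().split(/\s+/).length <= 60 ? 1 : 0;

  // Illustrative weights; the real evaluator's weighting is unknown
  return 0.5 * coverage + 0.2 * hasWhen + 0.15 * longEnough + 0.15 * concise;
}
```

An LLM-driven optimizer would propose candidate descriptions and keep the ones that raise this score.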
GEPA score-only mode (no LLM calls, just evaluate current quality):

```
Run sensei score my-skill
Run sensei score all skills
```

For each skill, execute this loop until score >= Medium-High AND tests pass:
- READ — Load `plugin/skills/{skill-name}/SKILL.md`, tests, and token count
- VALIDATE — Check `name` per agentskills.io spec (no `--`, no start/end `-`, lowercase alphanumeric; e.g., `azure-prepare`); preserve optional fields (`license`, `metadata`, `allowed-tools`) if present
- SCAFFOLD — If `tests/{skill-name}/` doesn't exist, create from `tests/_template/`
- GEPA (if `--gepa` flag is set) — Replaces step 5 (IMPROVE FRONTMATTER) with automated optimization; step 6 (IMPROVE TESTS) still runs normally:
  - Reads `tests/{skill-name}/triggers.test.ts` and extracts prompt arrays
  - Runs `python .github/skills/sensei/scripts/gepa/auto_evaluator.py optimize --skill {skill-name} --skills-dir plugin/skills --tests-dir tests`
- IMPROVE TESTS — Update `shouldTriggerPrompts` and `shouldNotTriggerPrompts` to match the finalized frontmatter (including any GEPA changes)
- TEST — Run `cd tests && npm test -- --testPathPatterns={skill-name}`
- REFERENCES — Run `cd scripts && npm run references {skill-name}` to check markdown links

Sensei validates skills against the agentskills.io specification. See SCORING.md for full details.
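The name rules above (lowercase alphanumeric, no consecutive hyphens, no leading/trailing hyphen) can be checked with a single regex. This is a sketch of how the agentskills.io rule might translate to code, not the spec's own validator:

```typescript
// Lowercase alphanumeric segments separated by single hyphens;
// rejects leading/trailing and consecutive hyphens, and uppercase.
const VALID_SKILL_NAME = /^[a-z0-9]+(?:-[a-z0-9]+)*$/;

function isValidSkillName(name: string): boolean {
  return VALID_SKILL_NAME.test(name);
}
```

For example, `azure-prepare` passes while `azure--prepare`, `-azure`, and `Azure` all fail.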
| Score | Requirements |
|---|---|
| Invalid | Name fails spec validation (consecutive hyphens, start/end hyphen, uppercase, etc.) |
| Low | Basic description, no explicit triggers |
| Medium | Has trigger keywords/phrases, description > 150 chars, >60 words |
| Medium-High | Has "WHEN:" (preferred) or "USE FOR:" triggers, ≤60 words |
| High | Medium-High + compatibility field |
Target: Medium-High (distinctive triggers, concise description)
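The rubric above can be sketched as a small classifier. This is an illustrative reading of the table, not the actual SCORING.md implementation; the tier boundaries here are assumptions:

```typescript
type Score = "Low" | "Medium" | "Medium-High" | "High";

// Classify a frontmatter description against the compliance tiers.
function scoreDescription(desc: string, hasCompatibility: boolean): Score {
  const words = desc.trim().split(/\s+/).length;
  const hasTriggers = /\bWHEN:|\bUSE FOR:/.test(desc);

  // Medium-High: explicit triggers and a concise (<= 60 word) description;
  // High additionally requires a compatibility field.
  if (hasTriggers && words <= 60) return hasCompatibility ? "High" : "Medium-High";

  // Medium: trigger keywords present and description > 150 chars
  if (desc.length > 150 && /trigger|WHEN:|USE FOR:/i.test(desc)) return "Medium";

  return "Low";
}
```

A name that fails spec validation would be scored "Invalid" before this check runs.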
⚠️ "DO NOT USE FOR:" is risky in multi-skill environments (15+ overlapping skills) — causes keyword contamination on fast-pattern-matching models. Safe for small, isolated skill sets. Use positive routing with `WHEN:` for cross-model safety.

Exception — disambiguation-critical skills: When a skill's `USE FOR` triggers directly overlap with a broader skill (e.g., `azure-prepare` owns "deploy to Azure"), `DO NOT USE FOR:` is REQUIRED to prevent the broader skill from capturing prompts that belong to the specialized skill. Removing it causes routing regressions. Integration tests validate this routing — run them before removing any `DO NOT USE FOR:` clause.
Strongly recommended (reported as suggestions if missing):
- `license` — identifies the license applied to the skill
- `metadata.version` — tracks the skill version for consumers

Per the agentskills.io spec, required and optional fields:
```yaml
---
name: skill-name
description: "[ACTION VERB] [UNIQUE_DOMAIN]. [One clarifying sentence]. WHEN: \"trigger 1\", \"trigger 2\", \"trigger 3\"."
license: MIT
metadata:
  version: "1.0"
# Other optional spec fields — preserve if already present:
# metadata.author: example-org
# allowed-tools: Bash(git:*) Read
---
```

IMPORTANT: Use inline double-quoted strings for descriptions. Do NOT use `>-` folded scalars (incompatible with skills.sh). Do NOT use `|` literal blocks (preserves newlines). Keep the total description under 1024 characters and ≤60 words.
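A minimal check for the description-style rule above might look like the following sketch. The function name is invented for illustration; it only inspects the raw frontmatter text rather than parsing YAML:

```typescript
// Return true only if the description field is an inline double-quoted string
// (rejects >- folded scalars and | literal blocks, per the rule above).
function descriptionStyleOk(frontmatter: string): boolean {
  const line = frontmatter
    .split("\n")
    .find((l) => l.trimStart().startsWith("description:"));
  if (!line) return false;
  const value = line.slice(line.indexOf(":") + 1).trim();
  return value.startsWith('"');
}
```

A `description: >-` or `description: |` line fails this check because the value after the colon is a block-scalar indicator, not a quote.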
⚠️ "DO NOT USE FOR:" carries context-dependent risk. In multi-skill environments (10+ skills with overlapping domains), anti-trigger clauses introduce the very keywords that cause wrong-skill activation on Claude Sonnet and fast-pattern-matching models (evidence). For small, isolated skill sets (1-5 skills), the risk is low. When in doubt, use positive routing with `WHEN:` and distinctive quoted phrases.

Exception: `DO NOT USE FOR:` is REQUIRED when a specialized skill's triggers overlap with a broader skill (e.g., `azure-hosted-copilot-sdk` vs. `azure-prepare` on "deploy to Azure"). Without the negative discriminator, the broader skill captures prompts that should route to the specialized one. Always run integration tests before removing a `DO NOT USE FOR:` clause.
When tests don't exist, scaffold from tests/_template/:
```
cp -r tests/_template tests/{skill-name}
```

Then update:

- `SKILL_NAME` constant in all test files
- `shouldTriggerPrompts` - 5+ prompts matching new frontmatter triggers
- `shouldNotTriggerPrompts` - 5+ prompts matching anti-triggers

Commit Messages: `sensei: improve {skill-name} frontmatter`

- `plugin/skills/` - these are the Azure skills used by Copilot
- `.github/skills/` contains meta-skills like sensei for developer tooling

| Flag | Description |
|---|---|
| `--skip-integration` | Skip integration tests for faster iteration. Only runs unit and trigger tests. |
| `--gepa` | Use GEPA evolutionary optimization instead of template-based improvement. Auto-discovers tests and builds evaluator at runtime. |
⚠️ Skipping integration tests speeds up the loop but may miss runtime issues. Consider running full tests before final commit.
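The prompt arrays updated during test scaffolding might look like the following sketch. The constant names (`SKILL_NAME`, `shouldTriggerPrompts`, `shouldNotTriggerPrompts`) come from this document; the skill name and prompt contents are invented examples:

```typescript
// Hypothetical contents of tests/{skill-name}/triggers.test.ts data:
// 5+ prompts that should route to the skill, 5+ that should not.
const SKILL_NAME = "azure-prepare";

const shouldTriggerPrompts: string[] = [
  "deploy to Azure",
  "prepare my app for Azure",
  "set up Azure hosting",
  "provision Azure resources",
  "get this project running on Azure",
];

const shouldNotTriggerPrompts: string[] = [
  "deploy to AWS",
  "write a unit test for this function",
  "explain this regex",
  "format my markdown file",
  "refactor this helper",
];
```

The trigger-test harness would feed each prompt to the routing model and assert the skill activates only for the first list.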