Curated library of 41 public AI agent skills for Ruby on Rails development. Organized by category: planning, testing, code-quality, ddd, engines, infrastructure, api, patterns, context, and orchestration. Covers code review, architecture, security, testing (RSpec), engines, service objects, DDD patterns, and TDD automation. Repository workflows remain documented in GitHub but are intentionally excluded from the Tessl tile.
A systematic process for improving tessl evaluation scores across all skills in this library.
This guide provides a repeatable workflow for diagnosing and fixing skill evaluation failures. It was developed while optimizing release-engine and is designed to be applicable to any skill in the library.
Full-library eval on claude-sonnet-4-6 (32 scenarios, tile igmarin/rails-agent-skills):
Per the library's eval strategy: a skill that only beats baseline marginally is under-specified. The gap between baseline and with-context is the signal that skills are earning their token cost. Chasing the baseline up means codifying what the model already knows — bloat without signal.
What this means in practice:
| Scenario | Skill | Baseline | With ctx | Lift |
|---|---|---|---|---|
| S32 | plan-tickets | 30% | 100% | +70 |
| S8 | integrate-api-client | 40% | 100% | +60 |
| S24 | generate-tasks | 43% | 100% | +57 |
| S13 | integrate-api-client | 45% | 100% | +55 |
| S4 | refactor-code | 60% | 100% | +40 |
| S14 | create-prd | 62% | 100% | +38 |
| S10 | apply-code-conventions (logging + backtrace) | 65% | 100% | +35 |
| S3 | create-service-object | 71% | 100% | +29 |
| S12 | implement-graphql | 71% | 100% | +29 |
| S27 | write-yard-docs (inline tagged notes) | 76% | 100% | +24 |
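The Lift column is the with-context score minus the baseline score, in percentage points. The sketch below recomputes it for three rows copied from the table and flags a skill whose lift is marginal. The `MARGINAL_LIFT` cutoff of 10 points and the function names are assumptions made for illustration; the library's eval strategy does not state a numeric threshold here.

```python
# (baseline %, with-context %) for three rows copied from the table above.
scores = {
    "S32 plan-tickets": (30, 100),
    "S8 integrate-api-client": (40, 100),
    "S27 write-yard-docs": (76, 100),
}

# Hypothetical cutoff, in percentage points; not taken from the source.
MARGINAL_LIFT = 10


def lift(baseline, with_ctx):
    """Lift in percentage points: with-context score minus baseline score."""
    return with_ctx - baseline


def rank_by_lift(rows):
    """Return (name, lift) pairs sorted from largest to smallest lift."""
    pairs = [(name, lift(b, w)) for name, (b, w) in rows.items()]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


def is_marginal(baseline, with_ctx, cutoff=MARGINAL_LIFT):
    """True when the skill beats baseline by less than the cutoff."""
    return lift(baseline, with_ctx) < cutoff


for name, points in rank_by_lift(scores):
    print(f"{name}: +{points}")
```

A skill for which `is_marginal` returns true is a candidate for being under-specified in the sense described above, whatever cutoff the maintainers choose.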
tessl CLI with eval capabilities
Before running scenario evaluations, check the skill's intrinsic quality score:
tessl skill review --optimize release-engine
This analyzes the skill file itself for:
Example improvement:
This step catches skill-level issues before running expensive scenario evaluations.
Run the evaluation and identify failing criteria:
tessl eval view --last
Document the scores for both scenarios:
Compare the two scenarios to understand the problem type:
| Pattern | Meaning | Solution Approach |
|---|---|---|
| Baseline low, With-context high | Skill provides necessary guidance | Good - skill is valuable |
| Baseline high, With-context low | Context dilution or conflicting signals | Fix: Reduce noise, strengthen signal |
| Both low | Missing or incorrect instructions | Fix: Add explicit requirements |
| Both high | Skill is well-optimized | Document as reference |
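The pattern table can be read as a two-by-two decision over the baseline and with-context scores. A minimal sketch follows, assuming a score counts as "high" at or above 80%. That cutoff and the `classify` function are hypothetical; the guide does not define where "low" ends and "high" begins.

```python
# Hypothetical cutoff (percent) for calling a score "high"; not from the source.
HIGH = 80


def classify(baseline, with_ctx, high=HIGH):
    """Map a (baseline, with-context) score pair to a row of the pattern table."""
    baseline_high = baseline >= high
    with_ctx_high = with_ctx >= high
    if not baseline_high and with_ctx_high:
        return "Skill provides necessary guidance"
    if baseline_high and not with_ctx_high:
        return "Context dilution or conflicting signals"
    if not baseline_high and not with_ctx_high:
        return "Missing or incorrect instructions"
    return "Skill is well-optimized"


print(classify(43, 100))
```

Keeping the cutoff as a parameter makes it easy to test how sensitive the diagnosis is to where the line is drawn.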
For each failing criterion, determine the root cause:
Symptom: Agent doesn't mention required element in output
Fix: Add explicit requirement to Output Style section
Example from release-engine:
## Output Style
When asked to prepare a release, your output MUST include:
5. **Gemspec verification** — Explicitly state that gemspec metadata
(authors, description, files, dependencies) was verified
6. **Test suite status** — Confirm the full test suite passes
(`bundle exec rspec`) before proceeding
Symptom: Scores drop when more files are present
Fix: Implement progressive disclosure
## Extended Resources (Progressive Disclosure)
Load these files only when their specific content is needed:
- **[assets/checklist.md](assets/checklist.md)** — Use when you need
detailed verification steps
- **[references/advanced.md](references/advanced.md)** — Use for edge cases
or complex scenarios
Symptom: Important instruction exists but agent doesn't prioritize it
Fix: Move to prominent position with explicit imperative language
Priority order for maximum score improvement:
1. Output Style section (highest impact for baseline scores)
2. Progressive disclosure using assets/, references/ (highest impact for with-context scores)
3. Quick Reference table (medium impact for both)
4. HARD-GATE section (if applicable)
Run the evaluation again:
tessl eval run --skill release-engine
Verify both scenarios achieve 100% or acceptable threshold.
Iterate if needed: return to Step 1 with new scores.
Problem: Agent doesn't mention verifying gemspec metadata
Solution: Add to Output Style:
5. **Gemspec verification** — Explicitly state that gemspec metadata
(authors, description, files, dependencies) was verified against
tested Rails/Ruby versions
Problem: Agent doesn't confirm test suite passes
Solution: Add to Output Style:
6. **Test suite status** — Confirm the full test suite passes
(`bundle exec rspec`) before proceeding to build
Problem: Agent doesn't produce upgrade instructions for host apps
Solution:
assets/upgrade_template.md for reference
Problem: Agent doesn't mention blockers or state "No blockers"
Solution: Add to Output Style:
7. **Release blockers** — Call out any open issues preventing release,
or explicitly state "No blockers"

| Check | Baseline | With Context |
|---|---|---|
| Gemspec verified | 2/8 (25%) | 5/8 (63%) |
| Test suite mentioned | 7/8 (88%) | 8/8 (100%) |
| Upgrade notes produced | 3/10 (30%) | 10/10 (100%) |
| Total | 86/100 | 97/100 |
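Each cell in the check table pairs a raw count with a whole-number percentage. The sketch below reproduces that conversion, assuming the percentages are rounded half up. The rounding rule and the `pass_rate` helper are assumptions. Python's built-in `round` rounds halves to the nearest even number, so `round(62.5)` gives 62 rather than the 63% shown for 5/8, which is why the helper uses integer arithmetic instead.

```python
def pass_rate(passed, total):
    """Percentage of passing runs, rounded half up to a whole percent.

    Uses integer arithmetic: floor(100 * passed / total + 0.5)
    equals (200 * passed + total) // (2 * total).
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= passed <= total:
        raise ValueError("passed must be between 0 and total")
    return (200 * passed + total) // (2 * total)


# Counts copied from the "Gemspec verified" row above.
print(f"Baseline: {pass_rate(2, 8)}%, with context: {pass_rate(5, 8)}%")
```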
assets/examples.md had circular references and wrong paths
assets/examples.md with clear relative paths and loading instructions

| Check | Baseline | With Context |
|---|---|---|
| Feature branch task 0.0 | 0/10 (0%) | 10/10 (100%) |
| TDD write-spec sub-task | 3/10 (30%) | 10/10 (100%) |
| TDD run-spec-fail sub-task | 0/10 (0%) | 10/10 (100%) |
| TDD run-spec-pass sub-task | 0/8 (0%) | 8/8 (100%) |
| YARD post-implementation gate | 0/10 (0%) | 10/10 (100%) |
| Relevant Files section | 0/8 (0%) | 8/8 (100%) |
| Total | 43/100 | 100/100 |
Added Output Style section with 7 explicit requirements mapping directly to evaluation criteria:
Rewrote frontmatter description to include explicit trigger words:
When you see this pattern, first determine if it's a signal problem or a training knowledge gap:
Signal Problem (fixable):
Training Knowledge Gap (inherent limitation):
For generate-tasks:
Use this structure to maximize eval scores from the start. The same six sections are codified in skill-structure.md — that doc is the canonical SKILL.md shape the reinforcement pass enforces; this template is the fillable version.
---
name: skill-name
description: >
Use when [concrete trigger]. Covers [specific topics].
Trigger words: [symptom], [tool], [scenario].
---
# Skill Title
Use this skill when [brief trigger description].
## Quick Reference
| Aspect | Rule |
|--------|------|
| [Key point] | [Specific rule] |
| [Key point] | [Specific rule] |
## HARD-GATE (if applicable)
DO NOT [forbidden action]. ALWAYS [required action].
## Core Process
1. [Step with specific command or pattern]
2. [Step with specific command or pattern]
3. [Step with specific command or pattern]
## Extended Resources (Progressive Disclosure)
Load these files only when needed:
- **[assets/template.md](assets/template.md)** — Use when [specific condition]
- **[references/advanced.md](references/advanced.md)** — Use when [specific condition]
## Output Style
When asked to [task], your output MUST include:
1. **[Criterion name]** — [Explicit requirement]
2. **[Criterion name]** — [Explicit requirement]
3. **[Criterion name]** — [Explicit requirement]
## Examples
[Minimal working example]
## Integration
| Skill | When to chain |
|-------|---------------|
| [skill-name] | When [condition] |

Use this guide as the starting point for fixing any skill in the library:
# 1. Check skill quality first
tessl skill review --optimize <skill-name>
# 2. Run scenario evaluations
tessl eval run . --variant without-context --variant with-context
# 3. View specific failures
tessl eval view --last
# 4. Follow this guide's workflow (Step 0 → Step 5)
For each skill below 100%:
tessl skill review --optimize
tessl eval view --last
Priority order for optimization:
Document new patterns discovered in the Maintainer Notes section.
Maintainer Notes: