Curated library of 42 public AI agent skills for Ruby on Rails development, plus 5 callable workflow skills. Organized by category: planning, testing, code-quality, ddd, engines, infrastructure, api, patterns, context, orchestration, and workflows. Covers code review, architecture, security, testing (RSpec), engines, service objects, DDD patterns, and TDD automation.
A systematic process for improving tessl evaluation scores across all skills in this library.
This guide provides a repeatable workflow for diagnosing and fixing skill evaluation failures. It was developed while optimizing release-engine and is designed to be applicable to any skill in the library.
Full-library eval on claude-sonnet-4-6 (32 scenarios, tile igmarin/rails-agent-skills):
Per the library's eval strategy: a skill that only beats baseline marginally is under-specified. The gap between baseline and with-context is the signal that skills are earning their token cost. Chasing the baseline up means codifying what the model already knows — bloat without signal.
What this means in practice:
| Scenario | Skill | Baseline | With ctx | Lift |
|---|---|---|---|---|
| S32 | plan-tickets | 30% | 100% | +70 |
| S8 | integrate-api-client | 40% | 100% | +60 |
| S24 | generate-tasks | 43% | 100% | +57 |
| S13 | integrate-api-client | 45% | 100% | +55 |
| S4 | refactor-code | 60% | 100% | +40 |
| S14 | create-prd | 62% | 100% | +38 |
| S10 | apply-code-conventions (logging + backtrace) | 65% | 100% | +35 |
| S3 | create-service-object | 71% | 100% | +29 |
| S12 | implement-graphql | 71% | 100% | +29 |
| S27 | write-yard-docs (inline tagged notes) | 76% | 100% | +24 |
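Per that strategy, lift is simply with-context minus baseline. A minimal sketch for flagging marginal skills (a hypothetical helper, not part of the tessl CLI; the 10-point cutoff is an illustrative assumption):

```ruby
# Hypothetical helper: compute per-scenario lift and flag skills whose
# lift is marginal (under-specified per the library's eval strategy).
Scenario = Struct.new(:id, :skill, :baseline, :with_context) do
  def lift
    with_context - baseline
  end

  # The 10-point cutoff is an illustrative assumption, not a tessl rule.
  def under_specified?(threshold = 10)
    lift < threshold
  end
end

scenarios = [
  Scenario.new("S32", "plan-tickets", 30, 100),
  Scenario.new("S4", "refactor-code", 60, 100),
]

scenarios.sort_by { |s| -s.lift }.each do |s|
  puts "#{s.id} #{s.skill}: +#{s.lift}#{' (marginal)' if s.under_specified?}"
end
```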
Prerequisite: the tessl CLI with eval capabilities.

Before running scenario evaluations, check the skill's intrinsic quality score:

```shell
tessl skill review --optimize release-engine
```

This analyzes the skill file itself for:
Example improvement:
This step catches skill-level issues before running expensive scenario evaluations.
Run the evaluation and identify failing criteria:

```shell
tessl eval run . --variant without-context --variant with-context
tessl eval view --last
```

Document the scores for both scenarios:
Compare the two scenarios to understand the problem type:
| Pattern | Meaning | Solution Approach |
|---|---|---|
| Baseline low, With-context high | Skill provides necessary guidance | Good - skill is valuable |
| Baseline high, With-context low | Context dilution or conflicting signals | Fix: Reduce noise, strengthen signal |
| Both low | Missing or incorrect instructions | Fix: Add explicit requirements |
| Both high | Skill is well-optimized | Document as reference |
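The table above can be expressed as a small classifier. A sketch in Ruby, where the 70-point cutoff for "high" is an assumed threshold, not something the eval tooling defines:

```ruby
# Mirror the diagnosis table: map (baseline, with_context) to a problem type.
# The `high:` cutoff is an illustrative assumption.
def diagnose(baseline, with_context, high: 70)
  case [baseline >= high, with_context >= high]
  in [false, true] then "skill provides necessary guidance (good)"
  in [true, false] then "context dilution or conflicting signals"
  in [false, false] then "missing or incorrect instructions"
  in [true, true] then "skill is well-optimized"
  end
end

diagnose(30, 100) # baseline low, with-context high
```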
For each failing criterion, determine the root cause:
**Symptom:** Agent doesn't mention a required element in its output.
**Fix:** Add an explicit requirement to the Output Style section.
Example from release-engine:

```markdown
## Output Style

When asked to prepare a release, your output MUST include:

5. **Gemspec verification** — Explicitly state that gemspec metadata
   (authors, description, files, dependencies) was verified
6. **Test suite status** — Confirm the full test suite passes
   (`bundle exec rspec`) before proceeding
```

**Symptom:** Scores drop when more files are present.
**Fix:** Implement progressive disclosure.
```markdown
## Extended Resources (Progressive Disclosure)

Load these files only when their specific content is needed:

- **[assets/checklist.md](assets/checklist.md)** — Use when you need
  detailed verification steps
- **[references/advanced.md](references/advanced.md)** — Use for edge cases
  or complex scenarios
```

**Symptom:** An important instruction exists but the agent doesn't prioritize it.
**Fix:** Move it to a prominent position with explicit imperative language.
Priority order for maximum score improvement:

1. Output Style section (highest impact for baseline scores)
2. Progressive disclosure (highest impact for with-context scores): move detail into assets/, references/
3. Quick Reference table (medium impact for both)
4. HARD-GATE section (if applicable)
Run the evaluation again:

```shell
tessl eval run --skill release-engine
```

Verify both scenarios achieve 100% or an acceptable threshold. Iterate if needed: return to Step 1 with the new scores.
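Steps 1 through 5 form a loop. A sketch of the driver, where `run_eval` is a hypothetical callable standing in for invoking the tessl CLI and parsing its scores:

```ruby
# Iterate eval -> diagnose -> fix until both variants clear the threshold.
# `run_eval` is a hypothetical stand-in; editing the skill (Steps 2-3)
# happens in the block between rounds.
def optimize(skill, run_eval:, threshold: 100, max_rounds: 5)
  max_rounds.times do
    scores = run_eval.call(skill) # e.g. { baseline: 86, with_context: 97 }
    return scores if scores.values_at(:baseline, :with_context).all? { |v| v >= threshold }
    yield scores if block_given? # apply fixes, then re-run
  end
  nil # did not converge within max_rounds
end
```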
**Problem:** Agent doesn't mention verifying gemspec metadata.
**Solution:** Add to Output Style:

```markdown
5. **Gemspec verification** — Explicitly state that gemspec metadata
   (authors, description, files, dependencies) was verified against
   tested Rails/Ruby versions
```

**Problem:** Agent doesn't confirm the test suite passes.
**Solution:** Add to Output Style:

```markdown
6. **Test suite status** — Confirm the full test suite passes
   (`bundle exec rspec`) before proceeding to build
```

**Problem:** Agent doesn't produce upgrade instructions for host apps.
**Solution:** Use assets/upgrade_template.md for reference.

**Problem:** Agent doesn't mention blockers or state "no blockers".
**Solution:** Add to Output Style:

```markdown
7. **Release blockers** — Call out any open issues preventing release,
   or explicitly state "No blockers"
```

| Check | Baseline | With Context |
|---|---|---|
| Gemspec verified | 2/8 (25%) | 5/8 (63%) |
| Test suite mentioned | 7/8 (88%) | 8/8 (100%) |
| Upgrade notes produced | 3/10 (30%) | 10/10 (100%) |
| Total | 86/100 | 97/100 |
**Problem:** assets/examples.md had circular references and wrong paths.
**Solution:** Rewrote assets/examples.md with clear relative paths and loading instructions.

| Check | Baseline | With Context |
|---|---|---|
| Feature branch task 0.0 | 0/10 (0%) | 10/10 (100%) |
| TDD write-spec sub-task | 3/10 (30%) | 10/10 (100%) |
| TDD run-spec-fail sub-task | 0/10 (0%) | 10/10 (100%) |
| TDD run-spec-pass sub-task | 0/8 (0%) | 8/8 (100%) |
| YARD post-implementation gate | 0/10 (0%) | 10/10 (100%) |
| Relevant Files section | 0/8 (0%) | 8/8 (100%) |
| Total | 43/100 | 100/100 |
Added Output Style section with 7 explicit requirements mapping directly to evaluation criteria:
Rewrote frontmatter description to include explicit trigger words:
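As a toy illustration (hypothetical wording and matcher, not the actual generate-tasks frontmatter or the real routing mechanism), trigger words give the router literal strings it can match against the user's request:

```ruby
# Hypothetical frontmatter description with explicit trigger words.
description = "Use when breaking a PRD into an implementation task list. " \
              "Trigger words: task list, sub-tasks, generate tasks."

# Pull the comma-separated trigger words out of the description.
trigger_words = description[/Trigger words: (.+)\./, 1].split(", ")

def triggered?(prompt, words)
  words.any? { |w| prompt.downcase.include?(w.downcase) }
end

triggered?("Please generate tasks for this PRD", trigger_words) # => true
```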
When you see this pattern, first determine if it's a signal problem or a training knowledge gap:
Signal Problem (fixable):
Training Knowledge Gap (inherent limitation):
For generate-tasks:
Use this structure to maximize eval scores from the start. The same six sections are codified in skill-structure.md — that doc is the canonical SKILL.md shape the reinforcement pass enforces; this template is the fillable version.
```markdown
---
name: skill-name
description: >
  Use when [concrete trigger]. Covers [specific topics].
  Trigger words: [symptom], [tool], [scenario].
---

# Skill Title

Use this skill when [brief trigger description].

## Quick Reference

| Aspect | Rule |
|--------|------|
| [Key point] | [Specific rule] |
| [Key point] | [Specific rule] |

## HARD-GATE (if applicable)

DO NOT [forbidden action]. ALWAYS [required action].

## Core Process

1. [Step with specific command or pattern]
2. [Step with specific command or pattern]
3. [Step with specific command or pattern]

## Extended Resources (Progressive Disclosure)

Load these files only when needed:

- **[assets/template.md](assets/template.md)** — Use when [specific condition]
- **[references/advanced.md](references/advanced.md)** — Use when [specific condition]

## Output Style

When asked to [task], your output MUST include:

1. **[Criterion name]** — [Explicit requirement]
2. **[Criterion name]** — [Explicit requirement]
3. **[Criterion name]** — [Explicit requirement]

## Examples

[Minimal working example]

## Integration

| Skill | When to chain |
|-------|---------------|
| [skill-name] | When [condition] |
```

Use this guide as the starting point for fixing any skill in the library:
```shell
# 1. Check skill quality first
tessl skill review --optimize <skill-name>

# 2. Run scenario evaluations
tessl eval run . --variant without-context --variant with-context

# 3. View specific failures
tessl eval view --last

# 4. Follow this guide's workflow (Step 0 → Step 5)
```

For each skill below 100%:
1. Run `tessl skill review --optimize`
2. Run `tessl eval view --last`

Priority order for optimization:
Document new patterns discovered in the Maintainer Notes section.
Maintainer Notes:
```
build/
docs/
mcp_server/
skills/
  api/
    generate-api-collection
    implement-graphql
  code-quality/
    apply-code-conventions
    apply-stack-conventions
    assets/
    snippets/
  code-review/
    refactor-code
    respond-to-review
    review-architecture
    security-check
  context/
    load-context
    setup-environment
  ddd/
    define-domain-language
    model-domain
    review-domain-boundaries
  engines/
    create-engine
    create-engine-installer
    document-engine
    extract-engine
    release-engine
    review-engine
    test-engine
    upgrade-engine
  infrastructure/
    implement-background-job
    implement-hotwire
    optimize-performance
    review-migration
    seed-database
    version-api
  orchestration/
    skill-router
  patterns/
    create-service-object
    implement-calculator-pattern
    write-yard-docs
  planning/
    create-prd
    generate-tasks
    plan-tickets
  testing/
    plan-tests
    test-service
    triage-bug
    write-tests
  workflows/
```