Audit and improve skill collections with an 8-dimension scoring framework, duplication detection, remediation planning, and CI quality gates; use when evaluating skill quality, generating remediation plans, validating report format, or enforcing repository-wide skill artifact conventions.
Domain-specific evaluation frameworks for specialized skill quality assessment.
Purpose: Extend the skill-judge framework with custom metrics
Use Cases: Domain-specific requirements, organizational standards
Priority: LOW - Use when the standard metrics are insufficient
Consider custom metrics when:
Each custom metric should define:
For skills handling security-sensitive operations:
## D9: Security Compliance (15 points)
### Purpose
Ensure skills follow security best practices.
### Evaluation Method
1. Check for security warnings
2. Verify no hardcoded secrets
3. Confirm input validation guidance
4. Review authentication patterns
### Scoring Rubric
| Score | Criteria |
|-------|----------|
| 13-15 | All security checks passed |
| 10-12 | Minor issues (warnings) |
| 7-9 | Moderate issues |
| 0-6 | Critical security flaws |

For skills involving UI/UX:

## D10: Accessibility Compliance (10 points)
### Purpose
Ensure skills include accessibility guidance.
### Evaluation Method
1. Check for WCAG references
2. Verify screen reader guidance
3. Confirm keyboard navigation
4. Review color contrast guidance
### Scoring Rubric
| Score | Criteria |
|-------|----------|
| 9-10 | Comprehensive accessibility |
| 7-8 | Basic accessibility |
| 5-6 | Minimal accessibility |
| 0-4 | No accessibility guidance |

For skills involving performance-critical code:

## D11: Performance Guidance (10 points)
### Purpose
Ensure skills include performance considerations.
### Evaluation Method
1. Check for performance warnings
2. Verify benchmark examples
3. Confirm optimization guidance
4. Review anti-patterns
### Scoring Rubric
| Score | Criteria |
|-------|----------|
| 9-10 | Comprehensive performance |
| 7-8 | Good performance guidance |
| 5-6 | Basic performance notes |
| 0-4 | No performance guidance |

```typescript
// evaluate-custom.ts
import { readFileSync } from "node:fs";

interface CustomMetric {
  name: string;
  maxPoints: number;
  evaluate(skillPath: string): number;
}

interface ScoreReport {
  standard: number;
  custom: { name: string; score: number; max: number }[];
  total: number;
  maxTotal: number;
}

// Assumption: the standard evaluator (D1-D8, 120 points max) is defined
// elsewhere and returns a numeric score.
declare function evaluateStandard(skillPath: string): number;

// Read the skill file as UTF-8 text.
const readFile = (path: string): string => readFileSync(path, "utf8");

const customMetrics: CustomMetric[] = [
  {
    name: "Security Compliance",
    maxPoints: 15,
    evaluate: (skillPath) => {
      // Custom evaluation logic: case-sensitive keyword checks, 5 points each
      const content = readFile(skillPath);
      let score = 0;
      if (content.includes("NEVER hardcode")) score += 5;
      if (content.includes("input validation")) score += 5;
      if (content.includes("authentication")) score += 5;
      return score;
    }
  },
  {
    name: "Accessibility",
    maxPoints: 10,
    evaluate: (skillPath) => {
      // Keyword checks weighted to sum to maxPoints (3 + 3 + 2 + 2)
      const content = readFile(skillPath);
      let score = 0;
      if (content.includes("WCAG")) score += 3;
      if (content.includes("screen reader")) score += 3;
      if (content.includes("keyboard")) score += 2;
      if (content.includes("contrast")) score += 2;
      return score;
    }
  }
];

function evaluateWithCustom(skillPath: string): ScoreReport {
  // Standard evaluation
  const standardScore = evaluateStandard(skillPath);

  // Custom metrics
  const customScores = customMetrics.map(m => ({
    name: m.name,
    score: m.evaluate(skillPath),
    max: m.maxPoints
  }));

  return {
    standard: standardScore,
    custom: customScores,
    total: standardScore + customScores.reduce((a, b) => a + b.score, 0),
    // 120 is the maximum for the standard dimensions
    maxTotal: 120 + customScores.reduce((a, b) => a + b.max, 0)
  };
}
```

```yaml
# custom-metrics.yaml
metrics:
  - name: security-compliance
    enabled: true
    maxPoints: 15
    appliesTo:
      - "security-*"
      - "auth-*"
      - "*-api"
  - name: accessibility
    enabled: true
    maxPoints: 10
    appliesTo:
      - "ui-*"
      - "frontend-*"
      - "design-*"
  - name: performance
    enabled: true
    maxPoints: 10
    appliesTo:
      - "*-optimization"
      - "*-performance"
      - "database-*"
```

For organizations with regulatory requirements:

## Enterprise Compliance Metrics
### D12: Documentation Standards (10 points)
**Applies to**: All skills
**Requirements**:
- [ ] Version history maintained
- [ ] Owner/contact documented
- [ ] Review date specified
- [ ] Approval workflow followed
**Scoring**:
- 10: All compliance requirements met
- 7: Minor documentation gaps
- 4: Significant gaps
- 0: Non-compliant

## D13: HIPAA Compliance (15 points)
**Applies to**: Skills handling PHI
**Requirements**:
- PHI handling guidance
- Encryption requirements
- Audit logging patterns
- Access control guidance

## D14: PCI-DSS Compliance (15 points)
**Applies to**: Skills handling payment data
**Requirements**:
- Card data handling
- Encryption standards
- Secure transmission
- Storage requirements

With custom metrics, adjust grade scale:

| Custom Points | New Total | A-Grade Threshold |
|---|---|---|
| +15 | 135 | 122 (90%) |
| +25 | 145 | 131 (90%) |
| +35 | 155 | 140 (90%) |
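
The rows above follow from simple arithmetic: the new total is the 120 standard points plus the enabled custom points, and the A-grade threshold is 90% of that total. The sketch below is a minimal illustration, assuming the threshold is rounded up to the next whole point (an assumption that happens to reproduce all three rows). `adjustedGradeScale` and `GradeScale` are hypothetical names, not part of the skill-judge framework.

```typescript
// Hypothetical helper: the names and the round-up rule are assumptions.
const STANDARD_MAX = 120; // maximum for the standard dimensions D1-D8
const A_GRADE_PERCENT = 90; // A-grade cutoff shown in the table

interface GradeScale {
  newTotal: number;
  aGradeThreshold: number;
}

function adjustedGradeScale(customPoints: number): GradeScale {
  const newTotal = STANDARD_MAX + customPoints;
  // Multiply before dividing so exact multiples are not pushed up by
  // floating-point drift; Math.ceil rounds 121.5 up to 122.
  const aGradeThreshold = Math.ceil((newTotal * A_GRADE_PERCENT) / 100);
  return { newTotal, aGradeThreshold };
}

for (const customPoints of [15, 25, 35]) {
  const { newTotal, aGradeThreshold } = adjustedGradeScale(customPoints);
  console.log(`+${customPoints} -> ${newTotal} total, A at ${aGradeThreshold}`);
}
```

If the framework's scoring rubric defines a different rounding rule, only the `Math.ceil` line would need to change.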
## Skill Evaluation Report
### Standard Dimensions (120 points)
| Dimension | Score | Max |
|-----------|-------|-----|
| D1: Knowledge Delta | 18 | 20 |
| D2: Mindset | 14 | 15 |
| ... | ... | ... |
| D8: Usability | 14 | 15 |
| **Subtotal** | **102** | **120** |
### Custom Dimensions (25 points)
| Dimension | Score | Max |
|-----------|-------|-----|
| D9: Security | 13 | 15 |
| D10: Accessibility | 8 | 10 |
| **Subtotal** | **21** | **25** |
### Total
| Metric | Value |
|--------|-------|
| Total Score | 123/145 (85%) |
| Grade | B+ |
| Standard Grade | A- |

## Custom Metrics Review - Q1 2026
### Active Metrics
| Metric | Skills Applied | Avg Score | Keep? |
|--------|----------------|-----------|-------|
| Security | 12 | 12/15 | Yes |
| Accessibility | 8 | 7/10 | Yes |
| Performance | 5 | 6/10 | Review |
### Recommendations
- Performance metric needs refinement
- Consider adding API documentation metric
- Remove rarely-used metrics

When a metric is no longer needed:

- framework-skill-judge-dimensions.md - Standard dimensions
- framework-scoring-rubric.md - Scoring methodology
- advanced-trends-analysis.md - Tracking custom metrics