CtrlK
BlogDocsLog inGet started
Tessl Logo

pantheon-ai/skill-quality-auditor

Audit and improve skill collections with an 8-dimension scoring framework, duplication detection, remediation planning, and CI quality gates; use when evaluating skill quality, generating remediation plans, validating report format, or enforcing repository-wide skill artifact conventions.

Does it follow best practices?

Evaluation93%

1.33x

Agent success when using this tile

Validation for skill structure

Overview
Skills
Evals
Files

pattern-recognition-pipeline.shscripts/

#!/usr/bin/env sh
# Comprehensive Pattern Recognition Pipeline
# Runs all advanced pattern recognition algorithms and generates unified report

set -eu

SKILLS_DIR="${1:-skills}"
OUTPUT_DIR=".context/analysis"
FINAL_REPORT="$OUTPUT_DIR/comprehensive-pattern-analysis-$(date +%Y-%m-%d).md"
TMP_DIR=$(mktemp -d)
trap 'rm -rf "$TMP_DIR"' EXIT INT TERM

echo "🚀 Comprehensive Pattern Recognition Pipeline Starting..."

# Ensure output directory exists
mkdir -p "$OUTPUT_DIR"

# Initialize final report
cat > "$FINAL_REPORT" << 'EOF'
# Comprehensive Pattern Recognition Analysis

## Executive Summary

This report combines **three advanced pattern recognition algorithms** to provide comprehensive quality and duplication analysis:

1. **Enhanced Duplication Detection** - Multi-metric similarity analysis
2. **Semantic Similarity Engine** - NLP-inspired content analysis  
3. **ML Quality Prediction** - Machine learning-based quality assessment

---

EOF

echo "📊 Running Enhanced Duplication Detection..."
if [ -x "skills/skill-quality-auditor/scripts/detect-duplication-enhanced.sh" ]; then
    # Run enhanced duplication detection
    ./skills/skill-quality-auditor/scripts/detect-duplication-enhanced.sh "$SKILLS_DIR" > "$TMP_DIR/duplication-report.md" 2>/dev/null || {
        echo "Warning: Enhanced duplication detection failed, using basic version"
        [ -x "skills/skill-quality-auditor/scripts/detect-duplication.sh" ] && \
            ./skills/skill-quality-auditor/scripts/detect-duplication.sh "$SKILLS_DIR" > "$TMP_DIR/duplication-report.md"
    }
else
    echo "Warning: Enhanced duplication detection script not found"
    echo "## Duplication Analysis\n\n❌ Enhanced duplication script not available\n" > "$TMP_DIR/duplication-report.md"
fi

echo "🧠 Running Semantic Similarity Analysis..."
if [ -x "skills/skill-quality-auditor/scripts/semantic-analysis.sh" ]; then
    ./skills/skill-quality-auditor/scripts/semantic-analysis.sh "$SKILLS_DIR" > "$TMP_DIR/semantic-report.md" 2>/dev/null || {
        echo "Warning: Semantic analysis failed"
        echo "## Semantic Analysis\n\n❌ Semantic analysis script failed\n" > "$TMP_DIR/semantic-report.md"
    }
else
    echo "Warning: Semantic analysis script not found"
    echo "## Semantic Analysis\n\n❌ Semantic analysis script not available\n" > "$TMP_DIR/semantic-report.md"
fi

echo "🤖 Running ML Quality Prediction..."
if [ -x "skills/skill-quality-auditor/scripts/ml-pattern-detection.sh" ]; then
    ./skills/skill-quality-auditor/scripts/ml-pattern-detection.sh "$SKILLS_DIR" > "$TMP_DIR/ml-report.md" 2>/dev/null || {
        echo "Warning: ML pattern detection failed"
        echo "## ML Quality Analysis\n\n❌ ML pattern detection script failed\n" > "$TMP_DIR/ml-report.md"
    }
else
    echo "Warning: ML pattern detection script not found"
    echo "## ML Quality Analysis\n\n❌ ML pattern detection script not available\n" > "$TMP_DIR/ml-report.md"
fi

echo "📋 Consolidating Analysis Results..."

# Extract key metrics from each analysis
DUPLICATION_PAIRS=$(grep -o 'Found [0-9]* duplicate pairs' "$TMP_DIR/duplication-report.md" 2>/dev/null | grep -o '[0-9]*' || echo "0")
CRITICAL_PAIRS=$(grep -o '([0-9]* critical)' "$TMP_DIR/duplication-report.md" 2>/dev/null | grep -o '[0-9]*' || echo "0")
SEMANTIC_PAIRS=$(grep -o '[0-9]* semantically similar pairs' "$TMP_DIR/semantic-report.md" 2>/dev/null | grep -o '[0-9]*' || echo "0")
AVG_QUALITY=$(grep -o 'Average predicted quality.*: [0-9]*%' "$TMP_DIR/ml-report.md" 2>/dev/null | grep -o '[0-9]*' || echo "0")
HIGH_QUALITY=$(grep -o '[0-9]* skills.*≥85%' "$TMP_DIR/ml-report.md" 2>/dev/null | grep -o '^[0-9]*' || echo "0")

# Count total skills analyzed
SKILL_COUNT=$(find "$SKILLS_DIR" -name "SKILL.md" -not -path "*/.deprecated/*" | wc -l)

# Calculate composite health score
if [ "$SKILL_COUNT" -gt 0 ]; then
    # Duplication penalty: -2 points per critical pair, -1 per moderate pair
    DUPLICATION_PENALTY=$(( (CRITICAL_PAIRS * 2) + ((DUPLICATION_PAIRS - CRITICAL_PAIRS) * 1) ))
    
    # Semantic clustering bonus: +1 point for good organization (low similarity)
    SEMANTIC_BONUS=$([ "$SEMANTIC_PAIRS" -lt "$((SKILL_COUNT / 4))" ] && echo "5" || echo "0")
    
    # Quality foundation: Use ML average as base
    COMPOSITE_SCORE=$(( AVG_QUALITY - DUPLICATION_PENALTY + SEMANTIC_BONUS ))
    COMPOSITE_SCORE=$([ "$COMPOSITE_SCORE" -lt 0 ] && echo "0" || echo "$COMPOSITE_SCORE")
    COMPOSITE_SCORE=$([ "$COMPOSITE_SCORE" -gt 100 ] && echo "100" || echo "$COMPOSITE_SCORE")
else
    COMPOSITE_SCORE=0
fi

# Determine overall repository grade
if [ "$COMPOSITE_SCORE" -ge 90 ]; then
    REPO_GRADE="A+ (Exceptional)"
    GRADE_COLOR="🟢"
elif [ "$COMPOSITE_SCORE" -ge 80 ]; then
    REPO_GRADE="A (Excellent)" 
    GRADE_COLOR="🟢"
elif [ "$COMPOSITE_SCORE" -ge 70 ]; then
    REPO_GRADE="B (Good)"
    GRADE_COLOR="🟡"
elif [ "$COMPOSITE_SCORE" -ge 60 ]; then
    REPO_GRADE="C (Fair)"
    GRADE_COLOR="🟠"
else
    REPO_GRADE="D (Needs Work)"
    GRADE_COLOR="🔴"
fi

# Add consolidated summary to final report
cat >> "$FINAL_REPORT" << EOF
## 📈 Consolidated Metrics

### Repository Health Score: ${GRADE_COLOR} ${COMPOSITE_SCORE}% - ${REPO_GRADE}

| Metric | Value | Impact |
|--------|-------|--------|
| **Total Skills** | $SKILL_COUNT | Repository size |
| **Duplicate Pairs** | $DUPLICATION_PAIRS ($CRITICAL_PAIRS critical) | -${DUPLICATION_PENALTY} points |
| **Semantic Similarity** | $SEMANTIC_PAIRS pairs | Organization quality |  
| **Average ML Quality** | ${AVG_QUALITY}% | Foundation score |
| **High-Quality Skills** | $HIGH_QUALITY | Quality distribution |

### Key Insights

$(
    if [ "$CRITICAL_PAIRS" -gt 0 ]; then
        echo "🚨 **Critical Issue**: $CRITICAL_PAIRS skill pairs need immediate aggregation"
    elif [ "$DUPLICATION_PAIRS" -gt 5 ]; then
        echo "⚠️ **Moderate Issue**: $DUPLICATION_PAIRS duplicate pairs found - consider consolidation"
    else
        echo "✅ **Good Organization**: Low duplication detected"
    fi
)

$(
    if [ "$AVG_QUALITY" -ge 80 ]; then
        echo "🌟 **Quality Excellence**: ${AVG_QUALITY}% average quality indicates mature skill repository"
    elif [ "$AVG_QUALITY" -ge 70 ]; then
        echo "📈 **Good Quality**: ${AVG_QUALITY}% average with room for improvement"
    else
        echo "🔧 **Quality Focus Needed**: ${AVG_QUALITY}% average requires attention"
    fi
)

$(
    semantic_ratio=$((SEMANTIC_PAIRS * 100 / SKILL_COUNT))
    if [ "$semantic_ratio" -lt 20 ]; then
        echo "🎯 **Well-Organized**: Low semantic overlap indicates clear skill boundaries"
    else
        echo "🔍 **Organization Review**: High semantic overlap suggests consolidation opportunities"
    fi
)

---

EOF

# Append individual analysis reports with section headers
{
    echo "## 🔍 Enhanced Duplication Analysis"
    echo ""
    tail -n +2 "$TMP_DIR/duplication-report.md" 2>/dev/null || echo "Analysis not available"
    echo ""
    echo "---"
    echo ""
    echo "## 🧠 Semantic Similarity Analysis"
    echo ""
    tail -n +2 "$TMP_DIR/semantic-report.md" 2>/dev/null || echo "Analysis not available"
    echo ""
    echo "---"
    echo ""
    echo "## 🤖 ML Quality Prediction Analysis"  
    echo ""
    tail -n +2 "$TMP_DIR/ml-report.md" 2>/dev/null || echo "Analysis not available"
    echo ""
} >> "$FINAL_REPORT"

# Add integrated recommendations
cat >> "$FINAL_REPORT" << EOF
---

## 🎯 Integrated Action Plan

### Immediate Actions (Next 7 Days)
$(
    if [ "$CRITICAL_PAIRS" -gt 0 ]; then
        echo "1. 🚨 **Emergency Aggregation**: Address $CRITICAL_PAIRS critical duplicate pairs"
        echo "2. 🔧 **Apply Navigation Hub Pattern**: Use aggregation-pattern.md methodology"  
        echo "3. 📊 **Quality Validation**: Re-run analysis post-aggregation"
    else
        echo "1. ✅ **Maintain Quality**: Continue current practices"
        echo "2. 📈 **Optimization**: Focus on skills scoring <${AVG_QUALITY}%"
        echo "3. 🔍 **Monitoring**: Set up regular pattern analysis (monthly)"
    fi
)

### Short-term Goals (Next 30 Days)
1. 📚 **Documentation Enhancement**: Improve skills scoring <70% using ML recommendations
2. 🧠 **Semantic Organization**: Address moderate similarity pairs for better boundaries
3. 🔄 **Process Integration**: Incorporate pattern analysis into skill review workflow

### Long-term Strategy (Next Quarter)
1. 🤖 **ML Enhancement**: Implement real machine learning on historical skill-judge data
2. 📊 **Continuous Monitoring**: Automated quality drift detection
3. 🎯 **Predictive Quality**: ML-guided skill creation and improvement

## 🛠️ Implementation Commands

\`\`\`bash
# Run complete analysis pipeline
./skills/skill-quality-auditor/scripts/pattern-recognition-pipeline.sh skills/

# Address critical duplications
./skills/skill-quality-auditor/scripts/generate-remediation-plan.sh --critical-only

# ML-guided improvement for low-scoring skills  
./skills/skill-quality-auditor/scripts/auto-improve-quality.sh --ml-threshold=70

# Schedule regular monitoring
echo "0 9 * * Mon ./skills/skill-quality-auditor/scripts/pattern-recognition-pipeline.sh skills/" >> cron.txt
\`\`\`

## 📚 References

- **Enhanced Duplication**: \`detect-duplication-enhanced.sh\` - Multi-metric similarity analysis
- **Semantic Analysis**: \`semantic-analysis.sh\` - NLP-inspired content analysis
- **ML Quality**: \`ml-pattern-detection.sh\` - Machine learning quality prediction
- **Remediation**: \`aggregation-pattern.md\` - Systematic skill consolidation methodology
- **Implementation**: \`duplication-remediation.md\` - Step-by-step improvement guide

---

*Comprehensive Pattern Recognition Pipeline v1.0*  
*Generated: $(date)*  
*Algorithms: Enhanced Duplication + Semantic Similarity + ML Quality Prediction*
EOF

echo "✅ Comprehensive Pattern Recognition Analysis Complete!"
echo ""
echo "📊 **Repository Health**: ${GRADE_COLOR} ${COMPOSITE_SCORE}% - ${REPO_GRADE}"
echo "🔍 **Duplicate Pairs**: $DUPLICATION_PAIRS ($CRITICAL_PAIRS critical)"  
echo "🧠 **Semantic Pairs**: $SEMANTIC_PAIRS"
echo "🤖 **Average Quality**: ${AVG_QUALITY}%"
echo ""
echo "📋 **Full Report**: $FINAL_REPORT"
echo "📁 **Analysis Files**: $OUTPUT_DIR/"
echo ""

# Display key recommendations
echo "🎯 **Next Steps**:"
if [ "$CRITICAL_PAIRS" -gt 0 ]; then
    echo "   1. 🚨 Address $CRITICAL_PAIRS critical duplicate pairs immediately"
    echo "   2. 🔧 Use aggregation-pattern.md for systematic consolidation"
elif [ "$DUPLICATION_PAIRS" -gt 3 ]; then
    echo "   1. ⚠️ Review $DUPLICATION_PAIRS duplicate pairs for consolidation"  
    echo "   2. 📊 Focus on skills scoring <${AVG_QUALITY}% for quality improvement"
else
    echo "   1. ✅ Repository is well-organized, continue monitoring"
    echo "   2. 📈 Optimize ${HIGH_QUALITY}/$SKILL_COUNT high-quality skills as templates"
fi

echo ""
echo "Run \`cat $FINAL_REPORT | head -50\` to see detailed analysis"

Install with Tessl CLI

npx tessl i pantheon-ai/skill-quality-auditor@0.1.4

SKILL.md

tile.json