sre-engineer

tessl i github:jeffallan/claude-skills --skill sre-engineer

Use when defining SLIs/SLOs, managing error budgets, or building reliable systems at scale. Invoke for incident management, chaos engineering, toil reduction, capacity planning.

52%

Overall

Validation — 75%

Implementation — 42%

Activation — 0%

Validation

75%

Warnings & errors only

Criteria	Description	Result
metadata_version	'metadata' field is not a dictionary	Warning
license_field	'license' field is missing	Warning
frontmatter_unknown_keys	Unknown frontmatter key(s) found; consider removing or moving to metadata	Warning
body_examples	No examples detected (no code fences and no 'Example' wording)	Warning

	Total	12 / 16 Passed

Implementation

42%

This skill has good structural organization with clear progressive disclosure through reference files, but it critically lacks actionable content. It reads more like a job description than executable guidance: it tells Claude what an SRE does rather than showing how to do it with concrete examples, code snippets, or specific calculations.

Suggestions

Add executable examples: include a sample SLO calculation (e.g., 'Error budget = 1 - 0.999 = 0.1% = 43.2 minutes per 30-day month'), a Prometheus alerting rule snippet, or a Python script for toil automation.

Remove the role-playing framing ('You are a senior SRE with 10+ years...'). Claude doesn't need persona instructions, just actionable guidance.

Add validation checkpoints to the workflow: e.g., 'After defining SLOs, verify with stakeholders that targets reflect user expectations' or 'Test alerts in staging before production deployment'

Replace the 'Knowledge Reference' section with actual reference content, or remove it. Listing topics Claude already knows wastes tokens.
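
The error-budget arithmetic quoted in the first suggestion can be sketched in a few lines of Python. This is a minimal illustration, not content from the skill itself: the function names `error_budget_minutes` and `budget_remaining` are hypothetical, and the 30-day window is an assumption that matches the 43.2 minutes/month figure.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Allowed downtime in minutes for an availability SLO over the window.

    slo_target is a fraction, e.g. 0.999 for "three nines".
    window_days defaults to 30 (an assumption; use your own SLO window).
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    window_minutes = window_days * 24 * 60  # 43,200 minutes in 30 days
    return (1.0 - slo_target) * window_minutes


def budget_remaining(slo_target: float, downtime_minutes: float,
                     window_days: int = 30) -> float:
    """Fraction of the error budget still unspent (negative if overspent)."""
    budget = error_budget_minutes(slo_target, window_days)
    return 1.0 - downtime_minutes / budget


if __name__ == "__main__":
    budget = error_budget_minutes(0.999)
    print(f"Error budget: {budget:.1f} minutes per 30 days")
    print(f"Remaining after 10.8 min of downtime: {budget_remaining(0.999, 10.8):.0%}")
```

A snippet of this kind would give the skill one concrete, checkable calculation; a negative return value from `budget_remaining` signals that the budget has been exceeded.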

Dimension	Reasoning	Score
Conciseness	The skill includes some unnecessary framing ('You are a senior SRE with 10+ years of experience') and explains concepts Claude already knows (what golden signals are, what SRE practices involve). The reference table and constraints are reasonably efficient, but the role definition and knowledge reference sections add padding.	2 / 3
Actionability	The skill provides no executable code, commands, or concrete examples. It describes what to do ('Define quantitative SLOs', 'Monitor golden signals') but never shows how with actual Prometheus configs, SLO calculations, or automation scripts. The output templates promise deliverables but don't demonstrate them.	1 / 3
Workflow Clarity	The core workflow lists 5 steps in sequence, but lacks validation checkpoints or feedback loops. For SRE work involving production systems and incident management, there's no guidance on when to verify that SLO calculations are correct, how to validate alerting before deployment, or which error recovery steps to take.	2 / 3
Progressive Disclosure	The skill effectively uses a reference table pointing to specific topic files (slo-sli-management.md, error-budget-policy.md, etc.) with clear 'Load When' guidance. References are one level deep and well-organized for discovery.	3 / 3
	Total	8 / 12 Passed

Activation

N/A

Reviewed

18 days ago
