Operate the joelclaw Kubernetes cluster — Talos Linux on Colima (Mac Mini). Deploy services, check health, debug pods, recover from restarts, add ports, manage Helm releases, inspect logs, fix networking. Triggers on: 'kubectl', 'pods', 'deploy to k8s', 'cluster health', 'restart pod', 'helm install', 'talosctl', 'colima', 'nodeport', 'flannel', 'port mapping', 'k8s down', 'cluster not working', 'add a port', 'PVC', 'storage', any k8s/Talos/Colima infrastructure task. Also triggers on service-specific deploy: 'deploy redis', 'redeploy inngest', 'livekit helm', 'pds not responding'.
90
88%: Does it follow best practices?
Impact: Pending. No eval scenarios have been run.
Advisory: Suggest reviewing before use.
Quality
Discovery
100%: Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is an excellent skill description that clearly defines a specific infrastructure domain (a named Kubernetes cluster with a particular tech stack), enumerates concrete actions comprehensively, and provides an extensive explicit trigger list covering CLI commands, user problem statements, and service-specific operations. The description is well-structured, uses third person voice, and would allow Claude to confidently select this skill from a large pool without ambiguity.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Lists multiple specific concrete actions: deploy services, check health, debug pods, recover from restarts, add ports, manage Helm releases, inspect logs, fix networking. Very comprehensive enumeration of capabilities. | 3 / 3 |
| Completeness | Clearly answers both 'what' (operate the joelclaw Kubernetes cluster, deploy services, debug pods, etc.) and 'when' (explicit 'Triggers on:' clause with extensive list of trigger terms and scenarios). The trigger guidance is explicit and thorough. | 3 / 3 |
| Trigger Term Quality | Excellent coverage of natural trigger terms including CLI tools ('kubectl', 'talosctl', 'helm install'), user-phrased problems ('k8s down', 'cluster not working', 'pds not responding'), specific concepts ('nodeport', 'flannel', 'PVC'), and service-specific deploys ('deploy redis', 'redeploy inngest'). Covers many natural variations users would actually say. | 3 / 3 |
| Distinctiveness / Conflict Risk | Highly distinctive: scoped to a specific named cluster ('joelclaw'), a specific tech stack (Talos Linux on Colima on Mac Mini), and includes very specific trigger terms like 'talosctl', 'colima', 'flannel', and service-specific deploys. Unlikely to conflict with generic DevOps or coding skills. | 3 / 3 |
| Total | | 12 / 12 Passed |
Implementation
77%: Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a highly actionable and operationally thorough cluster operations skill with excellent concrete commands, clear recovery workflows, and good validation checkpoints. Its main weakness is length — several sections (Agent Runner, NAS NFS, AIStor details) could be offloaded to reference files to keep the main skill leaner. The content is clearly written by someone with deep operational experience and captures hard-won incident knowledge effectively.
Suggestions
- Move the Agent Runner section (~80 lines) to a separate reference file such as references/agent-runner.md, keeping only a 2-3 line summary with a link in the main skill.
- Move the NAS NFS Access section to references/nas-nfs.md, retaining just the PV table and a link for mounting details and troubleshooting.
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is dense with operational knowledge specific to this cluster that Claude wouldn't know, but it's quite long (~400+ lines) and includes some sections that could be offloaded to reference files (e.g., Agent Runner details, NAS NFS setup). The incident dates and ADR references earn their place, but the Agent Runner section is particularly verbose for a cluster operations skill. | 2 / 3 |
| Actionability | Excellent actionability throughout: concrete bash commands, kubectl invocations, YAML manifests, curl health checks, and specific file paths. Every section provides copy-paste ready commands with exact flags, ports, and paths. The Redis AOF recovery, kubeconfig fix, and deploy commands are all fully executable. | 3 / 3 |
| Workflow Clarity | Multi-step processes include clear sequencing with validation checkpoints. The Redis AOF recovery has numbered steps with wait/verify/cleanup. The kubeconfig port drift fix has a clear 3-step sequence. Recovery from Colima crash-loop has ordered steps. Resilience rules explicitly call out verification steps after restarts (uncordon, verify flannel, check pods). | 3 / 3 |
| Progressive Disclosure | The skill does reference operations.md for detailed recovery procedures and port mappings, which is good progressive disclosure. However, the main file is very long and includes substantial inline content (Agent Runner section, NAS NFS section) that could be split into separate reference files. The Agent Runner section alone is ~80 lines that could be a reference. | 2 / 3 |
| Total | | 10 / 12 Passed |
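The Workflow Clarity row mentions verification steps after a restart (uncordon, verify flannel, check pods). The skill's actual commands are not reproduced in this review, so the following is only a minimal sketch of that ordering. The node name, the assumption that flannel pods live in `kube-system`, and the helper names `post_restart_steps`, `run_steps`, and `dry_run` are all hypothetical.

```python
import subprocess
from typing import Callable, List, Sequence, Tuple

Step = Tuple[str, List[str]]
Runner = Callable[[Sequence[str]], int]


def post_restart_steps(node: str) -> List[Step]:
    """Ordered (label, command) pairs; the node name is supplied by the caller."""
    return [
        ("uncordon node", ["kubectl", "uncordon", node]),
        # Assumption: flannel pods run in kube-system. This only lists pods;
        # a human (or extra parsing) still has to confirm they are Running.
        ("verify flannel", ["kubectl", "get", "pods", "-n", "kube-system", "-o", "wide"]),
        ("check pods", ["kubectl", "get", "pods", "--all-namespaces"]),
    ]


def run_steps(steps: List[Step], runner: Runner) -> List[str]:
    """Run steps in order and stop at the first non-zero exit code."""
    completed = []
    for label, cmd in steps:
        if runner(cmd) != 0:
            break
        completed.append(label)
    return completed


def dry_run(cmd: Sequence[str]) -> int:
    """Print the command instead of executing it."""
    print("would run:", " ".join(cmd))
    return 0


def real_run(cmd: Sequence[str]) -> int:
    """Execute the command and return its exit code."""
    return subprocess.run(list(cmd)).returncode


if __name__ == "__main__":
    # "example-node" is a placeholder, not the cluster's real node name.
    run_steps(post_restart_steps("example-node"), dry_run)
```

Passing the runner as a parameter keeps the ordering testable without a live cluster; swapping `dry_run` for `real_run` would execute the commands.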
Validation
90%: Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation: 10 / 11 Passed
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total | 10 / 11 Passed | |
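The `frontmatter_unknown_keys` warning can be reproduced locally with a small check. This is a sketch under stated assumptions, not the validator's real logic: the allow-list below is hypothetical (this review does not say which keys the spec accepts), and the parser handles only simple top-level `key: value` lines rather than full YAML.

```python
from typing import List, Set

# Hypothetical allow-list; the validator's accepted keys are not stated in this review.
ALLOWED_KEYS = {"name", "description", "license", "metadata", "allowed-tools"}


def frontmatter_keys(text: str) -> List[str]:
    """Return top-level keys from a leading '---' delimited frontmatter block."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return []
    keys = []
    for line in lines[1:]:
        if line.strip() == "---":
            return keys
        # Top-level keys start in column 0; indented lines belong to a nested value.
        if line and not line[0].isspace() and not line.startswith("#") and ":" in line:
            keys.append(line.split(":", 1)[0].strip())
    return []  # No closing delimiter: treat as having no valid frontmatter.


def unknown_keys(text: str, allowed: Set[str] = ALLOWED_KEYS) -> List[str]:
    """List frontmatter keys that are not in the allow-list, in file order."""
    return [key for key in frontmatter_keys(text) if key not in allowed]


if __name__ == "__main__":
    # Illustrative SKILL.md content; "displayName" stands in for an unknown key.
    sample = """---
name: k8s
description: Operate the cluster.
displayName: Cluster Ops
metadata:
  owner: example
---
# Body
"""
    print(unknown_keys(sample))
```

Keys nested under `metadata` are ignored by the check, which matches the warning's advice to move unknown keys there.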
825972c