nitinjain999/platform-skills

Production-grade platform engineering handbook — Kubernetes, Terraform, Flux CD, GitHub Actions, AWS, and more.

Quality

84%

Does it follow best practices?

Impact

—

No eval scenarios have been run

Securityby

Passed

No known issues

Changelog

Name: nitinjain999/platform-skills
Rating: 67.2 (1 reviews)
Author: nitinjain999

All notable changes to Platform Skills will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

[1.28.0] - 2026-05-30

Added

Codex skill support: added agents/openai.yaml so Codex can display and invoke platform-skills with a default $platform-skills prompt.
Cursor support refresh: updated .cursorrules and .cursor/rules/*.mdc to the current release and documented project/global Cursor installation.
Adoption polish: added install.sh, a README install matrix, and copy-ready prompts for platform, DevOps, cloud, and SRE teams evaluating the skill.
Copilot instructions refresh: documented the installer path, added Copilot-specific response behavior, and validated referenced handbook files.
Community growth assets: added PROMPTS.md plus issue templates for agent/editor support, domain guide requests, and example contributions.
Codex and Cursor installation paths in README, Quick Start, Installation, Getting Started, How It Works, and Editor Integrations docs. The supported Codex install model is to clone the repository as ${CODEX_HOME:-$HOME/.codex}/skills/platform-skills so SKILL.md, references/, examples/, and agents/openai.yaml stay together.
Release validation now checks Codex skill metadata and Cursor rule files in addition to Claude plugin structure.

[1.27.0] - 2026-05-28

Added

commands/renovate.md: new /platform-skills:renovate command with four modes — generate (scan repo for dep file types, emit renovate.json covering only detected ecosystems with per-manager automerge/schedule/grouping rules and minimumReleaseAge: "3 days"), workflow (emit .github/workflows/validate-renovate.yml for schema + config-validator + coverage validation on PRs that touch renovate.json), precommit (interactive semver/automerge wizard and best-practices adviser), and all (run all modes in sequence).
references/renovate.md: 11-section reference — manager catalog, preset reference, package rules patterns, dependency dashboard, GitOps integration (Renovate vs Flux Image Reflector ownership boundary), security hardening (pinDigests, osvVulnerabilityAlerts, minimumReleaseAge), regex manager templates, troubleshooting, private registries, custom regex managers, pre-commit hook.
.github/workflows/validate-renovate.yml: CI workflow triggering on PRs that touch renovate.json — three parallel jobs (JSON syntax via jq, config-validator via npx renovate-config-validator, coverage bash scan) with GITHUB_STEP_SUMMARY result table. No secrets required.
renovate.json: added dependencyDashboard, osvVulnerabilityAlerts, minimumReleaseAge: "3 days" on automerge rules, packageRules for gomod/npm/pip, npmDedupe postUpdateOption.
SKILL.md: Renovate row in tool table; /platform-skills:renovate slash command entry.
COMMANDS.md: TOC entry and full /platform-skills:renovate command section (31 commands total).

Contributors

@geetika-sv — Renovate skill

[1.26.0] - 2026-05-26

Added

commands/aws-profile.md: new /platform-skills:aws-profile command with five modes — discover (profile table with TTL traffic-light), status (per-server credential health), switch (prod-gated profile patching across VS Code + Claude Code), login (type-aware auth command), org-scan (AWS Org account coverage).
references/aws-mcp-profiles.md: 15-section reference covering AWS MCP server catalog, context budget table, profile type detection, auth flows, credential lifecycle, credential_process pattern, config file locations, VS Code input variables, team sharing, multi-account EKS, health check checklist, and prod safety.
commands/mcp.md: new configure-aws mode with upfront cost/token warning, CLI-first alternative check, credential_process validation, version-pinned config generation for VS Code and Claude Code, --multi-account EKS named instances, and AWS MCP server catalog with 5 starter kits.
examples/mcp/aws-multiprofile/: 13 example files — VS Code global/workspace/input/template configs, Claude Code snippet, aws-config-credential-process.ini, .gitignore.snippet, and 5 starter-kit JSON files (eks-debug, observe, knowledge, deploy, cost) with real PyPI version pins.
SKILL.md: AWS MCP Profiles row in tool table; /platform-skills:aws-profile slash command entry.
COMMANDS.md: TOC entry and full /platform-skills:aws-profile command section (30 commands total).
references/mcp.md: AWS Multi-Account Profiles pointer section.
Tessl evaluation scenarios (evals/) for quality testing: pr-triage, opa-policy, flux-helmrelease. Each scenario includes task.md, criteria.json, and capability.txt.

Contributors

@geetika-sv — AWS MCP profile management feature

[1.25.20] - 2026-05-24

Added

Tessl evaluation scenarios (evals/) for 3 skill domains: PR triage, OPA/Rego policy, and Flux CD HelmRelease diagnosis.
Tessl manual publish workflow with version input (workflow_dispatch).

Fixed

FluxCD HelmRelease ownership conflict: corrected remediation guidance to identify owning release via meta.helm.sh annotations; evidence commands use helm list -A for cross-namespace owner lookup.
OPA references/opa.md: added Terraform CI Pipeline Placement section.
Triage commands/triage.md: strengthened reply format with concrete example.
CI pipeline stabilisation: eval 500 handled with continue-on-error; publish now uses --skip-evals with separate eval step.

[1.26.14] - 2026-05-24

Fixed

CI: removed --skip-evals from publish step so eval results link to the registry version and appear on the Evals tab.

[1.26.13] - 2026-05-24

Fixed

Eval scores: 100% with-context average across all 3 scenarios (PR triage, OPA policy, HelmRelease).
OPA ci_placement criteria updated to check for runnable conftest test command presence.
FluxCD debug mode: error-pattern routing table prevents HelmRelease errors from being routed through installation check workflow.
HelmRelease ownership conflict (rendered manifests contain a resource that already exists) added to troubleshooting reference with evidence commands and fix.
Triage reply format: concrete example and literal ✅ Fixed ending requirement added to commands/triage.md.

[1.25.4] - 2026-05-24

Changed

SKILL.md description updated to "as a system" framing — signals cross-cutting platform scope to reduce Tessl Distinctiveness conflict risk with single-technology skills.

[1.25.3] - 2026-05-24

Changed

SKILL.md description rewritten to lead with actions (troubleshoot, implement, review, audit) and explicit "Use when" trigger for improved Tessl Completeness score.

[1.25.2] - 2026-05-24

Fixed

SKILL.md frontmatter description value quoted to fix YAML parse error (unquoted colon in value caused mapping values are not allowed error).

[1.25.1] - 2026-05-24

Fixed

SKILL.md frontmatter description trimmed from 1,357 → 580 characters to pass marketplace 1-1024 character validation.
plugin.json description trimmed from 1,254 → 717 characters (same limit).

[1.25.0] - 2026-05-24

Added

FluxCD reference suite — 8 new deep-dive references + Datadog/Dynatrace integration

references/fluxcd-sources.md — authoritative source CRD reference: GitRepository (auth options, sparse checkout, SSH note), OCIRepository (Cosign/Notation keyless verification, cloud OIDC, layerSelector), HelmRepository, HelmChart, Bucket, ExternalArtifact, ArtifactGenerator (monorepo decomposition + Helm chart composition, copy strategies: Overwrite/Merge/Extract). Source selection decision matrix for all 7 source kinds.
references/fluxcd-resourcesets.md — complete ResourceSet and ResourceSetInputProvider reference: template syntax (<< >> delimiters, slim-sprig functions), input strategies (Flatten/Permute with name normalization rules), all 8 provider types (Static, OCIArtifactTag, ECRArtifactTag, GitHubTag, GitHubPullRequest, GitLabTag, AzureContainerTag, ExternalService), CEL-based dependency readiness expressions, conditional resource inclusion, copyFrom annotation, built-in input fields, deduplication, gitless multi-tenant fleet pattern, preview environments per PR, gitless image automation with limit: 1 warning.
references/fluxcd-notifications.md — notification-controller reference: Provider (all 4 categories: messaging, alerting/monitoring, event streaming, Git commit status; key spec fields), Alert (severity levels, inclusion/exclusion regex filtering, eventSources with wildcard/cross-namespace/matchLabels, eventMetadata), Receiver (HMAC webhook, CEL resourceFilter, webhookPath from status), common patterns: Slack error alerting, GitHub commit status on PRs, Datadog event forwarding, immediate Git sync on push.
references/fluxcd-operator.md — Flux Operator deep dive: FluxInstance rules (one per cluster, named flux), cluster sizing table (small/medium/large: concurrency and memory), multi-tenancy (SA impersonation, cross-namespace restrictions), sync modes (OCI gitless vs Git), kustomize patches for controller Deployments, optional components (image-reflector, image-automation, source-watcher), FluxReport (5-minute auto-update, reporting interval annotation), runtime ConfigMap (flux-runtime-info) with SSA Merge for Terraform co-ownership, reconciliation trigger annotation, migration path from flux bootstrap.
references/fluxcd-kustomization.md — Kustomization advanced reference: core fields with descriptions, CEL-based dependency readyExpr, postBuild substituteFrom with correct Kustomization scoping rule, SOPS decryption (age key + Cloud KMS workload identity), CEL health check expressions, remote cluster deployment (secret-based and secretless workload identity), SSA apply behaviour annotations (Override/Merge/IfNotPresent/Ignore/force/prune-Disabled), status inventory format, reconciliation triggers, common mistakes table.
references/fluxcd-helmrelease.md — HelmRelease deep dive: spec.chartRef vs spec.chart.spec mutual exclusivity, values merging order, install/upgrade RetryOnFailure strategy (legacy vs modern fields), drift detection modes (disabled/warn/enabled) with ignore rules for HPA-managed fields, CRD lifecycle table (Create/Skip/CreateReplace), dependencies, CEL health check expressions, post-renderers (Kustomize patches applied after Helm render, sequential pipeline), remote cluster deployment, status tracking with history, common mistakes table.
references/fluxcd-terraform.md — Terraform + Flux Operator bootstrap: ownership model (Terraform = ephemeral Job, Flux = steady state), repository layout, gitops resources (create-once) vs managed resources (every run), runtime info variable substitution, Terraform + Git co-ownership via SSA Merge, sync authentication (PAT/GitHub App/OCI), node scheduling at all 3 layers (Job/Operator/controllers), drift and revision counter, shared operator values file, debugging with debug_on_failure.
references/fluxcd-mcp.md — Flux MCP server (flux-operator-mcp): Homebrew install, JSON MCP config (absolute path warning, --read-only for production), 6 core workflows (cluster inspection, HelmRelease trace, Kustomization trace, ResourceSet trace, multi-cluster comparison, force reconciliation), log analysis steps, applying manifests with overwrite: true warning, key guidelines (always call get_kubernetes_api_versions first, secrets are masked, avoid modifying Flux-managed resources).
references/datadog.md — added FluxCD integration section: built-in Agent integration (7.51.0+, minimum 7.49.1), supported controllers (source/kustomize/helm/notification), pod annotation setup via FluxInstance kustomize patch, log collection config, key metrics table (reconcile duration, reconcile condition, workqueue depth/retries, active workers, process CPU/memory, leader election), three recommended monitors (HelmRelease failure, Kustomization failure, workqueue saturation), service check fluxcd.openmetrics.health, dashboard guidance.
references/dynatrace.md — added FluxCD integration section: Method 1 ActiveGate Prometheus scraping (DynaKube config + pod annotation patches via FluxInstance), Method 2 OTel Collector pipeline (scrape config + OTLP exporter to DT endpoint), key metrics under gotk_ namespace, Dynatrace metric expression for reconciliation failure rate, Davis AI anomaly detection recommendations (gotk_reconcile_condition, workqueue_depth, process_resident_memory_bytes), log ingestion with field extraction, required token scope (metrics.ingest).

FluxCD troubleshooting reference + cluster-debug improvements

references/fluxcd-troubleshooting.md — scannable incident cheat-sheet derived from fluxcd/agent-skills gitops-cluster-debug references. Organised by failure category: controller failures, source failures (GitRepository, OCIRepository, HelmChart/HelmRepository), Kustomization failures, HelmRelease failures, ResourceSet failures. Each entry: symptom → most likely cause → exact fix. Includes label-based ownership tracing section (how to find the managing Kustomization or ResourceSet from resource labels without guessing), and a 10-step general debugging checklist.
commands/gitops.md (debug mode) — Workflow 2 (HelmRelease trace) and Workflow 3 (Kustomization trace) now include explicit label-tracing steps: read kustomize.toolkit.fluxcd.io/name and resourceset.fluxcd.io/name labels to identify the managing object. Added reference pointer to fluxcd-troubleshooting.md at end of debug workflows.

FluxCD audit tooling — upstream scripts, migration guide, security checklist

references/fluxcd-migration.md — Flux API migration guide: v2.7 v1beta1 removals (all 5 toolkits), v2.8 v1beta2 removals (4 toolkits), current stable apiVersion table, CLI 3-step upgrade (migrate files → migrate in-cluster → upgrade components), Flux Operator 3-step upgrade (upgrade operator → migrate files → update FluxInstance version), CI gate pattern, common post-migration issues.
references/fluxcd-security.md — Security audit checklist with grep commands: unencrypted secrets (Secret without sops:/ENC[), hardcoded credentials (password:, token:, ACCESS_KEY=), insecure sources (insecure: true), cloud registries without Workload Identity, cross-namespace source refs, OCIRepository without Cosign, cluster-admin bindings, image automation pushing to main. Automated scan via upstream validate.sh and check-deprecated.sh.
commands/gitops.md — merged gitops-audit into gitops as a subcommand. Command now has two modes: debug (five structured Flux debug workflows: installation check, source failures, HelmRelease trace, Kustomization trace, ResourceSet trace — plus Argo CD diagnostics, producing a 5-section report) and audit (six-phase read-only repo analysis: discovery via discover.sh, manifest validation via validate.sh, API compliance via check-deprecated.sh, best practices checklist, security grep patterns, Critical/Warning/Info report). commands/gitops-audit.md deleted — its content lives in the audit subcommand.
references/fluxcd.md — Repository patterns section expanded: three pattern descriptions (monorepo, multi-repo Git-based, multi-repo OCI-based), identification heuristics table (8 signals), layout rules (flux-system placement, no overlapping paths, no workloads in default namespace).

FluxCD Skills Enhancement (Domain 29 — GitOps Audit)

references/fluxcd.md — major expansion from 194 → 430+ lines. New sections: full CRD reference table (18 CRDs with apiVersion and controller), source selection decision matrix, Flux Operator and FluxInstance (installation, upgrade lifecycle, FluxReport health, gitless OCI sync YAML, one-per-cluster rule), ResourceSet and ResourceSetInputProvider (N-input templating, << inputs.field >> delimiters, OCIArtifactTag input provider for gitless image automation), gitless OCI delivery model (why to prefer it for Flux Operator deployments, CI push commands, OCIRepository source YAML, HelmRelease from OCI with layerSelector), reconcile.fluxcd.io/watch: Enabled label (what it does, when to use on valuesFrom/substituteFrom ConfigMaps), common mistakes table (5 entries: template delimiters, mutually exclusive chart fields, multiple FluxInstances, remediation strategy, OCIRepository layerSelector). Image automation section extended with gitless model (OCIArtifactTag + ResourceSet) alongside existing Git-based model, with comparison table (Git credentials, audit trail, PR approval, recommended for).
commands/gitops.md — major upgrade from 65 → 200+ lines. Replaced flat evidence-collection approach with 5 structured debug workflows: (1) Installation Check — FluxInstance status, FluxReport health, crashlooping controller logs; (2) HelmRelease trace — spec/status → managing object → valuesFrom references → chart source → inventory → pod logs; (3) Kustomization trace — spec/status → parent object → substituteFrom → source → managed resources; (4) ResourceSet trace — status → inputsFrom providers → dependsOn → generated resources; (5) Log analysis — Deployment → matchLabels → pods → logs. Added structured 5-section report format (Summary, Resource Analysis, Dependency Chain, Root Cause, Recommendations). Added edge cases: spec.suspend: true, Ready: Unknown, Flux-managed resource revert warning, stale status backpressure. Argo CD content retained.
examples/fluxcd/multi-tenant/ — multi-tenant repo pattern: clusters/production/tenants.yaml, tenants/team-a/ (namespace, ServiceAccount in flux-system, RoleBinding to tenant-reconciler ClusterRole, GitRepository, Kustomization with serviceAccountName and targetNamespace for RBAC isolation), platform/rbac/tenant-role.yaml (ClusterRole scoped to workload resources), platform/network-policy/default-deny.yaml (default-deny-all + allow-same-namespace + DNS egress per tenant namespace). README includes bootstrap instructions, troubleshooting, and RBAC can-i verification.
examples/fluxcd/helm-releases/ — Helm chart management via OCI: releases/base/cert-manager/ocirepository.yaml (semver 1.x, layerSelector.mediaType), releases/base/cert-manager/helmrelease.yaml (spec.chartRef, RetryOnFailure, driftDetection.mode: enabled, valuesFrom), releases/base/cert-manager/values-configmap.yaml (reconcile.fluxcd.io/watch: Enabled), releases/staging/kustomization.yaml (1 replica overlay), releases/production/kustomization.yaml (3 replicas, PDB, anti-affinity). README covers key patterns and common mistakes.
examples/fluxcd/image-automation/ — both image automation models side-by-side: git-based/imagerepository.yaml, git-based/imagepolicy.yaml (semver range), git-based/imageupdateautomation.yaml (staging branch push, not main); gitless/resourcesetinputprovider.yaml (OCIArtifactTag, semver filter), gitless/resourceset.yaml (<< inputs.tag >> substitution). README includes comparison table, setup steps for both models, safety rules, and troubleshooting.
examples/fluxcd/flux-operator/ — Flux Operator bootstrap: fluxinstance.yaml (gitless OCI sync, networkPolicy: true, all 6 components, reconcile annotations), ocirepository.yaml (Cosign verify), verify-policy.yaml (keyless Cosign with OIDC issuer + subject matching for GitHub Actions). README covers Flux Operator vs flux bootstrap comparison table, install steps, push-and-sign CI commands, health checks, and migration from flux bootstrap.
examples/fluxcd/README.md — updated from "planned patterns" stubs to full 5-example index table with pattern descriptions, choose-the-right-pattern guide, and shared best practices matrix.

FluxCD command router + examples rename

commands/fluxcd.md — new /platform-skills:fluxcd slash command. Smart entry point that identifies the right workflow from input: error/logs/symptom → /platform-skills:gitops debug (5-workflow debug); repo path/audit intent → /platform-skills:gitops audit (6-phase audit); helm chart path → /platform-skills:helmcheck; manifest YAML → /platform-skills:review. Includes quick reference apiVersion table for all 18 FluxCD CRDs and links to all 11 fluxcd-*.md reference files.
examples/flux/ → renamed to examples/fluxcd/ — consistent naming with references/fluxcd.md, commands/fluxcd.md, and examples/argocd/. All 5 subdirectories preserved: basic-monorepo/, multi-tenant/, helm-releases/, image-automation/, flux-operator/.

Changed

references/flux.md → renamed to references/fluxcd.md — consistent naming with argocd.md, datadog.md, kyverno.md; all references updated across SKILL.md, HOW_IT_WORKS.md, GETTING_STARTED.md, README.md, COMMANDS.md, CONTRIBUTING.md, tests/*.sh, examples/fluxcd/README.md.
SKILL.md — Domain 29 updated to GitOps (debug + audit subcommands); updated Flux/FluxCD reference descriptions; added 8 new FluxCD reference pointers; added /platform-skills:fluxcd slash command; removed /platform-skills:gitops-audit; updated examples/fluxcd/ path.
skills/platform-skills/SKILL.md — synced with root SKILL.md.
COMMANDS.md — added /platform-skills:fluxcd to command index table; updated gitops section to document debug and audit subcommands; removed gitops-audit section.
HOW_IT_WORKS.md — updated gitops row to gitops debug / gitops audit; removed gitops-audit; added fluxcd row.
GETTING_STARTED.md — updated gitops row to show debug/audit subcommands; replaced gitops-audit row with fluxcd entry; expanded FluxCD reference table with all 9 new fluxcd-*.md entries.
INSTALLATION.md — version string updated to v1.25.0.
.claude-plugin/plugin.json — version bumped to 1.25.0; ./commands/fluxcd.md added; ./commands/gitops-audit.md removed (merged into gitops).
.claude-plugin/marketplace.json — version bumped to 1.25.0; description rewritten to 628 chars (was 1,784 chars over the 1,024 limit); added gitops-audit, flux-operator, fluxinstance, resourceset, gitless-delivery, oci-gitops keywords.

[1.24.0] - 2026-05-24

Added

Composite GitHub Actions (Domain 28)

references/composite-actions.md — comprehensive reference covering: decision matrix (composite vs JavaScript vs Docker action), action.yml anatomy, variables and secrets (full mechanism table: ${{ inputs.* }}, env:, $GITHUB_ENV, $GITHUB_OUTPUT, $GITHUB_PATH, $GITHUB_STEP_SUMMARY, ::add-mask:: — with end-to-end secrets flow diagram), inputs and outputs (type annotations, boolean string comparison, multi-step $GITHUB_OUTPUT chaining), nine security requirements (SHA pinning, shell: on every run step, secrets as inputs, env isolation from shell injection, masking, ${{ github.action_path }}, minimum token permissions), observability ($GITHUB_STEP_SUMMARY job summary, ::group::/::endgroup:: log grouping, ::error::/::warning::/::notice:: inline annotations, RUNNER_DEBUG debug mode), fail-fast input validation pattern (enum + regex + required check with ::error:: annotations), step control flow (IDs, conditionals, continue-on-error, $GITHUB_ENV sharing), file references and scripts, multi-OS patterns (${{ runner.os }}, pwsh vs bash), idempotency guidance (skip-if-exists, upsert, kubectl apply patterns), timeout and concurrency (step-level timeout-minutes, caller concurrency: recommendation), testing (local path test workflow, matrix strategy, act commands, actionlint verification), breaking changes (deprecation warnings, dual-input support, CHANGELOG migration notes), versioning and release (semver + floating major tag, release workflow with actionlint gate), dependabot.yml configuration for github-actions ecosystem, anti-patterns table (15 entries), and production checklist (security, correctness, observability, testing, documentation, release)
commands/composite-actions.md — slash command /platform-skills:composite-actions with four modes:
- generate: 9-question interview (purpose, repo destination, pinning strategy, inputs with secret flag, outputs, cloud credentials, notifications, job summary, confirm) → full scaffold: action.yml + README.md + CHANGELOG.md + .gitignore + .github/dependabot.yml + .github/workflows/test-action.yml + .github/workflows/release.yml; if existing repo: branches as feat/add-<name>-composite-action, commits all files, opens PR via gh pr create with inputs/outputs/secrets summary and test plan checklist; post-generation tip to run /platform-skills:awesome-docs generate
- review: audit existing action.yml — CRITICAL (unpinned tags, missing shell:, injection, direct secrets access) / WARNING (no validation, unmasked secrets, relative paths, no summary, no timeout) / INFORMATIONAL (branding, descriptions, dependabot, test workflow) — scored N/20
- secure: fix in place — resolve tags to SHAs via gh api, add shell: bash, move inputs.* to env: blocks, add ::add-mask::, inject validation step, inject job summary step
- test: generate test workflow with local path reference and matrix, plus act commands for all key input variants
Eleven production-ready examples under examples/github-actions/composite-actions/:
- docker-build-push/ — full repo structure: action.yml (multi-platform Buildx, GHCR via ephemeral GITHUB_TOKEN, GHA layer cache, SLSA provenance + SBOM attestations, input validation, ::group:: phases, $GITHUB_STEP_SUMMARY), README.md (architecture diagram, variables & secrets flow, build-args secret guidance), CHANGELOG.md, .github/workflows/test-action.yml (4 test jobs: defaults, multi-platform, validation failure assertion, custom Dockerfile), .github/workflows/release.yml (actionlint gate + SHA pinning check + floating major tag), .github/dependabot.yml, .actionlint.yaml
- notify-slack/ — full repo structure: action.yml (webhook URL masked immediately with ::add-mask::, payload built via printf to avoid quoting issues, channel override, @mention on failure, status_emoji + http_status outputs, 1-minute timeout on curl), README.md (secrets flow diagram, logged vs masked table, Slack webhook setup guide), CHANGELOG.md, .github/workflows/test-action.yml (3 test jobs: invalid webhook validation, invalid status validation, status/emoji matrix), .github/workflows/release.yml, .github/dependabot.yml, .actionlint.yaml
- k8s-deploy/ — action.yml (base64 kubeconfig decoded to chmod-600 temp file, ::add-mask:: on decoded content, kubectl apply with --dry-run=server support, rollout status with configurable timeout, kubectl events on failure, cleanup post-step runs if: always()), README.md (kubeconfig secret flow, logged vs masked table, minimum RBAC snippet, base64 encoding commands, service account setup), .github/workflows/test-action.yml, .github/workflows/release.yml, .github/dependabot.yml, .actionlint.yaml
- terraform-plan/ — action.yml (AWS + Azure OIDC dual-cloud support, Terraform provider cache, fmt → validate → plan pipeline with -detailed-exitcode, idempotent PR comment upsert via hidden marker  using github-script, plan truncated to 65,000 chars), README.md (OIDC auth flow, logged vs masked table, has_changes output usage), .github/workflows/test-action.yml, .github/workflows/release.yml, .github/dependabot.yml, .actionlint.yaml
- security-scan/ — action.yml (Trivy image/fs/repo scan, severity enum validation, registry_password masked, ::error:: annotations for CRITICAL findings, ::warning:: for HIGH, SARIF output support, fail_on_findings gate), README.md (build → scan → deploy pipeline example, no-credentials pattern for GHCR), .github/workflows/test-action.yml, .github/workflows/release.yml, .github/dependabot.yml, .actionlint.yaml
- release-tag/ — action.yml (conventional commit auto-detection: feat!/BREAKING CHANGE → major, feat → minor, fix/perf/etc → patch; $GITHUB_OUTPUT chaining across 4 steps: bump → git_tag → changelog → create_release; floating major tag update; draft + prerelease support; is_new_release=false skip path), README.md (output chaining diagram, secrets flow, idempotency note, combined release + notify example), .github/workflows/test-action.yml, .github/workflows/release.yml, .github/dependabot.yml, .actionlint.yaml
- pr-comment/ — action.yml (idempotent upsert via hidden HTML marker, listComments pagination, collapsible <details> wrapper, delete_on_close for PR closed event, update_existing toggle, github-script with explicit token, comment_id/comment_url/action_taken outputs), README.md (unique marker pattern for multiple instances, delete-on-close example, token scoping), .github/workflows/test-action.yml, .github/workflows/release.yml, .github/dependabot.yml, .actionlint.yaml
- setup-env/ — tutorial-baseline multi-runtime action: action.yml (Node.js/Python/Go runtime choice, per-runtime version defaults, dependency cache restore: ~/.npm/~/.cache/pip/~/go/pkg/mod, optional install_dependencies flag, runtime_version + cache_hit outputs, input validation, $GITHUB_STEP_SUMMARY), README.md (runtime matrix usage example), CHANGELOG.md, .github/workflows/test-action.yml (4 test jobs: invalid runtime assertion, matrix across all runtimes/versions, default versions, cache disabled), .github/workflows/release.yml, .github/dependabot.yml, .actionlint.yaml
- configure-cloud/ — action.yml (AWS + Azure + GKE OIDC via cloud_provider switch; GKE uses google-github-actions/auth WIF + get-gke-credentials; full required-input validation per provider; SHA-pinned actions; aws_account_id output), README.md (per-provider quick-start examples, OIDC trust prerequisites)
- setup-terraform/ — action.yml (version install via hashicorp/setup-terraform, provider plugin cache at ~/.terraform.d/plugin-cache, terraform_wrapper toggle, working directory support, terraform_version output), README.md (cache key strategy, workspace pattern)
- db-migrate/ — action.yml (Flyway / Liquibase / golang-migrate; ::add-mask:: on database_url; nc health check with retry loop; advisory lock timeout; dry-run; post-migration verify; job summary), README.md (per-tool quick-start, rollback guide, connection URL format)
examples/github-actions/composite-actions/README.md — index with 11-example quick-reference table, decision guide ("Need to build a container image? → docker-build-push"), shared best-practices matrix (shell on every run step, SHA pinning, secrets-as-inputs, masking, env isolation, input validation, job summary, log groups, annotations, timeout, idempotency, dependabot, release workflow)
commands/composite-actions.md — extended with two new modes:
- migrate: scan .github/workflows/ for repeated step sequences (3+ occurrences), rank by impact (count × step-count), extract to composite action, rewrite callers
- publish: GitHub Actions Marketplace checklist — prerequisite verification (root placement, unique name, description length, branding), Feather icon/color suggestions per action type, release tag commands, submit walkthrough
references/composite-actions.md — extended with five new sections:
- $GITHUB_STATE — passing data from main run to post-step cleanup; pure composite workaround using $GITHUB_ENV + if: always()
- Action composition — calling composite from composite: layered abstractions, secret threading constraints, SHA-pinning requirement for inner actions
- Self-hosted runner considerations — tool availability, persistent vs ephemeral state, network restrictions, runner requirement documentation pattern
- Ephemeral runner security — risk table (persistent vs ephemeral), GitHub Enterprise policy enforcement, README callout template
- Troubleshooting — 9 common errors with exact diagnosis and fix: missing shell:, empty ${{ secrets.* }}, ::add-mask:: not working, uses: ./ not found, actionlint shellcheck errors, empty step outputs (3 causes), floating tag not updated, boolean comparison always false
references/github-actions.md — extended with composite actions section: key rules summary, pointer to references/composite-actions.md, slash command, and links to all 11 examples

Contributors

Thanks to @geetika-sv for contributing the composite GitHub Actions skill — 11 production-ready examples, the /platform-skills:composite-actions command, and references/composite-actions.md in this release.

[1.23.0] - 2026-05-23

Changed

Self-Improve: explicit `init global` / `init local` subcommands

commands/self-improve.md — init mode split into three explicit forms:
- init global — scaffolds ~/.claude/ directly, no prompt; offers to wire all three hooks and create ~/.claude/CLAUDE.md; idempotent (reports existing state and stops if already set up)
- init local — scaffolds .learnings/ and memory/ in $PWD, no prompt; asks gitignore vs commit; idempotent
- init (no argument) — asks user to choose, then delegates to init global or init local
commands/self-improve.md — argument-hint updated to [init [global|local]|log [LRN|ERR|FEAT]|promote <ID>|migrate [global|local]|status|resume|review|state]
commands/self-improve.md — Path Resolution preamble updated to short-circuit on init global / init local before auto-detection
references/agent-self-improve.md — Bootstrap section updated with explicit subcommand examples; line 26 updated from bare init to init global / init local examples
examples/agent-self-improve/HOW_IT_WORKS.md — init section rewritten with both subcommands, idempotency note

Self-Improve: new subcommands — `status` and `migrate`

commands/self-improve.md — new ## Mode: status — read-only health summary: counts across all three log files, working-buffer age, sessions since last review, any WAL entries still PENDING, pending error count, and a prioritised action-item list
commands/self-improve.md — new ## Mode: migrate — moves workspace between global and local scope; chooses merge or replace for conflicting entries; writes a WAL entry before moving any files; verifies write to new path and confirms deletion of old path; idempotent if already at target scope

Self-Improve: `promote` supports global target

commands/self-improve.md — promote mode now includes ~/.claude/CLAUDE.md as a valid target for rules that should apply across all projects (e.g. "never do X on any repo"); scope determination step added before drafting the promoted line
references/agent-self-improve.md — Promotion targets table updated with ~/.claude/CLAUDE.md row

Self-Improve: `review` flags stale resolved entries

commands/self-improve.md — review mode now reports entries with Status: resolved that are older than 30 days and have no Action containing "promoted", surfacing them as promotion candidates rather than silently aging out

Self-Improve: `resume` warns on stale working buffer

commands/self-improve.md — resume mode now reads the Last updated: timestamp from memory/working-buffer.md before proceeding; warns if buffer is 3+ days old (task may be abandoned); blocks with an explicit confirmation prompt if 7+ days old

Self-Improve: SKILL.md description updates

SKILL.md — self-improve reference line updated to mention global/project-local workspace, init global/init local, status/migrate subcommands
SKILL.md — /platform-skills:self-improve slash command description updated to reflect full capability set
skills/platform-skills/SKILL.md — same updates applied (kept in sync with root SKILL.md)

Self-Improve: global path resolution for cross-project learnings

commands/self-improve.md — added ## Path Resolution preamble that all modes evaluate before acting: detects global setup (~/.claude/.learnings/ exists), project setup (.learnings/ in $PWD), or auto-creates global on first use; all path references updated to use LEARNINGS_BASE
commands/self-improve.md — init mode now asks global vs project-local before creating any directories; step 4 skips .gitignore check for global setup; step 5 targets the correct settings.json (~/.claude/ for global, .claude/ for project) and enforces absolute paths in the PostToolUse hook command
commands/self-improve.md — log confirm message now shows resolved $LEARNINGS_BASE path instead of hardcoded .learnings/
commands/self-improve.md — resume, review, promote, state, WAL Protocol, Working Buffer, SESSION-STATE, and Daily Notes proactive protocols updated to use $LEARNINGS_BASE
references/agent-self-improve.md — new Global vs project scope section under Directory layout: scope comparison table, auto-detection order, promotion-targets-stay-local rule, working PostToolUse hook JSON example with absolute paths
examples/agent-self-improve/HOW_IT_WORKS.md — updated init description to mention global vs project choice; corrected "What This Skill Cannot Do" section which previously stated learnings cannot persist across projects
examples/agent-self-improve/README.md — updated setup instructions to reflect global scope option and hook wiring steps
examples/agent-self-improve/scripts/session-end.sh — Stop hook script: drains .pending-errors.log into ERR entries on every session close, saves daily notes, auto-logs PENDING WAL as ERR, clears session marker, increments session counter with review reminder every 5 sessions, nudges if no LRN logged today
examples/agent-self-improve/scripts/session-start-reminder.sh — PreToolUse hook script: marker-based session detection fires once per session; prints memory-load banner with active task and pending error count; silent on all subsequent tool calls
examples/agent-self-improve/global-claude.md — template for ~/.claude/CLAUDE.md: temporary path override table, session-start instruction, in-session logging rules (log LRN/ERR/FEAT at moment of discovery), VFM_THRESHOLD=60, Agent Rules section
examples/agent-self-improve/settings.json.example — all 3 hooks wired: Stop (session-end.sh), PreToolUse (session-start-reminder.sh), PostToolUse (inline error capture to .pending-errors.log)

Self-Improve: cross-platform support (Windows and Linux distributions)

examples/agent-self-improve/scripts/session-end.ps1 — PowerShell 5.1+ equivalent of session-end.sh for Windows native; same logic (daily notes, error drain, WAL check, session counter, review reminder)
examples/agent-self-improve/scripts/session-start-reminder.ps1 — PowerShell equivalent of session-start-reminder.sh for Windows native; marker-based once-per-session banner
examples/agent-self-improve/settings-windows.json.example — all 3 hooks wired with PowerShell commands for Windows native (%USERPROFILE% paths, powershell -NonInteractive -File ...)
examples/agent-self-improve/scripts/session-end.sh — shebang changed from #!/bin/bash to #!/usr/bin/env bash for portability on Linux distros where bash is not at /bin/bash
examples/agent-self-improve/scripts/session-start-reminder.sh — same shebang fix
examples/agent-self-improve/README.md — new Platform support table (macOS, Linux, Alpine, WSL, Git Bash, Windows native); Windows recommendation (WSL/Git Bash is simpler); PowerShell copy/setup instructions; execution policy note
commands/self-improve.md — Path Resolution preamble updated with cross-platform ~ resolution note; init global step 5 updated to detect platform and point to correct scripts (session-end.sh vs .ps1, settings.json.example vs settings-windows.json.example)
references/agent-self-improve.md — new Platform Compatibility section: global config path table per OS, hook script selection table, per-platform settings.json snippets (bash and PowerShell), Alpine install note, execution policy note; old inline hook JSON examples replaced with a pointer to the new section

Self-Improve: path resolution and hook setup fixes

commands/self-improve.md — path resolution steps reordered: explicit init global/init local now short-circuit (steps 1–2) before auto-detection probes (steps 3–4), matching the changelog claim and preventing an existing .learnings/ repo from hijacking an explicit init global call
references/agent-self-improve.md — copy command updated to copy both session-end.sh and session-start-reminder.sh; added mkdir -p ~/.claude/scripts to prevent hook wiring failure when the scripts directory doesn't exist yet
examples/agent-self-improve/scripts/session-start-reminder.sh — added mkdir -p "$MEMORY_DIR" before touch "$SESSION_MARKER" to prevent banner-on-every-tool-call when ~/.claude/memory doesn't exist

Contributors

Thanks to @geetika-sv for contributing the self-improve global workspace, cross-platform support, and init global/init local/status/migrate subcommands in this release.

[1.22.0] - 2026-05-22

Added

AWS CloudFront + WAF + Lambda@Edge (Domains 33–34)

references/aws-cloudfront.md — comprehensive reference covering: distribution anatomy, OAC (Origin Access Control replaces legacy OAI), cache policies, origin request policies, security headers (aws_cloudfront_response_headers_policy with HSTS preload + CSP + X-Frame + XSS), WAF integration (CLOUDFRONT scope requires us-east-1 — documented as the most common footgun), Lambda@Edge (all 4 event types with memory/timeout/body-access table), CloudFront Functions vs Lambda@Edge decision matrix (~6× cost difference), standard logging v2 + real-time logging + Athena partitioning, multi-account patterns (shared CDN account with cross-account OAC, per-account distributions + FMS enforcement), Organizations SCP, and production readiness checklist
references/aws-waf.md — comprehensive reference covering: scope section (CLOUDFRONT vs REGIONAL, prominent footgun callout), web ACL anatomy, full managed rule groups table (14 groups, free vs paid), rate-based rules with aggregation keys (IP, forwarded IP, HTTP header, query string, custom key combinations), custom rules (IP sets, geo match, regex, label matching), CAPTCHA/Challenge actions, WAF logging (CW log group must start with aws-waf-logs-), testing workflow (Count → Block), multi-account FMS (prerequisites, policy anatomy, FIRST/MIDDLE/LAST rule group ownership model, auto-remediation, compliance dashboard, cross-account central logging), and Shield Advanced integration
commands/aws.md — slash command /platform-skills:aws with six modes: cloudfront (distribution design, OAC, caching, security headers), waf (scope, managed rules, rate limiting, logging), lambda-edge (event type selection, constraints, deployment), multi-account (FMS policy design, OU targeting, audit mode), review (production-readiness checklist for CloudFront + WAF Terraform), terraform (module structure, provider aliases, validation blocks, best practices)
examples/aws/cloudfront/ — production-ready Terraform module:
- versions.tf — provider constraints (aws >= 5.0.0, archive ~> 2.4), us_east_1 alias, required_version = ">= 1.7.0"
- variables.tf — 15 variables with validation{} blocks for price_class, geo_restriction_type, and name (regex + length)
- locals.tf — origin IDs, use_custom_domain, viewer_certificate, default_origin_id, use_alb derived values
- main.tf — S3 bucket (versioning + SSE-KMS + public access block), OAC with cloudfront.amazonaws.com service principal + AWS:SourceArn condition, response headers policy, CloudFront Function, distribution with dynamic blocks for ALB origin + custom header enforcement, Lambda@Edge association via qualified_arn, geo restriction, standard logging v2
- outputs.tf — distribution_id/arn/domain_name/hosted_zone_id, origin_bucket_name/arn, oac_id
- lambda-edge/main.tf — Lambda@Edge: publish = true, provider = aws.us_east_1, dual trust policy (lambda.amazonaws.com + edgelambda.amazonaws.com), pre-created CloudWatch log group, qualified_arn output (with explicit note: do not use .arn which resolves to $LATEST)
- lambda-edge/functions/viewer-request.js — auth check (missing Bearer → 401), method restriction on /api/, A/B routing by cookie (20% to /v2/)
- functions/url-rewrite.js — CloudFront Functions JS 2.0: trailing slash → index.html, SPA deep-link rewrite, clean URL rewrite, path traversal blocking
examples/aws/waf/ — production-ready Terraform module:
- versions.tf — us_east_1 alias with comment explaining CLOUDFRONT scope requirement
- variables.tf — 9 variables with validation: default_action (allow/block), rate_limit (0 or 100–2B), cloudwatch_log_retention_days (validates against CloudWatch valid values list)
- main.tf — WebACL with: IP allowlist (priority 0, dynamic), IP reputation (5), AnonymousIpList (6), CRS with SizeRestrictions_BODY override to Count (10), KnownBadInputs (11), geo block (20, dynamic), rate limit with custom 429 + Retry-After header (30, dynamic), Bot Control with inspection level (40, dynamic), ATP with login path config (50, dynamic); logging config with field redaction (authorization, cookie headers) + filter to keep only BLOCK/COUNT actions
- outputs.tf — web_acl_arn/id, web_acl_capacity (WCU consumption), log_group_name/arn
examples/aws/firewall-manager/main.tf — self-contained FMS module: aws_fms_admin_account, aws_fms_policy for AWS::CloudFront::Distribution with OU include + excluded accounts, preProcessRuleGroups (IP reputation + CRS + KnownBadInputs at priorities 5/10/11), overrideCustomerWebACLAssociation = false to allow app-team local rules, remediation_enabled defaulting to false (audit mode) with explanation
examples/aws/README.md — updated index with cloudfront, waf, firewall-manager examples; quick-start snippets for single-account (WAF → CloudFront) and multi-account (FMS) patterns; key patterns summary and checklist

[1.21.0] - 2026-05-21

Added

Awesome Docs (Domain 27)

references/awesome-docs.md — comprehensive reference covering: SVG Pattern A (architecture flow — offset-path animated dots, glow animations, arrowhead markers, legend), SVG Pattern B (decision/lifecycle loop — 3-state CSS cycling, status bar), SVG Pattern C (field explainer carousel — timing formula slot = T/N, N × slot = T invariant, tooltip structure), SVG Pattern D (timeline phases — bar math y = bottom - (N/max)*height), GitHub SVG animation constraints table (allowed vs blocked), theme system (github-dark, docs-light, custom), multi-platform export rules (GitHub SVG, Confluence/Notion HTML, PNG manual steps), quality checklist, and curated external references (MDN, GitHub Docs, Shields.io, SVGO, Primer CSS)
commands/awesome-docs.md — slash command /platform-skills:awesome-docs with seven modes: generate (doc-type-aware guided interview with outline/preview step → full animated doc with incremental SVG confirmation for any doc type — readme, architecture-guide, runbook, tutorial, api-reference, how-it-works, rfc, post-mortem, or custom), convert (8-type doc classification; batch mode for directory input; inject SVGs into existing Markdown with conflict detection), update (revise a single diagram), diff (detect stale diagrams; --base <branch|tag|SHA> flag, defaults to HEAD), audit (quality punch list), preview (cross-platform local browser preview via python3 -m http.server), export (GitHub SVG default; Confluence/Notion animated HTML; PNG manual step)
references/awesome-docs.md extended: SVG Pattern E (sequence diagram — actor lifelines, message arrows, CSS step animation, activation boxes), SVG Pattern F (state machine — state nodes, transition arrows, current-state cycling ring), field carousel multi-format syntax colors (YAML, HCL/Terraform, JSON, TOML)
examples/awesome-docs/arch-flow.svg — reusable architecture flow SVG template (github-dark, 3 boxes, 2 animated dot paths)
examples/awesome-docs/lifecycle-loop.svg — reusable lifecycle loop SVG template (3-state CSS, diamond gate, status bar)
examples/awesome-docs/field-carousel.svg — reusable field carousel SVG template (4 fields, 8s cycle, N × slot = T timing)
examples/awesome-docs/timeline-phases.svg — reusable timeline phases SVG template (3 phases, replica chart with correct bar math)
examples/awesome-docs/sequence-diagram.svg — auth token request flow demo (4-step CSS-sequenced messages with activation boxes)
examples/awesome-docs/state-machine.svg — deployment lifecycle demo (Pending → Deploying → Healthy, error/retry paths, current-state cycling ring)
examples/awesome-docs/docs-audit-workflow.yml — GitHub Actions workflow: SVG size check, caption check, broken-ref check, external-href check, stale-diagram diff report, PR comment

[1.20.0] - 2026-05-20

Added

Datadog Labs + LLM Observability

references/datadog.md — two new sections: pup CLI (install, config, common operations, scripting post-deploy gates, troubleshooting table) and Datadog Labs Claude Skills (dd-pup, dd-apm, dd-logs, dd-monitors, dd-docs — install commands, capability table, when-to-use decision matrix)
references/llm-observability.md — new reference covering: LLMObs vs traditional APM decision matrix, Python instrumentation (@llm, @workflow, @retrieval decorators, LLMObs.annotate()), Node.js instrumentation (llmobs.trace() callback, llmobs.annotate()), environment variables, eval bootstrap workflow, submit_evaluation() pattern, pup-based CI quality gate, trace RCA with dd-llmo-eval-trace-rca, experiment analysis with dd-llmo-experiment-analyzer, manual pup fallback commands, troubleshooting table, and security guidance
commands/datadog.md — two new modes: pup (log search, metric query, monitor management, post-deploy gate generation) and llmo (instrument/evaluate/rca/experiment classification flow)
examples/datadog/llm-observability/llmobs-python.py — complete Python LLMObs example with @llm, @workflow, @retrieval decorators and faithfulness evaluation
examples/datadog/llm-observability/llmobs-nodejs.js — complete Node.js LLMObs example with llmobs.trace() and evaluation submission
examples/datadog/llm-observability/evaluator-bootstrap.py — faithfulness and quality evaluator stubs with LLM-as-judge pattern

[1.19.0] - 2026-05-20

Added

Chaos Engineering (Domain 27)

references/chaos.md — comprehensive reference covering: decision matrix (Litmus Chaos v3 vs Chaos Mesh v2), fault taxonomy (pod/node/network/stress), steady-state hypothesis pattern (httpProbe and promProbe), blast radius scoping rules, ChaosEngine and NetworkChaos YAML, GitOps integration via ChaosSchedule, rollback semantics, DORA feedback loop, and troubleshooting for stuck experiments and RBAC gaps
commands/chaos.md — slash command /platform-skills:chaos with six modes: install, experiment, schedule, gameday, debug, report
examples/chaos/ — eight working examples and one validator:
- litmus-install-values.yaml — Helm values for Litmus Chaos v3
- chaos-mesh-install-values.yaml — Helm values for Chaos Mesh v2
- pod-delete-experiment.yaml — Litmus ChaosEngine targeting a Deployment with HTTP probe
- network-loss-experiment.yaml — Chaos Mesh NetworkChaos with 20% packet loss
- cpu-stress-experiment.yaml — Litmus pod-cpu-hog with Prometheus steady-state probe
- chaos-schedule.yaml — weekly pod-delete ChaosSchedule (staging only)
- gameday-runbook.md — structured GameDay template (steady-state → blast radius → inject → observe → verdict → DORA impact)
- chaos-validate.sh — domain validator

DORA Metrics (Domain 28)

references/dora.md — comprehensive reference covering: the four DORA metrics with exact definitions, 2023 performance bands (Elite/High/Medium/Low), GitHub Actions + Prometheus Pushgateway open-source instrumentation pattern, all four Prometheus recording rules, PagerDuty and OpsGenie incident webhook integration for MTTR, SaaS decision matrix (Sleuth, LinearB, Cortex, open-source), five anti-pattern detections (commits vs deploys, all alerts vs customer incidents, partial outages, staging vs production, PR open vs first commit), chaos engineering cross-reference for high change failure rate, and agent platform rules
commands/dora.md — slash command /platform-skills:dora with four modes: instrument, dashboard, benchmark, debug
examples/dora/ — five working examples, one validator, and an AMP variant:
- deployment-event-step.yaml — GitHub Actions step pushing deploy timestamp and lead time to Pushgateway
- incident-webhook-handler.yaml — full GitHub Actions workflow triggered by PagerDuty/OpsGenie webhook for MTTR tracking
- prometheus-recording-rules.yaml — all four DORA Prometheus recording rules
- grafana-dashboard.json — Grafana dashboard with four DORA panels and Elite/High/Medium/Low threshold bands
- dora-validate.sh — domain validator
- amp-variant/ — Amazon Managed Prometheus variant: amp-workspace.tf (Terraform module provisioning AMP workspace + recording rules via terraform-aws-modules/managed-service-prometheus/aws), pushgateway-helm-values.yaml, prometheus-agent-values.yaml (SigV4/IRSA remote_write), amp-recording-rules-deploy.sh (AWS CLI fallback), grafana-amp-datasource.yaml, grafana-amg-datasource.json

[1.18.0] - 2026-05-20

Added

Self-Improvement: SESSION-STATE, Daily Notes, VBR, Context Threshold

Four improvements to the proactive agent pattern in Domain 24:

SESSION-STATE.md — always-on capture file (memory/SESSION-STATE.md) for corrections, preferences, decisions, and proper nouns. Written before responding when any of these occur (not only before destructive ops). Second read in compaction recovery after working-buffer.
Daily Notes — rolling per-day log (memory/YYYY-MM-DD.md) of notable exchanges, discoveries, completed tasks, and errors. Survives compaction as a searchable history. Third read in compaction recovery.
Context threshold trigger — working buffer is written proactively at ~60% context, not only at task boundaries. Prevents silent context loss mid-task.
Verify Before Reporting (VBR) — explicit protocol: run the validation command and read the changed file before reporting done. Text change ≠ behavior change; test actual outcomes.
Updated compaction recovery order in commands/self-improve.md resume mode: working-buffer → SESSION-STATE → daily notes → resource verification. "Never ask where were we?" rule added.
New state mode in commands/self-improve.md — explicit slash command for capturing session state entries
Example scaffold updated: memory/SESSION-STATE.md and memory/YYYY-MM-DD.md templates added to examples/agent-self-improve/memory/
.gitignore recommendation updated: gitignore memory/ entirely (daily notes grow fast)

[1.17.0] - 2026-05-20

Added

Closing ## Closing — Log learnings section added to commands/triage.md, commands/supply-chain.md, and commands/runtime-security.md — each workflow now ends with an explicit step to log errors and learnings via /platform-skills:self-improve log immediately after completing work, preventing context loss across sessions
PostToolUse hook in .claude/settings.json (local, not committed) prompts incremental learning capture after every git commit

Changed

Promoted 10 learnings from v1.16.0 session to CLAUDE.md and reference guides (references/supply-chain.md, references/runtime-security.md, references/kyverno.md) as permanent agent rules

[1.16.0] - 2026-05-20

Added

Supply Chain Security — Domain 25

New Domain 25: Supply Chain Security — secure the build pipeline and image lifecycle.

references/supply-chain.md — comprehensive reference covering:
- Cosign keyless signing (Sigstore/Rekor, OIDC-based, no key management)
- SBOM generation and OCI attestation with Syft
- CVE scanning with Trivy and Grype, severity gate strategy
- SLSA Level 2 vs Level 3 provenance requirements
- Kyverno ImageValidatingPolicy for admission enforcement
- Gap classification table and recommended rollout order
commands/supply-chain.md — slash command /platform-skills:supply-chain with six modes: audit, sign, sbom, scan, enforce, slsa
examples/supply-chain/ — five working GitHub Actions workflows and one Kyverno policy:
- sign-and-push.yaml — build → Cosign keyless sign → push
- sbom-attest.yaml — Syft SBOM generation and Cosign attestation
- trivy-gate.yaml — Trivy scan with CRITICAL+HIGH severity gate and GitHub Security tab upload
- kyverno-verify-image.yaml — ImageValidatingPolicy blocking unsigned images (Audit mode)
- slsa-provenance.yaml — SLSA Level 2 provenance via slsa-github-generator
- supply-chain-validate.sh — domain validator

Runtime Security — Domain 26

New Domain 26: Runtime Security — detect in-container threats with Falco.

references/runtime-security.md — comprehensive reference covering:
- Falco architecture: eBPF probe, rules engine, alert output
- Driver comparison: modern_ebpf vs ebpf vs kmod (kmod never on managed K8s)
- EKS and GKE installation with eBPF (Bottlerocket, COS, Fargate caveats)
- Built-in ruleset overview and noise reduction approach
- Custom rule syntax: condition fields, lists, macros
- Falcosidekick alert routing: Slack, webhook, SNS, PagerDuty
- Falco → Kyverno bridge pattern for blocking flagged workload re-admission
- Resource sizing and CPU limit guidance (do not set CPU limits)
commands/runtime-security.md — slash command /platform-skills:runtime-security with five modes: install, rules, alerts, debug, harden
examples/runtime-security/ — four working Helm values and one Kyverno policy:
- falco-values.yaml — Falco Helm values with eBPF driver and resource limits
- falco-custom-rules.yaml — shell-in-container, privilege escalation, unexpected outbound
- falcosidekick-values.yaml — Slack + webhook routing with deduplication
- falco-kyverno-bridge.yaml — ValidatingPolicy blocking Falco-flagged workload re-admission
- runtime-security-validate.sh — domain validator

[1.15.0] - 2026-05-20

Added

Agent Self-Improvement — Domain 24

New Domain 24: Agent Self-Improvement — bootstrap and operate self-improving, proactive agent workspaces.

references/agent-self-improve.md — comprehensive reference guide covering:
- .learnings/ directory layout and entry lifecycle (LRN/ERR/FEAT)
- WAL Protocol (append-only log, write-ahead before action)
- Working Buffer pattern for mid-session state persistence
- VFM Scoring (Value-Frequency Matrix) for evaluating unsolicited proactive actions
- ADL Protocol (Action Decision Logic) for choosing between competing implementation approaches
- Six Operating Pillars for proactive agent behavior
- Heartbeat pattern for session continuity
- Reverse Prompting to surface agent knowledge gaps
commands/self-improve.md — slash command definition for /platform-skills:self-improve with five modes: init, log, resume, review, promote
examples/agent-self-improve/ — ready-to-copy workspace scaffold:
- README.md — overview and copy commands
- .learnings/LEARNINGS.md — positive learnings template
- .learnings/ERRORS.md — error log template
- .learnings/FEATURE_REQUESTS.md — feature request template
- memory/working-buffer.md — WAL scratchpad template
Updated SKILL.md (root and skills/platform-skills/) with Domain 24 entry, reference bullet, and /platform-skills:self-improve slash command
Updated README.md with Domain 26 badge and domain table row
Updated COMMANDS.md with ToC entry and full command reference section

[1.14.0] - 2026-05-16

Added

KEDA — Kubernetes Event-Driven Autoscaling

New Domain 23: KEDA — design, generate, debug, and review event-driven autoscaling with KEDA v2.14.

references/keda.md — comprehensive reference guide covering:
- KEDA vs HPA decision matrix (when to use each)
- Architecture (operator, metrics adapter, webhook)
- Helm installation with resource limits
- ScaledObject spec: minReplicaCount, maxReplicaCount, pollingInterval, cooldownPeriod, advanced HPA behavior overrides, scale-to-zero with cold-start guidance
- ScaledJob spec: batch processing per message with scaling strategies (accurate/default/custom)
- TriggerAuthentication and ClusterTriggerAuthentication: Kubernetes Secret reference, IRSA (AWS), Azure Workload Identity, GCP Workload Identity
- Scalers with full metadata reference: Prometheus, AWS SQS, Apache Kafka (SASL/TLS), Redis List, Cron, Azure Service Bus, HTTP Add-on
- Scaling lifecycle and tuning table (pollingInterval/cooldownPeriod by scenario)
- Security patterns: least-privilege IAM per scaler, namespace-scoped vs cluster-scoped auth, RBAC for ScaledObject management
- GitOps integration: Flux Kustomization with health checks, Argo CD ordering, ExternalSecret pairing
- Troubleshooting: 7 common failure patterns with diagnostic commands and exact fixes
- Observability: KEDA Prometheus metrics, Grafana dashboard import
- Version compatibility matrix and upgrade checklist
/platform-skills:keda (commands/keda.md) — four modes:
- generate — write a production-ready ScaledObject or ScaledJob from a description
- debug — diagnose scaling failures with a structured checklist
- review — correctness, security, and operational safety review
- scale — design the scaling strategy for a workload from requirements
examples/keda/ — four working examples:
- scaledobject-sqs.yaml — SQS queue trigger with IRSA, scale-to-zero, HPA stabilization
- scaledobject-prometheus.yaml — HTTP request rate trigger with Cron floor for business hours
- scaledobject-kafka.yaml — Kafka consumer lag trigger with SASL/TLS auth
- scaledjob-sqs.yaml — batch ScaledJob (one Job per SQS message) with security hardening

[1.13.0] - 2026-05-16

Added

Triage Command

New command /platform-skills:triage — triages any PR comment (from a bot, CI tool, or human reviewer) using the gh CLI directly inside Claude Code. No workflow or secrets required.

commands/triage.md — full skill definition with two modes:
- <PR number> <comment ID> — triage one specific comment
- --all <PR number> — triage every unresolved thread on a PR in one pass
Classifies each comment as ACTIONABLE_FIX, INFORMATIONAL, or NOT_APPLICABLE
For ACTIONABLE_FIX: reads the referenced file, applies the minimal fix with the Edit tool, commits and pushes to the PR branch
Posts a reply on the thread explaining the decision with classification-specific closing lines
Resolves the thread via gh api graphql resolveReviewThread mutation
Prints a summary table when running --all mode
examples/triage/ — 11 fully documented scenarios:
- actionable-fix/ — wildcard IAM (SOC 2 CC6.1), missing resource limits, deprecated networking.k8s.io/v1beta1 API, wrong liveness probe path, hardcoded secret reference
- informational/ — replica count question, PDB follow-up suggestion, KMS rotation evidence request
- not-applicable/ — CI status bot message, already-fixed-in-later-commit, comment on file not in PR diff

Review Bot Comment Mode

Enhanced /platform-skills:review with structured output for automated PR comment workflows.

commands/review.md — adds --bot flag and GitHub-flavoured markdown output format
Defines MERGE_READY / NEEDS_FIX / BLOCKED result values based on finding severity
Uses  HTML marker for idempotent comment updates on re-push
Severity table with emoji labels (🔴 Critical, 🟡 Improvement, 🔵 Note) and file/line references

Wiki

Full GitHub wiki published at https://github.com/nitinjain999/platform-skills/wiki
50 pages covering all 28 commands and 25 domains
Navigation index, Quick Start, Installation, Editor Integrations, How It Works, Contributing
One page per command with Claude Code slash syntax, Copilot Chat prompts, and what gets checked
One page per domain with key patterns, code examples, and links to the full reference guide

[1.12.0] - 2026-05-16

Added

PR Review Domain

New Domain 21: PR Review — comprehensive pre-merge risk review across six dimensions. Each mode inspects the diff and current file state, reports findings with severity, and recommends concrete fixes.

references/pr-review.md — cost impact (compute, storage, network spend delta with AWS pricing reference), environment drift detection (Kustomize overlay gaps, Helm values sibling mismatches, Terraform workspace drift, GitOps source drift, feature flag drift), ownership and governance (CODEOWNERS gaps, missing team labels, Terraform module README requirements, PR governance checklist), SOC 2 compliance mapping (CC6.1–CC8.1, A1.2 with Terraform patterns and aws CLI evidence collection commands), deprecated API / version hygiene (Kubernetes API removal timeline, Terraform provider constraint patterns, GitHub Actions SHA pinning, container digest pinning), rollback feasibility scoring (reversibility matrix: FULL/PARTIAL/MANUAL/NONE, blast radius: LOCAL/CLUSTER/PLATFORM/DATA, pre-merge requirements for high-risk changes, GitOps rollback patterns), and bot comment triage workflow with GH CLI commands for resolving threads
/platform-skills:pr-review (commands/pr-review.md) — six modes: cost (spend delta with severity per resource), drift (environment alignment across overlays, values files, Terraform workspaces), ownership (CODEOWNERS, team labels, module README, PR governance), compliance (SOC 2 control impact with remediation and auditor evidence commands), upgrade (deprecated Kubernetes APIs, loose Terraform constraints, floating action versions, :latest images), rollback (reversibility and blast radius score per change with Rollback Risk Score: 🟢/🟡/🔴), full (all six modes with Merge Readiness Summary)

[1.11.0] - 2026-05-16

Added

Kyverno Domain

New Domain 20: Kyverno — write and maintain Kubernetes-native admission policies using the CEL-based types (ValidatingPolicy, MutatingPolicy, GeneratingPolicy, ImageValidatingPolicy) introduced in Kyverno v1.14–v1.15. Covers Audit→Deny promotion, PolicyExceptions, PolicyReport analysis, and migration from legacy ClusterPolicy or PodSecurityPolicy. All policies use apiVersion: policies.kyverno.io/v1.

references/kyverno.md — all five new policy kinds with namespace-scoped variants, matchConstraints/matchConditions CEL filtering, ValidatingPolicy with validations[].expression and messageExpression, MutatingPolicy with ApplyConfiguration (Object{...}) and JSONPatch ([JSONPatch{...}]) patch types, container iteration with .map()/.all(), mutateExisting, GeneratingPolicy with generator.Apply() and dyn() inline resources, ImageValidatingPolicy with verifyImageSignatures(), Cosign keyless/key-based and Notary attestors, Audit→Deny promotion workflow with PolicyReport queries, PolicyException (kyverno.io/v2), kyverno-cli install and test manifest structure, PolicyReport export to CSV, GitHub Actions integration, three full annotated example policies, and a troubleshooting table covering 11 failure modes
/platform-skills:kyverno (commands/kyverno.md) — five modes: generate (write CEL-based policy from description, always Audit-first), test (write kyverno-test.yaml with pass/fail/skip fixtures), audit (rank PolicyReport violations and produce Deny promotion plan), debug (diagnose webhook registration, matchConstraints/matchConditions gaps, background scan, CEL errors, PolicyException suppression), migrate (legacy ClusterPolicy→new-type translation table, PSP field mapping, Gatekeeper ConstraintTemplate translation)
examples/kyverno/ — two ValidatingPolicy examples (require-team-labels.yaml, disallow-privileged-containers.yaml), one GeneratingPolicy example (generate-default-networkpolicy.yaml), four test resource fixtures, and a kyverno-test.yaml that exercises pass and fail cases for each validate policy

[1.10.0] - 2026-05-09

Added

OPA / Conftest Domain

New Domain 19: OPA / Conftest — generate Rego policies, write unit tests, run the full validation pipeline (fmt → regal lint → conftest verify → integration test), explain existing policies in plain English, and debug why a rule is not firing.

references/opa.md — Rego v1 syntax, METADATA blocks, rule types (deny/warn/violation), package namespacing, input shape analysis with conftest parse, policy examples (Terraform IAM, Kubernetes pod security), unit test patterns with input fixtures, full validation pipeline (conftest fmt, regal lint, conftest verify, integration test), GitHub Actions workflow, shared data.* allow-lists, and troubleshooting table
/platform-skills:opa (commands/opa.md) — five modes: generate (write Rego from description with correct structure), test (write _test.rego unit tests with fixtures), validate (run fmt/regal/verify/integration pipeline), explain (translate policy to plain English), debug (diagnose namespace mismatch, wrong rule name, input shape issues)
examples/opa/conftest/ — working S3 encryption policy (deny_unencrypted_s3.rego), full unit test suite (deny_unencrypted_s3_test.rego), sample Terraform input, and .regal/config.yaml

[1.9.0] - 2026-05-09

Added

Conventional Commits Domain

New Domain 18: Conventional Commits — analyze git diffs or staged changes, generate commit messages that explain WHY a change was made, intelligently stage files for atomic commits, and validate messages against the Conventional Commits 1.0.0 specification.

references/conventional-commits.md — message structure (subject/body/footer rules), type classification table with decision guidance, scope rules, breaking change patterns with examples, atomic commit strategy with git add -p, full examples for fix/feat/chore/breaking/revert, commitlint setup with scope allow-list, husky git hook configuration, GitHub Actions PR title lint workflow, semantic-release configuration with version bump rules, and a validation rules table
/platform-skills:commit (commands/commit.md) — four modes: analyze (classify type/scope/breaking from diff), generate (produce full conventional commit message), stage (intelligently group and stage files for atomic commits), validate (check an existing message against the spec)
examples/conventional-commits/commitlint/ — commitlint config with scope allow-list, husky commit-msg hook, and package.json for installation

Datadog Domain

New Domain 16: Datadog — deploy and configure the Datadog Agent on Kubernetes, instrument services with APM and log correlation, and manage monitors, dashboards, SLOs, and synthetic tests as Terraform-managed resources.

references/datadog.md — Agent Helm setup, APM instrumentation (Node.js/Python), Unified Service Tagging, Log Management with processing rules, Monitors (Terraform), Dashboards (Terraform), SLOs, Synthetic tests, and troubleshooting table
/platform-skills:datadog (commands/datadog.md) — six modes: setup, instrument, monitor, dashboard, slo, debug
examples/datadog/terraform/monitors.tf — Terraform monitors for error rate and latency, plus SLO resource

Dynatrace Domain

New Domain 17: Dynatrace — deploy OneAgent via the Kubernetes Operator with automatic injection, configure anomaly detection, ingest custom metrics, define SLOs, and manage dashboards and alerting profiles with the Dynatrace Terraform provider.

references/dynatrace.md — Operator and DynaKube CR setup, Node.js/Python/Java SDK instrumentation, Log Monitoring, custom metrics via MINT API, SLOs, Terraform provider (anomaly detection, SLOs, alerting), Davis AI problem feeds, troubleshooting table, and token scopes
/platform-skills:dynatrace (commands/dynatrace.md) — six modes: setup, instrument, monitor, slo, dashboard, debug
examples/dynatrace/operator/dynakube.yaml — production DynaKube CR with cloudNativeFullStack injection and ActiveGate

[1.8.0] - 2026-05-08

Added

MCP Domain (Model Context Protocol)

New Domain 13: MCP (Model Context Protocol) — build, review, and debug MCP servers and clients covering TypeScript and Python SDKs, Zod/Pydantic schema validation, stdio/HTTP/SSE transports, protocol compliance, authentication, and rate limiting.

references/mcp.md — protocol fundamentals, TypeScript SDK patterns, Python FastMCP patterns, schema design, error handling, testing with MCP Inspector, security, and deployment checklist
/platform-skills:mcp (commands/mcp.md) — three modes: create (scaffold production-ready server), review (protocol compliance and security audit), debug (diagnose transport/protocol/schema/handler failures)
examples/mcp/docs-server/ — complete TypeScript MCP server with search_docs and get_doc tools and a docs://index resource

Observability Domain

New Domain 14: Observability — instrument services with structured logging, Prometheus metrics, and OpenTelemetry tracing; build Grafana dashboards, write alerting rules, run k6 load tests, and plan capacity.

references/observability.md — Pino/structlog structured logging, prom-client/prometheus-client metrics, OpenTelemetry tracing, RED/USE method alerting rules, Grafana dashboard templates, k6 load testing, and capacity planning formulas
/platform-skills:observability (commands/observability.md) — five modes: instrument, dashboard, alert, loadtest, capacity
examples/observability/prometheus-alerts/ — RED method Prometheus alerting rules for a production service

Documentation Domain

New Domain 15: Documentation — generate and validate inline docstrings (Google/NumPy/Sphinx/JSDoc), OpenAPI 3.1 specifications, documentation sites (MkDocs, TypeDoc), and developer getting started guides.

references/documentation.md — Google/NumPy/Sphinx docstring styles, TypeScript JSDoc, OpenAPI 3.1 spec with shared components, FastAPI and NestJS auto-documentation, coverage measurement, MkDocs site setup, and guide structure
/platform-skills:document (commands/document.md) — four modes: docstrings, openapi, site, guide
examples/documentation/openapi-spec/ — complete OpenAPI 3.1 Orders API with shared components, security schemes, and response definitions

[1.7.0] - 2026-05-08

Added

Helm Domain (Helmcheck)

New Domain 12: Helm (Helmcheck) — production-grade Helm chart development covering scaffolding, values design, template patterns, dependency management, security hardening, and the full lint/validation pipeline.

Reference guide:

references/helm.md — complete template patterns, standard _helpers.tpl, security-hardened Deployment baseline, conditional resource templates (Ingress, HPA, PDB, NetworkPolicy), values.yaml design principles, values.schema.json type safety, dependency management, lint/validation pipeline, and GitOps integration (Flux HelmRelease, Argo CD Application)

Slash command:

/platform-skills:helmcheck (commands/helmcheck.md) — three modes: create (scaffold a production-ready chart), review (analyse chart structure and quality), security (audit RBAC, pod security, network policies, and secrets handling)

Example chart:

examples/helm/web-service/ — full production chart: security-hardened Deployment, Service, Ingress, ServiceAccount, HPA, PDB, NetworkPolicy, values.schema.json, and helm test pod compliant with restricted PodSecurity

Platform rules added:

Helm lint pipeline rule: enforce helm lint --strict → helm template --debug → kubeconform -strict -summary → checkov → helm test in order; fail CI on any helm lint --strict warning

Tests:

tests/validate-helmcheck.sh — 62 checks covering command structure, SKILL.md integration, reference guide sections, chart file presence, Chart.yaml correctness, schema validity, security baselines, template safety, and HOW_IT_WORKS completeness

Documentation

HOW_IT_WORKS.md — explains how AI coding agents work, how skills activate, how to write effective prompts, how the review workflow runs, and what the agent cannot do

[1.6.0] - 2026-04-11

Added

Compliance Domain (SOC 2 for Terraform)

New Domain 11: Compliance — SOC 2 Trust Services Criteria mapped to Terraform patterns, covering 11 criteria across access, encryption, detection, logging, change management, and availability.

Reference guide:

references/compliance.md — full SOC 2 TSC coverage with Terraform patterns, Checkov rule IDs, AWS CLI evidence commands, and a pre-audit readiness checklist

Criteria covered with Terraform patterns, Checkov rules, and evidence commands:

CC6.1 — IAM least privilege: scoped policies, IRSA, SCPs denying privilege escalation
CC6.2 — Authentication: MFA-enforcement SCP, GitHub Actions OIDC (no static credentials)
CC6.3 — Access removal: access key rotation via Config access-keys-rotated rule
CC6.6 — Network security: security groups, private subnets, VPC flow logs, WAF (aws_wafv2_web_acl) with rate limiting and AWS managed rule groups
CC6.7 — Encryption: S3, RDS, EBS, KMS; extended to DynamoDB, ECR, ElastiCache, OpenSearch, Kinesis, EFS, and Redshift
CC6.8 — Vulnerability management: aws_ecr_registry_scanning_configuration (ENHANCED + CONTINUOUS_SCAN), aws_inspector2_enabler, aws_ssm_patch_baseline
CC7.1 — Detection: aws_guardduty_detector (S3, EKS, malware sources), 14 CIS Benchmark CloudWatch metric filters and alarms (CIS 3.1–3.14), aws_securityhub_account with AWS Foundational and CIS standards
CC7.2 — Audit logging: multi-region CloudTrail with KMS encryption, S3 object lock (COMPLIANCE mode, 365-day), AWS Config recorder with 12 SOC 2-relevant managed rules, VPC flow logs
CC7.3 — Incident response: KMS-encrypted SNS topic with email and PagerDuty subscriptions, GuardDuty HIGH/CRITICAL findings → EventBridge → SNS pipeline, Config delivery channel with SNS notification
CC8.1 — Change management: S3 + DynamoDB state locking, PR-gated Terraform plan/apply workflow with plan posted as PR comment
A1.1 — Availability: multi-AZ EKS node groups, RDS multi-AZ
A1.2 / A1.3 — Backup and recovery: aws_backup_plan (daily 35-day + monthly 1-year), aws_backup_vault_lock_configuration (COMPLIANCE mode, 35-day minimum), cross-region copy for DR, tag-based backup selection

Slash command:

/platform-skills:compliance (commands/compliance.md) — five modes: gap (find audit blockers), control (implement a specific criterion), evidence (generate audit-ready CLI commands), remediate (fix a Checkov finding with blast-radius analysis), checklist (full SOC 2 readiness review)

Examples:

examples/compliance/checkov-config.yaml — Checkov config with all SOC 2 check IDs grouped by criterion and documented suppression format
examples/compliance/iam/ — IRSA application role, GitHub Actions OIDC trust (plan + apply), SCPs for MFA enforcement and privilege escalation prevention
examples/compliance/logging/ — CloudTrail with object lock, AWS Config recorder with 12 managed rules, VPC flow logs
examples/compliance/network/ — WAF, security groups, and VPC flow logs
examples/compliance/encryption-data-services/ — DynamoDB, ECR, ElastiCache, OpenSearch, Kinesis, EFS, and Redshift encryption/logging controls
examples/compliance/vulnerability/ — Inspector v2, ECR enhanced scanning, and SSM patching
examples/compliance/detection/ — GuardDuty, CIS CloudWatch alarms, and Security Hub
examples/compliance/incident-response/ — KMS-encrypted SNS, EventBridge alert routing, and PagerDuty subscription
examples/compliance/backup/ — AWS Backup plan, vault lock, and cross-region DR copies

Infrastructure updates:

Domain 11 (Compliance) added to skills/platform-skills/SKILL.md tool-routing and reference file list
references/compliance.md added to REQUIRED_REFERENCES in tests/validate-skill.sh
examples/compliance added to EXAMPLE_DOMAINS in tests/validate-skill.sh
compliance, soc2, checkov, audit-logging, iam-least-privilege, cloudtrail added to marketplace.json keywords

[1.5.0] - 2026-04-03

Added

references/platform-mindset.md — product mindset, developer experience (SPACE/DORA metrics, friction audits, golden paths, Backstage), collaboration and communication (RFC/ADR templates, incident updates, blameless post-mortems), and proactive problem-solving (systemic fixes, capacity planning, cost optimisation loop, quarterly platform health review)
/platform-skills:product command (commands/product.md) — applies product thinking to platform work across nine topics: devex, friction, rfc, adr, incident, postmortem, capacity, cost, review
references/linux-networking.md — Linux administration (process/service management, disk, memory/CPU, kernel tuning, permissions) and networking fundamentals (DNS record types, CoreDNS, Route 53, Azure Private DNS, ALB/NLB, Ingress/Gateway API, VPC/VNet CIDR design, subnet tiers, peering vs Transit Gateway, PrivateLink, NSGs, troubleshooting checklists)
/platform-skills:linux command (commands/linux.md) — structured guidance for dns, lb, vpc, process, disk, network, security-groups, and troubleshoot topics
Domain 8 added to SKILL.md tool-routing section: Linux & Networking
Domain 9 added to SKILL.md tool-routing section: Platform Mindset

[1.4.2] - 2026-04-02

Changed

Moved SKILL.md from the repository root to skills/platform-skills/SKILL.md to match the Claude Code proper skill directory format
Updated .claude-plugin/plugin.json to declare "skills": ["./skills/platform-skills"] (directory reference) instead of the old root-level path
Updated tests/validate-skill.sh to reference skills/platform-skills/SKILL.md in all three validation checks

[1.4.1] - 2026-04-02

Added

.claude-plugin/plugin.json — registration manifest explicitly declaring plugin name, version, author, skill (SKILL.md), and all 5 command paths so Claude Code uses consistent /platform-skills:<name> namespacing regardless of install directory name

[1.4.0] - 2026-04-02

Added

Slash Commands

Five explicit slash commands added under commands/ — invoked as /platform-skills:<name>:

/platform-skills:debug — structured troubleshooting: classifies the problem layer, lists evidence commands, forms a root-cause hypothesis, proposes a fix with validation and rollback
/platform-skills:review — production-readiness review of any manifest, Terraform file, workflow, or Helm values: correctness, security, operational safety, deprecations
/platform-skills:terraform — full Terraform validation pipeline walkthrough (fmt, validate, tflint, tfsec/checkov), blast radius, IAM risk, state impact, module design
/platform-skills:gitops — Flux CD and Argo CD troubleshooting: classifies by source/artifact/reconciliation/runtime layer, provides exact evidence commands, fix, and rollback
/platform-skills:linkerd — Linkerd diagnostics: injection, mTLS, authorization policy, observability, traffic management, multi-cluster, control plane

[1.3.0] - 2026-04-02

Added

Reference Guides

references/linkerd.md: mTLS, proxy injection, authorization policy, traffic management (HTTPRoute, retries, timeouts, canary splits), golden-signal observability (linkerd viz stat/tap), Prometheus PodMonitor integration, multi-cluster mirroring, and 5-scenario troubleshooting guide
Linkerd routing rule added to SKILL.md tool classification
references/linkerd.md added to tests/validate-skill.sh required references

Changed

CONTRIBUTING.md: fixed local install commands (claude plugin install . → claude plugin marketplace add $(pwd) + claude plugin install platform-skills); added upgrade flow; updated What We're Looking For to reflect current roadmap priorities

[1.2.0] - 2026-04-02

Added

Reference Guides

Added RBAC troubleshooting to references/kubernetes.md: 401 vs 403 diagnosis, kubectl auth can-i evidence collection, binding scope matrix (Role/ClusterRole × RoleBinding/ClusterRoleBinding), ClusterRoleBinding audit query
Added image automation section to references/fluxcd.md: ImageRepository, ImagePolicy, ImageUpdateAutomation setup, registry auth table (GHCR/ECR/ACR/GAR/Docker Hub), troubleshooting table, safety rules for staging vs. production promotion
Added references/secrets.md: decision matrix (ESO vs. Sealed Secrets), ESO setup for AWS/Azure/Vault/static credentials, Sealed Secrets seal/rotate/backup workflow, troubleshooting tables for both patterns, least-privilege IAM example, operational rules
Added references/secrets.md to REQUIRED_REFERENCES in tests/validate-skill.sh

Plugin

Bumped marketplace description and keywords to be developer-first (removed enterprise-only framing)
Added keywords: helm, docker, containers, deployment, rbac, secrets, security, gke
Updated SKILL.md activation description and body framing for discoverability by any developer
Fixed assignees format in renovate.json (removed @ prefix)
Fixed kubeconform flag in validate.yml: -skip-kinds → -skip (correct flag name for v0.6.4)
Fixed LICENSE copyright placeholder [yyyy] [name of copyright owner] → 2026 Nitin Jain

[1.1.0] - 2026-04-02

Added

Reference Guides

Expanded AWS reference with tagging guidance: default_tags provider block, ASG propagate_at_launch, EBS/Lambda propagation gaps, AWS Config required-tags rule, cost allocation tag activation steps, org-level tag policy enforcement
Expanded Azure reference with tagging guidance: merge(local.common_tags, {...}) pattern, tag inheritance gap explanation, Azure Policy deny/modify enforcement, remediation task for existing resources, AKS managed resource group tagging
Added tagging rule to SKILL.md: enforce a baseline via provider-level mechanisms; specific keys are an organizational decision

Example Assets

Added real example assets for previously stub domains: examples/kubernetes/*.yaml (4 files), examples/openshift/*.yaml (2 files), examples/aws/iam/*.json (2 files), examples/azure/workload-identity/ (main.tf + serviceaccount.yaml)

Testing

Added tests/validate-skill.sh — checks SKILL.md frontmatter, all reference files exist, each example domain has at least one asset beyond README.md, SKILL.md references every reference file; wired into validate.yml as a blocking CI job

Developer Experience

Added .github/copilot-instructions.md — GitHub Copilot automatically applies Platform Skills patterns (no Claude Code required)
Added VSCODE_INTEGRATION.md — comprehensive guide for VSCode with Claude Code extension, GitHub Copilot split-screen, and browser workflows
Added QUICKSTART.md — 5-minute install and first-use guide
Added INSTALLATION.md — full installation methods, team setup, troubleshooting

Dependency Management

Scoped Renovate automerge catch-all rule to explicit managers (terraform, helmv3, kubernetes, docker-compose) to prevent accidental automerge of GitHub Actions

Fixed

CI/CD Workflows

Fixed validate.yml and release.yml marketplace.json validation — field paths now match marketplace format (plugins[0].version, plugins[0].description, etc.)
Replaced deprecated actions/create-release (archived action) with gh release create CLI in release.yml
SHA-pinned hashicorp/setup-terraform in validate.yml — floating @v4.0.0 tag was causing the workflow's own security check to fail
Removed unused actions/setup-node step from publish-marketplace job

Documentation

Fixed dead Discord # placeholder link in README — replaced with real URL
Added QUICKSTART.md, INSTALLATION.md, and VSCODE_INTEGRATION.md to README navigation and repository structure table
Fixed hardcoded v1.0.0 examples in CHANGELOG release checklist — replaced with vX.Y.Z placeholder
Fixed Argo CD example Application path: fields — were pointing at Flux monorepo paths instead of Argo CD-appropriate paths
Fixed Azure workload-identity main.tf — added required_providers block with minimum azurerm >= 3.87.0

Changed

README repositioned as handbook-first — skill layer described as optional, not the primary product
Updated README repository structure tree to show real example files rather than README.md-only entries
Trimmed VSCode install detail from README to a single pointer to VSCODE_INTEGRATION.md — install story now lives in one place
Marketplace distribution: personal marketplace now named platform-skills (was platform-skills-marketplace)
Owner contact updated to personal email
All CLI command references updated: binary is claude, subcommand is claude plugin (not claude-code skill)

[1.0.0] - 2026-04-02

Initial release of Platform Skills - A comprehensive Claude Agent Skill for platform engineering across 8 domains: Kubernetes, OpenShift, Argo CD, Flux CD, AWS, Azure, Terraform, and GitHub Actions.

Added

Automation & CI/CD

GitHub Actions workflow for automated releases (.github/workflows/release.yml)
- Version validation and consistency checks
- Quality checks (markdown, YAML, Terraform, security)
- Automatic GitHub Release creation
- Marketplace publication preparation
GitHub Actions workflow for continuous validation (.github/workflows/validate.yml)
- Repository structure validation
- Markdown linting and link checking
- YAML and Kubernetes manifest validation
- Terraform format and validation checks
- Security scanning for secrets and action pinning
Renovate configuration for automated dependency updates (renovate.json)
- GitHub Actions SHA pinning with automatic updates
- Terraform provider version management
- Helm chart version tracking
- Container image update monitoring
- Security vulnerability alerts
Consolidated release process documentation in CONTRIBUTING.md
Removed redundant internal documentation (QUALITY_ASSURANCE.md, WORKFLOWS_SUMMARY.md)
Cleaned up experimental files for production release
Clarified distribution model: GitHub repository as primary distribution
Fixed README structure (removed duplicate Installation headers)
Updated marketplace.json with accurate repository URL and description

Core Features

Initial release of Platform Skills
Core skill definition in SKILL.md with activation triggers and troubleshooting framework
GETTING_STARTED.md for new user onboarding
Reference guides for 8 domains:
- Platform Operating Model - Cross-cutting architecture and ownership patterns
- Kubernetes - Cluster baselines, workload patterns, and policy defaults
- OpenShift - Routes, SCC-aware workload design, operators, and tenancy
- Argo CD - Projects, app-of-apps, ApplicationSet patterns, and promotion flows
- Flux CD - GitOps reconciliation and repository structure patterns
- AWS - Account model, EKS, IAM, and cloud foundations
- Azure - Subscription model, AKS, RBAC, and resource management
- Terraform - Module architecture, state management, and validation
- GitHub Actions - Workflow security, reusability, and promotion patterns
Working examples for all 8 domains:
- Kubernetes: Namespace baselines, deployment patterns, network policy, pod disruption budgets
- OpenShift: Routes, quotas, and platform-specific security adaptation
- Argo CD: App-of-apps root application manifest
- Flux CD: Complete monorepo structure with production and staging environments
- AWS: IAM, VPC, EKS patterns
- Azure: AKS, workload identity
- Terraform: Production EKS module, multi-environment structures
- GitHub Actions: Complete CI/CD pipelines, Flux sync validation, container builds
Comprehensive README with installation and usage instructions
Contributing guidelines for community participation
Skill development guide (CLAUDE.md) with philosophy and patterns
Apache-2.0 license with NOTICE file
Claude Code marketplace integration (dual distribution: marketplace + local install)

Core Principles Established

Production-first mindset with blast radius awareness
Root-cause analysis over symptom treatment
Explicit rollback plans for all risky operations
Security by default with least-privilege patterns
Progressive disclosure from quick answers to deep dives

Problem Classification Framework

Kubernetes: Baseline standards, workload patterns, security controls, operational consistency
OpenShift: Platform-specific constraints, GitOps integration, security and tenancy, day-2 operations
Argo CD: Repository patterns, reconciliation model, promotion model, safety rules
Flux CD: Source, Artifact, Reconciliation, Chart Rendering, Runtime issues
AWS: Access/Auth, Network, Service-Specific, Cost, Compliance
Azure: Access/Auth, Network, Service-Specific, Cost, Compliance
Terraform: State Conflicts, Plan Failures, Apply Failures, Drift, Module Design
GitHub Actions: Workflow Syntax, Permissions, Performance, Security, Reliability

Troubleshooting Structure

Symptom identification
Evidence collection commands
Hypothesis formation
Diagnostic validation
Specific fix with justification
Verification steps
Prevention strategies
Rollback procedures

Best Practices Documented

Kubernetes: Platform baselines, workload patterns, security policies, operational rules
OpenShift: Route patterns, SCC compatibility, operator usage, tenant isolation
Argo CD: App-of-apps design, ApplicationSet patterns, sync control, promotion flows
Flux CD: Reconciliation patterns, repository structures, multi-tenancy, progressive delivery
AWS: IAM least privilege, tagging standards, EKS patterns, OIDC federation
Azure: Managed identities, policy enforcement, AKS configuration, workload identity
Terraform: Module conventions, state isolation, validation pipelines, testing strategies
GitHub Actions: Security controls, reusable workflows, OIDC authentication, SHA-pinned actions

Quality & Security Improvements

Fixed workflow validation subshell issues - error counts now properly propagate
Made Terraform validation blocking in release workflow
SHA-pinned all GitHub Actions in examples (no mutable @v3/@v4 tags)
Fixed tflint_version from "latest" to specific version (v0.50.3)
Fixed malformed nested Markdown in contribution guidelines
Updated all reference files to be reconciler-agnostic (supports both Flux CD and Argo CD)
Fixed Argo CD example paths to reference existing repository structure
Clarified dual distribution model: Claude marketplace (primary) + local installation (customization)

Roadmap Items Completed

✅ Added Argo CD patterns alongside Flux CD
✅ Added Kubernetes platform baseline patterns
✅ Added OpenShift operating patterns

Release Process

Version Numbering

Major (X.0.0): Breaking changes to skill interface or structure
Minor (1.X.0): New patterns, reference guides, or significant enhancements
Patch (1.0.X): Bug fixes, clarifications, or minor updates

What Warrants a Release

Major Release:

Restructuring of core SKILL.md that changes skill behavior
Breaking changes to reference file structure
Removal of deprecated patterns

Minor Release:

New reference guides (e.g., adding GCP patterns)
Significant new troubleshooting sections
New best practice patterns
Tool version updates requiring new approaches

Patch Release:

Typo fixes and clarifications
Broken link fixes
Command syntax updates
Minor example improvements

Release Checklist

Before releasing:

Update version in .claude-plugin/marketplace.json
Update this CHANGELOG with release notes
Verify all examples work with current tool versions
Test skill activation in Claude Code
Review all external links
Tag release in git: git tag -a vX.Y.Z -m "Release vX.Y.Z"
Push tag: git push origin vX.Y.Z
Create GitHub release with changelog excerpt
Update marketplace if applicable

Tool Version Compatibility

Current Testing Matrix

Tool	Version	Last Verified
Flux CD	2.2+	2026-04-02
Terraform	1.5+	2026-04-02
AWS CLI	2.x	2026-04-02
Azure CLI	2.50+	2026-04-02
kubectl	1.28+	2026-04-02
Helm	3.12+	2026-04-02

Deprecation Notices

None currently.

Migration Guides

Upgrading from Pre-1.0

N/A - Initial release

Contributors

Thank you to all contributors who helped build Platform Skills:

@nitinjain999 - Initial skill design and implementation

See CONTRIBUTING.md to join this list!

Links

.claude-plugin

.github

docs

examples

scripts

skills

tests

EDITOR_INTEGRATIONS.md

nitinjain999/platform-skills

CHANGELOG.md

Changelog

[1.28.0] - 2026-05-30

Added

[1.27.0] - 2026-05-28

Added

Contributors

[1.26.0] - 2026-05-26

Added

Contributors

[1.25.20] - 2026-05-24

Added

Fixed

[1.26.14] - 2026-05-24

Fixed

[1.26.13] - 2026-05-24

Fixed

[1.25.4] - 2026-05-24

Changed

[1.25.3] - 2026-05-24

Changed

[1.25.2] - 2026-05-24

Fixed

[1.25.1] - 2026-05-24

Fixed

[1.25.0] - 2026-05-24

Added

FluxCD reference suite — 8 new deep-dive references + Datadog/Dynatrace integration

FluxCD troubleshooting reference + cluster-debug improvements

FluxCD audit tooling — upstream scripts, migration guide, security checklist

FluxCD Skills Enhancement (Domain 29 — GitOps Audit)

FluxCD command router + examples rename

Changed

[1.24.0] - 2026-05-24

Added

Composite GitHub Actions (Domain 28)

Contributors

[1.23.0] - 2026-05-23

Changed

Self-Improve: explicit init global / init local subcommands

Self-Improve: new subcommands — status and migrate

Self-Improve: promote supports global target

Self-Improve: review flags stale resolved entries

Self-Improve: resume warns on stale working buffer

Self-Improve: SKILL.md description updates

Self-Improve: global path resolution for cross-project learnings

Self-Improve: cross-platform support (Windows and Linux distributions)

Self-Improve: path resolution and hook setup fixes

Contributors

[1.22.0] - 2026-05-22

Added

AWS CloudFront + WAF + Lambda@Edge (Domains 33–34)

[1.21.0] - 2026-05-21

Added

Awesome Docs (Domain 27)

[1.20.0] - 2026-05-20

Added

Datadog Labs + LLM Observability

[1.19.0] - 2026-05-20

Added

Chaos Engineering (Domain 27)

DORA Metrics (Domain 28)

[1.18.0] - 2026-05-20

Added

Self-Improvement: SESSION-STATE, Daily Notes, VBR, Context Threshold

[1.17.0] - 2026-05-20

Added

Changed

[1.16.0] - 2026-05-20

Added

Supply Chain Security — Domain 25

Runtime Security — Domain 26

[1.15.0] - 2026-05-20

Added

Agent Self-Improvement — Domain 24

[1.14.0] - 2026-05-16

Added

KEDA — Kubernetes Event-Driven Autoscaling

[1.13.0] - 2026-05-16

Self-Improve: explicit `init global` / `init local` subcommands

Self-Improve: new subcommands — `status` and `migrate`

Self-Improve: `promote` supports global target

Self-Improve: `review` flags stale resolved entries

Self-Improve: `resume` warns on stale working buffer