
tessleng/skill-insights

Scan a directory or workspace for SKILL.md files across all agents and repos, capture supporting files (references, scripts, linked docs), dedupe vendored copies, enrich each Tessl tile with registry signals, and emit a canonical JSON inventory validated by JSON Schema. Then run four analytical phases in parallel against the inventory — staleness + git provenance (history, broken refs, contributors), quality (Tessl `skill review`), duplicates (similarity + LLM judgement), registry-search (per-standalone-skill registry suggestions, HTTP only) — and render a self-contained interactive HTML report with a top-of-report health overview, top-issues panel, recently-changed list, and per-tessl.json manifests view.

Registry signals: aggregate score 84 · Quality 90% (does it follow best practices?) · Impact 97% · 1.44x uplift (average score across 2 eval scenarios) · Security by Snyk: Advisory, suggest reviewing before use.

Skill Insights

A Tessl tile that scans one or more repositories for SKILL.md files, enriches every Tessl tile it finds with registry signals (quality, security, uplift, eval coverage, version drift, context cost), then runs four analytical phases — staleness + git provenance, quality, duplicates, registry-search — and produces a self-contained interactive HTML report. All cross-phase JSON outputs are validated against JSON Schemas at the IO boundary.

A separate sibling skill, posthog-skill-query, is standalone (not part of the orchestrator) and pulls org-wide skill / MCP / session telemetry from PostHog into its own JSON + HTML artifact. It exists to feed a downstream cross-reference step alongside the per-repo phases.

The unique value is multi-repo aggregate visibility — Tessl's registry knows about one tile at a time; skill-insights spans the full skill estate across repos, including vanilla Claude / Cursor skills that aren't on the registry at all.

Pipeline

```
discover-skills → discovery.json (skills[] + tiles[] with full enrichment)
                     ↓
       ┌────────┬────┴────┬────────────┐
       ↓        ↓         ↓            ↓
   staleness  quality  duplicates  registry-search
   (script)  (script+LLM) (script+LLM) (script+HTTP)
       ↓        ↓         ↓            ↓
       └────────┴────┬────┴────────────┘
                     ↓
                render report
```

Each analytical phase reads discovery.json independently and writes its own output JSON. They never share state. The render step inlines all five JSONs into one self-contained HTML report.

posthog-skill-query runs outside this orchestrated pipeline. It writes a separate org_usage.json + org_usage.html and reads no other phase's output. See its section below.

Running it

From any directory:

```
/tessl__run-skill-insights
```

By default scans $(pwd). The discovery script supports three repo-selection modes:

| Mode | Trigger | Behaviour |
| --- | --- | --- |
| Single repo | scan-root is a git repo | Just that repo |
| Workspace auto | scan-root is a parent dir with git children | Every immediate .git-having child |
| Explicit selection | One or more --repo PATH flags | Exactly the listed repos; ignores other children of scan-root |

```shell
# Cherry-pick: scan only monorepo + lightdash from a workspace dir
discover_skills.py \
  --scan-root ~/repos \
  --repo ~/repos/monorepo \
  --repo ~/repos/lightdash
```

Output lands in .skill-insights/:

```
.skill-insights/
├── discovery.json        ← canonical inventory + per-tile-instance enrichment (schema 1.4)
├── staleness.json        ← per-skill staleness scores + git provenance + estate summary (schema 1.1)
├── quality.json          ← per-skill review scores + per-tile rollup (schema 2.0)
├── duplicates.json       ← duplicate clusters + overlapping pairs (schema 1.0)
├── registry-search.json  ← per-standalone-skill registry suggestions (schema 1.2)
└── report.html           ← self-contained interactive report
```

The duplicates phase creates intermediate duplicates-prompts/ and duplicates-verdicts/ directories during its subagent dispatch. The finalize step deletes them once the verdicts are rolled up; pass --keep-intermediates to preserve them.

Tier classification

Every skill is stamped with a tier reflecting how it lives in the repo:

| Tier | What it means |
| --- | --- |
| published_tile | Owned by a Tessl tile installed from the registry (.tessl/tiles/...) |
| authored_tile | Owned by a Tessl tile authored locally (source: "file:..." in tessl.json, or a tile.json under tiles/ with no declaration) |
| github_tile | Owned by a Tessl tile installed from a GitHub source |
| claude_plugin | Owned by a Claude plugin (.claude-plugin/plugin.json) |
| non_tile | Loose SKILL.md not part of any tile/plugin (e.g. .claude/skills/foo/SKILL.md in a vanilla Claude repo) |

Tiles get the same tier (minus non_tile). Tile records are materialised instances, so a local authored source and an installed .tessl/ copy of the same tile name stay separate. A locally-authored tile (tier: authored_tile) can also be published_to_registry: true — that's a common pattern for Tessl-internal tiles authored AND published from the same monorepo. The two flags are orthogonal.

Each phase, briefly

Discovery (deterministic, with optional Tessl CLI enrichment)

skills/discover-skills/scripts/discover_skills.py (Python stdlib only).

Per skill (always available):

  • Glob walk for SKILL.md files; symlink-following with per-chain cycle detection
  • Content hashing → vendored dedup within a repo
  • Frontmatter parsing
  • Supporting-file capture: references/, scripts/, markdown links, @imports, inline backtick paths
  • Tier stamping per skill
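The content-hash dedup step can be sketched as follows. This is a minimal illustration, not the real script: the hash algorithm (sha256 here), the injected `read_bytes` reader, and the return shape are all assumptions.

```python
import hashlib
from collections import defaultdict

def dedupe_by_content(skill_paths, read_bytes):
    """Group SKILL.md files by content hash within one repo.

    The first path seen for a given digest is treated as canonical;
    later identical files are recorded as vendored copies.
    `read_bytes(path) -> bytes` is injected so the sketch is testable.
    """
    canonical_by_hash = {}
    vendored = defaultdict(list)
    for path in skill_paths:
        digest = hashlib.sha256(read_bytes(path)).hexdigest()
        if digest in canonical_by_hash:
            vendored[canonical_by_hash[digest]].append(path)  # duplicate content
        else:
            canonical_by_hash[digest] = path
    return dict(vendored)
```

The real discovery script also dedupes symlinked `.claude/skills/tessl__*` copies against their canonical source; that resolution step is omitted here.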

Per tile (when tessl CLI is available + authenticated):

  • Registry call to GET /v1/tiles/{ws}/{name}/versions/{ver} for every tile that resolves on the registry — pulls aggregate / quality / impact / security / eval scores / uplift multiplier / moderation / archived status / fingerprint
  • One tessl outdated --json per scanned repo, mapped per tile to detect "newer version available"
  • One tessl tile lint per tile, parsed into front-loaded + on-demand token totals (per-skill breakdown)

Broken-reference detection (git-history backed):

A path-shaped reference (markdown link, @import, or inline backtick) is flagged as broken iff:

  1. It resolves to a repo-relative path
  2. Git history shows the file was tracked at some point
  3. The file no longer exists at HEAD

References to paths that were never tracked are silently ignored — no false positives from prose mentioning external code or package names. No extension allowlist; the repo's own git history is the oracle.
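The three conditions can be sketched as a single predicate. The exact git invocation below is an assumption, not a transcript of discover_skills.py; the point is that git history, not an extension allowlist, decides.

```python
import subprocess
from pathlib import Path

def is_broken_ref(repo_root: str, rel_path: str) -> bool:
    """True iff git ever tracked rel_path but the file is now gone.

    Condition 1 (repo-relative resolution) is assumed to have happened
    before this call; conditions 2 and 3 are checked here.
    """
    if (Path(repo_root) / rel_path).exists():
        return False  # still present: not broken
    out = subprocess.run(
        ["git", "-C", repo_root, "log", "--oneline", "-1", "--", rel_path],
        capture_output=True, text=True,
    )
    # Non-empty history means git tracked the file at some point.
    return bool(out.stdout.strip())
```

A path that was never tracked yields empty history and is silently ignored, which is what keeps prose mentions of external files from producing false positives.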

See references/schemas/discovery.schema.json for the full output contract. The phase script validates its output against this schema before writing.

Staleness + git provenance (deterministic, no LLM)

skills/analyze-skill-staleness/scripts/analyze_staleness.py. Reads discovery.json, runs a single git log per skill (with path-priority fallback so vendored gitignored copies still find their tracked source). One stream gives us age, commit count, and per-skill provenance (created_by, last_modified_by, top contributors, recent commits) all derived in-process. Extracts broken refs from discovery warnings, computes a 0-100 staleness score per skill.

Score factors:

  • Age tiers (>30d / >90d / >180d / >365d, compounding)
  • Broken-reference count (capped at +30)
  • Repo-relative age (>1.5× repo median + already old)
  • Never-updated (commit_count == 1)
  • Registry update available for the owning tile (+15)
  • No git history at all (+20)

Buckets: fresh / warm / stale / ancient / unknown.
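The scoring shape can be illustrated like this. Only the +30 broken-ref cap, the +15 registry-update bump, and the +20 no-history penalty are stated above; every other weight, and the bucket boundaries, are illustrative assumptions, not the script's actual numbers.

```python
def staleness_score(age_days, broken_refs, commit_count,
                    repo_median_age, update_available, has_history=True):
    """Toy recomputation of the factor list above (0-100)."""
    score = 0 if has_history else 20          # +20 stated in the text
    for threshold in (30, 90, 180, 365):      # compounding age tiers
        if age_days > threshold:
            score += 10                       # per-tier weight: assumption
    score += min(5 * broken_refs, 30)         # capped at +30 per the text
    if repo_median_age and age_days > 1.5 * repo_median_age and age_days > 90:
        score += 10                           # repo-relative age: assumption
    if commit_count == 1:
        score += 10                           # never-updated: assumption
    if update_available:
        score += 15                           # +15 stated in the text
    return min(score, 100)

def bucket(score):
    # Boundaries are assumptions; the doc only names the buckets.
    if score < 25: return "fresh"
    if score < 50: return "warm"
    if score < 75: return "stale"
    return "ancient"
```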

Provenance fields per skill (added in schema 1.1) — useful for "who created this skill / who last touched it / who else worked on it". null when no git history is available. See references/schemas/staleness.schema.json.

Quality (Tessl CLI driven)

skills/analyze-skill-quality/scripts/analyze_quality.py. Single async script that shells out to tessl skill review --json per skill in parallel batches (concurrency 8 by default).

  • Per-skill: review score (0-100), verdict band, validation results, dual-judge breakdown (description + content) with per-dimension scores and suggestions
  • Per-tile rollup: pulls registry.scores.quality from discovery.tiles[] when available, falls back to mean of per-skill review scores otherwise
  • --skip-published-skills flag: trade per-skill detail for speed by passing through registry tile-level scores instead of running review per skill

Cost: ~12s per skill cold, but LiteLLM caches at the proxy for 24h — repeat scans on unchanged content are sub-second per skill. For 72 skills cold: ~2 min wall-clock. See references/schemas/quality.schema.json.
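The parallel-batch pattern can be sketched with an asyncio semaphore. `run_review` stands in for the `tessl skill review --json` subprocess call; it is injected here so the sketch stays runnable without the CLI.

```python
import asyncio

async def review_all(skill_paths, run_review, concurrency=8):
    """Run one review coroutine per skill, at most `concurrency` in flight."""
    sem = asyncio.Semaphore(concurrency)

    async def one(path):
        async with sem:                     # blocks when 8 reviews are running
            return path, await run_review(path)

    pairs = await asyncio.gather(*(one(p) for p in skill_paths))
    return dict(pairs)
```

With concurrency 8 and ~12s per cold review, 72 skills take roughly nine waves, which is where the ~2 min wall-clock figure comes from.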

This phase replaces an earlier custom 6-dimension rubric. We use Tessl's canonical assessment so the scores match what tessl skill publish gates on.

Duplicates (similarity + LLM)

skills/detect-skill-duplicates/scripts/{prepare_duplicates,finalize_duplicates}.py.

Hybrid:

  1. Prep: Jaccard similarity pre-screen over tokenised name + description (and body preview). Generates candidate pairs (within-repo by default; --allow-cross-repo to broaden). Writes one judgement prompt per pair to duplicates-prompts/<idx>.txt.
  2. Dispatch: one subagent per pair, all dispatched in a single parallel batch (default --max-pairs 10, so 10 subagents at most). Each returns duplicate / overlapping / independent.
  3. Finalize: union-find on duplicate verdicts → transitive clusters. Picks dominant_skill_id per cluster. overlapping verdicts go into a separate list. Cleans up prompts/verdicts dirs unless --keep-intermediates.
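The prep-side similarity screen and the finalize-side clustering can both be sketched briefly. Whitespace tokenisation and the cluster output shape are assumptions; the real scripts also screen the body preview and apply a candidate threshold not shown here.

```python
def jaccard(a: str, b: str) -> float:
    """Token-set overlap over whitespace-split, lowercased text."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

class UnionFind:
    def __init__(self):
        self.parent = {}
    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x
    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

def clusters(duplicate_pairs):
    """Transitive closure over per-pair `duplicate` verdicts."""
    uf = UnionFind()
    for a, b in duplicate_pairs:
        uf.union(a, b)
    grouped = {}
    for x in uf.parent:
        grouped.setdefault(uf.find(x), set()).add(x)
    return [sorted(c) for c in grouped.values() if len(c) > 1]
```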

See references/schemas/duplicates.schema.json (final output) and references/schemas/duplicate-verdict.schema.json (per-pair subagent verdict contract).

Registry search (deterministic, HTTP only)

skills/registry-search/scripts/registry_search.py. Reads discovery.json, filters to skills with source_type: "standalone" (skills not yet declared in any tile or agent harness), and queries the Tessl registry's /experimental/search endpoint twice per candidate (once filtered to type=skill, once to type=tile, both searchMode=hybrid, page[size]=1). The higher-aggregate-score hit is recorded as best_match, tagged with kind: "skill" | "tile" so consumers read kind-specific fields without re-querying. Per-request HTTP errors are captured in search_errors[] without aborting the phase.

Anonymous — no Tessl auth required; /experimental/search returns public registry data. Stdlib + HTTP only, no LLM. Runtime is dominated by HTTP latency: ~5-10s for 30 standalone skills at concurrency 8.
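The query construction and best-match selection can be sketched as below. The registry host and the `aggregateScore` field name are assumptions; the query parameters (`searchMode=hybrid`, `page[size]=1`, `type=skill|tile`) follow the description above. Fetching (e.g. via urllib.request) is omitted.

```python
from urllib import parse

BASE = "https://registry.tessl.io"  # assumption: substitute the real host

def search_url(query: str, kind: str) -> str:
    """Build one of the two per-candidate /experimental/search requests."""
    qs = parse.urlencode({
        "query": query, "type": kind,
        "searchMode": "hybrid", "page[size]": 1,
    })
    return f"{BASE}/experimental/search?{qs}"

def best_match(skill_hit, tile_hit):
    """Pick the higher-aggregate-score hit, tagged with its kind."""
    scored = [(k, h) for k, h in (("skill", skill_hit), ("tile", tile_hit)) if h]
    if not scored:
        return None
    kind, hit = max(scored, key=lambda kh: kh[1].get("aggregateScore", 0))
    return {"kind": kind, **hit}
```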

The phase exists to flag where a published Tessl tile or skill could replace a locally-authored standalone one. Skills that already belong to a tile or agent harness (tessl_tile_skill, claude_skill, cursor_skill, agents_skill, claude_plugin_skill) are skipped because the user has already chosen a source for them. The skipped count surfaces as metadata.skills_skipped_non_standalone.

See references/schemas/registry-search.schema.json.

PostHog org usage (standalone sibling)

skills/posthog-skill-query/scripts/{fetch_org_usage,render_org_usage}.py. Pulls organisation-wide skill / MCP / session telemetry from PostHog (the cli:agent-signals:* events emitted by the Tessl CLI) and writes org_usage.json plus an interactive HTML report. Standalone — does not read discovery.json, is not invoked by run-skill-insights, and does not feed into the main report.html.

Captures, per configured time window (default 7d / 30d / 90d):

  • Activations — cli:agent-signals:skill-activation events keyed by (skillTile, skillName), with provider split (claude-code vs cursor-ide).
  • Untiled skills — same event with no skillTile (third-party / private SKILL.md).
  • Loaded skills — derived by unrolling the installedSkills[] array on activation events. Tells you who has each skill available, independent of activation. The same (tile, name) can appear at both project and global scope.
  • Tessl MCP tool activations — cli:agent-signals:mcp-tool-activation per tool (query_library_docs, search, install, etc.).
  • Session aggregates — sums of totalMessages, totalSkillCalls, tesslSkillCalls, tesslMcpCalls, tesslToolCalls from cli:agent-signals:session-processed events.
  • Per-repo views (1.4+) — the same data sliced by properties.gitRepo (repos[] plus tiles_by_repo[], skills_by_repo[], untiled_skills_by_repo[], mcp_tools_by_repo[], session_aggregates_by_repo). Capped at top N repos (default 200, --top-repos) by primary-window activations. Powers the report's chip filter, which lets the reader untick repos client-side without re-querying PostHog. Events with no gitRepo are excluded from these views; their volume remains visible via filter.events_per_window.*.events_no_gitrepo and they still feed the all-repos totals.

"Org" is defined by two filters that combine with OR — an event passes if either matches:

  • --filter-repos (default github.com/tesslio) — properties.gitRepo prefix allowlist.
  • --filter-email-domains (default tessl.io) — person.properties.email @<domain> suffix allowlist.

Pass "" to disable either half. Disabling both pulls every event in the project. The output's filter.events_per_window block reports per-window matched/excluded counts and a per-source split (matched-by-repo / matched-by-email) so coverage is visible.

The output is raw counts only — no shelf_warmer / silent / active buckets, no conversion ratios, no warnings list of judgement calls. Cross-reference and value judgements happen downstream. See references/schemas/org-usage.schema.json.

Run it standalone:

```shell
python3 skills/posthog-skill-query/scripts/fetch_org_usage.py \
  --output /tmp/org_usage.json
python3 skills/posthog-skill-query/scripts/render_org_usage.py \
  --input  /tmp/org_usage.json \
  --output /tmp/org_usage.html
```

Render

The orchestrator inlines all five JSON files into references/report-template.html (placeholders <!--@DISCOVERY_DATA@-->, <!--@STALENESS_DATA@-->, <!--@QUALITY_DATA@-->, <!--@DUPLICATES_DATA@-->, <!--@REGISTRY_SEARCH_DATA@-->) and writes report.html. Self-contained — no external CSS/JS beyond the Atkinson Hyperlegible web font.
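The inlining step amounts to string substitution over the template. The placeholder markers come from the text above; the payload-key naming and the skipped `</script>`-escaping concern are assumptions of this sketch.

```python
import json

PLACEHOLDERS = {
    "<!--@DISCOVERY_DATA@-->": "discovery.json",
    "<!--@STALENESS_DATA@-->": "staleness.json",
    "<!--@QUALITY_DATA@-->": "quality.json",
    "<!--@DUPLICATES_DATA@-->": "duplicates.json",
    "<!--@REGISTRY_SEARCH_DATA@-->": "registry-search.json",
}

def inline_report(template: str, payloads: dict) -> str:
    """Substitute each placeholder with its phase's JSON payload.

    `payloads` maps phase filename -> parsed JSON. A missing phase
    output inlines as {} so the report still renders.
    """
    out = template
    for marker, fname in PLACEHOLDERS.items():
        out = out.replace(marker, json.dumps(payloads.get(fname, {})))
    return out
```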

The report is laid out top-down:

  • Hero — repo/workspace name and skill count.
  • Health overview — three cards (Quality / Staleness / Estate) with stacked-bar distributions for verdict/bucket counts and a key-value summary of estate counts. Each card clicks through to its detail section.
  • Top issues — three columns surfacing the worst offenders: top 5 stale skills, lowest 5 quality skills, biggest duplicate clusters. Skill rows open the per-skill drawer; cluster rows jump to the duplicates section.
  • Recently changed — last 8 modified skills with author + relative date, derived from git_provenance. Click to open drawer.
  • Tessl tiles — table with Tier · Source · Version · Security · Uplift · Outdated · Context cost · Skills columns, grouped by location prefix (tiles/, apps/, packages/, research/, .tessl/tiles/, …). The Source badge marks how the tile was discovered (tessl.json declaration vs. found by walking the SKILL.md tree). Click a row for the full tile drawer (registry signals, eval breakdown, declarations, per-skill context cost).
  • Manifests — every tessl.json the scan found, grouped by repo. Each row shows path, total/resolved/unresolved dependency counts, and the tiles each manifest declares (clickable → tile drawer).
  • All skills — full inventory with Quality / Staleness / Dup columns and a tier indicator. Click a row for the per-skill drawer.
  • Per-skill drawer sections: Metadata, Declared in (which tessl.jsons depend on it), Paths, Frontmatter, Body preview, Supporting files, Bundled directories, Quality (dual-judge dimension scores + suggestions), Staleness (factors + broken refs), Provenance (created_by, last_modified_by, contributors, recent commits), Duplicate cluster membership.
  • Quality, Staleness, Duplicates sections — estate summaries with rollups and worst-offender tables.
  • Registry suggestions — per-standalone-skill table of the highest-scoring registry hit (skill or tile), sorted by aggregate score, with eval / quality / security columns. Empty when no standalone skills are present or when the phase didn't run.
  • Repos, Breakdown, Warnings, Methodology sections — context.

Phase independence

Each analytical phase reads only discovery.json. They can be run in any order, or one at a time:

```shell
python3 skills/analyze-skill-staleness/scripts/analyze_staleness.py \
  --discovery /path/to/.skill-insights/discovery.json

python3 skills/analyze-skill-quality/scripts/analyze_quality.py \
  --discovery /path/to/.skill-insights/discovery.json --max-skills 5

python3 skills/detect-skill-duplicates/scripts/prepare_duplicates.py \
  --discovery /path/to/.skill-insights/discovery.json

python3 skills/registry-search/scripts/registry_search.py \
  --discovery /path/to/.skill-insights/discovery.json
```

Useful for iterating on one phase without re-running the others.

Schemas

Every phase output (and every cross-phase intermediate) is described by a JSON Schema in references/schemas/. Each phase script validates its inputs and outputs against the relevant schema at the IO boundary — bad shape aborts the run before downstream phases get to read it.

| File | Schema | Contract |
| --- | --- | --- |
| discovery.json | 1.4 | [discovery.schema.json](references/schemas/discovery.schema.json) |
| staleness.json | 1.1 | [staleness.schema.json](references/schemas/staleness.schema.json) (1.1 adds git_provenance per skill: created_by, last_modified_by, contributors, recent_commits) |
| quality.json | 2.0 | [quality.schema.json](references/schemas/quality.schema.json) |
| duplicates.json | 1.0 | [duplicates.schema.json](references/schemas/duplicates.schema.json) |
| registry-search.json | 1.2 | [registry-search.schema.json](references/schemas/registry-search.schema.json) |
| duplicates-prompts/index.json | 1.0 | [duplicates-prompts-index.schema.json](references/schemas/duplicates-prompts-index.schema.json) (handoff between prep and judge subagents) |
| duplicates-verdicts/<n>.json | | [duplicate-verdict.schema.json](references/schemas/duplicate-verdict.schema.json) (one file per subagent verdict) |
| org_usage.json | 1.4 | [org-usage.schema.json](references/schemas/org-usage.schema.json) (standalone sibling — produced by posthog-skill-query, not consumed by the main render step) |

Validation is best-effort: scripts try to import jsonschema (recommended: pip install jsonschema), and if it isn't installed they print a single warning and skip validation, preserving the stdlib-only fallback. With jsonschema available, output validation runs strictly (a malformed output aborts the script with exit 2 and a paginated error report). Per-pair duplicate verdicts validate non-strictly — a bad verdict shape from one subagent is recorded in metadata.failed_pairs[] and the rest of the run continues.
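The try-import pattern looks roughly like this. It is a minimal sketch: the warning text, error pagination limit, and validator selection are assumptions, though the exit-2-on-strict-failure behaviour matches the description above.

```python
import sys

def validate_best_effort(instance, schema, strict=True, max_errors=20):
    """Validate when jsonschema is importable; otherwise warn and pass."""
    try:
        from jsonschema import validators
    except ImportError:
        print("warning: jsonschema not installed; skipping validation",
              file=sys.stderr)
        return True  # stdlib-only fallback: treat as valid
    cls = validators.validator_for(schema)  # picks validator from $schema
    errors = list(cls(schema).iter_errors(instance))
    if not errors:
        return True
    for err in errors[:max_errors]:  # page the report instead of dumping all
        print(f"schema error at {list(err.absolute_path)}: {err.message}",
              file=sys.stderr)
    if strict:
        sys.exit(2)
    return False  # non-strict callers (per-pair verdicts) record and continue
```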

Requirements

| Requirement | Why | What happens if missing |
| --- | --- | --- |
| Python 3 | Run discovery + analytical scripts | Hard dep |
| git on PATH | Broken-ref detection, mtime tracking | Falls back to filesystem walk; no broken-ref signal |
| tessl CLI on PATH | Outdated check, tile-lint context cost, skill review (quality phase) | Those enrichment fields are missing; falls back to local-only signals |
| ~/.tessl/api-credentials.json | Registry enrichment per tile | published_to_registry: null, no security/uplift signals from registry |
| jsonschema Python package | IO contract validation between phases (recommended) | One stderr warning per run; pipeline runs without validation |
| ~/.tessl/posthog/personal-api-key or $POSTHOG_PERSONAL_API_KEY | posthog-skill-query only: pulls org-wide telemetry from PostHog project 57574 | The standalone PostHog phase exits with a clear error; the main pipeline still runs unaffected |

The pipeline degrades gracefully — every step is best-effort and missing capabilities surface as null in the output rather than aborting the scan.

Assumptions

  • A "skill" is a directory containing a SKILL.md file. Nothing else.
  • Git repositories are the natural unit of scanning.
  • Vendored Tessl install copies (.tessl/, .claude/skills/tessl__*) are not first-party — .tessl/ SKILL.md files are excluded from skills[] entirely (they surface via tiles[] with source: "tessl_json"), and .claude/skills/tessl__* symlinks are deduped against their canonical source.
  • Cross-repo skill dedup is NOT applied (intentional — same tile in two repos is two real install points).

What we don't do (scope guard)

  • AGENTS.md, CLAUDE.md, .cursor/rules/*, MCP configs — Project Insights territory.
  • Global user-level skills (~/.claude/skills/) — scope is the target directory.
  • Run new evals (tessl eval run) — we read existing eval results from the registry, but don't trigger new ones.
  • Cross-reference between repo-local discovery and PostHog org usage — posthog-skill-query writes a sibling JSON; an explicit cross-reference step that joins them with discovery.json / staleness.json / quality.json / duplicates.json / registry-search.json is the obvious next layer but isn't built yet.

Future phases

  • Cross-reference — join org_usage.json against discovery.json + the per-repo phase outputs to produce per-skill verdicts (only_local, recommend_install, cross_team_skill, etc.) and a unified report covering both per-repo and org-wide signals. The data inputs all exist; this is purely the join + render layer.
  • Activation gap closure — patch upstream m/agent-signals/normalizers/claude-code.ts so it also synthesises Skill events from raw Read of SKILL.md and from older <command-name> slash-command flows. Today's PostHog data systematically under-counts those paths; closing the gap improves posthog-skill-query's coverage materially.
  • Behaviour analysis — load-bearing skills surfaced from agent conversation logs (local files, not PostHog). Applies to all skills regardless of source.
  • Model regression — re-run evals on a new model and diff against the prior baseline. Tile-only.
  • Coverage gaps — workflows in agent logs without a matching skill.
  • Per-skill tessl skill review --optimize — trigger improvement runs for the worst-scoring skills (currently we just review, not optimize).

Repo-scoped phases slot in alongside the existing four (parallel after discovery, before render). Org-scoped phases (like posthog-skill-query) are siblings — they don't go through the orchestrator.

Workspace: tessleng · Visibility: Public · Publish source: CLI