
langfuse-observability

Set up comprehensive observability for Langfuse with metrics, dashboards, and alerts. Use when implementing monitoring for LLM operations, setting up dashboards, or configuring alerting for Langfuse integration health. Trigger with phrases like "langfuse monitoring", "langfuse metrics", "langfuse observability", "monitor langfuse", "langfuse alerts", "langfuse dashboard".

Overall score: 80

Quality: 77%. Does it follow best practices?

Impact: Pending. No eval scenarios have been run.

Security (by Snyk): Passed. No known issues.

Optimize this skill with Tessl

npx tessl skill review --optimize ./plugins/saas-packs/langfuse-pack/skills/langfuse-observability/SKILL.md

Quality

Discovery: 89%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This is a well-structured skill description with strong trigger terms, clear 'when' guidance, and excellent distinctiveness due to the Langfuse-specific focus. Its main weakness is that the capability descriptions could be more concrete—listing specific tools, integrations, or detailed actions rather than high-level categories like 'metrics, dashboards, and alerts'.

Suggestions

Add more specific concrete actions, e.g., 'configure Prometheus/Grafana dashboards for Langfuse trace latency, token usage, and error rates' instead of the generic 'metrics, dashboards, and alerts'.
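To illustrate the kind of concrete action this suggestion has in mind, a Prometheus alerting rule for trace latency might look like the sketch below. The metric name `langfuse_trace_latency_seconds` and the 2s/10m thresholds are illustrative assumptions, not taken from the skill itself:

```yaml
# Hypothetical alerting rule; metric name and thresholds are assumptions.
groups:
  - name: langfuse
    rules:
      - alert: LangfuseTraceLatencyHigh
        expr: histogram_quantile(0.95, sum(rate(langfuse_trace_latency_seconds_bucket[5m])) by (le)) > 2
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "p95 Langfuse trace latency above 2s for 10 minutes"
```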

Dimension scores:

Specificity: 2 / 3. The description names the domain (Langfuse observability) and mentions some actions (metrics, dashboards, alerts), but doesn't list specific concrete actions like 'create Grafana dashboards', 'configure Prometheus exporters', or 'set up alerting rules for latency thresholds'. The actions remain somewhat high-level.

Completeness: 3 / 3. Clearly answers both 'what' (set up comprehensive observability with metrics, dashboards, and alerts) and 'when' (implementing monitoring for LLM operations, setting up dashboards, configuring alerting for Langfuse integration health), with an explicit 'Use when' clause and a 'Trigger with phrases' section.

Trigger Term Quality: 3 / 3. Excellent trigger term coverage with explicit natural phrases: 'langfuse monitoring', 'langfuse metrics', 'langfuse observability', 'monitor langfuse', 'langfuse alerts', 'langfuse dashboard'. These are terms users would naturally use, and the description also includes broader terms like 'LLM operations' and 'alerting'.

Distinctiveness / Conflict Risk: 3 / 3. Highly distinctive due to the specific focus on Langfuse, which is a niche LLM observability platform. The repeated use of 'Langfuse' as a qualifier makes it very unlikely to conflict with generic monitoring or dashboard skills.

Total: 11 / 12. Passed.

Implementation: 64%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This is a highly actionable skill with excellent executable code examples covering the full observability stack from metrics definition through alerting. Its main weaknesses are the lack of validation checkpoints between infrastructure setup steps and the verbosity of having all code inline rather than referenced. The content would benefit from verification steps and better progressive disclosure to keep the main file focused.

Suggestions

Add validation checkpoints between steps, e.g., after Step 3: 'Verify: curl http://localhost:3000/metrics should return Prometheus-formatted output', and after Step 4: 'Verify: check Prometheus targets page shows llm-app as UP'.
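The second checkpoint above can be verified programmatically rather than by eyeballing the targets page. A minimal sketch, assuming Prometheus's standard `GET /api/v1/targets` API and the 'llm-app' job name from the suggestion:

```python
import json

def target_is_up(targets_body: str, job: str) -> bool:
    """Check a Prometheus /api/v1/targets response body for a healthy job.

    `targets_body` is the raw JSON the targets API returns; the "llm-app"
    job name used below mirrors the suggestion and is an assumption.
    """
    active = json.loads(targets_body)["data"]["activeTargets"]
    return any(t["labels"].get("job") == job and t["health"] == "up"
               for t in active)

# Abridged example of the response shape the targets API returns:
sample = json.dumps({
    "status": "success",
    "data": {"activeTargets": [
        {"labels": {"job": "llm-app"}, "health": "up"},
    ]},
})
print(target_is_up(sample, "llm-app"))  # → True
```

In practice the body would come from `curl http://localhost:9090/api/v1/targets`; host and port follow Prometheus defaults and may differ in a real setup.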

Move the full tracedLLM wrapper and Grafana dashboard JSON into separate referenced files (e.g., examples/traced-llm.ts, dashboards/langfuse.json) and keep only concise summaries in the main skill.

Add a brief smoke-test or end-to-end verification step at the end confirming the full pipeline works (metrics flowing → Prometheus scraping → Grafana displaying → alerts firing on test data).
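One way to script that final smoke test is to fire a request through the instrumented app and then assert the counter is visible via Prometheus's instant-query API. A hedged sketch of the response check (the metric name `llm_requests_total` is an assumption, not taken from the skill):

```python
import json

def query_has_samples(query_body: str) -> bool:
    """Return True if a Prometheus /api/v1/query response contains a
    nonzero sample, i.e. metrics flowed end to end through the pipeline."""
    result = json.loads(query_body)["data"]["result"]
    return len(result) > 0 and float(result[0]["value"][1]) > 0

# Abridged shape of a successful instant-query response for the
# hypothetical test counter:
sample = json.dumps({
    "status": "success",
    "data": {"resultType": "vector", "result": [
        {"metric": {"__name__": "llm_requests_total"}, "value": [1700000000, "3"]},
    ]},
})
print(query_has_samples(sample))  # → True
```

The body would come from `curl 'http://localhost:9090/api/v1/query?query=llm_requests_total'`; an empty `result` array means the pipeline is broken somewhere upstream.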

Dimension scores:

Conciseness: 2 / 3. The skill is fairly long with substantial inline code that could be referenced externally. Some sections like the Grafana dashboard JSON and the full instrumented wrapper are verbose but mostly earn their place as executable examples. The overview and prerequisites sections are lean, but overall the content could be tightened.

Actionability: 3 / 3. Excellent actionability throughout: every step includes fully executable TypeScript code, complete YAML configs, and a ready-to-import Grafana dashboard JSON. The Prometheus scrape config, alert rules, and metrics endpoint are all copy-paste ready.

Workflow Clarity: 2 / 3. Steps are clearly numbered and sequenced (Steps 1-6), but there are no validation checkpoints between steps. For an observability setup involving multiple infrastructure components (Prometheus, Grafana, alerting), there should be explicit verification steps like 'confirm metrics endpoint returns data' or 'verify Prometheus is scraping successfully' before proceeding.

Progressive Disclosure: 2 / 3. The skill includes a Resources section with external links, and the error handling and metrics reference tables are well-organized. However, the large code blocks (especially the full instrumented LLM wrapper and Grafana dashboard JSON) could be split into referenced files, keeping the SKILL.md as a leaner overview.

Total: 9 / 12. Passed.
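For reference, the copy-paste-ready scrape configuration the Actionability note mentions is, in outline, of this shape. The job name 'llm-app' and port 3000 echo the review's own examples and are assumptions:

```yaml
# Sketch of a Prometheus scrape job for the instrumented app;
# job name and target address are assumptions, not from the skill.
scrape_configs:
  - job_name: llm-app
    metrics_path: /metrics
    scrape_interval: 15s
    static_configs:
      - targets: ["localhost:3000"]
```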

Validation: 81%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 9 / 11 passed

Validation for skill structure

Validation checks:

allowed_tools_field: Warning. 'allowed-tools' contains unusual tool name(s).

frontmatter_unknown_keys: Warning. Unknown frontmatter key(s) found; consider removing or moving to metadata.

Total: 9 / 11. Passed.

Repository: jeremylongshore/claude-code-plugins-plus-skills (Reviewed)
