LLM observability with Langfuse — query traces, generations, costs, metrics, and debug LLM pipelines via the REST API
79
73%
Does it follow best practices?
Impact: Pending (no eval scenarios have been run)
Advisory: Suggest reviewing before use
Optimize this skill with Tessl
npx tessl skill review --optimize ./langfuse-observability/SKILL.md
Quality
Discovery
67%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
The description effectively communicates specific capabilities within the Langfuse LLM observability domain and is highly distinctive. However, it lacks an explicit 'Use when...' clause and could benefit from additional natural trigger terms that users might say when needing this skill.
Suggestions
Add a 'Use when...' clause with trigger phrases, such as 'Use when debugging LLM applications, analyzing token costs, or investigating trace data in Langfuse'.
Include additional natural keywords that users might say, such as 'monitoring', 'token usage', 'latency tracking', or 'LLM debugging'.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Lists multiple specific concrete actions: 'query traces, generations, costs, metrics, and debug LLM pipelines'. These are distinct, actionable capabilities within the Langfuse domain. | 3 / 3 |
| Completeness | Clearly answers 'what' (query traces, generations, costs, metrics, debug LLM pipelines via REST API) but lacks an explicit 'Use when...' clause to indicate when Claude should select this skill. | 2 / 3 |
| Trigger Term Quality | Includes relevant technical terms like 'Langfuse', 'traces', 'generations', 'LLM pipelines', 'REST API', but is missing common user variations like 'observability platform', 'token usage', 'latency', or 'monitoring'. | 2 / 3 |
| Distinctiveness / Conflict Risk | Highly distinctive, with 'Langfuse' as a specific product name and 'LLM observability' as a clear niche. Unlikely to conflict with other skills due to the specific tooling and domain focus. | 3 / 3 |
| Total | | 10 / 12 Passed |
Implementation
79%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a strong, actionable skill with excellent executable examples covering the Langfuse API comprehensively. The main weaknesses are the lack of explicit debugging workflows (e.g., 'how to investigate a slow trace') and the monolithic structure that could benefit from splitting deployment/integration content into separate files.
Suggestions
Add a 'Debugging workflow' section showing how to sequence queries when investigating issues (e.g., find error -> get trace -> examine generations -> check scores).
Move the Docker deployment and OpenRouter integration sections to separate reference files (e.g., DEPLOYMENT.md, INTEGRATIONS.md) and link to them from the main skill.
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is lean and efficient, providing executable curl commands without explaining what Langfuse is beyond a single sentence. No unnecessary explanations of REST APIs, authentication, or JSON parsing. | 3 / 3 |
| Actionability | Every section provides copy-paste-ready curl commands with jq filters. The examples are complete and executable, covering setup, queries, filtering, and common use cases with specific endpoints and parameters. | 3 / 3 |
| Workflow Clarity | While individual commands are clear, there is no explicit workflow for debugging LLM pipelines or investigating issues. The skill presents isolated queries without guidance on sequencing them for common debugging scenarios or on validation steps. | 2 / 3 |
| Progressive Disclosure | Content is well organized with clear sections, but it is a long monolithic file. The Docker deployment and OpenRouter integration sections could be separate files, and there is no reference to external documentation for advanced topics. | 2 / 3 |
| Total | | 10 / 12 Passed |
Validation
100%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 11 / 11 Passed
Validation for skill structure
No warnings or errors.
87d2278