langfuse-observability

LLM observability with Langfuse — query traces, generations, costs, metrics, and debug LLM pipelines via the REST API

Quality

73%

Does it follow best practices?

Impact

—

No eval scenarios have been run

Security

Advisory

Suggest reviewing before use

Optimize this skill with Tessl

npx tessl skill review --optimize ./langfuse-observability/SKILL.md

Quality

Discovery

82%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This is a strong description with excellent specificity and distinctiveness, clearly naming the tool (Langfuse) and listing concrete capabilities like querying traces, generations, costs, and metrics. The main weakness is the absence of an explicit 'Use when...' clause, which would help Claude know exactly when to select this skill over others.

Suggestions

Add an explicit 'Use when...' clause, e.g., 'Use when the user asks about Langfuse, LLM tracing, monitoring LLM costs, or debugging LLM pipeline issues.'

Dimension	Reasoning	Score
Specificity	Lists multiple specific concrete actions: 'query traces, generations, costs, metrics, and debug LLM pipelines via the REST API'. These are concrete, actionable capabilities rather than vague language.	3 / 3
Completeness	Clearly answers 'what' (query traces, generations, costs, metrics, debug LLM pipelines via REST API) but lacks an explicit 'Use when...' clause or equivalent trigger guidance, which caps this at 2 per the rubric.	2 / 3
Trigger Term Quality	Includes strong natural keywords users would say: 'Langfuse', 'traces', 'generations', 'costs', 'metrics', 'LLM pipelines', 'REST API', 'observability', 'debug'. These cover the domain well and match how users naturally refer to LLM observability tasks.	3 / 3
Distinctiveness Conflict Risk	Highly distinctive — 'Langfuse' is a specific tool, and the combination of LLM observability, traces, generations, and REST API creates a clear niche that is unlikely to conflict with other skills.	3 / 3
	Total	11 / 12 Passed

Implementation

64%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This is a solid reference-style skill with highly actionable, copy-paste-ready curl commands covering the Langfuse API comprehensively. Its main weaknesses are the repetitive auth boilerplate across every example, the inclusion of deployment/integration content that dilutes the core observability querying focus, and the lack of a guided debugging workflow that sequences the queries into a coherent investigation process.

Suggestions

Define the curl auth pattern once as a shell variable or alias (e.g., `LANGFUSE_CURL="curl -s -u $LANGFUSE_PUBLIC_KEY:$LANGFUSE_SECRET_KEY $LANGFUSE_HOST/api/public"`) and reuse it to reduce repetition significantly.

Move Docker deployment and OpenRouter integration into separate referenced files (e.g., DEPLOYMENT.md, INTEGRATIONS.md) to keep the main skill focused on observability queries.

Add a 'Debugging workflow' section that sequences queries into a coherent investigation flow: e.g., 'Check errors → find trace → inspect generations → review input/output → check scores'.
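
The first and third suggestions can be sketched together. This is only an illustration under stated assumptions, not the skill's actual content: the function names (`lf_api`, `lf_latest_error_trace`, `lf_inspect_trace`) and the `LF_CURL` override are hypothetical, and the endpoint paths and JSON field names reflect the Langfuse public API as commonly documented, so they should be checked against the API reference for the deployed version. It assumes `LANGFUSE_HOST`, `LANGFUSE_PUBLIC_KEY`, and `LANGFUSE_SECRET_KEY` are exported and that `curl` and `jq` are installed.

```shell
# Hypothetical helpers; none of these names come from Langfuse or the reviewed skill.

# Auth defined once: basic auth with public key as user, secret key as password.
# LF_CURL is an illustrative override so the HTTP client can be swapped out.
lf_api() {
  # $1 is a path under /api/public, e.g. "traces?limit=5"
  "${LF_CURL:-curl}" -s -u "${LANGFUSE_PUBLIC_KEY}:${LANGFUSE_SECRET_KEY}" \
    "${LANGFUSE_HOST%/}/api/public/$1"
}

# Step 1 (check errors): print the trace ID of one ERROR-level observation.
# Assumes the observations endpoint accepts `level` and `limit` and returns {data: [...]}.
lf_latest_error_trace() {
  lf_api "observations?level=ERROR&limit=1" | jq -r '.data[0].traceId // empty'
}

# Steps 2-5 (find trace, inspect generations, review input/output, check scores).
# Assumes the trace detail response embeds `observations` and `scores` arrays.
lf_inspect_trace() {
  lf_api "traces/$1" | jq '{
    id, name,
    generations: [.observations[]? | select(.type == "GENERATION")
                  | {name, model, level, statusMessage, input, output}],
    scores: [.scores[]? | {name, value}]
  }'
}
```

A session might then run `trace_id=$(lf_latest_error_trace)` followed by `lf_inspect_trace "$trace_id"`. Using a function rather than a string variable keeps the credentials and URL quoted, so the helper does not depend on shell word splitting.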

Dimension	Reasoning	Score
Conciseness	The skill is mostly efficient with executable examples, but there's significant repetition of the curl auth pattern across every single example. The Docker deployment and OpenRouter integration sections add bulk that may not be core to the observability querying skill. Some sections like the tips could be tighter.	2 / 3
Actionability	Every section provides fully executable curl commands with jq filters that are copy-paste ready (after substituting placeholders). The commands cover a comprehensive range of API endpoints with specific query parameters and output formatting.	3 / 3
Workflow Clarity	The skill is organized as a reference catalog of API queries rather than a multi-step workflow. While individual queries are clear, there's no guidance on debugging workflows (e.g., 'start with traces, drill into generations, check errors') and no validation steps for operations like deployment. The Docker section lacks any verification steps after deployment.	2 / 3
Progressive Disclosure	The content is a monolithic file with no references to supporting files. At ~200 lines, the Docker deployment config and OpenRouter integration could be split into separate files. The structure uses headers well for navigation, but the document tries to cover too many concerns (querying, deployment, integration) in one file.	2 / 3
	Total	9 / 12 Passed

Validation

100%

Warnings & errors only

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation — 11 / 11 Passed

Validation for skill structure

No warnings or errors.

Repository: ddnetters/homelab-agent-skills
Commit: 808c382

Reviewed: 27 days ago
