
langfuse-core-workflow-a

Execute Langfuse primary workflow: Tracing LLM calls and spans. Use when implementing LLM tracing, building traced AI features, or adding observability to existing LLM applications. Trigger with phrases like "langfuse tracing", "trace LLM calls", "add langfuse to openai", "langfuse spans", "track llm requests".

Score: 80

Quality: 77% (Does it follow best practices?)
Impact: Pending (No eval scenarios have been run)
Security (by Snyk): Passed (No known issues)

Optimize this skill with Tessl

npx tessl skill review --optimize ./plugins/saas-packs/langfuse-pack/skills/langfuse-core-workflow-a/SKILL.md

Quality

Discovery

89%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This is a well-structured description with strong trigger terms and explicit 'Use when' guidance, making it highly selectable and distinctive. Its main weakness is that the 'what' portion is somewhat thin — it mentions tracing LLM calls and spans but doesn't enumerate more specific capabilities (e.g., decorator-based tracing, score logging, session tracking). Overall it's a solid description that would perform well in a multi-skill selection scenario.

Suggestions

Expand the capability list beyond 'Tracing LLM calls and spans' to include more specific actions like 'instrument OpenAI/Anthropic calls with decorators, log scores and metadata, configure session tracking' to improve specificity.
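The decorator-based tracing this suggestion mentions can be sketched as follows. Note that `observe` below is a local stand-in for Langfuse's `@observe` decorator (an assumption about the skill's Python examples, not the real SDK), recording call metadata to a list so the sketch runs without Langfuse installed.

```python
import functools

# Hypothetical in-memory trace store; the real SDK ships events to Langfuse.
TRACES = []

def observe(fn):
    """Stub for Langfuse's @observe decorator: capture name, input, output."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        result = fn(*args, **kwargs)
        TRACES.append({"name": fn.__name__, "input": args, "output": result})
        return result
    return wrapper

@observe
def generate_answer(question):
    # Stand-in for an instrumented LLM call.
    return f"stub answer to: {question}"

generate_answer("what is langfuse?")
```

With the real decorator, the same call would produce a trace with the function name, arguments, and return value attached, which is the specificity the suggestion asks the description to advertise.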

Specificity: 2 / 3
Names the domain (Langfuse, LLM tracing) and a couple of actions ('Tracing LLM calls and spans'), but doesn't list multiple concrete actions like creating traces, adding span metadata, configuring decorators, or viewing dashboards.

Completeness: 3 / 3
Clearly answers both 'what' (tracing LLM calls and spans with Langfuse) and 'when' (explicit 'Use when' clause plus a 'Trigger with phrases' section listing specific triggers).

Trigger Term Quality: 3 / 3
Excellent coverage of natural trigger terms: 'langfuse tracing', 'trace LLM calls', 'add langfuse to openai', 'langfuse spans', 'track llm requests' — these are phrases users would naturally say when needing this skill.

Distinctiveness / Conflict Risk: 3 / 3
Highly distinctive — Langfuse is a specific tool, and the triggers are narrowly scoped to LLM tracing/observability with Langfuse. Very unlikely to conflict with other skills.

Total: 11 / 12

Passed

Implementation

64%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This is a solid, highly actionable skill with excellent executable code examples covering multiple Langfuse tracing scenarios. Its main weaknesses are verbosity (including both v3 and v4 SDK patterns inline, plus multiple provider examples) and lack of validation checkpoints to verify traces are actually being captured. The content would benefit from splitting less common patterns into referenced files and adding verification steps.

Suggestions

Add a verification step after each major pattern (e.g., 'Check Langfuse dashboard for the trace, or call `await langfuse.flushAsync()` and verify 200 response') to catch silent failures.

Move the v3 legacy RAG pipeline (Step 3) and possibly the LangChain Python integration (Step 6) into separate referenced files to reduce the main skill's token footprint.

Remove or condense inline code comments that explain things Claude already knows (e.g., '// Every call captures: model, input, output, tokens, latency, cost').
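The flush-and-verify checkpoint from the first suggestion above can be sketched as follows. `StubLangfuse` is a local stand-in for the real SDK client (the JS method named in the suggestion is `flushAsync()`); it only counts delivered events so the sketch runs offline, but the shape of the checkpoint is the same.

```python
class StubLangfuse:
    """Stand-in for the Langfuse client: queue events, deliver on flush."""

    def __init__(self):
        self._pending = []

    def trace(self, name):
        # Queue a trace event for batched delivery, as the real client does.
        self._pending.append(name)

    def flush(self):
        # Deliver queued events; return the count so callers can verify.
        delivered, self._pending = len(self._pending), []
        return delivered

client = StubLangfuse()
client.trace("openai-chat-completion")

# Verification checkpoint: a silent failure shows up as zero delivered events.
delivered = client.flush()
assert delivered > 0, "no traces delivered -- check LANGFUSE_* credentials/host"
```

Placing a checkpoint like this after each integration pattern turns the silent-failure mode the review flags into an immediate, actionable error.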

Conciseness: 2 / 3
The skill is fairly comprehensive but includes both v3 and v4 SDK versions inline, which adds significant length. The code examples are useful but the v3 legacy section could be referenced externally rather than included in full. Some comments in code are slightly redundant for Claude.

Actionability: 3 / 3
All code examples are fully executable TypeScript/Python with complete imports, concrete API calls, and realistic patterns. The examples cover multiple real scenarios (OpenAI wrapper, RAG pipeline, streaming, Anthropic, LangChain) with copy-paste ready code.

Workflow Clarity: 2 / 3
The steps are clearly numbered and sequenced, but they represent independent integration patterns rather than a true sequential workflow. There are no validation checkpoints (e.g., verifying traces appear in the Langfuse dashboard, checking that flush succeeded) despite tracing being an operation where silent failures are common.

Progressive Disclosure: 2 / 3
The skill includes a resources section with external links and references to related skills, but the body itself is quite long (~180 lines of code examples). The v3 legacy section and possibly the LangChain Python section could be split into separate reference files to keep the main skill leaner.

Total: 9 / 12

Passed

Validation

81%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 9 / 11 passed

Validation for skill structure

allowed_tools_field: Warning
'allowed-tools' contains unusual tool name(s)

frontmatter_unknown_keys: Warning
Unknown frontmatter key(s) found; consider removing or moving to metadata

Total: 9 / 11

Passed

Repository: jeremylongshore/claude-code-plugins-plus-skills (Reviewed)

