
langfuse-cost-tuning

Monitor and optimize LLM costs using Langfuse analytics and dashboards. Use when tracking LLM spending, identifying cost anomalies, or implementing cost controls for AI applications. Trigger with phrases like "langfuse costs", "LLM spending", "track AI costs", "langfuse token usage", "optimize LLM budget".

64

Quality

77%

Does it follow best practices?

Impact

No eval scenarios have been run

Security by Snyk

Passed

No known issues

Optimize this skill with Tessl

npx tessl skill review --optimize ./plugins/saas-packs/langfuse-pack/skills/langfuse-cost-tuning/SKILL.md

Quality

Discovery

89%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This is a well-structured skill description that excels in completeness and distinctiveness by clearly specifying both what the skill does and when to use it, with explicit trigger phrases tied to the Langfuse + LLM cost niche. The main weakness is that the specific capabilities could be more concrete — listing particular actions like 'generate cost breakdowns by model', 'set budget alerts', or 'analyze token usage trends' would strengthen the specificity dimension.

Suggestions

Add more concrete actions beyond 'monitor and optimize' — e.g., 'generate cost breakdowns by model, set budget alerts, analyze token usage trends, compare spending across projects'.

Dimension / Reasoning / Score

Specificity

Names the domain (LLM cost monitoring via Langfuse) and mentions some actions like 'monitor', 'optimize', 'tracking', 'identifying cost anomalies', 'implementing cost controls', but doesn't list multiple concrete specific actions (e.g., creating dashboards, setting alerts, generating reports, comparing model costs).

2 / 3

Completeness

Clearly answers both 'what' (monitor and optimize LLM costs using Langfuse analytics and dashboards) and 'when' (tracking LLM spending, identifying cost anomalies, implementing cost controls) with explicit trigger phrases provided.

3 / 3

Trigger Term Quality

Excellent coverage of natural trigger terms including 'langfuse costs', 'LLM spending', 'track AI costs', 'langfuse token usage', 'optimize LLM budget' — these are phrases users would naturally say. Also includes relevant keywords like 'cost anomalies', 'cost controls', and 'analytics'.

3 / 3

Distinctiveness Conflict Risk

Highly distinctive due to the specific combination of Langfuse + LLM cost monitoring. The trigger terms are niche enough ('langfuse costs', 'langfuse token usage') that this is unlikely to conflict with general cost analysis or other observability skills.

3 / 3

Total: 11 / 12

Passed

Implementation

64%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This is a solid, actionable skill with excellent executable code examples covering the full cost monitoring lifecycle. Its main weaknesses are moderate verbosity (some explanatory sections that Claude doesn't need) and lack of explicit validation checkpoints between workflow steps. The content would benefit from splitting longer code examples into bundle files and adding verification steps.

Suggestions

Add validation checkpoints between steps, e.g., 'Verify token capture: check a trace in the Langfuse UI or query a single trace via API to confirm usage fields are populated before proceeding to Step 2.'

Move the longer code blocks (cost report, model routing, budget alerts) into separate bundle files and reference them from SKILL.md to improve progressive disclosure and reduce inline bulk.

Condense the 'How Langfuse Tracks Costs' explanatory section and the 'Dashboard Features' section into 1-2 lines each, since they describe concepts rather than provide actionable instructions.
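The first suggestion (verify token capture before proceeding to Step 2) can be sketched as a small TypeScript check. This is a minimal sketch under stated assumptions, not code from the skill: the `ObservationLike` shape, its `usageDetails` field, and the `checkTokenCapture` helper are hypothetical simplifications of a trace's observations, so confirm the actual field names against the Langfuse API reference before relying on it.

```typescript
// Minimal sketch of a "verify token capture" checkpoint.
// ASSUMPTION: ObservationLike is a hypothetical, simplified shape; the real
// Langfuse trace payload has more fields and its usage field names may differ.
interface ObservationLike {
  id: string;
  type: string; // e.g. "GENERATION", "SPAN", "EVENT"
  usageDetails?: { [key: string]: number }; // hypothetical token-count map
}

interface TokenCaptureReport {
  generations: number; // how many GENERATION observations were inspected
  missingUsage: string[]; // ids of generations without any positive token count
  ok: boolean; // true only if at least one generation exists and none lacks usage
}

// A generation counts as "captured" if any token counter is greater than zero.
function hasPositiveTokenCount(o: ObservationLike): boolean {
  const usage = o.usageDetails;
  if (!usage) return false;
  return Object.keys(usage).some((key) => usage[key] > 0);
}

function checkTokenCapture(observations: ObservationLike[]): TokenCaptureReport {
  const generations = observations.filter((o) => o.type === "GENERATION");
  const missingUsage = generations
    .filter((o) => !hasPositiveTokenCount(o))
    .map((o) => o.id);
  return {
    generations: generations.length,
    missingUsage,
    // An empty trace also fails: nothing was captured to verify.
    ok: generations.length > 0 && missingUsage.length === 0,
  };
}

// Example with hand-written sample data (not real Langfuse output).
const sample: ObservationLike[] = [
  { id: "gen-1", type: "GENERATION", usageDetails: { input: 120, output: 45 } },
  { id: "span-1", type: "SPAN" },
  { id: "gen-2", type: "GENERATION", usageDetails: {} },
];

const report = checkTokenCapture(sample);
if (!report.ok) {
  console.log(`Token usage missing on: ${report.missingUsage.join(", ")}`);
}
```

Run a check like this against the observations of a single test trace; if `ok` is false, fix the instrumentation before moving on to cost queries, rather than debugging missing cost data later.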

Dimension / Reasoning / Score

Conciseness

The skill includes some unnecessary explanations (e.g., 'Understanding of LLM pricing models' prerequisite, the 'How Langfuse Tracks Costs' section explaining basics Claude would know). The code examples are substantial but justified given the complexity. The Dashboard Features section is somewhat filler since it just describes UI features without actionable guidance.

2 / 3

Actionability

All four steps provide fully executable TypeScript code with concrete examples—token capture, cost querying, model routing, and budget alerts. The code is copy-paste ready with real model names, pricing figures, and API calls. The cost optimization strategies table and error handling table add practical, specific guidance.

3 / 3

Workflow Clarity

The four steps are clearly sequenced and logically ordered (capture → query → optimize → alert). However, there are no explicit validation checkpoints—for example, no step to verify that token usage is actually being captured before proceeding to query costs, and no feedback loop for when cost data is missing or incorrect despite the error handling table.

2 / 3

Progressive Disclosure

The content is well-structured with clear sections and headers, and includes external resource links. However, at ~200 lines with extensive inline code, the model routing and budget alert scripts could be split into separate referenced files. No bundle files exist to offload this content, making the SKILL.md heavier than ideal.

2 / 3

Total: 9 / 12

Passed

Validation

81%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 9 / 11 passed

Validation for skill structure

Criteria / Description / Result

allowed_tools_field

'allowed-tools' contains unusual tool name(s)

Warning

frontmatter_unknown_keys

Unknown frontmatter key(s) found; consider removing or moving to metadata

Warning

Total: 9 / 11

Passed

Repository: jeremylongshore/claude-code-plugins-plus-skills (Reviewed)
