
promptfoo-evaluation

Configures and runs LLM evaluation using the Promptfoo framework. Use when setting up prompt testing, creating evaluation configs (promptfooconfig.yaml), writing Python custom assertions, implementing llm-rubric for LLM-as-judge grading, or managing few-shot examples in prompts. Triggers on keywords like "promptfoo", "eval", "LLM evaluation", "prompt testing", or "model comparison".

Score: 91
Quality: 88% (does it follow best practices?)
Impact: 1.59x
Evals: 97% (average score across 3 eval scenarios)

Security (by Snyk): Advisory. Review suggested before use.


Evaluation results

Chatbot Quality Evaluation via Corporate API Gateway (Relay API configuration)
With context: 94% · Without context: 51%

| Criteria | Without context | With context |
| --- | --- | --- |
| apiBaseUrl placement | 100% | 100% |
| maxConcurrency location | 0% | 100% |
| maxConcurrency value | 0% | 100% |
| llm-rubric provider config | 100% | 100% |
| llm-rubric apiBaseUrl repeated | 100% | 100% |
| Anthropic provider ID format | 0% | 100% |
| Schema comment present | 0% | 100% |
| outputPath defined | 0% | 0% |
| file:// path usage | 62% | 100% |
| llm-rubric threshold set | 0% | 100% |
| Provider label present | 0% | 100% |
| ANTHROPIC_API_KEY note | 100% | 100% |
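
Read together, these criteria imply a promptfooconfig.yaml shaped roughly like the sketch below. This is a minimal illustration, not the skill's actual output: the gateway URL, model ID, test variables, and rubric text are placeholders, and the maxConcurrency placement shown is only one plausible reading of the "location" criterion.

```yaml
# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
description: Chatbot quality eval routed through a corporate API gateway (illustrative)

prompts:
  - file://prompts/chatbot.json

providers:
  - id: anthropic:messages:claude-3-5-sonnet-20241022  # placeholder model ID
    label: claude-via-gateway                          # satisfies "Provider label present"
    config:
      apiBaseUrl: https://llm-gateway.example.corp/v1  # hypothetical relay endpoint
      max_tokens: 1024

# ANTHROPIC_API_KEY must still be set in the environment; the gateway relays it.

tests:
  - vars:
      question: How do I reset my password?
    assert:
      - type: llm-rubric
        value: Answer is accurate, polite, and actionable.
        threshold: 0.8                                 # "llm-rubric threshold set"
        provider:
          id: anthropic:messages:claude-3-5-sonnet-20241022
          config:
            apiBaseUrl: https://llm-gateway.example.corp/v1  # repeated for the grader

outputPath: results/chatbot.json

evaluateOptions:
  maxConcurrency: 2  # one common location; the eval's expected placement is an assumption here
```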

Automated Quality Checks for an HTML-Formatted Summarization Model (Python custom assertions)
With context: 98% · Without context: 30%

| Criteria | Without context | With context |
| --- | --- | --- |
| Default function name | 100% | 100% |
| Named function reference | 0% | 100% |
| Return dict format | 100% | 100% |
| Reason field returned | 100% | 100% |
| named_scores included | 100% | 100% |
| Context vars access pattern | 100% | 100% |
| HTML stripping present | 80% | 100% |
| file:// path for assertions | 100% | 100% |
| file:// relative to config root | 100% | 100% |
| PROMPTFOO_PYTHON note | 0% | 66% |
| Schema comment present | 0% | 100% |
| Standard directory structure | 0% | 100% |
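
These criteria map onto a Python assertion file shaped like the sketch below. The file, function, and variable names are illustrative (max_words in particular is a hypothetical test var); promptfoo's default entry point is get_assert, and a named function is referenced as file://path.py:function_name, resolved relative to the config file's directory.

```python
# assertions/check_summary.py, referenced from promptfooconfig.yaml as, e.g.:
#   assert:
#     - type: python
#       value: file://assertions/check_summary.py:check_html_summary
import re


def strip_html(text: str) -> str:
    """Drop HTML tags so the checks run on plain text."""
    return re.sub(r"<[^>]+>", "", text)


def check_html_summary(output: str, context) -> dict:
    """Named assertion; promptfoo would call get_assert if no name were given."""
    plain = strip_html(output).strip()
    # Test variables are reached through context['vars']; max_words is hypothetical.
    max_words = int(context["vars"].get("max_words", 100))
    words = len(plain.split())

    length_ok = words <= max_words
    non_empty = bool(plain)

    return {
        "pass": length_ok and non_empty,
        "score": (int(length_ok) + int(non_empty)) / 2,
        "reason": f"{words} words after HTML stripping (limit {max_words})",
        # Key spelled as the criteria above expect; promptfoo docs also show namedScores.
        "named_scores": {"length": float(length_ok), "non_empty": float(non_empty)},
    }
```

When several Python installations coexist, the PROMPTFOO_PYTHON environment variable tells promptfoo which interpreter to run assertions with; the "PROMPTFOO_PYTHON note" criterion checks that the skill mentions this.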

Multilingual Translation Evaluation Pipeline (Few-shot setup and echo preview)
With context: 100% · Without context: 28%

| Criteria | Without context | With context |
| --- | --- | --- |
| Chat JSON format | 100% | 100% |
| Assistant turn in prompt | 100% | 100% |
| 1-3 few-shot examples | 100% | 100% |
| Examples from files | 100% | 100% |
| Echo provider config | 100% | 100% |
| OpenAI provider ID format | 0% | 100% |
| max_tokens set high | 100% | 100% |
| outputPath defined | 0% | 100% |
| llm-rubric with threshold | 50% | 100% |
| Schema comment present | 0% | 100% |
| Standard directory layout | 50% | 100% |
| maxConcurrency under commandLineOptions | 100% | 100% |
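
These criteria describe a two-file setup: a chat-format JSON prompt carrying one to three few-shot example turns (including an assistant reply), plus a config that previews the rendered prompt through promptfoo's echo provider before spending tokens on a real model. Everything below is a sketch; file names, the model ID, the example sentences, and the rubric text are placeholders.

```json
[
  {"role": "system", "content": "You translate text into the requested language."},
  {"role": "user", "content": "Translate to German: Good morning"},
  {"role": "assistant", "content": "Guten Morgen"},
  {"role": "user", "content": "Translate to {{target_language}}: {{source_text}}"}
]
```

A config loading that prompt from a file might look like this:

```yaml
# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
description: Multilingual translation eval with echo preview (illustrative)

prompts:
  - file://prompts/translate.json  # chat-format JSON loaded from a file

providers:
  - id: echo                 # returns the rendered prompt verbatim, for previewing few-shot turns
  - id: openai:chat:gpt-4o   # placeholder model ID
    config:
      max_tokens: 4096       # set high so long translations are not truncated

tests:
  - vars:
      target_language: Japanese
      source_text: The meeting is postponed until Friday.
    assert:
      - type: llm-rubric
        value: The translation is fluent and preserves the original meaning.
        threshold: 0.8

outputPath: results/translation.json

commandLineOptions:
  maxConcurrency: 4          # placed under commandLineOptions, as the criteria above expect
```

Because the echo provider simply returns the rendered prompt, a first pass against it confirms that the few-shot turns and variables render as intended before the real provider is invoked.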

Repository: daymade/claude-code-skills
Evaluated with: Claude Code (agent), Claude Sonnet 4.6 (model)


Is this your skill?

If you maintain this skill, you can claim it as your own. Once claimed, you can manage eval scenarios, bundle related skills, attach documentation or rules, and ensure cross-agent compatibility.