Configures and runs LLM evaluations using the Promptfoo framework. Use when setting up prompt testing, creating evaluation configs (promptfooconfig.yaml), writing custom Python assertions, implementing llm-rubric for LLM-as-judge grading, or managing few-shot examples in prompts. Triggers on keywords like "promptfoo", "eval", "LLM evaluation", "prompt testing", or "model comparison".
- Overall score: 91
- Does it follow best practices? 88%
- Impact: 97% (1.59x average score across 3 eval scenarios)
- Advisory: suggest reviewing before use
### Relay API configuration

| Criterion | Baseline | With skill |
| --- | --- | --- |
| apiBaseUrl placement | 100% | 100% |
| maxConcurrency location | 0% | 100% |
| maxConcurrency value | 0% | 100% |
| llm-rubric provider config | 100% | 100% |
| llm-rubric apiBaseUrl repeated | 100% | 100% |
| Anthropic provider ID format | 0% | 100% |
| Schema comment present | 0% | 100% |
| outputPath defined | 0% | 0% |
| file:// path usage | 62% | 100% |
| llm-rubric threshold set | 0% | 100% |
| Provider label present | 0% | 100% |
| ANTHROPIC_API_KEY note | 100% | 100% |
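The criteria above all map onto a single promptfooconfig.yaml. A minimal sketch of a config that would satisfy them, assuming a hypothetical relay endpoint (`https://relay.example.com/v1`) and an illustrative Claude model ID:

```yaml
# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
description: Relay API evaluation

prompts:
  - file://prompts/summarize.txt   # file:// paths resolve relative to this config

providers:
  - id: anthropic:messages:claude-3-5-sonnet-20241022
    label: claude-via-relay        # human-readable label for reports
    config:
      apiBaseUrl: https://relay.example.com/v1   # hypothetical relay endpoint

commandLineOptions:
  maxConcurrency: 4                # the criterion checks it lives here, not top-level

outputPath: output/results.json

tests:
  - vars:
      article: file://data/article.txt
    assert:
      - type: llm-rubric
        value: Summary is accurate and under 100 words
        threshold: 0.8             # llm-rubric threshold set
        provider:
          id: anthropic:messages:claude-3-5-sonnet-20241022
          config:
            apiBaseUrl: https://relay.example.com/v1   # repeated for the grader

# Note: ANTHROPIC_API_KEY must be set in the environment.
```

The apiBaseUrl is deliberately repeated in the llm-rubric grader's provider block, since the grading call would otherwise go to the default Anthropic endpoint rather than the relay.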
### Python custom assertions

| Criterion | Baseline | With skill |
| --- | --- | --- |
| Default function name | 100% | 100% |
| Named function reference | 0% | 100% |
| Return dict format | 100% | 100% |
| Reason field returned | 100% | 100% |
| named_scores included | 100% | 100% |
| Context vars access pattern | 100% | 100% |
| HTML stripping present | 80% | 100% |
| file:// path for assertions | 100% | 100% |
| file:// relative to config root | 100% | 100% |
| PROMPTFOO_PYTHON note | 0% | 66% |
| Schema comment present | 0% | 100% |
| Standard directory structure | 0% | 100% |
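Read as a checklist, these criteria describe a Python assertion file along the following lines. The variable name `question`, the 500-character limit, and the file path are illustrative, and the `named_scores` key follows the criterion name above:

```python
# assertions/check_answer.py
# Default entry point: promptfoo calls get_assert(output, context) unless a
# named function is referenced, e.g. file://assertions/check_answer.py:my_func.
import re

def get_assert(output, context):
    # Context vars access pattern: test variables live under context['vars'].
    question = context["vars"].get("question", "")

    # HTML stripping: remove tags before scoring (simple regex sketch).
    text = re.sub(r"<[^>]+>", "", output)

    mentions_question = question.lower() in text.lower() if question else True
    length_ok = len(text) <= 500
    score = (int(mentions_question) + int(length_ok)) / 2

    # Return dict format, with a reason field and per-criterion named scores.
    return {
        "pass": score >= 0.5,
        "score": score,
        "reason": f"mentions_question={mentions_question}, length_ok={length_ok}",
        "named_scores": {
            "mentions_question": float(mentions_question),
            "length_ok": float(length_ok),
        },
    }
```

The config would reference it as an assertion of `type: python` with `value: file://assertions/check_answer.py`; file:// paths resolve relative to the config file, and the `PROMPTFOO_PYTHON` environment variable can point promptfoo at a specific Python interpreter.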
### Few-shot setup and echo preview

| Criterion | Baseline | With skill |
| --- | --- | --- |
| Chat JSON format | 100% | 100% |
| Assistant turn in prompt | 100% | 100% |
| 1-3 few-shot examples | 100% | 100% |
| Examples from files | 100% | 100% |
| Echo provider config | 100% | 100% |
| OpenAI provider ID format | 0% | 100% |
| max_tokens set high | 100% | 100% |
| outputPath defined | 0% | 100% |
| llm-rubric with threshold | 50% | 100% |
| Schema comment present | 0% | 100% |
| Standard directory layout | 50% | 100% |
| maxConcurrency under commandLineOptions | 100% | 100% |
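These criteria fit the common pattern of keeping few-shot chat prompts in JSON files and previewing the rendered prompt with the echo provider before spending tokens on a real model. A sketch, with illustrative file names, model ID, and rubric text — first the chat-format prompt file with one few-shot user/assistant exchange:

```json
[
  {"role": "system", "content": "You are a concise support assistant."},
  {"role": "user", "content": "How do I reset my password?"},
  {"role": "assistant", "content": "Go to Settings > Security > Reset password."},
  {"role": "user", "content": "{{question}}"}
]
```

Then a config that runs it through both the echo provider (which returns the rendered prompt verbatim, for preview) and a real model:

```yaml
# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
prompts:
  - file://prompts/support_chat.json   # chat JSON format, examples kept in files

providers:
  - echo                               # echoes the rendered prompt for preview
  - id: openai:chat:gpt-4o-mini
    config:
      max_tokens: 4096                 # set high so answers are not truncated

commandLineOptions:
  maxConcurrency: 4

outputPath: output/results.json

tests:
  - vars:
      question: How do I enable 2FA?
    assert:
      - type: llm-rubric
        value: Gives actionable, step-by-step instructions
        threshold: 0.8
```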