
tessl/review-plugin-creator

Create custom Tessl reviewer plugins – fork the default rubric, build one from scratch, or derive its rubrics from evidence (existing skills, PR review feedback, agent logs). Scaffolds the plugin directory structure, authors rubrics and config.json, and validates the result with tessl review run.

97 · 1.15x
Quality: 96% (Does it follow best practices?)
Impact: 98%, 1.15x (Average score across 6 eval scenarios)

Security by Snyk: Advisory. Suggest reviewing before use.


Evaluation results

100%

15%

Build a Security Hygiene Reviewer Plugin

From-scratch security reviewer plugin creation

| Criteria | Without context | With context |
| --- | --- | --- |
| plugin.json name field | 100% | 100% |
| plugin.json version field | 100% | 100% |
| plugin.json description field | 100% | 100% |
| plugin.json private field | 0% | 100% |
| plugin.json skills field | 0% | 100% |
| Three schema files present | 44% | 100% |
| Rubric evaluation_target | 100% | 100% |
| Rubric scale | 100% | 100% |
| Rubric reference_examples | 100% | 100% |
| Dimension weights sum | 100% | 100% |
| Dimension required fields | 100% | 100% |
| config.json judge key matches rubric stem | 100% | 100% |
| config.json weight invariant | 100% | 100% |
| validation_weight is 0.0 | 100% | 100% |
| SKILL.md present | 60% | 100% |

100%

10%

Code Quality Reviewer Plugin

Fork default rubric and add domain-specific judge

| Criteria | Without context | With context |
| --- | --- | --- |
| Fork approach used | 100% | 100% |
| New judge file exists | 100% | 100% |
| Judge key matches filename | 100% | 100% |
| All three judges registered | 100% | 100% |
| Plugin-level weight invariant | 100% | 100% |
| New judge weight ~30% | 100% | 100% |
| Rubric dimension weights | 100% | 100% |
| New rubric required fields | 100% | 100% |
| Standard scale used | 100% | 100% |
| Dimension id snake_case | 100% | 100% |
| plugin.json present | 0% | 100% |
| Correct directory scaffold | 0% | 100% |

100%

27%

Debug a Broken Reviewer Plugin

Fix broken reviewer plugin config

| Criteria | Without context | With context |
| --- | --- | --- |
| Config weights sum to 1.0 | 0% | 100% |
| Judge key matches rubric filename | 100% | 100% |
| Rubric dimension weights sum to 1.0 | 100% | 100% |
| Config weight error documented | 50% | 100% |
| Key mismatch error documented | 100% | 100% |
| Dimension weight error documented | 100% | 100% |
| fix-log.md corrections described | 80% | 100% |
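The three checks this scenario scores (config weights summing to 1.0, judge keys matching rubric filenames, and rubric dimension weights summing to 1.0) can be illustrated with a minimal Python sketch. The data shapes here are assumptions made for illustration only: `judges` as a mapping of judge key to weight, `rubrics` as a mapping of rubric stem to a list of dimensions with `id` and `weight`, and the `.json` rubric filenames are all hypothetical, not the actual Tessl schema. The snake_case check mirrors the "Dimension id snake_case" criterion from the other scenarios. This is not a substitute for validating with tessl review run.

```python
import math
import re
from pathlib import Path

# Lowercase words separated by single underscores, e.g. "trigger_clarity".
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")


def check_weights(weights, label):
    """Return an error message if the weights do not sum to 1.0, else None."""
    total = sum(weights)
    if not math.isclose(total, 1.0, abs_tol=1e-9):
        return f"{label}: weights sum to {total:.2f}, expected 1.00"
    return None


def check_plugin(judges, rubric_files, rubrics):
    """Collect invariant violations for a (hypothetical) plugin layout.

    judges:       assumed mapping of judge key -> weight
    rubric_files: rubric file names found in the plugin directory
    rubrics:      assumed mapping of rubric stem -> list of
                  {"id": str, "weight": float} dimension dicts
    """
    errors = []

    # Plugin-level invariant: judge weights sum to 1.0.
    err = check_weights(list(judges.values()), "config")
    if err:
        errors.append(err)

    # Each judge key must equal the stem of some rubric file.
    stems = {Path(name).stem for name in rubric_files}
    for key in judges:
        if key not in stems:
            errors.append(f"judge key '{key}' has no matching rubric file")

    # Rubric-level invariants: dimension weights and id style.
    for stem, dims in rubrics.items():
        err = check_weights([d["weight"] for d in dims], f"rubric '{stem}'")
        if err:
            errors.append(err)
        for d in dims:
            if not SNAKE_CASE.match(d["id"]):
                errors.append(f"rubric '{stem}': id '{d['id']}' is not snake_case")

    return errors


if __name__ == "__main__":
    # Deliberately broken example input (all names are made up).
    problems = check_plugin(
        judges={"content": 0.6, "description": 0.5},
        rubric_files=["content.json", "descriptions.json"],
        rubrics={"content": [{"id": "clarity", "weight": 0.5},
                             {"id": "Enforce-ability", "weight": 0.5}]},
    )
    for p in problems:
        print(p)
```

Returning a list of messages rather than raising on the first failure suits the debugging scenario, where every error is expected to be documented before it is fixed.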

100%

90%

API Documentation Reviewer Plugin

Three-judge API docs reviewer plugin from scratch

| Criteria | Without context | With context |
| --- | --- | --- |
| Three rubric files | 0% | 100% |
| Config judge key match | 0% | 100% |
| Weight invariant | 0% | 100% |
| Small validation weight | 0% | 100% |
| plugin.json name field | 0% | 100% |
| plugin.json version | 0% | 100% |
| plugin.json private | 0% | 100% |
| plugin.json skills path | 0% | 100% |
| Schemas copied | 0% | 100% |
| Standard scale | 0% | 100% |
| Dimension id snake_case | 0% | 100% |
| Rubric dimension weights sum | 100% | 100% |
| Required rubric fields | 0% | 100% |
| Dimension score entries | 0% | 100% |

100%

37%

Derive a review-plugin rubric from gathered evidence

Derive a review-plugin rubric from gathered evidence (agent logs present)

| Criteria | Without context | With context |
| --- | --- | --- |
| Produces a design file only | 100% | 100% |
| Trigger-clarity dimension on description | 0% | 100% |
| Trigger-clarity anchored in the real failed description | 0% | 100% |
| Enforceability dimension on content | 83% | 100% |
| Content judge carries scope and scoring_notes | 100% | 100% |
| Dimension for the recurring api:sync feedback | 100% | 100% |
| api:sync anchors drawn from the PR evidence | 100% | 100% |
| Verifier-type finding routed out of the rubric | 0% | 100% |
| Dimension weights sum to 1.0 per rubric | 100% | 100% |
| Plugin-level weight split sums to 1.0 | 50% | 100% |
| Weights reflect frequency and severity | 100% | 100% |

88%

-12%

Rubric Design from PR Feedback (No Agent Logs Available)

Derive rubrics from evidence, cloud sandbox (no agent logs)

| Criteria | Without context | With context |
| --- | --- | --- |
| Acknowledges no log evidence | 100% | 100% |
| No fabricated log-based activation evidence | 100% | 100% |
| Judges with evaluation_target | 100% | 100% |
| Verifier classification present | 100% | 100% |
| Dimension vs verifier reasoning | 100% | 100% |
| Anchors from actual feedback | 100% | 100% |
| Frequency/severity drives weights | 100% | 100% |
| Weights sum to 1.0 per judge | 100% | 100% |
| Handoff note to create-review-plugin | 100% | 100% |
| No scaffolding actions taken | 100% | 0% |

Evaluated
Agent: Claude Code
Model: Claude Sonnet 4.6
