The new Tessl review: now you decide what "good" looks like
Discover the new Tessl review tool that lets you define 'good' for your skills, offering agentic accuracy and a sharable review history. Try it now!

Simon Maple

For a while now Tessl has been able to review the quality of your skills straight out of the box. By simply running tessl skill review you get a score against Anthropic's best practices with no setup required. That is a sensible default and it has served most people well, but a default is still somebody else's opinion that you or your organisation might look at and disagree with.
Today we are launching a new version of Tessl’s review functionality. It does three new things: it reviews your skills agentically with greater accuracy, lets you define what good actually means for your skills, and keeps a sharable history of your skill review runs.
The problem with one definition of good
On one of my skills, the current review provides a quality score of 82%. The description review scores a perfect 100%, but the content section drops to 55%, with conciseness at 1 out of 3 and progressive disclosure at 1 out of 3.
In some people’s view, nothing is wrong with the skill; the judge is marking it down for keeping one tight, self-contained skill rather than spreading it across five files. That is a reasonable position, and it is Anthropic's position. But what if your org prefers larger, consolidated skills? In that case an 82 is punishing me for doing exactly what we want. We might even have further constraints that my skill fails to meet but that the review overlooks entirely, giving me a false sense of quality.
Here’s a video of the new Tessl review in action:
Offering a more accurate review
The new Tessl review is invoked using tessl review run from the CLI or via the agent (but make sure it’s calling the new version!) and you need to pass a workspace name where your review results will be stored.
One of the bigger changes is under the hood. Whereas the previous review used an LLM as a judge in a single pass, the new version uses an agent. It takes more turns, gathers more information about the skill and its associated files, and reaches a better, more grounded verdict. You will still see some variation between runs, since an LLM judge is non-deterministic by its very nature, but the results are more accurate.
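Because scores can vary a little from run to run, it can help to look at several runs together rather than trusting a single number. Here is a minimal sketch, assuming you have noted down the overall scores from a few runs by hand. The function name and the scores are invented for illustration and are not real Tessl output:

```python
from statistics import mean, pstdev


def summarise_runs(scores):
    """Return the mean, spread and range of overall scores from repeated runs."""
    if not scores:
        raise ValueError("need at least one score")
    return {
        "mean": mean(scores),
        "spread": pstdev(scores),  # population standard deviation
        "min": min(scores),
        "max": max(scores),
    }


# Hypothetical overall scores (percent) from four runs of the same skill.
example_scores = [82, 80, 84, 82]
summary = summarise_runs(example_scores)
print(summary)
```

A small spread suggests the verdict is stable, while a large one is a hint to read the individual run breakdowns before acting on any single score.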
Defining what good skills look like for your organization
This is the exciting part, because it changes how reviews determine what’s right: the new review lets you pass your own rubric, as a plugin, and review against it.
We’ve made a plugin called review-plugin-creator that walks you through building a custom review plugin. This allows you to fork the Anthropic best practices if you only wish to change a few things, so everything sensible stays in place by default and you only change what you disagree with. In my case I flipped a single rule, the one that punishes consolidated skills.
The creator produces a plugin holding your guidelines and rubric. To use it on a tessl review run, you can point to it locally on the file system, or link to a private or public plugin on the Tessl Registry.
Run the same skill again, this time with your rules, and you’ll see updated scores. In my case, the consolidated skill now scores full marks on conciseness and progressive disclosure, and the content section reflects what my org actually values rather than what a generic default assumes.
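The fork-and-flip idea described above can be pictured as copying a set of default rules and overriding only the ones you disagree with. This is a conceptual sketch only: the rule names, values and the fork_rules helper are hypothetical, and this is not the format of a Tessl review plugin, which the review-plugin-creator generates for you.

```python
# Hypothetical rule names and values; this is NOT the Tessl plugin format.
DEFAULT_RULES = {
    "concise_content": "required",
    "actionable_instructions": "required",
    "split_into_multiple_files": "preferred",
}


def fork_rules(defaults, overrides):
    """Copy the default rules and apply only the overrides that are given."""
    unknown = set(overrides) - set(defaults)
    if unknown:
        # Refuse typos so an override never silently adds a brand-new rule.
        raise KeyError(f"unknown rules: {sorted(unknown)}")
    return {**defaults, **overrides}


# Flip the single rule that penalises consolidated skills.
org_rules = fork_rules(DEFAULT_RULES, {"split_into_multiple_files": "not_required"})
changed = {name for name in org_rules if org_rules[name] != DEFAULT_RULES[name]}
print(changed)
```

The point of the design is that the defaults stay untouched, so everything you did not explicitly override keeps following the upstream guidance.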
Seeing your reviews
Everything you see at the CLI is also on the Tessl Registry. Head to your workspace and you will find your review plugin alongside a full history of review runs. Each run shows the same breakdown you get in the terminal, plus the plugin that produced it, so you always know which definition of good a score was measured against.
In your workspace settings you can set a default review plugin. From then on every review run from that workspace uses it automatically. You can still override it per run with the --review-plugin flag whenever you need to.
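If you script your reviews, the difference between using the workspace default and overriding it per run comes down to one optional flag. The sketch below only builds the argument list and does not run anything. The --review-plugin flag comes from this post; passing the skill as a positional path and the workspace via --workspace on tessl review run are assumptions, as are the example paths and names, so check the documentation for the real argument shape.

```python
import shlex


def build_review_command(skill_path, workspace, review_plugin=None):
    """Build the argument list for a `tessl review run` invocation.

    Assumptions: the skill is passed as a positional path and the workspace
    via --workspace. Only --review-plugin is named in the post for this command.
    """
    args = ["tessl", "review", "run", skill_path, "--workspace", workspace]
    if review_plugin is not None:
        # Per-run override of the workspace's default review plugin.
        args += ["--review-plugin", review_plugin]
    return args


# Hypothetical skill path, workspace name and plugin location.
default_cmd = build_review_command("./my-skill", "my-workspace")
override_cmd = build_review_command("./my-skill", "my-workspace", "./my-review-plugin")
print(shlex.join(default_cmd))
print(shlex.join(override_cmd))
```

Building the command as a list rather than a single string keeps it safe to hand to subprocess.run later without shell quoting problems.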
The rest of the toolkit
A few more commands worth knowing:
tessl review list --workspace <workspace-name> lists every review run against a workspace.
tessl review view <review-id> opens a single run and shows its full output.
tessl review fix is the new home for the --optimize behaviour you already know from our previous review. It agentically applies fixes to the skill based on a review outcome and can update your SKILL.md directly.
What does this mean for the old command?
tessl skill review is not going anywhere yet. We have deliberately left it in place so nothing breaks for anyone relying on it today, although you may see a deprecation message. That said, tessl review run is where all the work is going from here, so please move across and start using it so you’re not caught out when we do turn off the older review feature. We’ll also be releasing updates to our GitHub Actions soon to make use of the new tessl review functionality.
Try it now
The new Tessl review is live and you can use it today. Do note that you’ll need a free account in order to use the Tessl review command (you can check the full documentation here). There is plenty more to come and we will keep you posted as it lands. For now, run it against your own skills, write a rubric that matches how your team actually thinks about quality, then tell us how it performs in your environment. Your feedback shapes what we build next.
Customise Tessl review: https://tessl.io/registry/tessl/review-plugin-creator
Learn more about Tessl: https://tessl.io
Simon Maple
Simon Maple is Tessl’s Founding Developer Advocate, a Java Champion, and former DevRel leader at Snyk, ZeroTurnaround, and IBM.