How Tessl improved agent accuracy by 1.63× on the Bowser npm library

26 Jan 2026 · 6 minute read

Shawn Miller

Technology enthusiast.

TL;DR

Coding agents struggle most with how to use libraries, not whether they can write code at all. In this post, we evaluated an agent using the bowser@2.12.0 npm library, with and without a Tessl tile. Without structured context, the agent succeeds 57% of the time; with the tile, success rises to 93% (1.63×), highlighting how explicit, versioned context improves real-world API usage.

Agents, the problem is real and quantifiable…

Today, agents are doing incredible things, having been trained on a wide variety of documentation and code up to a certain point in time. Even so, agents can hallucinate APIs, miss idioms, and use libraries incorrectly, whether those libraries are open source or private. Private libraries are even more problematic because they were never in the training set: an organization's agents will not be aware of its code or how to interact with it, which hinders the move to agentic workflows.

To illustrate the problems agents face, let us consider Bowser, a popular browser/platform/engine detector on npm that receives millions of downloads a week. We’ll look at an agent’s use of this library, with and without assistance, using version 2.12, which was released after the coding agent’s training data was collected. To test, we’ll use Tessl’s automated pipeline, which generates scenarios along with their testing criteria.

By examining the agent’s attempts on these scenarios, and the criteria within each one, we see that the agent struggles with actions like browser name and version extraction because it is missing the additional context of the newer version.
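To make criteria such as "parser initialization" and "browser name extraction" more concrete, here is a minimal sketch of the kind of code a scenario might expect. It is written against a small interface so it runs without installing anything. Bowser 2.x parsers, created with `Bowser.getParser(userAgent)`, expose `getBrowserName()` and `getBrowserVersion()`. The `BrowserParser` interface, the `describeBrowser` helper, and the stub parser are illustrative assumptions, not part of Bowser or of Tessl's evals.

```typescript
// Minimal shape of the parser methods this sketch relies on.
// Bowser 2.x parsers expose methods with these names.
interface BrowserParser {
  getBrowserName(): string;
  getBrowserVersion(): string | undefined;
}

// Hypothetical helper: pull out the browser name and version,
// falling back to "unknown" when no version could be detected.
function describeBrowser(parser: BrowserParser): { name: string; version: string } {
  return {
    name: parser.getBrowserName(),
    version: parser.getBrowserVersion() ?? "unknown",
  };
}

// Stand-in parser so the sketch runs on its own.
// With the real library you would initialize the parser instead, e.g.:
//   import * as Bowser from "bowser";
//   const parser = Bowser.getParser(navigator.userAgent);
const stubParser: BrowserParser = {
  getBrowserName: () => "Chrome",
  getBrowserVersion: () => "120.0.0.0",
};

const summary = describeBrowser(stubParser);
console.log(`${summary.name} ${summary.version}`);
```

An agent without up-to-date context tends to go wrong in the initialization step or in the choice of accessor methods, which is what these criteria are designed to catch.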

The eval screen presents criteria like "parser initialization" and "browser name extraction" and in this case, displays a failed status for the results.

Scenario 8 for Bowser demonstrates a case where the agent failed most of the criteria because it did not have the necessary context.

In fact, 43% of the tests failed when the agent tried to use this library on its own.

The result shows a 57% success rate without this tile.

Tessl’s Tiles and Registry

At Tessl, we address this with tiles: reusable, structured context for agents. Using Tessl’s MCP tools, agents can search for and install tiles from the Tessl registry corresponding to open source libraries, private modules, company-wide policies and more. In the Bowser example, the agent can download the tile that matches the library version, v2.12, to get thorough instructions on how to use the library, including any recent updates.

Here we see an example of a tile with examples and code documentation generated by the Tessl pipeline:

A tile can consist of documentation with code examples and descriptions.

Tiles can also be searched and reviewed manually, through either Tessl’s web interface or the CLI.

When the same agent uses a tile, with the context provided by Tessl, success skyrockets! In the same scenario noted above, the tests all pass when Tessl context is available to the agent:

A scenario contains criteria, along with pass/fail information with and without Tessl.

Looking across all the scenarios, almost half of the criteria (43%) failed when the agent was on its own: it succeeded on only 57% of the 50+ criteria presented across ten scenarios. With Tessl’s context available to the agent, the success rate rises to 93%, a 1.63× improvement:

The tile evaluation includes information such as test results for the tile with and without Tessl, and indicates the level of improvement.

The Tessl web interface allows users to see eval data and level of improvement from the tile details screen.
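The 1.63× figure is simply the ratio of the two pass rates, and the 43% failure rate is the complement of the 57% pass rate. A quick check, using only the percentages quoted in this post (the variable names are illustrative):

```typescript
// Pass rates reported in this post, as fractions of criteria passed.
const withoutTile = 0.57;
const withTile = 0.93;

// Relative improvement: how many times higher the pass rate is with the tile.
const improvement = withTile / withoutTile;

// Failure rate without the tile is the complement of its pass rate.
const failureWithoutTile = 1 - withoutTile;

console.log(`${improvement.toFixed(2)}x improvement`); // prints "1.63x improvement"
console.log(`${(failureWithoutTile * 100).toFixed(0)}% failed without the tile`); // prints "43% failed without the tile"
```

Note that this is a relative improvement. In absolute terms, the pass rate rises by 36 percentage points.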

Use Tessl on your code

With the release of Generating tiles from repositories, open source maintainers and developers of private libraries can generate tiles for the code they maintain, generate evals, and then decide whether to publish them publicly or privately to their workspace!

To get started, sign up for free, connect Tessl to your code repository, and generate tiles for your libraries.

Navigate to the Tiles menu on the left and click Generate tile.

Once a draft is generated, you can examine the eval data by clicking the tile name:

Eval data is displayed for the draft.

By clicking Details next to each scenario, you can even examine the criteria that the scenario uses for testing:

Clicking "Details" will result in displaying the detailed scenario being evaluated.

After examining the test data, you can publish the tile publicly or privately to your workspace.

Click Publish next to the tile of interest.

See for yourself!

We’ve published these evals across ~300 public tiles in the Registry, covering packages such as mathjs from npm. We’re sharing them so teams can track the impact of changes over time, including model upgrades, version updates, and updates to the tiles themselves. We’ll also be adding more public evals over time.

You can enable Tessl on open source projects you maintain, or on private libraries you are developing, by configuring Tessl with your repositories and generating tiles.
