The infrastructure gap: what we heard at AI Engineer Europe

14 Apr 2026 · 6 minute read

Jordan Sanders

Marketing Manager for EMEA at Skimlinks and Connexity, driving campaigns that help publishers and retailers grow affiliate revenue through content, events, and data-led marketing.

Last week we were at AI Engineer Europe, spending a few days in rooms full of engineers who are actively building with agents — not experimenting, but shipping things to production and depending on them day to day.


The conversations felt different from a year ago, when most conference talk was still exploratory: Should we use agents? Where do they fit? For most of the teams we spoke to, those questions are settled. The harder question now — and the one nobody seems to have a clean answer to — is how you actually manage this stuff once it's running.

Three themes came up again and again. They aren't independent problems; they tend to reflect the same underlying gap.

Adoption outpaced infrastructure

We kept hearing about agent setups that had grown organically, without any real governance structure. Teams with dozens of skills deployed across an org, no clear ownership, no audit trail, no reliable way to know whether a given skill was still doing what anyone thought it was doing.

"It's purely vibes-based," one engineer told us, without any apparent embarrassment. He wasn't being self-deprecating — he was describing a state of affairs that everyone in earshot immediately recognised.

The parallel to early package management is hard to miss. Before npm or pip, developers copied libraries around by hand, had no concept of versioning, and discovered breakages in production. The tooling eventually caught up. We're at that moment for agent skills — except the teams feeling it most acutely aren't the cautious ones. They're the teams who moved fast, who are already operationally dependent on agents, and who are now feeling the absence of any systems underneath them.
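To make the parallel concrete, here's a rough sketch of the kind of metadata a versioned, owned skill could carry. The shape and every field name are our own illustration, not any registry's actual schema:

```typescript
// Hypothetical manifest for a managed agent skill.
// All field names are illustrative, not a real registry schema.
interface SkillManifest {
  name: string;          // e.g. "summarize-support-ticket"
  version: string;       // semver, so updates and rollbacks are explicit
  owner: string;         // the team accountable when it breaks
  models: string[];      // models the skill has been exercised against
  permissions: string[]; // what the skill is allowed to touch
  updatedAt: string;     // ISO timestamp, the start of an audit trail
}

const ticketSummarizer: SkillManifest = {
  name: "summarize-support-ticket",
  version: "1.4.2",
  owner: "platform-team",
  models: ["gpt-4o", "claude-sonnet-4"],
  permissions: ["read:tickets"],
  updatedAt: "2026-04-01T09:00:00Z",
};
```

Even metadata this thin would answer the questions above: who owns the skill, what version is running, and what it can touch.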

Nobody can answer "does this skill actually work?"

The conversation that generated the most energy, consistently, was around evals — not because teams had solved it, but because nobody had and everyone felt the pain. The questions were specific: how do you test a skill across different models? How do you run realistic multi-turn scenarios? How do you catch regressions when you update a skill? How do you know which of your 35 skills are pulling their weight?

Right now, most teams can't answer any of those. There's no standardised unit test for a skill. You ship something, watch it for a while, adjust. That's the industry state of the art.
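For illustration, a minimal harness for those questions might look something like the sketch below. `RunSkill` is a hypothetical hook standing in for however your stack actually invokes a skill, and the scenario shape is our assumption, not an emerging standard:

```typescript
// Sketch of a multi-turn skill eval across models. `RunSkill` is a
// hypothetical stand-in for whatever actually invokes your skill.
type RunSkill = (model: string, message: string, history: string[]) => Promise<string>;

interface Turn {
  user: string;
  expect: (reply: string) => boolean; // assertion on the skill's reply
}

interface Scenario {
  name: string;
  turns: Turn[];
}

async function evalSkill(run: RunSkill, models: string[], scenarios: Scenario[]) {
  for (const model of models) {
    let passed = 0;
    for (const scenario of scenarios) {
      const history: string[] = [];
      let ok = true;
      for (const turn of scenario.turns) {
        const reply = await run(model, turn.user, history);
        history.push(turn.user, reply);
        if (!turn.expect(reply)) { ok = false; break; } // first failure ends the scenario
      }
      if (ok) passed++;
    }
    // Per-model pass rates, rerun on every skill update, are the
    // regression signal most teams told us they're missing.
    console.log(`${model}: ${passed}/${scenarios.length} scenarios passed`);
  }
}
```

Scenarios distilled from real transcripts would be one plausible way to start feeding something like this.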

The absence of evaluation creates a trust problem as much as a quality one. Without the ability to measure whether something works, you can't improve it systematically, justify expanding it, or explain to the rest of the organisation why it should be relied on. Evals are the foundation for optimisation, governance, and confidence — which is probably why the conversations around them felt less like feature discussions and more like relief at finally naming something that had been sitting unnamed.

Enterprise is asking the control questions

The third theme was less about craft and more about risk. From teams inside larger organisations especially, we kept hearing a version of: who decides what gets installed, and who's responsible when something goes wrong?

"Who can install what?" "Can we vet skills before they go into production?" "Can we turn things off?" These questions weren't hypothetical — they were coming from teams where agents are already running in workflows that carry real consequences.

The npm comparison came up here too, but with a different edge. Engineers who remember left-pad or more recent supply chain incidents are pattern-matching hard. Skills run with significant access, and the appetite for an open, unvetted ecosystem is lower than it was for packages. What we heard is that governance is quietly shifting from a nice-to-have into a buying requirement. The orgs that can answer the control questions will expand further; the ones that can't will hit walls.
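For what answering those control questions could look like mechanically, here's a sketch of an org-level install gate, assuming a simple policy of vetted publishers, a permission ceiling, and a kill switch. All the names are hypothetical:

```typescript
// Sketch of an org-level install gate. The policy shape is an
// assumption for illustration, not an existing product's API.
interface OrgPolicy {
  vettedPublishers: Set<string>;  // who can publish skills we'll accept
  permissionCeiling: Set<string>; // the broadest access any skill may request
  disabledSkills: Set<string>;    // the "can we turn things off?" switch
}

interface InstallRequest {
  skill: string;
  publisher: string;
  permissions: string[];
}

function checkInstall(req: InstallRequest, policy: OrgPolicy): { ok: boolean; reason?: string } {
  if (policy.disabledSkills.has(req.skill)) {
    return { ok: false, reason: "skill is disabled org-wide" };
  }
  if (!policy.vettedPublishers.has(req.publisher)) {
    return { ok: false, reason: `publisher "${req.publisher}" has not been vetted` };
  }
  const excess = req.permissions.filter((p) => !policy.permissionCeiling.has(p));
  if (excess.length > 0) {
    return { ok: false, reason: `unapproved permissions: ${excess.join(", ")}` };
  }
  return { ok: true };
}
```

The point isn't these specific checks; it's that each of the three questions above maps to something an organisation can actually set and enforce.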

What this looks like from where we sit

None of this surprised us — these are the problems we're building for. Hearing them in this volume and with this specificity was clarifying, though.

Governance and control feel most urgent for enterprise buyers right now. Evaluation is where the deeper, longer-term value lives: it's the mechanism by which the rest of the system improves over time, turning individual skills from guesses into something you can actually depend on and build on top of. The management layer — versioning, lifecycle, ownership — is what makes any of it operable at scale.

The tooling for all of this is still catching up. But the teams waiting for it are already here.

Tessl is the package manager for agent skills and context. If you missed us at AI Engineer and want to explore Tessl for your organization, get in touch here.