AUTHOR

Rob Willoughby

Member of Technical Staff at Tessl

Articles

Article

AI Coding Agent Accuracy: Opus 4.7 vs 4.8

Opus 4.8 matches Opus 4.7 in accuracy but is more efficient, solving tasks in fewer turns and at lower cost, which highlights differences that headline metrics miss.

Article

Why We're Changing Our Default Eval Model

The default eval model is changing from Claude Sonnet 4.6 to GLM 5.1 to reduce costs without losing signal quality, focusing on skill evaluation over model specificity.

Article

Evaluating Kimi 2.5 vs Kimi 2.6: What happens to agent skills when the model gets smarter?

Early signals from benchmarking Kimi K2.5, K2.6, and Sonnet 4.5 on 21 agent skills. Kimi K2.6 is a better model than K2.5, and skills still matter as models improve.

Article

A Proposed Evaluation Framework for Coding Agents: Tiles Enhance Proper Use of Public APIs by ~35%

This article proposes an evaluation framework that shows how specifications improve coding agents' use of public APIs, increasing code quality and efficiency by approximately 35% as software interfaces evolve.
