Stop guessing whether your Skill works: skill-optimizer measures and improves it
Skill-optimizer evaluates and enhances AI skills by running them through a judge-scored eval pipeline, providing measurable improvements and insights into skill performance.
AI Native DevCon Day 2: From Agent Demos to Operating Models
AI Native DevCon Day 2 explored operating models for AI-native delivery, focusing on context pipelines, agent behavior metrics, and organizational ownership.
Rohan Sharma · 12 min read · 3 Jun 2026
The model's solved, now comes the hard part: Reviewability as the bottleneck
AI engineering shifts focus from model development to ensuring system reviewability, emphasizing manageable task sizes for reliable and governable outputs.
Paul Sawers · 9 min read · 2 Jun 2026
AI Native DevCon Day 1: Making AI Agents Ready for Enterprise
AI Native DevCon Day 1 focused on making AI agents enterprise-ready, emphasizing reliability, skills as code, and adapting platforms for agent integration.
Rohan Sharma · 12 min read · 2 Jun 2026
AI Coding Agent Accuracy: Opus 4.7 vs 4.8
Opus 4.8 matches Opus 4.7 in accuracy but is more efficient, solving tasks in fewer turns and at lower cost, which highlights differences beyond headline metrics.
Rob Willoughby · 9 min read · 29 May 2026
Opus 4.8 tops the LLM leaderboard with 95% on skill evals
Opus 4.8 leads the LLM leaderboard with a 95% skill evaluation score, surpassing Opus 4.7 and Composer 2.5 Fast, despite being the slowest model tested.
Simon Maple · 8 min read · 29 May 2026
Why We're Changing Our Default Eval Model
The default eval model is changing from Claude Sonnet 4.6 to GLM 5.1 to reduce costs without losing signal quality, focusing on skill evaluation over model specificity.
Rob Willoughby · 9 min read · 29 May 2026
We ran Composer 2.5 and 2.5 Fast across 11 skills. Surprisingly, Fast won.
Composer 2.5 Fast outperformed Composer 2.5 across 11 skills, scoring higher and running 32% faster at the same cost, which challenges the usual speed-quality trade-off.
Simon Maple · 6 min read · 28 May 2026
Don't Make Your Agent Guess
Tool design matters in agent systems: prompts alone are insufficient for ensuring agent safety and reliability.
Matthias Lübken · 11 min read · 27 May 2026
The Reinvention Of The Dev Team
Explore how AI is reshaping dev teams, challenging traditional roles, and introducing new dynamics in software development, focusing on speed, safety, and value.
Hannah Foxwell · 11 min read · 26 May 2026
Securing the Coder, Not the Code: Notes on Agentic Development and Security
Agentic development shifts security focus from code to coder, requiring new tools and metrics as AI agents rapidly create and modify software.
Guy Podjarny · 16 min read · 21 May 2026
AI Native DevCon’26: The London conference for developers building with AI
AI Native DevCon’26 in London focuses on the challenges of deploying AI agents in production, with four tracks on engineering, orchestration, enablement, and governance.
Rohan Sharma · 7 min read · 20 May 2026
OpenAI is shutting down self-serve fine-tuning – what this signals for enterprise AI
OpenAI is phasing out self-serve fine-tuning, saying that more advanced models have reduced the need for it, a move that signals a shift in enterprise AI toward infrastructure challenges.