Simon Maple

Article

The new Tessl review: now you decide what "good" looks like:

The new Tessl review lets users define their own criteria for skill quality, offers agent-based accuracy, and maintains a history of review runs.

Article

Same quality, a quarter of the cost: Should DeepSeek Flash be your model of choice?

DeepSeek Flash offers comparable quality to pricier models at a fraction of the cost, making it a cost-effective choice for running agentic tasks at scale.

Article

Opus 4.8 tops the LLM leaderboard with 95% on skill evals

Opus 4.8 leads the LLM leaderboard with a 95% skill evaluation score, surpassing Opus 4.7 and Composer 2.5 Fast, despite being the slowest model tested.

Article

We ran Composer 2.5 and 2.5 Fast across 11 skills. Surprisingly, Fast won.

Composer 2.5 Fast outperformed Composer 2.5 across 11 skills, scoring higher and running 32% quicker, while costing the same, challenging typical speed-quality trade-offs.

Article

Your benchmarks are lying to you, and your judge is to blame!

Benchmarking AI models with single LLM judges can skew results due to judge bias. Multiple judges reveal score variations, suggesting a need for diverse evaluation methods.

Article

Stop trusting your agent skills with vibes. Eliminate the context security risk.

Learn how 'tessl-audit' helps secure AI agent plugins by scanning for vulnerabilities, assessing quality, and ensuring plugins enhance agent performance.

Article

GPT-5.5 is OpenAI's best model. But paying more for it makes no sense.

GPT-5.5 is OpenAI's most capable model, but its 63% higher cost offers minimal performance gains over GPT-5.4, making it less cost-effective for most tasks.

Article

Stop guessing whether your Skill works: skill-optimizer measures and improves it

Skill-optimizer evaluates and enhances AI skills by running them through a judge-scored eval pipeline, providing measurable improvements and insights into skill performance.

Article

Anthropic, OpenAI, or Cursor model for your agent skills? 7 learnings from running 880 evals (including Opus 4.7)

Explore findings from 880 evaluations comparing Anthropic, OpenAI, and Cursor models, highlighting the impact of agent skills on performance and cost efficiency.

Article

The Tessl Registry now has security scores, powered by Snyk

The Tessl Registry now includes Snyk-powered security scores for skills, enhancing decision-making by assessing quality, impact, and safety before installation.

Article

What's new in Tessl: global installs, watch mode, GitHub badges, and a unified score

Tessl introduces global skill installs, automatic change monitoring with watch mode, GitHub badges for eval scores, and a unified score to represent skill quality.

Article

Your skill works on opus. Does it make haiku worse? Benchmarking AI skills across Claude models

Explore how the 'review-model-performance' skill benchmarks AI skills across Claude models, addressing compatibility and effectiveness issues in AI skill deployment.

Article

Your AGENTS.md file isn't the problem. Your lack of AI Agent Evaluations is.

The issue with context files like AGENTS.md is not their existence but the lack of evaluation to validate their effectiveness in guiding AI agents.

Article

Double your coding agent’s chances of writing secure code with the CodeGuard Skill

Enhance AI coding agents with the CodeGuard Skill to improve secure code generation by applying Cisco's security rules, covering 23 categories and multiple languages.

Article

The CI/CD of Agent Skills: Automate Publishing with GitHub Actions

Automate the publishing of agent skills using GitHub Actions with Tessl, a package manager that ensures updated skills and context are consistently deployed.

Article

Best Agent Skills for AI Code Review: 8 Evaluated Skills For Dev Workflows

Explore eight AI code review skills categorized into reviewers, workflow, and plumbing, enhancing code analysis, security checks, and integration in dev workflows.

Article

Taming AI agents with specs: what the experts say

Experts discuss the critical disconnect between intent and implementation in AI-driven coding, emphasizing the need for streamlined markdown specifications to enhance code quality and preserve institutional knowledge.

Article

Tessl launches spec-driven framework and registry to make AI coding agents reliable

Tessl’s answer to unreliable and over-confident AI coding agents is spec-driven development, which captures intent in structured specifications so agents can build with clarity and guardrails.

Article

Why Code Alone Isn’t Enough: The Case for Code Specification (Spec-Driven Development)

Discover why shifting from code-centric to spec-centric development can unleash AI’s code-gen potential, creating adaptable, autonomous, and self-optimizing software systems.

Article

Cursor 1.0: Everything You Need To Know

Cursor 1.0 is here, featuring BugBot for seamless GitHub code reviews, asynchronous Background Agents, collaborative Jupyter support, Memories, and MCP integrations.

Article

The Dark Side of “Just Hooking Up” AI Agents to GitHub

A single malicious GitHub issue can hijack your AI dev agent, leaking private repo data. Learn why GitHub MCP integrations pose a hidden security risk.

Article

12 MCP Servers That Will Make You A More Productive Developer

Explore 12 powerful MCP servers that supercharge your dev workflow - from Docker and GitHub to Slack, Snyk, and more.

Article

The Best Bits From AI Native DevCon Spring 2025

Relive the highlights from AI Native DevCon Spring 2025: keynotes, live coding, and agentic insights from 30+ speakers across 3 tracks.

Article

JetBrains Releases Junie: An Agentic AI Assistant

JetBrains launches Junie, an "agentic AI" assistant built into its IDEs, promising more autonomous coding and hinting at the future of development workflows.

Article

The Future of AI Native Development: We Need Your Input!

Help shape the future of AI-powered development! Take the AI Native Dev Survey 2025 to share your insights and get exclusive access to the upcoming industry report.

Article

AI Native DevCon Spring 2025

It’s happening! AI Native DevCon Spring 2025 is officially on, and we’re opening up the Call for Proposals (CFP) today!

Article

Announcing AI Native Dev Con, Supercharge development today, and reimagine it for tomorrow

We’re excited to announce the launch of a brand new conference, AI Native Dev Con. We’re kicking off with an inaugural virtual conference on 21 November 2024. The conference aims to help you use AI to develop faster and better today, and to explore how AI is reshaping the way we will build, maintain, and evolve software tomorrow. We highlight exciting new tools and advancements in AI-powered software development, with a focus on how large language models are reshaping how we build, maintain, and scale complex codebases. Join us to explore the future where AI goes beyond generating code snippets to orchestrating the creation and evolution of entire software systems.
