Baptiste Fernandez

Article

How Small Can an Agent Model Get? The Nemotron Floor

The article explores the minimum model capacity needed for agent models to function effectively, highlighting the Nemotron family's performance on coding tasks.

Article

Open-Source Agents vs Sonnet 4.6: GLM 5.2, MiniMax M3, Kimi 2.7 and Qwen 3.7 Tested

Open-source coding models like GLM 5.2 and Qwen3.7-Plus are compared to Claude Sonnet 4.6, revealing varied performance in instruction-following and task completion.

Article

How I Scan My Agent Context Across GitHub with Skill Inventory

Explore how Tessl's Skill Inventory scans GitHub to map agent skills, addressing duplication and ownership issues, and providing a clear skill estate overview.

Article

Why Your Gemini Bill Doesn't Match the Model Names

Gemini billing doesn't track model names: per-task costs diverge from what the names suggest, with Gemini 3.5 Flash costing more than 3.1 Pro despite similar performance scores.

Article

Claude Fable 5 vs Opus 4.8: The Mythos Hype Meets Reality

Claude Fable 5 and Opus 4.8 are compared, revealing that despite Mythos hype, Opus 4.8 remains a better value for most, with Fable 5 offering higher but riskier capabilities.

Article

Anthropic, OpenAI, or Cursor model for your agent skills? 7 learnings from running 880 evals (including Opus 4.7)

Explore findings from 880 evaluations comparing Anthropic, OpenAI, and Cursor models, highlighting the impact of agent skills on performance and cost efficiency.

Article

Anthropic brings evals to skill-creator. Here’s why that’s a big deal

Anthropic's skill-creator now includes evals, allowing developers to test and validate context files, improving the accuracy and efficiency of AI coding agents.

Article

Level Up Claude Code: 14 Techniques Top Engineers Use

Explore 14 practical techniques used by engineers to maximize output with Claude Code, including planning, enhancing context with Tessl, and using mutable memory with CLAUDE.md.

Article

7 AI Devtools to Watch This December

Explore seven emerging AI development tools that enhance workflows through trust and adaptability, spotlighting innovative approaches that tackle unique challenges in coding and collaboration for developers.

Article

8 Devtools in the Spotlight - Last Month on the Landscape

Explore eight standout AI development tools, each uniquely enhancing workflows through trust and adaptability, as the rapidly evolving landscape prompts developers to rethink their approaches to integration.

Article

Does Developer Delight Matter in a CLI? The Case of Charm’s Crush

What if the real edge in AI dev tools isn’t just model power, but delight? Crush’s playful, glamorous CLI begs the question: how significant is joy and polish over technical capability?

Article

IDE Comparison With Cursor, Windsurf, and Copilot (on GPT-5)

Dive into a hands-on comparison of Cursor, Windsurf, and Copilot with GPT-5, highlighting their strengths in greenfield and brownfield projects through a spec-first development lens.

Article

Stack Overflow's 2025 Report is Out: Trends on AI Native Development

Developers use AI more than ever, but trust it less. ‘Almost right’ code is driving a push toward spec-driven development.

Article

Terminal-Bench: Benchmarking AI Agents on CLI Tasks

Terminal-Bench is a new benchmark testing how well AI agents handle real-world terminal tasks, revealing big performance gaps and sparking a wave of innovation in system-level agent design.

Article

The Most Valuable Developer Skill in 2025? Writing Code Specifications

In 2025, the key skill for developers is writing code specifications, shifting from transient prompts to structured documentation that aligns stakeholders and ensures clarity in AI-driven development.

Article

How claude-task-master “Reduced 90% Errors for My Cursor”

Discover how Taskmaster leverages AI-driven PRDs to decompose projects into focused tasks, boosting reliability, efficiency, cost savings, and developer autonomy.

Article

The Rise of “Visual Vibe Coding” (with New VSCode Extension)

Stagewise brings visual vibe coding to life - click elements, describe changes, and watch AI reshape your frontend. A new paradigm for intent-driven, precision UI development.

Article

The Best Open-Source Model for Agentic Coding? Meet Mistral’s Devstral

Mistral's new Devstral model tops SWE-Bench Verified for open-weight LLMs—compact, agent-optimized, Apache-licensed, and strong enough to rival massive proprietary giants.

Article

GitHub Unveils Copilot Coding Agent at Build 2025

GitHub Copilot evolves: cloud-based agents now handle PRs, iterate from feedback, and fit seamlessly into dev workflows.

Article

GitHub’s MCP Server: You Can Now Talk to Your Repos

GitHub’s new MCP server empowers Copilot to automate dev workflows with real-time repo context - marking a leap from code suggestions to workflow actions.

Article

Developing Apps by Chat: How Far Can Lovable 2.0 Really Go?

Lovable 2.0 lets you build apps by chatting with an AI. What does this mean for developers?

Article

IDE Free Tier War: Windsurf’s Push to Win Over Developers

In April 2025, Windsurf (formerly Codeium) rolled out significant pricing updates that made waves in the developer community.

Article

When OpenAI Goes Open Source: Codex CLI

Codex blends ChatGPT and other models (you read that right) with hands-on capabilities like code execution, file manipulation, and project iteration.

Article

GPT-4.1 Exposes Frustrations in AI Native Development

Benchmarks are up, but so is developer frustration—model sprawl, version confusion, and prompt chaos are sparking a shift toward smarter abstraction.

Article

Plugins and Platforms: v0’s Marketplace Integrations in AI Native Development

Vercel's v0 now supports third-party integrations via its new Marketplace, letting developers spin up full-stack apps—complete with backend services—directly from AI prompts. This marks a major leap in the shift toward AI-native development.

Article

GPT-4.5 vs o3: Observations in AI-Native Development

GPT-4.5 vs o3-mini: Which model is best for AI-native development? Tessl’s AI Engineering team put them to the test—here’s what we found.
