
OpenAI is shutting down self-serve fine-tuning – what this signals for enterprise AI

OpenAI is phasing out self-serve fine-tuning, saying its newer models make the technique far less necessary. The move signals that enterprise AI's hardest problems are shifting toward the infrastructure around the model.

Paul Sawers

20 May 2026 · 7 min read

Fine-tuning has been widely regarded as one of the definitive ways to make AI enterprise-ready. The mighty OpenAI itself made fine-tuning one of its seven core lessons for enterprise AI adoption, recommending it as the "tailored option" for companies wanting domain-specific results.

“If a GPT model is a store-bought suit, fine-tuning is the tailored option – the way you customize the model to your organization’s specific data and needs,” the company wrote in its 2025 AI in the Enterprise report. “Fine-tuned models better understand your industry’s terminology, style, and context.”

Now, however, the ChatGPT hitmaker is quietly winding much of that approach down.

Frontier models reduce the need for fine-tuning

In a deprecation notice posted in early May, OpenAI said it was restricting access to its self-serve fine-tuning platform — starting by blocking new organisations from creating fine-tuning jobs, then cutting off anyone who hasn't run inference on a fine-tuned model in the past 60 days, and finally ending new fine-tuning job creation for all existing customers by January 2027.

In an email to developers explaining the decision, OpenAI said its newest models had become capable enough to make much of the fine-tuning process unnecessary.

“Newer base models like GPT-5.5 are much better at following instructions and formats than prior models,” the company wrote. “Prompt-based approaches are now cheaper and faster — as such, we’re seeing fewer use cases that require fine-tuning.”

Enterprise AI engineering centres on orchestration, evaluation, and agent harnesses

This is a recurring theme across the enterprise AI landscape: the models themselves are no longer the bottleneck — the harder problems now lie in the infrastructure around them. As OpenAI's most recent enterprise guidance noted, the bigger challenges are evaluation systems, context handling, memory, orchestration, and the operational controls needed to make these tools reliable inside large organisations.

Rinat Abdullin, founder of AI consultancy BitGN, argued that OpenAI’s move reflects a wider realisation: fine-tuning, along with “vector-based RAGs” (retrieval-augmented generation systems that pull relevant information from a database at query time rather than baking it into the model), is losing ground to simpler, cheaper, faster alternatives.

“Fine-tuning – and vector-based RAGs along the way – are dying not because they don't work,” Abdullin wrote on LinkedIn. “They do work, but are too expensive, risky and time consuming (both for the vendors and users), if compared to the other approaches: context engineering, proper tool-use and quality control, simply sticking to the frontier LLM models.”

Laurie Voss, head of developer relations at AI observability company Arize, said this doesn’t mean fine-tuning is dead. Rather, it’s a “strong signal that fine-tuning isn't what the average AI engineer wants to do.”

Voss argues that iteration has permanently split in two. A tiny elite — the Cursors and Cognitions of the world — are doing more model training than ever, running continuous reinforcement learning against their own production environments with dedicated ML teams and custom GPU clusters. Everyone else has moved the iteration loop out of the model entirely, into what Voss calls the "harness": the layer of prompts, tools, evals, and feedback loops that wraps around it.
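To make Voss's distinction concrete, here is a minimal Python sketch of iterating in the harness rather than the model: candidate prompt templates are scored against a small eval set and the best one is kept, while the model itself never changes. The `fake_model` function, the templates and the test cases are all hypothetical stand-ins; a real harness would call a provider's API and use a much larger, task-specific eval set.

```python
from typing import Callable

# A "model" here is just a function from prompt text to completion text.
Model = Callable[[str], str]


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for a frontier-model API call.

    Toy behaviour: echoes the question, upper-cased only if the prompt asks for it.
    """
    question = prompt.rsplit("Q: ", 1)[-1]
    return question.upper() if "UPPERCASE" in prompt else question


def run_eval(model: Model, template: str, cases: list[tuple[str, str]]) -> float:
    """Score one prompt template against (input, expected) pairs; return the pass rate."""
    passed = sum(model(template.format(q=q)) == expected for q, expected in cases)
    return passed / len(cases)


def pick_best_prompt(
    model: Model, templates: list[str], cases: list[tuple[str, str]]
) -> tuple[str, float]:
    """Iterate on the harness (the prompt template), never on the model weights."""
    scores = {t: run_eval(model, t, cases) for t in templates}
    best = max(scores, key=scores.__getitem__)
    return best, scores[best]


# Hypothetical eval set and candidate prompts.
CASES = [("hello", "HELLO"), ("acme corp", "ACME CORP")]
TEMPLATES = [
    "Answer the question.\nQ: {q}",
    "Answer in UPPERCASE.\nQ: {q}",
]

if __name__ == "__main__":
    best, score = pick_best_prompt(fake_model, TEMPLATES, CASES)
    print(f"best prompt scored {score:.0%}")
```

Changing a template, adding a tool call or extending the eval set all modify the harness, not the model, which is the kind of loop Voss describes most teams now running.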

"OpenAI winding down its self-serve fine-tuning platform didn't cause this split," he wrote, "but it did make it impossible to ignore."

In truth, this hasn’t happened overnight. Back in October 2023, Fast.ai founder Jeremy Howard — the man who effectively invented the modern fine-tuning paradigm with his ULMFiT paper — appeared on the Latent Space podcast to argue that nobody actually understood how to do it properly.

"I still don't know how to fine tune language models properly," Howard said, "and I haven't found anybody who feels like they do."

His conclusion was that fine-tuning, as most teams practised it, should simply be abandoned in favour of continued pre-training.

And this leads us to May 2026, and OpenAI walking away from self-serve fine-tuning much as Howard suggested.

The operational stack becomes the real enterprise AI battleground

The news out of OpenAI doesn’t mean fine-tuning disappears from the industry. For teams willing to work with open-source models, parameter-efficient techniques like LoRA and QLoRA remain fully available and unaffected.

Google's Gemini via Vertex AI and Anthropic's Claude via Amazon Bedrock also still offer managed fine-tuning for enterprise customers.

But for OpenAI specifically, well, that’s almost it. Existing active customers have until January 2027 to run their last training jobs, after which no new fine-tuning jobs can be created. OpenAI has confirmed it won't add new models or platform features during the wind-down. Inference on existing fine-tuned models will continue via the Chat Completions and Responses APIs, but only until the underlying base models are deprecated.

The bigger story, though, is where engineering effort is now concentrating. For many enterprise teams, the iteration loop has shifted away from retraining models and toward controlling the systems around them: context retrieval, tool orchestration, evaluation, memory, observability, and governance.

At the same time, a growing ecosystem of companies is building infrastructure around those surrounding systems. Tessl, for example, focuses on evaluation systems, context registries, and reusable skills layers for coding agents. Yugabyte’s recently launched Meko platform, meanwhile, is pitching shared memory and coordination infrastructure as the missing operational layer underneath multi-agent systems.

The common thread running through all of this is that the operational stack around the model is now the primary engineering problem, and the place where much of the industry’s effort is accumulating.


Paul Sawers

Freelance tech writer at Tessl, former TechCrunch senior writer covering startups and open source

130 posts
