17 Mar 2026 · 7 minute read

With Small 4, Mistral combines three of its flagship AI models into one
Developers building with AI models often end up running several systems side by side — one for coding, another for reasoning, and a third for handling images. That adds overhead, especially when those systems need to work together inside the same application.
French AI company Mistral, which builds open-weight language models, is trying to simplify that setup.
The company’s latest release, Mistral Small 4, combines several capabilities from separate models into a single system. Earlier models in the same family, such as Mistral Small 3, were general-purpose, able to handle chat, some coding, and basic multimodal tasks, but they didn't bring together Mistral’s more advanced reasoning or agent-style coding tools.
Mistral Small 4: When 3 becomes 1
Mistral Small 4 draws on three of the company’s more specialised models. These include Magistral, built for more involved reasoning tasks; Pixtral, designed to process images alongside text; and Devstral, which focuses on agent-style coding, such as breaking down tasks and executing them across a codebase.
Until now, developers looking to use these capabilities together would need to rely on multiple models, often stitched together through custom tooling. Small 4 runs those functions within a single model instead. It also introduces a way to adjust how much reasoning the model applies to a task.
For example, setting reasoning_effort="none" produces faster, lighter responses similar to earlier Small models, while reasoning_effort="high" triggers more detailed, step-by-step reasoning, closer to what Mistral’s separate reasoning models were designed to do.
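In practice, that setting would travel as a field in the request to the model. The sketch below builds such a payload; the `reasoning_effort` values ("none", "high") come from the article, while the model identifier and the exact field layout are illustrative assumptions rather than confirmed API details.

```python
def build_request(prompt: str, effort: str = "none") -> dict:
    """Build a chat-completions-style payload with an adjustable
    reasoning effort. Field names other than reasoning_effort are
    assumptions for illustration, not a documented schema."""
    return {
        "model": "mistral-small-4",  # hypothetical model identifier
        "messages": [{"role": "user", "content": prompt}],
        "reasoning_effort": effort,  # "none" for fast replies, "high" for step-by-step
    }

fast = build_request("Summarise this changelog.")              # lighter, quicker
deep = build_request("Debug this race condition.", "high")     # detailed reasoning
print(fast["reasoning_effort"], deep["reasoning_effort"])
```

The point of a single knob like this is that the same deployed model can serve both quick chat-style traffic and heavier reasoning work, without routing requests to different systems.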
What changed from Small 3
Mistral has spent the past year releasing more specialised models, each aimed at a specific task such as reasoning, coding, or working with images. Those models still exist, and in some cases remain the better choice for teams that need maximum performance in a single area.
Before that expansion, Mistral Small 3 served as the company’s main general-purpose model, designed to run on relatively modest hardware, with around 24 billion parameters. It handled a range of tasks, but the more advanced capabilities in Mistral’s lineup, particularly structured reasoning and multi-step coding agents, were developed separately.
With Small 4, those capabilities sit within the same model. It uses a mixture-of-experts design with 128 experts, of which only four are active at a time. The model has 119 billion total parameters, but only around 6 billion are active per token, which keeps compute requirements closer to smaller models despite its overall size.
That architecture helps explain the “small” label. The model is large on paper, but only a fraction of it runs for each request.
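The quoted figures can be sanity-checked with simple arithmetic. Assuming the model splits into parameters shared by every token (attention, embeddings) plus parameters spread across the 128 experts, of which 4 are active, the split below is inferred from the article's numbers; Mistral has not published it.

```python
# Back-of-envelope check of the quoted figures: 119B total parameters,
# 128 experts with 4 active per token, roughly 6B active parameters.
TOTAL = 119e9           # total parameters
ACTIVE = 6e9            # approximate active parameters per token
EXPERTS = 128
ACTIVE_EXPERTS = 4

frac = ACTIVE_EXPERTS / EXPERTS                  # fraction of experts used: 1/32
# Solve the pair  S + E = TOTAL  and  S + frac*E = ACTIVE
# for shared parameters S and total expert parameters E.
expert_params = (TOTAL - ACTIVE) / (1 - frac)
shared_params = TOTAL - expert_params

print(f"expert params ≈ {expert_params / 1e9:.1f}B")   # ≈ 116.6B
print(f"shared params ≈ {shared_params / 1e9:.1f}B")   # ≈ 2.4B
```

In other words, almost all of the 119B parameters sit in the experts, and each request touches only 1/32 of them plus a small shared core, which is why per-token compute lands near a much smaller dense model.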
That doesn’t make the separate models redundant. A dedicated reasoning model or coding agent can still be more predictable or better tuned for its specific task. Running a single combined model trades some of that focus for convenience, particularly for teams that want to deploy and maintain fewer systems.
The community reacts: “small ain’t what it used to be”
Early reactions focused on the model’s naming and size, with many users questioning what “small” now means.
One Reddit user wrote: “holy shit ‘small’ ain’t what it used to be.” Another added: “so 120b class is considered small now : )”
More technical commenters stepped in to explain the mixture-of-experts setup, noting that only a fraction of the model runs at a time. As one user put it: “the ‘small’ is more the active parameter count,” meaning that only a small portion of the model is used for each request rather than the full 119B parameters.
There was also a recurring theme around hardware requirements. Despite the efficiency gains, some users felt the model still sits out of reach for local setups. “No chance… I’m going to run this,” one commenter wrote.
Others pointed out that the model’s size appears tuned for specific hardware limits, suggesting it may be following a similar approach to OpenAI’s GPT-OSS-120B model.
“Interesting that they target around 120 billion parameters,” they wrote. “Just enough to fit onto a single H100 with 4 bit quant. Or 128GB APU like apple silicon, AMD AI cpus or the GB spark.”
Simplifying model stacks
More broadly, Mistral Small 4 points to a shift in how AI models are being packaged and deployed.
Running multiple models side by side adds complexity, particularly for companies building internal tools or products on top of them. A single model that can handle different types of tasks reduces the need for orchestration, even if it comes with other trade-offs.
Mistral Small 4 reflects that shift. Rather than asking developers to choose between separate systems for reasoning, coding, or multimodal tasks, it brings those capabilities into one model that can be deployed and managed in a single place.
At the same time, combining these capabilities raises questions about how well one model can handle all of them. Specialised systems are often tuned for specific tasks, and merging them into a single model can dilute those strengths. Early reactions reflect that tension.
For now, Mistral is betting that developers will accept those trade-offs in exchange for the convenience of running a single system.