Friday, March 20, 2026

Microsoft releases new, capable Phi-3.5 models

Microsoft has no intention of basing its success in AI solely on its partnership with OpenAI.

The three new Phi-3.5 models are the 3.82-billion-parameter Phi-3.5-mini-instruct, the 41.9-billion-parameter Phi-3.5-MoE-instruct, and the 4.15-billion-parameter Phi-3.5-vision-instruct, designed respectively for basic/fast reasoning, more advanced reasoning, and vision tasks (image and video analysis).

All three models are available for developers to download, use, and fine-tune on Hugging Face under a Microsoft-branded MIT License, which allows commercial use and modification without restrictions.
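With the Hugging Face `transformers` library installed, any of the checkpoints (e.g., `microsoft/Phi-3.5-mini-instruct`) can be loaded and prompted directly. The sketch below only illustrates the chat prompt layout the Phi-3 family uses; the `<|system|>`/`<|user|>`/`<|assistant|>`/`<|end|>` special tokens follow the published model cards, but in practice you should let the tokenizer's own `apply_chat_template` produce the prompt rather than formatting it by hand:

```python
# Minimal sketch of a Phi-3-style chat prompt, assuming the special tokens
# documented in the model cards. Prefer tokenizer.apply_chat_template with
# transformers in real use; this only shows the layout.

def format_phi_chat(messages):
    """Render a list of {role, content} dicts into a single prompt string."""
    parts = []
    for m in messages:
        parts.append(f"<|{m['role']}|>\n{m['content']}<|end|>\n")
    parts.append("<|assistant|>\n")  # cue the model to start its reply
    return "".join(parts)

prompt = format_phi_chat([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize the Phi-3.5 family in one sentence."},
])
print(prompt)
```

With `transformers` installed, the equivalent is `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)`, which guarantees the exact template the checkpoint was trained with.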

Amazingly, all three models boast near-state-of-the-art performance in numerous third-party benchmarks, outperforming even solutions from other AI vendors including Google’s Gemini 1.5 Flash, Meta’s Llama 3.1, and in some cases even OpenAI’s GPT-4o.

Such achievements, combined with a permissive, open license, have led people to praise Microsoft on the social networking site X:

Today, let’s take a quick look at each of the new models, based on their release notes published on Hugging Face.

Phi-3.5 Mini Instruct: Optimized for compute-constrained environments

The Phi-3.5 Mini Instruct model is a lightweight AI model with 3.8 billion parameters, designed to follow instructions and supporting a token context length of 128k.

This model is ideal for scenarios requiring robust reasoning abilities in environments with constrained memory or computational power, including tasks such as code generation, mathematical problem solving, and logic-based reasoning.

Despite its compact size, the Phi-3.5 Mini Instruct delivers competitive performance on multilingual and multi-turn conversational tasks, a significant improvement over previous models in the series.

In many benchmarks it offers performance close to state-of-the-art solutions, and in the RepoQA test, which measures “understanding of long contextual code”, it outperforms other models of similar size (Llama-3.1-8B-instruct and Mistral-7B-instruct).

Phi-3.5 MoE: Microsoft’s “Mix of Experts”

The Phi-3.5 MoE (Mixture of Experts) model appears to be the first in this class from the company, combining several smaller expert models, each specializing in a different kind of task.

This model uses an architecture with 42 billion parameters in total and supports a token context length of 128k, providing scalable AI performance for demanding applications. However, only 6.6 billion parameters are active at a time, according to the Hugging Face documentation.

Designed to excel in a variety of reasoning tasks, Phi-3.5 MoE offers high performance in coding, math, and multilingual language understanding, often outperforming larger models on specific benchmarks including RepoQA:

In an impressive result, it beats GPT-4o mini on the five-shot MMLU (Massive Multitask Language Understanding) benchmark, which covers subjects such as STEM, humanities, and social sciences at a variety of proficiency levels.

The unique architecture of the MoE model allows it to maintain performance while handling complicated AI tasks across multiple languages.
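The gap between 42 billion total and 6.6 billion active parameters comes from top-k expert routing: for each token, a small gating network scores all experts and only the highest-scoring few run in the forward pass. The toy sketch below illustrates the mechanism; the expert count and k value here are illustrative assumptions, not figures from the model card:

```python
import math

def top_k_gate(logits, k=2):
    """Pick the top-k experts for a token and softmax-normalize their scores.
    Only those k experts' weights participate in the forward pass, which is
    why a mixture-of-experts model's active parameter count is far below
    its total parameter count."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    exps = [math.exp(logits[i]) for i in top]
    total = sum(exps)
    return {i: e / total for i, e in zip(top, exps)}

# 16 hypothetical experts, but each token routes to only 2 of them.
gate_logits = [0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.5, 0.3,
               0.2, -2.0, 1.0, 0.4, -0.1, 0.7, 0.6, 0.9]
weights = top_k_gate(gate_logits, k=2)
print(weights)  # two experts carry all the routing weight for this token
```

Because the unselected experts are skipped entirely, inference cost scales with the active parameters rather than the total, which is how the model can offer large-model capacity at a smaller compute footprint.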

Phi-3.5 Vision Instruct: Advanced Multimodal Reasoning

The trio is completed by the Phi-3.5 Vision Instruct, which integrates text and image processing capabilities.

This multimodal model is particularly useful for tasks such as general image understanding, optical character recognition, understanding graphs and tables, and summarizing video material.

Microsoft emphasizes that the model was trained using a combination of synthetic and filtered publicly available datasets, with an emphasis on high-quality, reasoning-dense data.
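Prompts for the vision model reference attached images through numbered placeholders in the text. The sketch below merely assembles such a prompt string; the `<|image_1|>` placeholder convention is an assumption based on the published Phi-3 vision examples, so verify it against the model card before relying on it:

```python
def format_vision_prompt(question, num_images=1):
    """Build a Phi-3-vision-style user turn that references attached images
    via numbered <|image_n|> placeholders (assumed convention from the
    published model-card examples)."""
    placeholders = "".join(f"<|image_{i}|>\n" for i in range(1, num_images + 1))
    return f"<|user|>\n{placeholders}{question}<|end|>\n<|assistant|>\n"

print(format_vision_prompt("What does this chart show?", num_images=1))
```

In actual use, the corresponding image objects are passed alongside the prompt to the processor from `transformers`, which maps each placeholder to its image.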

Training of the new Phi trio

The Phi-3.5 Mini Instruct model was trained on 3.4 trillion tokens using 512 H100-80G GPUs in 10 days, while the Vision Instruct model was trained on 500 billion tokens using 256 A100-80G GPUs in 6 days.

The Phi-3.5 MoE model, which features a mixed-expert architecture, was trained on 4.9 trillion tokens and 512 H100-80G GPUs in 23 days.
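Taken at face value, the figures above imply broadly comparable per-GPU training throughput across the three runs; a quick back-of-the-envelope check (just arithmetic on the numbers reported above):

```python
# Back-of-the-envelope: training tokens per GPU-day, from the reported figures.
runs = {
    "Phi-3.5-mini-instruct":   (3.4e12, 512, 10),  # tokens, GPUs, days
    "Phi-3.5-MoE-instruct":    (4.9e12, 512, 23),
    "Phi-3.5-vision-instruct": (5.0e11, 256, 6),
}
for name, (tokens, gpus, days) in runs.items():
    per_gpu_day = tokens / (gpus * days)
    print(f"{name}: ~{per_gpu_day / 1e6:.0f}M tokens per GPU-day")
```

This works out to roughly 664M, 416M, and 326M tokens per GPU-day respectively; the MoE run's lower per-GPU rate is consistent with its much larger total parameter count, and the vision run's with the overhead of image processing on older A100 hardware.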

Open source software under the MIT license

All three Phi-3.5 models are available under the MIT License, reflecting Microsoft’s commitment to supporting the open source community.

This license allows developers to freely use, modify, combine, publish, distribute, sublicense, and sell copies of the software.

The license also includes a disclaimer that the software is provided “as is” without warranty of any kind. Microsoft and other copyright holders are not responsible for any claims, damages, or other liabilities that may arise from the use of the software.

By making these models open source, Microsoft enables developers to integrate cutting-edge AI capabilities into their applications, driving innovation in both commercial and research settings.
