Friday, March 20, 2026

Microsoft releases new, capable Phi-3.5 models

Microsoft has no intention of basing its success in AI solely on its partnership with OpenAI.

The three new Phi-3.5 models are the 3.82-billion-parameter Phi-3.5-mini-instruct, the 41.9-billion-parameter Phi-3.5-MoE-instruct, and the 4.15-billion-parameter Phi-3.5-vision-instruct, designed respectively for basic/fast reasoning, more advanced reasoning, and vision tasks (image and video analysis).

All three models are available for developers to download, use, and fine-tune on Hugging Face under a Microsoft-branded MIT License, which allows commercial use and modification without restrictions.
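With the Hugging Face `transformers` library installed, any of the checkpoints (e.g., `microsoft/Phi-3.5-mini-instruct`) can be loaded and prompted directly. The sketch below only illustrates the chat prompt layout the Phi-3 family uses; the `<|system|>`/`<|user|>`/`<|assistant|>`/`<|end|>` special tokens follow the published model cards, but in practice you should let the tokenizer's own `apply_chat_template` produce the prompt rather than formatting it by hand:

```python
# Minimal sketch of a Phi-3-style chat prompt, assuming the special tokens
# documented in the model cards. Prefer tokenizer.apply_chat_template with
# transformers in real use; this only shows the layout.

def format_phi_chat(messages):
    """Render a list of {role, content} dicts into a single prompt string."""
    parts = []
    for m in messages:
        parts.append(f"<|{m['role']}|>\n{m['content']}<|end|>\n")
    parts.append("<|assistant|>\n")  # cue the model to start its reply
    return "".join(parts)

prompt = format_phi_chat([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize the Phi-3.5 family in one sentence."},
])
print(prompt)
```

With `transformers` installed, the equivalent is `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)`, which guarantees the exact template the checkpoint was trained with.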

Amazingly, all three models boast near-state-of-the-art performance in numerous third-party benchmarks, outperforming even solutions from other AI vendors including Google’s Gemini 1.5 Flash, Meta’s Llama 3.1, and in some cases even OpenAI’s GPT-4o.

Such achievements, combined with a permissive, open license, have led people to praise Microsoft on the social networking site X:

Today, let’s take a quick look at each of the new models, based on their release notes published on Hugging Face.

Phi-3.5 Mini Instruct: Optimized for compute-constrained environments

The Phi-3.5 Mini Instruct model is a lightweight AI model with 3.8 billion parameters, designed to follow instructions and supporting a token context length of 128k.

This model is ideal for scenarios requiring robust reasoning abilities in environments with constrained memory or computational power, including tasks such as code generation, mathematical problem solving, and logic-based reasoning.

Despite its compact size, the Phi-3.5 Mini Instruct delivers competitive performance on multilingual and multi-turn conversational tasks, a significant improvement over previous models in the series.

In many benchmarks it offers performance close to state-of-the-art solutions, and in the RepoQA test, which measures “understanding of long contextual code”, it outperforms other models of similar size (Llama-3.1-8B-instruct and Mistral-7B-instruct).

Phi-3.5 MoE: Microsoft’s “Mix of Experts”

The Phi-3.5 MoE (Mixture of Experts) model appears to be the first in this class from the company, combining several smaller expert models, each specializing in a different kind of task.

This model uses an architecture with 42 billion parameters in total and supports a token context length of 128k, providing scalable AI performance for demanding applications. However, only 6.6 billion parameters are active at a time, according to the Hugging Face documentation.

Designed to excel in a variety of reasoning tasks, Phi-3.5 MoE offers high performance in coding, math, and multilingual language understanding, often outperforming larger models on specific benchmarks including RepoQA:

In an impressive result, it beats GPT-4o mini on the five-shot MMLU (Massive Multitask Language Understanding) benchmark, which covers subjects such as STEM, humanities, and social sciences at a variety of proficiency levels.

The unique architecture of the MoE model allows it to maintain performance while handling complicated AI tasks across multiple languages.
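The gap between 42 billion total and 6.6 billion active parameters comes from top-k expert routing: for each token, a small gating network scores all experts and only the highest-scoring few run in the forward pass. The toy sketch below illustrates the mechanism; the expert count and k value here are illustrative assumptions, not figures from the model card:

```python
import math

def top_k_gate(logits, k=2):
    """Pick the top-k experts for a token and softmax-normalize their scores.
    Only those k experts' weights participate in the forward pass, which is
    why a mixture-of-experts model's active parameter count is far below
    its total parameter count."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    exps = [math.exp(logits[i]) for i in top]
    total = sum(exps)
    return {i: e / total for i, e in zip(top, exps)}

# 16 hypothetical experts, but each token routes to only 2 of them.
gate_logits = [0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.5, 0.3,
               0.2, -2.0, 1.0, 0.4, -0.1, 0.7, 0.6, 0.9]
weights = top_k_gate(gate_logits, k=2)
print(weights)  # two experts carry all the routing weight for this token
```

Because the unselected experts are skipped entirely, inference cost scales with the active parameters rather than the total, which is how the model can offer large-model capacity at a smaller compute footprint.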

Phi-3.5 Vision Instruct: Advanced Multimodal Reasoning

The trio is completed by the Phi-3.5 Vision Instruct, which integrates text and image processing capabilities.

This multimodal model is particularly useful for tasks such as general image understanding, optical character recognition, understanding graphs and tables, and summarizing video material.

Microsoft emphasizes that the model was trained using a combination of synthetic and filtered publicly available datasets, with an emphasis on high-quality, reasoning-dense data.
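Prompts for the vision model reference attached images through numbered placeholders in the text. The sketch below merely assembles such a prompt string; the `<|image_1|>` placeholder convention is an assumption based on the published Phi-3 vision examples, so verify it against the model card before relying on it:

```python
def format_vision_prompt(question, num_images=1):
    """Build a Phi-3-vision-style user turn that references attached images
    via numbered <|image_n|> placeholders (assumed convention from the
    published model-card examples)."""
    placeholders = "".join(f"<|image_{i}|>\n" for i in range(1, num_images + 1))
    return f"<|user|>\n{placeholders}{question}<|end|>\n<|assistant|>\n"

print(format_vision_prompt("What does this chart show?", num_images=1))
```

In actual use, the corresponding image objects are passed alongside the prompt to the processor from `transformers`, which maps each placeholder to its image.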

Training of the new Phi trio

The Phi-3.5 Mini Instruct model was trained on 3.4 trillion tokens using 512 H100-80G GPUs in 10 days, while the Vision Instruct model was trained on 500 billion tokens using 256 A100-80G GPUs in 6 days.

The Phi-3.5 MoE model, which features a mixed-expert architecture, was trained on 4.9 trillion tokens and 512 H100-80G GPUs in 23 days.
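Taken at face value, the figures above imply broadly comparable per-GPU training throughput across the three runs; a quick back-of-the-envelope check (just arithmetic on the numbers reported above):

```python
# Back-of-the-envelope: training tokens per GPU-day, from the reported figures.
runs = {
    "Phi-3.5-mini-instruct":   (3.4e12, 512, 10),  # tokens, GPUs, days
    "Phi-3.5-MoE-instruct":    (4.9e12, 512, 23),
    "Phi-3.5-vision-instruct": (5.0e11, 256, 6),
}
for name, (tokens, gpus, days) in runs.items():
    per_gpu_day = tokens / (gpus * days)
    print(f"{name}: ~{per_gpu_day / 1e6:.0f}M tokens per GPU-day")
```

This works out to roughly 664M, 416M, and 326M tokens per GPU-day respectively; the MoE run's lower per-GPU rate is consistent with its much larger total parameter count, and the vision run's with the overhead of image processing on older A100 hardware.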

Open source software under the MIT license

All three Phi-3.5 models are available under the MIT License, reflecting Microsoft’s commitment to supporting the open source community.

This license allows developers to freely use, modify, combine, publish, distribute, sublicense, and sell copies of the software.

The license also includes a disclaimer that the software is provided “as is” without warranty of any kind. Microsoft and other copyright holders are not responsible for any claims, damages, or other liabilities that may arise from the use of the software.

By making these models open source, Microsoft enables developers to integrate cutting-edge AI capabilities into their applications, driving innovation in both commercial and research settings.
