In project headed by former Inflection chief, MAI-1 may have 500B parameters.
Microsoft is working on a new large-scale AI language model called MAI-1, which could potentially rival state-of-the-art models from Google, Anthropic, and OpenAI, according to a report by The Information. This marks the first time Microsoft has developed an in-house AI model of this magnitude since investing over $10 billion in OpenAI for the rights to reuse the startup’s AI models. OpenAI’s GPT-4 powers not only ChatGPT but also Microsoft Copilot.
The development of MAI-1 is being led by Mustafa Suleyman, the former Google AI leader who recently served as CEO of the AI startup Inflection before Microsoft acquired the majority of the startup’s staff and intellectual property for $650 million in March. Although MAI-1 may build on techniques brought over by former Inflection staff, it is an entirely new large language model (LLM), according to two Microsoft employees familiar with the project.
With approximately 500 billion parameters, MAI-1 will be significantly larger than Microsoft’s previous open source models (such as Phi-3, which we covered last month) and will require more computing power and training data. This reportedly places MAI-1 in a league similar to OpenAI’s GPT-4, which is rumored to have over 1 trillion parameters (in a mixture-of-experts configuration), and well above smaller offerings such as Meta’s and Mistral’s 70-billion-parameter models.
The development of MAI-1 suggests a dual approach to AI within Microsoft, focusing on both small locally run language models for mobile devices and larger state-of-the-art models that are powered by the cloud. Apple is reportedly exploring a similar approach. It also highlights the company’s willingness to explore AI development independently from OpenAI, whose technology currently powers Microsoft’s most ambitious generative AI features, including a chatbot baked into Windows.
The exact purpose of MAI-1 has not been determined (even within Microsoft), and its ideal use will depend on its performance, according to one of The Information’s sources. To train the model, Microsoft has reportedly been allocating a large cluster of servers with Nvidia GPUs and compiling training data from various sources, including text generated by OpenAI’s GPT-4 and public Internet data.
Depending on the progress made in the coming weeks, Microsoft may preview MAI-1 as early as its Build developer conference later this month, according to one of The Information’s sources.