Image: Microsoft rack and chips (image by Microsoft)

Microsoft has unveiled two new custom chips, one tailored for AI workloads and one for general-purpose cloud computing, giving the tech giant the final piece it needs to deliver complete infrastructure systems.

The first custom-designed silicon chip, the Microsoft Azure Maia 100, is optimized for artificial intelligence (AI) tasks.

With the new Maia 100 accelerators, Microsoft hopes to power its largest internal AI workloads. Future designs may be adapted according to feedback provided by OpenAI, which develops the popular ChatGPT and other large language models.

The Maia 100 includes 105 billion transistors, making it one of the largest chips built on 5-nanometer process technology. For comparison, Nvidia’s top-end H200 chip has 80 billion transistors but is built on the more advanced 4-nanometer node. AMD’s MI300A chip has 146 billion transistors.

The other processor, Cobalt 100, is an Arm-based CPU built to run general-purpose compute workloads on the Microsoft Cloud. The first generation in its series, the Cobalt 100 is designed with energy efficiency in mind, optimizing “performance per watt.” Microsoft’s data centers in Quincy, Washington, are the first to be powered by the new chip.

Microsoft says its 64-bit, 128-core chip delivers an “up to 40 percent performance improvement over current generations of Azure Arm chips.”

“The chips will start to roll out early next year to Microsoft’s data centers, initially powering the company’s services such as Microsoft Copilot or Azure OpenAI Service,” the company announced at its annual Ignite conference.

Final piece of the puzzle

The newly designed chips enable Microsoft to offer complete infrastructure systems, which include silicon choices, software, servers, racks, and cooling systems designed from top to bottom. The systems “can be optimized with internal and customer workloads in mind.”

With this move, Microsoft wants to meet the exploding demand for efficient, scalable, and sustainable computing power to take advantage of the latest breakthroughs in AI and cloud technologies.

For quite some time, Microsoft has emphasized its data centers’ dependency on the availability of graphics processing units (GPUs) and other components. Before 2016, most layers of the Microsoft Cloud were bought off the shelf.

“Our devices are primarily manufactured by third-party contract manufacturers,” the latest quarterly report reads. “Some of our products contain certain components for which there are very few qualified suppliers. Extended disruptions at these suppliers could impact our ability to manufacture devices on time to meet consumer demand.”

Scott Guthrie, executive vice president of Microsoft’s Cloud + AI Group, explains that it’s important for them “to optimize and integrate every layer of the infrastructure stack to maximize performance, diversify our supply chain, and give customers infrastructure choice.”

Microsoft also co-designed software to work with the new hardware. The end goal for the company is an Azure hardware system that “offers maximum flexibility and can also be optimized for power, performance, sustainability or cost.”

That doesn’t mean Microsoft is giving the cold shoulder to Nvidia, the tech company that currently dominates the AI hardware market.

To complement its custom silicon efforts and “to provide more infrastructure options for customers,” Microsoft will offer new systems built on high-end Nvidia H100 Tensor Core GPUs and will add the latest Nvidia H200 GPUs to its lineup next year.

Microsoft will also add competing AMD GPUs to the mix, specifically AMD MI300X.

“Customer obsession means we provide whatever is best for our customers, and that means taking what is available in the ecosystem as well as what we have developed,” said Rani Borkar, corporate vice president for Azure Hardware Systems and Infrastructure. “We will continue to work with all of our partners to deliver to the customer what they want.”

Microsoft is the last of the largest data center providers to announce its own chips, with Google and Amazon already developing their own Arm-based counterparts.

Daniel Newman, CEO at The Futurum Group, sees Microsoft’s move as complementary to competing parts from Intel, Nvidia, or AMD, rather than a move to replace them.

“Also, this does follow closely to AWS’ strategy, which has been a success. However, note that this doesn’t happen overnight – this is a long-term investment for Microsoft, but I do see it being successful over time,” Newman posted on LinkedIn.
