The Hard(ware) Part of AI - Awesome MLSS Newsletter
2nd Edition

Our teammate, Shashank, recently signed up for the AMD Developer Challenge - a 150K USD challenge (across all prizes) where participants optimise key GPU kernels on AMD chipsets for GenAI operations such as FP8 GEMM and Fused MoE, among others.
Pretty sure most of us think of GPU training as setting `device = cuda`, since most of us have NVIDIA GPUs. Makes you wonder - why aren’t we able to train and deploy AI models on other GPUs and devices?
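As an aside, a slightly more portable version of that habit looks something like the minimal sketch below. Note that PyTorch’s ROCm build for AMD GPUs actually reuses the `cuda` device name, and `mps` covers Apple Silicon:

```python
import torch

def pick_device() -> torch.device:
    """Pick the best available accelerator, falling back to CPU."""
    if torch.cuda.is_available():
        # True on NVIDIA CUDA *and* on AMD ROCm builds of PyTorch,
        # which reuse the "cuda" device name for AMD GPUs.
        return torch.device("cuda")
    if torch.backends.mps.is_available():
        # Apple Silicon GPUs via the Metal backend.
        return torch.device("mps")
    return torch.device("cpu")

device = pick_device()
x = torch.randn(8, 16, device=device)
print(f"Running on: {device}")
```

The device string is rarely where portability actually breaks down, though - the hard part is whether the kernels underneath are optimised for that device.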
More on this later. First, let’s get to the all-important deadlines.
Upcoming Summer School Announcements
Applications for most of the following summer schools are closing in the next 10 days. Make sure to apply to them before the application deadline!
| Title | Deadline | Dates |
| --- | --- | --- |
|  | Apr 23 (early-bird) | Sep 21 - Sep 25 |
|  | Apr 23 | June 09 - June 13 |
|  | Apr 23 (early-bird) | Sep 21 - Sep 24 |
| Summer School on Data Science, Learning and Optimization 2025 | Apr 24 | June 23 - June 27 |
| Mathematics and Physics of Quantum Computing and Learning 2025 - Porquerolles, France | Apr 30 | May 23 - May 28 |
|  | Apr 30 | June 2 - June 6 |
|  | May 1 (early-bird) | June 10 - June 13 |
| Oxford Machine Learning Summer School: MLx Fundamentals - Online | May 1 | May 1 - May 9 |
| Cambridge Ellis Unit Summer School on Probabilistic ML 2025 - Cambridge, UK | May 10 | July 14 - July 18 |
|  | May 18 | July 7 - July 11 |
For the complete list, please visit our website.
Some Research Highlights
- **Looking beyond stars, literally:** A high school student won a 250K USD prize for developing an AI algorithm that could help us find 1.5 million new astronomical objects.
- **Flexing agentic behaviour:** Microsoft releases Debug-Gym, an environment to test and assist code-repairing agentic systems.
What’s happening in AI?
NVIDIA is the green standard in AI chipset manufacture. We all know this. Most of us wrote our first AI-related code and ran it on an NVIDIA GPU. The company has been working on AI for a long time, first releasing CUDA, its parallel computing platform, in 2006.
A major breakthrough came in 2012 when AlexNet, one of the most significant models in deep learning history, was trained on NVIDIA GPUs. According to the company’s own blog, a similar model designed by Google needed 2,000 CPU-based servers, while AlexNet needed only 12 NVIDIA GPUs.
If NVIDIA GPUs were the match, CUDA was the kerosene - it saw widespread adoption from researchers and industry alike, including popular libraries like PyTorch, TensorFlow, and Hugging Face Transformers.
However, quite a few challengers are ready to diversify the market.
AMD released its Instinct MI325X accelerator to rival NVIDIA’s Blackwell chips in Oct 2024, and its datacenter sales are already hitting 1 billion USD each quarter - a sign of substantial adoption.
Intel released the Gaudi accelerator series, which is more price-efficient than standard GPUs. Adoption has been slow, though, with sources indicating Intel has been missing its sales targets.
AWS has created the Trainium and Inferentia chipsets as cost-efficient alternatives for training and inference respectively, and both are easy to use within the AWS ecosystem.
Google has been working on its Tensor Processing Unit (TPU) for a while now, with Ilya Sutskever’s Safe Superintelligence, among many other companies, committing to TPUs for its AI training.
There are also several startups developing ASICs (Application Specific Integrated Circuits).
Groq created the Language Processing Unit (LPU), which it claims provides up to 18x faster inference on large language models. SambaNova released the Reconfigurable Dataflow Unit (RDU), which it says can hold multiple models and switch between them 100x faster than GPUs for agentic systems. Cerebras is also innovating in this space with its Wafer Scale Engine.
Now, let’s be clear: this is by no means an exhaustive list - but it does show that several established companies and startups are innovating in the AI chip sector to reduce the cost of AI.
You might be wondering: why does this matter to me? As foundation models get trained on ever-larger compute clusters with thousands of GPUs, their applications increasingly require deployment on ever-smaller chips.
Each chip has a different philosophy. Some are general-purpose (but more expensive), while others are targeted at specific applications such as LLM inference (and are therefore less flexible). Then comes the question of each chipset’s software ecosystem and how well it fits your specific use case. And of course, there’s the matter of compute cost.
This affects researchers and industry alike! The AMD Developer Challenge we mentioned earlier focuses on implementing fast and efficient inference kernels - akin to DeepSeek’s newly released DeepGEMM library - that make optimal use of hardware resources when deploying large models on consumer devices.
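To give a flavour of what such kernel work involves: FP8 has so little precision and dynamic range that a GEMM is typically computed on scaled tensors, with the scales re-applied to the output. The sketch below merely simulates this idea in PyTorch by round-tripping through the `float8_e4m3fn` dtype - a real kernel like DeepGEMM fuses the quantisation, matmul, and rescaling on-chip, which is exactly the part the challenge entries optimise:

```python
import torch

def simulated_fp8_gemm(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Simulate a scaled FP8 GEMM: quantise inputs to FP8 with
    per-tensor scales, multiply, then undo the scaling.

    This is only a numerics demo - it casts back to float32 for the
    matmul rather than running a fused FP8 kernel on the hardware.
    """
    fp8_max = torch.finfo(torch.float8_e4m3fn).max  # ~448 for e4m3
    scale_a = a.abs().max() / fp8_max               # per-tensor scales
    scale_b = b.abs().max() / fp8_max
    a_fp8 = (a / scale_a).to(torch.float8_e4m3fn)   # quantise to FP8
    b_fp8 = (b / scale_b).to(torch.float8_e4m3fn)
    # matmul in higher precision, then re-apply the scales
    out = a_fp8.to(torch.float32) @ b_fp8.to(torch.float32)
    return out * (scale_a * scale_b)

a, b = torch.randn(64, 128), torch.randn(128, 32)
err = (simulated_fp8_gemm(a, b) - a @ b).abs().max().item()
print(f"max abs error vs FP32 GEMM: {err:.4f}")
```

Even this toy version shows why the scale factors matter: without them, most values would clip or round away to nothing in FP8.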
With all the research on models, some of us might have neglected the innovation in hardware. Hopefully, this served as a good overview of what’s happening in AI!
Awesome Machine Learning Summer Schools is a non-profit organisation that keeps you updated on ML Summer Schools and their deadlines. Simple as that.
Have any questions or doubts? Drop us an email! We would be more than happy to talk to you.