The Hard(ware) Part of AI - Awesome MLSS Newsletter

2nd Edition

Our teammate, Shashank, recently signed up for the AMD Developer Challenge - a challenge with 150K USD in total prizes where you optimise key GPU kernels on AMD chipsets for GenAI operations such as FP8 GEMM and Fused MoE, among others.

Most of us think of GPU training as simply setting device = cuda, since most of us have NVIDIA GPUs. That makes you wonder - why can't we train and deploy AI models on other GPUs and devices?
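As a side note on what device = cuda hides: frameworks expose hardware through named backends, and portable code picks from whatever is available rather than hard-coding one. The sketch below is pure Python with made-up availability flags, not a real framework API - in PyTorch you would query torch.cuda.is_available() and similar checks (ROCm builds of PyTorch even reuse the "cuda" device name):

```python
# Illustrative sketch: choose a compute backend by preference order
# instead of hard-coding "cuda". The availability dict is hypothetical;
# a real program would query the framework (e.g. torch.cuda.is_available()).

PREFERENCE = ["cuda", "rocm", "xpu", "mps", "cpu"]  # NVIDIA, AMD, Intel, Apple, fallback

def pick_device(available: dict) -> str:
    """Return the first preferred backend reported as available."""
    for name in PREFERENCE:
        if available.get(name, False):
            return name
    return "cpu"  # the CPU is always a valid fallback

# Example: a machine with an AMD GPU but no NVIDIA GPU.
print(pick_device({"cuda": False, "rocm": True}))  # rocm
```

The point of the preference list is that the rest of the training script never needs to know which vendor's chip it landed on.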

More on this later. First, let’s get to the all important deadlines.

Upcoming Summer School Announcements

Applications for most of the following summer schools are closing in the next 10 days. Make sure to apply before the deadline!

| Title | Deadline | Dates |
| --- | --- | --- |
| International AI Summer School - Grosseto, Italy | Apr 23 (early-bird) | Sep 21 - Sep 25 |
| Advanced Course on Data Science & ML 2025 - Grosseto, Italy | Apr 23 | June 09 - June 13 |
| Advanced Course on AI & Neuroscience 2025 - Grosseto, Italy | Apr 23 (early-bird) | Sep 21 - Sep 24 |
| Summer School on Data Science, Learning and Optimization 2025 - Norcia, Italy | Apr 24 | June 23 - June 27 |
| Mathematics and Physics of Quantum Computing and Learning 2025 - Porquerolles, France | Apr 30 | May 23 - May 28 |
| UK Robotics Summer School 2025 - Edinburgh, UK | Apr 30 | June 2 - June 6 |
| AutoML Summer School 2025 - Tübingen, Germany | May 1 (early-bird) | June 10 - June 13 |
| Oxford Machine Learning Summer School: MLx Fundamentals - Online | May 1 | May 1 - May 9 |
| Cambridge Ellis Unit Summer School on Probabilistic ML 2025 - Cambridge, UK | May 10 | July 14 - July 18 |
| Data Visualization Summer School 2025 - Genoa, Italy | May 18 | July 7 - July 11 |

For the complete list, please visit our website.

Some Research Highlights

Looking beyond stars, literally

A high-school student won a 250K USD prize for developing an AI algorithm that could help us find 1.5 million new astronomical objects.

Flexing Agentic Behaviour

Microsoft releases Debug-Gym, an environment for testing and assisting code-repairing agentic systems.

What’s happening in AI?

NVIDIA is the green standard in AI chipset manufacture. We all know this. Most of us wrote our first AI-related code and ran it on an NVIDIA GPU. The company has been working on AI for a long time, first releasing CUDA, its parallel-computing platform, in 2006.

A major breakthrough came in 2012 when AlexNet, one of the most significant models in deep learning history, was trained on their GPUs. According to NVIDIA's own blog, a similar model designed by Google needed 2,000 CPU-based servers, while AlexNet needed only 12 NVIDIA GPUs.

If NVIDIA GPUs were the match, CUDA was the kerosene - it saw rapid, widespread adoption from researchers and industry alike, including popular libraries like PyTorch, TensorFlow, and Hugging Face Transformers.

However, quite a few challengers are ready to diversify the market.

Intel released the Gaudi Accelerator series, which is more price-efficient than standard GPUs. Adoption has been slow, however, with reports indicating the line has been missing its sales targets.

AWS has created the Trainium and Inferentia chips - for training and inference respectively - as a cost-efficient alternative that integrates tightly with the AWS ecosystem.

Google has been developing its Tensor Processing Units (TPUs) for years, and Ilya Sutskever's Safe Superintelligence, among many other companies, has committed to using them for AI training.

There are also several startups developing ASICs (Application Specific Integrated Circuits).

Groq created the Language Processing Unit, which it claims delivers up to 18x faster inference specifically on large language models. SambaNova released the Reconfigurable Dataflow Unit, which holds multiple models at once and reportedly switches between them 100x faster than GPUs - useful for agentic systems. Cerebras is also innovating in this space with its Wafer Scale Engine.

Now, let's be clear: this is by no means an exhaustive list - but it does tell us that several companies and startups are innovating in the AI chip sector to reduce the cost of AI.

You might be wondering - why does this matter to me? As foundation models get trained on ever-larger compute clusters with thousands of GPUs, their applications increasingly need to be deployed on ever-smaller chips.

Each chip has a different philosophy. Some are general-purpose (but more expensive); others are targeted at specific applications, e.g. LLM inference (and therefore less flexible). Then comes the question of each chipset's software ecosystem, and how well it supports your specific use case. Of course, there's also the matter of compute cost.

This affects researchers and industry alike! The AMD Developer Challenge we mentioned earlier focuses on implementing fast, efficient kernels - akin to DeepSeek's newly released DeepGEMM library - to squeeze optimal performance out of the hardware when deploying large models on consumer devices.
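To make the GEMM in "FP8 GEMM" concrete: a GEMM is just a general matrix multiply, and much of kernel optimisation is about reordering the same arithmetic for better memory locality. Below is a minimal pure-Python sketch - illustrative only; real kernels like DeepGEMM apply the tiling idea to GPU caches, registers, and FP8 arithmetic - comparing a naive triple loop with a blocked (tiled) loop that computes the identical result:

```python
# Two ways to compute C = A @ B. Both do the same multiply-adds; the
# blocked version reorders them so each small tile is reused while it
# is still "hot" in cache - the core idea behind fast GEMM kernels.

def gemm_naive(A, B):
    n, k, m = len(A), len(B), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            for p in range(k):
                C[i][j] += A[i][p] * B[p][j]
    return C

def gemm_blocked(A, B, tile=2):
    n, k, m = len(A), len(B), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    # Walk the matrices tile-by-tile, then do the same triple loop
    # restricted to each tile.
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for p0 in range(0, k, tile):
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        for p in range(p0, min(p0 + tile, k)):
                            C[i][j] += A[i][p] * B[p][j]
    return C

A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]
assert gemm_naive(A, B) == gemm_blocked(A, B) == [[19.0, 22.0], [43.0, 50.0]]
```

In pure Python both versions run at the same speed; the payoff from tiling only appears on real hardware, where memory traffic - not arithmetic - is the bottleneck.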

With all the research on models, some of us might have neglected the innovation in hardware. Hopefully, this served as a good overview of what’s happening in AI!

Awesome Machine Learning Summer Schools is a non-profit organisation that keeps you updated on ML Summer Schools and their deadlines. Simple as that.

Have any questions or doubts? Drop us an email! We would be more than happy to talk to you.

With love, Awesome MLSS
