Profiling in PyTorch (Part 1): A Beginner’s Guide to torch.profiler

What you cannot profile, you cannot optimize.

Whether you are trying to squeeze more tokens per second out of a Large Language Model (LLM), shave milliseconds off inference, or just understand why your training loop runs slower than the spec sheet promises, the path eventually runs through profiling.

The catch is that profiling has a steep on-ramp. The traces are dense walls of colored rectangles. The events carry intimidating names. Most tutorials assume you can already read them. So even when we know we should be profiling, opening a trace can feel like a chore best left for later (or for someone else). This post, and the series it kicks off, is our attempt to lower that on-ramp.

This is the opening post of Profiling in PyTorch, a series where we slowly build the skill of reading profiler traces and use it to drive optimization. The plan:

Part 1 (this post): start with the simplest possible operation, a matrix multiplication followed by a bias add, and learn how to read what the profiler hands back.
Part 2: scale up to nn.Linear and a small MLP, use the

To finish reading, please visit source site