PyTorch From Scratch (Part 0)
This semester at CMU, I'm taking a class on Deep Learning Systems (10-714) with Dr. Tianqi Chen and Dr. Tim Dettmers. Dr. Chen is the creator of XGBoost and TVM, and Dr. Dettmers is the creator of bitsandbytes and QLoRA. Needless to say, I'm pretty excited to be learning from them.
The class covers the fundamentals of deep learning systems, including how one might build a PyTorch clone from scratch. This is something I've been interested in for a while, and I thought it might be fun to document my journey here from a student's perspective. Since the class is limited to one semester, we can't cover everything, but I'm hoping to keep this series going so that I can eventually build and share a more complete version.
For a more complete system that I think is also a fantastic resource, check out TinyGrad! It is much more condensed than PyTorch, and much more expansive than what we are doing in class. It is also written in Python, and is a great resource for learning how deep learning frameworks work under the hood. You can find it here. It is actually what inspired me to take this class in the first place!
I won't be rehashing the actual lecture content in my posts, as I would be a poor imitation of the professors. If you're interested, you can find the PDFs / YouTube videos here.
Instead, I'll be going over the actual implementation, as well as breaking down the design as simply as possible. For the time being, I'll be building off of what we develop in class, including the starter code, but I'm hoping to eventually "restart" and build it entirely from scratch to better represent / document everything for those interested.