ASTRA-sim and Chakra: Enabling Software-Hardware Co-design Exploration for Distributed Machine Learning Platforms
Presenter Names: Tushar Krishna and William Won (Georgia Tech)

Abstract: As Artificial Intelligence (AI) models scale at an unprecedented rate, Machine Learning (ML) execution relies heavily on distributed ML over High-Performance Computing (HPC) platforms built from customized neural accelerators (e.g., GPUs or TPUs) and connected via high-speed interconnects (e.g., NVLink). Deep Neural Network (DNN) execution involves a complex […]