- This event has passed.
High Performance Machine Learning, Deep Learning, and Data Science
Tutorial
Speakers: Dhabaleswar K. (DK) Panda, Hari Subramoni, Arpan Jain, and Aamir Shafi (The Ohio State University)
Abstract:
Recent advances in Machine and Deep Learning (ML/DL) have led to many exciting challenges and opportunities. Modern ML/DL and Data Science frameworks including TensorFlow, PyTorch, and Dask have emerged that offer high- performance training and deployment for various types of ML models and Deep Neural Networks (DNNs). This tutorial provides an overview of recent trends in ML/DL and the role of cutting-edge hardware architectures and interconnects in moving the field forward. We will also present an overview of different DNN architectures and ML/DL frameworks with special focus on parallelization strategies for model training. We highlight new challenges and opportunities for communication runtimes to exploit high-performance CPU/GPU architectures to efficiently support large-scale distributed training. We also highlight some of our co-design efforts to utilize MPI for large-scale DNN training on cutting-edge CPU/GPU architectures available on modern HPC clusters. The tutorial covers training traditional ML models including— K-Means, linear regression, nearest neighbours—using the cuML framework accelerated using MVAPICH2-GDR. Also, the tutorial presents accelerating GPU-based Data Science applications using MPI4Dask, which is an MPI-based backend for Dask. Throughout the tutorial, we include hands-on exercises to enable attendees to gain first-hand experience of running distributed ML/DL training and Dask on a modern GPU cluster.
Bios:
Dhabaleswar K. (DK) Panda is a Professor of Computer Science and Engineering and University Distinguished Scholar at the Ohio State University. His research interests include parallel computer architecture, high-performance networking, Exascale computing, Big Data, Deep Learning, programming models, accelerators, high-performance file systems and storage, virtualization, and cloud computing. He has published over 500 papers in major journals and international conferences related to these research areas. Dr. Panda and his research group members have been doing extensive research on modern networking technologies including InfiniBand, High-Speed Ethernet, RDMA over Converged Enhanced Ethernet (RoCE), Omni-Path, and EFA. Dr. Panda and his team have been actively working on high-performance MPI and PGAS libraries (http://mvapich.cse.ohio-state.edu), Deep Learning libraries (http://hidl.cse.ohio-state.edu) and Big Data libraries (http://hibd.cse.ohio-state.edu). Dr. Panda has served (or serving) as Program Chair/Co-Chair/Vice-Chair of many international conferences. He is an IEEE Fellow and a member of ACM. More details are available at http://www.cse.ohio-state.edu/~panda.
Hari Subramoni received a Ph.D. degree in Computer Science from The Ohio State University, Columbus, OH, in 2013. He is a research scientist in the Department of Computer Science and Engineering at the Ohio State University, USA, since Sept 2015. His research interests include high-performance interconnects and protocols, parallel computer architecture, network-based computing, exascale computing, network topology aware computing, QoS, power-aware LAN-WAN communication, fault tolerance, virtualization, cloud computing, and deep learning. He has published over 70 papers in international journals and conferences related to these research areas. He has been actively involved in various professional activities in academic journals and conferences. Recently, Dr. Subramoni is doing research and working on design and development for MVAPICH2 (High-Performance MPI over InfiniBand, iWARP, and RoCE) and MVAPICH2-X (Hybrid MPI and PGAS (OpenSHMEM and UPC)) software packages. More details about Dr. Subramoni are available at http://www.cse.ohio-state.edu/~subramon.
Arpan Jain received his B.Tech. and M.Tech. degrees in Information Technology from ABV-IIITM, India. Currently, Arpan is working towards his Ph.D. degree in Computer Science and Engineering at The Ohio State University. His current research focus lies at the intersection of High-Performance Computing (HPC) libraries and Deep Learning (DL) frameworks. He is working on parallelization and distribution strategies for large-scale Deep Neural Network (DNN) training. He previously worked on speech analysis, time series modeling, hyperparameter optimization, and object recognition. He actively contributes to projects like HiDL (high-performance deep learning), MVAPICH2-GDR software, and LBANN deep learning framework. He is a member of IEEE. More details about Arpan are available at https://u.osu.edu/ jain.575.
Dr. Aamir Shafi is currently a Research Scientist in the Department of Computer Science & Engineering at the Ohio State University where he is involved in the High Performance Big Data project led by Dr. Dhabaleswar K. Panda. Dr. Shafi was a Fulbright Visiting Scholar at the Massachusetts Institute of Technology (MIT) in the 2010-2011 academic year where he worked with Prof. Charles Leiserson on the award-winning Cilk technology. Dr. Shafi received his PhD in Computer Science from the University of Portsmouth, UK in 2006. He got his Bachelors in Software Engineering degree from NUST, Pakistan in 2003. Dr. Shafi’s current research interests include architecting robust libraries and tools for Big Data computation with emphasis on Machine and Deep Learning applications. Dr. Shafi co-designed and co-developed a Java-based MPI-like library called MPJ Express. More details about Dr. Shafi are available from https://people.engineering.osu.edu/people/shafi.16.