CS 295: Systems and Machine Learning

Welcome to the graduate course on Systems and Machine Learning! This is a project-based seminar course covering topics in Systems for Machine Learning and Machine Learning for Systems.

Machine learning is transforming domains ranging from natural language processing to drug discovery. One of the key factors behind the rapid progress in ML/AI in recent years has been the fast evolution of the underlying hardware and software platforms. In this course, we will cover recent advances in research and industry on the machine learning systems that enabled the AI/ML revolution. Specific topics include domain-specific architectures, deep learning frameworks and compilers, and networking and scheduling in deep learning clusters. We will also discuss practical challenges in deploying such systems. In the second half of the course, we will explore how machine learning has been used to tackle networking and systems challenges such as Internet congestion control, adaptive bitrate selection in video streaming, and flow prediction.

Instructor: Sangeetha Abdu Jyothi

Class Hours: Tue/Thu 3:30-4:50 pm PT (fully asynchronous; discussions on Piazza)

Office Hours: Wed 3:30 - 4:30 pm PT or by appointment

Piazza: https://piazza.com/uci/winter2021/cs295/home

Course Policies: see the Course Policies page

Prerequisites: Understanding of basic concepts in machine learning and systems (at least one undergraduate course in ML and one in networking or distributed systems)

Videos are available on the "Modules" page.

Schedule: (more optional papers will be added as the quarter progresses)

Date Category Lecture Required Reading Optional
1/5/2021 Lec 1: Introduction
1/7/2021 Systems For ML Lec 2: Domain-Specific Architectures In-Datacenter Performance Analysis of a Tensor Processing Unit (ISCA'17) A Configurable Cloud-Scale DNN Processor for Real-Time AI (ISCA'18)
1/12/2021 Systems For ML Lec 3: DL Frameworks TensorFlow (OSDI'16)
PyTorch (NeurIPS'19)
MXNet
CNTK (KDD'16)
1/14/2021 Systems For ML Lec 4: DL Compilers TVM (OSDI'18) Glow
nGraph
XLA
Tensor Comprehensions
1/19/2021 Systems For ML Lec 5: Networking Challenges in DNN Training Parameter Server (OSDI'14)
Horovod (arXiv'18)
BytePS (OSDI'20)
TicTac (MLSys'19)
P3 (MLSys'19)
1/21/2021 Systems For ML Lec 6: Cluster Scheduling for DL Workloads Gavel (OSDI'20) AntMan (OSDI'20)
Themis (NSDI'20)
Gandiva (OSDI'18)
1/26/2021 Systems For ML Lec 7: Automated ML: Hyperparameter Tuning and Neural Architecture Search (NAS) Hyperband (JMLR'18)
Designing Neural Network Architectures using Reinforcement Learning (ICLR'17)
Cerebro (VLDB'20)
Vizier (KDD'17)
ASHA (MLSys'20)
1/28/2021 Systems For ML Lec 8: Scaling Gradient Boosting and RL XGBoost (KDD'16)
Asynchronous Methods for Deep Reinforcement Learning (ICML'16)
RLlib (ICML'18)
2/2/2021 Systems For ML Lec 9: ML at the Edge Towards Federated Learning at Scale: System Design (MLSys'19)
2/4/2021 Systems For ML Lec 10: Challenges in Operational ML Systems Hidden Technical Debt in Machine Learning Systems (NeurIPS'15)
Applied Machine Learning at Facebook: A Datacenter Infrastructure Perspective (HPCA'18)
TFX (KDD'17)
2/9/2021 ML for Systems Lec 11: Deep RL and Challenges in the Real World Deep RL tutorial (background reading, not for review)
Challenges of Real-World Reinforcement Learning (ICML'19)
2/11/2021 ML for Systems Lec 12: Deep RL for Congestion Control Aurora (ICML'19)
2/16/2021 ML for Systems Lec 13: Deep RL for Video Streaming (ABR control) Pensieve (SIGCOMM'17)
2/18/2021 ML for Systems Lec 14: Classic Control + Learning (congestion control & ABR)
Learning in-situ (ATC'20)
Classic Meets Modern (SIGCOMM'20)
2/23/2021 ML for Systems Lec 15: Deep RL for Scheduling Placeto (NeurIPS'19) Decima (SIGCOMM'19)
RLScheduler (SC'20)
2/25/2021 ML for Systems Lec 16: ML for Index Structures Learned Index Structures (SIGMOD'18)
3/2/2021 ML for Systems Lec 17: ML for CDN Cache Replacement Learning Relaxed Belady (NSDI'20)
3/4/2021 ML for Systems Lec 18: ML for Flow Prediction Flux (NSDI'19)
3/9/2021 ML for Systems Lec 19: Interpretability of RL-based Controllers Metis (SIGCOMM'20)
3/11/2021 Final Project Presentations