This course is a hands-on introduction to modern neural network ("deep learning") tools and methods. It covers the fundamentals of neural networks and introduces both standard and newer architectures, from simple feed-forward networks to recurrent neural networks. We will cover stochastic gradient descent and backpropagation, along with related fitting techniques.
The course will place particular emphasis on using these technologies in practice, via modern toolkits. We will specifically be working with PyTorch, which provides a flexible framework for working with computation graphs. While PyTorch will be our toolkit of choice, the concepts of automatic differentiation and neural networks are not tied to this particular package, and a key objective of this class is to provide sufficient familiarity with the methods and programming paradigms that switching to a new framework is no great obstacle. (This is particularly important given the rapid pace of development in the deep learning toolkit space.)
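To illustrate the point that automatic differentiation is not tied to any one package, here is a minimal sketch of reverse-mode differentiation on scalars, written in plain Python with no PyTorch dependency. The names `Value` and `fit_slope` are hypothetical and chosen only for this illustration; this is not how PyTorch is implemented, just the same idea at toy scale: each operation records its inputs in a computation graph, and a backward pass applies the chain rule from the output to the inputs.

```python
# A toy reverse-mode autodiff sketch (illustrative assumption, not PyTorch's API).

class Value:
    """A scalar node in a computation graph that can accumulate a gradient."""

    def __init__(self, data, parents=()):
        self.data = float(data)
        self.grad = 0.0
        self._parents = parents
        self._backward_fn = None  # pushes this node's grad to its parents

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))

        def backward_fn():
            # d(a + b)/da = 1 and d(a + b)/db = 1
            self.grad += out.grad
            other.grad += out.grad

        out._backward_fn = backward_fn
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))

        def backward_fn():
            # d(a * b)/da = b and d(a * b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward_fn = backward_fn
        return out

    def backward(self):
        """Backpropagate from this node through the recorded graph."""
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        # Visit nodes from the output back toward the inputs.
        for node in reversed(order):
            if node._backward_fn is not None:
                node._backward_fn()


def fit_slope(x, y, lr=0.05, steps=100):
    """Fit w in the model w * x to one (x, y) pair by gradient descent on squared error."""
    w = 0.0
    for _ in range(steps):
        w_node = Value(w)
        err = w_node * x + (-y)   # prediction minus target
        loss = err * err          # squared error
        loss.backward()           # fills in w_node.grad
        w -= lr * w_node.grad     # gradient descent update
    return w
```

Calling `fit_slope(2.0, 6.0)` moves `w` toward 3, the value at which `w * 2` equals 6. With a single training pair this is plain gradient descent; the stochastic variant would sample a different example (or mini-batch) at each step.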
We will introduce now-standard neural network architectures for data of various types, including images and text. This iteration will have a bit of a bias toward the latter, reflecting instructor biases.
40% | Homeworks |
5% | In-class exercises |
15% | Midterm topic "survey" |
40% | Final project |
Prior exposure to machine learning is recommended. A working knowledge of Python is required (or you must be willing to pick it up rapidly as we go). Familiarity with linear algebra, (basic) calculus and probability will be largely assumed throughout, although we will also review some of these prerequisites.
Homeworks will consist of both written and programming components. The latter will be completed in Python, often using PyTorch.
Late Policy. Homeworks that are one day late will be subject to a 20% penalty; homeworks that are two days late will be subject to a 50% penalty. Homeworks more than two days late will not be accepted.
Typically this class includes an in-class midterm. Given the pandemic and the remote nature of this offering, we're going to try something different this year. You are asked to survey the literature and methods on a particular "topic" (or "task") of interest. This will culminate in a brief write-up and an in-class presentation explaining the task, dataset, and methods. Ideally this will motivate your final project, although this is not required. Details on this assignment are available here: Midterm survey details. Please do not hesitate to post to Piazza or reach out directly with questions (the former is encouraged so that others may benefit).
A big component of this course will be your project. This will be completed individually. The project might entail building on top of what you learned in your topic survey -- e.g., perhaps you surveyed the area of automatic translation using neural models; your project might then be to re-evaluate a state-of-the-art approach, or to reproduce the results reported in recent papers, etc. This project will be broken down into several graded deliverables, and culminate in a project report and final presentation in class to your peers. Here are additional project details.
A commitment to the principles of academic integrity is essential to the mission of Northeastern University. The promotion of independent and original scholarship ensures that students derive the most from their educational experience and their pursuit of knowledge. Academic dishonesty violates the most fundamental values of an intellectual community and undermines the achievements of the entire University. For more information, please refer to the Academic Integrity Web page.
More specific to this class: It is fine to consult online resources for programming assignments (of course), but lifting a solution/implementation in its entirety is completely inappropriate. Moreover, you must list all sources (websites/URLs) consulted for every homework; failing to do so will constitute a violation of academic integrity.
Meeting | Topic(s) | readings | things due | lecture notes/etc |
9/10 | Course aims, expectations, logistics; Review of supervised learning / Perceptron / intro to colab | d2l: Introduction | join the Piazza site! | Intro/logistics slides; Notes; Perceptron notebook |
9/14 | Preliminaries, Logistic Regression and Optimization via SGD | d2l: Preliminaries | | Notes; In-class gradient exercise starter; Notebook on Linear Regression via Gradient Descent |
9/17 | Beyond Linear Models: The Multi-Layer Perceptron | d2l: MLPs (4.1) | HW 1 Due! | Notes on MLPs; Notebook on (non-linear) MLPs; Notes on metrics; Notebook on metrics |
9/21 | Abstractions: Layers and Computation Graphs | d2l: Layers and blocks | | Notes; Notebook on computation graphs; In-class exercise starter (see notes) |
9/24 | Backpropagation I | d2l: Autodiff; Backprop (Colah's blog) | | Notes; Notebook; In-class exercise on backprop (see notes) |
9/28 | Backpropagation II | d2l: Backprop | | Notes; Notebook: Wacky custom layer exercise |
10/1 | Optimizer matters: Training NNs in Practice | d2l: Optimization | HW 2 Due! | Notes; Notebook; (Overly audacious) in-class exercise on (custom) optimizers in torch |
10/5 | Learning continuous representations of discrete things: Embeddings | d2l: Word embeddings (14.1) | | Notes; Notebook: CBoW w2v |
10/8 | Convolutional Neural Networks (CNNs) I | d2l: CNNs (6.1) | | Notes; Notebook: A simple example in torch |
10/12 | No class (holiday) | | | |
10/15 | Convolutional Neural Networks (CNNs) II | d2l: CNNs (6.2 -- 6.5); Modern CNNs (7.1, 7.5 -- 7.7) | | Notes; Notebook: ConvNets in action! |
10/19 | Recurrent Neural Networks (RNNs) I | RNNs (8.1 and 8.4)/The Unreasonable Effectiveness of RNNs | HW 3 Due! | Notes; Notebook: RNNs (intro) |
10/22 | Recurrent Neural Networks (RNNs) II | d2l: RNNs (8.7) / The Unreasonable Effectiveness of RNNs | | Notes; Notebook: More fun with RNNs; Notebook: Character RNN to generate Shakespeare |
10/26 | Transformer Networks (+ Self-Supervision and Contextualized Word Embeddings) | d2l: Transformers | | Notes; Notebook: Self-Attention/Transformersish (1) |
10/29 | More Transformers --> BERT; + Neural Sequence Tagging | | | Notes; Notebook: Training a BERTish model; Official PyTorch tutorial on BiLSTM-CRFs |
11/2 | Midterm topic survey presentations | | Topic survey write-ups due! | |
11/5 | Sequence-to-Sequence Models 1 | | | Notes; Notebook: Seq2Seq for learning to "add" |
11/9 | Sequence-to-Sequence Models 2 | d2l: Encoder-Decoder (seq2seq) | HW 4 Due! | Notes; Attention in Seq2Seq models (start) |
11/12 | Summarization Models (guest: PhD student Jered McInerney) | | | Slides (from Jered); Code |
11/16 | Auto-Encoders | Auto-encoders; Intuitive VAEs | Project proposals due! | Notes; Notebook: t-SNE in sklearn; Notebook: Autoencoders |
11/19 | Ethics and Bias 1 | Fairness in ML | | Notes; Slides |
11/23 | Ethics and Bias 2 | Fairness in machine learning: against false positive rate equality as a measure of fairness | | |
11/26 | No class (Thanksgiving) | | | |
11/30 | Interpretability (Guest: PhD student Sarthak Jain) | | HW 5 Due! | Slides (from Sarthak) |
12/3 | Active Learning and Augmentation (Guest: PhD student and TA David Lowell) | | | Slides |
12/7 | Final project presentations/discussion | | | |