This course is a hands-on introduction to modern neural network ("deep learning") tools and methods. The course will cover the fundamentals of neural networks and introduce standard and newer architectures, from simple feedforward networks to recurrent neural networks. We will cover stochastic gradient descent and backpropagation, along with related fitting techniques.
We will place a particular emphasis on using these technologies in practice, via modern toolkits. We will specifically be working with PyTorch, which provides a flexible framework for working with computation graphs. While PyTorch will be our toolkit of choice, the concepts of automatic differentiation and neural networks are not tied to this particular package, and a key objective in this class is to provide sufficient familiarity with the methods and programming paradigms such that switching to new frameworks is no great obstacle. (This is particularly important given the rapid pace of development in the deep learning toolkit space.)
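To give a concrete (if minimal) sense of what this looks like, here is a small sketch of PyTorch's automatic differentiation; the particular values and variable names are illustrative only, not course code:

```python
import torch

# PyTorch records a computation graph as operations run, then
# differentiates through it automatically on backward().
x = torch.tensor(2.0, requires_grad=True)  # leaf node of the graph
y = x ** 2 + 3 * x                         # graph built on the fly
y.backward()                               # traverse the graph in reverse
print(x.grad)                              # dy/dx = 2x + 3 = 7 at x = 2
```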
We will also be doing something a bit different this semester: focusing on projects related to interpretability, i.e., trying to understand how generative models manage to realize the functionality they offer. To this end, we will be using the National Deep Inference Fabric (NDIF), a Northeastern-led project!
Weight | Component |
--- | --- |
30% | Homeworks |
5% | Participation (mostly in-class exercises) |
25% | Midterm |
40% | Final project |
Prior exposure to machine learning is recommended. A working knowledge of Python is required (or you must be willing to pick it up rapidly as we go). Familiarity with linear algebra, (basic) calculus, and probability will be largely assumed throughout, although we will also review some of these prerequisites.
Homeworks will consist of both written and programming components. The latter will be completed in Python, often using PyTorch.
Late Policy. Homeworks that are one day late will be subject to a 20% penalty; two days late incurs a 50% penalty. Homeworks more than two days late will not be accepted.
We will have a midterm that covers the foundations of deep learning.
A big component of this course will be your project. This should be completed in pairs. The project should concern some flavor of model interpretability (more on this soon) and will be done using NDIF.
A commitment to the principles of academic integrity is essential to the mission of Northeastern University. The promotion of independent and original scholarship ensures that students derive the most from their educational experience and their pursuit of knowledge. Academic dishonesty violates the most fundamental values of an intellectual community and undermines the achievements of the entire University. For more information, please refer to the Academic Integrity Web page.
More specific to this class: it is fine to consult online resources (and, of course, ChatGPT and similar tools) for programming assignments, but lifting a solution/implementation in its entirety is completely inappropriate. Moreover, you must list all sources (websites/URLs/language models) consulted for every homework; failing to do so will constitute a violation of academic integrity. You must also fully understand and be able to explain all code you use.
Meeting | Topic(s) | Readings | Things due | Lecture notes/etc. |
--- | --- | --- | --- | --- |
9/4 (W) | Course aims, expectations, logistics; review of supervised learning / Perceptron / intro to Colab | d2l: Introduction; the original (1957!) Perceptron manuscript | Join the Piazza site! | Notes; intro/logistics slides; Perceptron notebook |
9/9 (M) | Preliminaries, Logistic Regression and Optimization via SGD | d2l: Preliminaries | | Notes; notebook on losses and linear regression |
9/11 (W) | Beyond Linear Models: The Multi-Layer Perceptron | d2l: MLPs (4.1) | HW 1 due! (NOW DUE 9/13) | Notes; notebook on (non-linear) MLPs |
9/16 (M) | Abstractions: Layers and Computation Graphs | d2l: Layers and blocks | | Notes; notebook on computation graphs (and torch) |
9/18 (W) | Backpropagation I | d2l: Autodiff; Backprop (Colah's blog); Learning representations by back-propagating errors (1986!) | | Notes; notebook on backprop/autograd |
9/23 (M) | Backpropagation II | d2l: Backprop | | Notes; notebook: layers & control flow |
9/25 (W) | Optimizer matters: Training NNs in Practice | d2l: Optimization | HW 2 due! | Notes; notebook on optimizers |
9/30 (M) | Learning continuous representations of discrete things: Embeddings | d2l: Word embeddings (14.1) | | Notes; notebook implementing a (toy) word2vec; bonus notebook on NNSight (may be useful for HW 3) |
10/2 (W) | Convolutional Neural Networks (CNNs) | d2l: CNNs (6.1) | | Notes; notebook: a simple CNN example in torch |
10/7 (M) | Stacking ConvNets, residual connections, and other tricks | d2l: CNNs (6.2 -- 6.5); Modern CNNs (7.1, 7.5 -- 7.7) | | Notes; notebook: deeper ConvNets; notebook: ConvNets for text |
10/9 (W) | Recurrent Neural Networks (RNNs) I | d2l: RNNs (8.1 and 8.4); The Unreasonable Effectiveness of RNNs | HW 3 due! | Notes; notebook on (toy) RNNs |
10/14 (M) | No class (University holiday) | | | |
10/16 (W) | Recurrent Neural Networks (RNNs) II | d2l: RNNs (8.7); The Unreasonable Effectiveness of RNNs | | Notes; bonus notes on activation patching (useful for HW 4); notebook on gated RNNs |
10/21 (M) | Transformers (+ self-supervision, contextualized word embeddings) | d2l: Ch. 11; original Transformers paper | | Notes; notebook on self-attention |
10/23 (W) | More Transformers and NLP Sesame Street characters; BERTology | | HW 4 due! (NOW DUE 10/25) | BERT and BERTology slides; notebook on building a BERT-ish model; notebook on BERTology |
10/28 (M) | Midterm review | | | Midterm exercises |
10/30 (W) | Midterm | | | |
11/4 (M) | Modern Language Modeling | | | Notes; slides on instruction-tuning and RLHF |
11/6 (W) | So how do these LLMs work, actually? (Interpretability; guest speakers Koyena Pal and Jaden Fiotto-Kaufman) | | | |
11/11 (M) | No class (University holiday) | | | |
11/13 (W) | Class cancelled due to conference travel; work on proposals! | | (Tentative) project proposals due 11/15; see here for details | |
11/18 (M) | Guest Lecture (Sanjana Ramprasad): Factuality and LLMs | | | Slides; annotating some summaries; evaluating neural summarizers notebook |
11/20 (W) | Diffusion Models | Step-by-Step Diffusion: An Elementary Tutorial | HW 5 due! | Notes; notebook on autoencoders; notebook on diffusion models |
11/25 (M) | Ethical problems with generative models | | | Slides: bias, robustness, etc.; notebook on bias in GPT-2 |
11/27 (W) | No class (Fall break) | | | |
12/2 (M) | Dedicated project feedback and help | | | |
12/4 (W) | Project presentations! | | | |