DS4440 // practical neural networks // fall 2024
Course details
Instructor
Byron Wallace
Office: 2208, 177 Huntington
Office hours: Mondays 1-2p, or by request (2208 177 Huntington, or Zoom)
Email: b.wallace@northeastern.edu
 
TA
Sanjana Ramprasad
Office hours: M, W 5-6p, CoLabV in Snell (watch Piazza for changes); also by request!
Email: ramprasad.sa@northeastern.edu

Time / Location

MW 2:50 - 4:30 pm / Snell Library 123
Piazza
Here is a link to the course Piazza site.
Books & Resources


Dive into Deep Learning
This is the main reference book for the class. It is online and free.


Course description

This course is a hands-on introduction to modern neural network ("deep learning") tools and methods. The course will cover the fundamentals of neural networks and introduce standard and newer architectures, from simple feedforward networks to recurrent neural networks. We will cover stochastic gradient descent and backpropagation, along with related fitting techniques.

We will place a particular emphasis on using these technologies in practice, via modern toolkits. We will specifically be working with PyTorch, which provides a flexible framework for working with computation graphs. While PyTorch will be our toolkit of choice, the concepts of automatic differentiation and neural networks are not tied to this particular package, and a key objective in this class is to provide sufficient familiarity with the methods and programming paradigms so that switching to new frameworks is no great obstacle. (This is particularly important given the rapid pace of development in the deep learning toolkit space.)
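To make this concrete, here is a minimal sketch (not from the course materials) of the basic workflow in PyTorch: build a tiny computation graph, backpropagate through it, and take one SGD step by hand. The model, data, and learning rate below are arbitrary illustrative choices.

    import torch

    # A tiny linear model y = w*x + b; requires_grad=True tells autograd
    # to record operations on these tensors in a computation graph.
    w = torch.randn(1, requires_grad=True)
    b = torch.zeros(1, requires_grad=True)

    x = torch.tensor([2.0])
    y_true = torch.tensor([7.0])

    # Forward pass: the graph is built as these operations execute.
    y_pred = w * x + b
    loss = ((y_pred - y_true) ** 2).mean()

    # Backward pass: backpropagation populates w.grad and b.grad.
    loss.backward()

    # One gradient descent step, taken manually for clarity.
    lr = 0.1
    with torch.no_grad():
        w -= lr * w.grad
        b -= lr * b.grad
        w.grad.zero_()
        b.grad.zero_()

In practice you would typically hand the parameters to an optimizer such as torch.optim.SGD rather than updating them manually; the point of the sketch is that the gradients come from the recorded graph, not from code you wrote yourself.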

We will also be doing something a bit different this semester: focusing on projects related to interpretability, i.e., trying to understand how generative models manage to realize the functionality they offer. To this end, we will be using the National Deep Inference Fabric (NDIF), a Northeastern-led project!

Grading
30% Homeworks
5% Participation (mostly in-class exercises)
25% Midterm
40% Final project
Prerequisites

Prior exposure to machine learning is recommended. Working knowledge of Python is required (or you must be willing to pick it up rapidly as we go). Familiarity with linear algebra, (basic) calculus, and probability will be largely assumed throughout, although we will also review some of these prerequisites.

Homeworks

Homeworks will consist of both written and programming components. The latter will be completed in Python, often using PyTorch.

Late Policy. Homeworks submitted one day late are subject to a 20% penalty; two days late, a 50% penalty. Homeworks more than two days late will not be accepted.

Midterm

We will have a midterm exam covering the foundations of deep learning.

Projects (on interpretability!)

A big component of this course will be your project, which should be completed in pairs. The project should concern some flavor of model interpretability (more on this soon) and will be done using NDIF.
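For a taste of what these projects involve mechanically, below is a minimal sketch of inspecting a model's internals with nnsight, the library underlying NDIF. The model name, the choice of layer 5, and the prompt are illustrative assumptions, and the exact API we use in class may differ.

    from nnsight import LanguageModel

    # Wrap a small open model so its internal modules can be inspected.
    model = LanguageModel("openai-community/gpt2", device_map="auto")

    # Trace one forward pass, saving an intermediate activation.
    with model.trace("The capital of France is"):
        # Hidden states output by transformer block 5 (arbitrary choice).
        hidden = model.transformer.h[5].output[0].save()

    # After the trace exits, the saved value holds a concrete tensor.
    print(hidden.shape)  # (batch, sequence length, hidden size)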

Academic integrity policy

A commitment to the principles of academic integrity is essential to the mission of Northeastern University. The promotion of independent and original scholarship ensures that students derive the most from their educational experience and their pursuit of knowledge. Academic dishonesty violates the most fundamental values of an intellectual community and undermines the achievements of the entire University. For more information, please refer to the Academic Integrity Web page.

More specific to this class: It is of course fine to consult online resources for programming assignments, including ChatGPT and similar tools, but lifting a solution/implementation in its entirety is completely inappropriate. Moreover, you must list all sources (websites/URLs/language models) consulted for every homework; failing to do so will constitute a violation of academic integrity. You must also fully understand and be able to explain all code you use.

Schedule outline

Meeting | Topic(s) | Readings | Things due | Lecture notes/etc.
9/4 (W) | Course aims, expectations, logistics; Review of supervised learning / Perceptron / intro to Colab | d2l: Introduction; The original (1957!) Perceptron manuscript | Join the Piazza site! | Notes; Intro/logistics slides; Perceptron notebook
9/9 (M) | Preliminaries, Logistic Regression and Optimization via SGD | d2l: Preliminaries | | Notes; Notebook on losses and linear regression
9/11 (W) | Beyond Linear Models: The Multi-Layer Perceptron | d2l: MLPs (4.1) | HW 1 due! (NOW DUE 9/13) | Notes; Notebook on (non-linear) MLPs
9/16 (M) | Abstractions: Layers and Computation Graphs | d2l: Layers and blocks | | Notes; Notebook on computation graphs (and torch)
9/18 (W) | Backpropagation I | d2l: Autodiff; Backprop (Colah's blog); Learning representations by back-propagating errors (1986!) | | Notes; Notebook on backprop/auto-grad
9/23 (M) | Backpropagation II | d2l: Backprop | | Notes; Notebook: Layers & control flow
9/25 (W) | Optimizer matters: Training NNs in Practice | d2l: Optimization | HW 2 due! | Notes; Notebook on optimizers
9/30 (M) | Learning continuous representations of discrete things: Embeddings | d2l: Word embeddings (14.1) | | Notes; Notebook implementing a (toy) word2vec; Bonus notebook on NNSight stuff (may be useful for HW 3)
10/2 (W) | Convolutional Neural Networks (CNNs) | d2l: CNNs (6.1) | | Notes; Notebook: A simple CNN example in torch
10/7 (M) | Stacking ConvNets, residual connections, and other tricks | d2l: CNNs (6.2-6.5); Modern CNNs (7.1, 7.5-7.7) | | Notes; Notebook: Deeper ConvNets; Notebook: ConvNets for Text
10/9 (W) | Recurrent Neural Networks (RNNs) I | d2l: RNNs (8.1 and 8.4); The Unreasonable Effectiveness of RNNs | HW 3 due! | Notes; Notebook on (toy) RNNs
10/14 (M) | No class (University holiday) | | |
10/16 (W) | Recurrent Neural Networks (RNNs) II | d2l: RNNs (8.7); The Unreasonable Effectiveness of RNNs | | Notes; Bonus notes on activation patching (useful for HW 4); Notebook on gated RNNs
10/21 (M) | Transformers (+ self-supervision, contextualized word embeddings) | d2l: Ch. 11; Original Transformers paper | | Notes; Notebook on self-attention
10/23 (W) | More Transformers and NLP | Sesame Street characters; BERTology | HW 4 due! (NOW DUE 10/25) | BERT and BERTology slides; Notebook on building a BERTish model; Notebook on BERTology
10/28 (M) | Midterm review | | | Midterm exercises
10/30 (W) | Midterm | | |
11/4 (M) | Modern Language Modeling | | | Notes; Slides on Instruction-tuning and RLHF
11/6 (W) | So how do these LLMs work, actually? (Interpretability; guest speakers Koyena Pal and Jaden Fiotto-Kaufman) | | |
11/11 (M) | No class (University holiday) | | |
11/13 (W) | Class cancelled due to conference travel; work on proposals! | | (Tentative) project proposals due 11/15; see here for details |
11/18 (M) | Guest Lecture (Sanjana Ramprasad): Factuality and LLMs | | | Slides; Annotating some summaries; Evaluating neural summarizers notebook
11/20 (W) | Diffusion Models | Step-by-Step Diffusion: An Elementary Tutorial | HW 5 due! | Notes; Notebook on Autoencoders; Notebook on diffusion models
11/25 (M) | Ethical problems with generative models | | | Slides: Bias, robustness, etc.; Notebook on bias in GPT-2
11/27 (W) | No class (Fall break) | | |
12/2 (M) | Dedicated project feedback and help | | |
12/4 (W) | Project presentations! | | |

HTML/CSS/JS used (and modified), with permission, courtesy of Prof. Alan Mislove