DS4440 // practical neural networks // fall 2024
Course details
Instructor
Byron Wallace
Office: 2208, 177 Huntington
Office hours: Mondays 1-2p, or by request (2208 177 Huntington, or Zoom)
Email: b.wallace@northeastern.edu
 
TA
Sanjana Ramprasad
Office hours: M, W 5-6p, CoLabV in Snell (watch Piazza for changes); also by request!
Email: ramprasad.sa@northeastern.edu

Time / Location

MW 2:50 - 4:30 pm / Snell Library 123
Piazza
Here is a link to the course Piazza site.
Books & Resources


Dive into Deep Learning
This is the main reference book for the class. It is online and free.


Course description

This course is a hands-on introduction to modern neural network ("deep learning") tools and methods. The course will cover the fundamentals of neural networks and introduce standard and newer architectures, from simple feedforward networks to recurrent neural networks. We will cover stochastic gradient descent and backpropagation, along with related fitting techniques.

We will place a particular emphasis on using these technologies in practice, via modern toolkits. We will specifically be working with PyTorch, which provides a flexible framework for working with computation graphs. While PyTorch will be our toolkit of choice, the concepts of automatic differentiation and neural networks are not tied to this particular package, and a key objective in this class is to provide sufficient familiarity with the methods and programming paradigms so that switching to new frameworks is no great obstacle. (This is particularly important given the rapid pace of development in the deep learning toolkit space.)
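To make this concrete, here is a minimal sketch (not from the course materials) of the basic workflow in PyTorch: build a tiny computation graph, backpropagate through it, and take one SGD step by hand. The model, data, and learning rate below are arbitrary illustrative choices.

    import torch

    # A tiny linear model y = w*x + b; requires_grad=True tells autograd
    # to record operations on these tensors in a computation graph.
    w = torch.randn(1, requires_grad=True)
    b = torch.zeros(1, requires_grad=True)

    x = torch.tensor([2.0])
    y_true = torch.tensor([7.0])

    # Forward pass: the graph is built as these operations execute.
    y_pred = w * x + b
    loss = ((y_pred - y_true) ** 2).mean()

    # Backward pass: backpropagation populates w.grad and b.grad.
    loss.backward()

    # One gradient descent step, taken manually for clarity.
    lr = 0.1
    with torch.no_grad():
        w -= lr * w.grad
        b -= lr * b.grad
        w.grad.zero_()
        b.grad.zero_()

In practice you would typically hand the parameters to an optimizer such as torch.optim.SGD rather than updating them manually; the point of the sketch is that the gradients come from the recorded graph, not from code you wrote yourself.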

We will also be doing something a bit different this semester: focusing on projects related to interpretability, i.e., trying to understand how generative models manage to realize the functionality they offer. To this end, we will be using the National Deep Inference Fabric (NDIF), a Northeastern-led project!

Grading
30% Homeworks
5% Participation (mostly in-class exercises)
25% Midterm
40% Final project
Prerequisites

Prior exposure to machine learning is recommended. Working knowledge of Python is required (or you must be willing to pick it up rapidly as we go). Familiarity with linear algebra, (basic) calculus, and probability will be largely assumed throughout, although we will also review some of these prerequisites.

Homeworks

Homeworks will consist of both written and programming components. The latter will be completed in Python, often using PyTorch.

Late Policy. Homeworks submitted one day late are subject to a 20% penalty; two days late, a 50% penalty. Homeworks more than two days late will not be accepted.

Midterm

We will have a midterm exam covering the foundations of deep learning.

Projects (on interpretability!)

A big component of this course will be your project, which should be completed in pairs. The project should concern some flavor of model interpretability (more on this soon) and will be done using NDIF.
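For a taste of what these projects involve mechanically, below is a minimal sketch of inspecting a model's internals with nnsight, the library underlying NDIF. The model name, the choice of layer 5, and the prompt are illustrative assumptions, and the exact API we use in class may differ.

    from nnsight import LanguageModel

    # Wrap a small open model so its internal modules can be inspected.
    model = LanguageModel("openai-community/gpt2", device_map="auto")

    # Trace one forward pass, saving an intermediate activation.
    with model.trace("The capital of France is"):
        # Hidden states output by transformer block 5 (arbitrary choice).
        hidden = model.transformer.h[5].output[0].save()

    # After the trace exits, the saved value holds a concrete tensor.
    print(hidden.shape)  # (batch, sequence length, hidden size)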

Academic integrity policy

A commitment to the principles of academic integrity is essential to the mission of Northeastern University. The promotion of independent and original scholarship ensures that students derive the most from their educational experience and their pursuit of knowledge. Academic dishonesty violates the most fundamental values of an intellectual community and undermines the achievements of the entire University. For more information, please refer to the Academic Integrity Web page.

More specific to this class: It is of course fine to consult online resources for programming assignments, including ChatGPT and similar tools, but lifting a solution/implementation in its entirety is completely inappropriate. Moreover, you must list all sources (websites/URLs/language models) consulted for every homework; failing to do so will constitute a violation of academic integrity. You must also fully understand and be able to explain all code you use.

Schedule outline

Meeting | Topic(s) | Readings | Things due | Lecture notes/etc.
9/4 (W) | Course aims, expectations, logistics; Review of supervised learning / Perceptron / intro to Colab | d2l: Introduction; The original (1957!) Perceptron manuscript | Join the Piazza site! | Notes; Intro/logistics slides; Perceptron notebook
9/9 (M) | Preliminaries, Logistic Regression and Optimization via SGD | d2l: Preliminaries | | Notes; Notebook on losses and linear regression
9/11 (W) | Beyond Linear Models: The Multi-Layer Perceptron | d2l: MLPs (4.1) | HW 1 due! (NOW DUE 9/13) | Notes; Notebook on (non-linear) MLPs
9/16 (M) | Abstractions: Layers and Computation Graphs | d2l: Layers and blocks | | Notes; Notebook on computation graphs (and torch)
9/18 (W) | Backpropagation I | d2l: Autodiff; Backprop (Colah's blog); Learning representations by back-propagating errors (1986!) | | Notes; Notebook on backprop/auto-grad
9/23 (M) | Backpropagation II | d2l: Backprop | | Notes; Notebook: Layers & control flow
9/25 (W) | Optimizer matters: Training NNs in Practice | d2l: Optimization | HW 2 due! | Notes; Notebook on optimizers
9/30 (M) | Learning continuous representations of discrete things: Embeddings | d2l: Word embeddings (14.1) | | Notes; Notebook implementing a (toy) word2vec; Bonus notebook on NNSight stuff (may be useful for HW 3)
10/2 (W) | Convolutional Neural Networks (CNNs) | d2l: CNNs (6.1) | | Notes; Notebook: A simple CNN example in torch
10/7 (M) | Stacking ConvNets, residual connections, and other tricks | d2l: CNNs (6.2-6.5); Modern CNNs (7.1, 7.5-7.7) | | Notes; Notebook: Deeper ConvNets; Notebook: ConvNets for Text
10/9 (W) | Recurrent Neural Networks (RNNs) I | d2l: RNNs (8.1 and 8.4); The Unreasonable Effectiveness of RNNs | HW 3 due! | Notes; Notebook on (toy) RNNs
10/14 (M) | No class (University holiday) | | |
10/16 (W) | Recurrent Neural Networks (RNNs) II | d2l: RNNs (8.7); The Unreasonable Effectiveness of RNNs | | Notes; Bonus notes on activation patching (useful for HW 4); Notebook on gated RNNs
10/21 (M) | Transformers (+ self-supervision, contextualized word embeddings) | d2l: Ch. 11; Original Transformers paper | | Notes; Notebook on self-attention
10/23 (W) | More Transformers and NLP | Sesame Street characters; BERTology | HW 4 due! (NOW DUE 10/25) | BERT and BERTology slides; Notebook on building a BERTish model; Notebook on BERTology
10/28 (M) | Midterm review | | | Midterm exercises
10/30 (W) | Midterm | | |
11/4 (M) | Modern Language Modeling | | | Notes; Slides on Instruction-tuning and RLHF
11/6 (W) | So how do these LLMs work, actually? (Interpretability; guest speakers Koyena Pal and Jaden Fiotto-Kaufman) | | |
11/11 (M) | No class (University holiday) | | |
11/13 (W) | Class cancelled due to conference travel; work on proposals! | | (Tentative) project proposals due 11/15; see here for details |
11/18 (M) | Guest Lecture (Sanjana Ramprasad): Factuality and LLMs | | | Slides; Annotating some summaries; Evaluating neural summarizers notebook
11/20 (W) | Diffusion Models | Step-by-Step Diffusion: An Elementary Tutorial | HW 5 due! | Notes; Notebook on Autoencoders; Notebook on diffusion models
11/25 (M) | Ethical problems with generative models | | | Slides: Bias, robustness, etc.; Notebook on bias in GPT-2
11/27 (W) | No class (Fall break) | | |
12/2 (M) | Dedicated project feedback and help | | |
12/4 (W) | Project presentations! | | |

HTML/CSS/JS used (and modified), with permission, courtesy of Prof. Alan Mislove