DS4420 // Machine Learning 2

Course description

This course aims to deepen your understanding of machine learning, with an emphasis on modern modeling and estimation methods in the first half of the course, and on cutting-edge applications in the second half.

Covid-19 and Sp22

Unfortunately, we are still in a pandemic. The class is in-person, but do not come to lecture if you are sick. While I do not offer a lecture livestream, lecture notes will be posted immediately after class, and I will hold special office hours on Wednesdays at 6:00p via Zoom where I can review content presented that week. This is not intended as a replacement for lecture, but as a means to keep up with the course asynchronously if you are in isolation. "In-class" exercises can also be completed remotely.

Grading

35%	Homeworks
5%	"In class" exercises
25%	Mid-term
35%	Final project

Prerequisites

I assume you have taken ML1 (DS4400) or equivalent. Working knowledge of Python is required (or you must be willing to pick it up rapidly as we go).

Homeworks

Homeworks will consist of both written and programming components. The latter will be completed in Python, using a mix of standard libraries (NumPy, PyTorch, etc.). Homeworks are to be completed on your own; please see the academic integrity policy below. Homeworks should be submitted via Canvas: submit a link to your completed Colab notebook. A late penalty (see below) will apply if the "last edited" date on this notebook is later than the submission deadline.

Late Policy. Homeworks that are one day late will be subject to a 20% penalty; homeworks that are two days late will be subject to a 50% penalty. Homeworks more than two days late will not be accepted. To allow flexibility, your lowest homework score will be dropped.
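
As a rough illustration of how the late policy and the dropped score could combine, here is a minimal Python sketch. It rests on assumptions that the policy above does not spell out: that the penalty is taken as a percentage of the score you earned, that all homeworks are weighted equally, and that the lowest score is dropped after penalties are applied. The function names are hypothetical and this is not the official grading script.

```python
def late_multiplier(days_late):
    """Fraction of credit kept for a homework submitted `days_late` days late."""
    if days_late <= 0:
        return 1.0   # on time: no penalty
    if days_late == 1:
        return 0.8   # one day late: 20% penalty
    if days_late == 2:
        return 0.5   # two days late: 50% penalty
    return 0.0       # more than two days late: not accepted


def homework_average(raw_scores, days_late):
    """Average homework score after late penalties, with the lowest dropped.

    ASSUMPTION: penalties scale the earned score, homeworks are equally
    weighted, and the drop happens after penalties are applied.
    """
    adjusted = [s * late_multiplier(d) for s, d in zip(raw_scores, days_late)]
    if len(adjusted) > 1:
        adjusted.remove(min(adjusted))  # drop the single lowest score
    return sum(adjusted) / len(adjusted)


# Five homeworks scored out of 100; the fourth was three days late.
average = homework_average([90, 80, 100, 70, 85], [0, 1, 0, 3, 2])
print(round(average, 3))
```

Under these assumptions, the homework that was not accepted counts as a zero and is the one that gets dropped.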

In-class exercises

On occasion we will have interactive exercises to be done in class. These will be graded essentially on participation only; you will submit a link to your effort via Canvas, and you will receive credit so long as you have made a good faith effort. If you are sick, you can complete these remotely. Note that your lowest score will be dropped (so if you miss one, it will not count against you).

Midterm

The midterm will be given in class and will test your understanding of the fundamentals covered in the first half of the course.

Projects

A large component of this course will be your project, which will involve picking a particular dataset on which to implement, train, and evaluate machine learning models. Projects will be completed in teams of two; you can choose a teammate, or we can assign one at random. The project will be broken down into several graded deliverables, and will culminate in a report and a final presentation in class to your peers.

Here is an outline of the project expectations, (tentative) dates, etc.

Academic integrity policy

A commitment to the principles of academic integrity is essential to the mission of Northeastern University. The promotion of independent and original scholarship ensures that students derive the most from their educational experience and their pursuit of knowledge. Academic dishonesty violates the most fundamental values of an intellectual community and undermines the achievements of the entire University. For more information, please refer to the Academic Integrity Web page.

More specific to this class: It is fine to consult online resources for programming assignments, but lifting a solution/implementation in its entirety is completely inappropriate (nor is simply changing variable names sufficient!). Moreover, you must list all sources (websites/URLs) consulted for every homework; failing to do so will constitute a violation of academic integrity. In general, you must also be able to explain whatever code you use. Do not share code amongst yourselves; solutions that are practically identical will be considered a violation of academic integrity and reported as such (I really hate doing this, but will). We will check for this --- don't do it!

Schedule outline

Keep an eye on this outline because topics may change in some cases, especially later in the semester! Lecture notes will be posted after class.

Note: If you have to miss class due to illness, you can attend an informal review session with me on Wednesdays at 6p via Zoom https://northeastern.zoom.us/j/92275428948 (try to review the lecture notes first).

Meeting	Topic(s)	Readings	Things due	Lecture notes/etc.
1/19 (w)	Logistics, overview, unifying themes			Notes; Colab notebook (includes in-class exercise)
1/24 (m)	A probabilistic view of ML	Math for ML, Part 1: 5-5.5 (background), 6.1-6.5		Notes; Colab Notebook
1/26 (w)	Bayesian linear regression; graphical models	Math for ML, 6.6; 8.3-8.5		Notes; Notebook (with Bayesian LR exercise!)
1/31 (m)	Conjugacy; Discrete data distributions; Naive Bayes as a graphical model; Semi-supervision	Math for ML, 6.6; CIML, Ch. 9 9.3.		Notes; Notebook
2/2 (w)	Naive Bayes / conjugacy (cont'd.)	Elements of Statistical Learning, 14--14.6	HW1 DUE	Notes; Notebook; Bonus: note on mixtures of Normals (relevant to HW1)
2/7 (m)	Clustering (K-means)	Elements of Statistical Learning, 14.6--14.9		Notes; K-Means notebook
2/9 (w)	Clustering → Mixture models	MML, Part 2: 11		Notes; k-means clustering tweets; Notebook (GMMs)
2/14 (m)	Modeling collections of discrete data: PLSA	PLSA tutorial; Bonus: Intro to LSA (it looks longer than it is!)	HW2 DUE	Notes (Clustering wrap-up); Notes (PLSA)
2/16 (w)	Topic modeling via Latent Dirichlet Allocation (LDA)	Latent Dirichlet Allocation; Applications of Topic Models (Boyd-Graber, Hu, Mimno)		Notes; Notebook
2/21 (m)	* No class (President's Day) *
2/23 (w)	LDA cont'd./MCMC and Gibbs	Latent Dirichlet Allocation; Applications of Topic Models (Boyd-Graber, Hu, Mimno); See also Tutorial on LDA (Darling)		Notes; Notebook on random search MCMC for LR; Notebook on LDA w/Gibbs
2/28 (m)	Dimensionality reduction	Math for ML, Part 2: 10; t-SNE paper		Notes; Notebook
3/2 (w)	Dimensionality reduction cont'd.: t-SNE and auto-encoders	t-SNE paper	HW 3 DUE	Notes; t-SNE notebook; Auto-encoders notebook
3/7 (m)	Midterm review!
3/9 (w)	Midterm (in-class)
	Spring break!
3/21 (m)	Structured prediction	CIML, Ch 17		Notes; Notebook
3/23 (w)	Fairness and bias I (guest: Vance Ricks)	CIML, Ch. 8; Algorithmic bias: Senses, sources, solutions		Slides
3/28 (m)	Fairness and bias II (guest: Vance Ricks)	CIML, Ch. 8; Algorithmic bias: Senses, sources, solutions		Slides
3/30 (w)	Project pitches and feedback		In class project pitches! // HW 4 DUE TOMORROW
4/4 (m)	From Feed-Forward Neural Networks to Transformers: An Overview of Modern NNs	Dive Into DL (this is for reference!)		Notes; Notebook: Transformish to BERTish
4/6 (w)	Multi-Modal Models (Guest: PhD student Jered McInerney)			Slides
4/11 (m)	Deep latent variable models			Notes; Notebook: VAEs with Pyro
4/13 (w)	Interpretability (Guest: PhD student Sarthak Jain)		HW 5 DUE	Slides; Notebook
4/18 (m)	* No class (Patriots' Day) *
4/20 (w)	Project help (come to class w/questions/problems)
4/25 (m)	Final project presentations I
4/27 (w)	Final project presentations II		FINAL PROJECT WRITE-UPS DUE!

HTML/CSS/JS used (and modified), with permission, courtesy of Prof. Alan Mislove