DS4420 // machine learning 2 // spring 2020

Course details

Instructor

Byron Wallace
Office: 2208, 177 Huntington

Office hours: by appointment (just email!)

TA

Sarthak Jain
jain.sar@husky.neu.edu
Office hours: Mondays 11am -- 12:30pm and Fridays 4pm -- 5:30pm, room 2204 in 177 Huntington Avenue.

Lecture Time/Location

Tuesdays 11:45a -- 1:25p and Thursdays 2:50p -- 4:30p

Piazza

Here is a link to the course Piazza site.

Course description

Machine learning 2

Grading

30% | Homeworks |

5% | In class exercises |

30% | Mid-term |

35% | Final project |

Prerequisites

I assume you have taken ML1 (DS4400) or equivalent. A working knowledge of Python is required (or you must be willing to pick it up rapidly as we go).

Homeworks

Homeworks will consist of both written and programming components. The latter will be completed in Python, using a mix of standard libraries (NumPy, PyTorch, etc.).

**Late Policy**. Homeworks that are one day late will be subject to a 20% penalty; homeworks that are two days late will be subject to a 50% penalty. Homeworks more than two days late will not be accepted.

Mid-term

The mid-term will be given in class and will test your understanding of the fundamentals covered in the first half of the course.

Projects

A big component of this course will be your project, which will involve picking a particular dataset on which to implement, train, and evaluate machine learning models. Collaboration is allowed, but teams may have at most two members. The project will be broken down into several graded deliverables and will culminate in a report and a final in-class presentation to your peers.

Here is an outline of the project expectations, (tentative) dates, etc.

Academic integrity policy

A commitment to the principles of academic integrity is essential to the mission of Northeastern University. The promotion of independent and original scholarship ensures that students derive the most from their educational experience and their pursuit of knowledge. Academic dishonesty violates the most fundamental values of an intellectual community and undermines the achievements of the entire University. For more information, please refer to the Academic Integrity Web page.

More specific to this class: It is fine to consult online resources for programming assignments (of course), but lifting a solution/implementation in its entirety is completely inappropriate. Moreover, you **must** list all sources (websites/URLs) consulted for every homework; failing to do so will constitute a violation of academic integrity. In general, you must also be able to explain whatever code you use.

Meeting | Topic(s) | Readings | Things due | Lecture notes/etc. |

1/7 (t) | Logistics, overview | | | Slides; Notebook |

1/9 (r) | Math Review | Math for ML, Part 1: 5--5.5, 6--6.5 | | Slides; probability review NB; autodiff in Torch NB; A follow-up note to our exercise on gradients |

1/14 (t) | MLE, MAP, and graphical models | Math for ML, Part 2: 8.3, 8.4, 8.5 | | Slides; Notebook on conjugate priors |

1/16 (r) | Neural networks / backprop | A Course in Machine Learning, Ch. 10 | | Slides; Notes on NNs/backprop; Notebook on backprop |

1/21 (t) | Clustering I | Elements of Statistical Learning, 14--14.6; (optional) CIML 11.3 | | Slides; Notes on clustering; Notebook on k-means; In-class exercise/NB clustering BERT vectors of Trump tweets |

1/23 (r) | Clustering II → Mixture models and EM | Elements of Statistical Learning, 14.6--14.9; MML, Part 2: 11 | HW1 DUE | Slides; Notes on Mixture Models; In-class exercise on EM+NB; Worked |

1/28 (t) | Topic modeling I | Applications of Topic Models (Boyd-Graber, Hu, Mimno) | | Slides; Notes on PLSA; Notebook/in-class exercise on PLSA (worked) |

1/30 (r) | Topic modeling II | Applications of Topic Models (Boyd-Graber, Hu, Mimno) | | Slides; Notes; Notebook (sampling + LDA) |

2/4 (t) | Dimensionality reduction I | Math for ML, Part 2: 10 | | Slides; Notes; Notebook (PCA/PPCA) |

2/6 (r) | Dimensionality reduction II | t-SNE paper | HW2 DUE | Slides; Notes; Notebook |

2/11 (t) | Auto-encoders/"Self-supervision"; Learning to embed | | | Slides; Notes; Notebook |

2/13 (r) | Structured prediction I | A Course in Machine Learning, Ch. 17 | | Slides; Notes |

2/18 (t) | Structured prediction II | A Course in Machine Learning, Ch. 17 | | Slides; Notes; Notebook |

2/20 (r) | No class | | HW3 DUE **tomorrow 2/21** | |

2/25 (t) | Review | | | Review slides |

2/27 (r) | Midterm exam | | | |

Spring break! | | | | |

3/10 (t) | Transformers | The Illustrated Transformer | | Notebook on (stripped down) transformers; Slides; Refresher: RNNs |

3/12 (r) | Fairness and bias | A Course in Machine Learning, Ch. 8 | Project proposal due FRIDAY (3/13) | Slides; Notebook on accidentally making a racist model (by Robyn Speer) |

3/17 (t) | Project pitches and feedback | | In-class project pitches! | |

3/19 (r) | "Green" AI | Green AI | | Slides; Notebook/exercise on model distillation |

3/24 (t) | Active learning | | | Slides; Notebook/exercise on active learning |

3/26 (r) | Interpretability (Guest Lecturer: PhD student Sarthak Jain) | | HW DUE | Slides; Notebook |

3/31 (t) | By popular demand: Sequence-2-sequence models | | | Slides; seq2seq notebook |

4/2 (r) | Project Q's / help | | | |

4/7 (t) | Final project presentations I | | Presentations! | |

4/9 (r) | Final project presentations II | | Presentations! | |

4/14 (t) | No class (final write-ups due) | | FINAL PROJECT WRITE-UPS DUE! | |

HTML/CSS/JS used (and modified), with permission, courtesy of Prof. Alan Mislove