syllabus
schedule
-
Week 1
Introduction and Applications
January 11
-
Topics
- A Course Overview
- Data Vocabulary
- Suggested Reading
-
Submissions
- Assignment 1 is assigned - Review & Exploring Data
-
Week 2
Mining for Association Rules
January 18
-
Topics
- Definitions of Frequent Itemsets
- Determining Frequent Itemsets
- Creating Association Rules
- Working with Docker
- Suggested Reading
-
Submissions
- Assignment 1 is due
- Assignment 2 is assigned - Association Rules
-
Week 3
Accessing, Storing, and Computing with "Big" Data
January 25
-
Topics
- Distributed Filesystems and Storage
- Structured Data Analysis (SQL)
- Introducing the MapReduce Paradigm
- Distributed Computation
- Suggested Reading - Chapter 2, Sections 2.1-2.4
-
Submissions
- Recitation video is available
- Assignment 2 is due
- Assignment 3 is assigned - MapReduce Problem
-
Week 4
Large-Scale Data (Pre-)Processing
February 1
-
Topics
- Basics of Linear Algebra and Probability Theory
- The Multiple Places Where Data Lives & Multi-source Joins
- Covariance, Correlation, and Cosine Similarity
- Dimensionality Reduction and Feature Selection
- Suggested Reading
-
Week 5
Mining Data without Labels
February 8
-
Topics
- Introducing the Gaussian Distribution
- Parameter Estimation of a Distribution
- Unsupervised Modeling with k-Means and Clustering
- Suggested Reading
-
Submissions
- Assignment 3 is due
- Assignment 4 is assigned - Parameter Estimation & Clustering
-
Week 6
Mining with Bayes Classifiers
February 15
-
Topics
- Anomaly and Outlier Detection
- The Bayesian Framework
- Naive Bayes Classification
- Suggested Reading
-
Submissions
- Assignment 4 is due
- Assignment 5 is assigned - Bayesian Framework & ML Libraries
-
Week 7
Mining with Small Data and Course Review
February 22
-
Topics
- Decision Tree Classification
- Course Review and Midterm Preparation
- Suggested Reading
-
Submissions
- Assignment 5 is due
- Assignment 6 is assigned - Evaluation Metrics
-
Week 8
Midterm Exam
February 29
-
Topics
- Linear Algebra Review
- MapReduce Problems
- Principal Component Analysis
- Parameter Estimation
- Unsupervised Clustering
- Bayesian Framework
- Suggested Preparation
-
Week 9
No Instruction This Week - Spring Break
March 7
- Have a nice break!
-
Week 10
No Instruction This Week - Instructor Absence
March 14
- Office hours will be held instead
-
Week 11
Foundations of Machine Learning
March 21
-
Topics
- Algorithmic Evaluation with Confusion Matrices, Thresholds, and ROC Curves
- The Objective Function, Regularization, and Constraints
- Logistic Regression - Precursor to Modern Data Mining
- Batch Data Processing - Gradient Descent
- The Bias-Variance Tradeoff
- In-Class Colabs: Logistic Regression with MNIST
-
Suggested Reading
- Evaluation Metrics, Chapter 8.5
- Logistic Regression ([1], [2])
-
Submissions
- Project proposals are due - initial thoughts and feedback
-
Week 12
Mining Images with Deep Learning
March 28
-
Topics
- Working with Tensors - Reviewing Multivariate Calculus
- Deep Learning - A Historical Perspective
- The Backpropagation Algorithm
- Convolutional Neural Networks
- Suggested Reading
- Submissions
-
Week 13
Mining Text with Self-Supervision
April 4
-
Topics
- Some Basic Approaches
- Semi-Supervised Learning
- The Concept of an Embedding Space
- The Attention Mechanism
- Large Language Models - From BERT to ChatGPT
- Suggested Reading
-
Week 14
-
Week 15
Project Presentations and Final Review
April 18
- Project Presentations and Outbriefs
-
Submissions
- Final projects are due, including presentation slides and writeup
-
Week 16
Final Exam
April 25
-
Topics
- Objective Functions
- Logistic Regression
- Association Rule Mining
- Evaluation Metrics
- Backpropagation
- Convolutions and Recurrence
grading criteria
Component | Weight
Labs & Participation | 10%
Data Mining Project | 10%
Assignments | 20%
Midterm Exam | 30%
Final Exam | 30%
course meeting times
-
Lectures
- Tues, 6:00-9:20pm
- Room TBD
-
Office Hours
- Professor, Thurs, 8:30-9:30pm
- TA, Date/Time TBD
suggested textbooks
- Introduction to Data Mining, 2nd Edition. Pang-Ning Tan, Michael Steinbach, Anuj Karpatne, and Vipin Kumar, 2018
- Mining of Massive Datasets, 3rd Edition. Jure Leskovec, Anand Rajaraman, and Jeff Ullman, 2020
- Deep Learning. Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016