syllabus
schedule
-
Week 1
Introduction and Applications
September 9
-
Topics
- A Course Overview
- Data Vocabulary
- Suggested Reading
- Assignment 1 is assigned - Review & Exploring Data
-
Week 2
Mining for Association Rules
September 16
-
Topics
- Definitions of Frequent Itemsets
- Determining Frequent Itemsets
- Creating Association Rules
- Working with Docker
- Suggested Reading
-
Submissions
- Assignment 1 is due
- Assignment 2 is assigned - Association Rules
-
Week 3
Accessing, Storing, and Computing with "Big" Data
September 23
-
Topics
- Distributed Filesystems and Storage
- Structured Data Analysis (SQL)
- Introducing the MapReduce Paradigm
- Distributed Computation
- Suggested Reading - Chapter 2, Sections 2.1-2.4
-
Submissions
- Recitation video is available
- Assignment 2 is due
- Assignment 3 is assigned - MapReduce Problem
-
Week 4
Large Scale Data (Pre)-Processing
September 30
-
Topics
- Basics of Linear Algebra and Probability Theory
- The Multiple Places Where Data Lives & Multi-source Joins
- Covariance, Correlation, and Cosine Similarity
- Dimensionality Reduction and Feature Selection
- Suggested Reading
-
Week 5
Mining Data without Labels
October 7
-
Topics
- Introducing the Gaussian Distribution
- Parameter Estimation of a Distribution
- Unsupervised Modeling with k-Means and Clustering
- Suggested Reading
-
Submissions
- Assignment 3 is due
- Assignment 4 is assigned - Parameter Estimation & Clustering
-
Week 6
No Instruction - Indigenous Peoples' Day
October 14
- Have a nice break!
-
Week 7
Mining with Bayes Classifiers
October 21
-
Topics
- Anomaly and Outlier Detection
- The Bayesian Framework
- Suggested Reading
-
Submissions
- Assignment 4 is due
- Assignment 5 is assigned - The Bayesian Framework
-
Week 8
Mining with Small Data and Course Review
October 28
-
Topics
- Naïve Bayes Classification
- Decision Tree Classification
- Course Review and Midterm Preparation
- Suggested Reading
-
Submissions
- Assignment 5 is due
- Assignment 6 is assigned - Using ML Libraries
-
Week 9
Midterm Exam
November 4
-
Topics
- MapReduce Problems
- Principal Component Analysis
- Parameter Estimation
- Unsupervised Clustering
- Bayesian Framework
- Suggested Preparation
-
Week 10
No Instruction - Veterans Day
November 11
- Have a nice holiday!
-
Week 11
Foundations of Machine Learning
November 18
-
Topics
- Algorithmic Evaluation with Confusion Matrices and ROC Curves
- The Objective Function, Regularization, and Constraints
- In-Class Colabs: Logistic Regression with MNIST
-
Suggested Reading
- Evaluation Metrics, Chapter 8.5
- Logistic Regression ([1], [2])
-
Submissions
- Assignment 6 is due
- Assignment 7 is assigned - Evaluation Metrics
-
Week 12
Mining Images with Deep Learning
November 25
-
Topics
- Working with Tensors - Reviewing Multivariate Calculus
- Deep Learning - A Historical Perspective
- Suggested Reading
-
Submissions
- Assignment 7 is due
- Assignment 8 is assigned - Neural Networks
-
Week 13
Mining Text with Self Supervision
December 2
-
Topics
- Some Basic Approaches
- Semi-Supervised Learning
- The Attention Mechanism
- Large Language Models - From BERT to ChatGPT
- Suggested Reading
- Submissions
-
Week 14
Final Project Presentations
December 9
-
Topics
- Objective Functions
- Logistic Regression
- Association Rule Mining
- Evaluation Metrics
- Backpropagation
- Convolutions and Recurrence
grading criteria
Labs & Participation | 20% |
Data Mining Project | 20% |
Assignments | 20% |
Midterm Exam | 40% |
Final Exam | ??? |
course meeting times
-
Lectures
- Mon, 4:30pm-7:50pm
- Room TBD
-
Office Hours
- Professor, Tues, 8:30-9:30pm
- TA, Date/Time TBD
suggested textbooks
- Introduction to Data Mining, 2nd Edition. Pang-Ning Tan, Michael Steinbach, Anuj Karpatne, Vipin Kumar, 2018
- Mining of Massive Datasets, 3rd Edition. Jure Leskovec, Anand Rajaraman, and Jeff Ullman, 2014
- Deep Learning. Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016