Data Mining: Syllabus

Course Description

Data mining is a practical discipline that combines computer science, statistics, mathematics, and optimization techniques to analyze data and extract valuable knowledge from it. This course covers fundamental data mining concepts and provides hands-on experience with several methods. Students will develop a broad and deep background in data mining and the skills needed to solve practical data science problems. Problems will involve the analysis of real datasets from fields such as food science, astronomy, human resources, social sciences, and banking. Students are expected to have prior experience with the Python programming language.

Course Outcomes

By the end of this course, you will be able to identify the fundamental principles, techniques, and applications of data mining. You will apply computational and statistical methods to visualize, explore, and prepare data for subsequent analysis. You will also be able to frame real-life problems as supervised or unsupervised learning tasks, apply different classification, prediction, or clustering approaches to them, evaluate those approaches empirically, and justify your choice of the best one.

Grading

The grading scale will break down as follows:

A =	93–100%	C =	73–76%
A- =	90–92%	C- =	70–72%
B+ =	87–89%	D+ =	67–69%
B =	83–86%	D =	63–66%
B- =	80–82%	D- =	60–62%
C+ =	77–79%	F =	Below 60%

There will be one midterm exam, individual take-home assignments, and one final project (proposal, presentation, and report). The grading breakdown is as follows:

Assignment	Weight
Class Participation	5%
Midterm Exam	25%
Homework Assignments	30%
Final Project Presentation	10%
Final Project Paper	30%
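
As an illustration of how the weights and the grading scale combine, here is a minimal Python sketch. It assumes each component is scored as a percentage from 0 to 100 and that a fractional total falls into the band whose lower bound it meets; the syllabus does not state a rounding rule, so treat that part as an assumption. The function names and the example scores are hypothetical.

```python
# Weights from the grading breakdown table (they sum to 100%).
WEIGHTS = {
    "Class Participation": 0.05,
    "Midterm Exam": 0.25,
    "Homework Assignments": 0.30,
    "Final Project Presentation": 0.10,
    "Final Project Paper": 0.30,
}

# Lower bound of each letter band from the grading scale, highest first.
LETTER_FLOORS = [
    (93, "A"), (90, "A-"), (87, "B+"), (83, "B"), (80, "B-"),
    (77, "C+"), (73, "C"), (70, "C-"), (67, "D+"), (63, "D"), (60, "D-"),
]


def course_percentage(scores):
    """Weighted average of component scores, each a percentage (0-100)."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def letter_grade(percentage):
    """Map a percentage to a letter.

    Assumption: no rounding is applied, so 92.9 is an A-, not an A.
    """
    for floor, letter in LETTER_FLOORS:
        if percentage >= floor:
            return letter
    return "F"


# Hypothetical scores for one student, used only to show the arithmetic.
example_scores = {
    "Class Participation": 100,
    "Midterm Exam": 80,
    "Homework Assignments": 90,
    "Final Project Presentation": 85,
    "Final Project Paper": 88,
}

total = course_percentage(example_scores)
print(f"{total:.1f}% -> {letter_grade(total)}")
```

With these example scores the weighted total is 5 + 20 + 27 + 8.5 + 26.4 = 86.9%, which falls in the B band (83–86%) under the no-rounding assumption.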

Evaluation Activities

Take-home assignments will help students build skills and confidence in the topics each assignment covers. The final project is an open-ended capstone intended to cover a broader range of the course content: students implement a data mining solution for a real problem with real data. We expect a thorough analysis and creative solutions to the problem. The final project can be done individually or in teams; teams, if any, should be formed before midterms. Details will be provided when each course activity is announced.

Class Attendance and Participation

Learning in this class is built on in-class discussion and participation. Attendance is mandatory, which includes arriving at or connecting to class on time, and preparing the course material beforehand is highly recommended. Classes will combine theory and practice with hands-on activities.

Schedule and Materials

The schedule and materials below are approximate and subject to change.

Week 1: 1/11 Introduction to Data Mining	Lecture 1 Lecture 1 Recording Assignment 1 Assignment 1: Files Reading
Week 2: 1/18 Data Analysis & Summarization The MapReduce Paradigm	Lecture 2 Lecture 2 Recording Assignment 2 Reading: Through 2.5 In-Class Colab Solutions
Week 3: 1/25 Data Preprocessing and Engineering Large Dataset Joins w/Spark	Lecture 3 Lecture 3 Recording No Assignment This Week Reading: Remainder of Chapter 2 Dimensionality Reduction In-Class Colab 3-1 and 3-2
Week 4: 2/1 Parameter Estimation Maximum Likelihood Estimation	Lecture 4 Lecture 4 Recording Assignment 3 Reading: ML Estimation (Stanford) ML Estimation (TAMU) In-Class Colab Solutions 4-1
Week 5: 2/8 Association Rule Mining Apriori and FP-Growth	Lecture 5 Lecture 5 Recording No Assignment This Week Reading: Association Rule Mining
Week 6: 2/15 Unsupervised Machine Learning Clustering and k-Means Text Mining w/LDA & LSI	Lecture 6 Lecture 6 Recording Assignment 4 Reading: Unsupervised Clustering Latent Semantic Indexing (Section 3) Optional Reading Latent Dirichlet Allocation Applications of LSI and LDA to Twitter Data In-Class Colab Solutions 6-1
Week 7: 2/22 Supervised Machine Learning	Lecture 7 Lecture 7 Recording No Assignment This Week Reading Naïve Bayes Narwhal's Guide In-Class Colab Solutions
Week 8: 3/1 Logistic Regression: A Precursor to Deep Learning	Lecture 8 Lecture 8 Recording Assignment 5 Reading Introduction to Logistic Regression In-Class Colab Solutions
Week 9: 3/8 Spring Break	No Class
Week 10: 3/15 Deep Neural Network Learning	Lecture 9 Lecture 9 Recording Course Review and Midterm Preparation Assignment 6: Project Proposals
Week 11: 3/22 Midterm Exam
Week 12: 3/29 Practicalities of Machine Learning	Lecture 10 Lecture 10 Recording Project Proposals are Due
Week 13: 4/5 Special Topics -- Recommendation Sciences Mining Relational (Graphs) Infinite (Streams) Data	Lecture 11 Lecture 11 Recording Reading Recommendation Systems Graph Theory and Social Networks Time Series Analysis
Week 14: 4/12 Industry Day	Industry Topics Project Discussions
Week 15: 4/19 Project Presentations	Project Slides
Week 16: 4/26 Project Writeup Submissions Due
