CS6220
Fall 2017
Section 1
Data Mining Techniques

Meeting

Time Thursdays 6-9pm
Location Knowles Center 010

Instruction Team

Nate Derbinsky

E-mail n.derbinsky@northeastern.edu
Web https://derbinsky.info
Phone 617-373-7382
Office Hours WVH 208, Tuesdays/Thursdays 5-5:45pm

Yuyu Xu (Teaching Assistant)

E-mail xu.yuyu@husky.neu.edu
Office Hours WVH 208, Wednesdays 6-8pm

Weiqi Weng (Teaching Assistant)

E-mail weng.wei@husky.neu.edu
Office Hours Nightingale 132, Mondays 6-8pm (starting 9/18)

Course Goals

This course introduces a range of techniques in data mining and unsupervised machine learning...

Lectures will focus on developing a mathematical and algorithmic understanding of the methods commonly employed to solve unsupervised machine learning and data mining problems.
Homework will ask students to implement algorithms and/or work out examples.
Team Project will allow students to collaboratively complete a data mining task from start to finish, including pre-processing of data, analysis, and visualization of results.

Evaluation

The final grade for this course will be weighted as follows...

Final grades will be assigned based on the following scale...

A 92 - 100
A- 90 - <92
B+ 87 - <90
B 82 - <87
B- 80 - <82
C+ 77 - <80
C 72 - <77
C- 70 - <72
F <70
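
As an informal illustration, the scale above can be read as a lookup on the lower bound of each range. The sketch below is not an official grading tool: the function name `letter_grade` and the table `GRADE_SCALE` are hypothetical, and it assumes the final percentage is compared as-is, with no rounding.

```python
# Lower bound (inclusive) of each letter grade, taken from the scale above.
# Assumption: scores are compared without rounding.
GRADE_SCALE = [
    (92, "A"),
    (90, "A-"),
    (87, "B+"),
    (82, "B"),
    (80, "B-"),
    (77, "C+"),
    (72, "C"),
    (70, "C-"),
]


def letter_grade(score):
    """Return the letter grade for a final percentage between 0 and 100."""
    for lower_bound, letter in GRADE_SCALE:
        if score >= lower_bound:
            return letter
    return "F"  # anything below 70


print(letter_grade(91.9))  # falls in the "90 - <92" range
```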

Make-up Policy

All assignments have a specific due date and time. Submissions will be accepted up to one day after the deadline with a 50% penalty. The assignment will be graded and returned as normal, but the grade will be recorded as half of what was earned. For example, an on-time submission might receive a grade of 90 points. The same assignment submitted after the deadline would receive 45 points (90 x 0.5).
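
The late-penalty arithmetic above can be sketched in a few lines. This is only an illustration: the names `recorded_grade`, `LATE_WINDOW`, and `LATE_FACTOR` are hypothetical, and recording 0 for a submission more than one day late is an assumption (the policy only says submissions are accepted up to one day after the deadline).

```python
from datetime import datetime, timedelta

LATE_WINDOW = timedelta(days=1)  # submissions accepted up to one day late
LATE_FACTOR = 0.5                # 50% penalty for late submissions


def recorded_grade(earned, deadline, submitted):
    """Return the grade that is recorded for a submission."""
    if submitted <= deadline:
        return float(earned)
    if submitted <= deadline + LATE_WINDOW:
        return earned * LATE_FACTOR
    # Assumption: work submitted after the late window is not accepted.
    return 0.0


deadline = datetime(2017, 9, 27, 17, 0)  # an example Wednesday 5pm deadline
on_time = recorded_grade(90, deadline, deadline - timedelta(hours=2))
late = recorded_grade(90, deadline, deadline + timedelta(hours=3))
print(on_time, late)  # 90.0 45.0
```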

Students who miss scheduled quizzes will not, as a matter of course, be able to make up those quizzes. If there is a legitimate reason why a student will not be able to complete an assignment on time or not be present for a quiz, then they should contact the instructor beforehand. Under extreme circumstances, as decided on a case-by-case basis by the instructor, students may be allowed to make up assignments or quizzes without first informing the instructor.

Homework

Submissions will be made via Blackboard as a single ZIP file. Any written work (e.g. math problems, reports) is to be included as a PDF (preferably in LaTeX). Code is expected to be professional and properly documented; any required data files/libraries must be included.

This class has very strict standards for borrowing code: if you borrow anything for use in your homework/project, you must have a citation. A good guideline is that if you take more than three lines of code from some source, you must include the information on where it came from. A URL or a notation (e.g., "MATLAB help files") is fine. If it is an entire function, note it at the beginning of the code segment and include any original credit information. Provide a qualitative description of what you used, and what you changed/contributed. If you have a question about what is considered a violation of this policy, ASK!

The university's academic integrity policy discusses actions regarded as violations and consequences for students: http://www.northeastern.edu/osccr/academic-integrity

Project

The goal of the project is to gain hands-on experience with a real-life dataset. See the specification for details.

Peer Evaluation

Group projects are sometimes looked upon as being "unfair." To combat contribution inequity, each team member's perception of the quantity of work that s/he performed, and of the work that each teammate performed, will be compared against the perceptions of the other team member(s). Through this process, hopefully equity will be achieved.

Each team member will submit a report rating the relative contributions of each team member (including her/himself) using a single number, as well as optional commentary. The aggregate rating for each student will determine the grade that individual receives, relative to the group grade. For this process to work effectively, each group member needs to be honest and objective; these ratings and comments will be kept confidential.

Schedule

Note: This schedule is subject to change and will be adjusted as needed throughout the semester.

Day Topics Reading Due (default=W@5pm, late=R@5pm)
Sep 7 Course/Syllabus, Background/What is Data Mining? A: 1-2.3
LRU: 1-2
Sep 14 Self-Test Debrief, Association Rules
  • Introduce HW1
A: 4.1-4.4, 5.2.1-5.2.2
LRU: 6
TSK: 6
WK: 4
Self-Test (not graded for correctness)
Sep 21 Frequent Item Sets, Pattern Summarization A: 4.1-4.4, 5.2.1-5.2.2
LRU: 6
TSK: 6
WK: 4
Sep 28 HW1 debrief, Quiz 1, Quiz 1 debrief HW1
Oct 5 Clustering Analysis
  • Introduce HW2, part 1 + Web Basics
  • Project worktime
A: 6
LRU: 7.1-7.3
TSK: 8
WK: 2
Oct 12 Clustering Analysis cont'd
  • Introduce HW2, part 2
A: 6
LRU: 7.1-7.3
TSK: 8
WK: 2
PRJ::Proposal
Oct 19 Gaussian Mixture Models, Evaluation
  • Introduce HW2, part 3
A: 6.5
WK: 5
Oct 26 HW2 debrief, Quiz 2 review
  • Project worktime
HW2
Nov 2 Quiz 2, Quiz 2 debrief
  • Project worktime
PRJ::Update1
Nov 9 Dimensionality Reduction
  • Introduce HW3
A: 2.4-2.4.3
LRU: 11-11.3
Nov 16 Recommender Systems
  • Project worktime
A: 18.5
LRU: 9
Thanksgiving Recess PRJ::Update2
Nov 30 HW3 debrief, Link Analysis
  • Project worktime
A: 18.4
LRU: 5
WK: 6
HW3
Dec 7 Social-Network Analysis
  • Project worktime
A: 19
LRU: 10.1-10.3
Dec 14 Quiz 3, Quiz 3 debrief PRJ::Packet, PRJ::PeerEval

Resources

Students are expected to read the materials in preparation for each lecture.

Classroom Environment

To create and preserve a classroom atmosphere that optimizes teaching and learning, all participants share a responsibility in creating a civil and non-disruptive forum for the discussion of ideas. Students are expected to conduct themselves at all times in a manner that does not disrupt teaching or learning. Your comments to others should be constructive and free from harassing statements. You are encouraged to disagree with other students and the instructor, but such disagreements need to be respectful and based upon facts and documentation (rather than prejudices and personalities). The instructor reserves the right to interrupt conversations that deviate from these expectations. Repeated unprofessional or disrespectful conduct may result in a lower grade or more severe consequences. Part of the learning process in this course is respectful engagement of ideas with others.

Title IX

Title IX of the Education Amendments of 1972 protects individuals from sex or gender-based discrimination, including discrimination based on gender-identity, in educational programs and activities that receive federal financial assistance.

Northeastern’s Title IX Policy prohibits Prohibited Offenses, which are defined as sexual harassment, sexual assault, relationship or domestic violence, and stalking. The Title IX Policy applies to the entire community, including male, female, transgender students, faculty and staff.

If you or someone you know has been a survivor of a Prohibited Offense, confidential support and guidance can be found through University Health and Counseling Services staff (http://www.northeastern.edu/uhcs/) and the Center for Spiritual Dialogue and Service clergy members (http://www.northeastern.edu/spirituallife/). By law, those employees are not required to report allegations of sex or gender-based discrimination to the University.

Alleged violations can be reported non-confidentially to the Title IX Coordinator within The Office for Gender Equity and Compliance at: titleix@northeastern.edu and/or through NUPD (Emergency 617.373.3333; Non-Emergency 617.373.2121). Reporting Prohibited Offenses to NUPD does NOT commit the victim/affected party to future legal action.

Faculty members are considered "responsible employees" at Northeastern University, meaning they are required to report all allegations of sex or gender-based discrimination to the Title IX Coordinator.

In case of an emergency, please call 911.

Please visit http://www.northeastern.edu/titleix for a complete list of reporting options and resources both on- and off-campus.

Students with Disabilities

Students with disabilities who wish to receive academic services and/or accommodations should visit the Disability Resource Center at 20 Dodge Hall or call (617) 373-2675. If you have already done so, please provide your letter from the DRC to me early in the semester so that I can arrange those accommodations.