Syllabus

Logistics

Course Description

Offers intermediate to advanced Python programming for data science. Covers object-oriented design patterns using Python, including encapsulation, composition, and inheritance. Advanced programming skills cover software architecture, recursion, profiling, unit testing and debugging, lineage and data provenance, using advanced integrated development environments, and software control systems. Uses case studies to survey key concepts in data science with an emphasis on machine-learning (classification, clustering, deep learning); data visualization; and natural language processing. Additional assigned readings survey topics in ethics, model bias, and data privacy pertinent to today’s big data world. Offers students an opportunity to prepare for more advanced courses in data science and to enable practical contributions to software development and data science projects in a commercial setting.

Professor (email)

Matt Higger (mhigger@ccs.neu.edu)

Meeting Times

Lesson (DS2500)

  • Sec 2: Tue/Fri 9:50 am - 11:30 am, Churchill 103 (CRN 34989)

  • Sec 3: Tue/Fri 1:35 pm - 3:15 pm, Behrakis 010 (CRN 36017)

  • Sec 4: Tue/Fri *3:40 pm - 5:20 pm, Churchill 101 (CRN 38790)

* This is section 4’s unofficial start time; it gives me time to get to class after section 3 ends just beforehand. The official time is 15 minutes earlier.

Labs (DS2501)

  • Sec 1: Mon 8:00 am - 9:40 am, WVH 210B (CRN 34283), TAs: Parth, Sama

  • Sec 3: Mon 9:50 am - 11:30 am, WVH 210B (CRN 36018), TAs: Parth, Shalvi, Sama, Jo

  • Sec 4: Mon 9:50 am - 11:30 am, WVH 212 (CRN 36019), TAs: Nidutt, Akshi

  • Sec 5: Mon 11:45 am - 1:25 pm, WVH 210B (CRN 36020), TAs: Nidutt, Emily, Vaidehi, Shalvi, Mahek

  • Sec 6: Mon 11:45 am - 1:25 pm, WVH 212 (CRN 36021), TAs: Akhil, Akshi, Claudia, Unnat, Laura, Rohith

  • Sec 7: Mon 1:35 pm - 3:15 pm, WVH 210B (CRN 36022), TAs: Kim, Vishwa, Aveek, Unnat

  • Sec 8: Mon 1:35 pm - 3:15 pm, WVH 212 (CRN 36023), TAs: Akhil, Venkatesh, Jo, Rohith

  • Sec 9: Mon 3:25 pm - 5:05 pm, WVH 210B (CRN 36024), TAs: Kim, Emily, Vaidehi, Vishwa

  • Sec 10: Mon 3:25 pm - 5:05 pm, WVH 212 (CRN 38791), TAs: Venkatesh, Claudia, Laura, Aveek

(WVH is West Village H, the building with a big “Khoury College” sign on it.)

Lab Digest (optional attendance)

Wednesdays, 10:30 am - 11:30 am, on Zoom

The link is available on our Canvas site: just click the “Zoom” tab on the left.

Each lab week I’ll personally run a Zoom session where we’ll digest any lingering questions from that week’s lab. We’ll record the session so that students with a conflict can still participate and benefit. To participate, please post your lab-related questions under the corresponding labX folder on Piazza.

Learning

Real-life data science problems often begin by “poking around the data”: making some quick graphs to get a sense of how variables are distributed and related to each other (and ensuring that the data does indeed represent the quantities it claims to!). Being able to quickly collect and manipulate data gives you the experience you need to find the statistical insight you’re after. To this end, we’ll do “In Class Assignments” each day to reinforce your Python fluency.

The experience of doing data science on a real question is quite different from working in the safe harbor of homework problems written to give a clean opportunity to focus on a particular skill. Know that real data science is often “messy” in that:

  • we rarely have just the right data (in terms of quality or quantity) needed to answer a question with certainty

  • we’ll need to make assumptions about our data to apply an algorithm or claim relevance to some question of interest

  • there isn’t one “right” answer, though some approaches might be better than others

This ambiguous space between data and some question of interest is, in my opinion, where a Data Science solution can really shine (or stink …). It rewards a particular marriage of criticism, creativity and pragmatism.

In Class Assignments

To build this Python fluency, we’ll practice our skills with brief “In Class Assignments” (ICAs).

  • I encourage everyone to work in small groups on ICAs

  • We’ll review a solution together after you’ve had a chance to try yourself

  • Your ICA work will be graded based on effort (not correctness)

  • ICAs are due at 11:59 PM the day of the lesson

  • No late ICAs (even with late passes) will be accepted

  • We’ll drop everyone’s lowest ICA score at the end of the semester to cover the little logistical speed bumps that come up for all of us during the semester (e.g., forgetting to submit, an unexcused absence from class, technical issues, etc.)

    • Given the above flexibility, please don’t email asking to submit late ICAs
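One way to picture the drop-lowest rule is the minimal Python sketch below. It assumes ICA scores are a plain list of numbers on a common scale; `ica_average` is a hypothetical helper for illustration, not the actual Canvas or Gradescope calculation.

```python
def ica_average(scores: list[float]) -> float:
    """Average ICA scores after dropping the single lowest one (hypothetical helper)."""
    if len(scores) < 2:
        raise ValueError("need at least two ICA scores to drop one")
    kept = sorted(scores)[1:]  # drop exactly one copy of the lowest score
    return sum(kept) / len(kept)


# One missed ICA (scored 0) no longer pulls the average down
print(ica_average([1, 1, 0, 1, 1]))
```

Only one score is dropped, so a second missed ICA would still count against the average.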

Tip

Submit at the end of class to be sure you don’t forget. (Remember, ICAs are graded on effort, so you needn’t spend time polishing your ICA submission after class.)

Note

Be sure to bring your laptop to each class to complete the ICAs.

Weekly Lab (DS 2501)

While class introduces material, labs are designed to give you a chance to push your skills towards mastery. Towards this end, expect labs to be a challenge! You’ll work in a small group with TA support on exciting “real” data & programming problems.

I do not intend for students to spend much time after lab completing the assignment. To support this, I’ll run a Lab Digest session each week, and labs will be graded using the following rubric:

Lab Rubric

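  • 4 points. Parts attempted: all; parts working: most; documentation: clear, complete, simple

  • 3 points. Parts attempted: all; parts working: some; documentation: complete

  • 2 points. Parts attempted: most; parts working: any; documentation: present

  • 1 point. Parts attempted: some; parts working: none; documentation: none

  • 0 points. Parts attempted: none; parts working: none; documentation: none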

Labs are held on Mondays, and each lab is due at 11:59 PM on the Wednesday of the same week.

Class Attendance

Showing up to class is critical to learning efficiently, in large part because attendance allows us to build real community with each other. Making friends is wonderful in its own right, but I’d also highlight that you can teach each other in ways I can’t. Please do make every effort to show up so you can nurture your own DS expertise as well as that of your friends-in-waiting sitting nearby in class.

Of course, life circumstances might prevent us from attending a lesson in person. In this case, you’re welcome to attend via Zoom. You can find the Zoom link in the Zoom tab on the left of our Canvas site (be sure to register with your @northeastern.edu address).

Online resources:

  • Piazza - class discussion board

  • Gradescope - submission system

  • Canvas - used to sign up for Piazza and Gradescope & share HW solutions

  • Course site - all other course admin (i.e., syllabus, project information, schedule, class notes)

Note

Sign up for Piazza and Gradescope by accessing the sidebar links on our Canvas site. After registering you can use the quick links on our Course site or access the sites directly.

Textbooks

Class notes and official documentation (e.g., NumPy’s) will be sufficient to complete all assignments and prepare for assessments. Some students may prefer to study from a textbook, so I’ll share two I know about here. Both are freely available online :)

Grading

The total course average is computed as the weighted average of the following categories:

  • In-Class Activity: 5%

  • Labs: 15%

  • Homework: 55%

  • Final Project: 25%
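The weighted average above can be sketched in a few lines of Python. This is an illustration under assumptions, not the official gradebook: each category score is taken to be a percentage on a 0-100 scale, and `WEIGHTS`, `course_average`, and the sample scores are hypothetical.

```python
# Category weights from the syllabus (they sum to 1.0)
WEIGHTS = {
    "In-Class Activity": 0.05,
    "Labs": 0.15,
    "Homework": 0.55,
    "Final Project": 0.25,
}


def course_average(category_scores: dict[str, float]) -> float:
    """Weighted average of per-category percentages, each on a 0-100 scale."""
    return sum(weight * category_scores[name] for name, weight in WEIGHTS.items())


# Hypothetical student
scores = {"In-Class Activity": 100, "Labs": 90, "Homework": 85, "Final Project": 92}
print(course_average(scores))
```

For this made-up student the result is about 88.25 (5 + 13.5 + 46.75 + 23). Because homework carries 55% of the weight, it dominates the final average.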

Letter grades are assigned according to the highest thresholds met:

  • A: 93

  • A-: 90

  • B+: 87

  • B: 83

  • B-: 80

  • C+: 77

  • C: 73

  • C-: 70

  • D+: 67

  • D: 63

  • D-: 60

  • E: 0

To keep a transparent, consistent grading standard among all students:

  • grades will not be rounded before applying the above thresholds

  • no extra credit will be offered to individual students

  • I will not adjust anyone’s grade individually because they’ve asked (please don’t ask)
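A minimal sketch of how the letter-grade thresholds could be applied, assuming the course average is a number on a 0-100 scale (`THRESHOLDS` and `letter_grade` are hypothetical names). Because nothing is rounded before the comparison, an average of 92.99 earns an A-, not an A.

```python
# (minimum average, letter) pairs from the syllabus, highest threshold first
THRESHOLDS = [
    (93, "A"), (90, "A-"), (87, "B+"), (83, "B"), (80, "B-"),
    (77, "C+"), (73, "C"), (70, "C-"), (67, "D+"), (63, "D"),
    (60, "D-"), (0, "E"),
]


def letter_grade(average: float) -> str:
    """Return the letter for the highest threshold met; the average is NOT rounded."""
    for minimum, letter in THRESHOLDS:
        if average >= minimum:
            return letter
    raise ValueError("average must be non-negative")


print(letter_grade(92.99))  # A- (not rounded up to 93)
print(letter_grade(93.0))   # A
```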

Late HW

Late HW incurs a penalty of 15% of the total possible points for each day it is late, for up to two days. No HW will be accepted for credit more than 48 hours after the due date. (Extending past this 48-hour mark makes for tight TA grading deadlines and may force other students to start the next HW without feedback on the previous one.) Additionally, each student has 3 late day “passes”, which are automatically applied to cancel the earliest late-day penalties possible; you needn’t contact anyone to use them.

A single student’s late HW (example):

  • HW 1 is 2 days late: 2 late passes used

  • HW 2 is 2 days late: 1 late pass used, 15% penalty applied to this HW

  • HW 3 is 1 day late: 15% penalty applied to this HW

  • HW 4 is 2 days late: 30% penalty applied to this HW

  • HW 5 is 3 days late: no credit is given for this HW
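To make the bookkeeping concrete, here is a minimal Python sketch that reproduces the late HW example. It is an illustration, not the official grading script: `late_penalties` is a hypothetical name, and it assumes passes cancel whole late days and are not spent on HW submitted more than two days late (the policy doesn't say either way).

```python
from typing import List, Optional

PENALTY_PER_DAY = 0.15  # 15% of the total possible points per late day
MAX_DAYS_LATE = 2       # more than 48 hours late: no credit


def late_penalties(days_late: List[int], passes: int = 3) -> List[Optional[float]]:
    """Return the penalty fraction for each HW, listed in due-date order.

    None means the HW is not accepted for credit. Assumptions: passes cancel
    whole late days and are not spent on HW past the 48-hour cutoff.
    """
    penalties: List[Optional[float]] = []
    for days in days_late:
        if days > MAX_DAYS_LATE:
            penalties.append(None)  # beyond the 48-hour cutoff
            continue
        covered = min(days, passes)  # passes neutralize the earliest late days first
        passes -= covered
        penalties.append((days - covered) * PENALTY_PER_DAY)
    return penalties


# The example student: HW 1 through HW 5
print(late_penalties([2, 2, 1, 2, 3]))
```

For the example student this gives no penalty for HW 1, 15% for HW 2 and HW 3, 30% for HW 4, and None (no credit) for HW 5.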

Note

These late passes are intended to give all students flexibility when you just “forget”, need a travel day, have a computer problem, or have trouble submitting your work to Gradescope. We will not give further accommodation to students who contact us under these circumstances. However, please do get in touch if more significant challenges come up for you.

Academic Integrity and Conduct

Warning

Under no circumstances may one student view or share their ungraded homework with another student.

Sharing or viewing another student’s ungraded work will result in a failing course grade. That said, you are welcome to discuss concepts and ideas with other students so long as you don’t view any written code. For example, you could ask a classmate which pd.DataFrame method they used to solve a particular problem, but it would be an academic integrity violation to ask them to look at your code to help debug (use a TA during office hours instead!). See OSCCR for further details.

Note

I report every academic integrity violation to OSCCR, which tracks such violations across semesters to detect patterns.

Like every computer scientist, you’re encouraged to borrow code you find online (so long as it was not written for this course). Doing so requires that you credit the source:

  • a quick comment with the URL (e.g., of a Stack Overflow answer) will suffice

  • you need not cite any Python module you import (e.g., sklearn, pandas)
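For instance, an attribution comment might look like the sketch below. The function is only a stand-in for borrowed code, and the URL is a hypothetical placeholder, not a real source.

```python
from itertools import accumulate  # imported modules need no citation


# Adapted from: https://stackoverflow.com/a/<answer-id>
# (hypothetical placeholder: paste the URL of the answer you actually used)
def moving_average(values: list[float], window: int) -> list[float]:
    """Average of each run of `window` consecutive values, via prefix sums."""
    prefix = [0, *accumulate(values)]
    return [(prefix[i + window] - prefix[i]) / window
            for i in range(len(values) - window + 1)]


print(moving_average([1, 2, 3, 4], 2))
```

One comment line with the link is enough; the `itertools` import itself needs no citation.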

Disability Resource Center

The Disability Resource Center is available to assist students who have a legally documented disability or who suspect that they may have a disability. If you have a disabling condition that may interfere with your ability to successfully complete this course, please contact the Disability Resource Center.

Title IX

Title IX of the Education Amendments of 1972 protects individuals from sex or gender-based discrimination, including discrimination based on gender-identity, in educational programs and activities that receive federal financial assistance.

Northeastern’s Title IX Policy prohibits Prohibited Offenses, which are defined as sexual harassment, sexual assault, relationship or domestic violence, and stalking. The Title IX Policy applies to the entire community, including male, female, transgender students, faculty and staff.

If you or someone you know has been a survivor of a Prohibited Offense, confidential support and guidance can be found through University Health and Counseling Services staff and the Center for Spiritual Dialogue and Service clergy members. By law, those employees are not required to report allegations of sex or gender-based discrimination to the University.

Alleged violations can be reported non-confidentially to the Title IX Coordinator within The Office for Gender Equity and Compliance at titleix@northeastern.edu and/or through NUPD (Emergency 617.373.3333; Non-Emergency 617.373.2121). Reporting Prohibited Offenses to NUPD does NOT commit the victim/affected party to future legal action.

Faculty members are considered “responsible employees” at Northeastern University, meaning they are required to report all allegations of sex or gender-based discrimination to the Title IX Coordinator.