The primary book resources for this class are:

  • Introduction to Data Mining (2nd Ed.) - This book is the recommended text for most of the prior CS6220 courses that Northeastern has offered at the Boston campus, online, in Seattle, and in prior iterations in Silicon Valley. Only three chapters of this book are freely available, on the authors’ website, but they coincide with the topics in this course.
  • Mining of Massive Datasets: Leskovec et al. - This book covers various aspects of data mining and machine learning, including algorithms, techniques, and practical considerations for mining large datasets. It also includes case studies and real-world examples. The book is available for free download from the authors’ website.

Beyond these two books, you may find the following free materials useful, as they cover the same topics from a different perspective. It sometimes helps to have multiple sources that cover the same material. Given the rapid pace of the field, these resources are also updated more frequently and more consistently than the books.