Recommended books and supplemental reading materials for the semester.
The primary book resources for this class are:
- Introduction to Data Mining (2nd Ed.) - This book is the recommended text for most of the prior CS6220 courses that Northeastern has offered at the Boston campus, online, in Seattle, and in earlier iterations in Silicon Valley. Only three chapters of this book are free, available on the authors’ website, but they coincide with the topics in this course.
- Mining of Massive Datasets: Leskovec et al. - This book covers various aspects of data mining and machine learning, including algorithms, techniques, and practical considerations for mining large datasets. It also includes case studies and real-world examples. The book is available for free download from the authors’ website.
Beyond these two books, you may find the following free materials useful, as they cover the same topics from a different perspective. It sometimes helps to have multiple sources that cover the same material. Given the rapid pace of the field, these resources are also updated more quickly and more consistently.
- Data Mining: Practical Machine Learning Tools and Techniques (2nd Ed.) - This book by Ian H. Witten, Eibe Frank, and Mark A. Hall provides a practical approach to data mining and machine learning. It covers various techniques, algorithms, and tools for data mining, including decision trees, association rules, and clustering. The book is available for free download from the authors’ official website.
- Data Mining and Analysis: Fundamental Concepts and Algorithms - This book by Mohammed J. Zaki and Wagner Meira Jr. provides a comprehensive introduction to data mining concepts, algorithms, and methodologies. It covers various data mining techniques, including classification, clustering, and association analysis. The book is available for free download on the book’s official website.
- The Elements of Statistical Learning - This book by Trevor Hastie, Robert Tibshirani, and Jerome Friedman provides a comprehensive introduction to statistical learning methods and techniques, including data mining. It covers topics such as linear regression, classification, and clustering. The book is available for free download on the authors’ website.