The article discusses how data science is transforming the publishing industry by helping publishers understand their readers, improve book discovery and recommendation, and optimize their marketing strategies. It gives examples of publishers using data science to analyze reader behavior and preferences, build personalized recommendations, and identify new marketing opportunities.
import pandas as pd

# Load the Goodreads titles dataset from a local CSV file
goodreads_titles_df = pd.read_csv('/Users/ahmedkadous/Desktop/Northeastern/Spring 2023/DS2500; Programming/Project/Goodreads-data.csv')

# Preview the first five rows
goodreads_titles_df.head()
| | Book_Name | Author | Average_star | Ratings | Reviews | 5_Star | 4_Star | 3_Star | 2_Star | 1_Star |
---|---|---|---|---|---|---|---|---|---|---|
0 | To Kill a Mockingbird | Harper Lee | 4.27 | 5,623,473 | 108,722 | 2,927,118 | 1,669,471 | 730,317 | 192,620 | 103,947 |
1 | 1984 | George Orwell | 4.19 | 4,134,439 | 98,891 | 1,956,290 | 1,345,678 | 588,373 | 158,757 | 85,341 |
2 | Fahrenheit 451 | Ray Bradbury | 3.97 | 2,181,792 | 64,728 | 788,776 | 777,014 | 438,256 | 123,939 | 53,807 |
3 | Animal Farm | George Orwell | 3.98 | 3,521,050 | 81,746 | 1,310,631 | 1,229,834 | 676,221 | 200,989 | 103,375 |
4 | The Hobbit | J.R.R. Tolkien | 4.28 | 3,612,605 | 62,476 | 1,930,001 | 1,047,617 | 439,072 | 118,631 | 77,284 |
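The counts in the preview are shown with thousands separators, which suggests pandas may have read them as strings rather than integers. The following is a minimal sketch of one way to convert them, assuming the values are comma-formatted strings. The names `preview_df` and `count_columns` are hypothetical, and the two rows are copied from the preview rather than loaded from the file.

```python
import pandas as pd

# Two rows copied from the preview, with counts kept as comma-formatted
# strings (an assumption about how the columns were parsed).
preview_df = pd.DataFrame({
    "Book_Name": ["To Kill a Mockingbird", "1984"],
    "Ratings": ["5,623,473", "4,134,439"],
    "Reviews": ["108,722", "98,891"],
})

# Hypothetical list of columns to convert; the star-count columns
# could be added in the same way.
count_columns = ["Ratings", "Reviews"]

for column in count_columns:
    # Remove the thousands separators, then cast to 64-bit integers
    preview_df[column] = (
        preview_df[column].str.replace(",", "", regex=False).astype("int64")
    )

print(preview_df.dtypes)
```

An alternative is to pass `thousands=','` to `pd.read_csv` so the separators are handled at load time; the sketch above instead cleans a frame that has already been loaded.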
The data dictionary for this dataset is as follows:
Column Name | Dictionary Definition |
---|---|
Book_Name | Title of the book |
Author | Author of the book |
Average_star | Average rating of the book |
Ratings | Total number of ratings the book has received |
Reviews | Total number of reviews the book has received |
5_Star | Number of 5-star ratings the book has received |
4_Star | Number of 4-star ratings the book has received |
3_Star | Number of 3-star ratings the book has received |
2_Star | Number of 2-star ratings the book has received |
1_Star | Number of 1-star ratings the book has received |
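The definitions above imply two relationships that can be checked: the five star-count columns should add up to `Ratings`, and `Average_star` should be close to the count-weighted mean of the star values. The sketch below tests both on the first two preview rows, assuming the counts have already been converted to integers. The names `books`, `star_columns`, `Counts_match`, and `Recomputed_average` are hypothetical and not part of the original notebook.

```python
import pandas as pd

# First two preview rows, with counts written as plain integers
books = pd.DataFrame({
    "Book_Name": ["To Kill a Mockingbird", "1984"],
    "Average_star": [4.27, 4.19],
    "Ratings": [5623473, 4134439],
    "5_Star": [2927118, 1956290],
    "4_Star": [1669471, 1345678],
    "3_Star": [730317, 588373],
    "2_Star": [192620, 158757],
    "1_Star": [103947, 85341],
})

# Map each star value to the column holding its count
star_columns = {5: "5_Star", 4: "4_Star", 3: "3_Star", 2: "2_Star", 1: "1_Star"}

# Check 1: the star counts should sum to the total number of ratings
star_total = books[list(star_columns.values())].sum(axis=1)
books["Counts_match"] = star_total == books["Ratings"]

# Check 2: the weighted mean of the star values should match Average_star
weighted_sum = sum(stars * books[col] for stars, col in star_columns.items())
books["Recomputed_average"] = (weighted_sum / star_total).round(2)

print(books[["Book_Name", "Counts_match", "Average_star", "Recomputed_average"]])
```

For the two rows used here, the star counts sum to `Ratings` and the recomputed averages round to the listed `Average_star` values. Running the same check over the full dataset would show whether that holds for every title.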