The video game industry is a fast-growing and dynamic industry, with billions of dollars in revenue each year. As the industry continues to evolve and expand, it is important to understand the factors that contribute to a game's success or failure. In this project, we aim to analyze video game sales and review data to identify trends, predict sales figures, evaluate reviews, and compare games.
import pandas as pd
# Import the CSV file as a pandas DataFrame
df = pd.read_csv("game_statistics.csv")
# Display the first five rows of the DataFrame
print(df.head())
title total_sales total_shipped \ 0 Professor Layton NaN 18.00m 1 Need for Speed: Most Wanted NaN 17.80m 2 Pokémon Diamond / Pearl Version NaN 17.67m 3 Elden Ring NaN 17.50m 4 Grand Theft Auto: Vice City NaN 17.50m publisher developer release_date platform \ 0 Nintendo Level-5 10th Feb 08 Series 1 Electronic Arts EA Canada 15th Nov 05 All 2 Nintendo Game Freak 28th Apr 07 DS 3 Bandai Namco Entertainment From Software 25th Feb 22 All 4 Rockstar Games Rockstar North 29th Oct 02 All japan_sales na_sales other_sales pal_sales pos user_score vgchartz_score \ 0 NaN NaN NaN NaN 201 NaN NaN 1 NaN NaN NaN NaN 202 NaN NaN 2 NaN NaN NaN NaN 203 NaN NaN 3 NaN NaN NaN NaN 204 NaN NaN 4 NaN NaN NaN NaN 205 NaN NaN critic_score last_update 0 NaN 04th Feb 20 1 NaN 20th Oct 20 2 8.6 NaN 3 NaN 28th Feb 22 4 NaN 14th Oct 20
/var/folders/1f/8fvyhtpn77qb8yf3jr3d8fg80000gn/T/ipykernel_17235/475495006.py:4: DtypeWarning: Columns (1,2,7,8,9,10) have mixed types. Specify dtype option on import or set low_memory=False. df = pd.read_csv("game_statistics.csv")
import csv
# Open the CSV file in read mode
with open('game_statistics.csv', mode='r') as file:
# Create a DictReader object
reader = csv.DictReader(file)
# Create an empty dictionary to store the headers
headers = {}
# Loop through the first row of the CSV file
for row in reader:
# Store the header and its value in the dictionary
for key, value in row.items():
headers[key] = value
# Exit the loop after the first row
break
# Print the dictionary of headers
print(headers)
{'title': 'Professor Layton', 'total_sales': 'N/A', 'total_shipped': '18.00m', 'publisher': 'Nintendo', 'developer': 'Level-5', 'release_date': '10th Feb 08', 'platform': 'Series', 'japan_sales': 'N/A', 'na_sales': 'N/A', 'other_sales': 'N/A', 'pal_sales': 'N/A', 'pos': '201', 'user_score': 'N/A', 'vgchartz_score': 'N/A', 'critic_score': 'N/A', 'last_update': '04th Feb 20'}
We will use publicly available data from sources such as VGChartz and Metacritic to gather information on video game sales figures and review scores. We will preprocess the data by removing duplicates, filling in missing values, and normalizing the features. We will then use data visualization techniques such as scatter plots and heatmaps to visualize the relationships between different variables, such as sales figures and review scores. We will also use Matplotlib to create interactive visualizations that allow us to explore the data more deeply.
To predict the sales figures of new games, we will use the K-nearest neighbor (KNN) algorithm. We will split the data into training and testing sets and use the KNN algorithm to predict the sales figures of the testing set based on the features of the training set. We will tune the hyperparameters of the KNN algorithm using cross-validation to achieve the best performance.
To analyze the performance of different games, we will use clustering techniques such as K-means clustering to group games based on their sales figures and review scores. We will visualize the clusters using Matplotlib to identify patterns and trends in the data. We will also compare the clusters to identify the factors that contribute to a game's success or failure.
We expect to identify trends in video game sales across different platforms and regions, as well as to predict sales figures for new games based on key factors using the KNN algorithm. We also expect to gain insights into the relationship between review scores and sales figures through data visualization. Finally, we expect to cluster games based on their sales figures and review scores to identify patterns and trends in the data using K-means clustering and Matplotlib.
This project will provide valuable insights into the video game industry and the factors that contribute to a game's success or failure. The results of this project could be useful for game developers, publishers, and investors who are looking to create or invest in new games. The use of data visualization, KNN, and Matplotlib will allow us to explore the data more deeply and gain new insights into the industry.