COVID-19 IS STILL AN ISSUE

Source of dataset: https://www.kaggle.com/datasets/themrityunjaypathak/covid-cases-and-deaths-worldwide

Citations

“The Latest on the Coronavirus.” News, 23 Feb. 2023, https://www.hsph.harvard.edu/news/hsph-in-the-news/the-latest-on-the-coronavirus/#:~:text=Immunity%20to%20COVID%2D19%2C%20both,COVID%20still%20remains%20a%20threat.

Park, Alice. “Why the U.S. Needs Better Covid-19 Data.” Time, Time, 31 Jan. 2023, https://time.com/6249832/cdc-covid-19-data-tracker-hospitalizations/.

Motivation:

Although many people consider the COVID-19 pandemic to be over, it is still ongoing globally. Long COVID, in which COVID symptoms continue after the initial infection, also remains a concern. A Harvard article notes that healthy habits, such as exercising regularly or not smoking, can reduce the risk of long COVID by almost half ("The Latest on the Coronavirus"). The article explains that COVID-19 is still an ongoing threat, even though it is less severe and less impactful than in prior years due to vaccines and decreased infections ("The Latest on the Coronavirus"). As William Hanage says in the article, "Is it the case that there is no preventable suffering? No. There is still preventable suffering and death" ("The Latest on the Coronavirus"). This underscores that COVID-19 is still a very real problem in the world.

In "The U.S. Still Doesn't Have Good COVID-19 Data. Here's Why That's a Problem," Alice Park discusses how difficult it is to obtain reliable data: as COVID is considered less of a problem, "it's getting increasingly difficult to parse who is hospitalized or dies from COVID-19, and who is hospitalized or dies from another reason but with COVID-19" (Park). The article goes on to stress how important it is to understand COVID-19 data, even three years into the pandemic.

Given how important it is to continue analyzing and processing COVID-19 data, we want a dataset that answers our real-world questions: Which country has the most total COVID-19 cases? How many deaths have there been due to COVID-19? How do a nation's total cases compare to its population? What are the current active COVID-19 cases, and how do they compare with the total cases over the past three years?

DATASET

In [25]:
import pandas as pd

# load the dataset 
df_covid = pd.read_csv('covid_worldwide.csv')
df_covid.head()
Out[25]:
   Serial Number  Country  Total Cases  Total Deaths  Total Recovered  Active Cases     Total Test     Population
0              1      USA  104,196,861     1,132,935      101,322,779     1,741,147  1,159,832,679    334,805,269
1              2    India   44,682,784       530,740       44,150,289         1,755    915,265,788  1,406,631,776
2              3   France   39,524,311       164,233       39,264,546        95,532    271,490,188     65,584,518
3              4  Germany   37,779,833       165,711       37,398,100       216,022    122,332,384     83,883,596
4              5   Brazil   36,824,580       697,074       35,919,372       208,134     63,776,166    215,353,593
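The output shows the numeric columns with comma thousands separators, which suggests pandas read them as strings rather than integers. A minimal sketch, assuming the CSV stores these figures as quoted strings such as "104,196,861": passing `thousands=','` to `pd.read_csv` parses them as integers. The in-memory sample below reuses two rows from the output above in place of the real file.

```python
import io

import pandas as pd

# Two rows copied from the output above, written the way they are assumed
# to appear in covid_worldwide.csv (quoted, with comma thousands separators).
SAMPLE_CSV = '''Serial Number,Country,Total Cases,Total Deaths,Total Recovered,Active Cases,Total Test,Population
1,USA,"104,196,861","1,132,935","101,322,779","1,741,147","1,159,832,679","334,805,269"
2,India,"44,682,784","530,740","44,150,289","1,755","915,265,788","1,406,631,776"
'''

# Without thousands=',', the quoted numbers stay as text, not numbers.
raw = pd.read_csv(io.StringIO(SAMPLE_CSV))

# With thousands=',', pandas strips the separators and parses integers.
df_covid = pd.read_csv(io.StringIO(SAMPLE_CSV), thousands=',')

print(raw['Total Cases'].dtype)        # a text dtype
print(df_covid['Total Cases'].dtype)   # an integer dtype
print(df_covid.loc[0, 'Total Cases'])  # 104196861
```

For the real file this would be `pd.read_csv('covid_worldwide.csv', thousands=',')`. If a column still comes back non-numeric (for example because of placeholder text in some rows), `pd.to_numeric(..., errors='coerce')` converts what it can and turns the rest into NaN.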

Data Dictionary Information:

  • Column Info:

  • Serial Number (int): serial number of the country

  • Country (str): country name

  • Total Cases (int): total COVID-19 cases in the country

  • Total Deaths (int): total deaths due to COVID-19 in the country

  • Total Recovered (int): total recoveries from COVID-19 in the country

  • Active Cases (int): active COVID-19 cases in the country

  • Total Test (int): total COVID-19 tests conducted in the country

  • Population (int): total population of the country
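The dictionary does not say how Active Cases is derived. For the five rows in the head() output, it equals Total Cases minus Total Deaths minus Total Recovered exactly (for the USA: 104,196,861 - 1,132,935 - 101,322,779 = 1,741,147). Whether that holds for every country is an assumption worth checking; a minimal sketch of the check, using those five rows as already-parsed integers:

```python
import pandas as pd

# The five rows from the df_covid.head() output, already parsed as integers.
df = pd.DataFrame({
    'Country': ['USA', 'India', 'France', 'Germany', 'Brazil'],
    'Total Cases': [104196861, 44682784, 39524311, 37779833, 36824580],
    'Total Deaths': [1132935, 530740, 164233, 165711, 697074],
    'Total Recovered': [101322779, 44150289, 39264546, 37398100, 35919372],
    'Active Cases': [1741147, 1755, 95532, 216022, 208134],
})

# Active cases implied by the other three columns.
implied_active = df['Total Cases'] - df['Total Deaths'] - df['Total Recovered']

# Countries whose reported Active Cases disagree with the implied value.
mismatches = df.loc[implied_active != df['Active Cases'], 'Country']
print(mismatches.tolist())  # []
```

On the full dataset, a missing value (NaN) in any of these columns makes the comparison unequal, so such rows would also show up in `mismatches`.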

How the Dataset solves our problem:

This dataset can answer the questions stated above. Starting from the Country column, we use the information in the other columns to compare each country's COVID-19 situation. The specific columns we will use are Country, Total Cases, Total Deaths, Active Cases, and Population.
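Selecting the columns named above can be sketched as follows. The small frame stands in for `df_covid` after numeric parsing, and indexing by Country is an illustrative choice rather than something the dataset requires; `df_focus` is a hypothetical name.

```python
import pandas as pd

# Stand-in for df_covid: the first three rows of the output, parsed as integers.
df_covid = pd.DataFrame({
    'Serial Number': [1, 2, 3],
    'Country': ['USA', 'India', 'France'],
    'Total Cases': [104196861, 44682784, 39524311],
    'Total Deaths': [1132935, 530740, 164233],
    'Total Recovered': [101322779, 44150289, 39264546],
    'Active Cases': [1741147, 1755, 95532],
    'Total Test': [1159832679, 915265788, 271490188],
    'Population': [334805269, 1406631776, 65584518],
})

# Keep only the columns used to answer the questions, keyed by country.
cols = ['Country', 'Total Cases', 'Total Deaths', 'Active Cases', 'Population']
df_focus = df_covid[cols].set_index('Country')

print(df_focus.columns.tolist())
# ['Total Cases', 'Total Deaths', 'Active Cases', 'Population']
print(df_focus.loc['India', 'Population'])  # 1406631776
```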

To reiterate, here are the questions:

  • Which country has the most total COVID-19 cases?

  • How many deaths have there been due to COVID-19?

  • How do a nation's total cases compare to its population?

  • What are the current active COVID-19 cases, and how do they compare with the total cases over the past three years?
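Each question maps to a short pandas expression. A minimal sketch, using the five rows from the head() output as a stand-in for the full parsed dataset; the variable names are hypothetical. On the full dataset the death total would sum every row, which assumes the file has no aggregate row (such as a world total) that would double count.

```python
import pandas as pd

# Stand-in for the parsed dataset: the five rows shown in the head() output.
df = pd.DataFrame({
    'Country': ['USA', 'India', 'France', 'Germany', 'Brazil'],
    'Total Cases': [104196861, 44682784, 39524311, 37779833, 36824580],
    'Total Deaths': [1132935, 530740, 164233, 165711, 697074],
    'Active Cases': [1741147, 1755, 95532, 216022, 208134],
    'Population': [334805269, 1406631776, 65584518, 83883596, 215353593],
}).set_index('Country')

# Q1: country with the most total cases.
most_cases = df['Total Cases'].idxmax()

# Q2: total deaths across the countries in the frame.
total_deaths = df['Total Deaths'].sum()

# Q3: total cases as a percentage of each country's population.
cases_pct_pop = df['Total Cases'] / df['Population'] * 100

# Q4: active cases as a percentage of total cases.
active_pct_total = df['Active Cases'] / df['Total Cases'] * 100

print(most_cases)    # USA
print(total_deaths)  # 2690693
print(cases_pct_pop.round(1))
print(active_pct_total.round(2))
```

Note that the highest case count (USA) is not the highest share of population: among these five rows, France's total cases are about 60% of its population.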

We will use several charts to help depict the large numbers in the dataset. A bar chart will show the countries with the most COVID-19 cases. A second bar chart will show each country's total cases as a percentage of its population. A bar or pie chart will depict the total deaths in each country. Finally, a histogram will show the difference between active cases and total COVID-19 cases.
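The data behind each planned chart can be prepared as sorted Series. A minimal sketch on the five rows from the head() output; `TOP_N` is a hypothetical parameter, and using active cases as a percentage of total cases as the histogram input is one possible reading of the plan, not a fixed design.

```python
import pandas as pd

# Stand-in for the parsed dataset: the five rows shown in the head() output.
df = pd.DataFrame({
    'Country': ['USA', 'India', 'France', 'Germany', 'Brazil'],
    'Total Cases': [104196861, 44682784, 39524311, 37779833, 36824580],
    'Total Deaths': [1132935, 530740, 164233, 165711, 697074],
    'Active Cases': [1741147, 1755, 95532, 216022, 208134],
    'Population': [334805269, 1406631776, 65584518, 83883596, 215353593],
}).set_index('Country')

TOP_N = 3  # hypothetical: how many countries to show in the first bar chart

# Bar chart: countries with the most total cases.
top_cases = df['Total Cases'].nlargest(TOP_N)

# Bar chart: total cases as a percentage of population, highest first.
pct_of_pop = (df['Total Cases'] / df['Population'] * 100).sort_values(ascending=False)

# Bar or pie chart: total deaths per country, highest first.
deaths = df['Total Deaths'].sort_values(ascending=False)

# Histogram input: active cases as a percentage of total cases per country.
active_share = df['Active Cases'] / df['Total Cases'] * 100

print(top_cases.index.tolist())  # ['USA', 'India', 'France']
print(pct_of_pop.index[0])       # France
print(deaths.index.tolist())     # ['USA', 'Brazil', 'India', 'Germany', 'France']
```

With matplotlib installed, each Series can be drawn through pandas' plotting accessor, for example `top_cases.plot.bar()`, `deaths.plot.pie()`, or `active_share.plot.hist()`.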