happiness¶

Where The Data Comes From:¶

The data comes from Kaggle and contains results from the World Happiness Report 2021 for 149 countries.

The Real-World Problem:¶

Over the past few years, especially since the COVID-19 pandemic, healthcare economics, health equity, and their association with factors such as life expectancy have drawn more attention than ever. According to The World Health Organization, "health equity is achieved when everyone can attain their full potential health and well-being" -- this expands our understanding of what it means to be truly healthy in our world today. It's more than one's eating habits, one's ability to exercise, or even one's access to healthcare in their country.

The World Happiness Report publishes countries' happiness scores and ranks them from highest to lowest. By looking further into the dataset, we can examine what allows a country to score high on the happiness scale and what leads to such a high quality of life -- and how this ties into the overall goal of achieving the highest level of health. Additionally, looking at GDP and the different political systems in these countries may let us draw conclusions about how other countries could improve their happiness scores and quality of life. Since there is now a decade's worth of this data, we can also compare how countries have improved and look at what has worked to improve the quality of life for their citizens. This data is accessible through The World Happiness Report's website.

Understanding how the Cantril Ladder score (the measure the report uses) is defined is essential for the analysis in this project:

  • 0: worst life possible
  • 10: best life possible

The survey asks individuals to rate their own lives on this scale. According to the organization, the sample size for each country is approximately 1,000 individuals -- though this may be higher for countries that have participated in the survey for several years.

In the more recent reports, COVID-19 has been a large focus, and the report highlights how it has impacted the world -- especially quality of life. Additionally, we can compare this data to prior years to showcase different approaches for handling inequalities, resilience, and the future.

This report can start numerous types of projects, as described above; however, all of them can be tied back to the idea of health equity. Using various reports from the organization, we'll be able to reflect on the past as well as make predictions for the future to see whether countries are on the right track towards reaching the highest level of health.

In [1]:
import pandas as pd

# load the data into a DataFrame
df = pd.read_csv('world-happiness-report-2021.csv')

# to display the first few rows of the dataframe
df.head()
Out[1]:
  | Country name | Regional indicator | Ladder score | Standard error of ladder score | upperwhisker | lowerwhisker | Logged GDP per capita | Social support | Healthy life expectancy | Freedom to make life choices | Generosity | Perceptions of corruption | Ladder score in Dystopia | Explained by: Log GDP per capita | Explained by: Social support | Explained by: Healthy life expectancy | Explained by: Freedom to make life choices | Explained by: Generosity | Explained by: Perceptions of corruption | Dystopia + residual
0 | Finland | Western Europe | 7.842 | 0.032 | 7.904 | 7.780 | 10.775 | 0.954 | 72.0 | 0.949 | -0.098 | 0.186 | 2.43 | 1.446 | 1.106 | 0.741 | 0.691 | 0.124 | 0.481 | 3.253
1 | Denmark | Western Europe | 7.620 | 0.035 | 7.687 | 7.552 | 10.933 | 0.954 | 72.7 | 0.946 | 0.030 | 0.179 | 2.43 | 1.502 | 1.108 | 0.763 | 0.686 | 0.208 | 0.485 | 2.868
2 | Switzerland | Western Europe | 7.571 | 0.036 | 7.643 | 7.500 | 11.117 | 0.942 | 74.4 | 0.919 | 0.025 | 0.292 | 2.43 | 1.566 | 1.079 | 0.816 | 0.653 | 0.204 | 0.413 | 2.839
3 | Iceland | Western Europe | 7.554 | 0.059 | 7.670 | 7.438 | 10.878 | 0.983 | 73.0 | 0.955 | 0.160 | 0.673 | 2.43 | 1.482 | 1.172 | 0.772 | 0.698 | 0.293 | 0.170 | 2.967
4 | Netherlands | Western Europe | 7.464 | 0.027 | 7.518 | 7.410 | 10.932 | 0.942 | 72.4 | 0.913 | 0.175 | 0.338 | 2.43 | 1.501 | 1.079 | 0.753 | 0.647 | 0.302 | 0.384 | 2.798
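
The upperwhisker and lowerwhisker columns in the output above look like confidence bounds around the ladder score. As a rough check, and assuming (this is not stated in the dataset itself) that they are a 95% normal interval, the sketch below rebuilds the bounds as the ladder score plus or minus 1.96 standard errors. It uses only the five rows shown in Out[1] so that it runs without the CSV; `head` and `Z_95` are names introduced here for illustration.

```python
import pandas as pd

# The five rows shown in Out[1], restricted to the columns needed here.
head = pd.DataFrame({
    "Country name": ["Finland", "Denmark", "Switzerland", "Iceland", "Netherlands"],
    "Ladder score": [7.842, 7.620, 7.571, 7.554, 7.464],
    "Standard error of ladder score": [0.032, 0.035, 0.036, 0.059, 0.027],
    "upperwhisker": [7.904, 7.687, 7.643, 7.670, 7.518],
    "lowerwhisker": [7.780, 7.552, 7.500, 7.438, 7.410],
})

Z_95 = 1.96  # assumption: multiplier for a 95% normal confidence interval

# Rebuild the bounds from the score and its standard error.
half_width = Z_95 * head["Standard error of ladder score"]
approx_upper = head["Ladder score"] + half_width
approx_lower = head["Ladder score"] - half_width

# Largest gap between the rebuilt bounds and the published whiskers.
max_gap = max(
    (approx_upper - head["upperwhisker"]).abs().max(),
    (approx_lower - head["lowerwhisker"]).abs().max(),
)
print(f"largest absolute gap: {max_gap:.4f}")
```

A gap of only a few thousandths would be consistent with rounding in the published standard errors; the same check can be run on the full `df` by replacing `head` with `df`.
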
In [2]:
# getting the col names
cols = df.columns

# create a dictionary of descriptions
descriptions = {
    "Country name": "name of country",
    "Regional indicator": "region where the country is located",
    "Ladder score": "measurement of happiness",
    "Standard error of ladder score": "reflecting the uncertainty surrounding the score",
    # keys must match the DataFrame's column names exactly (lowercase here)
    "upperwhisker": "upper bound of the confidence interval of the ladder score",
    "lowerwhisker": "lower bound of the confidence interval of the ladder score",
    "Logged GDP per capita": "log of country's gross domestic product (GDP) per capita",
    "Social support": "how much social support is available in a country",
    "Healthy life expectancy": "avg. number of years an individual is expected to live in good health",
    "Freedom to make life choices": "how free individuals are to make choices",
    "Generosity": "how willing citizens are to donate in a country",
    "Perceptions of corruption": "perceived levels of corruption in a country",
    "Ladder score in Dystopia": "ladder score of Dystopia, a hypothetical country with the lowest values of each factor",
    "Explained by: Log GDP per capita": "estimated contribution of log GDP per capita to the ladder score",
    "Explained by: Social support": "estimated contribution of social support to the ladder score",
    "Explained by: Healthy life expectancy": "estimated contribution of healthy life expectancy to the ladder score",
    "Explained by: Freedom to make life choices": "estimated contribution of freedom to make life choices to the ladder score",
    "Explained by: Generosity": "estimated contribution of generosity to the ladder score",
    "Explained by: Perceptions of corruption": "estimated contribution of perceptions of corruption to the ladder score",
    "Dystopia + residual": "Dystopia's ladder score plus the part of the score the six factors leave unexplained"
}

# print the column names and descriptions
for column in cols:
    if column in descriptions:
        print(column.upper() + ": " + descriptions[column])
        print()
COUNTRY NAME: name of country

REGIONAL INDICATOR: region where the country is located

LADDER SCORE: measurement of happiness

STANDARD ERROR OF LADDER SCORE: reflecting the uncertainty surrounding the score

UPPERWHISKER: upper bound of the confidence interval of the ladder score

LOWERWHISKER: lower bound of the confidence interval of the ladder score

LOGGED GDP PER CAPITA: log of country's gross domestic product (GDP) per capita

SOCIAL SUPPORT: how much social support is available in a country

HEALTHY LIFE EXPECTANCY: avg. number of years an individual is expected to live in good health

FREEDOM TO MAKE LIFE CHOICES: how free individuals are to make choices

GENEROSITY: how willing citizens are to donate in a country

PERCEPTIONS OF CORRUPTION: perceived levels of corruption in a country

LADDER SCORE IN DYSTOPIA: ladder score of Dystopia, a hypothetical country with the lowest values of each factor

EXPLAINED BY: LOG GDP PER CAPITA: estimated contribution of log GDP per capita to the ladder score

EXPLAINED BY: SOCIAL SUPPORT: estimated contribution of social support to the ladder score

EXPLAINED BY: HEALTHY LIFE EXPECTANCY: estimated contribution of healthy life expectancy to the ladder score

EXPLAINED BY: FREEDOM TO MAKE LIFE CHOICES: estimated contribution of freedom to make life choices to the ladder score

EXPLAINED BY: GENEROSITY: estimated contribution of generosity to the ladder score

EXPLAINED BY: PERCEPTIONS OF CORRUPTION: estimated contribution of perceptions of corruption to the ladder score

DYSTOPIA + RESIDUAL: Dystopia's ladder score plus the part of the score the six factors leave unexplained
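
One way to see how the "Explained by" columns and "Dystopia + residual" relate to the ladder score is to add them up. The sketch below does this for the five rows shown in Out[1] (copied in by hand so the example runs without the CSV) and compares the sum with the published ladder score. The names `head`, `parts`, `reconstructed`, and `gap` are introduced here for illustration only.

```python
import pandas as pd

# The five rows shown in Out[1], restricted to the columns needed here.
head = pd.DataFrame({
    "Country name": ["Finland", "Denmark", "Switzerland", "Iceland", "Netherlands"],
    "Ladder score": [7.842, 7.620, 7.571, 7.554, 7.464],
    "Explained by: Log GDP per capita": [1.446, 1.502, 1.566, 1.482, 1.501],
    "Explained by: Social support": [1.106, 1.108, 1.079, 1.172, 1.079],
    "Explained by: Healthy life expectancy": [0.741, 0.763, 0.816, 0.772, 0.753],
    "Explained by: Freedom to make life choices": [0.691, 0.686, 0.653, 0.698, 0.647],
    "Explained by: Generosity": [0.124, 0.208, 0.204, 0.293, 0.302],
    "Explained by: Perceptions of corruption": [0.481, 0.485, 0.413, 0.170, 0.384],
    "Dystopia + residual": [3.253, 2.868, 2.839, 2.967, 2.798],
})

# The six "Explained by" columns plus the Dystopia + residual term.
parts = [c for c in head.columns if c.startswith("Explained by: ")]
parts.append("Dystopia + residual")

# Sum the components row by row and compare with the published score.
reconstructed = head[parts].sum(axis=1)
gap = (reconstructed - head["Ladder score"]).abs()

print(pd.DataFrame({
    "Country name": head["Country name"],
    "reconstructed": reconstructed.round(3),
    "gap": gap.round(3),
}))
```

For these five rows the components add up to the ladder score to within rounding, which supports reading each "Explained by" column as a share of the score. Running the same lines on the full `df` would show whether that holds for every country.
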

Machine Learning Methods:¶

  • Clustering: group countries together based on happiness scores and other factors (for example, social support and/or life expectancy). We might find that countries with higher GDP tend to have higher scores in certain factors -- overall, this method would help identify trends and patterns.
  • Linear Regression: check whether there is a relationship between a country's happiness score and outside factors (GDP, life expectancy, freedom to make life choices, access to healthcare)
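
The clustering idea above could be sketched roughly as follows. This assumes scikit-learn is available (the notebook itself only imports pandas), uses k-means as one possible clustering method, and picks two features and two clusters arbitrarily. The helper `cluster_countries` is hypothetical, and the example runs on the five rows shown in Out[1] only so that it is self-contained. Five top-ranked countries cannot form meaningful clusters, so this shows the mechanics rather than a result.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# The five rows shown in Out[1], restricted to the columns needed here.
head = pd.DataFrame({
    "Country name": ["Finland", "Denmark", "Switzerland", "Iceland", "Netherlands"],
    "Logged GDP per capita": [10.775, 10.933, 11.117, 10.878, 10.932],
    "Healthy life expectancy": [72.0, 72.7, 74.4, 73.0, 72.4],
})

features = ["Logged GDP per capita", "Healthy life expectancy"]


def cluster_countries(data, features, n_clusters=2, seed=0):
    """Hypothetical helper: standardize the features, then run k-means."""
    # Standardize so that life expectancy (in years) does not dominate
    # logged GDP purely because of its larger numeric scale.
    scaled = StandardScaler().fit_transform(data[features])
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    return pd.Series(model.fit_predict(scaled), index=data.index, name="cluster")


labels = cluster_countries(head, features)
print(head.assign(cluster=labels)[["Country name", "cluster"]])
```

On the full dataset, the call would be `cluster_countries(df, features, n_clusters=k)` for some chosen `k`. The number of clusters and the feature list are judgment calls that would need to be justified in the analysis.
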

The methods above can help us identify factors that contribute to happiness and life satisfaction, as well as how to promote health equity and better health outcomes at the individual and community levels.
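
The linear regression idea can be sketched as ordinary least squares with an intercept. The helper `fit_ols` below is hypothetical and uses NumPy's least-squares solver rather than any particular modeling library. It is run on the five rows shown in Out[1] only so that the example is self-contained. Coefficients fitted on five top-ranked countries are not meaningful, so they should not be interpreted.

```python
import numpy as np
import pandas as pd

# The five rows shown in Out[1], restricted to the columns needed here.
head = pd.DataFrame({
    "Country name": ["Finland", "Denmark", "Switzerland", "Iceland", "Netherlands"],
    "Ladder score": [7.842, 7.620, 7.571, 7.554, 7.464],
    "Logged GDP per capita": [10.775, 10.933, 11.117, 10.878, 10.932],
})


def fit_ols(data, target, features):
    """Hypothetical helper: least-squares fit of target on features plus an intercept."""
    # Design matrix: a column of ones for the intercept, then the features.
    X = np.column_stack([np.ones(len(data)), data[features].to_numpy(dtype=float)])
    y = data[target].to_numpy(dtype=float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return pd.Series(coef, index=["intercept"] + list(features))


coefs = fit_ols(head, "Ladder score", ["Logged GDP per capita"])
print(coefs)
```

With the full dataset, the feature list could be extended, for example `fit_ols(df, "Ladder score", ["Logged GDP per capita", "Healthy life expectancy", "Freedom to make life choices"])`. Access to healthcare is not a column in this dataset, so it would have to come from another source.
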