The effects of generational poverty reach far beyond not having the money to pay for certain things. Many people underestimate how important it is to help those dealing with generational poverty because they don't understand how many different aspects of life it affects.
Around 16% of people living in Chicago live in poverty, compared with 12.3% of the general US population. A dataset containing health indicators and measures of wealth for Chicago's community areas can provide insight into a connection between the two. The goal of this project is to identify a relationship between health indicators (cancer rate, infant mortality rate, etc.) and economic indicators (percent of households below the poverty level, per capita income, etc.).
If successful, this work may yield a model that can predict the health of a community from its wealth, and vice versa. Such a predictor could help show people how detrimental poverty can be to every aspect of life, raise awareness of the link between the two, and act as a call for change.
One drawback of such a model is that it cannot account for all of the complexities of poverty and the factors that contribute to it.
We will use a Kaggle dataset of Public Health Indicators in Chicago to observe the following features for each community area:
import pandas as pd
df_chicago = pd.read_csv("public-health-statistics-selected-public-health-indicators-by-chicago-community-area-1.csv")
df_chicago.head()
| | index | Community Area | Community Area Name | Birth Rate | General Fertility Rate | Low Birth Weight | Prenatal Care Beginning in First Trimester | Preterm Births | Teen Birth Rate | Assault (Homicide) | ... | Childhood Lead Poisoning | Gonorrhea in Females | Gonorrhea in Males | Tuberculosis | Below Poverty Level | Crowded Housing | Dependency | No High School Diploma | Per Capita Income | Unemployment |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 1 | Rogers Park | 16.4 | 62.0 | 11.0 | 73.0 | 11.2 | 40.8 | 7.7 | ... | 0.5 | 322.5 | 423.3 | 11.4 | 22.7 | 7.9 | 28.8 | 18.1 | 23714 | 7.5 |
| 1 | 1 | 2 | West Ridge | 17.3 | 83.3 | 8.1 | 71.1 | 8.3 | 29.9 | 5.8 | ... | 1.0 | 141.0 | 205.7 | 8.9 | 15.1 | 7.0 | 38.3 | 19.6 | 21375 | 7.9 |
| 2 | 2 | 3 | Uptown | 13.1 | 50.5 | 8.3 | 77.7 | 10.3 | 35.1 | 5.4 | ... | 0.5 | 170.8 | 468.7 | 13.6 | 22.7 | 4.6 | 22.2 | 13.6 | 32355 | 7.7 |
| 3 | 3 | 4 | Lincoln Square | 17.1 | 61.0 | 8.1 | 80.5 | 9.7 | 38.4 | 5.0 | ... | 0.4 | 98.8 | 195.5 | 8.5 | 9.5 | 3.1 | 25.6 | 12.5 | 35503 | 6.8 |
| 4 | 4 | 5 | North Center | 22.4 | 76.2 | 9.1 | 80.4 | 9.8 | 8.4 | 1.0 | ... | 0.9 | 85.4 | 188.6 | 1.9 | 7.1 | 0.2 | 25.5 | 5.4 | 51615 | 4.5 |
5 rows × 30 columns
# Plain-language description (with units) of every column in df_chicago
data_dict = {
    "index": "row index carried over from the CSV file",
    "Community Area": "unique identifier for each community area (int)",
    "Community Area Name": "name of the community area (str)",
    "Birth Rate": "live births per 1,000 population",
    "General Fertility Rate": "live births per 1,000 women aged 15-44",
    "Low Birth Weight": "% of live births with a birth weight under 2,500 g",
    "Prenatal Care Beginning in First Trimester": "% of live births with prenatal care beginning in the first trimester",
    "Preterm Births": "% of live births before 37 completed weeks of gestation",
    "Teen Birth Rate": "live births to women aged 15-19 per 1,000 women aged 15-19",
    "Assault (Homicide)": "homicides per 100,000 people",
    "Breast Cancer in Females": "deaths from breast cancer per 100,000 females",
    "Cancer(all sites)": "deaths from cancer (all sites) per 100,000 people",
    "Colorectal Cancer": "deaths from colorectal cancer per 100,000 people",
    "Diabetes-related": "diabetes-related deaths per 100,000 people",
    "Firearm-related": "firearm-related deaths per 100,000 people",
    "Infant Mortality Rate": "infant deaths per 1,000 live births",
    "Lung Cancer": "deaths from lung cancer per 100,000 people",
    "Prostate Cancer in Males": "deaths from prostate cancer per 100,000 males",
    "Stroke (Cerebrovascular Disease)": "deaths from stroke per 100,000 people",
    "Childhood Blood Lead Level Screening": "children aged 0-6 screened for blood lead levels per 1,000 children aged 0-6",
    "Childhood Lead Poisoning": "children with lead poisoning per 100 children",
    "Gonorrhea in Females": "gonorrhea cases per 100,000 females aged 15-44",
    "Gonorrhea in Males": "gonorrhea cases per 100,000 males aged 15-44",
    "Tuberculosis": "tuberculosis cases per 100,000 people",
    "Below Poverty Level": "% of households below the poverty level",
    "Crowded Housing": "% of occupied housing units with more than one person per room",
    "Dependency": "% of people younger than 16 or older than 64",
    "No High School Diploma": "% of people aged 25 or older without a high school diploma",
    "Per Capita Income": "per capita income in 2011 inflation-adjusted dollars",
    "Unemployment": "% of people aged 16 or older in the labor force who are unemployed",
}
data_dict
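The data dictionary splits naturally into identifier columns, health indicators, and socioeconomic columns. As a first look at the relationship this project is after, here is a minimal sketch that computes the Pearson correlation of every health indicator with every economic indicator. It assumes the last six columns of the table (poverty, crowding, dependency, education, income, unemployment) are the economic indicators and that every other non-identifier column is a health indicator. That grouping and the helper names (`ID_COLS`, `ECONOMIC_COLS`, `numeric_indicators`, `health_wealth_correlations`) are ours, not part of the dataset. Values are coerced with `pd.to_numeric` as a precaution in case any column was read as text.

```python
import pandas as pd

# Assumed grouping (ours, not from the dataset): three identifier columns,
# six socioeconomic columns, and everything else treated as a health indicator.
ID_COLS = ["index", "Community Area", "Community Area Name"]
ECONOMIC_COLS = [
    "Below Poverty Level",
    "Crowded Housing",
    "Dependency",
    "No High School Diploma",
    "Per Capita Income",
    "Unemployment",
]


def numeric_indicators(df: pd.DataFrame) -> pd.DataFrame:
    """Return all non-identifier columns as numbers; unparseable entries become NaN."""
    cols = [c for c in df.columns if c not in ID_COLS]
    return df[cols].apply(pd.to_numeric, errors="coerce")


def health_wealth_correlations(df: pd.DataFrame) -> pd.DataFrame:
    """Pearson correlation of every health indicator with every economic indicator.

    Rows are health indicators and columns are economic indicators. Each
    pair is computed on the community areas where both values are present.
    """
    values = numeric_indicators(df)
    economic = [c for c in ECONOMIC_COLS if c in values.columns]
    health = [c for c in values.columns if c not in economic]
    corr = values.corr(method="pearson")
    return corr.loc[health, economic]
```

With the data loaded, `health_wealth_correlations(df_chicago)["Per Capita Income"].sort_values()` would list the health indicators from most negatively to most positively correlated with income. A correlation only measures linear association between two columns; it does not control for the other indicators.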
One problem is that the health indicators in this dataset focus mainly on mortality and natality. The dataset doesn't include other valuable indicators such as life expectancy, obesity rates, or mental health measures, so it will give a less comprehensive view of the relationship between health and wealth.
We will treat this as a regression problem. The health indicators described above (birth rate, cancer (all sites), childhood lead poisoning, etc.) will be used to predict the economic indicators (percent of households below the poverty level, per capita income, etc.). This approach is useful because it can show how strongly each health indicator is associated with wealth. We expect it to show that low-income community areas score worse on the health indicators.
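One candidate model for this framing is an ordinary least squares (OLS) fit of a single economic indicator on all of the health indicators. The sketch below is an illustration under stated assumptions, not the project's chosen method: OLS is our choice, the health/economic column grouping mirrors the data dictionary but is ours, and the helper names (`fit_linear_regression`, `regress_economic_on_health`) are hypothetical. Predictors are z-scored so the coefficients share a scale and can be compared to see which health indicators track the target most closely.

```python
import numpy as np
import pandas as pd

# Assumed grouping (ours, not from the dataset), mirroring the data dictionary.
ID_COLS = ["index", "Community Area", "Community Area Name"]
ECONOMIC_COLS = [
    "Below Poverty Level",
    "Crowded Housing",
    "Dependency",
    "No High School Diploma",
    "Per Capita Income",
    "Unemployment",
]


def fit_linear_regression(X: pd.DataFrame, y: pd.Series) -> tuple[pd.Series, float]:
    """Ordinary least squares of y on the z-scored columns of X.

    Returns (coefficients, in-sample R^2). Each coefficient is the change in y
    per one standard deviation of that predictor. Rows with a missing value in
    X or y are dropped; X must not contain a constant column (zero std).
    """
    data = pd.concat([X, y], axis=1).dropna()
    Xc = data[X.columns].astype(float)
    yc = data[y.name].astype(float).to_numpy()
    # Standardize predictors so their coefficients are comparable.
    Z = (Xc - Xc.mean()) / Xc.std(ddof=0)
    # Column of ones for the intercept, followed by the standardized predictors.
    A = np.column_stack([np.ones(len(Z)), Z.to_numpy()])
    beta, *_ = np.linalg.lstsq(A, yc, rcond=None)
    residuals = yc - A @ beta
    r_squared = 1.0 - (residuals @ residuals) / np.sum((yc - yc.mean()) ** 2)
    return pd.Series(beta[1:], index=X.columns, name=y.name), float(r_squared)


def regress_economic_on_health(df: pd.DataFrame, target: str = "Per Capita Income"):
    """Fit one economic indicator (target) on all health indicators."""
    ids = [c for c in ID_COLS if c in df.columns]
    # Coerce to numbers in case any column was read as text.
    values = df.drop(columns=ids).apply(pd.to_numeric, errors="coerce")
    economic = [c for c in ECONOMIC_COLS if c in values.columns]
    X = values.drop(columns=economic)
    return fit_linear_regression(X, values[target])
```

For example, `coefs, r2 = regress_economic_on_health(df_chicago, "Below Poverty Level")`. Chicago has 77 community areas, and after removing the 3 identifier columns and 6 economic columns the table has 21 health indicators, so the in-sample R² from this fit will be optimistic. Cross-validation or a regularized model such as ridge regression would give a more honest estimate, and the coefficients describe association rather than causation.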