Across developing countries, there are large inequalities and injustices in the education available to different demographic groups. Women in particular are severely underrepresented in education and are given far fewer opportunities. According to UNICEF and the World Bank, educating women not only helps reduce gender inequality, child pregnancies, and domestic violence against women; it is also crucial for a country's transition to an industrialized economy. Educated women add to the high-earning, productive workforce that helps grow a nation's economy.
The solution involves removing the barriers, whether political, cultural, or economic, that prevent women from remaining in the education system in developing countries. To accomplish this, more light needs to be shed on the importance of women's education to a country's development. To demonstrate this, the project uses data downloaded from the World Bank Data Catalog to identify the relationship between education statistics and a country's development.
*A note about the headers below: each column shown is also broken down by gender and other population statistics in the full data set. The full list of column names was excluded for simplicity's sake. The project's full metadata can be accessed through the loaded dataframe 'metadata'.
The Human Capital Index (HCI) will serve as the indicator of a country's development in this project.
The table displayed below was cleaned for display purposes.
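import pandas as pd
import numpy as np
# Load the World Bank education extract.
df = pd.read_csv('WorldBankEducationData.csv')
# Missing values appear as '..' in the file; convert them to NaN.
new_df = df.replace('..', np.nan)
# Keep only rows with at least 75 non-missing values.
new_df = new_df.dropna(thresh=75)
# Load the indicator metadata that accompanies the data extract.
metadata = pd.read_excel('P_Data_Extract_From_World_Development_Indicators_Metadata.xlsx')
new_df.head()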
| | Country Name | Country Code | Time | Time Code | Adjusted savings: education expenditure (% of GNI) [NY.ADJ.AEDU.GN.ZS] | Adjusted savings: education expenditure (current US$) [NY.ADJ.AEDU.CD] | Compulsory education, duration (years) [SE.COM.DURS] | Current education expenditure, primary (% of total expenditure in primary public institutions) [SE.XPD.CPRM.ZS] | Current education expenditure, secondary (% of total expenditure in secondary public institutions) [SE.XPD.CSEC.ZS] | Current education expenditure, tertiary (% of total expenditure in tertiary public institutions) [SE.XPD.CTER.ZS] | ... | Unemployment with intermediate education, male (% of male labor force with intermediate education) [SL.UEM.INTM.MA.ZS] | Human capital index (HCI) (scale 0-1) [HD.HCI.OVRL] | Human capital index (HCI), female (scale 0-1) [HD.HCI.OVRL.FE] | Human capital index (HCI), female, lower bound (scale 0-1) [HD.HCI.OVRL.LB.FE] | Human capital index (HCI), female, upper bound (scale 0-1) [HD.HCI.OVRL.UB.FE] | Human capital index (HCI), lower bound (scale 0-1) [HD.HCI.OVRL.LB] | Human capital index (HCI), male (scale 0-1) [HD.HCI.OVRL.MA] | Human capital index (HCI), male, lower bound (scale 0-1) [HD.HCI.OVRL.LB.MA] | Human capital index (HCI), male, upper bound (scale 0-1) [HD.HCI.OVRL.UB.MA] | Human capital index (HCI), upper bound (scale 0-1) [HD.HCI.OVRL.UB] |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
292 | Austria | AUT | 2014.0 | YR2014 | 5.16457796151635 | 22879080369.5174 | 10 | 95.3740081787109 | 97.4952087402344 | 92.9210433959961 | ... | 5.38000011444092 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
295 | Austria | AUT | 2017.0 | YR2017 | 5.06065834381404 | 20849912376.5138 | 13 | 91.6796112060547 | 96.6040878295898 | 91.4032211303711 | ... | 5.55999994277954 | 0.793 | 0.798 | 0.787 | 0.809 | 0.783 | 0.785 | 0.773 | 0.798 | 0.803 |
394 | Bangladesh | BGD | 2016.0 | YR2016 | 1.24455026946621 | 2912247630.55094 | 5 | NaN | 72.821533203125 | 91.867301940918 | ... | 7.63000011444092 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
395 | Bangladesh | BGD | 2017.0 | YR2017 | 1.13085133958215 | 2940213482.91359 | 5 | NaN | NaN | NaN | ... | 6.42000007629395 | 0.479 | 0.492 | 0.481 | 0.502 | 0.468 | 0.465 | 0.454 | 0.475 | 0.488 |
466 | Belgium | BEL | 2013.0 | YR2013 | 6.11910737364501 | 32676033375.2643 | 12 | 91.9497222900391 | 96.6265335083008 | 95.6490783691406 | ... | 7.94999980926514 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
5 rows × 89 columns
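To make the cleaning step concrete, here is a minimal sketch on a toy frame. The column names `x1` and `x2` and the threshold of 2 are made up for illustration; the real data uses 89 columns and a threshold of 75. It shows that `thresh` counts non-missing cells per row (identifier columns included), and that values read alongside `'..'` stay as strings until they are converted explicitly.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the World Bank export; '..' marks a missing value.
toy = pd.DataFrame({
    "Country Name": ["A", "B", "C"],
    "x1": ["1.0", "..", ".."],   # hypothetical indicator column
    "x2": ["2.0", "3.0", ".."],  # hypothetical indicator column
})

cleaned = toy.replace("..", np.nan)

# thresh=2 keeps rows with at least 2 non-missing cells,
# and the "Country Name" cell counts toward that total.
kept = cleaned.dropna(thresh=2)

# The indicator columns still hold strings, so convert them to numbers.
numeric = kept[["x1", "x2"]].apply(pd.to_numeric)

print(kept["Country Name"].tolist())
print(numeric.dtypes)
```

Because identifier columns such as the country name and year are never missing, the effective requirement on the indicator columns is slightly lower than the threshold itself.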
One potential issue is the lack of data for certain countries; this may be addressed by keeping only countries with a certain amount of data. Additionally, it will be important to recognize the distinction between correlation and causation. The fact that the data covers 25 years may help here, as a time offset can be introduced between the education statistics and the development indicator to explore potentially causal relationships.
A regression model may be the best approach to this problem, as we can provide the model with a large list of parameters from the data and attempt to predict the HCI score, which is a continuous variable. This approach allows fast and simple computations and shows the importance of each parameter. Disadvantages include the linearity assumption and the mutual independence assumption.
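The two ideas above, filtering by data coverage and introducing a time offset, could be sketched roughly as follows. This is an illustration on made-up data, not the project's implementation: the `enrollment` and `hci` columns, the `MIN_YEARS` cutoff, and the one-year `LAG` are all assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical panel: one row per country and year.
panel = pd.DataFrame({
    "Country Name": ["A", "A", "A", "B", "B", "B"],
    "Time": [2015, 2016, 2017, 2015, 2016, 2017],
    "enrollment": [80.0, 82.0, 85.0, np.nan, np.nan, 60.0],
    "hci": [0.70, 0.71, 0.73, np.nan, np.nan, 0.45],
})

MIN_YEARS = 2  # assumed cutoff: years with an observed enrollment value

# Count observed (non-missing) years per country and keep the dense ones.
coverage = panel.groupby("Country Name")["enrollment"].transform("count")
dense = panel[coverage >= MIN_YEARS].sort_values(["Country Name", "Time"]).copy()

LAG = 1  # assumed offset in rows; equals years only if the years are consecutive

# Shift within each country so a row pairs this year's HCI
# with the enrollment value from LAG rows earlier.
dense["enrollment_lagged"] = dense.groupby("Country Name")["enrollment"].shift(LAG)

print(dense)
```

A lagged predictor does not establish causation on its own, but it does rule out the case where the outcome precedes the supposed cause.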
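As a rough sketch of the regression idea, the example below fits ordinary least squares on synthetic data with plain NumPy. The three predictors, their true coefficients, and the noise level are invented for illustration; nothing here comes from the World Bank data. Predictors are standardized first so that coefficient magnitudes can be compared, which is one simple (and imperfect) way to read off parameter importance.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data: 200 observations, 3 hypothetical predictors.
n = 200
X = rng.normal(size=(n, 3))
true_coef = np.array([0.5, 0.0, -0.2])  # assumed effects, for illustration only
y = 0.6 + X @ true_coef + rng.normal(scale=0.05, size=n)

# Standardize predictors so coefficient magnitudes are comparable.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Ordinary least squares with an explicit intercept column.
design = np.column_stack([np.ones(n), X_std])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)

# Goodness of fit (R^2) on the training data.
pred = design @ coef
r2 = 1 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)

# Rank predictors by absolute standardized coefficient (intercept excluded).
ranking = np.argsort(-np.abs(coef[1:]))

print("coefficients:", coef)
print("R^2:", r2)
print("predictors by importance:", ranking)
```

With strongly correlated predictors, which is likely among education indicators, these coefficients become unstable, so the ranking should be read with caution.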