Education as a Predictor of a Country's Development¶

Motivation¶

Problem¶

Across developing countries, there are large inequalities and injustices in how different demographic groups are educated. Women in particular are severely underrepresented in education and are given far fewer opportunities. According to UNICEF and the World Bank, educating women not only helps to reduce gender inequality, child pregnancies, and domestic violence against women; it is also crucial for a country's transition to an industrialized economy. Educated women add to the high-earning, productive workforce that helps grow a nation's economy.

Solution¶

The solution involves removing the barriers, whether political, cultural, or economic, that prevent women from remaining in the education system in developing countries. To accomplish this, more light needs to be shed on the importance of women's education to a country's development. To demonstrate this, the project will use data downloaded from the World Bank Data Catalog to identify the relationship between education statistics and a country's development.

DataSet¶

Data Dictionary:¶

*A note about the headers below: in the full data set, each column shown is broken down by gender and other population statistics. The full list of column names is omitted here for simplicity's sake. The project's full metadata can be accessed through the loaded dataframe 'metadata'.

  • Adjusted savings: education expenditure: refers to the current operating expenditures in education, including wages and salaries and excluding capital investments in buildings and equipment
  • Compulsory Education Duration: number of years that children are legally obliged to attend school
  • Current Education Expenditure: includes staff compensation and current expenditure other than for staff compensation (ex. on teaching materials, ancillary services and administration)
  • Educational Attainment: percentage of specified population that reach specified education level
  • Expenditure on __ education: expenditure on ___ education level as a percentage of total general government expenditure on education
  • Government expenditure on education: General government expenditure on education (current, capital, and transfers) is expressed as a percentage of GDP
  • Gross intake ratio: new entrants in the specified grade of education regardless of age, expressed as a percentage of the population of the official entrance age
  • Labor force with ___ education: % of total working-age population with ___ level education
  • __ Education Duration: number of years in ___ education level
  • __ Education Pupils: number of students enrolled in ___ education level
  • Share of youth not in education, employment or training: proportion of young people who are not in education, employment, or training to the population of the corresponding age group
  • Unemployment with __ education: unemployed share of the total labor force with ___ education (%)
  • Human capital index: measures the productivity as a future worker of a child born today, relative to the benchmark of full health and complete education

The Human Capital Index (HCI) will serve as the indicator of a country's development in this project.

The table displayed below was cleaned for display purposes.

In [47]:
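import pandas as pd
import numpy as np

# Load the World Bank export; missing values are encoded as the string '..'
df = pd.read_csv('WorldBankEducationData.csv')
new_df = df.replace('..', np.nan)
# Keep only rows (country-year pairs) with at least 75 non-missing values
new_df = new_df.dropna(thresh=75)

# Indicator definitions that accompany the data extract
metadata = pd.read_excel('P_Data_Extract_From_World_Development_Indicators_Metadata.xlsx')

new_df.head()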
Out[47]:
|  | Country Name | Country Code | Time | Time Code | Adjusted savings: education expenditure (% of GNI) [NY.ADJ.AEDU.GN.ZS] | Adjusted savings: education expenditure (current US$) [NY.ADJ.AEDU.CD] | Compulsory education, duration (years) [SE.COM.DURS] | Current education expenditure, primary (% of total expenditure in primary public institutions) [SE.XPD.CPRM.ZS] | Current education expenditure, secondary (% of total expenditure in secondary public institutions) [SE.XPD.CSEC.ZS] | Current education expenditure, tertiary (% of total expenditure in tertiary public institutions) [SE.XPD.CTER.ZS] | ... | Unemployment with intermediate education, male (% of male labor force with intermediate education) [SL.UEM.INTM.MA.ZS] | Human capital index (HCI) (scale 0-1) [HD.HCI.OVRL] | Human capital index (HCI), female (scale 0-1) [HD.HCI.OVRL.FE] | Human capital index (HCI), female, lower bound (scale 0-1) [HD.HCI.OVRL.LB.FE] | Human capital index (HCI), female, upper bound (scale 0-1) [HD.HCI.OVRL.UB.FE] | Human capital index (HCI), lower bound (scale 0-1) [HD.HCI.OVRL.LB] | Human capital index (HCI), male (scale 0-1) [HD.HCI.OVRL.MA] | Human capital index (HCI), male, lower bound (scale 0-1) [HD.HCI.OVRL.LB.MA] | Human capital index (HCI), male, upper bound (scale 0-1) [HD.HCI.OVRL.UB.MA] | Human capital index (HCI), upper bound (scale 0-1) [HD.HCI.OVRL.UB] |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 292 | Austria | AUT | 2014.0 | YR2014 | 5.16457796151635 | 22879080369.5174 | 10 | 95.3740081787109 | 97.4952087402344 | 92.9210433959961 | ... | 5.38000011444092 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 295 | Austria | AUT | 2017.0 | YR2017 | 5.06065834381404 | 20849912376.5138 | 13 | 91.6796112060547 | 96.6040878295898 | 91.4032211303711 | ... | 5.55999994277954 | 0.793 | 0.798 | 0.787 | 0.809 | 0.783 | 0.785 | 0.773 | 0.798 | 0.803 |
| 394 | Bangladesh | BGD | 2016.0 | YR2016 | 1.24455026946621 | 2912247630.55094 | 5 | NaN | 72.821533203125 | 91.867301940918 | ... | 7.63000011444092 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 395 | Bangladesh | BGD | 2017.0 | YR2017 | 1.13085133958215 | 2940213482.91359 | 5 | NaN | NaN | NaN | ... | 6.42000007629395 | 0.479 | 0.492 | 0.481 | 0.502 | 0.468 | 0.465 | 0.454 | 0.475 | 0.488 |
| 466 | Belgium | BEL | 2013.0 | YR2013 | 6.11910737364501 | 32676033375.2643 | 12 | 91.9497222900391 | 96.6265335083008 | 95.6490783691406 | ... | 7.94999980926514 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |

5 rows × 89 columns

Potential Problems¶

One potential issue is the lack of data for certain countries. This may be addressed by keeping only countries with a certain amount of data. Additionally, it will be important to recognize the distinction between correlation and causation. The fact that the data covers 25 years may alleviate this issue, as a time offset can be introduced between the education statistics and the HCI to explore relationships that are more plausibly causal.
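The two ideas above, filtering by data coverage and introducing a time offset, can be sketched roughly as follows. This is a minimal illustration on a made-up toy frame, not the project's actual pipeline: the columns `edu_spend` and `hci`, the helper names `filter_by_coverage` and `add_lag`, and the 0.6 coverage threshold are all assumptions chosen for the example.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the cleaned World Bank frame (values are invented for illustration).
toy = pd.DataFrame({
    "Country Name": ["A", "A", "A", "B", "B", "B"],
    "Time": [2015, 2016, 2017, 2015, 2016, 2017],
    "edu_spend": [4.0, 4.2, 4.5, np.nan, np.nan, 2.0],
    "hci": [0.70, 0.71, 0.73, np.nan, 0.40, 0.41],
})

def filter_by_coverage(df, value_cols, min_share=0.5):
    """Keep countries whose average share of non-missing cells is >= min_share."""
    present = df[value_cols].notna().astype(int)
    # Per-country share of non-missing values, averaged over the chosen columns.
    coverage = present.groupby(df["Country Name"]).mean().mean(axis=1)
    keep = coverage[coverage >= min_share].index
    return df[df["Country Name"].isin(keep)]

def add_lag(df, col, years=1):
    """Add `col` shifted by `years` rows within each country.

    Assumes one row per country per year with no gaps in the years.
    """
    out = df.sort_values(["Country Name", "Time"]).copy()
    out[f"{col}_lag{years}"] = out.groupby("Country Name")[col].shift(years)
    return out

filtered = filter_by_coverage(toy, ["edu_spend", "hci"], min_share=0.6)
lagged = add_lag(filtered, "edu_spend", years=1)
print(lagged)
```

With a lagged column in place, an earlier year's education statistic can be compared against a later year's HCI. This does not establish causation, but it rules out relationships in which the outcome precedes the supposed cause.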

Methods¶

A regression model may be the best approach to this problem, as we can provide the model with a large list of parameters from the data and attempt to predict the HCI score, which is a continuous variable. This approach allows for fast and simple computations and shows the importance of each parameter. Disadvantages include the linearity assumption and the mutual independence assumption.
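As a rough illustration of the regression approach described above, the sketch below fits an ordinary least-squares model with NumPy on synthetic data and ranks the parameters by the size of their standardized coefficients. The data, the three features, the "true" coefficients, and the helper `fit_linear` are all invented for the example; the real project would substitute columns from the World Bank frame and might well use a different library.

```python
import numpy as np

# Synthetic data: 200 samples, 3 hypothetical education features.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 3))
true_coef = np.array([0.05, 0.0, -0.02])  # assumed effects, for illustration only
y = 0.6 + X @ true_coef + rng.normal(scale=0.01, size=n)  # HCI-like target

def fit_linear(X, y):
    """Ordinary least squares on standardized features.

    Returns (intercept, coefficients). Standardizing puts the features on a
    common scale so coefficient magnitudes can be compared with each other.
    """
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    A = np.column_stack([np.ones(len(Xs)), Xs])  # prepend an intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta[0], beta[1:]

intercept, coefs = fit_linear(X, y)

# Rank features from largest to smallest absolute standardized coefficient.
ranking = np.argsort(-np.abs(coefs))
print("intercept:", intercept)
print("coefficients:", coefs)
print("features ranked by |coefficient|:", ranking)
```

Coefficient magnitude is only a meaningful measure of importance when the features are not strongly correlated with each other, which is the independence caveat noted above.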