As college students, many of us do not know whether we are making the right decisions for our professional lives. Although we can look up current job availability, salaries, and similar figures, it is difficult to weigh all of these factors when choosing a job, because each person has individual preferences for a career.
Jobs that involve Data Science can follow many different career paths, from research to economics. Because of these plentiful options, the factors that shape the lifestyle associated with a Data Science job also vary widely. The goal of this project is to identify multiple factors that determine success in a field (such as experience level and salary) and use them to determine whether an individual Data Science major is likely to be successful according to their own preferences.
If accurate, this project will predict how successful a college graduate with a degree in Data Science will be. It can therefore help current Data Science students, such as ourselves, determine whether the most probable outcome for a Data Science graduate is a lifestyle they would consider successful. With this information, Data Science students who are unsure how the field fits their future plans will be able to switch to a different field before it is too late.
However, one negative outcome of this project is that if it predicts a commonly undesired lifestyle, many Data Science students may leave the field, which could lead to a shortage of Data Scientists and people in related professions.
In addition, although data science is a somewhat niche field, this project could in the future be applied to data from any profession, with small changes to the variable names for each data set used.
We will use a Kaggle dataset of Data Science jobs to observe the features shown in the sample below for each job. This project seeks to use these features to estimate the overall success of a data science job.
work_year | experience_level | employment_type | job_title | salary | salary_currency | salary_in_usd | employee_residence | remote_ratio | company_location | company_size |
---|---|---|---|---|---|---|---|---|---|---|
2021e | EN | FT | Data Science Consultant | 54000 | EUR | 64369 | DE | 50 | DE | L |
2020 | SE | FT | Data Scientist | 60000 | EUR | 68428 | GR | 100 | US | L |
2021e | EX | FT | Head of Data Science | 85000 | USD | 85000 | RU | 0 | RU | M |
2021e | EX | FT | Head of Data | 230000 | USD | 230000 | RU | 50 | RU | L |
2021e | EN | FT | Machine Learning Engineer | 125000 | USD | 125000 | US | 100 | US | S |
A potential problem with this data set is its size: it contains only about 200 data points. Although that is not small in general, the wide range of job titles means that each specific job may be represented by only a small pool of data.
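One way to check this concern is to count how many rows each job title has and flag the titles that fall below some minimum. The sketch below uses only the five sample rows from the table above. The helper name `sparse_titles` and the `min_rows` threshold are hypothetical choices for illustration, not part of the data set or the project plan.

```python
from collections import Counter

# job_title values from the five sample rows in the table above
SAMPLE_TITLES = [
    "Data Science Consultant",
    "Data Scientist",
    "Head of Data Science",
    "Head of Data",
    "Machine Learning Engineer",
]


def sparse_titles(titles, min_rows=5):
    """Return the job titles that appear fewer than min_rows times.

    min_rows is an assumed cutoff; a suitable value would depend on
    how many neighbors the model is asked to compare against.
    """
    counts = Counter(titles)
    return sorted(title for title, n in counts.items() if n < min_rows)


if __name__ == "__main__":
    # Every title in the sample appears once, so all of them are flagged.
    for title in sparse_titles(SAMPLE_TITLES):
        print(title)
```

Running the same check on the full data set would show which job titles have too few rows to support reliable comparisons.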
This project will use K-Nearest Neighbors in scikit-learn in order to weigh fairly all of the different factors that determine success in a job. This will ultimately create a scale of average success for Data Science jobs, which will be used to determine which jobs in Data Science are most successful. In addition, Data Science majors can input their preferred factors, which will be compared with the corresponding factors of Data Science jobs to predict which jobs would best suit the individual. From there, how realistic and how difficult these jobs are to achieve can be considered to determine whether Data Science is the right path.
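A minimal sketch of the preference-matching step, assuming scikit-learn's `NearestNeighbors` and `StandardScaler`, might look like the following. It uses only the five sample rows from the table above. The ordinal encodings, the choice of four features, and the helper names `encode` and `closest_jobs` are assumptions made for illustration and are not fixed by the project plan. Standardizing the features is one possible way to keep salary, which has by far the largest numeric range, from dominating the distance.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler

# Sample rows from the table above:
# (job_title, experience_level, salary_in_usd, remote_ratio, company_size)
JOBS = [
    ("Data Science Consultant", "EN", 64369, 50, "L"),
    ("Data Scientist", "SE", 68428, 100, "L"),
    ("Head of Data Science", "EX", 85000, 0, "M"),
    ("Head of Data", "EX", 230000, 50, "L"),
    ("Machine Learning Engineer", "EN", 125000, 100, "S"),
]

# Assumed ordinal encodings; only codes seen in the sample rows are listed.
EXPERIENCE_RANK = {"EN": 0, "SE": 1, "EX": 2}
SIZE_RANK = {"S": 0, "M": 1, "L": 2}


def encode(experience_level, salary_in_usd, remote_ratio, company_size):
    """Turn one job (or one student's preferences) into a numeric vector."""
    return [
        EXPERIENCE_RANK[experience_level],
        salary_in_usd,
        remote_ratio,
        SIZE_RANK[company_size],
    ]


def closest_jobs(preference, k=2):
    """Return the titles of the k jobs nearest to the given preferences."""
    features = np.array([encode(*row[1:]) for row in JOBS], dtype=float)
    # Standardize so that every factor contributes on a comparable scale.
    scaler = StandardScaler().fit(features)
    model = NearestNeighbors(n_neighbors=k).fit(scaler.transform(features))
    query = scaler.transform(np.array([encode(*preference)], dtype=float))
    _, indices = model.kneighbors(query)
    return [JOBS[i][0] for i in indices[0]]


if __name__ == "__main__":
    # Hypothetical student: entry level, about $100k, fully remote, small company.
    print(closest_jobs(("EN", 100000, 100, "S"), k=2))
```

With the full data set, the same query would return the jobs whose combined factors sit closest to a student's stated preferences, which could then be judged for how realistic they are to reach.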