Motivation:¶

Problem¶

As college students, many of us do not know whether we are making the right decisions for our professional lives. Although we can look up current job availability, salaries, and similar figures, it is difficult to weigh all of these factors when choosing a job, because each person has individual preferences for a career.

Solution¶

Careers in Data Science span many different paths, from research to economics. Because of these plentiful options, the factors that shape the lifestyle of a Data Science job also vary widely. The goal of this project is to identify multiple factors that determine success in a field (such as experience level in the job and salary) and use them to estimate whether each individual, as a Data Science major, is likely to be successful according to their own preferences.

Impact¶

If accurate, this project will predict how successful a college graduate with a degree in Data Science will be. It can therefore help current Data Science students, such as ourselves, determine whether the most probable outcome for a Data Science graduate is a lifestyle that they would personally consider successful. With this information, Data Science students who are unsure how the field fits their future plans will be able to switch to a different field before it is too late.

However, one possible negative outcome is that, if the project predicts a commonly undesired lifestyle, many Data Science students may leave the field, which could lead to a shortage of Data Scientists and people in related professions.

In addition, although data science is a somewhat niche field, this project could later be applied to data from any profession, with small changes to the variable names for each data set used.

Dataset¶

Detail¶

We will use a Kaggle Dataset of Data Science Jobs to observe the following features for each job:

Experience level (experience_level): The experience level in the job
- Entry-level (EN), mid-level (MI), senior-level (SE), executive-level (EX)
Employment type (employment_type): The type of employment
- Part-time (PT), full-time (FT), contract (CT), freelance (FL)
Job title (job_title): The role worked
Salary in USD (salary_in_usd): The salary converted to USD
Employee Residence (employee_residence): The country of residence
Remote Ratio (remote_ratio): The amount of work done remotely
- Less than 20% remote (0), partially remote (50), more than 80% remote (100)
Company Size (company_size): The average number of people that worked for the company during that year
- Less than 50 employees (S), 50-250 employees (M), more than 250 employees (L)

This project seeks to use the features above to estimate the overall success of a data science job.
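
As a minimal sketch of how the coded features above could be turned into numbers before any modelling, the snippet below maps the ordered categories to integer ranks. The integer ranks, the helper name `encode_features`, and the choice of columns are assumptions made for illustration; the proposal does not specify an encoding. The three example rows are copied from the data set sample in this section.

```python
import pandas as pd

# Ordered codes taken from the feature list; the integer ranks are an
# assumption for illustration (the proposal does not fix an encoding).
EXPERIENCE_RANK = {"EN": 0, "MI": 1, "SE": 2, "EX": 3}
COMPANY_SIZE_RANK = {"S": 0, "M": 1, "L": 2}


def encode_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a numeric copy of the ordinal columns plus remote ratio and salary."""
    out = pd.DataFrame(index=df.index)
    out["experience_level"] = df["experience_level"].map(EXPERIENCE_RANK)
    out["company_size"] = df["company_size"].map(COMPANY_SIZE_RANK)
    out["remote_ratio"] = df["remote_ratio"]    # already numeric: 0, 50, or 100
    out["salary_in_usd"] = df["salary_in_usd"]  # already numeric
    return out


# Three rows copied from the data set sample.
example = pd.DataFrame({
    "experience_level": ["EN", "SE", "EX"],
    "company_size": ["L", "L", "M"],
    "remote_ratio": [50, 100, 0],
    "salary_in_usd": [64369, 68428, 85000],
})
encoded = encode_features(example)
print(encoded)
```

Unordered features such as job_title or employee_residence have no natural rank, so they would need a different treatment (for example one-hot encoding) if they are used at all.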

work_year	experience_level	employment_type	job_title	salary	salary_currency	salary_in_usd	employee_residence	remote_ratio	company_location	company_size
2021e	EN	FT	Data Science Consultant	54000	EUR	64369	DE	50	DE	L
2020	SE	FT	Data Scientist	60000	EUR	68428	GR	100	US	L
2021e	EX	FT	Head of Data Science	85000	USD	85000	RU	0	RU	M
2021e	EX	FT	Head of Data	230000	USD	230000	RU	50	RU	L
2021e	EN	FT	Machine Learning Engineer	125000	USD	125000	US	100	US	S
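
The sample rows above can be reproduced as a pandas DataFrame, which also shows one cleaning step the data appears to need: some work_year values carry a trailing "e" (for example 2021e). The sketch below assumes that the suffix marks an estimated figure for that year and stores it in a hypothetical `year_is_estimate` column; that interpretation should be checked against the data set's own description. For the full data set, `pd.read_csv` on the downloaded file would replace the hand-built rows.

```python
import pandas as pd

COLUMNS = [
    "work_year", "experience_level", "employment_type", "job_title",
    "salary", "salary_currency", "salary_in_usd", "employee_residence",
    "remote_ratio", "company_location", "company_size",
]

# The five sample rows shown above, copied verbatim.
ROWS = [
    ["2021e", "EN", "FT", "Data Science Consultant", 54000, "EUR", 64369, "DE", 50, "DE", "L"],
    ["2020", "SE", "FT", "Data Scientist", 60000, "EUR", 68428, "GR", 100, "US", "L"],
    ["2021e", "EX", "FT", "Head of Data Science", 85000, "USD", 85000, "RU", 0, "RU", "M"],
    ["2021e", "EX", "FT", "Head of Data", 230000, "USD", 230000, "RU", 50, "RU", "L"],
    ["2021e", "EN", "FT", "Machine Learning Engineer", 125000, "USD", 125000, "US", 100, "US", "S"],
]

df = pd.DataFrame(ROWS, columns=COLUMNS)

# Assumption: a trailing "e" flags an estimated year. Keep that information
# in a separate boolean column, then store the year itself as an integer.
df["year_is_estimate"] = df["work_year"].str.endswith("e")
df["work_year"] = df["work_year"].str.rstrip("e").astype(int)

print(df[["work_year", "year_is_estimate", "job_title", "salary_in_usd"]])
```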

Potential Problems¶

A potential problem with this data set is its size: it contains only about 200 data points. Although that is not small in general, the wide range of job titles means that each specific job may be represented by only a few data points.
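
One way to check how serious this is would be to count the rows per job title and list the titles that fall below some minimum. The sketch below is illustrative only: the helper name `sparse_titles`, the `min_rows` threshold, and the toy counts are assumptions, not figures from the data set.

```python
import pandas as pd


def sparse_titles(df: pd.DataFrame, min_rows: int = 5) -> pd.Series:
    """Return the job titles that have fewer than `min_rows` rows, with their counts."""
    counts = df["job_title"].value_counts()
    return counts[counts < min_rows]


# Toy frame: the titles appear in the data set sample, the counts are made up.
titles = pd.DataFrame({
    "job_title": [
        "Data Scientist", "Data Scientist", "Data Scientist",
        "Head of Data", "Machine Learning Engineer",
    ]
})

rare = sparse_titles(titles, min_rows=2)
print(rare)
```

Titles that turn out to be too sparse could be merged into broader groups or left out of per-title comparisons.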

Method:¶

This project will use K-Nearest Neighbors in scikit-learn in order to weigh all of the different factors that determine success in a job fairly. This will ultimately create a scale of average success for Data Science jobs, which will be used to determine which jobs in Data Science are most successful. In addition, Data Science majors can input their preferred factors, which will be compared with the same factors in Data Science jobs to predict which jobs would be best for the individual. From there, how realistic and how difficult these jobs are to achieve can be considered to determine whether Data Science is the right path.
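
The preference-matching step could look roughly like the sketch below, which uses scikit-learn's `NearestNeighbors` to find the jobs closest to a student's stated preferences. This is one possible reading of the method, not the project's final design. The integer ranks for experience level (EN=0, MI=1, SE=2, EX=3) and company size (S=0, M=1, L=2), the choice of features, the preference values, and the use of `StandardScaler` are all assumptions. The five jobs are the sample rows from the Dataset section.

```python
import pandas as pd
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler

# The five sample jobs, with ordinal codes already converted to assumed ranks.
jobs = pd.DataFrame({
    "job_title": [
        "Data Science Consultant", "Data Scientist", "Head of Data Science",
        "Head of Data", "Machine Learning Engineer",
    ],
    "experience_level": [0, 2, 3, 3, 0],
    "remote_ratio": [50, 100, 0, 50, 100],
    "company_size": [2, 2, 1, 2, 0],
    "salary_in_usd": [64369, 68428, 85000, 230000, 125000],
})
FEATURES = ["experience_level", "remote_ratio", "company_size", "salary_in_usd"]

# Standardize so that salary (tens of thousands) does not dominate the
# distance over features measured on much smaller scales.
scaler = StandardScaler()
X = scaler.fit_transform(jobs[FEATURES])

knn = NearestNeighbors(n_neighbors=2).fit(X)

# Hypothetical student preference: entry level, fully remote, small company.
preference = pd.DataFrame([{
    "experience_level": 0,
    "remote_ratio": 100,
    "company_size": 0,
    "salary_in_usd": 120000,
}])

distances, indices = knn.kneighbors(scaler.transform(preference[FEATURES]))
matches = jobs.iloc[indices[0]]["job_title"].tolist()
print(matches)
```

Scaling matters here because K-Nearest Neighbors relies on distances: without it, the factor with the largest numeric range would effectively decide the result, which would work against weighing the factors fairly.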

Will You Be Successful?¶