As we enter the workforce (including co-ops and internships), a big question on a lot of people's minds is "How much will I be getting paid?" I think a better question is "How much can I realistically be paid?" Equipped with this information, applicants can better understand their worth and go into salary negotiations prepared.
It can be intimidating to negotiate salary, but understanding industry standards, and what similar jobs pay in different industries, can alleviate some of the stress surrounding negotiations.
import pandas as pd

# Load the job postings (the CSV is read with Latin-1 encoding)
datajobs = pd.read_csv('datajobs.csv', encoding='latin-1')
datajobs
| | company | job title | location | job description | salary estimate | company_size | company_type | company_sector | company_industry | company_founded | ... | python_yn | spark_yn | azure_yn | aws_yn | excel_yn | machine_learning_yn | job_simpl | seniority | description_len | company_age |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Microsoft | Data & Applied Scientist | Redmond, WA | Microsoft 365 is a key part of the companys c... | 123486 | 10000+ Employees | Company - Public | Information Technology | Computer Hardware Development | 1975.0 | ... | 1 | 0 | 0 | 1 | 0 | 1 | data scientist | junior | 359 | 47.0 |
1 | UT Southwestern Medical Center | Data Scientist or Bioinformatician (remote) | Remote | Center Information:\nThe Quantitative Biomedic... | 93500 | 10000+ Employees | Hospital | Healthcare | Health Care Services & Hospitals | 1943.0 | ... | 1 | 0 | 0 | 0 | 0 | 1 | data scientist | mid | 267 | 79.0 |
2 | Notion | Data Scientist, Growth | New York, NY | About Us:\nWe're on a mission to make it possi... | 137853 | 201 to 500 Employees | Company - Private | Information Technology | Enterprise Software & Network Solutions | 2016.0 | ... | 1 | 0 | 0 | 0 | 0 | 0 | data scientist | Senior | 589 | 6.0 |
3 | Net2Aspire | Jr. Data Scientist | Remote | ? Apply Statistical and Machine Learning metho... | 72500 | Unknown | Company - Public | NaN | NaN | NaN | ... | 0 | 0 | 0 | 0 | 0 | 1 | data scientist | junior | 132 | NaN |
4 | Ntropy Network | Data Scientist | Remote | Over the last few decades, technological innov... | 155000 | 1 to 50 Employees | Company - Private | NaN | NaN | NaN | ... | 1 | 0 | 0 | 1 | 0 | 0 | data scientist | mid | 522 | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
2079 | YouTube | Staff Software Engineer, Machine Learning, You... | San Bruno, CA | Minimum qualifications:\nBachelor's degree or ... | 141704 | 1001 to 5000 Employees | Subsidiary or Business Segment | Information Technology | Internet & Web Services | 2005.0 | ... | 0 | 0 | 0 | 0 | 0 | 1 | machine learning engineer | Senior | 498 | 17.0 |
2080 | Hunter Engineering | Data Science Co-Op | Bridgeton, MO | Overview:\nDo you have a passion for data scie... | 88383 | 1001 to 5000 Employees | Company - Private | Manufacturing | Machinery Manufacturing | 1946.0 | ... | 1 | 0 | 0 | 0 | 1 | 1 | other | Senior | 349 | 76.0 |
2081 | precision technologies corp | Jr UI/UX Designer Training and Placement | Remote | If you want to start your IT career as a UI/UX... | 70600 | 201 to 500 Employees | Company - Private | Information Technology | Information Technology Support Services | 2008.0 | ... | 1 | 0 | 0 | 1 | 1 | 1 | other | junior | 391 | 14.0 |
2082 | Argonne National Laboratory | Postdoctoral Appointee - Probabilistic Machine... | Lemont, IL | The Mathematics and Computer Science Division ... | 54291 | 1001 to 5000 Employees | Government | Management & Consulting | Research & Development | 1946.0 | ... | 0 | 0 | 0 | 0 | 0 | 1 | machine learning engineer | Senior | 506 | 76.0 |
2083 | Colossal Biosciences | Graduate Research Fellow, Machine Learning S... | Dallas, TX | The Machine Learning Graduate Research Fellow ... | 66609 | 1 to 50 Employees | Company - Private | NaN | NaN | NaN | ... | 1 | 0 | 0 | 0 | 0 | 1 | machine learning engineer | Senior | 376 | NaN |
2084 rows × 23 columns
This dataset includes information (company, title, description, location, requirements, salary, etc.) on just over 2,000 data-related jobs. My plan is to use pandas' value_counts() method to find common words in the job description and job title columns, and to use these, together with the other columns, as features for estimating a job's expected salary.
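As a rough sketch of the word-counting idea, the snippet below splits a text column into lowercase words and tallies them with value_counts(). The `toy_jobs` frame and the `common_words` helper are illustrative names, not part of the project; the titles are copied from the preview above so the example runs without `datajobs.csv`.

```python
import pandas as pd

# Toy stand-in for the real data; titles are copied from the preview above
toy_jobs = pd.DataFrame({
    "job title": [
        "Data & Applied Scientist",
        "Data Scientist, Growth",
        "Jr. Data Scientist",
        "Data Scientist",
        "Data Science Co-Op",
    ]
})

def common_words(text_column, top_n=10):
    """Hypothetical helper: the top_n most frequent lowercase words in a text column."""
    words = (
        text_column.fillna("")
        .str.lower()
        .str.findall(r"[a-z]+")  # list of alphabetic tokens per row
        .explode()               # one row per token
        .dropna()                # rows with no tokens become NaN after explode
    )
    return words.value_counts().head(top_n)

title_words = common_words(toy_jobs["job title"])
print(title_words)

# A frequent word can then become a 0/1 feature, in the style of python_yn
toy_jobs["title_has_scientist"] = (
    toy_jobs["job title"].str.contains("scientist", case=False).astype(int)
)
```

On the real data the same helper could presumably be pointed at `datajobs['job title']` or `datajobs['job description']`; very common filler words would likely need to be filtered out before the counts are useful as features.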
In the end, if someone were to input a dataset of jobs in the format of these columns, the intended output would be each job's expected salary. The goal is to give applicants a greater sense of salary transparency and provide them with information that could prove useful in negotiations.
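To make the intended input and output concrete, here is a deliberately naive baseline, not the model this project settles on: it predicts a job's salary as the average salary of training rows with the same `job_simpl` and `seniority`. The `baseline_salary` helper and the `expected_salary` column name are assumptions made for illustration; the training rows are the five data scientist postings from the preview above.

```python
import pandas as pd

# Rows copied from the preview above (data scientist postings only)
train = pd.DataFrame({
    "job_simpl": ["data scientist"] * 5,
    "seniority": ["junior", "mid", "Senior", "junior", "mid"],
    "salary estimate": [123486, 93500, 137853, 72500, 155000],
})

def baseline_salary(train, new_jobs, keys=("job_simpl", "seniority")):
    """Hypothetical helper: predict salary as the mean of matching training groups."""
    keys = list(keys)
    group_means = (
        train.groupby(keys)["salary estimate"]
        .mean()
        .rename("expected_salary")
        .reset_index()
    )
    out = new_jobs.merge(group_means, on=keys, how="left")
    # Fall back to the overall mean for combinations never seen in training
    overall_mean = train["salary estimate"].mean()
    out["expected_salary"] = out["expected_salary"].fillna(overall_mean)
    return out

new_jobs = pd.DataFrame({
    "job_simpl": ["data scientist", "data scientist"],
    "seniority": ["junior", "principal"],  # "principal" is unseen on purpose
})
predictions = baseline_salary(train, new_jobs)
print(predictions)
```

A group average like this is mainly useful as a yardstick: any model built on the word and skill features should be able to beat it before its estimates are trusted in a negotiation.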
# Short description of each column in the dataset
data_dict = {'company': 'company name',
             'job title': 'job title',
             'location': 'office location, when available',
             'job description': 'all available details regarding the position',
             'salary estimate': 'average annual salary',
             'company_size': 'approximation of # of employees at company',
             'company_type': 'classification (public, private, etc.)',
             'company_sector': 'job/company sector',
             'company_industry': 'company industry',
             'company_founded': 'year company was founded',
             'company_revenue': 'company annual revenue',
             'hourly': 'whether the pay is hourly or not',
             'rating': 'company rating on glassdoor',
             'python_yn': 'is python a required skill?',
             'spark_yn': 'is spark a required skill?',
             'azure_yn': 'is azure a required skill?',
             'aws_yn': 'is aws a required skill?',
             'excel_yn': 'is excel a required skill?',
             'machine_learning_yn': 'is machine learning a required skill?',
             'job_simpl': 'simplified job title category (e.g. data scientist)',
             'seniority': 'position ranking in company hierarchy',
             'description_len': 'length of job description (word count)',
             'company_age': 'how many years the company has been in business'}
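A data dictionary is easy to let drift out of sync with the DataFrame it describes. The check below is a minimal sketch of how the two could be compared; `compare_dict_to_columns`, `toy_columns`, and `toy_dict` are illustrative names, and the toy inputs are intentionally mismatched so both kinds of gap show up.

```python
import pandas as pd

# Toy inputs, deliberately out of sync with each other
toy_columns = pd.Index(["company", "job title", "python_yn", "aws_yn"])
toy_dict = {
    "company": "company name",
    "job title": "job title",
    "python_yn": "is python a required skill?",
    "rating": "company rating on glassdoor",
}

def compare_dict_to_columns(columns, descriptions):
    """Hypothetical helper: list columns without a description and descriptions without a column."""
    undocumented = [c for c in columns if c not in descriptions]
    unused = [k for k in descriptions if k not in columns]
    return undocumented, unused

undocumented, unused = compare_dict_to_columns(toy_columns, toy_dict)
print("columns missing a description:", undocumented)
print("descriptions with no column:", unused)
```

With the real objects, the call would be `compare_dict_to_columns(datajobs.columns, data_dict)`.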