Credit Risk Analysis for Credit Rating

1) Describes and motivates a real-world problem where data science may provide helpful insights. Your description should be easily understood by a casual reader and include citations to motivating sources or relevant information.

The problem I am looking at is credit score rating. Credit scores are very important because they can control access to housing, loans, and employment. With the advancement of AI in the financial services industry, there is a unique opportunity to improve the fairness of credit scoring. Methods and algorithms based on machine learning and data science can make credit risk rating scalable: data engineers can build the infrastructure that boosts platform capacity, and data scientists can fit and tune models that rate credit risk more accurately for both the system and individual users.

Sources used: https://www.theregreview.org/2022/06/07/moss-new-approach-to-regulating-credit-scoring-ai/#:~:text=Credit%20scores%20can%20control%20housing,deepen%20the%20impact%20of%20bias.

2) Explicitly load and show your dataset. Provide a data dictionary which explains the meaning of each feature present. Demonstrate that this data is sufficient to make progress on your real-world problem described above.

Variable name: Credit Score
Data type: Numerical
Description: the target variable; given a person's credit-related information, we will build a machine learning model that classifies the credit score
Scale: 300 to 850
Interpretation: higher values on the scale indicate lower risk and higher creditworthiness

Variable name: Age
Data type: Numerical
Description: the person's age, used to determine the relationship between age and credit score
Scale: number of years
Range: positive integer
Interpretation: the older the person is, the higher their creditworthiness might be, due to their job experience

Variable name: Income
Data type: Numerical
Description: the person's annual income
Scale: US dollars
Interpretation: a higher income may indicate greater financial stability and therefore a higher credit score

Variable name: Number of bank accounts and number of credit cards
Data type: Numerical
Description: the total number of bank accounts and credit cards a person has
Scale: count
Range: non-negative integer
Interpretation: these counts could affect credit rating because more credit cards can mean more money to pay back; whether this is good or bad depends on how much the person is able to pay back

Variable name: Interest rate
Data type: Numerical
Description: the interest rate on the credit account
Scale: percentage
Range: positive
Interpretation: a higher interest rate raises the cost of borrowing, which can make consistent payments harder and may signal higher risk

Variable name: Number of loans
Data type: Numerical
Description: the number of loans a person has
Scale: count
Range: non-negative integer
Interpretation: more loans might correspond to a higher credit score, since it may indicate that the person is able to pay off loans within a period of time

Variable name: Type of loan
Data type: Categorical
Description: the type of loan a person has
Scale: not applicable (categorical)
Range: the set of loan types present in the data
Interpretation: the type of loan may indicate the person's financial stability
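The data dictionary mixes numerical and categorical features, and a model needs all of them as numbers. The following is a minimal sketch, assuming pandas, of one way to prepare them: numerical columns are coerced with `pd.to_numeric` and the categorical loan type is one-hot encoded with `pd.get_dummies`. The column names `Age`, `Annual_Income`, and `Num_Bank_Accounts` appear in the loaded data; the `Loan_Type` column name and the sample rows are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Columns grouped by the data types listed in the data dictionary.
# "Loan_Type" is a hypothetical column name used only for illustration.
NUMERIC_FEATURES = ["Age", "Annual_Income", "Num_Bank_Accounts"]
CATEGORICAL_FEATURES = ["Loan_Type"]


def prepare_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a purely numeric feature matrix suitable for a model."""
    # Coerce numerical columns; values that cannot be parsed become NaN.
    numeric = df[NUMERIC_FEATURES].apply(pd.to_numeric, errors="coerce")
    # One-hot encode categorical columns: one 0/1 column per category.
    categorical = pd.get_dummies(df[CATEGORICAL_FEATURES], dtype=int)
    return pd.concat([numeric, categorical], axis=1)


# Hypothetical sample rows, not taken from the dataset.
sample = pd.DataFrame({
    "Age": ["23", "35", "41"],
    "Annual_Income": [19114.12, 52000.0, 87000.0],
    "Num_Bank_Accounts": [3, 5, 2],
    "Loan_Type": ["Personal", "Auto", "Personal"],
})

features = prepare_features(sample)
print(features)
```

One-hot encoding is used here because loan types have no natural order, so mapping them to 1, 2, 3 would imply a ranking the data does not support.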

import pandas as pd

# Load the credit dataset and preview its first five rows.
pdDf = pd.read_csv('test.csv')
pdDf.head()
ID Customer_ID Month Name Age SSN Occupation Annual_Income Monthly_Inhand_Salary Num_Bank_Accounts ... Num_Credit_Inquiries Credit_Mix Outstanding_Debt Credit_Utilization_Ratio Credit_History_Age Payment_of_Min_Amount Total_EMI_per_month Amount_invested_monthly Payment_Behaviour Monthly_Balance
0 0x160a CUS_0xd40 September Aaron Maashoh 23 821-00-0265 Scientist 19114.12 1824.843333 3 ... 2022.0 Good 809.98 35.030402 22 Years and 9 Months No 49.574949 236.64268203272135 Low_spent_Small_value_payments 186.26670208571772
1 0x160b CUS_0xd40 October Aaron Maashoh 24 821-00-0265 Scientist 19114.12 1824.843333 3 ... 4.0 Good 809.98 33.053114 22 Years and 10 Months No 49.574949 21.465380264657146 High_spent_Medium_value_payments 361.44400385378196
2 0x160c CUS_0xd40 November Aaron Maashoh 24 821-00-0265 Scientist 19114.12 1824.843333 3 ... 4.0 Good 809.98 33.811894 NaN No 49.574949 148.23393788500925 Low_spent_Medium_value_payments 264.67544623342997
3 0x160d CUS_0xd40 December Aaron Maashoh 24_ 821-00-0265 Scientist 19114.12 NaN 3 ... 4.0 Good 809.98 32.430559 23 Years and 0 Months No 49.574949 39.08251089460281 High_spent_Medium_value_payments 343.82687322383634
4 0x1616 CUS_0x21b1 September Rick Rothackerj 28 004-07-5839 _______ 34847.84 3037.986667 2 ... 5.0 Good 605.03 25.926822 27 Years and 3 Months No 18.816215 39.684018417945296 High_spent_Large_value_payments 485.2984336755923

5 rows × 27 columns
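The head() output above already shows that the raw data needs cleaning before modeling: an `Age` of `24_`, an `Occupation` of `_______`, `NaN` in `Monthly_Inhand_Salary` and `Credit_History_Age`, and `Credit_History_Age` stored as text such as `22 Years and 9 Months`. The following is a minimal cleaning sketch, assuming pandas and NumPy, that rebuilds three of those columns from the values shown and then counts missing values per column as a quick check of how usable the data is. The helper names `history_to_months` and `clean` are hypothetical, and treating an all-underscore occupation as missing is an assumption about what that placeholder means.

```python
import re

import numpy as np
import pandas as pd

# Values copied from the head() output above (three of the 27 columns).
raw = pd.DataFrame({
    "Age": ["23", "24", "24", "24_", "28"],
    "Occupation": ["Scientist", "Scientist", "Scientist", "Scientist", "_______"],
    "Credit_History_Age": [
        "22 Years and 9 Months",
        "22 Years and 10 Months",
        np.nan,
        "23 Years and 0 Months",
        "27 Years and 3 Months",
    ],
})


def history_to_months(value):
    """Convert 'X Years and Y Months' to total months; NaN if missing or unparseable."""
    if not isinstance(value, str):
        return np.nan
    match = re.fullmatch(r"(\d+) Years and (\d+) Months", value)
    if match is None:
        return np.nan
    years, months = map(int, match.groups())
    return years * 12 + months


def clean(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    # Strip stray underscores (e.g. "24_") before converting to numbers.
    out["Age"] = pd.to_numeric(out["Age"].astype(str).str.strip("_"), errors="coerce")
    # Assumption: an occupation made only of underscores means "unknown".
    out["Occupation"] = out["Occupation"].replace(r"^_+$", np.nan, regex=True)
    # Turn the text history length into a single numeric feature (months).
    out["Credit_History_Age"] = out["Credit_History_Age"].map(history_to_months)
    return out


cleaned = clean(raw)
print(cleaned)
print(cleaned.isna().sum())  # missing values per column
```

Since `Age`, `Occupation`, and `Credit_History_Age` all appear in the loaded data, the same `clean` function could be applied to `pdDf` to check missing values across all rows rather than just these five.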

3) Write one or two sentences about how the data will be used to solve the problem. Earlier in the semester, we won’t have studied the Machine Learning methods just yet but you should have a general idea of what the ML will set out to do.

The credit risk data can be used to evaluate the creditworthiness of borrowers and make more informed decisions about credit applications, interest rates, and credit limits. This helps address the credit rating problem by minimizing the risk of default or non-payment on loans and credit accounts.
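To give a general idea of what the machine learning will set out to do, the following is a minimal sketch of one possible approach, binary logistic regression trained by batch gradient descent, written with NumPy only. It outputs a probability of default from two standardized features. The data is synthetic, generated from a hypothetical rule (default is more likely when debt is high relative to income), and the feature choice, learning rate, and epoch count are assumptions rather than the project's chosen method.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_logistic_regression(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)          # predicted default probabilities
        error = p - y                   # gradient of log-loss w.r.t. the logits
        w -= lr * (X.T @ error) / n_samples
        b -= lr * error.mean()
    return w, b


def predict_default_probability(X, w, b):
    return sigmoid(X @ w + b)


# Synthetic standardized features (NOT real data):
# column 0 plays the role of income, column 1 of outstanding debt.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
# Hypothetical labeling rule: 1 = default when debt exceeds income.
y = (X[:, 1] - X[:, 0] > 0).astype(float)

w, b = train_logistic_regression(X, y)
probs = predict_default_probability(X, w, b)
accuracy = ((probs >= 0.5) == y).mean()
print("weights:", w, "bias:", b, "training accuracy:", accuracy)
```

The signs of the learned weights can be read like the interpretations in the data dictionary: here the income-like feature gets a negative weight (lower default risk) and the debt-like feature a positive one. A real model would be trained on the cleaned dataset and evaluated on held-out data rather than on its own training set.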