(1%) Describes and motivates a real-world problem where data science may provide helpful insights. Your description should be easily understood by a casual reader and include citations to motivating sources or relevant information (e.g. news articles, further reading links … Wikipedia makes for a poor reference but the links it cites are usually promising).
This project proposal tackles sustainable business strategy for banks and credit card companies. One of the banking and credit industry's biggest problems is customer attrition. Churn has been a persistent issue because it is difficult to implement retention measures in an ultra-competitive environment. According to Forbes, US-based credit card providers average a 20% annual churn rate, meaning that each year they lose one fifth of their existing customers. For any business, not only credit card providers, a 20% churn rate is a major red flag. Many experts believe that reducing churn is more effective than spending the same effort on customer acquisition. One McKinsey & Company study estimates that efforts to reduce churn can increase earnings by up to 9%. Churn is an obstacle in many industries, but it hits credit card providers especially hard. One way to address high attrition is to apply predictive analytics, so that providers can identify at-risk customers and react before they leave.
References:
Why Retaining Customers For Banks Is As Important As Winning New Ones - Forbes Article
(1%) Explicitly load and show your dataset. Provide a data dictionary which explains the meaning of each feature present. Demonstrate that this data is sufficient to make progress on your real-world problem described above.
import pandas as pd
# load the credit card customer dataset and display it
df = pd.read_csv('bank_data.csv')
df
# data dictionary for features in the dataset (TTM = trailing twelve months)
features = {"CLIENTNUM":"Client ID",
"Attrition_Flag":"Existing or former (attrited) customer",
"Customer_Age":"Age",
"Gender":"Gender",
"Dependent_count":"Number of dependents",
"Education_Level":"Highest level of education",
"Marital_Status":"Married, Single, Divorced, Unknown",
"Income_Category":"Annual income bracket: Less than $40K, $40K - $60K, $60K - $80K, $80K - $120K, other",
"Card_Category":"Card types: Blue, Silver, Gold, Platinum",
"Months_on_book":"Length of relationship with the bank, in months",
"Months_Inactive_12_mon":"Number of months inactive over last 12 months",
"Contacts_Count_12_mon":"Number of contacts over last 12 months",
"Credit_Limit":"Credit limit",
"Total_Revolving_Bal":"Total revolving balance",
"Avg_Open_To_Buy":"Open to buy a credit line (Average TTM)",
"Total_Amt_Chng_Q4_Q1":"Change in transaction amount from Q4 to Q1",
"Total_Trans_Amt":"Total transaction amount (TTM)",
"Total_Trans_Ct":"Total number of transactions (TTM)",
"Total_Ct_Chng_Q4_Q1":"Change in total number of transactions from Q4 to Q1",
"Avg_Utilization_Ratio":"Credit card spending limit utilization percentage"}
| CLIENTNUM | Attrition_Flag | Customer_Age | Gender | Dependent_count | Education_Level | Marital_Status | Income_Category | Card_Category | Months_on_book | ... | Months_Inactive_12_mon | Contacts_Count_12_mon | Credit_Limit | Total_Revolving_Bal | Avg_Open_To_Buy | Total_Amt_Chng_Q4_Q1 | Total_Trans_Amt | Total_Trans_Ct | Total_Ct_Chng_Q4_Q1 | Avg_Utilization_Ratio |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 768805383 | Existing Customer | 45 | M | 3 | High School | Married | $60K - $80K | Blue | 39 | ... | 1 | 3 | 12691.0 | 777 | 11914.0 | 1.335 | 1144 | 42 | 1.625 | 0.061 |
1 | 818770008 | Existing Customer | 49 | F | 5 | Graduate | Single | Less than $40K | Blue | 44 | ... | 1 | 2 | 8256.0 | 864 | 7392.0 | 1.541 | 1291 | 33 | 3.714 | 0.105 |
2 | 713982108 | Existing Customer | 51 | M | 3 | Graduate | Married | $80K - $120K | Blue | 36 | ... | 1 | 0 | 3418.0 | 0 | 3418.0 | 2.594 | 1887 | 20 | 2.333 | 0.000 |
3 | 769911858 | Existing Customer | 40 | F | 4 | High School | Unknown | Less than $40K | Blue | 34 | ... | 4 | 1 | 3313.0 | 2517 | 796.0 | 1.405 | 1171 | 20 | 2.333 | 0.760 |
4 | 709106358 | Existing Customer | 40 | M | 3 | Uneducated | Married | $60K - $80K | Blue | 21 | ... | 1 | 0 | 4716.0 | 0 | 4716.0 | 2.175 | 816 | 28 | 2.500 | 0.000 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
10122 | 772366833 | Existing Customer | 50 | M | 2 | Graduate | Single | $40K - $60K | Blue | 40 | ... | 2 | 3 | 4003.0 | 1851 | 2152.0 | 0.703 | 15476 | 117 | 0.857 | 0.462 |
10123 | 710638233 | Attrited Customer | 41 | M | 2 | Unknown | Divorced | $40K - $60K | Blue | 25 | ... | 2 | 3 | 4277.0 | 2186 | 2091.0 | 0.804 | 8764 | 69 | 0.683 | 0.511 |
10124 | 716506083 | Attrited Customer | 44 | F | 1 | High School | Married | Less than $40K | Blue | 36 | ... | 3 | 4 | 5409.0 | 0 | 5409.0 | 0.819 | 10291 | 60 | 0.818 | 0.000 |
10125 | 717406983 | Attrited Customer | 30 | M | 2 | Graduate | Unknown | $40K - $60K | Blue | 36 | ... | 3 | 3 | 5281.0 | 0 | 5281.0 | 0.535 | 8395 | 62 | 0.722 | 0.000 |
10126 | 714337233 | Attrited Customer | 43 | F | 2 | Graduate | Married | Less than $40K | Silver | 25 | ... | 2 | 4 | 10388.0 | 1961 | 8427.0 | 0.703 | 10294 | 61 | 0.649 | 0.189 |
10127 rows × 21 columns
This data is sufficient to make progress on the problem because it combines background details about each client (such as age, education, and income) with specific details about their credit activity with the bank, and it labels each client as an existing or attrited customer in Attrition_Flag. With 10,127 records, the dataset is also large enough to train and evaluate a classifier.
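The sufficiency claim can be checked with a few quick summaries. The sketch below is one possible way to do that, not part of the original analysis: the helper name `summarize_sufficiency` is hypothetical, and the small `preview` frame only re-types the ten rows visible in the printed output above (three columns) so the example runs without the CSV file.

```python
import pandas as pd

def summarize_sufficiency(df: pd.DataFrame, target: str = "Attrition_Flag") -> dict:
    """Return the row count, the number of missing cells, and the share of each target class."""
    return {
        "n_rows": len(df),
        "n_missing": int(df.isna().sum().sum()),
        "class_share": df[target].value_counts(normalize=True).to_dict(),
    }

# Ten rows copied from the printed preview (indices 0-4 and 10122-10126).
preview = pd.DataFrame({
    "Attrition_Flag": ["Existing Customer"] * 6 + ["Attrited Customer"] * 4,
    "Customer_Age": [45, 49, 51, 40, 40, 50, 41, 44, 30, 43],
    "Total_Trans_Ct": [42, 33, 20, 20, 28, 117, 69, 60, 62, 61],
})

summary = summarize_sufficiency(preview)
print(summary)

# On the full dataset the same call would be:
# summarize_sufficiency(pd.read_csv('bank_data.csv'))
```

Two caveats apply. The class shares of the ten preview rows say nothing about the full dataset, so the call should be repeated on the loaded CSV. Also, placeholder strings such as "Unknown" (visible in Marital_Status and Education_Level) are ordinary values to `isna()`, so they are not counted as missing and would need a separate check.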
(1%) Write one or two sentences about how the data will be used to solve the problem. Earlier in the semester, we won’t have studied the Machine Learning methods just yet but you should have a general idea of what the ML will set out to do.
We will train a classifier on features such as age, income, length of relationship with the bank, credit limit, and utilization to predict which existing customers are likely to churn in the near future. Doing so will show which features are most strongly associated with attrition and which existing customers are most at risk, so that the credit card provider can make strategic retention decisions.
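Before any classifier can be trained, the table has to be turned into a numeric feature matrix and a binary label. The sketch below shows one way this preparation might look, along with a majority-class baseline that any later model should beat. It is an illustration under assumptions rather than the chosen method: the helper names `make_xy` and `majority_baseline_accuracy` are hypothetical, one-hot encoding via `pd.get_dummies` is just one reasonable encoding, and `preview` re-types a few columns of the ten rows shown in the printed output so the example runs without the CSV file.

```python
import pandas as pd

def make_xy(df: pd.DataFrame, target: str = "Attrition_Flag",
            positive: str = "Attrited Customer"):
    """Split a frame into numeric features X and a 0/1 label y (1 = attrited)."""
    y = (df[target] == positive).astype(int)
    # One-hot encode categorical columns; numeric columns pass through unchanged.
    X = pd.get_dummies(df.drop(columns=[target]))
    return X, y

def majority_baseline_accuracy(y: pd.Series) -> float:
    """Accuracy obtained by always predicting the most common class."""
    return float(y.value_counts(normalize=True).max())

# Ten rows copied from the printed preview (indices 0-4 and 10122-10126).
preview = pd.DataFrame({
    "Attrition_Flag": ["Existing Customer"] * 6 + ["Attrited Customer"] * 4,
    "Gender": ["M", "F", "M", "F", "M", "M", "M", "F", "M", "F"],
    "Customer_Age": [45, 49, 51, 40, 40, 50, 41, 44, 30, 43],
    "Credit_Limit": [12691.0, 8256.0, 3418.0, 3313.0, 4716.0,
                     4003.0, 4277.0, 5409.0, 5281.0, 10388.0],
    "Avg_Utilization_Ratio": [0.061, 0.105, 0.000, 0.760, 0.000,
                              0.462, 0.511, 0.000, 0.000, 0.189],
})

X, y = make_xy(preview)
baseline = majority_baseline_accuracy(y)
print(X.columns.tolist(), baseline)
```

On the full dataset, the identifier column CLIENTNUM should be dropped before calling `make_xy`, since a client ID carries no predictive meaning. The baseline matters because a model is only useful if it identifies attrited customers better than always predicting the majority class.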