Customer churn

(1%) Describes and motivates a real-world problem where data science may provide helpful insights. Your description should be easily understood by a casual reader and include citations to motivating sources or relevant information (e.g. news articles, further reading links … Wikipedia makes for a poor reference but the links it cites are usually promising).

This project proposal addresses sustainable business strategy for banks and credit card companies. One of the banking and credit industry's biggest problems is customer attrition, also called churn. Churn has been a persistent issue because solutions are difficult to implement in an ultra-competitive environment. According to Forbes, US-based credit card providers average a 20% annual churn rate, meaning that each year they lose one fifth of their existing customers for various reasons. For any business, not only credit card providers, a 20% churn rate is a major red flag. Many experts believe that reducing churn is more effective than spending effort on customer acquisition. One McKinsey & Company study estimates that efforts to reduce churn can increase earnings by up to 9%. Churn is an obstacle in many industries, but it hits credit card providers especially hard. One way to address high attrition is to apply predictive analytics so that providers can anticipate and react to attrition more effectively.

References:

Why Retaining Customers For Banks Is As Important As Winning New Ones - Forbes Article

A Smarter Way To Reduce Customer Churn - Forbes Article

(1%) Explicitly load and show your dataset. Provide a data dictionary which explains the meaning of each feature present. Demonstrate that this data is sufficient to make progress on your real-world problem described above.

In [4]:
import pandas as pd

# load the dataset and keep a handle on it for later cells
df = pd.read_csv('bank_data.csv')

# data dictionary for features in the dataset
features = {"CLIENTNUM":"Client ID",
            "Attrition_Flag":"Existing or former (attrited) customer",
            "Customer_Age":"Age",
            "Gender":"Gender",
            "Dependent_count":"Number of dependents",
            "Education_Level":"Highest level of education",
            "Marital_Status":"Married, Single, Divorced, Unknown",
            "Income_Category":"Income bracket, e.g. Less than $40K, $40K - $60K, $60K - $80K",
            "Card_Category":"Card types: Blue, Silver, Gold, Platinum",
            "Months_on_book":"Length of relationship with the bank, in months",
            "Months_Inactive_12_mon":"Number of months inactive over last 12 months",
            "Contacts_Count_12_mon":"Number of contacts over last 12 months",
            "Credit_Limit":"Credit limit",
            "Total_Revolving_Bal":"Total revolving balance",
            "Avg_Open_To_Buy":"Open-to-buy credit line (average, TTM)",
            "Total_Amt_Chng_Q4_Q1":"Change in transaction amount from Q4 to Q1",
            "Total_Trans_Amt":"Total transaction amount (TTM)",
            "Total_Trans_Ct":"Total number of transactions (TTM)",
            "Total_Ct_Chng_Q4_Q1":"Change in total number of transactions from Q4 to Q1",
            "Avg_Utilization_Ratio":"Credit card spending limit utilization percentage"}

# the last expression in a cell is what the notebook displays
df
Out[4]:
CLIENTNUM Attrition_Flag Customer_Age Gender Dependent_count Education_Level Marital_Status Income_Category Card_Category Months_on_book ... Months_Inactive_12_mon Contacts_Count_12_mon Credit_Limit Total_Revolving_Bal Avg_Open_To_Buy Total_Amt_Chng_Q4_Q1 Total_Trans_Amt Total_Trans_Ct Total_Ct_Chng_Q4_Q1 Avg_Utilization_Ratio
0 768805383 Existing Customer 45 M 3 High School Married $60K - $80K Blue 39 ... 1 3 12691.0 777 11914.0 1.335 1144 42 1.625 0.061
1 818770008 Existing Customer 49 F 5 Graduate Single Less than $40K Blue 44 ... 1 2 8256.0 864 7392.0 1.541 1291 33 3.714 0.105
2 713982108 Existing Customer 51 M 3 Graduate Married $80K - $120K Blue 36 ... 1 0 3418.0 0 3418.0 2.594 1887 20 2.333 0.000
3 769911858 Existing Customer 40 F 4 High School Unknown Less than $40K Blue 34 ... 4 1 3313.0 2517 796.0 1.405 1171 20 2.333 0.760
4 709106358 Existing Customer 40 M 3 Uneducated Married $60K - $80K Blue 21 ... 1 0 4716.0 0 4716.0 2.175 816 28 2.500 0.000
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
10122 772366833 Existing Customer 50 M 2 Graduate Single $40K - $60K Blue 40 ... 2 3 4003.0 1851 2152.0 0.703 15476 117 0.857 0.462
10123 710638233 Attrited Customer 41 M 2 Unknown Divorced $40K - $60K Blue 25 ... 2 3 4277.0 2186 2091.0 0.804 8764 69 0.683 0.511
10124 716506083 Attrited Customer 44 F 1 High School Married Less than $40K Blue 36 ... 3 4 5409.0 0 5409.0 0.819 10291 60 0.818 0.000
10125 717406983 Attrited Customer 30 M 2 Graduate Unknown $40K - $60K Blue 36 ... 3 3 5281.0 0 5281.0 0.535 8395 62 0.722 0.000
10126 714337233 Attrited Customer 43 F 2 Graduate Married Less than $40K Silver 25 ... 2 4 10388.0 1961 8427.0 0.703 10294 61 0.649 0.189

10127 rows × 21 columns

This data is sufficient to make progress on the proposal's main topic because it provides both background details about each client and specific details about their credit history with the bank. With 10,127 records, the dataset is also large enough to train and evaluate a classifier.
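The sufficiency claim can be checked with a few quick summaries. The sketch below is one possible way to do it, not part of the original notebook: `summarize_target` and `undocumented_columns` are hypothetical helper names, and `sample` is a small stand-in built from four rows and three columns of the output shown above. In the notebook these helpers would be applied to the full frame loaded from `bank_data.csv`.

```python
import pandas as pd


def summarize_target(df: pd.DataFrame, target: str = "Attrition_Flag") -> pd.DataFrame:
    """Return the count and share of each class in the target column."""
    counts = df[target].value_counts()
    shares = df[target].value_counts(normalize=True)
    return pd.DataFrame({"count": counts, "share": shares})


def undocumented_columns(df: pd.DataFrame, dictionary: dict) -> list:
    """List columns present in the data but missing from the data dictionary."""
    return [col for col in df.columns if col not in dictionary]


# Stand-in frame: values copied from rows 0, 1, 10123 and 10124 of the output
# above. This is an assumption for illustration; use the full frame in practice.
sample = pd.DataFrame({
    "Attrition_Flag": ["Existing Customer", "Existing Customer",
                       "Attrited Customer", "Attrited Customer"],
    "Customer_Age": [45, 49, 41, 44],
    "Credit_Limit": [12691.0, 8256.0, 4277.0, 5409.0],
})

print(summarize_target(sample))   # class balance of the target
print(sample.isna().sum())        # missing values per column

# Illustrative partial dictionary; the notebook's `features` dict would go here.
partial_dictionary = {"Attrition_Flag": "Existing or former customer",
                      "Customer_Age": "Age"}
print(undocumented_columns(sample, partial_dictionary))
```

A check like `undocumented_columns` may be worth running against the full frame, since the output reports 21 columns while the data dictionary lists 20 entries.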

Link to data source

(1%) Write one or two sentences about how the data will be used to solve the problem. Earlier in the semester, we won’t have studied the Machine Learning methods just yet but you should have a general idea of what the ML will set out to do.

We will train a classifier on features such as age, income, length of relationship with the bank, credit limit, and utilization to determine which existing customers are likely to become inactive in the near future. Doing so will show which features best indicate the potential for inactivity and which existing customers are most at risk, so that the credit card provider can make strategic decisions.
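As a rough illustration of what that classification step might look like, the sketch below fits a logistic regression with scikit-learn. The choice of model, the helper name `build_churn_model`, and the four feature columns are all assumptions made for illustration; the proposal does not commit to a particular method. The ten rows are copied from the output shown earlier and only demonstrate the mechanics, since a real run would use the full dataset with a held-out test set.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def build_churn_model(numeric_cols, categorical_cols):
    """Scale numeric features, one-hot encode categoricals, then classify."""
    preprocess = ColumnTransformer([
        ("num", StandardScaler(), numeric_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ])
    return Pipeline([
        ("preprocess", preprocess),
        ("classify", LogisticRegression(max_iter=1000)),
    ])


# Ten rows copied from the displayed output (first five and last five records).
# Far too few to learn from; they only show how the pieces fit together.
rows = pd.DataFrame({
    "Attrition_Flag": ["Existing Customer"] * 6 + ["Attrited Customer"] * 4,
    "Customer_Age": [45, 49, 51, 40, 40, 50, 41, 44, 30, 43],
    "Gender": ["M", "F", "M", "F", "M", "M", "M", "F", "M", "F"],
    "Total_Trans_Ct": [42, 33, 20, 20, 28, 117, 69, 60, 62, 61],
    "Avg_Utilization_Ratio": [0.061, 0.105, 0.000, 0.760, 0.000,
                              0.462, 0.511, 0.000, 0.000, 0.189],
})

X = rows.drop(columns="Attrition_Flag")
# Encode the target: 1 = attrited customer, 0 = existing customer.
y = (rows["Attrition_Flag"] == "Attrited Customer").astype(int)

model = build_churn_model(
    numeric_cols=["Customer_Age", "Total_Trans_Ct", "Avg_Utilization_Ratio"],
    categorical_cols=["Gender"],
)
model.fit(X, y)

# Estimated probability of attrition for each customer in X.
churn_probability = model.predict_proba(X)[:, 1]
print(churn_probability.round(3))
```

Ranking existing customers by `churn_probability` is one way to decide whom to contact first, and the fitted coefficients give a first, rough view of which features are associated with attrition.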