Fraud, inequality, and PPP Loans¶

Federal response towards COVID-19 (part 1)¶

The COVID-19 Pandemic is one of the defining moments of the 21st century, upending most people's day-to-day lives for nearly three years, with lasting consequences towards society as a whole. Even as the federal government moves to end the public health emergency this May, there seems to have been little reflection on the four trillion dollars that the federal government has spent on various programs to try to minimize the impact of the pandemic.

PPP Loans¶

The Paycheck Protection Program (PPP) was one of the largest pandemic releif programs, with over 10 million loans totaling over $800 billion.

PPP loans were meant to help carry businesses that wouldn't have stayed open otherwise through the pandemic, with the loans being forgiven if the business met certain criteria. However, the program was riddled with fraud, with an estimated $64 Billion in loans going to businesses that were ineligible for the program, or that were not in fact businesses at all. Additionally, PPP loans were distributed in a highley unequal manner, leaving many minority-owned businesses outmaneuvered by larger, more established businesses.

Despite the programs challenges, PPP loans have saved millions of jobs, arguably leaving America in a better economic position than many other developed nations.

The Problem (part 3)¶

The PPP program was a massive undertaking, and the sheer volume of data involved makes it difficult to analyze without using machine learning. Using clustering, we can group PPP loans into categories, finding patterns in the data that could signify fraudulent activity that wouldn't traditionally appear as fraud. Alternatively, we could use regression as to find out how much underrepresented groups were left out of the PPP program.

The PPP Dataset (part 2)¶

The dataset that the government released due to a FOIA request is a 13-part CSV file containing information on all PPP loans over 150 thousand dollars. It's somewhat terrifying. All can be found using SBA's website.

Columns¶

The dataset contains the following columns, with a description of each column provided by the SBA:

Field Name Field Description
LoanNumber Loan Number (unique identifier)
DateApproved Loan Funded Date
SBAOfficeCode SBA Origination Office Code
ProcessingMethod Loan Delivery Method (PPP for first draw; PPS for second draw)
BorrowerName Borrower Name
BorrowerAddress Borrower Street Address
BorrowerCity Borrower City
BorrowerState Borrower State
BorrowerZip Borrower Zip Code
LoanStatusDate Loan Status Date - Loan Status Date is blank when the loan is disbursed but not Paid In Full or Charged Off
LoanStatus Loan Status Description - Loan Status is replaced by 'Exemption 4' when the loan is disbursed but not Paid in Full or Charged Off
Term Loan Maturity in Months
SBAGuarantyPercentage SBA Guaranty Percentage
InitialApprovalAmount Loan Approval Amount(at origination)
CurrentApprovalAmount Loan Approval Amount (current)
UndisbursedAmount Undisbursed Amount
FranchiseName Franchise Name
ServicingLenderLocationID Lender Location ID (unique identifier)
ServicingLenderName Servicing Lender Name
ServicingLenderAddress Servicing Lender Street Address
ServicingLenderCity Servicing Lender City
ServicingLenderState Servicing Lender State
ServicingLenderZip Servicing Lender Zip Code
RuralUrbanIndicator Rural or Urban Indicator (R/U)
HubzoneIndicator Hubzone Indicator (Y/N)
LMIIndicator LMI Indicator (Y/N)
BusinessAgeDescription Business Age Description
ProjectCity Project City
ProjectCountyName Project County Name
ProjectState Project State
ProjectZip Project Zip Code
CD Project Congressional District
JobsReported Number of Employees
NAICSCode NAICS 6 digit code
Race Borrower Race Description
Ethnicity Borrower Ethnicity Description
UTILITIES_PROCEED Note: Proceed data is lender reported at origination. On the PPP application the proceeds fields were check boxes.
PAYROLL_PROCEED
MORTGAGE_INTEREST_PROCEED
RENT_PROCEED
REFINANCE_EIDL_PROCEED
HEALTH_CARE_PROCEED
DEBT_INTEREST_PROCEED
BusinessType Business Type Description
OriginatingLenderLocationID Originating Lender ID (unique identifier)
OriginatingLender Originating Lender Name
OriginatingLenderCity Originating Lender City
OriginatingLenderState Originating Lender State
Gender Gender Indicator
Veteran Veteran Indicator
NonProfit 'Yes' if Business Type = Non-Profit Organization or Non-Profit Childcare Center or 501(c) Non Profit
ForgivenessAmount Forgiveness Amount
ForgivenessDate Forgiveness Paid Date

The data itself needs some work to be fully useable. Some columns are a mix of numeric and string data, and many are simply missing data. Despite this, there are some interesting things to be found in the data.

Uses for the data¶

If building a regression model, it would be interesting to use gender, veteran status, race, and non-profit status as predictors for the amount of money a business asked for, received, and was forgiven. While it's harder to find the potential datapoints for clustering, I would imagine that the lender could play a big role.

Loading the Data¶

In [1]:
import pandas as pd
#importing the first of 13 datasets. all the datasets are in the same website as the first.
df_0 = pd.read_csv('https://data.sba.gov/dataset/8aa276e2-6cab-4f86-aca4-a7dde42adf24/resource/2b55e11d-7e75-4bbb-b526-69a06c0c4731/download/public_150k_plus_230101.csv')
In [2]:
df_0.head(5)
Out[2]:
LoanNumber DateApproved SBAOfficeCode ProcessingMethod BorrowerName BorrowerAddress BorrowerCity BorrowerState BorrowerZip LoanStatusDate ... BusinessType OriginatingLenderLocationID OriginatingLender OriginatingLenderCity OriginatingLenderState Gender Veteran NonProfit ForgivenessAmount ForgivenessDate
0 9547507704 05/01/2020 464 PPP SUMTER COATINGS, INC. 2410 Highway 15 South Sumter NaN 29150-9662 12/18/2020 ... Corporation 19248 Synovus Bank COLUMBUS GA Unanswered Unanswered NaN 773553.37 11/20/2020
1 9777677704 05/01/2020 464 PPP PLEASANT PLACES, INC. 7684 Southrail Road North Charleston NaN 29420-9000 09/28/2021 ... Sole Proprietorship 19248 Synovus Bank COLUMBUS GA Male Owned Non-Veteran NaN 746336.24 08/12/2021
2 5791407702 05/01/2020 1013 PPP BOYER CHILDREN'S CLINIC 1850 BOYER AVE E SEATTLE NaN 98112-2922 03/17/2021 ... Non-Profit Organization 9551 Bank of America, National Association CHARLOTTE NC Unanswered Unanswered Y 696677.49 02/10/2021
3 6223567700 05/01/2020 920 PPP KIRTLEY CONSTRUCTION INC 1661 MARTIN RANCH RD SAN BERNARDINO NaN 92407-1740 10/16/2021 ... Corporation 9551 Bank of America, National Association CHARLOTTE NC Male Owned Non-Veteran NaN 395264.11 09/10/2021
4 9662437702 05/01/2020 101 PPP AERO BOX LLC NaN NaN NaN NaN 08/17/2021 ... NaN 57328 The Huntington National Bank COLUMBUS OH Unanswered Unanswered NaN 370819.35 04/08/2021

5 rows × 53 columns