Project data analysis

Project Data Analysis Coursera course

Title: Increased earthquake depth is associated with increased magnitude
Beca Marușa

Title: Decreased FICO score is associated with increased interest rate

Introduction:

Lending Club is an online financial community that brings together creditworthy borrowers and savvy investors so that both can benefit financially [1]. It allows its members to directly invest in and borrow from each other and so avoid the cost and complexity of the banking system.

On the Lending Club site there are several files that contain complete loan data, including the current loan status and latest payment information [2]. The data used in this analysis represent a sample of 2,500 peer-to-peer loans issued by the Lending Club, described through 14 variables such as monthly income, amount requested, FICO range (a range indicating the applicant's FICO score) [3], and inquiries in the last six months. The goal of this analysis is to establish whether there is any correlation between the outcome variable (the interest rate of the loans) and the other variables, especially the FICO score, which is a measure of the creditworthiness of the applicant.

In this project we performed an analysis to determine if there was a significant association between the interest rate and the FICO score. Using exploratory analysis and standard multiple regression techniques we show that there is a significant negative relationship between the interest rate and the FICO score, even after adjusting for important confounders such as the length of the loan, the amount funded by the investors and the amount requested by the borrowers.

Our analysis suggests that there is a significant, negative association between interest rate and FICO score. We estimate the relationship using a linear model that relates interest rate (in percent) to FICO score (in points). There appears to be a strong inverse relationship between the two variables.

Our results suggest that other variables, such as loan length, amount requested by the borrower and amount funded by the investors, are associated with both interest rate and FICO score. Including these variables in the regression model relating interest rate to FICO score improves the model fit, but does not remove the significant negative relationship between the two variables.

Methods:

Data Collection

For our analysis we used the loans data from the Lending Club site covering 2007 to 2011. The data were downloaded from lendingclub.com on November 16, 2013 using the R programming language [3].

Exploratory Analysis

Exploratory analysis was performed by examining tables and plots of the observed data. We identified transformations to perform on the raw data on the basis of plots and knowledge of the scale of the measured variables. Exploratory analysis was used to (1) identify missing values, (2) verify the quality of the data, and (3) determine the terms used in the regression model relating interest rate to FICO score.

Statistical Modeling

To relate interest rate to FICO score we fitted a standard multiple linear regression model [4]. Model selection was performed on the basis of our exploratory analysis and prior knowledge of the relationship between interest rate and FICO score, the amount of the loan requested and the length of the loan. Coefficients were estimated with ordinary least squares and standard errors were calculated using standard asymptotic approximations [5].

Reproducibility

All analyses performed in this manuscript are reproduced in the R markdown file loansdata.Rmd [6]. To reproduce the exact results presented in this manuscript, the cached version of the analysis must be used.

Results:

The loans data used in this analysis contain information on: the amount requested by the borrower (Amount.Requested); the amount funded by the investors (Amount.Funded.By.Investors); the lending interest rate (Interest.rate); the length of the loan in months (Loan.Length); the purpose of the loan as stated by the applicant (Loan.Purpose); the percentage of the consumer's gross income that goes toward paying debts (Debt.To.Income.Ratio); the U.S. state of residence of the loan applicant (State); the ownership type of the home (Home.Ownership); the monthly income of the applicant in dollars (Monthly.income); a range indicating the applicant's FICO score (FICO.range); the number of open lines of credit the applicant had at the time of application (Open.CREDIT.Lines); the total amount outstanding on all lines of credit (Revolving.CREDIT.Balance); the number of authorized queries about the creditworthiness of the applicant in the 6 months before the loan was issued (Inquiries.in.the.Last.6.Months); and the length of time employed at the current job (Employment.Length) [5].

We identified 77 missing values for Employment.Length, one missing value for Monthly.income, and 2 missing values each for Open.CREDIT.Lines, Revolving.CREDIT.Balance and Inquiries.in.the.Last.6.Months.

Three measured variables had values outside the standard ranges. Home.Ownership takes five values (none, other, owns, rents, or has a mortgage), although only three are expected (owns, rents, or has a mortgage). Amount.Funded.By.Investors has 2 negative values and 4 values of 0. Debt.To.Income.Ratio has 8 values of 0%, which we consider should be removed because the variable represents the percentage of the consumer's gross income that goes toward paying the loans that were approved.

After removing the missing values and the observations that were outside the standard ranges, the data set has 2403 observations and 14 variables.

From the barplot of the variable FICO range we can see that the distribution is positively skewed with a long right tail (figure 1).

Figure 1. Histogram of FICO Range

The histogram of the interest rate shows an approximately normal distribution with a mean of about 13% (figure 2). The majority of the loans granted had an interest rate between 10.2% and 15.8%.

Figure 2. Histogram of Interest rate

From boxplots of the interest rate against the factor variables we observed that the borrower's monthly income, employment length, type of home ownership and state of residence have no visible effect on the interest rate of the loan granted. The variables Loan Purpose, Open Credit Lines, Revolving Credit Balance, Inquiries in the last 6 months and Debt to income ratio show little correlation with the interest rate. The potential confounders are the length of the loan, the amount funded by the investors and the amount requested by the borrowers.

We decided to transform the variable FICO range into the variable FICO score, which is the average of the lower and upper bounds of the FICO range for each loan. Subsequent analyses focus on this transformed FICO score variable. From the boxplot of FICO range and interest rate we can observe a strongly negative association between the two (figure 3). The correlation coefficient between the interest rate and FICO score is -0.71.

Figure 3. Boxplot of Interest Rate by FICO Range

We first fit a regression model relating interest rate to FICO score (figure 4). The multiple R-squared is 50.3%, which is the square of the correlation coefficient (0.71^2 is about 0.50), as expected for a regression with a single predictor. FICO score alone therefore explains about half of the variation in interest rate; the remaining 49.7% is not explained by this model and may partly be due to confounders.

Figure 4. The relationship between the Interest Rate and FICO score

The correlation coefficient between the amount funded by the investors and the interest rate is 0.33. The coefficient is the same for the amount requested by the borrowers, and the correlation between the interest rate and the loan length is 0.42.
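The FICO score transformation and the correlation figures above can be illustrated with a minimal Python sketch. The original analysis was done in R, so this is only an illustration and not the author's code. The helper names (fico_midpoint, pearson_r, simple_r_squared) and the six sample rows are invented for the example and are not taken from the Lending Club data. The sketch also checks a general property of single-predictor regression: its R-squared equals the squared correlation coefficient, which is why a correlation of -0.71 goes with an R-squared of about 50%.

```python
from math import sqrt


def fico_midpoint(fico_range):
    """Return the midpoint of a FICO range string such as "735-739"."""
    low, high = (int(part) for part in fico_range.split("-"))
    return (low + high) / 2


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def simple_r_squared(xs, ys):
    """R-squared of the least-squares line y = a + b*x, from its residuals."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Invented rows for illustration only; these are NOT Lending Club records.
fico_ranges = ["735-739", "715-719", "690-694", "695-699", "670-674", "760-764"]
interest_rates = [8.90, 12.12, 21.98, 9.99, 15.31, 6.03]

fico_scores = [fico_midpoint(r) for r in fico_ranges]
r = pearson_r(fico_scores, interest_rates)
r_squared = simple_r_squared(fico_scores, interest_rates)

# For one predictor, r squared and R-squared agree up to rounding error.
print(f"r = {r:.3f}, r^2 = {r * r:.3f}, R^2 = {r_squared:.3f}")
```

On real data the same two quantities would be computed on the cleaned 2403 observations instead of the toy rows.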
For the model with FICO score as the only predictor, the mean of the residuals is approximately 0, the variance is 8.6, and their distribution is roughly normal with a positive skew (figure 5).

Figure 5. Residuals distribution for the linear model

Residuals show patterns of non-random variation (figure 6). We attempted to explain those patterns by fitting models including potential confounders.

Figure 6. The variation of residuals

Our final regression model was:

Interest.Rate = b0 + b1*FICO.score + b2*Amount.Funded.By.Investors + b3*Amount.Requested + f(Length.Loan) + e,

where b0 is an intercept term and b1 represents the change in interest rate associated with a change of one unit in FICO score at the same amount funded by investors, amount requested by borrowers and loan length. The coefficients b2 and b3 play the same role for the amount funded by investors and the amount requested. The term f(Length.Loan) represents a factor model with two different levels. This model explains 75% of the variation in interest rate. The P-values show that all the coefficients are statistically significant.

The error term e represents all sources of unmeasured and unmodeled random variation in interest rate. Our final regression model appeared to remove most of the non-random patterns of variation in the residuals. The residuals of the multivariate linear model are approximately normally distributed with mean 0 and variance 4.38 (figure 7).

Figure 7. Residuals distribution for multivariate linear regression

From figure 8 we notice that the residual variation for the multivariate linear model is smaller and resembles white noise.

Figure 8. Variation of residuals for multivariate linear model

We observed a highly statistically significant (P = 2e-16) association between interest rate and FICO score.
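As a hedged illustration of the final model form, the sketch below fits an ordinary least squares regression with a two-level loan-length factor coded as a 0/1 indicator. It is written in Python using only the standard library, whereas the original analysis used R, so it is not the author's code. The data are synthetic and noise-free. Only the slope -0.08 echoes the estimate reported in the text; the other coefficients in TRUE are invented for the example. The function name ols, the 60-month level and the amounts in thousands of dollars are likewise assumptions.

```python
import random


def ols(rows, y):
    """Least-squares fit of y on the columns of rows plus an intercept.

    Solves the normal equations on mean-centered data with Gaussian
    elimination and returns [intercept, slope_1, ..., slope_p].
    """
    n, p = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(p)]
    y_mean = sum(y) / n
    xc = [[row[j] - means[j] for j in range(p)] for row in rows]
    yc = [value - y_mean for value in y]
    # Augmented matrix [Xc'Xc | Xc'yc] of the normal equations.
    a = [[sum(xc[i][j] * xc[i][k] for i in range(n)) for k in range(p)]
         + [sum(xc[i][j] * yc[i] for i in range(n))] for j in range(p)]
    for col in range(p):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, p):
            factor = a[r][col] / a[col][col]
            for c in range(col, p + 1):
                a[r][c] -= factor * a[col][c]
    slopes = [0.0] * p
    for r in range(p - 1, -1, -1):  # back substitution
        known = sum(a[r][c] * slopes[c] for c in range(r + 1, p))
        slopes[r] = (a[r][p] - known) / a[r][r]
    intercept = y_mean - sum(b * m for b, m in zip(slopes, means))
    return [intercept] + slopes


# Hypothetical coefficients; only the FICO slope mirrors the reported value.
TRUE = {"b0": 70.0, "fico": -0.08, "funded": 0.10, "requested": 0.05, "term60": 3.0}

rng = random.Random(0)
rows, rates = [], []
for _ in range(50):
    fico = rng.randrange(642, 830, 5)          # FICO score midpoints
    requested = rng.uniform(1.0, 35.0)         # thousands of dollars (assumed unit)
    funded = requested * rng.uniform(0.7, 1.0)
    term60 = rng.choice([0, 1])                # 1 = longer of the two loan lengths
    rows.append([fico, funded, requested, term60])
    rates.append(TRUE["b0"] + TRUE["fico"] * fico + TRUE["funded"] * funded
                 + TRUE["requested"] * requested + TRUE["term60"] * term60)

coefficients = ols(rows, rates)
names = ["intercept", "FICO.score", "Amount.Funded", "Amount.Requested", "Length.60"]
for name, value in zip(names, coefficients):
    print(f"{name:>16}: {value: .4f}")
```

Because the synthetic rates contain no noise, the fit recovers the generating coefficients. With real loans the estimates would carry standard errors and confidence intervals, which this sketch does not compute.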
A one-point increase in FICO score corresponded to a change of b1 = -0.08 percentage points in interest rate (95% Confidence Interval: -0.088, -0.081). For example, for two loans with the same loan length, amount requested by the borrower and amount funded by the investors, we would expect the interest rate to be about 0.08 percentage points higher for each one-point decrease in FICO score, or roughly 0.8 percentage points higher for a 10-point decrease.

Conclusions:

Our analysis suggests that there is a significant, negative association between interest rate and FICO score. We estimated the relationship using a linear model that relates interest rate (in percent) to FICO score (in points). There appears to be a strong inverse relationship between the two variables.

We also observed that other variables, such as loan length, amount requested by the borrower and amount funded by the investors, are associated with both interest rate and FICO score. Including these variables in the regression model relating interest rate to FICO score improves the model fit, but does not remove the significant negative relationship between the two variables.

Our analysis may be of interest to both investors and borrowers. Investors are interested in selecting potential borrowers on the financial market at a low cost, in establishing a fair interest rate and, in consequence, in building an efficient portfolio with a high rate of return. Borrowers are likewise interested in obtaining better interest rates at low cost. The analysis could also be of interest to the Lending Club in supporting its members in selecting suitable partners.

References

1. LendingClub Corporation. URL: https://www.lendingclub.com/public/about-us.action Accessed 09/16/2014.
2. LendingClub Corporation. URL: https://www.lendingclub.com/info/download-data.action Accessed 09/16/2014.
3. URL: http://en.wikipedia.org/wiki/Credit_score_in_the_United_States
4. LendingClub Corporation. URL: https://spark-public.s3.amazonaws.com/dataanalysis/loansData.csv Accessed 09/16/2014.
5. URL: https://spark-public.s3.amazonaws.com/dataanalysis/loansCodebook.pdf
6. R Markdown Page. URL: http://www.rstudio.com/ide/docs/authoring/using_markdown. Accessed 09/16/2014.