Lenddo - Data Driven NYC (27)

Lenddo's CEO and CTO, Jeff Stewart and Naveen Agnihotri, presented at May's edition of Data Driven NYC, which focused on p2p lending.

Empowering the Emerging Market Middle Class

Big Data is not Big Database

Jeff Stewart - CEO
Naveen Agnihotri, PhD - CTO

“If you look 5 years out, every industry is going to be rethought in a social way.”

-Mark Zuckerberg, 2010

● Founded in January 2011
● Over 500K members around the world
● Integrated with Facebook, LinkedIn, Google, Yahoo, Twitter
● Service-oriented architecture (LAMP)
  ○ Front end (clients) in PHP
  ○ Services in PHP and Python
● Technical team based in NY and PH
● Data science team based in NY

LENDDO TECH FACTS

Finance in the Age of Social Networks

Lenddo maintains the world's largest opt-in TrustGraph for trustworthiness and risk management

Lenddo is…

Social
● Social sourcing & screening
● Peer enforcement
● New data sets

Algorithms
● Unprecedented processing power
● Real-time / ongoing risk management
● Targeting, underwriting & collections

Cloud
● Rich risk analytic data set
● Unprecedented processing power

Global

Mobile
● New datasets
● 24/7 engagement
● New cost structures

Why Finance Works Better with Lenddo

DEMAND GENERATION
Traditional:
• Negative selection bias
• Costly
With Lenddo:
• Digital, fast and potentially viral
• Less expensive
• Social nature causes positive selection bias

UNDERWRITING
Traditional:
• Fact verification time consuming
• Scores incomplete or unavailable
With Lenddo:
• Reduced fraud and default
• Big data and powerful algorithms
• Larger addressable market
• Easily automatable

COLLECTIONS
Traditional:
• No peer enforcement
• Labor intensive
• Hard to maintain contact
With Lenddo:
• Potential for peer enforcement
• Lower cost
• More points of contact

Source: http://www.kpcb.com/insights/2013-internet-trends

ID Verification is easier online

● 100% infrastructure on AWS
● Store social data from all online social networks
● Opt-in social data storage grows about 10 times faster than member data
● Social data currently about 3.5 TB
● Largest table (dataset) is > 2 billion records

LENDDO SOCIAL DATA

GOOD AND BAD BORROWERS

n=1347

CLUSTERS

LOAN SCORE IMPROVEMENT

Chart panels: No NLP or network; With NLP and network

WORD CLUSTERS

Words associate closely together, and can be commonly associated with good or bad loans.

WORDS AND LOAN QUALITY

% Association with BAD loans

% Association with GOOD loans
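
To make the "% association" idea concrete, here is a minimal Python sketch. The slides do not say how the percentages were computed, so the share-of-loans measure, the `word_loan_association` helper, and the toy loan texts are all assumptions for illustration only.

```python
from collections import Counter

def word_loan_association(loans):
    """Return {word: (pct_good, pct_bad)} over the loans that mention the word.

    `loans` is an iterable of (text, is_good) pairs. A word is counted once
    per loan, so the two percentages for a word always sum to 100.
    """
    good, bad = Counter(), Counter()
    for text, is_good in loans:
        words = set(text.lower().split())  # count each word once per loan
        (good if is_good else bad).update(words)
    result = {}
    for word in set(good) | set(bad):
        total = good[word] + bad[word]
        result[word] = (100.0 * good[word] / total, 100.0 * bad[word] / total)
    return result

# Hypothetical toy data, not taken from the talk.
sample_loans = [
    ("paid tuition on time", True),
    ("tuition for school", True),
    ("urgent cash needed", False),
    ("need cash for tuition", False),
]
association = word_loan_association(sample_loans)
print(association["tuition"])  # share of good vs. bad loans mentioning "tuition"
print(association["cash"])     # share of good vs. bad loans mentioning "cash"
```

With real data, the word counts would come from opted-in social text rather than a hand-written list, and a word seen in only a handful of loans would need smoothing before its percentage meant anything.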

● Started with MongoDB for social data storage
● As use cases grew, we added indexes

SOCIAL DATA STORAGE HISTORY

SOCIAL DATA STORAGE

User data

Social data

● We moved to larger and larger servers
  ○ At last iteration, used cr1.8xlarge server
  ○ 32 CPUs, 244 GB RAM
  ○ Still couldn’t keep up with index size
● Data acquisition speeds increased
  ○ Provisioned IOPS to the rescue!
● Total cost of social data storage: > $10,000 per month
● And we want to grow faster!

SOCIAL DATA STORAGE HISTORY

● Simple queries (by index)
● Complex queries (by multiple indexes)
● Pull out all data for a member
● Aggregate all data for a member
● Calculate score for a member
● Aggregate all data for all members
● Calculate score for all members

SOCIAL DATA STORAGE HISTORY

?

REVELATION: 2013

It’s “BIG DATA”

not “BIG DATABASE”

REVELATION: 2013

● Moved all data to Amazon S3
● Data model remains largely unchanged
● Hadoop compatible storage format
  ○ Avro format
  ○ Snappy compressed, chunked
● Created a small ‘cache’ type MongoDB
  ○ Stores recent data temporarily
● Using DynamoDB for longer-lived data that needs to be queried all the time

SOCIAL DATA REVAMP - 2013
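
Why "compressed, chunked" matters for Hadoop: when each chunk is compressed on its own, a worker can read one chunk without touching the rest of the file. The sketch below illustrates only that idea. It uses Python's standard `zlib` and JSON lines as stand-ins for Snappy and Avro, and the helper names and record fields are hypothetical, not Lenddo's schema.

```python
import json
import zlib

def write_chunks(records, records_per_chunk=2):
    """Split records into fixed-size groups and compress each group separately."""
    chunks = []
    for start in range(0, len(records), records_per_chunk):
        group = records[start:start + records_per_chunk]
        # One JSON document per line (stand-in for Avro serialization).
        payload = "\n".join(json.dumps(r, sort_keys=True) for r in group)
        # zlib is a stand-in for Snappy here.
        chunks.append(zlib.compress(payload.encode("utf-8")))
    return chunks

def read_chunk(chunk):
    """Decompress and parse one chunk without reading any other chunk."""
    text = zlib.decompress(chunk).decode("utf-8")
    return [json.loads(line) for line in text.split("\n")]

# Hypothetical records for illustration.
records = [{"member_id": i, "network": "facebook", "friends": 10 * i} for i in range(5)]
chunks = write_chunks(records)
restored = [r for c in chunks for r in read_chunk(c)]
print(len(chunks), restored == records)
```

In a real Avro container file the block boundaries and codec are handled by the Avro library, so none of this bookkeeping is written by hand.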

● Use the cache for data when it first arrives
  ○ Data is available for quick computations and
● Move data from cache to S3 at the end of the day
● Use EMR over S3 data for all aggregations
● Created an EMR-based map-reduce framework for data science team
● Standard EMR jobs for common queries:
  ○ All social data for a member
  ○ Score one member
  ○ Score all members

NEW SOCIAL DATA USAGE
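
The daily flow on this slide (cache first, archive at end of day, map-reduce over the archive) can be sketched in a few lines of plain Python. Everything here is an in-memory stand-in: a list plays the MongoDB cache, a dict plays S3, and `map_reduce` plays an EMR job. The function names, record fields, and the "sum connections per member" job are assumptions, not the framework described in the talk.

```python
from collections import defaultdict

def flush_cache_to_archive(cache, archive, day):
    """End-of-day step: move every cached record into the archive under `day`."""
    archive.setdefault(day, []).extend(cache)
    cache.clear()

def map_reduce(records, mapper, reducer):
    """Apply mapper to each record, group emitted values by key, reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}

# Hypothetical job: total connections per member across all networks.
def connections_mapper(record):
    yield record["member_id"], record["connections"]

def sum_reducer(key, values):
    return sum(values)

cache = [
    {"member_id": "m1", "network": "facebook", "connections": 120},
    {"member_id": "m2", "network": "twitter", "connections": 40},
    {"member_id": "m1", "network": "linkedin", "connections": 30},
]
archive = {}
flush_cache_to_archive(cache, archive, "2013-05-01")

all_records = [r for day_records in archive.values() for r in day_records]
totals = map_reduce(all_records, connections_mapper, sum_reducer)
print(totals)
```

In this sketch, "one member" versus "all members" is the same job with or without a filter on `member_id` in the mapper, which is one reason a single batch framework can cover all three standard queries.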

● Peace of mind
  ○ No more database maintenance
  ○ No more periodic server upgrades
● Scalability
  ○ Storage and access remain identical for the next 10x growth
● $$$
  ○ New cost: < $3,000 per month: 70% less!
  ○ Includes EMR clusters running routine jobs

WHAT DID WE GAIN?
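
A quick check of the "70% less" figure, using the two bounds quoted on the slides (more than $10,000 before, less than $3,000 after). The helper name is hypothetical.

```python
def percent_saving(old_cost, new_cost):
    """Percentage reduction going from old_cost to new_cost."""
    return 100.0 * (old_cost - new_cost) / old_cost

old_monthly = 10_000  # slide: "> $10,000 per month" (lower bound)
new_monthly = 3_000   # slide: "< $3000 per month" (upper bound)
saving = percent_saving(old_monthly, new_monthly)
print(saving)
```

Because the old cost is a lower bound and the new cost is an upper bound, the figure computed from these two numbers is a floor on the actual saving.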

Thanks!
