Empowering the Emerging Market Middle Class
Big Data is not Big Database
Jeff Stewart - CEO
Naveen Agnihotri, PhD - CTO

Lenddo - Data Driven NYC (27)


DESCRIPTION

Lenddo's CEO and CTO, Jeff Stewart and Naveen Agnihotri, presented at May's edition of Data Driven NYC, which focused on p2p lending.


Page 1


Page 2

“If you look 5 years out, every industry is going to be rethought in a social way”.

-Mark Zuckerberg, 2010

Page 3

LENDDO TECH FACTS

● Founded in January 2011
● Over 500K members around the world
● Integrated with Facebook, LinkedIn, Google, Yahoo, Twitter
● Service-oriented architecture (LAMP)
  ○ Front end (clients) in PHP
  ○ Services in PHP and Python
● Technical team based in NY and PH
● Data science team based in NY

Page 4

Finance in the Age of Social Networks

Lenddo maintains the world's largest opt-in TrustGraph for trustworthiness and risk management.

Lenddo is…

Social

Social sourcing & screening, peer enforcement

New data sets

Algorithms

Unprecedented processing power, real-time / ongoing risk management, targeting, underwriting & collections

Cloud

Rich risk analytic data set, unprecedented processing power

Global

Mobile

New datasets, 24/7 engagement

New cost structures

Page 5

Why Finance Works Better with Lenddo

DEMAND GENERATION
● Traditional: negative selection bias; costly
● With Lenddo: digital, fast and potentially viral; less expensive; social nature causes positive selection bias

UNDERWRITING
● Traditional: fact verification is time-consuming; scores incomplete or unavailable
● With Lenddo: reduced fraud and default; big data and powerful algorithms; larger addressable market; easily automatable

COLLECTIONS
● Traditional: no peer enforcement; labor-intensive; hard to maintain contact
● With Lenddo: potential for peer enforcement; lower cost; more points of contact

Page 6

ID verification is easier online

Source: http://www.kpcb.com/insights/2013-internet-trends

Page 7

Page 8

LENDDO SOCIAL DATA

● 100% of infrastructure on AWS
● Store social data from all online social networks
● Opt-in social data storage grows about 10 times faster than member data
● Social data currently about 3.5 TB
● Largest table (dataset) is > 2 billion records

Page 9

GOOD AND BAD BORROWERS

[Chart; n=1347]

Page 10

CLUSTERS


Page 11

LOAN SCORE IMPROVEMENT

[Chart legend: No NLP or network]

Page 12

LOAN SCORE IMPROVEMENT

[Chart legend: No NLP or network; With NLP and network]
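The slides do not say how the score improvement was measured. One common way to compare a baseline score against one enriched with NLP and network features is ROC AUC: the probability that a randomly chosen good loan outscores a randomly chosen bad one. A minimal pure-Python sketch, assuming label 1 marks a good loan and a higher score means lower risk; `roc_auc` and the toy scores are hypothetical, not Lenddo's data:

```python
def roc_auc(scores, labels):
    """ROC AUC: probability that a randomly chosen good loan (label 1)
    outscores a randomly chosen bad loan (label 0); ties count as half."""
    good = [s for s, y in zip(scores, labels) if y == 1]
    bad = [s for s, y in zip(scores, labels) if y == 0]
    if not good or not bad:
        raise ValueError("need at least one good and one bad loan")
    wins = 0.0
    for g in good:
        for b in bad:
            if g > b:
                wins += 1.0
            elif g == b:
                wins += 0.5
    return wins / (len(good) * len(bad))


# Toy data only (not Lenddo's results): 1 = good loan, 0 = bad loan.
labels = [1, 1, 0, 0]
baseline = [0.6, 0.4, 0.5, 0.3]  # hypothetical score without NLP/network features
enriched = [0.8, 0.6, 0.4, 0.2]  # hypothetical score with NLP/network features

print(roc_auc(baseline, labels), roc_auc(enriched, labels))  # 0.75 1.0
```

The pairwise loop is O(good × bad), which is fine for illustration; a production version would sort scores once instead.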

Page 13

WORD CLUSTERS


Words cluster closely together, and individual words can often be associated with good or bad loans.

Page 14

WORDS AND LOAN QUALITY

[Chart axes: % association with BAD loans vs. % association with GOOD loans]
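The deck does not define "% association". One plausible reading: for each word, the share of loans whose borrower text contains it that went bad versus stayed good. A minimal sketch under that assumption; `word_loan_association`, the `min_loans` cutoff, and the toy loans are hypothetical:

```python
from collections import defaultdict


def word_loan_association(loans, min_loans=1):
    """Map each word to (% of its loans that went bad, % that were good).

    loans: iterable of (text, is_good) pairs, one per loan.
    min_loans: skip words seen in fewer loans (small counts are noisy).
    """
    counts = defaultdict(lambda: [0, 0])  # word -> [bad_count, good_count]
    for text, is_good in loans:
        for word in set(text.lower().split()):  # count each word once per loan
            counts[word][1 if is_good else 0] += 1
    result = {}
    for word, (bad, good) in counts.items():
        total = bad + good
        if total >= min_loans:
            result[word] = (100.0 * bad / total, 100.0 * good / total)
    return result


# Toy data only, not Lenddo's.
loans = [
    ("thanks for the help", True),
    ("need help now", False),
    ("thanks again", True),
    ("need cash", False),
]
assoc = word_loan_association(loans, min_loans=2)
print(assoc["help"], assoc["thanks"], assoc["need"])  # (50.0, 50.0) (0.0, 100.0) (100.0, 0.0)
```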

Page 15

SOCIAL DATA STORAGE HISTORY

● Started with MongoDB for social data storage
● As use cases grew, we added indexes

Page 16

SOCIAL DATA STORAGE

[Diagram: User data, Social data]

Page 17

SOCIAL DATA STORAGE

[Diagram: Social data, User data]

Page 18

SOCIAL DATA STORAGE HISTORY

● We moved to larger and larger servers
  ○ At the last iteration, used a cr1.8xlarge server
  ○ 32 CPUs, 244 GB RAM
  ○ Still couldn’t keep up with index size
● Data acquisition speeds increased
  ○ Provisioned IOPS to the rescue!
● Total cost of social data storage: > $10,000 per month
● And we want to grow faster!

Page 19

SOCIAL DATA STORAGE HISTORY

● Simple queries (by index)
● Complex queries (by multiple indexes)
● Pull out all data for a member
● Aggregate all data for a member
● Calculate score for a member
● Aggregate all data for all members
● Calculate score for all members

Page 20

?

REVELATION: 2013

Page 21

It’s “BIG DATA”

not “BIG DATABASE”

REVELATION: 2013

Page 22

SOCIAL DATA REVAMP - 2013

● Moved all data to Amazon S3
● Data model remains largely unchanged
● Hadoop-compatible storage format
  ○ Avro format
  ○ Snappy compressed, chunked
● Created a small ‘cache’-type MongoDB
  ○ Stores recent data temporarily
● Using DynamoDB for longer-lived data that needs to be queried all the time
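The revamp stores data in S3 as compressed, chunked files. A minimal sketch of turning a day of cached records into such chunks, with stdlib stand-ins: JSON Lines instead of Avro and gzip instead of Snappy (both real formats need third-party libraries in Python). The `chunk_day` helper, the `social/dt=YYYY-MM-DD/part-NNNNN` key layout, and the 1,000-record chunk size are hypothetical, not Lenddo's:

```python
import gzip
import json
from datetime import date


def chunk_day(records, day, chunk_size=1000, prefix="social"):
    """Split one day's cached records into compressed chunks.

    Returns {object_key: compressed_bytes}. JSON Lines + gzip stand in
    for Avro + Snappy; the key layout is hypothetical.
    """
    chunks = {}
    for start in range(0, len(records), chunk_size):
        batch = records[start:start + chunk_size]
        body = "\n".join(json.dumps(r, sort_keys=True) for r in batch)
        key = f"{prefix}/dt={day.isoformat()}/part-{start // chunk_size:05d}.jsonl.gz"
        chunks[key] = gzip.compress(body.encode("utf-8"))
    return chunks


# Toy day of 2,500 cached records -> 3 chunks of at most 1,000 records each.
records = [{"member_id": n % 3, "source": "post"} for n in range(2500)]
chunks = chunk_day(records, date(2013, 5, 1))
for key in sorted(chunks):
    print(key)  # social/dt=2013-05-01/part-00000.jsonl.gz, then part-00001, part-00002
```

Each value could then be uploaded with boto3's S3 client `put_object(Bucket=..., Key=key, Body=data)`. Date-partitioned key prefixes keep one day's data together, so a map-reduce job can read a single day or the whole history by prefix.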

Page 23

NEW SOCIAL DATA USAGE

● Use the cache for data when it first arrives
  ○ Data is available for quick computations and
● Move data from cache to S3 at the end of the day
● Use EMR over S3 data for all aggregations
● Created an EMR-based map-reduce framework for the data science team
● Standard EMR jobs for common queries:
  ○ All social data for a member
  ○ Score one member
  ○ Score all members
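The standard EMR jobs follow the map-reduce pattern. A local, pure-Python sketch of the map, shuffle/sort, and reduce phases that Hadoop runs across a cluster; `mapper`, `reducer`, `run_job`, the `member_id`/`contact_id` fields, and the per-member aggregates standing in for a score are all hypothetical, since the deck does not describe Lenddo's jobs or scoring model:

```python
from itertools import groupby
from operator import itemgetter


def mapper(record):
    # Key every record by member so one reducer call sees all of that member's data.
    yield record["member_id"], record


def reducer(member_id, records):
    # Hypothetical placeholder "score": simple per-member aggregates.
    # The deck does not describe Lenddo's actual scoring model.
    records = list(records)
    contacts = {r["contact_id"] for r in records}
    return member_id, {"records": len(records), "distinct_contacts": len(contacts)}


def run_job(records, mapper, reducer):
    """Run map -> shuffle/sort -> reduce locally, as Hadoop would across a cluster."""
    mapped = [pair for record in records for pair in mapper(record)]
    mapped.sort(key=itemgetter(0))  # Hadoop's shuffle delivers values grouped by key
    return dict(
        reducer(key, (value for _, value in group))
        for key, group in groupby(mapped, key=itemgetter(0))
    )


records = [
    {"member_id": "m1", "contact_id": "a"},
    {"member_id": "m2", "contact_id": "a"},
    {"member_id": "m1", "contact_id": "b"},
    {"member_id": "m1", "contact_id": "a"},
]
all_members = run_job(records, mapper, reducer)  # "Score all members"
one_member = run_job([r for r in records if r["member_id"] == "m1"], mapper, reducer)  # "Score one member"
print(all_members)
```

Scoring one member is the same job over a filtered input, which is why a single framework can serve all three standard queries.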

Page 24

WHAT DID WE GAIN?

● Peace of mind
  ○ No more database maintenance
  ○ No more periodic server upgrades
● Scalability
  ○ Storage and access remain identical for the next 10x growth
● $$$
  ○ New cost: < $3,000 per month (70% less!)
  ○ Includes EMR clusters running routine jobs
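The 70% figure follows from the two monthly costs quoted in the deck (> $10,000 before, < $3,000 after). Because both numbers are bounds, 70% is a lower bound on the savings:

```latex
$$
\text{savings} = 1 - \frac{C_{\text{new}}}{C_{\text{old}}}, \qquad
C_{\text{old}} > \$10{,}000,\; C_{\text{new}} < \$3{,}000
\;\Rightarrow\;
\frac{C_{\text{new}}}{C_{\text{old}}} < \frac{3{,}000}{10{,}000} = 0.3
\;\Rightarrow\;
\text{savings} > 70\%
$$
```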

Page 25

Thanks!