View
5.696
Download
0
Category
Preview:
DESCRIPTION
Citation preview
© tresata all rights reserved
Apache Hadoop and the Big Data Opportunity in Banking
© Hortonworks Inc. 2011
A webcast from Hortonworks & Tresata
Presenters
• Arun C. Murthy − Co-‐founder of Hortonworks − Lead of NextGen MapReduce in Apache Hadoop − Long-‐Jme contributor and commiKer to Apache Hadoop
• Abhishek (Abhi) Mehta − Co-‐founder of Tresata − Creator of first Hadoop-‐powered big data & analyJcs plaOorm for financial industry data
3
Agenda
• What is Apache Hadoop?
• Creating Value from Big Data
• Tresata and Hadoop for Banking
• Future of Hadoop
4
Key A9ributes • Reliable and redundant – Doesn’t slow down or loose data even as hardware fails • Very powerful – Harnesses huge clusters, supports best of breed analyJcs • Scalable – scales linearly to handle “big data” volumes • Cost-‐effecDve – runs on commodity machines & network • Simple and flexible APIs – enabling a large ecosystem of soluJon providers
What is Apache Hadoop?
A set of open source projects owned by the Apache FoundaJon that transforms commodity computers and network into a distributed service
• HDFS – Stores petabytes of data reliably • MapReduce – Allows huge distributed
computaJons
5
Core Apache Hadoop Projects
Programming Languages
Computation
Object Storage
Zookeepe
r (Coo
rdinaJ
on)
Core Apache Hadoop Related Apache Projects
HDFS (Hadoop Distributed File System)
MapReduce (Distributed Programing Framework)
Hive (SQL)
Pig (Data Flow)
HBase (Columnar Storage)
HCatalog (Meta Data)
HMS
(Managem
ent)
Table Storage
6
Apache Hadoop: Why is it TransformaDonal?
Data Deluge (growth faster than Moore’s law) • Economist: Only 5% of generated data is structured
• Gartner: Data growth is the biggest data center hardware infrastructure challenge for large enterprises
• Forrester: Four Vs -‐ Volume, Velocity, Variety, Variability
• Hundreds of exabytes of data per year!
7
Source: IDC Digital Universe Study, May 2011
Apache Hadoop: Why is it TransformaDonal?
New way of thinking about your data:
• Current − What data do I keep? − What reports do I run? − Sample – Store – Extrapolate à What-‐If scenarios à Variable insights
• New dawn
− Store everything (viable and economical) – Process whenever! − Test every what-‐if situaJon, prove every hypothesis… do it in a Jmely manner! − No more down-‐sampled data
8
5 Ways to Create Value from Big Data
9
Source: McKinsey & Company report. Big data: The next fronJer for innovaJon, compeJJon, and producJvity. May 2011.
Crossing the Chasm
DisrupJon: Data
10
Geoffrey A. Moore*
Typical ApplicaDons & Early Adopters
11
advertising optimization mail anti-spam
video & audio processing ad selection
web search
user interest prediction
customer trend analysis
analyzing web logs
content optimization
data analytics
machine learning
data mining
text mining
social media
Big Data Has Reached Every Market Sector
Source: McKinsey & Company report. Big data: The next fronJer for innovaJon, compeJJon, and producJvity. May 2011.
Digital data is personal, everywhere, increasingly accessible, and will con:nue to grow exponen:ally
Big Data Value CreaDon OpportuniDes Financial Services • Detect fraud • Model and manage risk • Improve debt recovery rates • Personalize banking/insurance products
Healthcare • OpJmal treatment pathways • Remote paJent monitoring • PredicJve modeling for new drugs • Personalized medicine
Retail • In-‐store behavior analysis • Cross selling • OpJmize pricing, placement, design • OpJmize inventory and distribuJon
Web / Social / Mobile • LocaJon-‐based markeJng • Social segmentaJon • SenJment analysis • Price comparison services
Manufacturing • Design to value • Crowd-‐sourcing • “Digital factory” for lean manufacturing • Improve service via product sensor data
Government • Reduce fraud • Segment populaJons, customize acJon • Support open data iniJaJves • Automate decision making
13
Why can you bank on Hortonworks?
• Architects of Hadoop since big bang, circa 2006 • Real world experience supporJng the largest Hadoop install
in the world: − 50,000 node footprint − Over 200PB of data − 24x7 service − Billions of ad dollars
• We’ve taken the 3am calls to fix stuff when it breaks!
14
15
Tresata = Hadoop for Banking
What We Believe
16
1. Data will reboot the financial service industry
2. Data growth is viral…exisJng tools can’t keep up
3. The economics to store, process, analyze and visualize all of your data makes it a ‘no-‐brainer’ to do so
4. Big Data capabiliDes are needed to address business problems
and What McKinsey Said
17
most players are smaller, local businesses with a limited ability to gather and analyze information.
A !nal note: this analysis is a snap- shot in time for one large country. As companies and organizations sharpen their data skills, even low-ranking sectors (by our gauges of value potential and data capture), such as construction and education, could see their fortunes change.
Q4 2011Big data sidebar on sector productivityExhibit 1 of 1
Big
dat
a: e
ase-
of-c
aptu
re in
dex1
Big data: value potential index1
The ease of capturing big data’s value, and the magnitude of its potential, vary across sectors.
Utilities
Manufacturing
Health care providersNatural resources
Example: US economy
Professional services
Construction
Administrative services
Management of companies
Computers and other electronic products
Transportation and warehousingReal estate
Wholesale trade
Finance and insurance
Information
Other services
Retail trade
Accommodation and food
Educational services
Arts and entertainment
Government
High
Low High
! "#$%&'()*+'&%',-+*.)(*#/%#0%1'($*.23%2''%)--'/&*,%*/%4.5*/2'6%7+#8)+%9/2(*(:('%0:++%$'-#$(%!"#$%&'&($)*+$,+-'$./0,'"+/$.0/$",,01&'"0,2$3045+'"'"0,2$&,%$5/0%63'"1"'73%);)*+)8+'%0$''%#0%.<)$='%#/+*/'%)(%1.>*/2'6?.#1@1=*?
A#:$.'B%CA%D:$'):%#0%E)8#$%A()(*2(*.2F%4.5*/2'6%7+#8)+%9/2(*(:('%)/)+62*2%%
Size of bubble indicates relative contribution to GDP
6
1 The big data value potential index takes into account a sector’s competitive conditions, such as market turbulence and performance variability; structural factors, such as transaction intensity and the number of potential customers and business partners; and the quantity of data available. The ease-of-capture index takes stock of the number of employees with deep analytical talent in an industry, baseline investments in IT, the accessibility of data sources, and the degree to which managers make data-driven decisions.
Source: mcksinsey global insJtute october 2011
The Opportunity
18
1. 1-‐5% of data in a financial insJtuJon is analyzed
2. Not all data is stored
3. Top-‐down macros cannot be implemented or acted on
4. New approaches to data & analyDcs are essenJal
Source: tresata research
The Business Problems
19
1. Storage & retrieval – archive data on disk
2. Consumer behavior analysis – as it happens
3. Modeling & AnalyDcs – model enJre data sets…
4. Single View Of Customer – structured & unstructured data
Hadoop in Banking Today
1. Early adopter, like any other technology trend a. Interest is global, not just limited to the US b. Approved for use by most technology teams & architects c. Proof of concepts ongoing at most financial services insJtuJons
2. Broad agreement that: a. Hadoop will be the Big Data operaDng system b. A hadoop powered Data processing & analyDcs plaaorm will spur
rapid adopJon c. Need to apply to business problems with revenue/profit impact
20
Elephants in the Room
1. Resources a. Talent – how many MapReduce developers can we get b. ApplicaDons – data, analyJcs and business problems are the same,
why should each insJtuJon build their own hadoop applicaJons to run the same processes
c. EssenDals – how to manage security, provisioning, & performance
2. ImplementaDon a. IntegraDon – fit with exisJng technology infrastructures & business
processes b. Support – experience with managing thousands of nodes/ servers c. Open Source – ‘out of the box’ applicability
21
How Tresata Changes the Game…
22
1. Our ApplicaDon a. FS for Hadoop – fully built on hadoop. Store, process, analyze and
visualize leveraging the full power of hadoop b. Processing Pipeline – automated ingesJon, cleaning, de-‐duping,
matching engine built for financial data c. AnalyDcs – massively parallel scoring and algorithm containers
codified by business problem
2. Tame the elephants (making it work at scale)
Tresata Case Study
23
A. Client Business Problem i. Problem – Process data and score for >30 MM client applicaJons ii. Data Sources – 23 separate data sources, mulDple Dme series iii. Raw Variables – 100 variables per client per data source iv. Current state -‐ expensive legacy plaOorm, algorithms developed on
sub-‐samples, unable to scale algorithms to full data set
B. Tresata SoluDon i. Data Engine – automated data import, cleaning, matching, scoring ii. Compute Engine – algorithms process & score >30MM in minutes iii. IntegraJon – work with exisJng tools and processes iv. Scalable deployment – Big Data as a Service delivery
Tresata Big Data ApplicaDon
24
Tresata AnalyDcs
25
Tresata + Hortonworks
26
1. Commitment to train a. Series of webinars b. Hadoop training tailored for financial services c. Dedicated training programs (tailored for client needs)
2. Commitment to support a. ProducJon scale support model (Tresata cerJfied for Financial Svcs) b. Meet and/or exceed industry standards on distro
© tresata all rights reserved
Where Hadoop is Going
© Hortonworks Inc. 2011
The Future
• Hadoop is going mainstream − Amazon, Microsou, Oracle, EMC, NetApp, etc.
• Apache Hadoop is covering new ground (Hadoop .Next) − Much of the development led by Hortonworks − More scale, more performance − Other paradigms than MapReduce for data processing − Enhanced operability and management (Ambari) − Metadata management (Hcatalog)
28
Technology Roadmap
Hadoop.Now – Making Apache Hadoop Accessible • Release the most widely deployed version of Hadoop ever (0.20.205) • Release directly usable code via Apache (RPMs, .debs…) • Frequent sustaining releases off of the stable branches
Q4 2011
Hadoop.Next – Next GeneraDon Apache Hadoop • Address key product gaps (HBase support, HA, Management…) • Enable community & partner innovaJon via modular architecture &
open APIs • Work with community to define integrated stack
2012 (Alphas starJng late 2011)
29
Next Steps
• Engage with Hortonworks & Tresata − www.hortonworks.com − www.tresata.com
• AddiJonal Webcast Series − Reference Architecture for Hadoop in Financial Services -‐ Nov 15 − Hadoop in Financial Services DEEP DIVE -‐ Dec 6 − www.hortonworks.com/webcasts
• Hortonworks party @ Hadoop World − Tuesday Nov 8 at 7pm − Inc Lounge at the Time Hotel − RSVP: hortonworks.eventbrite.com
30
© tresata all rights reserved
QuesDons?
© Hortonworks Inc. 2011
Recommended