Hortonworks presentation from Cowen Big Data Day for financial industry analysts
Hortonworks
Eric Baldeschwieler, Co-Founder and CEO
September 2011
Overview for Cowen Big Data Day 2011
© Hortonworks Inc. 2011
Agenda
• Hortonworks
• Apache Hadoop
• Use cases
• Hadoop in the Enterprise
• Market
• Strategy
About Hortonworks – Basics
• Founded – July 1st, 2011
  − 22 architects & committers from Yahoo!
• Mission – Architect the future of Big Data
  − Revolutionize and commoditize the storage and processing of Big Data via open source
• Vision – Half of the world's data will be stored in Hadoop within five years
About Hortonworks – Game Plan
• Support the growth of a huge Apache Hadoop ecosystem
  − Invest in ease of use, management, and other enterprise features
  − Define APIs for ISVs, OEMs and others to integrate with Apache Hadoop
  − Continue to invest in advancing the Hadoop core, remain the experts
  − Contribute all of our work to Apache
• Profit by providing training & support to the Hadoop community
Credentials
• Technical: key architects and committers from Yahoo! Hadoop engineering team
  − Delivered every major Apache Hadoop release since 0.1
  − Highest concentration of Apache Hadoop committers
  − Driving innovation across entire Apache Hadoop stack
  − Experience managing world’s largest deployment
  − Access to Yahoo!’s 1,000+ users and 42k+ nodes for testing, QA, etc.
• Business operations: team of highly successful open source veterans
  − Led by Rob Bearden, former COO of SpringSource & JBoss
• Investors: backed by Benchmark Capital and Yahoo!
What is Apache Hadoop?
• Set of open source projects
  − Owned by Apache Software Foundation
• Transforms commodity hardware into a service that:
  − Stores petabytes of data reliably (HDFS)
  − Allows huge distributed computations (MapReduce)
• Key attributes:
  − Redundant and reliable
      Doesn’t stop or lose data even if hardware fails
  − Easy to program
  − Extremely powerful
      Allows the development of big data algorithms & tools
  − Batch processing centric
  − Runs on commodity hardware
      Computers & network
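To make the MapReduce idea above concrete, here is a minimal single-process sketch of the programming model in Python. It is not Hadoop code and does not use any Hadoop API: the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical, chosen only for illustration, and the word-count task is an assumed example rather than one taken from this presentation. In a real cluster the map and reduce steps would run in parallel on many machines, with the framework handling the grouping step and reading input from HDFS.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine all values seen for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big Hadoop data"]))
```

Because each map call looks at only one line and each reduce call at only one key, both steps can be spread across commodity machines, which is the property the slide is describing.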
Typical Hadoop Applications
advertising optimization
ad selection
Website personalization
machine learning search ranking
ad inventory prediction
Mail anti-spam
user interest prediction
audience, ad and search pipelines
advertising data systems
Content Optimization
data analytics
Who Builds Hadoop?
Lines of code contributed since Hadoop inception
Who Builds Hadoop?
Lines of code contributed in 2011
A Brief History
Apache Hadoop
• Early adopters: scale and productize Hadoop
• Other Internet companies: add tools / frameworks, enhance Hadoop
• Wide enterprise adoption: funds further development, enhancements
• Service providers: provide training, support, hosting
Timeline labels: 2006 – present; 2008 – present; 2010 – present; Nascent / 2011
HADOOP @ YAHOO!
40K+ Servers
170 PB Storage
5M+ Monthly Jobs
1000+ Active users
© Yahoo 2011
CASE STUDY: YAHOO! HOMEPAGE
Personalized for each visitor
Result: twice the engagement
+160% clicks vs. one size fits all
+79% clicks vs. randomly selected
+43% clicks vs. editor selected
Recommended links, News Interests, Top Searches
CASE STUDY: YAHOO! HOMEPAGE
• Serving Maps
• Users - Interests
• Five Minute Production
• Weekly Categorization models
Diagram components: Science Hadoop cluster; Production Hadoop cluster; Serving systems
Diagram data flows: user behavior; categorization models (weekly); serving maps (every 5 minutes); engaged users
» Identify user interests using Categorization models
» Machine learning to build ever better categorization models
Build customized home pages with latest data (thousands / second)
CASE STUDY: YAHOO! MAIL
Enabling quick response in the spam arms race
• 450M mail boxes
• 5B+ deliveries/day
• Antispam models retrained every few hours on Hadoop
“40% less spam than Hotmail and 55% less spam than Gmail”
Diagram labels: Science; Production
Hadoop in the Enterprise
Big Data Platforms
Cost per TB, Adoption
Source:
Size of bubble = cost effectiveness of solution
Traditional Enterprise Architecture
Data Silos + ETL
• Traditional Data Warehouses, BI & Analytics: EDW, Data Marts, BI / Analytics
• Serving Applications: Web Serving, NoSQL, RDMS, …
• Unstructured Systems: Serving Logs, Social Media, Sensor Data, Text Systems, …
• Traditional ETL & Message buses
Hadoop Enterprise Architecture
Connecting All of Your Big Data
• Traditional Data Warehouses, BI & Analytics: EDW, Data Marts, BI / Analytics
• Serving Applications: Web Serving, NoSQL, RDMS, …
• Unstructured Systems: Serving Logs, Social Media, Sensor Data, Text Systems, …
• Apache Hadoop: EsTsL (s = Store), Custom Analytics
• Traditional ETL & Message buses
Hadoop Enterprise Architecture
Connecting All of Your Big Data
• Traditional Data Warehouses, BI & Analytics: EDW, Data Marts, BI / Analytics
• Serving Applications: Web Serving, NoSQL, RDMS, …
• Unstructured Systems: Serving Logs, Social Media, Sensor Data, Text Systems, …
• Apache Hadoop: EsTsL (s = Store), Custom Analytics
• Traditional ETL & Message buses
80-90% of data produced today is unstructured
Gartner predicts 800% data growth over next 5 years
The Hadoop Market
Market Drivers for Apache Hadoop
• Business drivers
  − Identified high value projects that require use of more data
  − Belief that there is great ROI in mastering big data
• Financial drivers
  − Growing cost of data systems as proportion of IT spend
  − Cost advantage of commodity hardware + open source
      Enables departmental-level big data strategies
• Technical drivers
  − Existing solutions failing under growing requirements
      3Vs - Volume, velocity, variety
  − Proliferation of unstructured data
Significant opportunity for Hadoop in enterprise data architectures
Market Opportunity for Hadoop
• Current
  − Apache Hadoop can become the de facto platform for managing unstructured data in the enterprise
  − Enable a new breed of applications to be built on top of Apache Hadoop
• Future
  − Hadoop becomes the next generation enterprise data architecture
Market Dynamics
• Technology & knowledge gaps are preventing Apache Hadoop from becoming an enterprise standard
  − Difficult to install and deploy Hadoop projects
  − Lack of technical content to assist
  − Demand for knowledgeable developers far exceeds supply
• Virtually every F500 company is constructing a Hadoop strategy
  − But most are still in POC/experimentation phase with Hadoop
• Top ISV/OEMs working to create Hadoop strategies
  − Driven by customer demand
• Community is becoming increasingly confused by all of the noise
  − Multiple distributions, many vendor announcements
  − Fear of market fragmentation
Conclusion
• There is not a Hadoop market to “win” today
  − Most organizations haven’t moved to full-scale production
  − Lack of mass adoption limiting short-term monetization opportunities
  − Need to drive Apache Hadoop as a unifying standard
• In order to succeed, we need to enable the market
  − Continue investment to overcome technology gaps
  − Enable a vibrant partner ecosystem
  − Expand availability of content and services to address knowledge gaps
How will Hortonworks do that?
Hortonworks Strategy
Hortonworks Strategy #1
Overcome Technology Gaps
• Make Apache Hadoop projects easier to install, manage & use
  − Regular sustaining releases
  − Projects released as binary (RPM, .deb)
  − Open source Management & Monitoring
• Make Apache Hadoop more robust
  − Performance gains
  − High availability
  − Administration & monitoring
All done within Apache Hadoop community
• Develop collaboratively with community
• Complete transparency
• All code contributed back to Apache
Anyone should be able to easily deploy the Hadoop projects from Apache
Hortonworks Strategy #2
Enable a Vibrant Ecosystem
• Unify the community around a strong Apache Hadoop offering
• Make Apache Hadoop easier to integrate & extend
  − Work closely with partners to define and build open APIs
  − Everything contributed back to Apache
• Provide enablement services as necessary to optimize integration
Hardware Partners
Cloud & Hosting Platform Partners
DW, Analytics & BI Partners
Serving & Unstructured Data Systems Partners
Integration & Services Partners
Hadoop Application Partners
Hortonworks Strategy #3
Overcome Knowledge Gaps
• Improve user experience with Apache Hadoop software
  − Binaries, installers, etc.
• Expand Apache Hadoop technical content
  − Core content on Apache.org
      Docs, installation guides, etc.
  − Advanced tools on Hortonworks.com
      Best practices, screencasts, forums, etc.
• Extensive Hadoop training & certification program
• Expert technical support services
Rationale for Hortonworks Strategy
• Strong interest from community (enterprises and ISV/OEMs) in a complete, enterprise-viable, Apache Hadoop platform
  − Strong desire for core to remain unified and strong, avoid UNIX wars II
  − Freemium model seen as a barrier to growth and adoption
• Highly defensible because of Hortonworks leadership in core projects
• Proven experience executing open source business models
  − Rob Bearden & Benchmark
Thank You.