Hadoop Beyond Hype: Complex Adaptive Systems Conference, Nov 16, 2012
Viswa Sharma, Solutions Architect, Tata Consultancy Services
TCS Confidential
Agenda
What is Hadoop
Why Hadoop? The Net Generation is here
Sizing the Hadoop
Gartner Hadoop Hype Cycle
TCS view point
Hadoop Eco System Landscape
Examples of uses of Hadoop
Transformational Platform
Ad Hoc Analysis
Analytics with Hadoop
Applications of Hadoop Analytics
Near Real Time Analysis
What is the market
Thank You
What is Hadoop?
SCALE-OUT COMPUTING PLATFORM THAT PROCESSES INTERNET-SIZE DATA
Hadoop is the Name of a Toy Elephant
COMMODITY HARDWARE
PARALLEL PROGRAMMING ENVIRONMENT (GOOGLE MAP/REDUCE)
OPEN SOURCE SOFTWARE
PARALLEL FILE SYSTEM MODELED AFTER GOOGLE FILE SYSTEM
Given To
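The Map/Reduce programming model named on this slide can be illustrated with a minimal sketch in plain Python. It runs on one machine and does not use Hadoop; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are illustrative assumptions, not Hadoop APIs. On a real cluster the framework would run the map and reduce steps in parallel across nodes and perform the shuffle itself.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group emitted values by key (done by the framework on a cluster)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence over a list of documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    docs = ["hadoop stores data", "hadoop processes data in parallel"]
    print(word_count(docs))
```

Because each map call sees only one record and each reduce call sees only one key, both steps can be spread over many commodity machines without changing the logic.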
Big Data: Web Scale
• 50 billion web pages
• 800 million Facebook users
• 1000 million Facebook pages
• 200 million Twitter accounts
• 100 million tweets per day
• 5 billion Google queries per day
• Millions of servers, Petabytes of data
Varieties of Data
• Video / Audio
• Images / Pictures
• Diverse internal and external data
Sources of Data
• News / Feeds / Blogs / Forums
• Groups / Polls / Chats / Wiki
Why Hadoop? The Net Generation is here
Information is exploding all around, but the challenge is to understand it.
The Net Generation is inter-connected on a variety of Web based and Digital channels.
Sizing the Hadoop
Source: Pawyi Lee
Hadoop Hype Cycle Starts
Gartner Hype Cycle 2012
TCS View Point: Hadoop Technology is here now…
Big Data Technology handles data at extreme scale and is characterized by:
• Massively parallel computing to divide and conquer workloads
• Extreme flexibility, allowing unlimited data manipulation and transformation
• Massive scalability in terms of both technology and cost
Hadoop: massively parallel processing capability, running on commodity hardware
HBase and Hadoop/HDFS are designed to store and manage massive amounts of data
Hive, Mahout and R enable querying, analysis, and running in-memory, compute-intensive applications
The Hadoop technology ecosystem is affordable and within the reach of companies
Hadoop Eco System Landscape
NoSQL
Hadoop Distributions
Cloud Distributions
Distributed File System
Map-Reduce
Appliance / MR Re-write
Analytics / Visualization
Data Integration
Query‐Oriented Data Warehouse
CEP
Search
Tools
Languages / Libraries
Examples of Uses of Hadoop …
Insurance
• Claims analysis & Premium forecasting
• Claims Fraud detection & Revenue comparison
• Overall risk analysis & Re-insurance risk assessment
• Policy pricing & Customer retention
Travel, Transportation & Hospitality
• Better Travel searches
• Geo-fencing
• Cross selling and up-selling
• Intelligent traffic management
Government
• Fraud detection and cyber security
• Compliance and regulatory analysis
• Energy consumption and carbon footprint management
• Disaster Management
Energy, Resources & Utilities
• Weather impact analysis on power generation
• Oil Rig data monitoring
• Smart meter data analysis
• Terrain data analysis for wind energy
Hi Tech
• Process control for Microchip fabrication
• Network Management
• Supply Chain Management and analysis
• New Product development
• Content management solutions
Smart Grids
Hadoop as Transformation Platform in ETL
Transactional Systems
Data Warehouse
Hadoop Cluster
• HDFS
• MapReduce / Hive / Pig
Within the Hadoop ecosystem, MapReduce / Hive / Pig could be used to transform data within the distributed file system (HDFS).
Tools like Sqoop could be used to load data into and out of HDFS.
Fewer, higher-end nodes
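The transformation step of this ETL flow can be sketched in plain Python as a minimal, single-machine illustration. The sample data, field names (`order_id`, `store`, `order_date`, `amount`), and function names are all hypothetical; in a real deployment the extract and load steps would be handled by a tool such as Sqoop, and the transform would run as a MapReduce, Hive, or Pig job over files in HDFS.

```python
import csv
import io
from collections import defaultdict

# Hypothetical extract from a transactional system, as delimited text
# of the kind that might be landed in HDFS before transformation.
RAW_ORDERS = """order_id,store,order_date,amount
1,NYC,2012-11-01,20.50
2,NYC,2012-11-01,5.25
3,SFO,2012-11-01,12.00
4,NYC,2012-11-02,bad_value
5,SFO,2012-11-02,7.75
"""


def extract(text):
    """Read raw delimited rows as dictionaries keyed by column name."""
    return csv.DictReader(io.StringIO(text))


def transform(rows):
    """Drop malformed rows and aggregate to one total per (store, date)."""
    totals = defaultdict(float)
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows whose amount cannot be parsed
        totals[(row["store"], row["order_date"])] += amount
    return totals


def load(totals):
    """Shape the aggregates as warehouse-ready rows, sorted for stable output."""
    return [
        {"store": store, "order_date": date, "total_amount": round(total, 2)}
        for (store, date), total in sorted(totals.items())
    ]


if __name__ == "__main__":
    for record in load(transform(extract(RAW_ORDERS))):
        print(record)
```

The cleansing and aggregation happen before anything reaches the warehouse, which is the role the slide assigns to the Hadoop cluster.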
Hadoop as an ad-hoc analysis platform
Transactional Systems
Data Warehouse
Tools like Sqoop could be used to load data into and out of HDFS.
Hadoop Cluster
• HDFS
• MapReduce / Hive / Pig
MapReduce / Hive / Pig could be used to transform data within the distributed file system (HDFS); this could give the business analytics team a platform for innovation.
Higher number of nodes for larger storage
Data at lowest grain
Analytics With Hadoop
Prescriptive (What should happen?): Optimizing outcomes
• Optimization
• Simulation
Predictive (What will happen?): Identifying possible outcomes
• Domain Expertise
• Text Analytics
• Data Mining
• Knowledge
• Predictive Modeling
• Statistical Analysis
• Visual Analytics
• Forecasting
Descriptive (What has happened?): Describing and analyzing outcomes
• Analysis, Drill-Down, Ad-Hoc Reporting
• Dashboards and Scorecards
• Visual Analytics
Applications for Hadoop Analytics
Homeland Security
Finance
Smarter Healthcare
Multi-channel sales
Telecom
Manufacturing
Traffic Control
Trading Analytics
Fraud and Risk
Log Analysis
Search Quality
Retail: Churn, NBO
Hadoop Near Real Time Analytics
Inputs: Transactional Systems, External Inputs (incl. Social Media)
Online, Rule Application: Complex Event Processing
• Rule / Pattern Matching on Streams
• Dist Processing: processing is distributed on a set of nodes, but not the data
• Uses: Fraud Detection, Online Price Mgmt, Yield Management
Online, Rule Discovery: Distributed Stream Processing [using MR]
• Rule / Pattern Discovery on Streams
• Dist Processing: both processing and data are distributed on a set of nodes
• e.g. C-MR (academic project)
• Uses: Real Time Self Learning Systems; Complex / Dynamic Pattern Matching, e.g. Trading Patterns, Mining Current Influencers
Offline, Rule Discovery: Batch Map-Reduce Processing ([Time Series] Mining and Rule Discovery)
• Rule / Pattern Discovery [on Time Series]
• Dist Processing: Map-Reduce or scalable time-series pattern mining
• Uses: Learn Fraud Patterns, Demand Signal Refinement
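The split this slide draws between offline rule discovery and online rule application can be sketched in plain Python. This is a deliberately simple illustration under stated assumptions: the rule is a single amount threshold (mean plus `k` standard deviations), and the names `discover_rule`, `apply_rule`, and `max_amount` are hypothetical. A production system would learn far richer patterns in batch Map-Reduce jobs and match them in a CEP or stream-processing engine.

```python
from statistics import mean, pstdev


def discover_rule(history, k=3.0):
    """Offline step: learn a threshold rule from historical transaction amounts.

    The learned rule flags any amount more than k standard deviations
    above the historical mean.
    """
    threshold = mean(history) + k * pstdev(history)
    return {"max_amount": threshold}


def apply_rule(rule, stream):
    """Online step: match each incoming event against the learned rule."""
    for event in stream:
        if event["amount"] > rule["max_amount"]:
            yield event


if __name__ == "__main__":
    # Hypothetical historical amounts, as a batch job might read them.
    history = [20, 25, 22, 30, 18, 27, 24, 21]
    rule = discover_rule(history)

    # Hypothetical incoming events, as a stream processor might see them.
    incoming = [{"id": 1, "amount": 26}, {"id": 2, "amount": 400}]
    for alert in apply_rule(rule, incoming):
        print("possible fraud:", alert)
```

Keeping discovery and application separate lets the expensive learning step run in batch while the cheap matching step runs on every event as it arrives.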
What is the Market?
5 December, 2012
Thank You