Informatica Big Data Management
Joel LaPlount
Informatica Product Management
Data Powers Businesses
Big Data = Big Opportunity
Sources: Informatica Big Data Survey, March 2012; Cisco, The Zettabyte Era - Trends and Analysis, May 2013
67% of respondents see big data as an opportunity for their organization.
By 2020, data is predicted to grow at least 75 times and more than 1/3 will pass through the Cloud.
Example Use Cases
Advanced Analytics
Fraud / Risk Management
Process / Asset Optimization
DATA LAKE
The Reality
By 2015, 85% of Fortune 500 organizations will fail to effectively exploit big data for competitive advantage.
Companies Taking on the Big Data Challenge
Their Early Journey
All this new data – let’s just spin up a Hadoop cluster.
Now all we have to do is ingest, blend and prep data…
STOP! How do we operationalize the results? Reuse?
The “sandbox” is up – experiments are so much fun!!!
No real business value – no ROI – we are STUCK!
Oops! So many issues with data – just hand-code!
Biz’ wants more insights – let’s put it in the data lake!
We need more Hadoop developers!!!
Why Do Big Data Projects Fail?
“rapid intake of new data sources” - Vishal, VP Data Architecture
“too many data silos making it impossible to know what data can be trusted” - Pete, Chief Data Officer
“simplify the work of ingesting and mapping data...so that we need fewer specialized development resources” - Ron, VP Global Information Systems
“need to ensure confidence in data integrity, accuracy, and timeliness” - Ron, VP Global Information Systems
“need code re-usability and code maintainability” - Ben, Director of Platform Architecture
“regulations have become very strict and very precise – lots of gaps in the quality of the data” - Christine, Manager Data Management
“prepping and cleaning the data used to take us 2-3 weeks” - Vishal, VP Data Architecture
“transforming data management from a labor intensive, qualitative approach to a systematic approach…to classify data and understand lineage” - Ned, Senior Vice President
What’s Required for Successful Big Data Projects?
Big data does not mean NO data integration.
Big data does not mean BAD quality information.
Big data does not mean PROLIFERATION of sensitive data.
How do you certify and govern big data?
How do you quickly integrate big data?
How do you secure big data?
Introducing Informatica Big Data Management
Industry’s Only Single Integrated Platform for Big Data Management
Informatica Big Data Management
Analytical Applications
Data Warehouses, Data Lakes, NoSQL
PILLAR 1: Big Data Integration
PILLAR 2: Big Data Governance & Quality
PILLAR 3: Big Data Security
PILLAR 1 – Big Data Integration
Big Data Cannot Be Tackled Manually
The Race to Business Value Will Not Be Won By Hand
More Volume
More Variety
More Velocity
More Data Consumers
More Data Silos
More Data Platforms
So Big Data Goes Unused or Is Delivered Late
Data Developer: Overwhelming Manual Efforts
IT Data Management: Complex Processes
Business Analyst: Analysis Too Late
Big Data Integration For Maximum Performance
Ingest Instantly
• 200+ Pre-Built Connectors
• Cloud Connectivity
• Real-Time Streaming
Process Everything
• 100+ Pre-Built Parsers and Transformations
• Graphical Development
• Dynamic Process and Mappings
Deploy Optimally
• Multiple Engines Supported (MapReduce, Spark, etc.)
• High-Speed Processing For Complex Workloads
• Access With Brokering & Federation
PILLAR 2 – Big Data Governance & Quality
Big Data Is Difficult To Trust
Changing Needs for Quality: Same data used for multiple purposes
Hidden Relationships: Everything and everyone is interconnected
Magnified Trust Issues: New sources of external data
And Regulations And Controls Are Harder To Meet
SOX, PCI, HIPAA, FISMA, ISO, GLBA, NIST
Big Data Governance for Agility and Trust
Collaborative Stewardship
• Business Context Provisioning: Role-specific interfaces, business glossary and rules
• Policy Driven Processes: Workflow, approvals, voting
360 Degree Insight
• Relationship Discovery and View: Big data matching and linking
• Catalog Of All Metadata: Smart knowledge graph
Complete Confidence
• Certification with Data Quality: Validation, enrichment, standardization
• Transparency In and Out of the Enterprise: Full data and metadata lineage
PILLAR 3 – Big Data Security
Perimeter Security Is Insufficient
Perimeter security: Outside in security
• Not if, but when
• Network focused
• Attacks will only grow
Big Data: Bigger Risk
Sensitive Data
Security Exposure
• An exponential attack surface
• With exponential risks
Big Data Security Foundation: The ‘Data Perimeter’
Risk Analytics
• Risk Identification: Proliferation, Cost, Protection, Use, Location
• Detection: Risky Users
360 Degree Visibility
• Discovery of Sensitive Data, with Context
• Visualizations: Who, Where, When, What
Policy-Based Protection
• Centralized Management of Rules
• De-Identification For Test, Reporting, Analytics
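As a rough illustration of what discovery of sensitive data can look like, the sketch below flags columns whose values mostly match a pattern. The patterns, the threshold, and the function name `classify_columns` are assumptions made for this example; they do not describe how the Informatica product works.

```python
import re

# Hypothetical patterns for illustration only; real classifiers are far richer.
PATTERNS = {
    "email": re.compile(r"^[\w.+-]+@[\w-]+\.[\w.-]+$"),
    "us_ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
}

def classify_columns(rows, threshold=0.8):
    """Return {column: label} for columns where at least `threshold`
    of the non-empty values match one sensitive-data pattern."""
    findings = {}
    columns = rows[0].keys() if rows else []
    for col in columns:
        values = [str(r[col]) for r in rows if r.get(col) not in (None, "")]
        if not values:
            continue
        for label, pattern in PATTERNS.items():
            hits = sum(1 for v in values if pattern.match(v))
            if hits / len(values) >= threshold:
                findings[col] = label
                break
    return findings

sample = [
    {"id": 1, "contact": "ann@example.com", "tax_id": "123-45-6789"},
    {"id": 2, "contact": "bob@example.org", "tax_id": "987-65-4321"},
]
print(classify_columns(sample))
```

Sampling values rather than trusting column names is the design choice here: a column called `contact` says little, but its contents do.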
The 3 Pillars of Informatica Big Data Management
Big Data Integration
• Simple Visual Environment & Templates
• Optimized Execution & Flexible Deployment
• 100’s of Pre-built Transforms, Connectors & Parsers
• Broker-based Data Ingestion
Big Data Governance & Quality
• Collaboration Capabilities
• Business Glossary
• Profiling and Data Quality
• 360° Relationship Views
• End-to-end Data Lineage
Big Data Security
• Sensitive Data Discovery & Classification
• Proliferation Analysis
• Risk Assessment
• Persistent & Dynamic Data Masking
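The difference between persistent and dynamic data masking can be sketched in a few lines. This is a minimal sketch under stated assumptions: the function names, the salt, and the `auditor` role are hypothetical, and the hashing and last-four-characters rules are stand-ins rather than the product's masking techniques.

```python
import hashlib

def persistent_mask(value, salt="demo-salt"):
    """Persistent masking: replace the stored value with a stable,
    non-reversible token (same input always yields the same token)."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return "tok_" + digest[:12]

def dynamic_mask(value, role):
    """Dynamic masking: the stored value is unchanged; what a reader
    sees depends on the role at query time (roles are hypothetical)."""
    if role == "auditor":
        return value
    # Show the last four characters only when the value is longer than that.
    visible = value[-4:] if len(value) > 4 else ""
    return "*" * (len(value) - len(visible)) + visible

record = {"name": "Ann", "card": "4111111111111111"}

# Persistent: a masked copy is written out, e.g. for a test environment.
test_copy = {**record, "card": persistent_mask(record["card"])}
print(test_copy["card"])

# Dynamic: the original stays in place and is masked on read.
print(dynamic_mask(record["card"], role="analyst"))
```

Persistent masking suits copies that leave production (test, reporting), while dynamic masking suits shared stores where different users need different views of the same record.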
Big Data Management
Key New Features
A Big Data Fabric Enables Productivity, Repeatability, Collaboration
Automate For Maximum Productivity
100+ PRE-BUILT PARSERS AND TRANSFORMATIONS
200+ PRE-BUILT CONNECTORS
Dynamic PROCESSES AND MAPPINGS
Graphical DEVELOPMENT
Develop More Quickly And Staff More Quickly
Hadoop Developers
Informatica Developers
100,000+ TRAINED DEVELOPERS WORLDWIDE
500% MORE PRODUCTIVE THAN HAND-CODING
0% RISK OF REWRITING OUTDATED CODE
Develop Fit-for-Purpose Assets & Drive Collaborative Governance
Data Governance cycle: Discover, Define, Apply, Measure and Monitor
IT Business
Curation of Fit-for-Purpose Data Assets
Raw, Prepared, Cleansed / Matched
Hadoop Data Lake
Efficiency & Flexibility with Dynamic Mappings
• Mass Ingestion: Build a template once – automate mapping execution for 1000’s of sources with different schemas
• Mapping self-adjusts dynamically to external schema changes and column characteristics
Design time
Run time
Available in PC V10.0!
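The idea of building a template once and resolving it against each source at run time can be sketched as follows. The rule table, type names, and functions (`TEMPLATE_RULES`, `resolve_mapping`, `run_mapping`) are hypothetical names chosen for this illustration, not Informatica APIs.

```python
def trim(value):
    """Example transformation: strip surrounding whitespace."""
    return value.strip()

def passthrough(value):
    """Default transformation: leave the value unchanged."""
    return value

# Hypothetical template: rules refer to column characteristics (here, type),
# never to a fixed list of column names.
TEMPLATE_RULES = {"string": trim}

def resolve_mapping(schema):
    """Run time: bind the template to one concrete schema
    (`schema` maps column name -> type name)."""
    return {col: TEMPLATE_RULES.get(col_type, passthrough)
            for col, col_type in schema.items()}

def run_mapping(schema, rows):
    """Apply the resolved mapping to every row of a source."""
    mapping = resolve_mapping(schema)
    return [{col: fn(row[col]) for col, fn in mapping.items()} for row in rows]

# Two sources with different schemas reuse the same template unchanged.
customers = {"id": "int", "name": "string"}
orders = {"order_id": "int", "sku": "string", "note": "string"}

print(run_mapping(customers, [{"id": 1, "name": "  Ann "}]))
print(run_mapping(orders, [{"order_id": 7, "sku": " A-1", "note": "rush  "}]))
```

Because the rules are keyed by column characteristics, a column added to a source is picked up on the next run without editing the template.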
Choice of Execution Engines
• For Hadoop execution: engines as native YARN apps
• Choice of execution on
  • Map-Reduce
  • Blaze or
  • INFA engines outside of Hadoop
• Future: Spark-based execution as well as a smart optimizer which decides based on workload
[Architecture diagram components: HADOOP Cluster, HDFS, YARN, Map-Reduce, Hive Runtime, Hive Driver, Hive MetaStore, DIS, DIS CAL, Hadoop CAL, Data Engine Compiler, INFA Hive Executor, Blaze Executor, Blaze Runtime, Blaze]
Smart Optimizers
• In-built mapping optimizer automatically tunes and re-arranges the mapping for high performance
• Early selection, Early projection, Mapping pruning, Semi-join, Join re-ordering
• Automatic partitioning support based on statistics and other heuristics
• Advanced full pushdown optimization support
Orderkey = L_ORDERKEY and L_EXTENDEDPRICE < 1000 and id1 + id2 > 47
Orderkey = L_ORDERKEY
L_EXTENDEDPRICE < 1000
id1 + id2 > 47
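The predicate split shown above is an instance of early selection: a conjunctive filter is broken into its conjuncts, and each conjunct that reads columns from only one source is applied to that source before the join. The sketch below illustrates the idea; the source names `orders` and `lineitem`, and the assignment of `id1` and `id2` to `orders`, are assumptions made for the example, since the slide does not say which source owns which column.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Predicate:
    text: str
    columns: frozenset  # columns the predicate reads

def early_selection(predicates, source_columns):
    """Assign each conjunct to the single source that covers all of its
    columns; anything spanning several sources stays with the join."""
    pushed = {name: [] for name in source_columns}
    join_conditions = []
    for p in predicates:
        owners = [name for name, cols in source_columns.items()
                  if p.columns <= cols]
        if len(owners) == 1:
            pushed[owners[0]].append(p.text)
        else:
            join_conditions.append(p.text)
    return pushed, join_conditions

# Assumed column ownership (hypothetical, for illustration only).
sources = {
    "orders": {"Orderkey", "id1", "id2"},
    "lineitem": {"L_ORDERKEY", "L_EXTENDEDPRICE"},
}
conjuncts = [
    Predicate("Orderkey = L_ORDERKEY", frozenset({"Orderkey", "L_ORDERKEY"})),
    Predicate("L_EXTENDEDPRICE < 1000", frozenset({"L_EXTENDEDPRICE"})),
    Predicate("id1 + id2 > 47", frozenset({"id1", "id2"})),
]

pushed, join_conditions = early_selection(conjuncts, sources)
print(pushed)
print(join_conditions)
```

Filtering each source before the join reduces the number of rows the join has to process, which is why optimizers apply this rewrite when it is safe to do so.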
Enterprise Information Catalog : Basis for Data Intelligence
EIC
Relationships, Catalog, Statistics
Live Data Map
Rules, Glossary, Ratings
All Informatica Repositories
Applications, Business glossary & context
3rd party – BI, Modeling, Big Data, RDBMS
User Ratings, Feedback, Operational Stats
• Exploration
• Semantic Search
• Relationship Discovery
Data Discovery Sensitive Data Tracking
Stewardship & Governance
Smart Suggestions
Live Data Map
Knowledge Graph of all enterprise data assets
• Recommendations
• 360 degree views
• User Ratings
Project Sonoma : Intelligent Data Lake
Enterprise Information Catalog
BI & Analytics
Self-Service Data Discovery
IT Monitoring & Tracking
Prepare (Rev)
Raw Data
Published Data Sets
DATA
METADATA
Self-Service for Analysts
• Search & Discover
• Prepare & Publish
Visibility for IT
• Usage tracking & monitoring
• Lineage & Security
• Operate at scale
Project Sonoma: Intelligent Data Lake
Data Analysts
• Enterprise data assets search and discovery
• Data acquisition from on-premise and cloud sources, batch and real-time
• Data set recommendations
• Excel-like Data preparation, enrichment for large data sets
• Data publishing and sharing
Why Informatica?
Data Is ALL We Do
Innovation and Leadership
Magic Quadrant for Data Integration Tools
Magic Quadrant for Enterprise Integration Platform as a Service
Magic Quadrant for Data Quality Tools
Magic Quadrant for Data Masking Technology
Magic Quadrant for Structured Data Archiving and Application Retirement
Magic Quadrant for Master Data Management of Customer Data Solutions
These graphics were published by Gartner, Inc. as part of larger research documents and should be evaluated in the context of the entire documents. The Gartner documents are available upon request from Informatica. Gartner does not endorse any vendor, product, or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings. Gartner research publications consist of the opinions of Gartner's research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.
Informatica Big Data Customers (Sample)
Informatica Big Data Ecosystem Partners
Thank You!
Big Data Management V10.1 Launch
May 12 Webinar – Check Informatica.com