© Copyright 2011 EMC Corporation. All rights reserved.
EMC* makes no representation and undertakes no obligations with regard to product planning information, anticipated product characteristics, performance specifications, or anticipated release dates (collectively, “Roadmap Information”). Roadmap Information is provided by EMC as an accommodation to the recipient solely for purposes of discussion and without intending to be bound thereby.
Hamid Djam
Principal Architect
Business Intelligence & Analytics
Business Intelligence & Big Data Analytics
Disclaimer
• EMC makes no representation and undertakes no obligations with regard to product planning information, anticipated product characteristics, performance specifications, or anticipated release dates (collectively, “Roadmap Information”).
• Roadmap Information is provided by EMC as an accommodation to the recipient solely for purposes of discussion and without intending to be bound thereby.
• Roadmap information is EMC Restricted Confidential and is provided under the terms, conditions and restrictions defined in the EMC Non-Disclosure Agreement in place with your organization.
Why A Complete Big Data Analytics Stack Matters
• Big Data is the new source for economic value
• The clearest path to competitive advantage
• The ultimate manifestation of fact-based decision making
• The net new catalyst for business innovation and workplace evolution
• The driving force of a new computing paradigm: data computing
New Realities: Your Data Rules the World
Challenges in Today’s DW Environments
Traditional solutions cannot meet new challenges
• Critical business insight lies outside the enterprise data warehouse because traditional DW solutions cannot absorb data fast enough
– Hundreds of data marts
– ‘Shadow’ databases
• Data is everywhere and growing
– 44x data growth by 2020
Enterprise data warehouse: holds only 10% of corporate data
Data marts and ‘personal databases’ (e.g., Access, Excel): make up 90% of corporate data
Source: IDC Digital Universe Study, sponsored by EMC, May 2010
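As a rough worked example, the 44x figure can be turned into an implied compound annual growth rate. This sketch assumes the growth is measured from a 2009 baseline to 2020 (11 years); that baseline year is an assumption, not something stated on the slide. Under it, 44x works out to roughly 41% growth per year.

```python
def implied_annual_growth(multiple: float, years: int) -> float:
    """Compound annual growth rate r such that (1 + r) ** years == multiple."""
    if multiple <= 0 or years <= 0:
        raise ValueError("multiple and years must be positive")
    return multiple ** (1.0 / years) - 1.0


# Assumption: growth measured from a 2009 baseline to 2020, i.e. 11 years.
rate = implied_annual_growth(44, 11)
print(f"44x over 11 years implies about {rate:.0%} growth per year")
```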
DW Challenges Resolved with BI as a Service
Business: Speed, Agility, Flexibility, Change (short term)
IT: Stability, Security, Control, Standards (long term)
• Long project duration
• Gap in understanding business requirements
• Business creating its own data marts
• Inconsistent data between IT systems and business systems
Reference: Nine Secrets to Building an Agile, Adaptable BI Environment, TDWI
EMC IT: Offering IT-as-a-Service
Apps: Enterprise Applications / Software-as-a-service
• MDM, ERP, Governance/risk/compliance, Business intelligence, CRM
Platform: Platform-as-a-service
• Database Platform: Greenplum, SQL Server, Oracle …
• Application Platforms: Application server, Info. Lifecycle Mgmt, Ent. Content Mgmt, Integration, Web server, App. frameworks, Security, Runtime environments, Development tools
Infrastructure: Infrastructure-as-a-service
• Network, Storage & backup, Compute (vBlock)
Desktop-as-a-service
• Virtual Desktops, Client Devices
Information Management Core Disciplines
Data Integration
• Guarantees data availability where and when it is required
• Movement and transformation of enterprise information
• Interconnectivity of IT portfolio
• Standardized formats and service interfaces (SOA)
Master Data Management
• Identification and deduplication of shared master data
• Cross-referencing and disambiguation
• Hierarchy management
• Data governance framework and stewardship processes
Content Management
• Unstructured data storage and management
• Workflow-based publishing & versioning services
• Tie-in to enterprise portal and user identity / security strategies
Data Governance
• Framework and organization to ensure management of data as a strategic corporate asset
• Data stewardship
• Policies and procedures; monitoring and measuring
Business Intelligence
• Data warehouse methodology, from envisioning to deployment
• Business use-case- or function-specific data marts / reporting solutions
• Moving with agility from reactive to predictive capability
Information Quality
• Assurance that trustworthy data is accessible at time of demand
• Standardization & cleansing
• Business data rule enforcement
• Stale data refresh
• Augmentation from external sources
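To make the master data management bullets concrete, here is a minimal sketch of identifying duplicate records across source systems and cross-referencing each one to a single master ID. The matching rule (standardized name plus postal code), the record fields, and the function names are all hypothetical; real MDM tools use far richer matching and survivorship logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceRecord:
    system: str       # hypothetical source system, e.g. "CRM" or "ERP"
    record_id: str    # the record's key within that system
    name: str
    postal_code: str


def match_key(rec: SourceRecord) -> tuple[str, str]:
    """Standardize fields into a matching key (hypothetical rule)."""
    normalized_name = " ".join(rec.name.lower().split())
    return normalized_name, rec.postal_code.replace(" ", "").upper()


def build_cross_reference(records: list[SourceRecord]) -> dict[tuple[str, str], str]:
    """Map (system, record_id) to a master ID shared by all matching records."""
    master_ids: dict[tuple[str, str], str] = {}
    xref: dict[tuple[str, str], str] = {}
    for rec in records:
        key = match_key(rec)
        if key not in master_ids:
            master_ids[key] = f"M{len(master_ids) + 1}"
        xref[(rec.system, rec.record_id)] = master_ids[key]
    return xref


records = [
    SourceRecord("CRM", "c-17", "Acme  Corp", "01748"),
    SourceRecord("ERP", "9001", "ACME CORP", "01748"),
    SourceRecord("ERP", "9002", "Globex", "10001"),
]
xref = build_cross_reference(records)
for source_key, master in xref.items():
    print(source_key, "->", master)
```

The two Acme records collapse to one master ID, while each source system keeps its own key in the cross-reference.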
Building the Industry’s Only Complete Big Data Analytics “Stack”
• Greenplum Chorus: Enterprise Collaboration Platform for Data
• Greenplum Database (Enterprise & Community Editions): World’s Most Scalable MPP Database Platform
• Analytic Toolsets (Business Analytics, BI, Statistics, etc.)
• Greenplum HD (Hadoop Enterprise & Community Editions): Enterprise Analytics Platform for Unstructured Data
• Greenplum Data Computing Appliances: Purpose-built for Big Data Analytics
GREENPLUM DATABASE
Industry-Leading Massively Parallel Processing (MPP) Performance
EMC Greenplum Database Is Purpose-built for Big Data
• EMC Greenplum is a shared-nothing, massively parallel processing (MPP) data warehouse system
• The core principle of data computing is to move the processing dramatically closer to the data and to the people
• Fast Data Loading
• Extreme Performance & Elastic Scalability
• Unified Data Access
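The shared-nothing idea can be illustrated with a minimal sketch of hash distribution: each row is assigned to exactly one segment by hashing its distribution key, so segments own disjoint slices of a table and rows with equal keys land together. CRC32 is only a stable stand-in hash here, not Greenplum's actual hash function, and the segment count and column names are hypothetical.

```python
import zlib
from collections import defaultdict


def segment_for(key: str, num_segments: int) -> int:
    """Pick the segment that owns `key` (CRC32 stands in for GPDB's real hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_segments


def distribute(rows: list[dict], dist_key: str, num_segments: int) -> dict[int, list[dict]]:
    """Scatter rows across segments; every row goes to exactly one segment."""
    segments: dict[int, list[dict]] = defaultdict(list)
    for row in rows:
        segments[segment_for(str(row[dist_key]), num_segments)].append(row)
    return dict(segments)


rows = [{"customer_id": f"c{i % 5}", "amount": i} for i in range(20)]
placement = distribute(rows, "customer_id", num_segments=4)
for seg in sorted(placement):
    print(seg, sorted({r["customer_id"] for r in placement[seg]}))
```

Because all rows for a given customer_id share a segment, a join or aggregation on that key can run locally on each segment without moving data across the interconnect.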
Greenplum 4.0: Database Architecture
• Master Servers: query planning & dispatch
• Network Interconnect
• Segment Servers: query processing & data storage
• SQL and MapReduce access
• External Sources: loading, streaming, etc.
Massively Parallel Processing and Linear Performance Scalability
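The master/segment split can be sketched as a two-phase aggregation: the master dispatches the same work to every segment, each segment aggregates only its local rows, and the master gathers and combines the partial results. This is a simplified illustration of the scatter/gather pattern, not Greenplum's executor; the segment data is made up, and Python threads here only model concurrent dispatch rather than true CPU parallelism.

```python
from concurrent.futures import ThreadPoolExecutor


def segment_partial_sum(local_rows: list[tuple[str, float]]) -> dict[str, float]:
    """Phase 1 (on each segment): aggregate local rows by group key."""
    partial: dict[str, float] = {}
    for key, value in local_rows:
        partial[key] = partial.get(key, 0.0) + value
    return partial


def master_query(segments: list[list[tuple[str, float]]]) -> dict[str, float]:
    """Dispatch phase 1 to all segments, then combine the partials (phase 2)."""
    with ThreadPoolExecutor(max_workers=len(segments)) as pool:
        partials = list(pool.map(segment_partial_sum, segments))
    final: dict[str, float] = {}
    for partial in partials:
        for key, subtotal in partial.items():
            final[key] = final.get(key, 0.0) + subtotal
    return final


segments = [
    [("east", 10.0), ("west", 5.0)],
    [("east", 2.5)],
    [("west", 1.0), ("north", 4.0)],
]
print(master_query(segments))
```

Only the small per-segment partials travel to the master, which is what lets this style of query scale with the number of segments.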
Platform Independence Delivers Choice and Flexibility
• Software-Only: on your x86 hardware; flexibility for any workload; ideal for QA or DR
• Virtualized Infrastructure: pool resources; elastic scalability; ideal for test & development
• Data Computing Appliance: optimized price/performance; minimum time-to-value; ideal for production environments
Mature Enterprise Platform
PRODUCT FEATURES
CORE MPP ARCHITECTURE
• Shared-Nothing MPP
• Parallel Query Optimizer
• Polymorphic Data Storage™
• Parallel Dataflow Engine
• gNet™ Software Interconnect
• MPP Scatter/Gather Streaming™
GPDB ADAPTIVE SERVICES
• Multi-Level Fault Tolerance
• Online System Expansion
• Workload Management
LOADING & EXT. ACCESS
• Petabyte-Scale Loading
• Trickle Micro-Batching
• Anywhere Data Access
STORAGE & DATA ACCESS
• Hybrid Storage & Execution (Row- & Column-Oriented)
• In-Database Compression
• Multi-Level Partitioning
• Indexes: B-tree, Bitmap, etc.
LANGUAGE SUPPORT
• Comprehensive SQL
• Native MapReduce
• SQL 2003 OLAP Extensions
• Programmable Analytics
CLIENT ACCESS & TOOLS
CLIENT ACCESS
• ODBC, JDBC, OLEDB, etc.
3rd PARTY TOOLS
• BI Tools, ETL Tools, Data Mining, etc.
ADMIN TOOLS
• GP Performance Monitor
• pgAdmin3 for GPDB
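The Hybrid Storage & Execution feature refers to storing tables row-oriented or column-oriented. The sketch below only illustrates why that choice matters: a query touching one column reads every field under a row layout but only that column's values under a column layout. The table, column names, and "values read" counter are hypothetical and say nothing about Greenplum's on-disk format.

```python
Row = tuple[int, str, str, float]

COLUMNS = ("order_id", "order_date", "region", "amount")
ROWS: list[Row] = [
    (1, "2011-01-03", "east", 120.0),
    (2, "2011-01-03", "west", 75.5),
    (3, "2011-01-04", "east", 42.0),
]


def to_columnar(rows: list[Row]) -> dict[str, list]:
    """Pivot row tuples into one list per column."""
    return {name: [row[i] for row in rows] for i, name in enumerate(COLUMNS)}


def sum_amount_row_store(rows: list[Row]) -> tuple[float, int]:
    """Row layout: every field of every row is read to get at `amount`."""
    amount_idx = COLUMNS.index("amount")
    total, values_read = 0.0, 0
    for row in rows:
        values_read += len(row)
        total += row[amount_idx]
    return total, values_read


def sum_amount_column_store(columns: dict[str, list]) -> tuple[float, int]:
    """Column layout: only the `amount` column is read."""
    amounts = columns["amount"]
    return sum(amounts), len(amounts)


print(sum_amount_row_store(ROWS))
print(sum_amount_column_store(to_columnar(ROWS)))
```

Both layouts return the same total; the column layout touches a quarter of the values here, which is the kind of saving that favors column orientation for wide analytic scans.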
EMC GREENPLUM HD
Delivering Enterprise-Ready Apache Hadoop
Greenplum HD – Enterprise-Ready Hadoop Platform for Unstructured Data
• Greenplum HD is faster, more dependable, and easier to use
– Faster: addresses the growth of unstructured data
– More dependable: EMC reliability for the enterprise
– Easier to use with existing systems and tools
Why Hadoop?
• With the massive growth of unstructured data, Apache Hadoop, an open-source software framework, has quickly become an important new data platform and technology
– We've seen this first-hand with customers deploying Hadoop alongside Greenplum databases
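Hadoop's core programming model is MapReduce: a map function emits key/value pairs from each input record, the framework groups the pairs by key (the shuffle), and a reduce function combines each key's values. The single-process sketch below walks through that flow with the classic word-count example. It does not use Hadoop's actual Java API and ignores distribution, sorting, and fault tolerance.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Emit (word, 1) for every word in one input record."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group all emitted values by key, as the framework would between phases."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Combine all values for one key."""
    return key, sum(values)


def word_count(lines: list[str]) -> dict[str, int]:
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


print(word_count(["big data big insight", "data in decisions out"]))
```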
Why EMC Greenplum HD?
• EMC has the technical depth, expertise, and critical mass in building the scalable and reliable distributed data processing systems needed to drive technical innovation into Hadoop
• Hadoop needs to become “mission critical” and “easier to use and manage”
– HDFS optimizations, workload management, job scheduling, systems management, etc.
– Fault tolerance: eliminate single points of failure (SPOFs) in the NameNode, JobTracker, and other key components underlying Hadoop
Greenplum HD: Hadoop Software Distributions
• Introducing Greenplum HD, enterprise-ready Apache Hadoop software distributions
– Community Edition software: 100% open source
– Enterprise Edition software: advanced features; 100% API compatible
Greenplum HD Data Computing Appliance
• Introducing the world’s first high-performance, purpose-built, data co-processing Hadoop appliance
• Combining Hadoop and Greenplum Database in one appliance
THE ANSWER MACHINE
DATA IN. DECISIONS OUT.
Introducing the Greenplum Data Computing Appliance
Key Architectural Principles
• Keep it simple
• Build on standard hardware components
– Performance comes from our software architecture
– Best-of-breed x86 and Ethernet networking technologies
– Benefit from broad ecosystem innovation
• Make it modular for easy scaling
• SAN connectivity designed in
• Focus on Data Computing, not Data Warehousing
– Greenplum Database
– SAS Analytics
– Hadoop
DCA Functional Components
• 2 GPDB Master Servers
• 4 GPDB Segment Servers
• 2 10GE Switches
• Administrative Switch
• 8 Segment Servers
• Free Functional Blocks (4 shown)
Scale to Multiple Racks in Granular Quarter-Rack Increments
1st Rack: add ¼-rack increments
+ Expansion Rack: add ¼-rack increments
+ . . .
High Availability Built-In
(Diagram: primary and standby master servers above a row of segment servers)
Master server data protection
• HW RAID protection for drive failures
• Replicated transaction logs for server failure
On master server failure
• Standby server activated
• Administrator alerted
Segment server data protection
• HW RAID protection for drive failures
• Mirrored segments for server failures
On segment server failure
• Mirrored segments take over with no loss of service
• Fast online differential recovery
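The master-protection bullets (replicated transaction logs, standby activated on failure) follow a familiar log-shipping pattern. The in-memory sketch below models it: the primary appends each change to its log and ships it to the standby, which replays the log on activation so it can take over with the same state. All class, method, and key names are hypothetical; this is a generic illustration, not Greenplum's actual standby mechanism.

```python
class StandbyMaster:
    """Receives replicated log entries; replays them when activated."""

    def __init__(self) -> None:
        self.log: list[tuple[str, str]] = []
        self.catalog: dict[str, str] = {}
        self.active = False

    def receive(self, entry: tuple[str, str]) -> None:
        self.log.append(entry)

    def activate(self) -> dict[str, str]:
        # Replay the replicated log to rebuild state before accepting work.
        for key, value in self.log:
            self.catalog[key] = value
        self.active = True
        return self.catalog


class PrimaryMaster:
    """Applies changes locally and ships each log entry to the standby."""

    def __init__(self, standby: StandbyMaster) -> None:
        self.standby = standby
        self.log: list[tuple[str, str]] = []
        self.catalog: dict[str, str] = {}

    def apply(self, key: str, value: str) -> None:
        entry = (key, value)
        self.log.append(entry)
        self.catalog[key] = value
        self.standby.receive(entry)  # synchronous replication in this sketch


def alert_administrator(message: str) -> None:
    print(f"ALERT: {message}")


def fail_over(standby: StandbyMaster) -> dict[str, str]:
    state = standby.activate()
    alert_administrator("primary master failed; standby activated")
    return state


standby = StandbyMaster()
primary = PrimaryMaster(standby)
primary.apply("table:sales", "v1")
primary.apply("table:sales", "v2")
primary.apply("table:orders", "v1")
recovered = fail_over(standby)  # the primary is assumed to have failed here
print(recovered == primary.catalog)
```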
GPDB HA Groups and Segment Mirrors
(Diagram: four GPDB HA groups; the set of active segment instances for one group is shown below, where Pn is the primary and Mn the mirror of segment n)
Segment Server 1: P1 P2 P3 | M6 M8 M10
Segment Server 2: P4 P5 P6 | M1 M9 M11
Segment Server 3: P7 P8 P9 | M2 M4 M12
Segment Server 4: P10 P11 P12 | M3 M5 M7
The numbers of primary and mirror instances shown above are for illustration purposes only. Each segment server in a DCA actually supports a total of 12 instances (6 primaries and 6 mirrors).
DCA Can Sustain Up to Four Server Failures Per Rack, One Per HA Group
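The one-failure-per-HA-group claim can be checked against the illustrative segment layout: a segment stays available as long as the server holding its primary or the server holding its mirror survives. The sketch encodes that four-server layout (P1 to P12, M1 to M12) and tests every single and double server failure. It models only the diagram, not the real 6-primary/6-mirror DCA configuration.

```python
from itertools import combinations

# Server holding each segment's primary (Pn) and mirror (Mn), per the diagram.
PRIMARY_ON = {1: 1, 2: 1, 3: 1, 4: 2, 5: 2, 6: 2, 7: 3, 8: 3, 9: 3, 10: 4, 11: 4, 12: 4}
MIRROR_ON = {6: 1, 8: 1, 10: 1, 1: 2, 9: 2, 11: 2, 2: 3, 4: 3, 12: 3, 3: 4, 5: 4, 7: 4}
SERVERS = (1, 2, 3, 4)


def unavailable_segments(failed: set[int]) -> list[int]:
    """Segments whose primary and mirror servers have both failed."""
    return sorted(seg for seg in PRIMARY_ON
                  if PRIMARY_ON[seg] in failed and MIRROR_ON[seg] in failed)


for k in (1, 2):
    for failed in combinations(SERVERS, k):
        lost = unavailable_segments(set(failed))
        print(f"failed servers {failed}: lost segments {lost or 'none'}")
```

Every single-server failure leaves all twelve segments available, while every pair of failures within the group loses two segments, which matches the limit of one server failure per HA group.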
EMC Dial-Home and Remote Support Built-In
• EMC Premium Support
• ESRS secure IP connection enabled for DCA racks
– Automatic dial-home for DCA HW and SW failures
– 24x7 remote technical support and troubleshooting
– Online support triggers FRU parts shipment
• Four-hour on-site support objective
Connection to EMC Support: FTPS or ESRS
Customer Support Services: EMC Greenplum Warranty and Premium Maintenance
Premium Maintenance
• Remote Technical Support
– 24x7 technical support and remote troubleshooting
– Customer-managed case severity level
– Installation of platform operating system updates
• Onsite Support
– Installation of replacement parts
– Four-hour response objective
• Proactive Service
– Secure remote monitoring for hardware
– Notification of engineering technical advisories
– Built-in tools maximize stability and performance
• Secure Self-Help
– 24x7 access to eService support tools including knowledgebase, forums, and appropriately licensed software updates
One-Year Limited HW Warranty
• Secure Self-Help
– 24x7 access to eService support tools including knowledgebase and forums
• Remote Technical Support
– Technical support and remote troubleshooting during normal business hours
• Replacement parts shipped for next-business-day arrival
EMC Effect: Rapidly Expanding Portfolio
                                  One-Rack DCA    High Capacity DCA
Total CPU cores                   192             192
Total memory                      768 GB          768 GB
Segment HDDs                      192             192
HDD type                          600 GB SAS      2 TB SATA
Usable capacity (uncompressed)    36 TB           124 TB
Usable capacity (compressed)      144 TB          496 TB
Scan rate                         24 GB/s         14 GB/s
Data load rate                    10 TB/hour      10 TB/hour
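The table implies a trade-off between the two full-rack models. Dividing usable (uncompressed) capacity by scan rate gives the time to scan a full rack's data once, and the ratios expose the 4:1 compression assumption in both columns and how much capacity the High Capacity model adds. The sketch assumes decimal units (1 TB = 1,000 GB) and that the quoted scan rate holds across the whole dataset; both are simplifications.

```python
SPECS = {
    "One-Rack DCA": {"usable_tb": 36, "compressed_tb": 144, "scan_gb_per_s": 24},
    "High Capacity DCA": {"usable_tb": 124, "compressed_tb": 496, "scan_gb_per_s": 14},
}


def full_scan_minutes(usable_tb: float, scan_gb_per_s: float) -> float:
    """Minutes to read all usable data once at the quoted scan rate (decimal units)."""
    return usable_tb * 1000 / scan_gb_per_s / 60


for name, s in SPECS.items():
    ratio = s["compressed_tb"] / s["usable_tb"]
    minutes = full_scan_minutes(s["usable_tb"], s["scan_gb_per_s"])
    print(f"{name}: compression assumption {ratio:.0f}:1, full scan about {minutes:.0f} min")

capacity_gain = SPECS["High Capacity DCA"]["usable_tb"] / SPECS["One-Rack DCA"]["usable_tb"]
print(f"High Capacity holds {capacity_gain:.2f}x the data per rack")
```

Under these assumptions the standard rack scans its full capacity in about 25 minutes and the High Capacity rack in about two and a half hours, so the extra capacity comes at a lower scan rate per terabyte stored.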
Data Computing Appliance (DCA)
                                  GP10          GP100         GP1000
                                  (1/4 rack)    (1/2 rack)    (full rack)
Master servers                    2             2             2
Segment servers                   4             8             16
Total CPU cores                   48            96            192
Total memory                      192 GB        384 GB        768 GB
Segment HDDs (SAS)                48            96            192
Usable capacity (uncompressed)    9 TB          18 TB         36 TB
Usable capacity (compressed)      36 TB         72 TB         144 TB
Scan rate                         6 GB/s        12 GB/s       24 GB/s
Data load rate                    2.5 TB/hr     5 TB/hr       10 TB/hr
• Purpose-built, highly scalable, next-generation data warehousing appliance
• Architecturally integrates database, compute, storage, and network into an enterprise-class, easy-to-implement system.
• Balanced for best price/performance ratio
• Available in quarter-, half-, three-quarter-, full-, and multi-rack configurations
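Within a single rack, the three columns scale linearly with the number of quarter racks, except for the two master servers, which stay fixed. The sketch derives each configuration from the GP10 (quarter-rack) figures. Treating a three-quarter rack as 3 units is an extrapolation, since the table does not list that configuration, and multi-rack systems are deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DcaConfig:
    master_servers: int
    segment_servers: int
    cpu_cores: int
    memory_gb: int
    segment_hdds: int
    usable_tb: float
    compressed_tb: float
    scan_gb_per_s: float
    load_tb_per_hr: float


# Per-quarter-rack building block, taken from the GP10 column.
GP10 = DcaConfig(2, 4, 48, 192, 48, 9, 36, 6, 2.5)


def single_rack_config(quarter_racks: int) -> DcaConfig:
    """Scale GP10 linearly by quarter racks; masters stay at 2. Single rack only."""
    if not 1 <= quarter_racks <= 4:
        raise ValueError("quarter_racks must be between 1 and 4 for a single rack")
    q = quarter_racks
    return DcaConfig(
        master_servers=GP10.master_servers,  # masters do not scale with segments
        segment_servers=GP10.segment_servers * q,
        cpu_cores=GP10.cpu_cores * q,
        memory_gb=GP10.memory_gb * q,
        segment_hdds=GP10.segment_hdds * q,
        usable_tb=GP10.usable_tb * q,
        compressed_tb=GP10.compressed_tb * q,
        scan_gb_per_s=GP10.scan_gb_per_s * q,
        load_tb_per_hr=GP10.load_tb_per_hr * q,
    )


for q, name in ((1, "GP10"), (2, "GP100"), (4, "GP1000")):
    print(name, single_rack_config(q))
```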
High Capacity DCA
• Suitable for large-database customers with petabyte scalability in mind
• Increases the data capacity of a rack more than threefold (124 TB vs. 36 TB usable, uncompressed, per full rack)
• Reduced rack space, power, and cooling needs per unit of data
• Lowest price-per-unit data warehouse appliance
• Available in quarter-, half-, three-quarter-, full-, and multi-rack configurations
                                  GP10C         GP100C        GP1000C
                                  (1/4 rack)    (1/2 rack)    (full rack)
Master servers                    2             2             2
Segment servers                   4             8             16
Total CPU cores                   48            96            192
Total memory                      192 GB        384 GB        768 GB
Segment HDDs (SATA)               48            96            192
Usable capacity (uncompressed)    31 TB         62 TB         124 TB
Usable capacity (compressed)      124 TB        248 TB        496 TB
Scan rate                         3.5 GB/s      7 GB/s        14 GB/s
Data load rate                    2.5 TB/hr     5 TB/hr       10 TB/hr
Application-Specific Configurations
• Database
• Hadoop
Seamless Infrastructure Integration
• Big Data Loading & Staging
• Disaster Recovery
• Storage Expansion
• Data Protection
Seamless Infrastructure Integration
• EMC Data Domain: efficient backup & restore
• EMC VMAX SAN Mirror: advanced storage management
• Isilon scale-out storage: Big Data staging
• EMC VMAX SRDF and EMC Data Domain Replication: disaster recovery
Efficient Backup/Restore with EMC Data Domain
• Data Domain deduplication is well suited to Greenplum datasets
• Drastic reduction in backup storage requirements
• Back up all segment servers in parallel, directly to Data Domain
• With Greenplum's deduplication-friendly compressed data streams, effective backup rates of up to 6 TB/hr are achievable
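To put the "up to 6 TB/hr" figure in context, the sketch below estimates a best-case backup window, the share of that rate each segment server must sustain when all of them stream in parallel, and a rough backup-storage footprint. It assumes 6 TB/hr is an aggregate logical rate across all segment servers and that data is spread evenly across them. The deduplication ratio and retained-copy count are hypothetical inputs; the slide gives no figures for them.

```python
def backup_window_hours(dataset_tb: float, effective_tb_per_hr: float = 6.0) -> float:
    """Best-case full-backup window at an effective (logical) backup rate.

    Assumption: 6 TB/hr is the slide's "up to" aggregate rate with every
    segment server streaming in parallel; real rates vary with the data,
    the network and the deduplication ratio.
    """
    if dataset_tb < 0 or effective_tb_per_hr <= 0:
        raise ValueError("dataset must be non-negative and rate positive")
    return dataset_tb / effective_tb_per_hr


def per_server_rate_tb_per_hr(effective_tb_per_hr: float, segment_servers: int) -> float:
    """Share of the aggregate rate each segment server must sustain.

    Assumes data is spread evenly across segment servers, which is what
    backing up all segments in parallel relies on.
    """
    return effective_tb_per_hr / segment_servers


def backup_storage_tb(dataset_tb: float, retained_copies: int, dedup_ratio: float) -> float:
    """Rough backup-storage estimate: logical data retained divided by dedup ratio.

    dedup_ratio is a hypothetical input, not a figure from the slide.
    """
    return dataset_tb * retained_copies / dedup_ratio


if __name__ == "__main__":
    print(f"30 TB window: {backup_window_hours(30):.1f} h")
    print(f"Per-server rate on 16 servers: {per_server_rate_tb_per_hr(6.0, 16)} TB/hr")
    print(f"7 copies of 30 TB at 10:1 dedup: {backup_storage_tb(30, 7, 10):.0f} TB")
```

Parallelism is what makes the aggregate rate plausible: on a 16-server rack, each server only needs to push 0.375 TB/hr for the whole system to reach 6 TB/hr.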
DCA SAN Mirror
• The default DCA configuration places segment primaries and segment mirrors on internal storage
• SAN Mirror offloads segment mirrors to SAN storage
– Doubles the effective capacity of a DCA
– Foundation for SAN leverage
– Seamless off-host backups
– Data replication
• No performance impact
– Primaries remain on internal storage
– SAN sized for the normal load and for a failed segment server
[Diagram: segment primaries P1 … P96 and segment mirrors M1 … M96]
H1 2011
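The "doubles effective capacity" and "SAN sized for a failed segment server" claims follow from simple arithmetic. The sketch below uses a simplified model in which every primary has exactly one mirror of equal size. `segment_storage_tb` is a hypothetical input for the internal storage available to segment data. The failover throughput figure is only a rough lower bound: it assumes a failed server's share of the scan is served from its mirrors on the SAN, and it ignores mirroring write traffic.

```python
def primary_capacity_tb(segment_storage_tb: float, mirrors_on_san: bool) -> float:
    """Internal capacity left for primary segment data.

    Simplified model (assumption): each primary has one equal-sized mirror.
    By default both copies share internal storage, so primaries get half of
    it; with SAN Mirror the mirrors move to the SAN and primaries can use all of it.
    """
    return segment_storage_tb if mirrors_on_san else segment_storage_tb / 2


def min_san_capacity_tb(primary_data_tb: float) -> float:
    # Each mirror is a full copy of its primary, so the SAN must hold at least as much.
    return primary_data_tb


def min_san_scan_gb_per_s(rack_scan_gb_per_s: float, segment_servers: int,
                          failed_servers: int = 1) -> float:
    """Rough lower bound on SAN read throughput after segment-server failures.

    Assumption: each failed server's share of the scan is served from its
    mirrors on the SAN; mirroring write traffic is ignored.
    """
    return rack_scan_gb_per_s / segment_servers * failed_servers


if __name__ == "__main__":
    print(primary_capacity_tb(62, mirrors_on_san=False), primary_capacity_tb(62, mirrors_on_san=True))
    print(min_san_scan_gb_per_s(14.0, 16))
```

With the full-rack scan rate of 14 GB/s spread over 16 segment servers, one failed server leaves 0.875 GB/s of scan work that, under these assumptions, the SAN would have to absorb.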
GREENPLUM CHORUS
The World’s First Enterprise Data Cloud Platform
Building the Industry’s Only Complete Big Data Analytics “Stack”
• Greenplum Chorus: Enterprise Collaboration Platform for Data
• Greenplum Database (Enterprise & Community Editions): World’s Most Scalable MPP Database Platform
• Analytic Toolsets (Business Analytics, BI, Statistics, etc.)
• Greenplum HD (Hadoop Enterprise & Community Editions): Enterprise Analytics Platform for Unstructured Data
• Greenplum Data Computing Appliances: Purpose-Built for Big Data Analytics
Greenplum Chorus
• Greenplum’s Enterprise Data Cloud (EDC) Platform, enabling:
– Self-service provisioning
– Data services
– Collaborative analytics
• Customers deploy Chorus with VMware and the Greenplum Database to create an agile, self-service analytic infrastructure
• Chorus can significantly reduce the time and effort companies need to extract value and insight from their data
Spin up new projects rapidly with self-service provisioning.
o Provision instances, both single-node and multi-node.
o Provision sandboxes as new databases or schemas.
o Import data easily from anywhere in the cloud.
Data is now discoverable, self-documenting, and shared.
o Browse schemas and explore data with powerful search and visualization tools.
o Attach documents, ask questions, add comments, and build a living data dictionary.
o Define data sets, share them with the team, and schedule imports.
Create a collaborative environment for deep analytics on big data.
o Create project workspaces with shared files, data, documentation, and workflows.
o Execute workflows directly in the sandbox, then track changes to work and results over time.
o Control permissions to protect private data.
o Publish functions and documentation to promote common standards and techniques.
o Import functions from libraries of in-database analytics functions.
o Collaborate within projects and share information across teams.
THANK YOU