Microsoft Analytics Platform System (APS): Modern Data Warehousing
James Serra, Big Data Evangelist, Microsoft
Agenda
• Traditional data warehouse & modern data warehouse
• APS architecture
• Hadoop & PolyBase
• Performance and scale
• Appliance benefits
• Summarize/questions
The traditional data warehouse
… data warehousing has reached the most significant tipping point since its inception. The biggest, possibly most elaborate data management system in IT is changing. – Gartner, “The State of Data Warehousing in 2012”
Data sources (OLTP, ERP, CRM, LOB) → ETL → Data warehouse → BI and analytics
Will your current solution handle future needs?
How to “break” the traditional data warehouse
Data sources (OLTP, ERP, CRM, LOB) → ETL → Data warehouse → BI and analytics
1) Increasing data volumes
2) Real-time performance/data
3) New data sources & types: non-relational data from devices, web, sensors, and social (IoT)
4) Cloud-born data
Modern data warehouse defined
• Data sources: OLTP, ERP, CRM, LOB, plus non-relational data (devices, web, sensors, social)
• Infrastructure
• Data management & processing: relational, non-relational, analytical, streaming, internal & external
• Data enrichment and federated query: extract, transform, load; single query model; data quality; master data management
• BI & analytics: self-service, collaboration, corporate, predictive, mobile
Are you using, or going to use, “Big Data” and/or “Hadoop”?
Do you have any of these pain points?
• No or limited access to detailed data; you can only surface reports and cannot ask ad hoc questions.
• Slow data loading cannot keep up with the need for data from transactional systems for intraday reporting.
• MOLAP cube processing and data refresh take too long.
• Slow query performance with the need for constant tuning, especially with SAN storage.
• High cost of SAN storage chargeback.
Roadblocks to evolving to a modern data warehouse
Solution: issue with that solution
• Keep legacy investment: limited scalability and ability to handle new data types
• Buy new tier-one hardware appliance: high acquisition/migration costs and no Hadoop
• Acquire big data solution (Hadoop): significant training, and still siloed
• Acquire business intelligence solution: complex, with low adoption
Introducing the Microsoft Analytics Platform System: your turnkey modern data warehouse appliance
Next-generation performance at scale
• Near real-time performance with In-Memory
• Scale out to accommodate your growing data or to increase performance (2 to 56 nodes)
• Remove SMP DW bottlenecks with MPP SQL Server
• No rip and replace when more performance is needed
• No performance tuning required
• Concurrency that fuels rapid adoption
Enterprise-ready big data
• Relational and non-relational data in a single appliance
• Or, integrate relational data with non-relational data in an external Hadoop cluster on premises or with data stored in the cloud (hot, warm, cold)
• Enterprise-ready Hadoop
• Integrated querying across Hadoop and APS using T-SQL (PolyBase)
• Direct integration with Microsoft BI tools such as Power BI
Engineered for optimal value
• Industry’s lowest DW price/TB
• Value through a single appliance solution
• Value with flexible hardware options using commodity hardware
• Free up space on the SAN (SAN cost averages 10K per TB)
Hardware appliance vendor offerings
Hardware and software engineered together: the ease of an appliance
• Co-engineered with HP, Dell, and Quanta best practices
• Leading performance with commodity hardware
• Pre-configured, built, and tuned software and hardware
• Integrated support plan with a single Microsoft contact
(Diagram: PDW, HDInsight, and PolyBase in one appliance, supporting social and web analytics, live data feeds, and advanced analytics.)
APS History
• DATAllegro started in 2003
• Microsoft acquired DATAllegro in September 2008
• PDW released in December 2010 (version 1)
• Version 2 made available in March 2013 (PolyBase introduced)
• AU1 released in April 2014, renamed from Parallel Data Warehouse (PDW) to Analytics Platform System (APS). It still includes the PDW region as well as a new HDInsight/Hadoop region
• AU2 released in July 2014
• AU3 released in October 2014
There will be AU updates every 3-4 months.
NOTE: This is a Data Warehouse solution and not an OLTP (online transaction processing) solution.
Case studies: go to https://customers.microsoft.com, enter “parallel data warehouse” (the old name) in the keyword box, and search the results; then repeat with “analytics platform system” (the new name).
Parallelism
MPP - Massively Parallel Processing
• Uses many separate CPUs running in parallel to execute a single program
• Shared nothing: each CPU has its own memory and disk (scale-out)
• Segments communicate using a high-speed network between nodes
SMP - Symmetric Multiprocessing
• Multiple CPUs used to complete individual processes simultaneously
• All CPUs share the same memory, disks, and network controllers (scale-up)
• All SQL Server implementations up until now have been SMP
• Mostly, the solution is housed on a shared SAN
APS Logical Architecture (overview)
(Diagram: one “Control” node running SQL and DMS, and four “Compute” nodes, each running SQL and DMS on balanced storage.)
Compute Node: the “worker bee” of APS
• Runs SQL Server 2014 APS
• Contains a “slice” of each database
• CPU is saturated by storage
Control Node: the “brains” of APS
• Also runs SQL Server 2014 APS
• Holds a “shell” copy of each database
• Metadata, statistics, etc.
• The “public face” of the appliance
Data Movement Services (DMS)
• Part of the “secret sauce” of APS
• Moves data around as needed
• Enables parallel operations among the compute nodes (queries, loads, etc.)
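To picture how DMS “moves data around as needed,” the Python sketch below simulates one common pattern. Rows are re-hashed on a join column so that matching rows land in the same distribution, where they can be joined locally and in parallel. It is illustrative only: the MD5 hash, the 32 distributions, and the names dist_for, shuffle_move, and distributed_join are hypothetical stand-ins, not APS internals.

```python
import hashlib
from collections import defaultdict

NUM_DISTRIBUTIONS = 32  # hypothetical: 4 compute nodes x 8 distributions each


def dist_for(key, num_dists=NUM_DISTRIBUTIONS):
    """Map a key to a distribution with a stable hash (MD5 is a stand-in, not the APS hash)."""
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_dists


def shuffle_move(rows, key_col):
    """Re-bucket rows by key_col so that rows with equal keys share a distribution."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[dist_for(row[key_col])].append(row)
    return buckets


def distributed_join(left, right, key_col):
    """Shuffle both inputs on the join key, then join each distribution locally."""
    left_b = shuffle_move(left, key_col)
    right_b = shuffle_move(right, key_col)
    result = []
    for d in range(NUM_DISTRIBUTIONS):
        # Build a local hash index over the right-side rows of this distribution only.
        index = defaultdict(list)
        for rrow in right_b.get(d, []):
            index[rrow[key_col]].append(rrow)
        # Probe it with the left-side rows that were shuffled to the same distribution.
        for lrow in left_b.get(d, []):
            for rrow in index.get(lrow[key_col], []):
                result.append({**lrow, **rrow})
    return result


# Sales rows (originally spread by ProdKey) joined to customers on CustKey.
sales = [{"ProdKey": p, "CustKey": p % 7, "Qty": 1} for p in range(100)]
customers = [{"CustKey": c, "CustName": f"cust{c}"} for c in range(7)]
joined = distributed_join(sales, customers, "CustKey")
print(len(joined))  # 100
```

Because both sides are hashed with the same function, no distribution ever needs rows held by another one once the shuffle is done.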
1) User connects to the appliance (control node) and submits a query
2) Control node query processor determines the best *parallel* query plan
3) DMS distributes sub-queries to each compute node
4) Each compute node executes the query on its subset of data
5) Each compute node returns a subset of the response to the control node
6) If necessary, the control node does any final aggregation/computation
7) Control node returns results to the user
Queries run in parallel on subsets of the data, each over its own pipe, which effectively makes the pipe larger.
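The seven steps above amount to a scatter-gather pattern. Here is a minimal Python sketch of it. It assumes a SUM/COUNT/AVG query over a hypothetical DollarsSold column and uses threads to stand in for compute nodes. compute_node_subquery and control_node_query are illustrative names, not APS APIs.

```python
from concurrent.futures import ThreadPoolExecutor


def compute_node_subquery(rows):
    """Step 4: a compute node aggregates only its own slice (returns SUM and COUNT)."""
    return sum(r["DollarsSold"] for r in rows), len(rows)


def control_node_query(node_slices):
    """Steps 3-6: fan the sub-query out, gather partial results, finish the aggregation."""
    with ThreadPoolExecutor(max_workers=len(node_slices)) as pool:
        partials = list(pool.map(compute_node_subquery, node_slices))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    # AVG is rebuilt from the global SUM and COUNT, not by averaging per-node averages.
    return {"sum": total, "count": count, "avg": total / count if count else None}


rows = [{"DollarsSold": d} for d in range(1, 101)]
node_slices = [rows[i::4] for i in range(4)]  # four compute nodes, each holding a subset
print(control_node_query(node_slices))  # {'sum': 5050, 'count': 100, 'avg': 50.5}
```

The final-aggregation step matters. Averaging the per-node averages would give the wrong answer whenever the slices hold different numbers of rows.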
APS Data Layout Options
Star schema:
• Time Dim: Date Dim ID, Calendar Year, Calendar Qtr, Calendar Mo, Calendar Day
• Store Dim: Store Dim ID, Store Name, Store Mgr, Store Size
• Product Dim: Prod Dim ID, Prod Category, Prod Sub Cat, Prod Desc
• Customer Dim: Cust Dim ID, Cust Name, Cust Addr, Cust Phone, Cust Email
• Sales Fact: Date Dim ID, Store Dim ID, Prod Dim ID, Cust Dim ID, Qty Sold, Dollars Sold
Replicated: table copied to each compute node (in the diagram, every compute node holds a full copy of the Time (TD), Product (PD), Store (SD), and Customer (CD) dimensions)
Distributed: table spread across compute nodes based on a “hash” (in the diagram, Sales Fact is split into FactSales_A through FactSales_H on every compute node)
Data distribution
CREATE TABLE FactSales
(
    ProductKey INT NOT NULL, OrderDateKey INT NOT NULL, DueDateKey INT NOT NULL,
    ShipDateKey INT NOT NULL, ResellerKey INT NOT NULL, EmployeeKey INT NOT NULL,
    PromotionKey INT NOT NULL, CurrencyKey INT NOT NULL, SalesTerritoryKey INT NOT NULL,
    SalesOrderNumber VARCHAR(20) NOT NULL  -- any further columns are not shown on the slide
)
WITH
(
    DISTRIBUTION = HASH(ProductKey),  -- each row is placed by the hash of ProductKey
    CLUSTERED INDEX (OrderDateKey),
    PARTITION (OrderDateKey RANGE RIGHT FOR VALUES (20010601, 20010901))  -- further boundaries not shown
);
Control node: creates the table metadata and sends the CREATE TABLE SQL to each compute node (Compute Node 1, Compute Node 2, … Compute Node X)
Each compute node: Create Table FactSales_A, Create Table FactSales_B, Create Table FactSales_C, … Create Table FactSales_H, giving eight physical tables, FactSales_A through FactSales_H, per node
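The hash distribution behind FactSales_A through FactSales_H can be simulated in a few lines. In this Python sketch, MD5 stands in for the appliance's own hash function, and the 4-node size and the place_row helper are hypothetical. It only shows two things: every row with a given ProductKey always lands in the same physical table, and rows spread roughly evenly across all of them.

```python
import hashlib
from collections import Counter

COMPUTE_NODES = 4           # hypothetical appliance size
DISTRIBUTIONS_PER_NODE = 8  # FactSales_A ... FactSales_H on every node
SUFFIXES = "ABCDEFGH"


def place_row(product_key):
    """Return (compute node, physical table) for a row, based on HASH(ProductKey)."""
    digest = hashlib.md5(str(product_key).encode("utf-8")).digest()  # stand-in hash
    distribution = int.from_bytes(digest[:4], "big") % (COMPUTE_NODES * DISTRIBUTIONS_PER_NODE)
    node, local = divmod(distribution, DISTRIBUTIONS_PER_NODE)
    return node + 1, f"FactSales_{SUFFIXES[local]}"


counts = Counter(place_row(key) for key in range(100_000))
print(len(counts))  # 32 physical tables in total
print(min(counts.values()), max(counts.values()))  # both should be close to 100,000 / 32 = 3,125
```

A replicated dimension table needs no such routing, because every compute node simply holds a full copy.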
APS – Balanced across servers and within each server
• Largest table: 600,000,000,000 rows
• Randomly distributed across 40 compute nodes (5 racks): 15,000,000,000 rows per node
• Within each server, distributed to 8 tables (320 tables in total): 1,875,000,000 rows per table
• Each table partitioned by week over 2 years of data (benefiting queries by date): 18,028,846 rows per partition
As an end user or DBA, you think about one table: LineItem.
“SELECT * FROM LineItem” is split into 320 queries running in parallel against 320 tables of 1.875 billion rows each.
“SELECT * FROM LineItem WHERE OrderDate = ‘1/1/2014’” is 320 queries, each against a single partition of about 18 million rows.
You don’t need to know that there are actually 320 tables representing your one logical table.
A clustered columnstore index (CCI) can add further performance through segment elimination.
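The row counts on this slide follow from straightforward division. This quick Python check assumes 52 weeks per year (104 weekly partitions), which reproduces the slide's 18,028,846 figure:

```python
ROWS_IN_TABLE = 600_000_000_000  # largest table (LineItem)
COMPUTE_NODES = 40               # 5 racks
TABLES_PER_NODE = 8              # distributions per compute node
WEEKLY_PARTITIONS = 2 * 52       # 2 years of data partitioned by week (assumed 52 weeks/year)

rows_per_node = ROWS_IN_TABLE // COMPUTE_NODES            # 15,000,000,000
rows_per_table = rows_per_node // TABLES_PER_NODE         # 1,875,000,000
rows_per_partition = rows_per_table // WEEKLY_PARTITIONS  # 18,028,846
physical_tables = COMPUTE_NODES * TABLES_PER_NODE         # 320

print(f"{physical_tables} tables, {rows_per_table:,} rows each, "
      f"{rows_per_partition:,} rows per weekly partition")
```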
HP appliance rack layout
Rack 1 (control node, failover node, compute nodes 1–8 on Microsoft Storage Spaces 1–4):
• Base Unit (6U): redundant InfiniBand, redundant Ethernet, management & control (active), rack failover node (passive)
• Base Unit (7U): 2 HP 1U servers (16 cores each, 32 total), Microsoft Storage Spaces 5U, 1TB drives, user data capacity 75TB
• Scale Units (7U each, up to 3): same configuration as the 7U base unit
• Customer space (8U): ETL servers (landing zone), backup servers, passive unit (additional spares)
• Capacity: ¼ rack 15TB, ½ rack 30TB, full rack 60TB (uncompressed)
Extension racks 2 and 3 (each with its own failover node; compute nodes 9–16 and 17–24 on Microsoft Storage Spaces 5–12):
• Extension Base Unit (5U): redundant InfiniBand, redundant Ethernet, rack failover node (passive)
• Extension Base Unit (7U): 2 HP 1U servers (16 cores each, 32 total), Microsoft Storage Spaces 5U, 1TB drives, user data capacity 75TB
• Scale Units (7U each, up to 3): same configuration as the 7U extension base unit
• Customer space (9U): ETL servers (landing zone), backup servers, passive unit (additional spares)
• Capacity: 1¼ racks 75.5TB, 1½ racks 90.6TB, 2 racks 120.8TB, 3 racks 181.2TB (uncompressed)
HP configuration details
• 2–56 compute nodes (32–896 cores)
• 1–7 racks
• 1, 2, or 3TB drives
• 15TB–1.2PB uncompressed
• 75TB–6PB user data (5:1 compression)
• Up to 7 spare nodes available across the entire appliance
• Dual InfiniBand: 56Gbps
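The configuration ranges can be cross-checked against the rack layouts. This Python sketch rests on several assumptions:
• 16 cores and one compute node per HP 1U server
• 8 compute nodes per rack
• a per-node uncompressed capacity derived from the 1¼-rack figure (75.5TB for 10 nodes with 1TB drives), assumed to scale with drive size
• the slide's 5:1 compression

appliance() is a hypothetical helper, so treat the results as rough sizing only.

```python
import math

CORES_PER_NODE = 16       # each HP 1U compute server has 16 cores
NODES_PER_RACK = 8        # base unit + 3 scale units, 2 servers each
TB_PER_NODE = 75.5 / 10   # derived: 1 1/4 racks (10 nodes) = 75.5 TB with 1 TB drives
COMPRESSION_RATIO = 5     # the slide's 5:1 assumption


def appliance(nodes, drive_tb=1):
    """Rough sizing for an HP APS configuration (assumes capacity scales with drive size)."""
    uncompressed_tb = nodes * TB_PER_NODE * drive_tb
    return {
        "cores": nodes * CORES_PER_NODE,
        "racks": math.ceil(nodes / NODES_PER_RACK),
        "uncompressed_tb": round(uncompressed_tb, 1),
        "user_data_tb": round(uncompressed_tb * COMPRESSION_RATIO, 1),
    }


print(appliance(2))      # quarter rack: 32 cores, about 15 TB uncompressed
print(appliance(56, 3))  # 7 racks with 3 TB drives: 896 cores, about 1.27 PB uncompressed
```

The 56-node, 3TB-drive result (about 1.27PB uncompressed and 6.3PB of user data) is in line with the slide's rounded 1.2PB and 6PB upper bounds.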
Microsoft Analytics Platform System: Enterprise-ready big data
Advanced Analytics Defined
Analytics: example
• Descriptive: How many of our customers left in the last month? How many of these customers were profitable?
• Diagnostic: Why did these profitable customers leave?
• Predictive: How many profitable customers are likely to leave next month?
• Prescriptive: How can we reduce this profitable customer churn rate?
What is Hadoop?
Microsoft Confidential
• Distributed, scalable system on commodity hardware
• Composed of a few parts:
  • HDFS – distributed file system
  • MapReduce – programming model
  • Other tools: Hive, Pig, Sqoop, HCatalog, HBase, Flume, Mahout, YARN, Tez, Spark, Stinger, Oozie, ZooKeeper, Storm
• Main players are Hortonworks, Cloudera, and MapR
WARNING: Hadoop, while ideal for processing huge volumes of data, is inadequate for analyzing that data in real time (companies do batch analytics instead)
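MapReduce, the programming model named above, splits work into three parts: a map step, a shuffle that groups intermediate results by key, and a reduce step. The single-process Python sketch below shows that flow with the classic word count. A real Hadoop job runs many map and reduce tasks in parallel across the cluster against data in HDFS, and the function names here are illustrative.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle_phase(pairs):
    """Shuffle: group every emitted value by its key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values that share one key."""
    return key, sum(values)


def map_reduce(records):
    mapped = chain.from_iterable(map_phase(r) for r in records)
    return dict(reduce_phase(k, vs) for k, vs in shuffle_phase(mapped).items())


counts = map_reduce(["Big data big insights", "data warehouse"])
print(counts)  # {'big': 2, 'data': 2, 'insights': 1, 'warehouse': 1}
```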
(Diagram: the Hadoop stack runs on a Hadoop cluster of many compute & storage nodes. It has three layers:
• core services: HDFS, YARN, MapReduce
• data services: Hive & HCatalog, Pig, HBase, and load & extract via Sqoop, Flume, NFS, WebHDFS
• operational services: Oozie, Ambari, Falcon)
Hadoop clusters provide scale-out storage and distributed data processing on commodity hardware
Complex query and analysis with big data today: steep learning curve, slow and inefficient
• Move HDFS (Hadoop) data into the warehouse via ETL before analysis
• Learn new skills beyond T-SQL
• Build, integrate, manage, maintain, and support the Hadoop ecosystem
• “New” data sources: devices, web, sensors, social
APS delivers enterprise-ready Hadoop with HDInsight: manageable, secured, and highly available Hadoop integrated into the appliance
• High performance, tuned within the appliance
• End-user authentication with Active Directory
• Accessible insights for everyone with Microsoft BI tools
• Managed and monitored using System Center
• 100% Apache Hadoop
(Diagram: SQL Server Parallel Data Warehouse and Microsoft HDInsight in one appliance, connected by PolyBase.)
• Leverage your existing T-SQL skills
• Additional features over a separate Hadoop cluster
• Plus, still one support contact!
APS appliance overview
(Diagram: a Parallel Data Warehouse region and an HDInsight region running on a shared fabric and hardware within one appliance.)
• A region is a logical container within an appliance
• Each workload contains the following boundaries: security, metering, servicing
Query Hadoop data with T-SQL using PolyBase: bringing the worlds of big data and the data warehouse together for users and IT
• Provides a single T-SQL query model (“semantic layer”) for APS and Hadoop with the rich features of T-SQL, including joins without ETL
• Uses the power of MPP to enhance query execution performance
• Supports Windows Azure HDInsight to enable new hybrid cloud scenarios
• Provides the ability to query non-Microsoft Hadoop distributions, such as Hortonworks and Cloudera
• Use your existing SQL skill set, with no IT intervention
PolyBase connects SQL Server Parallel Data Warehouse to:
• Microsoft HDInsight (HDP 2.0)
• Hortonworks HDP 2.2 (Windows, Linux)
• Cloudera CDH 5.1 (Linux)
• Windows Azure HDInsight (HDP 2.2) via Azure Blob Storage (WASB)
• Others (SQL Server, DB2, Oracle)? A true federated query engine
Query relational + non-relational data
Use cases where PolyBase simplifies using Hadoop data
• Bringing islands of Hadoop data together
• High-performance queries against Hadoop data (predicate pushdown)
• Archiving data warehouse data to Hadoop (move) (Hadoop as cold storage)
• Exporting relational data to Hadoop (copy) (Hadoop as backup/DR, analysis, cloud use)
• Importing Hadoop data into the data warehouse (copy) (Hadoop as staging area, sandbox, data lake)
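Predicate pushdown means the filter is evaluated where the data lives, so only qualifying rows travel from Hadoop to the appliance. This Python sketch models just that effect, with an in-memory list standing in for HDFS. hadoop_scan and the two query functions are hypothetical and do not reflect how PolyBase is actually implemented.

```python
def hadoop_scan(hdfs_rows, predicate=None):
    """Runs on the Hadoop side; with pushdown, the filter is applied here."""
    return [row for row in hdfs_rows if predicate is None or predicate(row)]


def query_without_pushdown(hdfs_rows, predicate):
    moved = hadoop_scan(hdfs_rows)  # every row crosses to the warehouse
    return [row for row in moved if predicate(row)], len(moved)


def query_with_pushdown(hdfs_rows, predicate):
    moved = hadoop_scan(hdfs_rows, predicate)  # only matching rows cross
    return moved, len(moved)


def is_hot(row):
    return row["temp"] > 990


readings = [{"sensor": i % 10, "temp": i} for i in range(1000)]
result_a, moved_a = query_without_pushdown(readings, is_hot)
result_b, moved_b = query_with_pushdown(readings, is_hot)
print(moved_a, moved_b, result_a == result_b)  # 1000 9 True
```

Both queries return the same answer. Only the volume of data moved between the two systems differs.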
Big data insights for anyone: native Microsoft BI integration to create new insights with familiar tools
• Everyone else using Microsoft BI tools: tools like Power BI minimize IT intervention for discovering data, leveraging the high adoption of Excel, Power View, Power Pivot, and SSAS
• Power users: T-SQL for DBAs and power users to join relational and Hadoop data
• Data scientists: Hadoop tools like MapReduce, Hive, and Pig
Microsoft Analytics Platform System: Next-generation performance at scale
Scaling out relational data to petabytes: scale-out technologies in the Analytics Platform System
• Scale-out massively parallel processing (MPP) parallelizes queries (speed-driven, not just capacity-driven)
• Multiple nodes with dedicated CPU, memory, and storage (“shared nothing”)
• Incrementally add hardware for near-linear scale to multiple petabytes (no need to delete older data or stage)
• Handles query complexity and concurrency at scale
• No “forklift” of the prior warehouse to increase capacity
• Start small with a warehouse of a few terabytes
• Mixed workload support: query while you load (250GB/hour per node), with no need for a maintenance window
(Diagram: capacity scales from 0TB to 6PB by adding PDW or HDInsight units.)
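The quoted load rate of 250GB/hour per node translates into aggregate throughput by multiplication. This Python sketch assumes perfectly linear scaling, the ideal behind the slide's “near-linear” claim, so real numbers will be somewhat lower. load_hours is a hypothetical helper.

```python
GB_PER_HOUR_PER_NODE = 250  # load rate quoted on the slide


def load_hours(data_gb, compute_nodes):
    """Idealized load time, assuming aggregate throughput grows linearly with nodes."""
    return data_gb / (GB_PER_HOUR_PER_NODE * compute_nodes)


for nodes in (2, 8, 16, 56):
    print(f"{nodes:>2} nodes: {GB_PER_HOUR_PER_NODE * nodes / 1000:.1f} TB/hour, "
          f"10 TB in {load_hours(10_000, nodes):.2f} h")
```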
Blazing fast performance: MPP and in-memory columnstore for next-generation performance
• Store data in columnar format for massive compression
• Load data into or out of memory for next-generation performance
• Updateable and clustered for real-time trickle loading
• No secondary indexes required
• Up to 100x faster queries: updatable clustered columnstore vs. a table with customary indexing
• Up to 15x more compression
(Diagram: columnstore index representation, with columns C1–C6 stored separately, and parallel query execution from query to results.)
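Columnar compression and segment elimination can both be illustrated with a toy column. This Python sketch is simplified:
• run-length encoding is the only compression used, whereas real columnstore indexes combine several encodings
• segments are 2,000 rows
• every name is hypothetical

It shows how sorted data compresses to a few runs and how min/max metadata lets a query skip whole segments.

```python
from itertools import groupby


def rle_encode(values):
    """Run-length encode one column segment: each run of equal values becomes (value, count)."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def build_segments(column, segment_size):
    """Split a column into segments, storing min/max metadata alongside the encoded data."""
    segments = []
    for start in range(0, len(column), segment_size):
        chunk = column[start:start + segment_size]
        segments.append({"min": min(chunk), "max": max(chunk), "runs": rle_encode(chunk)})
    return segments


def segments_to_scan(segments, value):
    """Segment elimination: keep only segments whose [min, max] range can contain value."""
    return [i for i, seg in enumerate(segments) if seg["min"] <= value <= seg["max"]]


# OrderDate column: 10 days x 1,000 rows each, loaded in date order.
order_dates = [20140101 + day for day in range(10) for _ in range(1000)]
segments = build_segments(order_dates, segment_size=2000)

print(len(order_dates), sum(len(s["runs"]) for s in segments))  # 10000 10
print(segments_to_scan(segments, 20140105))  # [2]
```

Ten thousand stored values shrink to ten (value, count) pairs. A query for one date reads one segment out of five.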
Investment firm before/after results: HP SMP vs. APS
• 21x improvement loading data (7:30 minutes vs. 21 seconds)
• 62x improvement staging to landing (30 minutes vs. 29 seconds)
• 17x, 166x, and 169x query performance improvements (169x: 1:05 hours vs. 23 seconds)
• Microsoft BI tools work unchanged
• 1.1 TB/hr loading rate, 8.8x compression (2 billion rows) (472GB to 53GB)
• 46x improvement creating a data mart (70 minutes vs. 1:31 minutes)
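The improvement factors can be recomputed from the quoted timings. This Python sketch assumes “1:05 hour” means 1 hour 5 minutes and “1:31 minutes” means 1 minute 31 seconds:

```python
def seconds(hours=0, minutes=0, secs=0):
    return hours * 3600 + minutes * 60 + secs


# (before on HP SMP, after on APS), in seconds, as quoted on the slide.
timings = {
    "loading data": (seconds(minutes=7, secs=30), 21),
    "staging to landing": (seconds(minutes=30), 29),
    "query (169x case)": (seconds(hours=1, minutes=5), 23),
    "creating data mart": (seconds(minutes=70), seconds(minutes=1, secs=31)),
}

for task, (before, after) in timings.items():
    print(f"{task}: {before / after:.1f}x faster")
```

The printed ratios are 21.4x, 62.1x, 169.6x, and 46.2x. Rounded down, they match the slide's 21x, 62x, 169x, and 46x.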
Concurrency that fuels rapid adoption: great performance with mixed workloads
(Diagram: the Analytics Platform System (PDW, HDInsight, PolyBase) with its feeds and consumers.
• Sources: ERP, CRM, and LOB apps plus Hadoop/big data, loaded via ETL/ELT with SSIS, DQS, and MDS or with DWLoader
• Workloads: ad hoc, intraday, near-real-time, and fast ad hoc queries using columnstore and PolyBase
• Downstream: SQL Server SMP spokes for reporting and cubes, fed via CRTAS (“link table”)
• BI tools: real-time access through ROLAP/MOLAP, DirectQuery, and SNAC)
Example overall data flow and architecture
• Event & data producers: web logs, IoT, mobile devices, etc., social data
• Ingest: Event Hubs
• Transform: Stream Analytics, HDInsight, Azure Data Factory
• DW / long-term storage: Azure SQL DB, Azure Blob Storage, Analytics Platform System
• Predictive analytics: Azure Machine Learning (fraud detection, etc.)
• Present & decide: Power BI, web dashboards, mobile devices
Microsoft Analytics Platform System: Engineered for optimal value
APS provides the industry’s lowest DW appliance price/TB: reshaped hardware specs through software innovation
• Price per terabyte for leading vendors (Sept 2014): significantly lower price per TB than the closest competitor
• Lower storage costs with Windows Server 2012 Storage Spaces
• Small cost gap between multiple clustered HP DL980s with SAN and an APS ¼ rack
(Chart: TCO per TB (uncompressed) for Oracle, Pivotal, IBM, Teradata, and Microsoft, on a $0–$140,000 scale.)
Virtualized architecture overview
(Diagram: Hosts 1–4 with economical direct-attached SAS disk storage and InfiniBand and Ethernet networking, running the base unit VMs (CTL, MAD, AD, VMM) and the Compute 1 and Compute 2 VMs.)
• APS engine
• DMS Manager
• SQL Server 2012 Enterprise Edition (APS build) (AU3: SQL Server 2014)
Software details
• All hosts run Windows Server 2012 Standard (AU3: 2012 R2) and Windows Azure Virtual Machines
• Fabric or workload in Hyper-V virtual machines
• Fabric virtual machine, management server (MAD01), and control server (CTL) share one server
• APS agent runs on all hosts and all virtual machines
• DWConfig and Admin Console
• Windows Storage Spaces and Azure Storage blobs
• Does not require expertise in Hyper-V or Windows
APS High Availability
(Diagram: Compute Host 1, Compute Host 2, a Control Host running the FAB, AD, VMM, MAD, and CTL virtual machines, and a Failover Host, each with redundant InfiniBand and Ethernet connections. When a host fails, its virtual machine, such as the Compute 1 VM, is restarted on the Failover Host.)
• No single point of failure
• No need for SQL Server clustering
Less DBA maintenance/monitoring
• No index creation
• No deleting/archiving data to save space
• Management simplicity (System Center, Admin Console, DMVs)
• No blocking
• No logs
• No query hints
• No wait states
• No IO tuning
• No query optimization/tuning
• No index reorgs/rebuilds
• No partitioning
• No managing filegroups
• No shrinking/expanding databases
• No managing physical servers
• No patching servers and software
RESULT: DBAs spend more of their time as architects and not as babysitters!
Summary of Benefits
The no-compromise modern data warehouse solution: Microsoft Analytics Platform System, Microsoft’s turnkey modern data warehouse appliance
• Improved query performance
• Faster data loading
• Improved concurrency
• Less DBA maintenance
• Limited training needed
• Use familiar BI tools
• Ease of appliance deployment
• Mixed workload support
• Improved data compression
• Scalability
• High availability
• PolyBase
• Integration with cloud-born data
• HDInsight/Hadoop integration
• Data warehouse consolidation
• Easy support model
Bold = benefits of APS over upgrading to SQL Server 2014; no worry about future hardware roadblocks
© 2013 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
Questions?
James [email protected]
Blog about PDW topics: http://www.jamesserra.com/archive/category/pdw/
Analytics Platform System roadmap: APS Appliance Update 4 (H1 CY2015) and APS Appliance Update 5 (H2 CY2015)
Microsoft Confidential: Preliminary Information. Dates and capabilities subject to change. Microsoft makes no warranties, express or implied.
Enterprise-ready big data, cloud enabled
• Improved PolyBase support: Cloudera 5.1 support, partial aggregate pushdowns
• Expanding big data capacity: grow the HDInsight region on an appliance with an existing region
Next-gen performance & engineered for optimal value
• 1.5x data return rate for SELECT * queries
• Streaming large data sets for external apps (e.g., SSAS, SAS, R)
• T-SQL compatibility: scalar UDFs (CREATE FUNCTION)
• SQL Server SMP to APS (SQL Server MPP) migration utility
• Bulk load / BCP through SQL Server command-line tools
• OEM hardware refresh (HP Gen9): HP ProLiant DL360 Gen9 servers with 2x Intel Haswell processors and 256 GB (16 x 16 GB) 2133 MHz memory; HP 5900 series switches (HA improvements)
Symmetry between DW on-premises and Azure
• Backup from SQL Server/APS
• Hybrid APS-to-Azure disaster recovery
T-SQL compatibility: reduced friction upsizing a DW from SQL Server to APS
Appliance hardware
• Heterogeneous server hardware generation support (e.g., mixed racks)
PolyBase: Parquet support, string filter pushdown to Hadoop, MapR support, Kerberos support