Building PetaByte Warehouses with Unmodified PostgreSQL
Emmanuel Cecchet
Member of Research Staff
May 21st, 2009
Topics
Introduction to frontline data warehouses
PetaByte warehouse design with PostgreSQL
Aster contributions to PostgreSQL
Q&A
2 PGCon 2009, Ottawa © 2009 Aster Data Systems
Enterprise Data Warehouse Under Stress
Enterprise Data Warehouse
Offloading The Enterprise Data Warehouse
Enterprise Data Warehouse
Frontline Data Warehouse
Archival Data Warehouse
Archival Data Warehouse:
• Petabytes of detailed data
• Low cost/TB
• Flexible compression
• On-line access
• Aging out to off-line

Frontline Data Warehouse:
• Petabytes of source data
• 24x7 availability
• TB/day load capacity
• In-database transforms
• Rapid data access
Requirements for Frontline Data Warehouses
Who is Aster Data Systems?
Aster nCluster is a software-only RDBMS for large-scale frontline data warehousing
High performance “Always Parallel” MPP architecture
High availability “Always On” on-line operations
High value analytics In-Database MapReduce
Low cost Petabytes on commodity HW
MySpace Frontline Data Warehouse
Topics
Introduction to frontline data warehouses
PetaByte warehouse design with PostgreSQL
Aster contributions to PostgreSQL
Q&A
Petabyte Data Warehouse Design
PostgreSQL as a building block
Do not hack PostgreSQL to serve the distributed database
Build on top of mainline PostgreSQL
Use standard Postgres APIs
Service-Oriented Architecture
Hierarchical query planning and optimization
Shared-nothing replication treating Postgres and Linux as a service
Compression at the OS level transparently to PostgreSQL
In-database MapReduce, run out-of-process
Aster nCluster Database
Queries/Answers
Queen Server Group
Queries
Data
Worker Server Group
Loader/Exporter Server Group
Aster nCluster Database
Queen Node: manages system configuration, coordinates queries
Worker Node: stores and processes data partitions
Loader Node: performs parallel bulk data loading
Query Processing: How It Works
A: Users send queries to the Queen (via ACT, ODBC/JDBC, BI tools, etc.)
B: The Queen parses the query, creates an optimal distributed plan, sends subqueriesto the Workers, and supervises the processing
C: Workers execute their subqueries (locally or via distributed communication)
D: The Queen aggregates Worker results and returns the final result to the user
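The four steps above can be sketched as a toy coordinator/worker loop. All class and function names here are hypothetical illustrations, not Aster nCluster APIs:

```python
# Toy sketch of the Queen/Worker query flow described above.
# Names and the fake per-worker data are invented for illustration.

def plan(query, num_workers):
    """B: the Queen turns one query into per-worker subqueries."""
    return [(w, query) for w in range(num_workers)]

def execute_subquery(worker_id, subquery):
    """C: each Worker runs its subquery on its local partition."""
    # Pretend each worker holds 10 rows of a partitioned table.
    rows = list(range(worker_id * 10, worker_id * 10 + 10))
    if subquery == "SELECT count(*)":
        return len(rows)
    raise NotImplementedError(subquery)

def run(query, num_workers=3):
    """A + D: accept the query, fan out, aggregate partial results."""
    partials = [execute_subquery(w, q) for w, q in plan(query, num_workers)]
    return sum(partials)  # D: the Queen aggregates Worker results

print(run("SELECT count(*)"))  # 3 workers x 10 rows = 30
```

The aggregation step is what keeps the Queen cheap: Workers return partial counts, not raw rows.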
Table Compression Architecture
How It Works
Data is compressed/decompressed below the logical database; data is stored in different tablespaces depending on compression level.
Architecture Benefits:
• High compression ratios (bigger block sizes) yield disk cost savings
• Database-transparent (ease of future Postgres upgrades)
• Concurrency (high-performance multi-table compression)
• Performance: queries don't need full table decompression
Logical Database (e.g. Schema)
Physical Tablespaces: compressed-high, compressed-medium, compressed-low, and not-compressed blocks; queries access all of them transparently.
Table Compression Enables Powerful Archival
Compressed - High
Compressed - Medium
Compressed - Low
Not Compressed
Older data is accessed less frequently
Compress to save space and cost
Oldest data is compressed the most, recent data the least
Compressed tables are fully available for queries (true online archival)
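The aging policy above can be sketched as a simple age-to-tier mapping; the thresholds below are invented for illustration (the slides give no actual values):

```python
def compression_tier(age_days):
    """Pick a compression tier by data age: oldest data is compressed
    the most, recent data the least. The day thresholds are made up
    for this sketch, not taken from nCluster."""
    if age_days > 365:
        return "high"
    if age_days > 90:
        return "medium"
    if age_days > 30:
        return "low"
    return "none"

print(compression_tier(400), compression_tier(45), compression_tier(7))
# high low none
```

Because compressed tables stay queryable, moving a table between tiers is a background storage decision, not an archival/restore cycle.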
What is “MapReduce” and Why Should I Care?
What is MapReduce?
Popularized by Google
http://labs.google.com/papers/mapreduce.html
Processes data in parallel across distributed cluster
Why is MapReduce significant?
Empowers ordinary developers
Write application logic, not debug cluster communication code
Why is In-Database MapReduce significant?
Unites MapReduce with SQL: power invoked from SQL
Develop SQL/MR functions with common languages
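As a minimal illustration of the MapReduce model itself (plain Python, not Aster's SQL/MR API), here is the classic word count:

```python
from collections import defaultdict

def map_phase(documents):
    """Map: emit a (word, 1) pair for every word.
    Each document could be processed on a different node."""
    for doc in documents:
        for word in doc.split():
            yield word, 1

def reduce_phase(pairs):
    """Reduce: sum the counts for each distinct word."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

docs = ["postgres scales out", "postgres maps and reduces"]
print(reduce_phase(map_phase(docs)))  # 'postgres' is counted twice
```

The developer writes only the two functions; the framework owns data distribution, shuffling, and fault handling, which is exactly the "write application logic, not cluster communication code" point above.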
Aster In-Database MapReduce
Aster nCluster
Data Store Engine
In-Database MapReduce
SQL/MR Functions
Users, analysts, applications
In-Database MapReduce
Extensible framework (MapReduce + SQL)
•Flexible: Map-Reduce expressiveness, languages, polymorphism
•Performance: Massively parallel, computational push-down
•Availability: Fault isolation, resource management
Out-of-process executables
•Does not use PL/* for custom code execution
•Can execute Map and Reduce functions in any language that has a runtime on Linux (e.g. Java, Python, C#, C/C++, Ruby, Perl, R, etc.)
•Standard PostgreSQL APIs to send/receive data to executables
•Fault isolation, security and resource management for arbitrary user code
Live Queries
Always On: Minimize Planned Downtime
Backup & Restore
Data Backup
Add Capacity
Rebalance Data
Load & Export
Precision Scaling
When more CPU/memory/capacity are needed, new nodes can be added for scale-out.
Precision Scaling uses standard PostgreSQL APIs to migrate vWorker partitions to new nodes either for load balancing (more compute power) or capacity expansion
Example: Assume Workers 1/2/3 are 100% CPU-bottlenecked. Incorporation adds a new Worker4 node and migrates over vWorker partitions D/H/L. As a result of load-balancing, CPU-utilization drops to 75% per node, eliminating hotspots.
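The CPU figures in the example follow directly from the total load staying fixed while the node count grows; a quick check (function name is just for this sketch):

```python
def utilization_after_scale_out(old_nodes, old_util_pct, new_nodes):
    """Total work is constant, so per-node utilization shrinks in
    proportion to the node count after vWorkers are rebalanced."""
    total_work = old_nodes * old_util_pct
    return total_work / new_nodes

# 3 Workers at 100% CPU; incorporate a 4th and migrate vWorkers:
print(utilization_after_scale_out(3, 100, 4))  # 75.0
```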
Replication
• Each vWorker partition is replicated at least once on another node
• In the event of node failure, replication protects against data loss and system outages
• Transparent to PostgreSQL
• OS-level facilities capture data changes
Fault Tolerance & Automatic Online Failover
Replication Failover
Automatic, non-disruptive, graceful performance impact
Replication Restoration
Delta-based (fast) and online (non-disruptive)
Using Commodity Hardware
Commodity building blocks: multi-core x86 servers, distributed memory, local disk drives, Gigabit Ethernet interconnect.
Scaling On-Demand to a PetaByte
Commodity Hardware
2 TB Building Block
Dell, HP, IBM, Intel x86
16 GB Memory
2.4TB of Storage
8 Disks
$5k to $10k per Node
More Blocks = More Power
Massive Power Per Rack
160 Cores
640 GB RAM
48 TB SAS
Heterogeneous Hardware Support
Heterogeneous HW support enables customers to add cheaper/faster nodes to existing systems for investment protection
Mix-n-match different servers as you grow (faster CPUs, more memory, bigger disk drives, different vendors, etc)
Topics
Introduction to frontline data warehouses
PetaByte warehouse design with PostgreSQL
Aster contributions to PostgreSQL
Q&A
Error Logging in COPY
Separate good from malformed tuples
Per tuple error information
Errors to capture
• Type mismatch (e.g. text vs. int)
• Check constraints (e.g. int < 0)
• Malformed chars (e.g. invalid UTF-8 seq.)
• Missing / extra column
Low performance overhead
Activated on-demand using environment variables
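The "separate good from malformed tuples" idea can be mimicked outside the database; this sketch only imitates the behavior described above, it is not the patch's implementation (the real feature runs inside COPY):

```python
def load_with_error_log(lines, n_columns):
    """Split tab-separated input rows into good tuples and per-tuple
    error records, mimicking COPY error logging: type mismatches and
    missing/extra columns are logged instead of aborting the load."""
    good, errors = [], []
    for lineno, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split("\t")
        if len(fields) != n_columns:
            errors.append((lineno, line, "missing/extra column"))
            continue
        try:
            fields[0] = int(fields[0])  # type check: first column is int
        except ValueError:
            errors.append((lineno, line, "type mismatch"))
            continue
        good.append(tuple(fields))
    return good, errors

rows = ["1\talice", "oops\tbob", "3\tcarol\textra"]
good, errors = load_with_error_log(rows, n_columns=2)
print(len(good), len(errors))  # 1 good tuple, 2 logged errors
```

The key property is the one the slide emphasizes: one bad tuple produces one error record, and the remaining good tuples still load.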
Error Logging in COPY
Detailed error context is logged along with tuple content
Error Logging Performance
• COPY performance with 1 million tuples
Auto-partitioning in COPY
COPY into a parent table routes tuples directly into the child table with matching constraints
COPY y2008 FROM 'data.txt'
Auto-partitioning in loading
Activated on-demand
• set tuple_routing_in_copy = 0/1;
Configurable LRU cache size
• set tuple_routing_cache_size = 3;
COPY performance into parent table similar to
direct COPY in child table if data is sorted
Will leverage partitioning information in the
future (WIP for 8.5)
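A sketch of the routing-plus-cache idea: match a tuple against child-table constraints, keeping a small LRU cache of recently matched children (all names are hypothetical; this only mirrors the tuple_routing_cache_size behavior described above):

```python
from collections import OrderedDict

# Child "tables" keyed by a CHECK-style year constraint,
# as in the COPY y2008 example above (names invented).
CHILDREN = {"y2007": 2007, "y2008": 2008, "y2009": 2009}

class TupleRouter:
    def __init__(self, cache_size=3):
        self.cache = OrderedDict()   # LRU of recently matched children
        self.cache_size = cache_size

    def route(self, year):
        # Fast path: sorted input keeps hitting the same cached child.
        if year in self.cache:
            self.cache.move_to_end(year)
            return self.cache[year]
        # Slow path: scan every child's constraint.
        for child, constraint_year in CHILDREN.items():
            if constraint_year == year:
                self.cache[year] = child
                if len(self.cache) > self.cache_size:
                    self.cache.popitem(last=False)  # evict LRU entry
                return child
        raise ValueError("no child table matches year %d" % year)

router = TupleRouter(cache_size=3)
print(router.route(2008))  # y2008
```

This also shows why sorted input makes COPY into the parent nearly as fast as a direct COPY into the child: consecutive tuples take the cache fast path.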
Other contributions
Temporary tables and 2PC transactions
[Auto-]Partitioning infrastructure
Regression test suite
LFI (http://lfi.sourceforge.net/)
•Fault injection at the library level or below
•out-of-memory conditions, network connection errors, interrupted system calls, data corruption, hardware failures, etc…
•Lightning talk on Friday!
Topics
Introduction to Aster and data warehousing
PetaByte warehouse design with PostgreSQL
Aster contributions to PostgreSQL
Q&A
PetaByte Warehouses with PostgreSQL
Unmodified PostgreSQL
“Always Parallel” MPP architecture
“Always On” on-line operations
In-Database MapReduce
PetaByte on commodity Hardware
Learn more
www.asterdata.com
Free TDWI report on advanced analytics:
• asterdata.com/mapreduce
Free Gartner webcast on mission-critical DW:
• asterdata.com/gartner
Contact us
Aster Data Systems
Bonus slides
nCluster Components
Aster Loader
• Load Balancing
Challenge: Large data sets create a data transfer performance bottleneck
Solution: Multiple Loaders are mapped to vWorkers on Worker nodes. After partitioning, Loaders will parallel-load unique data into vWorkers.
Benefit: Linearly scalable load performance (from Loaders to Workers)

• Scalable Partitioning
Challenge: Partitioning must be scalable and not impede the loading process
Solution: Each Loader contains a "Partitioner", which uses an algorithm to assign data into "buckets". Each bucket is uniquely mapped within the cluster.
Benefit: Fast, intelligent partitioning during massive-scale data loads

• Fault Handling
Challenge: Failed nodes may lose data or significantly drop load performance.
Solution: If a node fails, other streams continue to load (with a performance hit). Admins can easily recover lost data by re-loading Bulk Feeder data.
Benefit: Data loss protection & performance consistency during node failure.
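The Partitioner's "assign data into buckets uniquely mapped within the cluster" step can be sketched as stable hash bucketing. The slides don't specify the actual algorithm, so everything below (MD5, bucket count, worker mapping) is an invented illustration:

```python
import hashlib

def bucket_for(key, num_buckets=16):
    """Stable hash partitioning: the same key always lands in the
    same bucket, so every Loader routes it identically without
    coordinating with the others."""
    digest = int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)
    return digest % num_buckets

# Hypothetical static bucket-to-worker mapping for a 4-worker cluster.
BUCKET_TO_WORKER = {b: "worker-%d" % (b % 4) for b in range(16)}

b = bucket_for("customer-42")
print(b, BUCKET_TO_WORKER[b])
```

Because the mapping is deterministic, parallel Loaders never send the same key to two different vWorkers, which is what makes the partitioning step scale without a central coordinator.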