© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
DWH / DB Administrator
Amazon S3 / AWS Glue / Amazon Redshift
Web and mobile data
Logs
Social media data
Streaming data
IoT data
Spreadsheets
Structured data
Unstructured and semi-structured data
[Figure: Data volume, 1990 to 2020: generated data vs. data available for analysis]
Analysts
Business Users
Applications
Agile
Real time
Flexible
Scale
Data Scientists
OLTP ERP CRM LOB
Data Warehouse
Business Intelligence
OLTP ERP CRM LOB
Data Warehouse
Business Intelligence
Data Lake
Devices Web Sensors Social
Data Catalog
Machine Learning
DW Queries
Big data processing Interactive Real-time
on AWS
Snowball
Snowmobile
Kinesis Data Firehose
Kinesis Data Streams
Amazon S3
AWS Glue
Redshift
EMR
Athena
Kinesis
Elasticsearch Service
→ AWS Glue
AWS Glue
AWS Glue
Data Catalog: Apache Hive metastore compatible
Job Authoring: ETL code in PySpark or Scala
Job Execution
AWS Glue Data Catalog
❏ Amazon RDS, Amazon Redshift, Amazon S3
❏ Amazon Athena, Amazon Redshift Spectrum
❏ Apache Hive metastore, Amazon EMR
Data lake on Amazon S3 with AWS Glue
On-premises data
Web app data
Amazon RDS
Other databases
Streaming data
Your data
AMAZON QUICKSIGHT
AWS GLUE ETL
Glue Data Catalog
Hive DDL
Glue API
Crawler
Crawlers register metadata in the Data Catalog
❏ classifier
❖ Built-in classifiers
❖ Custom classifiers based on Grok
❏ Amazon S3: Hive-style partitions
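Hive-style partition layouts on Amazon S3 encode each partition column as a key=value path segment. The sketch below shows how such a path can be split into partition columns. It is only an illustration of the naming convention, not the Crawler's actual logic, and the bucket name, object key, and function name `hive_partitions` are hypothetical.

```python
from urllib.parse import urlparse


def hive_partitions(s3_uri):
    """Return (bucket, {partition_column: value}) for a Hive-style S3 key."""
    parsed = urlparse(s3_uri)
    partitions = {}
    # Every "key=value" directory segment is treated as one partition column;
    # the last segment is the file name and is skipped.
    for segment in parsed.path.strip("/").split("/")[:-1]:
        if "=" in segment:
            key, value = segment.split("=", 1)
            partitions[key] = value
    return parsed.netloc, partitions


# Hypothetical object path used only for illustration.
bucket, parts = hive_partitions(
    "s3://example-bucket/trips/year=2018/month=06/part-0000.parquet"
)
print(bucket, parts)
```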
Glue Data Catalog
Nested fields
Crawler, classifier
❏ The Crawler runs classifiers against the data and writes the result to the Data Catalog
❏ A classifier returns a certainty value (0.0 to 1.0) to the Crawler
❏ Custom classifiers are invoked by the Crawler in the order specified
❏ Glue provides built-in classifiers; custom classifiers can also be defined
Crawler
IAM Role
Glue Crawler
Data Lakes
Data Warehouse
Databases
Amazon RDS
Amazon Redshift
Amazon S3
JDBC Connection
Object Connection
Built-In Classifiers
MySQL, MariaDB, PostgreSQL, Oracle, Microsoft SQL Server, Amazon Aurora, Amazon Redshift
Avro, Parquet, ORC, XML, JSON & BSON
Logs (Apache (Grok), Linux (Grok), MS (Grok), Ruby, Redis, and many others)
Delimited (comma, pipe, tab, semicolon)
Compressions (ZIP, BZIP, GZIP, LZ4, Snappy)
Custom classifier
Custom classifier
❏ Grok
❏ Example: %{TIMESTAMP_ISO8601:timestamp} \[%{MESSAGEPREFIX:message_prefix}\] %{CRAWLERLOGLEVEL:loglevel} :%{GREEDYDATA:message}
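To make the Grok example more concrete, the following sketch expands each %{NAME:field} token into a named regular-expression group and applies it to one log line. It is a rough illustration, not the Glue classifier itself: the sub-pattern definitions in `GROK` are simplified assumptions (MESSAGEPREFIX and CRAWLERLOGLEVEL are custom patterns whose real definitions are not given here), and the sample log line is invented.

```python
import re

# Simplified stand-ins for the named Grok patterns (assumptions, not the real definitions).
GROK = {
    "TIMESTAMP_ISO8601": r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:[.,]\d+)?(?:Z|[+-]\d{2}:?\d{2})?",
    "MESSAGEPREFIX": r"[\w-]+",  # hypothetical custom pattern
    "CRAWLERLOGLEVEL": r"BENCHMARK|ERROR|WARN|INFO|TRACE",  # hypothetical custom pattern
    "GREEDYDATA": r".*",
}


def grok_to_regex(pattern):
    """Expand %{NAME:field} tokens into named regex groups."""
    def expand(match):
        name, field = match.group(1), match.group(2)
        return f"(?P<{field}>{GROK[name]})"

    return re.compile(re.sub(r"%\{(\w+):(\w+)\}", expand, pattern))


pattern = (
    r"%{TIMESTAMP_ISO8601:timestamp} \[%{MESSAGEPREFIX:message_prefix}\] "
    r"%{CRAWLERLOGLEVEL:loglevel} :%{GREEDYDATA:message}"
)
# Invented log line that fits the pattern above.
line = "2018-06-01T12:00:00Z [crawler-1] INFO :table created"
fields = grok_to_regex(pattern).match(line).groupdict()
print(fields)
```

Each captured group name becomes a column name, which is how a Grok pattern turns free-form log text into a table schema.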
Custom classifier
1. Create the custom classifier
2. Specify the classifier when configuring the Crawler
Crawlers
[Figure: enumerate S3 objects (file 1, file 2, …, file N) → identify file type and parse files (custom classifiers: Grok based parser; built-in classifiers: JSON parser, CSV parser, Parquet parser, …) → semi-structured per-file schema → semi-structured unified schema]
Schema A: root { name: str, id: num, addr { street: str, city: str, zip: num } }
Schema B: root { name: str, id: num, addr: str }
Schema similarity heuristic
§ Name match: +1 point
§ Data type match: +1 point
§ sim > 0.7: match
sim = intersection / min(A, B) = 7 / 8 = 0.875
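The similarity heuristic can be sketched in a few lines. This is one reading of the slide, not AWS Glue's actual implementation: it assumes every field (including the root and nested structs) is worth one point for its name and one for its type, that the intersection counts the points two schemas share, and that the result is divided by the smaller schema's total. The helper names `flatten`, `score`, and `similarity` are hypothetical.

```python
def flatten(schema, prefix=""):
    """Flatten a nested schema dict into {field_path: type_name}."""
    fields = {}
    for name, typ in schema.items():
        path = f"{prefix}.{name}" if prefix else name
        if isinstance(typ, dict):
            fields[path] = "struct"
            fields.update(flatten(typ, path))
        else:
            fields[path] = typ
    return fields


def score(fields):
    # Each field is worth 2 points: 1 for its name and 1 for its type.
    return 2 * len(fields)


def similarity(a, b):
    fa, fb = flatten(a), flatten(b)
    intersection = 0
    for path, typ in fa.items():
        if path in fb:
            intersection += 1  # name match
            if fb[path] == typ:
                intersection += 1  # data type match
    return intersection / min(score(fa), score(fb))


schema_a = {"root": {"name": "str", "id": "num",
                     "addr": {"street": "str", "city": "str", "zip": "num"}}}
schema_b = {"root": {"name": "str", "id": "num", "addr": "str"}}

sim = similarity(schema_a, schema_b)
print(sim, sim > 0.7)
```

Under these assumptions the two example schemas share 7 points and the smaller schema is worth 8, which reproduces the 7 / 8 = 0.875 shown on the slide.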
Import/Export
[Figure: Import from an external metastore: Apache Hive Metastore → AWS Glue ETL → AWS Glue Data Catalog. Export to an external metastore: AWS Glue Data Catalog → AWS Glue ETL → Apache Hive Metastore]
Amazon Athena
Query instantly: query data in Amazon S3 directly
Pay per query: per-query cost can be reduced by 30 to 90%
Open: ANSI SQL, JDBC/ODBC drivers, join support
Easy: works with Amazon QuickSight (BI)
Amazon EMR
Latest versions
Low cost: EC2, 50-80%
Use S3 storage: Amazon S3 through EMRFS
Easy: Apache Hadoop & Apache Spark
❏ Apache Spark, Apache Hive, and Presto with the AWS Glue Data Catalog
Amazon Redshift Spectrum
[Figure: S3 data lake and Amazon Redshift data, queried through the Amazon Redshift Spectrum query engine]
Query data in S3 with Amazon Redshift SQL
Open file formats: Parquet, ORC, Grok, Avro, CSV
→ Amazon Redshift Spectrum
Amazon Redshift – Data Warehousing
Fast at any scale
Inexpensive
Open file formats
Secure
Amazon Redshift Spectrum
[Figure: SQL clients / BI tools connect over JDBC/ODBC to the Leader Node and Compute Nodes; Amazon Redshift Spectrum nodes 1, 2, 3, 4, ..., N; Amazon S3 (Load, Unload, Backup, Restore)]
❏ Leader Node
❖ SQL
❏ Compute Node
❖ load / unload / backup / restore
❏ Amazon Redshift Spectrum Node
❖ Amazon S3
Amazon Redshift Spectrum
[Figure: Amazon Redshift (JDBC/ODBC); Amazon Redshift Spectrum nodes 1, 2, 3, 4, ..., N; Amazon S3, exabyte-scale object storage; Data Catalog, Apache Hive Metastore]
1 Query: SELECT COUNT(*) FROM S3.EXT_TABLE GROUP BY…
2 The query is optimized and compiled at the leader node, which determines what runs locally and what goes to Amazon Redshift Spectrum
3 The query plan is sent to all compute nodes
4 Compute nodes obtain partition information from the Data Catalog (Dynamically prune partitions)
5 Each compute node issues requests to the Amazon Redshift Spectrum layer
6 Amazon Redshift Spectrum nodes scan the S3 data
7 Amazon Redshift Spectrum projects, filters, and aggregates the data
8 Final aggregations and joins with local Amazon Redshift tables are done in the cluster
9 The result is returned to the client
Schema on Read
Create an external schema in Amazon Redshift that references a Data Catalog database:
CREATE EXTERNAL SCHEMA archived_trips
FROM DATA CATALOG DATABASE 'sampledb'
IAM_ROLE 'arn:aws:iam::123456789012:role/MySpectrumRole'
REGION 'us-east-2';
List external schemas:
SELECT * FROM svv_external_schemas;
List external tables:
SELECT * FROM svv_external_tables;
❏ Amazon Redshift needs authorization to access the AWS Glue Data Catalog and the data in Amazon S3
❏ That authorization is granted through an AWS Identity and Access Management (IAM) role
❏ Amazon Redshift refers to the role by its ARN (Amazon Resource Name)
Best practices - 1 / 5
Amazon Redshift Spectrum
Best practices - 2 / 5
– Apache Parquet
❏ Apache Parquet is a columnar storage format
❏ SVL_S3QUERY_SUMMARY: Parquet, S3
❏ s3_scanned_rows and s3query_returned_rows: CSV, Redshift Spectrum, Redshift
Best practices - 3 / 5
– Parquet
❏ Check partition pruning with the following query:
SELECT query, segment,
       max(assigned_partitions) AS total_partitions,
       max(qualified_partitions) AS qualified_partitions
FROM svl_s3partition
WHERE query = <Query-ID>
GROUP BY 1, 2;
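The two counts returned by the svl_s3partition query can be turned into a single pruning figure. The sketch below assumes that total_partitions is the number of partitions assigned to a segment and qualified_partitions the number actually read after pruning; the function name `pruned_fraction` and the sample rows are hypothetical, not real query output.

```python
def pruned_fraction(total_partitions, qualified_partitions):
    """Fraction of partitions skipped by pruning (0.0 means nothing was pruned)."""
    if total_partitions <= 0:
        raise ValueError("total_partitions must be positive")
    return (total_partitions - qualified_partitions) / total_partitions


# Invented rows shaped like the query output: (query, segment, total, qualified).
rows = [(1234, 0, 100, 5), (1234, 1, 100, 100)]
for query, segment, total, qualified in rows:
    print(query, segment, f"{pruned_fraction(total, qualified):.0%} pruned")
```

A value close to 0% for a query that filters on the partition column suggests the filter is not reaching the partition columns.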
Best practices - 4 / 5
❏ Amazon Redshift Spectrum
Best practices - 5 / 5
Predicate pushdown
❏ Operations in a SQL query that can be pushed down to the Amazon Redshift Spectrum layer, for example:
❖ GROUP BY clauses
❖ Comparison and pattern-matching conditions such as LIKE
❖ Aggregate functions such as COUNT/SUM/AVG/MIN/MAX
❖ String functions such as Regex_replace
Best practices - 5 / 5 (Cont.)
Predicate pushdown
❏ SQL operations such as DISTINCT and ORDER BY are not pushed down to Amazon Redshift Spectrum; they run in Amazon Redshift
❖ Replace DISTINCT with GROUP BY
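The DISTINCT-to-GROUP BY rewrite does not change the result set, which the following sketch checks on a tiny in-memory table. SQLite (from the Python standard library) stands in for Amazon Redshift here purely to show that the two queries are equivalent; it says nothing about pushdown behavior or performance. The table `trips` and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (vendor TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO trips VALUES (?, ?)",
    [("a", "tokyo"), ("a", "tokyo"), ("b", "osaka"), ("a", "osaka")],
)

# Original form: DISTINCT over the selected columns.
distinct_rows = conn.execute(
    "SELECT DISTINCT vendor, city FROM trips ORDER BY vendor, city"
).fetchall()

# Rewritten form: GROUP BY over the same columns.
grouped_rows = conn.execute(
    "SELECT vendor, city FROM trips GROUP BY vendor, city ORDER BY vendor, city"
).fetchall()

print(distinct_rows == grouped_rows)
conn.close()
```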
10 Best Practices for Amazon Redshift Spectrum
https://aws.amazon.com/jp/blogs/news/10-best-practices-for-amazon-redshift-spectrum/
Summary
1. → AWS Glue
2. → Amazon Redshift Spectrum
https://d1.awsstatic.com/whitepapers/Storage/data-lake-on-aws.pdf
Snowball
Snowmobile
Kinesis Data Firehose
Kinesis Data Streams
Amazon S3
AWS Glue
Redshift
EMR
Athena
Kinesis
Elasticsearch Service
❏ AWS Glue
❖ https://aws.amazon.com/jp/glue/
❏ AWS Glue details
❖ https://aws.amazon.com/jp/glue/details/
❏ AWS Glue developer resources
❖ https://aws.amazon.com/jp/glue/developer-resources/
❏ Amazon Redshift
❖ https://aws.amazon.com/jp/redshift/
❏ Amazon Redshift developer resources
❖ https://aws.amazon.com/jp/redshift/developer-resources/