Who we are
● Web portal, search engine in the Czech Republic
● 30+ web services (news, email, media, listings…)
● Only Open Source technologies

● PPC ads, Google AdWords competitor in the Czech Republic

● Software engineers, team leaders, database enthusiasts
● MySQL, HBase, Hadoop, Analytics
● MySQL training, internal consultations
Parquet file format
Row Group 1
row 1 col 1 | row 2 col 1 | row 3 col 1 | row 4 col 1
row 1 col 2 | row 2 col 2 | row 3 col 2 | row 4 col 2
row 1 col 3 | row 2 col 3 | row 3 col 3 | row 4 col 3
Row Group 2
row 5 col 1
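The row-group layout in the diagram can be sketched in a few lines of Python. This is only an illustration of the idea (rows are split into groups, and inside each group the values are stored column by column); it is not the real Parquet on-disk format, which also has pages, encodings, compression and footer metadata. The names `to_row_groups`, `read_column` and `rows_per_group` are hypothetical.

```python
from typing import Dict, List

Row = Dict[str, object]
RowGroup = Dict[str, List[object]]


def to_row_groups(rows: List[Row], columns: List[str], rows_per_group: int) -> List[RowGroup]:
    """Split rows into row groups; inside each group store values column by column."""
    groups: List[RowGroup] = []
    for start in range(0, len(rows), rows_per_group):
        chunk = rows[start:start + rows_per_group]
        # One contiguous list (a "column chunk") per column within this row group.
        groups.append({col: [row[col] for row in chunk] for col in columns})
    return groups


def read_column(groups: List[RowGroup], column: str) -> List[object]:
    """Read one column by touching only that column's chunk in every row group."""
    values: List[object] = []
    for group in groups:
        values.extend(group[column])
    return values


# Five rows with three columns, four rows per group, as in the diagram.
rows = [
    {"col1": f"row {i} col 1", "col2": f"row {i} col 2", "col3": f"row {i} col 3"}
    for i in range(1, 6)
]
groups = to_row_groups(rows, ["col1", "col2", "col3"], rows_per_group=4)
```

A query that needs only `col2` would call `read_column(groups, "col2")` and never look at the other columns, which is the main reason a columnar format suits analytic queries.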
SQL Support
$ impala-shell
[impala1.test:21000] > USE db_example;
Query: USE db_example;
Database changed.
[impala1.test:21000] > SELECT * FROM example;
Query: SELECT * FROM example;
+-----+----------+
| Day | Audience |
+-----+----------+
| 20  | 122      |
+-----+----------+
| 21  | 129      |
+-----+----------+
2 rows in set (0.18 sec)
Impala Specific DDL
CREATE TABLE … PARTITIONED BY (column INT)
CREATE TABLE … STORED AS PARQUET
CREATE TABLE … ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE
COMPUTE STATS mytable
REFRESH mytable
INVALIDATE METADATA
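To make `PARTITIONED BY` more concrete: each partition of such a table is kept in its own subdirectory named `column=value` under the table directory, so a filter on the partition column lets the engine skip whole directories. The Python sketch below only mimics that naming and pruning logic. The table path `/warehouse/example` and the helper names `partition_dir` and `prune` are hypothetical, and `day` is assumed to be the partition column.

```python
from typing import Dict, List


def partition_dir(table_dir: str, partition: Dict[str, int]) -> str:
    """Build the directory of one partition, e.g. <table_dir>/day=20."""
    parts = [f"{column}={value}" for column, value in partition.items()]
    return "/".join([table_dir.rstrip("/")] + parts)


def prune(partitions: List[Dict[str, int]], column: str, wanted: int) -> List[Dict[str, int]]:
    """Keep only partitions whose partition column equals the filter value."""
    return [p for p in partitions if p[column] == wanted]


# Hypothetical table directory and partitions (assumptions, not from the slides).
table_dir = "/warehouse/example"
partitions = [{"day": 20}, {"day": 21}]

# A filter such as WHERE day = 21 needs to scan a single directory only.
to_scan = [partition_dir(table_dir, p) for p in prune(partitions, "day", 21)]
```

With this layout, a query restricted to one day reads one directory instead of the whole table.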
Hadoop integration
[Diagram: Hadoop ecosystem components]
HDFS, Kudu
Spark, MapReduce, Tez
Sqoop (import/export)
Hive (HQL)
Impala (SQL)
HBase (NoSQL)
Storm (stream)
Hue / ODBC / ...
Apache Kudu (incubating)
● Source: Cloudera Blog
Our use case
○ Only for internal use, several queries per hour
○ No client reports
○ Billions of rows

○ Group by (web, zone, position, ...)
○ Period (from one day up to the whole period)
○ Aggregated daily, weekly, and yearly reports
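The report queries described above (a grouping by web, zone and position over a chosen period) have roughly the following shape. The sketch uses Python's built-in `sqlite3` module as a stand-in so it can run anywhere; it is not Impala, and the table `impressions`, its columns, the sample rows and the `report` helper are all hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for the analytic store (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE impressions "
    "(day TEXT, web TEXT, zone TEXT, position INTEGER, views INTEGER)"
)
conn.executemany(
    "INSERT INTO impressions VALUES (?, ?, ?, ?, ?)",
    [
        ("2016-01-20", "news", "top", 1, 100),
        ("2016-01-20", "news", "top", 1, 22),
        ("2016-01-21", "news", "top", 1, 129),
        ("2016-01-21", "email", "side", 2, 40),
    ],
)


def report(conn, date_from, date_to):
    """Sum views per (web, zone, position) for days in [date_from, date_to]."""
    query = (
        "SELECT web, zone, position, SUM(views) "
        "FROM impressions "
        "WHERE day BETWEEN ? AND ? "
        "GROUP BY web, zone, position "
        "ORDER BY web, zone, position"
    )
    return conn.execute(query, (date_from, date_to)).fetchall()


one_day = report(conn, "2016-01-20", "2016-01-20")
whole_period = report(conn, "2016-01-20", "2016-01-21")
```

The same function serves a one-day report and a whole-period report; only the date bounds change, which matches the "from one day up to the whole period" requirement.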
Yahoo! use case
○ Asynchronous client reports
○ Around 15k requests/hour, 6 TB of data in total

○ couldn’t handle the use case