Learning Impala, the Massively Parallel SQL Engine, from the Basics #cwt2015 | 野 智聡 | Cloudera K.K.

  • Learning Impala, the Massively Parallel SQL Engine, from the Basics

    | Cloudera

  • 2 2015 Cloudera, Inc. All rights reserved.

    ( ) Customer Operations Engineer()

  • 3

    Impala 2.0 Roadmap

  • 4

    Impala

  • 5

    Cloudera Impala: a SQL engine for Hadoop. http://impala.io/

  • 6

    Impala: HDFS, HBase

    Hive

    ODBC / JDBC

    Kerberos / LDAP

    CDH ()

    Cloudera / Oracle / MapR / Amazon

  • 7

    Impala Hive

    Hadoop

    ImpalaHive-

  • 8

    Impala -> SQL -> Hive -> (e.g. nested type)

  • 9

    SQL on Hadoop: Impala

    JDBC/ODBC BI/ (e.g. Tableau, Zoomdata, MicroStrategy, QlikView, SAS)

    SQL Hive (MapReduce/Spark)

    ETL SparkSQL

    Spark SQL

    CDH 5.4: Hive on Spark / SparkSQL

  • 10

  • 11

    Impala

    impalad

    catalogd, Statestore

    impala-shell (command-line client), ODBC / JDBC

    [Diagram: a SQL App connects over ODBC / JDBC to impalad (Query Planner, Query Coordinator, Query Exec Engine) on an HDFS DataNode; also shown are State Store, catalogd, Hive Metastore, and HDFS NN]

  • 12

    Impala Daemon (impalad): runs on the HDFS DataNode hosts

    [Diagram: three impalad instances, each with a Query Planner, Query Coordinator, and Query Exec Engine on an HDFS DataNode]

  • 13

    Catalog Service (catalogd): impalad, HDFS, Hive

    impalad, DDL, Hive Metastore

    [Diagram: impalad (Query Planner, Query Coordinator, Query Exec Engine) on an HDFS DataNode, with State Store, catalogd, Hive Metastore, and HDFS NN; labels: DDL, Block, Hive]

  • 14

    StateStore 1 impalad

    [Diagram: catalogd, State Store, and impalad (Query Planner, Query Coordinator, Query Exec Engine) on an HDFS DataNode]

  • 15

    Impala

    [Diagram: a SQL App connected over ODBC / JDBC to three nodes, each with a Query Planner, Query Coordinator, and Query Exec Engine alongside HBase and an HDFS DataNode]

  • 16

    Impala

    [Diagram: the same three-node cluster and SQL App over ODBC / JDBC; label: SQL]

  • 17

    Impala

    [Diagram: the same three-node cluster and SQL App over ODBC / JDBC; label: impalad]

  • 18

    Impala

    [Diagram: a SQL App over ODBC / JDBC and three nodes, each with a Query Planner, Query Coordinator, and Query Exec Engine on an HDFS DataNode; label: HDFS (JOIN)]

  • 19

    Impala

    [Diagram: the same three-node cluster and SQL App over ODBC / JDBC; label: impalad]

  • 20

    Impala

    [Diagram: the same three-node cluster and SQL App over ODBC / JDBC; label: HiveQL]
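The request flow pictured on the preceding slides (a SQL App sends a query to one impalad, that node plans and coordinates, every node's exec engine reads its local data, and results flow back) can be sketched as a scatter-gather in Python. This is only an illustration: the node names, sample rows, and function names are hypothetical, the fragments run sequentially here rather than in parallel, and nothing below is Impala's actual code.

```python
from collections import Counter

# Hypothetical data: the rows each node can read from its local HDFS DataNode.
LOCAL_BLOCKS = {
    "node1": [("tokyo", 3), ("osaka", 1)],
    "node2": [("tokyo", 2), ("nagoya", 5)],
    "node3": [("osaka", 4)],
}


def execute_fragment(rows):
    """Exec-engine role: scan local rows and pre-aggregate SUM(amount) per city."""
    partial = Counter()
    for city, amount in rows:
        partial[city] += amount
    return partial


def coordinate(blocks_by_node):
    """Coordinator role: hand a fragment to every node, then merge the partial results."""
    total = Counter()
    for _node, rows in blocks_by_node.items():
        # In a real cluster each fragment would run remotely and in parallel.
        total.update(execute_fragment(rows))
    return dict(total)


# Roughly: SELECT city, SUM(amount) FROM t GROUP BY city
result = coordinate(LOCAL_BLOCKS)
print(result)
```

The point of the sketch is that each node only touches its own data and ships a small partial aggregate, so the coordinator merges a few rows instead of reading the whole table.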

  • 21

    Disk

    MapReduce Disk

    Impala

  • 22

  • 23

    UDF (), UDAF ()

    Impala C++ UDF

    Java Hive UDF

    Python UDF: https://github.com/cloudera/impyla
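To make the difference between a UDF (one value in, one value out) and a UDAF (many rows in, one value out) concrete, here is a minimal Python sketch of a user-defined aggregate built from init / update / merge / finalize steps, the phases a distributed aggregate generally needs. The class and function names are hypothetical and this is not Impala's C++ UDF API or the impyla API; it only mimics the shape of the computation.

```python
class AvgUdaf:
    """Toy AVG aggregate split into the phases a distributed engine would call."""

    @staticmethod
    def init():
        # Fresh intermediate state for one node (or one group).
        return {"sum": 0.0, "count": 0}

    @staticmethod
    def update(state, value):
        # Called once per input row; SQL aggregates skip NULLs.
        if value is not None:
            state["sum"] += value
            state["count"] += 1

    @staticmethod
    def merge(dst, src):
        # Combine the intermediate state produced on another node.
        dst["sum"] += src["sum"]
        dst["count"] += src["count"]

    @staticmethod
    def finalize(state):
        # Turn the merged state into the final result (NULL for no rows).
        return state["sum"] / state["count"] if state["count"] else None


def run_distributed_avg(partitions):
    """Run update() per partition, then merge() all partial states."""
    partials = []
    for rows in partitions:
        state = AvgUdaf.init()
        for value in rows:
            AvgUdaf.update(state, value)
        partials.append(state)

    final = AvgUdaf.init()
    for partial in partials:
        AvgUdaf.merge(final, partial)
    return AvgUdaf.finalize(final)


avg = run_distributed_avg([[1, 2, None], [3], []])
print(avg)
```

Keeping a (sum, count) pair as the intermediate state, rather than a running average, is what makes the merge step correct across partitions of different sizes.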

  • 24

  • 25

    100 10

    10 1

    1000 GB

    100 GB

    Group A

    Group B

  • 26

    Impala (Authentication)

    Kerberos/LDAP

    (Authorization) Sentry (HDFS)

    (Audit) Cloudera Navigator

  • 27

    /I/O I/O

    bzip2

    : :
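Assuming this slide contrasts the I/O saved by compression with the CPU time a codec such as bzip2 costs, the trade-off can be measured with a short Python sketch. The sample data is made up, and zlib stands in for a lighter codec only because it ships with Python; the numbers it prints say nothing about Impala's own performance.

```python
import bz2
import time
import zlib

# Hypothetical, highly repetitive log-like data (about 1.3 MB).
data = b"2015-11-10,tokyo,impala,ok\n" * 50_000


def measure(name, compress, decompress):
    """Return compression ratio and compression time for one codec."""
    start = time.perf_counter()
    packed = compress(data)
    elapsed = time.perf_counter() - start
    # A codec is only useful if the round trip is lossless.
    assert decompress(packed) == data
    return {"codec": name, "ratio": len(data) / len(packed), "seconds": elapsed}


results = [
    measure("zlib", zlib.compress, zlib.decompress),
    measure("bz2", bz2.compress, bz2.decompress),
]

for r in results:
    print(f'{r["codec"]}: ratio {r["ratio"]:.1f}x, compress time {r["seconds"]:.4f}s')
```

A higher ratio means fewer bytes to read from disk, while a longer compress or decompress time is CPU spent to get there; which side matters more depends on whether the workload is I/O-bound or CPU-bound.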

  • 28

    Parquet, Impala

    I/O, Impala, snappy
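Parquet is a columnar format, and the I/O benefit the slide points to comes from reading only the columns a query needs. The following Python sketch compares bytes read for `SUM(amount)` over a row-oriented and a column-oriented layout. The table, column names, and fixed-width encoding are hypothetical simplifications; real Parquet adds row groups, encodings, and compression such as Snappy.

```python
import struct

# Hypothetical table with three 32-bit integer columns: (id, amount, category).
rows = [(i, i * 10, i % 7) for i in range(1000)]
ROW_SIZE = struct.calcsize("<iii")  # 12 bytes per row

# Row-oriented layout: all fields of one row are stored together.
row_store = b"".join(struct.pack("<iii", *r) for r in rows)

# Column-oriented layout: all values of one column are stored together.
column_store = {
    name: struct.pack(f"<{len(rows)}i", *(r[idx] for r in rows))
    for idx, name in enumerate(("id", "amount", "category"))
}


def sum_amount_row_store():
    """Every row must be read in full even though only one field is used."""
    total, bytes_read = 0, 0
    for offset in range(0, len(row_store), ROW_SIZE):
        _id, amount, _category = struct.unpack_from("<iii", row_store, offset)
        total += amount
        bytes_read += ROW_SIZE
    return total, bytes_read


def sum_amount_column_store():
    """Only the 'amount' column is read; the other columns are never touched."""
    buf = column_store["amount"]
    values = struct.unpack(f"<{len(rows)}i", buf)
    return sum(values), len(buf)


print(sum_amount_row_store())
print(sum_amount_column_store())
```

Both functions return the same sum, but the columnar version reads one third of the bytes because the table has three equally sized columns and the query uses one of them.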

  • 29

    HBase, Impala

    Impala, HBase, External systems

    put, SELECT * FROM hbase_tbl

    INSERT / INSERT VALUES, get, scan

    put/get, Hadoop, NoSQL

    Impala, HBase, HDFS

    HBase

  • 30

    Kudu: Parquet, HDFS, Kudu

    CDH 5.4

  • 31

    2.0

  • 32

    Impala 2.0 (CDH 5.2)

    Disk (disk spill)

    SQL 2003 window functions (RANK, LAG)

    Subqueries in the WHERE clause

    Data types (VARCHAR, CHAR)

    Aggregate functions (VAR_SAMP, VAR_POP)
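For readers unfamiliar with the functions named on this slide, the following Python sketch reproduces the semantics of RANK, LAG, VAR_SAMP, and VAR_POP on a tiny made-up data set. The helper names and the sample rows are hypothetical, and the code models standard SQL behavior rather than calling Impala.

```python
import statistics

# Hypothetical (name, amount) rows.
sales = [("a", 100), ("b", 300), ("c", 300), ("d", 200)]


def rank_desc(rows):
    """RANK() OVER (ORDER BY amount DESC): ties share a rank, the next rank skips."""
    ordered = sorted(rows, key=lambda r: r[1], reverse=True)
    ranks = {}
    prev_amount, prev_rank = None, 0
    for position, (name, amount) in enumerate(ordered, start=1):
        rank = prev_rank if amount == prev_amount else position
        ranks[name] = rank
        prev_amount, prev_rank = amount, rank
    return ranks


def lag(values, offset=1, default=None):
    """LAG(value, offset): the value `offset` rows earlier in the given order."""
    return [values[i - offset] if i >= offset else default for i in range(len(values))]


amounts = [amount for _name, amount in sales]

ranks = rank_desc(sales)                # b and c tie for rank 1, d is 3, a is 4
previous = lag(amounts)                 # first row has no predecessor
var_samp = statistics.variance(amounts)   # sample variance, divides by n - 1
var_pop = statistics.pvariance(amounts)   # population variance, divides by n

print(ranks, previous, var_samp, var_pop)
```

The only difference between VAR_SAMP and VAR_POP is the divisor (n - 1 versus n), which is why the sample variance is always the larger of the two for the same data.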

  • 33

    Impala 2.1 (CDH 5.3)

    StateStore

  • 34

    Impala 2.2 (CDH 5.4)

    Amazon S3 (unsupported)

    Cloudera Navigator

  • 35

    Roadmap

  • 36

    2015

    Nested type()

    EMC Isilon

  • 37

    2015/2016

    Llama, YARN

  • 38

    2016

    20 (multicore join/runtime/HW)

    (nested type/UDF)

    /

    (Disk Spill) SQL

  • 39

  • 40

    Cloudera Impala: Hadoop, SQL

    BI/

  • 41

    Impala

  • 42

    4 ways to try Impala

    Web UI: Hue

    QuickStartVM

    Cloud: Cloudera Live

    Cloudera Manager

  • 43

    Hue

    Hue homepage: http://gethue.com/

    Hue Demo site: http://demo.gethue.com/

    Query Editors: Hive/Impala

  • 44

    QuickStartVM

    Download site: http://www.cloudera.com/content/www/en-us/downloads/quickstart_vms/5-4.html

    VM, CDH, Cloudera Manager (default ) 8-10GB

  • 45

    Cloudera Live

    Web site: http://www.cloudera.com/content/www/en-us/get-started/cloudera-live.html

    Cloud (AWS) (m4.xlarge x 4)

    Tableau/Zoomdata (m4.xlarge +1)

    60 AWS

  • 46

    Cloudera Manager

    root (TUI)

    Readme

    OS

    $ curl -O http://archive.cloudera.com/cm5/installer/latest/cloudera-manager-installer.bin
    $ chmod 755 cloudera-manager-installer.bin
    $ sudo ./cloudera-manager-installer.bin

    $ sudo ./cloudera-manager-installer.bin --i-agree-to-all-licenses --noprompt --noreadme

  • 47

    Cloudera Manager

    2 3

    Cloudera Manager CDH

    1

  • 48

  • 49

    Impala Document: http://www.cloudera.com/content/www/en-us/documentation/enterprise/latest/topics/impala.html

    Impala ()

    Engineer Blog: http://blog.cloudera.com/

    Cloudera Blog. Impala ()

  • 50

    CDH () [email protected]

    Cloudera () http://community.cloudera.com/ 10%

  • 51

    Hadoop

    Hadoop: http://gihyo.jp/admin/serial/01/how_hadoop_works

    gihyo.jp: Impala, 2015/12 to 2016/1

  • 52

    We are hiring!

    [email protected]

  • Thank you.