Hadoop hive

Text of Hadoop hive

  1. Apache Hive
  2. Agenda: What is Apache Hive; How to Setup; Tutorial Examples
  3. Hive Introduction
      Hive is a data warehouse infrastructure built on top of Hadoop.
      It compiles SQL queries into MapReduce jobs and runs them on Hadoop; HDFS is used for storage.
      A data warehouse (DW) is a database designed specifically for analysis and reporting.
      Components shown in the figure: HiveQL, JDBC, ODBC, Thrift Server, Driver, MetaStore, web UI, CLI.
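The statement that Hive compiles SQL into MapReduce jobs can be made concrete with a toy sketch. The plain-Python code below is not Hive's planner; it only mimics how a query such as `SELECT dept, COUNT(*) FROM emp GROUP BY dept` might decompose into map, shuffle, and reduce phases. The `emp` rows and all function names are hypothetical and do not come from the slides.

```python
from collections import defaultdict

# Hypothetical rows standing in for a table emp(name, dept).
EMP = [("ann", "ops"), ("bob", "dev"), ("cid", "dev"), ("dee", "ops"), ("eve", "dev")]


def map_phase(rows):
    """Emit one (key, 1) pair per row, keyed by the GROUP BY column."""
    for _name, dept in rows:
        yield dept, 1


def shuffle_phase(pairs):
    """Collect emitted values by key, as happens between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Apply the aggregate (COUNT(*) here) to each key's values."""
    return {key: sum(values) for key, values in groups.items()}


def count_by_dept(rows):
    """Chain the three phases, mirroring the GROUP BY query."""
    return reduce_phase(shuffle_phase(map_phase(rows)))


if __name__ == "__main__":
    print(count_by_dept(EMP))
```

In a real cluster the map and reduce phases run in parallel on different nodes; the sketch keeps everything in one process so the data flow is easy to follow.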
  4. Hadoop + RDB: Hive
      Hive gives Hadoop an RDB-style SQL interface (SQL is translated into MapReduce).
      See: http://www.slideshare.net/Avkashslide/introduction-to-apache-hive-18003322
  5. Hive Performance
      See: http://hortonworks.com/blog/pig-performance-and-optimization-analysis/
  6. Hive components
      Interfaces: CLI, WebUI, API (JDBC and ODBC)
      Thrift Server (hiveserver): Client API for HiveQL
      Metastore: DB, table, partition
      Figure source: http://blog.cloudera.com/blog/2013/07/how-hiveserver2-brings-security-and-concurrency-to-apache-hive
  7. Setup
      Download: http://archive.cloudera.com/cdh5/cdh/5/hive-0.13.1-cdh5.3.2.tar.gz
      ~/.bashrc:
        export JAVA_HOME=/usr/lib/jvm/java-7-oracle
        export HADOOP_HOME=/home/hadoop/hadoop
        export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
        export HIVE_HOME=/home/hadoop/hive
        export PATH=$PATH:$HIVE_HOME/bin
      conf/hive-env.sh:
        export HADOOP_HOME=/home/hadoop/hadoop
        export HIVE_CONF_DIR=/home/hadoop/hive/conf
      Metastore on HDFS:
        hadoop fs -mkdir -p /user/hive/warehouse
        hadoop fs -chmod g+w /tmp
        hadoop fs -chmod g+w /user/hive/warehouse
      Start the shell:
        $ hive
        hive>
      Note: to use MySQL for the metastore instead of the local metastore_db, configure it in hive-site.xml.
  8. $ hive
      hive> create table A(x int, y int, z int);
      hive> load data local inpath 'file1' into table A;
      hive> select * from A where y>10000;
      hive> insert into table B select * from A where y>10000;
      Figure source: http://hortonworks.com/blog/stinger-phase-2-the-journey-to-100x-faster-hive/
  9. Hive vs. SQL
                            Hive         RDBMS
      Query language        HQL          SQL
      Data storage          HDFS         Raw device or local FS
      Execution             MapReduce    Executor
      (row label lost)      NO           YES
      Index, bitmap index (position in the table not recoverable)
      Source: http://sishuok.com/forum/blogPost/list/6220.html
  10. Use case (most of the slide text was lost in extraction): DB tables as csv on Hadoop HDFS; PIG and MapReduce versus SQL for DB users; sql server, Hive
  11. Hive example (HiveQL)
      > create table A (nm String, dp String, id String);
      > create table B (id String, dt Date, hr int);
      > create table final (dp String, id String, nm String, avg float);
      > load data inpath 'file1' into table A;
      > load data inpath 'file2' into table B;
      > insert into table final select a.dp, a.id, a.nm, avg(b.hr) from a, b where b.hr > 8 and b.id = a.id group by a.dp, a.id, a.nm;
      Table A (nm, dp, id): ids A1, B1, B2 (the nm and dp values were lost in extraction)
      Table B (id, dt, hr): (A1, 7/7, 13), (A1, 7/8, 12), (A1, 7/9, 4)
      Result: A1, 12.5
      Tips (partly lost in extraction): without `local`, `load data` reads the input file from HDFS and moves it into the table; 1 byte; create table & load data tool
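To check where the 12.5 in the example comes from, the join, filter, and average can be traced by hand. The sketch below is plain Python rather than HiveQL, and it assumes the sample data reads as three rows in B for id A1 with hr values 13, 12, and 4. The nm and dp values are placeholders because the originals did not survive, `avg_hours` is a hypothetical helper name, and ids in A are assumed to be unique.

```python
from collections import defaultdict

# Table A (nm, dp, id). nm and dp are placeholders; the slide's values were lost.
A = [("nm1", "dp1", "A1"), ("nm2", "dp2", "B1"), ("nm3", "dp3", "B2")]
# Table B (id, dt, hr), as read from the slide's sample data.
B = [("A1", "7/7", 13), ("A1", "7/8", 12), ("A1", "7/9", 4)]


def avg_hours(a_rows, b_rows, min_hr=8):
    """Mimic: join A and B on id, keep rows with hr > min_hr, average hr per id."""
    hours = defaultdict(list)
    for b_id, _dt, hr in b_rows:
        if hr > min_hr:  # WHERE b.hr > 8
            hours[b_id].append(hr)
    result = []
    for nm, dp, a_id in a_rows:
        if a_id in hours:  # inner join: ids without matching B rows drop out
            vals = hours[a_id]
            result.append((dp, a_id, nm, sum(vals) / len(vals)))
    return result


if __name__ == "__main__":
    for row in avg_hours(A, B):
        print(row)
```

Only the 13 and 12 pass the `hr > 8` filter, so the average for A1 is 12.5; B1 and B2 have no rows in B and do not appear in the result.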
  12. Run:
        cd ~; git clone https://github.com/waue0920/hadoop_example.git
        cd ~/hadoop_example/hive/ex1
        hadoop fs -put *.txt ./
        hive -f exc1.hive
      Script: exc1.hive (select * from table_name)
      Check: hive> select * from final;
      Q: ? Q: ? (question text lost in extraction)
  13. Table operations
      Create Table:
        CREATE TABLE page_view(
          viewTime INT, userid BIGINT, page_url STRING, referrer_url STRING, ip STRING)
        COMMENT 'This is the page view table'
        PARTITIONED BY(dt STRING, country STRING)
        ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
        STORED AS TEXTFILE
        LOCATION '/user/data/staging/page_view'
        TBLPROPERTIES ("skip.header.line.count"="1");
      Drop Table:
        DROP TABLE pv_users;
      Alter Table:
        ALTER TABLE old_table_name REPLACE COLUMNS (col1 TYPE, ...);
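A partitioned table such as page_view keeps each partition in its own subdirectory under the table location, named `column=value` in the order given by PARTITIONED BY. The sketch below only builds that default path as a string; `partition_dir` is a hypothetical helper, it does not talk to HDFS, and it ignores the escaping Hive applies to special characters in partition values.

```python
import posixpath


def partition_dir(table_location, partition_spec):
    """Build the default directory for one partition.

    partition_spec is an ordered list of (column, value) pairs, in the
    same order as the table's PARTITIONED BY clause.
    """
    parts = [f"{col}={val}" for col, val in partition_spec]
    return posixpath.join(table_location, *parts)


if __name__ == "__main__":
    print(partition_dir("/user/data/staging/page_view",
                        [("dt", "2008-06-08"), ("country", "US")]))
```

Because the partition values are part of the path, a query that filters on dt or country can skip whole directories instead of scanning every file.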
  14. Loading data
      LOAD DATA INPATH '/user/data/pv_2008-06-08_us.txt' INTO TABLE page_view PARTITION(dt='2008-06-08', country='US');
      INSERT OVERWRITE TABLE xyz_com_page_views SELECT page_views.* FROM page_views WHERE page_views.date >= '2008-03-01' AND page_views.date ... (rest of the condition lost in extraction)
      Ex1: top 5 (task text lost). Result rows: 181~210 54700000; 6531~60 28500000; 121~150 11600000
      Ex2: top 5 (task text lost). Result rows: 181~210 213981 54700000; 6531~60 129398 28500000; 121~150 100112 11600000
      http://hive.3du.me/
  15. Reference
      Hive: https://cwiki.apache.org/confluence/display/Hive/
      Hive Tutorial: http://hive.3du.me/