Performance Evaluation of Cloudera Impala 0.6 Beta with Comparison to Hive


  • 1. Cloudera Impala 0.6 beta Performance Evaluation (with Comparison to Hive)
    Mar. 6, 2013
    CELLANT Corp. R&D Strategy Division
    Yukinori SUDA @sudabon
    Copyright CELLANT Corp. All Rights Reserved. http://www.cellant.jp/

  • 2. Cloudera Impala 0.6 beta
      • ChangeLogs from 0.5 beta
      • Cloudera Manager 4.5 and CDH 4.2 support Impala 0.6.
      • Support for the RCFile file format.
      • Added support for Impala on SUSE and Debian/Ubuntu.
          • RHEL 5.7/6.2 and CentOS 5.7/6.2
          • SUSE 11 with Service Pack 1 or later
          • Ubuntu 10.04/12.04 and Debian 6.03

  • 3. System Environment
      • Install via Cloudera Manager Free Edition 4.5.0
      • Master (3 servers): Active NameNode, Stand-by NameNode, JobTracker, statestored
      • Slave (11 servers): DataNode, TaskTracker, Impalad
      • All servers are connected with 1Gbps Ethernet through an L2 switch

  • 4. Server Specification
      • CPU: Intel Core 2 Duo 2.13 GHz with Hyper Threading
      • Memory: 4GB
      • Disk: 7,200 rpm SATA mechanical Hard Disk Drive
      • OS: CentOS 6.2

  • 5. Benchmark
      • Used CDH4.2.0 + Impala version 0.6 beta
      • Used hivebench from the open-source benchmark tool HiBench
          • https://github.com/hibench
      • Modified the datasets to 1/10 scale
          • The default configuration generates a table with 1 billion rows
      • Modified the query statement
          • Deleted "INSERT INTO TABLE" to evaluate read-only performance
      • Combined several Hive storage formats with several compression methods
          • TextFile, SequenceFile, RCFile
          • No compression, Gzip, Snappy
      • Compared job query latency
      • Averaged job latency over 5 measurements
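The measurement procedure on the Benchmark slide (each storage format paired with each compression method, with latency averaged over 5 measurements) can be sketched roughly as below. This is a minimal Python sketch under stated assumptions, not the harness used for these slides: `measure_latencies`, `benchmark_matrix`, and `make_query` are hypothetical names, and the do-nothing query callable is a stand-in that would have to be replaced by a real call that submits the query to Hive or Impala and waits for it to finish.

```python
import itertools
import statistics
import time
from typing import Callable, Dict, List, Tuple

# Storage formats and compression methods listed on the Benchmark slide.
STORAGE_FORMATS = ["TextFile", "SequenceFile", "RCFile"]
COMPRESSIONS = ["No compression", "Gzip", "Snappy"]
RUNS = 5  # the slides average job latency over 5 measurements


def measure_latencies(run_query: Callable[[], object], runs: int = RUNS) -> List[float]:
    """Run the query `runs` times and return each wall-clock latency in seconds."""
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        run_query()  # hypothetical: submits the query and blocks until it completes
        latencies.append(time.perf_counter() - start)
    return latencies


def benchmark_matrix(
    make_query: Callable[[str, str], Callable[[], object]],
) -> Dict[Tuple[str, str], float]:
    """Return the mean latency for every (storage format, compression) pair."""
    results = {}
    for fmt, codec in itertools.product(STORAGE_FORMATS, COMPRESSIONS):
        latencies = measure_latencies(make_query(fmt, codec))
        results[(fmt, codec)] = statistics.mean(latencies)
    return results


if __name__ == "__main__":
    # Stand-in query that does nothing; an assumption for illustration only.
    averages = benchmark_matrix(lambda fmt, codec: (lambda: None))
    for (fmt, codec), avg in averages.items():
        print(f"{fmt:13s} {codec:15s} {avg:.6f} s")
```

Three formats times three compression settings gives nine configurations per engine, so a full comparison of Hive and Impala at 5 runs each would involve 90 query executions, assuming every combination is supported by both engines.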
  • 6. Modified Datasets
      • Uservisits table: 100 million rows
        Table Definitions
          sourceIP      string
          destURL       string
          visitDate     string
          adRevenue     double
          userAgent     string
          countryCode   string
          languageCode  string
          searchWord    string
          duration      int
      • Rankings table: 12 million rows
        Table Definitions
          pageURL       string
          pageRank      int
          avgDuration   int

  • 7. Modified Query
      Outer query:
        SELECT
          sourceIP,
          sum(adRevenue) as totalRevenue,
          avg(pageRank)
        FROM
          rankings_t R
        JOIN (the subquery below, referenced as NUV)
        ON (R.pageURL = NUV.destURL)
        group by sourceIP
        order by totalRevenue DESC
        limit 1;
      Subquery inside the JOIN:
        (SELECT sourceIP, destURL, adRevenue
         FROM uservisits_t UV
         WHERE (datediff(UV.visitDate, '1999-01-01')>=0
           AND datediff(UV.visitDate, '2000-01-01')