DEBUGGING HIVE WITH HADOOP IN THE CLOUD Soam Acharya, David Chaiken, Denis Sheahan, Charles Wimmer Altiscale, Inc. #OCBigData @ 20140917T1845-0700

OC Big Data Monthly Meetup #5 - Session 1 - Altiscale


Page 1
Page 2

WHO ARE WE?

• Altiscale: Infrastructure Nerds!
• Hadoop as a Service
• Rack and build our own Hadoop clusters
• Provide a suite of Hadoop tools
  o Hive, Pig, Oozie
  o Others as needed: R, Python, Spark, Mahout, Impala, etc.
• Monthly billing plan: compute (YARN), storage (HDFS)
• https://www.altiscale.com
• @Altiscale #HadoopSherpa

Page 3

TALK ROADMAP

• Our Platform and Perspective
• Hadoop 2 Primer
• Hadoop Debugging Tools
• Accessing Logs in Hadoop 2
• Hive + Hadoop Architecture
• Hive Logs
• Hive Issues + Case Studies
  o Hive + Interactive (DRAM-Centric) Processing Engines
• Conclusion: Making Hive Easier to Use

Page 4

OUR DYNAMIC PLATFORM

• Hadoop 2.0.5 => Hadoop 2.2.0 => Hadoop 2.4.1
• Hive 0.10 => Hive 0.12 => Stinger (Hive 0.13 + Tez)
• Hive, Pig, and Oozie are the most commonly used tools
• Working with customers on: Spark, Impala, 0xdata, Flume, Camus/Kafka, …

Page 5

ALTISCALE PERSPECTIVE

• What we do as a service provider…
  o Performance + Reliability: jobs finish faster, fewer failures
  o Instant Access: always-on access to HDFS and YARN
  o Hadoop Helpdesk: tools + experts ensure customer success
  o Secure: networking, SOC 2 audit, Kerberos
  o Results: faster Time-to-Value (TTV), lower TCO
• Operational approach in this presentation…
  o How to use Hadoop 2 cluster tools and logs to debug and to tune Hive
  o This talk will not focus on query optimization

Page 6

QUICK PRIMER – HADOOP 2

[Diagram: a Hadoop 2 cluster — Name Node, Secondary NameNode, and Resource Manager master daemons, plus Hadoop slave machines each running a Node Manager and a Data Node]

Page 7

QUICK PRIMER – HADOOP 2 YARN

• Resource Manager (per cluster)
  o Manages job scheduling and execution
  o Global resource allocation
• Application Master (per job)
  o Manages task scheduling and execution
  o Local resource allocation
• Node Manager (per-machine agent)
  o Manages the lifecycle of task containers
  o Reports to RM on health and resource usage

Page 8

HADOOP 1 VS HADOOP 2

• No more JobTrackers, TaskTrackers
• YARN ~ an operating system for clusters
  o MapReduce is implemented as a YARN application
  o Bring on the applications! (Spark is just the start…)
• Should be transparent to Hive users

Page 9

HADOOP 2 DEBUGGING TOOLS

• Monitoring
  o System state of the cluster:
    - CPU, memory, network, disk
    - Nagios, Ganglia, Sensu
    - collectd, statsd, Graphite
  o Hadoop level:
    - HDFS usage
    - Resource usage:
      • Container memory allocated vs. used
      • Number of jobs running at the same time
      • Long-running tasks

Page 10

HADOOP 2 DEBUGGING TOOLS

• Hadoop logs
  o Daemon logs: Resource Manager, NameNode, DataNode
  o Application logs: Application Master, MapReduce tasks
  o Job history file: resources allocated during the job lifetime
  o Application configuration files: store all Hadoop application parameters
• Source code instrumentation

Page 11

Page 12

ACCESSING LOGS IN HADOOP 2

•  To view the logs for a job, click on the link under the ID column in the Resource Manager UI.

Page 13

ACCESSING LOGS IN HADOOP 2

• To view the application's top-level logs, click on logs.
• To view individual logs for the mappers and reducers, click on History.

Page 14

ACCESSING LOGS IN HADOOP 2

•  Log output for the entire application.

Page 15

ACCESSING LOGS IN HADOOP 2

•  Click on the Map link for mapper logs and the Reduce link for reducer logs.

Page 16

ACCESSING LOGS IN HADOOP 2

•  Clicking on a single link under Name provides an overview for that particular map task.

Page 17

ACCESSING LOGS IN HADOOP 2

•  Finally, clicking on the logs link will take you to the log output for that map task.

Page 18

ACCESSING LOGS IN HADOOP 2

•  Fun, fun, donuts, and more fun…

Page 19

HIVE + HADOOP 2 ARCHITECTURE

•  Hive 0.10+

[Diagram: Hive clients — the Hive CLI, HiveServer (JDBC/ODBC), and other clients (…) — talking to the Hive Metastore and a Hadoop 2 cluster]

Page 20

HIVE LOGS

• Query log location, from /etc/hive/hive-site.xml:

<property>
  <name>hive.querylog.location</name>
  <value>/home/hive/log/${user.name}</value>
</property>

• Sample entry:

SessionStart SESSION_ID="soam_201402032341" TIME="1391470900594"
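The TIME fields in these logs are epoch milliseconds, which are painful to correlate by eye. A minimal sketch for decoding them (the helper name is ours):

```python
from datetime import datetime, timezone

def querylog_time_to_utc(time_ms: str) -> str:
    """Convert a Hive query-log TIME field (epoch milliseconds) to UTC."""
    ts = int(time_ms) / 1000.0
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")

# The SessionStart entry above: TIME="1391470900594"
print(querylog_time_to_utc("1391470900594"))  # -> 2014-02-03 23:41:40
```

Note that the decoded UTC time matches the timestamp embedded in the SESSION_ID (soam_201402032341).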

Page 21

HIVE CLIENT LOGS

• /etc/hive/hive-log4j.properties:
  o hive.log.dir=/var/log/hive/${user.name}

2014-05-29 19:51:09,830 INFO parse.ParseDriver (ParseDriver.java:parse(179)) - Parsing command: select count(*) from dogfood_job_data
2014-05-29 19:51:09,852 INFO parse.ParseDriver (ParseDriver.java:parse(197)) - Parse Completed
2014-05-29 19:51:09,852 INFO ql.Driver (PerfLogger.java:PerfLogEnd(124)) - </PERFLOG method=parse start=1401393069830 end=1401393069852 duration=22>
2014-05-29 19:51:09,853 INFO ql.Driver (PerfLogger.java:PerfLogBegin(97)) - <PERFLOG method=semanticAnalyze>
2014-05-29 19:51:09,890 INFO parse.SemanticAnalyzer (SemanticAnalyzer.java:analyzeInternal(8305)) - Starting Semantic Analysis
2014-05-29 19:51:09,892 INFO parse.SemanticAnalyzer (SemanticAnalyzer.java:analyzeInternal(8340)) - Completed phase 1 of Semantic Analysis
2014-05-29 19:51:09,892 INFO parse.SemanticAnalyzer (SemanticAnalyzer.java:getMetaData(1060)) - Get metadata for source tables
2014-05-29 19:51:09,906 INFO parse.SemanticAnalyzer (SemanticAnalyzer.java:getMetaData(1167)) - Get metadata for subqueries
2014-05-29 19:51:09,909 INFO parse.SemanticAnalyzer (SemanticAnalyzer.java:getMetaData(1187)) - Get metadata for destination tables

Page 22

HIVE METASTORE LOGS

• /etc/hive-metastore/hive-log4j.properties:
  o hive.log.dir=/service/log/hive-metastore/${user.name}

2014-05-29 19:50:50,179 INFO metastore.HiveMetaStore (HiveMetaStore.java:logInfo(454)) - 200: source:/10.252.18.94 get_table : db=default tbl=dogfood_job_data
2014-05-29 19:50:50,180 INFO HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(239)) - ugi=chaiken ip=/10.252.18.94 cmd=source:/10.252.18.94 get_table : db=default tbl=dogfood_job_data
2014-05-29 19:50:50,236 INFO metastore.HiveMetaStore (HiveMetaStore.java:logInfo(454)) - 200: source:/10.252.18.94 get_table : db=default tbl=dogfood_job_data
2014-05-29 19:50:50,236 INFO HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(239)) - ugi=chaiken ip=/10.252.18.94 cmd=source:/10.252.18.94 get_table : db=default tbl=dogfood_job_data
2014-05-29 19:50:50,261 INFO metastore.HiveMetaStore (HiveMetaStore.java:logInfo(454)) - 200: source:/10.252.18.94 get_table : db=default tbl=dogfood_job_data

Page 23

HIVE ISSUES + CASE STUDIES

• Hive issues
  o Hive client out of memory
  o Hive map/reduce task out of memory
  o Hive metastore out of memory
  o Hive launches too many tasks
• Case studies
  o Hive “stuck” job
  o Hive “missing directories”
  o Analyze Hive query execution
  o Hive + interactive (DRAM-centric) processing engines

Page 24

HIVE CLIENT OUT OF MEMORY

• Memory-intensive client-side Hive query (map-side join):

Number of reduce tasks not specified. Estimated from input data size: 999
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
java.lang.OutOfMemoryError: Java heap space
  at java.nio.CharBuffer.wrap(CharBuffer.java:350)
  at java.nio.CharBuffer.wrap(CharBuffer.java:373)
  at java.lang.StringCoding$StringDecoder.decode(StringCoding.java:138)

Page 25

HIVE CLIENT OUT OF MEMORY

• Use HADOOP_HEAPSIZE prior to launching the Hive client:
  HADOOP_HEAPSIZE=<new heapsize> hive <fileName>
• Watch out for the HADOOP_CLIENT_OPTS issue in hive-env.sh!
• Know the amount of memory available on the machine running the client; do not exceed it or take a disproportionate share of it.

$ free -m
             total       used       free     shared    buffers     cached
Mem:          1695       1388        306          0         60        424
-/+ buffers/cache:        903        791
Swap:          895        101        794
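One way to pick a defensible HADOOP_HEAPSIZE is to read the "-/+ buffers/cache" free column (memory actually reclaimable for new processes) and take a fraction of it. A sketch against the free -m output above; the half-of-available rule of thumb is our assumption, not a recommendation from the slides:

```python
# Sample `free -m` output, as shown on the slide.
FREE_M = """\
             total       used       free     shared    buffers     cached
Mem:          1695       1388        306          0         60        424
-/+ buffers/cache:        903        791
Swap:          895        101        794
"""

def available_mb(free_output: str) -> int:
    """Return the '-/+ buffers/cache' free column, in MB."""
    for line in free_output.splitlines():
        if line.startswith("-/+ buffers/cache:"):
            used, free = line.split(":")[1].split()
            return int(free)
    raise ValueError("no buffers/cache line found")

avail = available_mb(FREE_M)
heapsize_mb = avail // 2          # assumed rule of thumb: at most ~half of available
print(avail, heapsize_mb)         # -> 791 395
```

With that number in hand you would launch the client as HADOOP_HEAPSIZE=395 hive <fileName> (HADOOP_HEAPSIZE is in MB).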

Page 26

HIVE TASK OUT OF MEMORY

• Query spawns MapReduce jobs that run out of memory
• How to find this issue?
  o Hive diagnostic message
  o Hadoop MapReduce logs

Page 27

HIVE TASK OUT OF MEMORY

• The fix is to increase the task RAM allocation…
  set mapreduce.map.memory.mb=<new RAM allocation>;
  set mapreduce.reduce.memory.mb=<new RAM allocation>;
• Also watch out for…
  set mapreduce.map.java.opts=-Xmx<heap size>m;
  set mapreduce.reduce.java.opts=-Xmx<heap size>m;
• Not a magic bullet – requires manual tuning
• An increase in individual container memory size means:
  o A decrease in the number of containers that can run at once
  o A decrease in overall parallelism
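The parallelism tradeoff in the last two bullets is simple integer arithmetic: each node can run only as many containers as fit in its memory. A sketch with assumed numbers (the 24 GB node size is ours, not from the slides):

```python
# Assumed: a worker node with 24 GB of memory available to YARN containers.
node_mem_mb = 24 * 1024

def containers_per_node(node_mem_mb: int, container_mb: int) -> int:
    """How many containers of the given size fit on one node."""
    return node_mem_mb // container_mb

print(containers_per_node(node_mem_mb, 2048))   # -> 12 concurrent tasks per node
print(containers_per_node(node_mem_mb, 5120))   # -> 4  (bigger tasks, less parallelism)
```

Bumping mapreduce.reduce.memory.mb from 2048 to 5120 here cuts per-node parallelism from 12 tasks to 4.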

Page 28

HIVE METASTORE OUT OF MEMORY

• Out-of-memory issues are not necessarily dumped to logs
• The metastore can become unresponsive; queries can't be submitted
• Restart with a higher heap size: export HADOOP_HEAPSIZE in hcat_server.sh
• After notifying Hive users about the downtime: service hcat restart

Page 29

HIVE LAUNCHES TOO MANY TASKS

• Typically a function of the input data set
• Lots of little files

Page 30

HIVE LAUNCHES TOO MANY TASKS

• Set mapred.max.split.size to an appropriate fraction of the data size
• Also verify that hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat
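With CombineHiveInputFormat, mapred.max.split.size bounds how much input one mapper receives, so the task count is roughly the input size divided by the split size. A sketch with an assumed 5 TiB input (the input size is our illustration; the 5368709120-byte split is the 5 GiB value used in the case study later):

```python
import math

GiB = 1024 ** 3
TiB = 1024 ** 4

def mapper_count(input_bytes: int, split_bytes: int) -> int:
    """Approximate number of map tasks: one per split, rounded up."""
    return math.ceil(input_bytes / split_bytes)

input_bytes = 5 * TiB                          # assumed input size
print(mapper_count(input_bytes, 128 * 1024**2))  # -> 40960 mappers at a 128 MiB split
print(mapper_count(input_bytes, 5 * GiB))        # -> 1024 mappers at a 5 GiB split
```

Raising the split size by 40x cuts the task count by 40x, which is exactly the lever used to tame the "too many tasks" failures below.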

Page 31

CASE STUDY: HIVE STUCK JOB

From an Altiscale customer:

“This job [jobid] has been running now for 41 hours. Is it still progressing or has something hung up the map/reduce so it’s just spinning? Do you have any insight?”

Page 32

HIVE STUCK JOB

1. Received the jobId, application_1382973574141_4536, from the client
2. Logged into the client cluster
3. Pulled up the Resource Manager
4. Entered part of the jobId (4536) in the search box
5. Clicked on the link that says application_1382973574141_4536
6. On the resulting Application Overview page, clicked on the “Tracking URL” link that said Application Master

Page 33

HIVE STUCK JOB

7. On the resulting MapReduce Application page, clicked on the Job Id (job_1382973574141_4536)
8. The resulting MapReduce Job page displayed detailed status of the mappers, including 4 failed mappers
9. Clicked on the 4 link on the Maps row in the Failed column
10. The title of the next page was “FAILED Map attempts in job_1382973574141_4536”
11. Each failed mapper generated an error message
12. Buried in the 16th line: Caused by: java.io.FileNotFoundException: File does not exist: hdfs://opaque_hostname:8020/HiveTableDir/FileName.log.date.seq

Page 34

HIVE STUCK JOB

• The job was stuck for a day or so, retrying a mapper that would never finish successfully.
• During the job, our customer's colleague realized the input file was corrupted and deleted it.
• The colleague did not anticipate the effect of removing corrupted data on a running job.
• Hadoop didn't make it easy to find out:
  o RM => search => application link => AM overview page => MR Application page => MR Job page => Failed jobs page => parse long logs
  o Task retry without hope of success

Page 35

HIVE “MISSING DIRECTORIES”

From an Altiscale customer:

“One problem we are seeing after the [Hive Metastore] restart is that we lost quite a few directories in [HDFS]. Is there a way to recover these?”

Page 36

HIVE “MISSING DIRECTORIES”

• Obtained the list of “missing” directories from the customer:
  o /hive/biz/prod/*
• Confirmed they were missing from HDFS
• Searched through the NameNode audit log for block IDs that belonged to the missing directories:

13/07/24 21:10:08 INFO hdfs.StateChange: BLOCK* NameSystem.allocateBlock: /hive/biz/prod/incremental/carryoverstore/postdepuis/lmt_unmapped_pggroup_schema._COPYING_. BP-798113632-10.251.255.251-1370812162472 blk_3560522076897293424_2448396{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[10.251.255.177:50010|RBW], ReplicaUnderConstruction[10.251.255.174:50010|RBW], ReplicaUnderConstruction[10.251.255.169:50010|RBW]]}

Page 37

HIVE “MISSING DIRECTORIES”

• Used the block ID to locate the exact time of file deletion in the NameNode logs:

13/07/31 08:10:33 INFO hdfs.StateChange: BLOCK* addToInvalidates: blk_3560522076897293424_2448396 to 10.251.255.177:50010 10.251.255.169:50010 10.251.255.174:50010

• Used the time of deletion to inspect the Hive logs
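The search itself is just log scanning: find the addToInvalidates entry for the block and read off its timestamp prefix. A sketch over trimmed excerpts of the two NameNode lines quoted above (the helper name is ours):

```python
# Trimmed excerpts of the NameNode log lines shown on these slides.
NAMENODE_LOG = """\
13/07/24 21:10:08 INFO hdfs.StateChange: BLOCK* NameSystem.allocateBlock: /hive/biz/prod/incremental/... blk_3560522076897293424_2448396{blockUCState=UNDER_CONSTRUCTION, ...}
13/07/31 08:10:33 INFO hdfs.StateChange: BLOCK* addToInvalidates: blk_3560522076897293424_2448396 to 10.251.255.177:50010 10.251.255.169:50010 10.251.255.174:50010
"""

def deletion_time(log: str, block_id: str):
    """Return the timestamp of the addToInvalidates entry for block_id, if any."""
    for line in log.splitlines():
        if "addToInvalidates" in line and block_id in line:
            return line[:17]   # the "yy/mm/dd hh:mm:ss" prefix
    return None

print(deletion_time(NAMENODE_LOG, "blk_3560522076897293424_2448396"))
# -> 13/07/31 08:10:33
```

In practice you would stream the real daemon logs through the same filter rather than an in-memory string.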

Page 38

HIVE “MISSING DIRECTORIES”

QueryStart QUERY_STRING="create database biz_weekly location '/hive/biz/prod'" QUERY_ID="usrprod_20130731043232_0a40fd32-8c8a-479c-ba7d-3bd8a2698f4b" TIME="1375245164667"
:
QueryEnd QUERY_STRING="create database biz_weekly location '/hive/biz/prod'" QUERY_ID="usrprod_20130731043232_0a40fd32-8c8a-479c-ba7d-3bd8a2698f4b" QUERY_RET_CODE="0" QUERY_NUM_TASKS="0" TIME="1375245166203"
:
QueryStart QUERY_STRING="drop database biz_weekly" QUERY_ID="usrprod_20130731073333_e9acf35c-4f07-4f12-bd9d-bae137ae0733" TIME="1375256014799"
:
QueryEnd QUERY_STRING="drop database biz_weekly" QUERY_ID="usrprod_20130731073333_e9acf35c-4f07-4f12-bd9d-bae137ae0733" QUERY_NUM_TASKS="0" TIME="1375256014838"

Page 39

HIVE “MISSING DIRECTORIES”

• In effect, user “usrprod” issued:
  o At 2013-07-31 04:32:44: create database biz_weekly location '/hive/biz/prod'
  o At 2013-07-31 07:33:34: drop database biz_weekly
• This is functionally equivalent to:
  hdfs dfs -rm -r /hive/biz/prod

Page 40

HIVE “MISSING DIRECTORIES”

• The customer manually placed their own data in /hive – the warehouse directory managed and controlled by Hive
• The customer used CREATE and DROP database commands in their code
  o Hive deletes database and table locations in /hive with impunity
• Why didn't the deleted data end up in .Trash?
  o Trash collection was not turned on in the configuration settings
  o It is now, but we need a -skipTrash option (HIVE-6469)

Page 41

HIVE “MISSING DIRECTORIES”

• Hadoop forensics: piece together disparate sources…
  o Hadoop daemon logs (NameNode)
  o Hive query and metastore logs
  o Hadoop config files
• We need better tools to correlate the different layers of the system: Hive client, Hive metastore, MapReduce job, YARN, HDFS, operating system metrics, …

By the way… operating any distributed system would be totally insane without NTP and a standard time zone (UTC).

Page 42

CASE STUDY – ANALYZE QUERY

• Customer provided a Hive query + data sets (100s of GB to ~5 TB)
• Needed help optimizing the query
• Didn't rewrite the query immediately
• Wanted to characterize query performance and isolate bottlenecks first

Page 43

ANALYZE AND TUNE EXECUTION

• Ran the original query on the datasets in our environment:
  o Two M/R stages: Stage-1, Stage-2
• Long-running reducers ran out of memory
  o set mapreduce.reduce.memory.mb=5120
  o Reduces the number of slots and extends reduce time
• Query failed to launch Stage-2 with out of memory
  o set HADOOP_HEAPSIZE=1024 on the client machine
• Query had 250,000 mappers in Stage-2, which caused failure
  o set mapred.max.split.size=5368709120 to reduce the number of mappers

Page 44

ANALYSIS: HOW TO VISUALIZE?

• Next challenge: how to visualize job execution?
• Existing Hadoop/Hive logs are not sufficient for this task
• Wrote internal tools to:
  o parse job history files
  o plot mapper and reducer execution
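The core of such a tool is interval analysis over per-task (start, finish) times pulled from the job history file. This is not Altiscale's internal tool, just a toy sketch of the idea, with made-up task intervals standing in for parsed history data:

```python
# Toy (start, finish) intervals in seconds -- stand-ins for the per-task
# times a job-history parser would produce. The last task is a straggler.
tasks = [(0, 60), (0, 65), (5, 70), (10, 400)]

def max_concurrency(intervals):
    """Peak number of tasks running at the same time (sweep-line over events)."""
    events = []
    for start, finish in intervals:
        events.append((start, +1))
        events.append((finish, -1))
    running = peak = 0
    for _, delta in sorted(events):   # ties sort -1 first: finish before start
        running += delta
        peak = max(peak, running)
    return peak

print(max_concurrency(tasks))   # -> 4
```

Plotting the same intervals as horizontal bars (task index vs. time) is what makes a lone straggler like the 400-second reducer below jump out visually.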

Page 45

ANALYSIS: MAP STAGE-1

Page 46

ANALYSIS: REDUCE STAGE-1

[Chart annotation: a single reduce task]

Page 47

ANALYSIS: MAP STAGE-2

Page 48

ANALYSIS: REDUCE STAGE-2

Page 49

ANALYZE EXECUTION: FINDINGS

• A lone, long-running reducer in the first stage of the query
• Analyzed the input data:
  o The query split input data by userId
  o Bucketized the input data by userId
  o Found one very large bucket: the “invalid” userId
  o Discussed the “invalid” userId with the customer
• An error value is a common pattern!
  o Need to differentiate between “don't know and don't care” and “don't know and do care”

Page 50

INTERACTIVE (DRAM-CENTRIC) PROCESSING SYSTEMS

• Loading data into DRAM makes processing fast!
• Examples: Spark, Impala, 0xdata, …, [SAP HANA], …
• Streaming systems (Storm, DataTorrent) may be similar
• Need to increase the YARN container memory size

Page 51

HIVE + INTERACTIVE: WATCH OUT FOR CONTAINER SIZE

• Caution: larger YARN container settings for interactive jobs may not be right for batch systems like Hive
• Container size needs to combine vcores and memory:
  yarn.scheduler.maximum-allocation-vcores
  yarn.nodemanager.resource.cpu-vcores
  ...
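"Combine vcores and memory" means a container has to fit in both dimensions at once, so the tighter one determines per-node capacity. A sketch with assumed node and container shapes (the numbers are ours; note that the CapacityScheduler only enforces the vcore dimension when the DominantResourceCalculator is configured):

```python
# Assumed: a worker node with 48 GB and 16 vcores available to YARN.
node = {"mem_mb": 48 * 1024, "vcores": 16}

def containers_per_node(node, cont_mem_mb, cont_vcores):
    # A container must fit in BOTH dimensions; the tighter one wins.
    return min(node["mem_mb"] // cont_mem_mb,
               node["vcores"] // cont_vcores)

print(containers_per_node(node, 2048, 1))    # -> 16: vcore-bound (memory would allow 24)
print(containers_per_node(node, 12288, 4))   # -> 4: large interactive-style containers
```

Sizing containers for interactive engines (second call) leaves far fewer slots for small Hive tasks, which is the batch-vs-interactive tension of the next slide.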

Page 52

HIVE + INTERACTIVE: WATCH OUT FOR FRAGMENTATION

• Scheduling interactive systems alongside batch systems like Hive may result in fragmentation
• Interactive systems may require all-or-nothing scheduling
• Batch jobs with little tasks may starve interactive jobs

Page 53

HIVE + INTERACTIVE: WATCH OUT FOR FRAGMENTATION

Solutions for fragmentation… •  Reserve interactive nodes before starting batch jobs •  Reduce interactive container size (if the algorithm permits) •  Node labels (YARN-726) and gang scheduling (YARN-624)

Page 54

CONCLUSIONS

• Hive + Hadoop debugging can get very complex
  o Sifting through many logs and screens
  o Automatic transmission versus manual transmission
• Static partitioning induced by the Java Virtual Machine has benefits but also induces challenges
• Where there are difficulties, there's opportunity:
  o Better tooling, instrumentation, integration of logs/metrics
• YARN is still evolving into an operating system
• Hadoop as a Service: aggregate and share expertise
• We need to learn from the traditional database community!

Page 55

QUESTIONS? COMMENTS?