Innovations in Apache Hadoop MapReduce, Pig and Hive for improving query performance


DESCRIPTION

Innovations in Apache Hadoop MapReduce, Pig and Hive for improving query performance - Vinod Kumar Vavilapalli - Gopal Vijayaraghavan


1. Innovations in Apache Hadoop MapReduce, Pig and Hive for improving query performance
   gopalv@apache.org / vinodkv@apache.org

2. Hortonworks Inc. 2013

3. Operation Stinger

4. Performance at any cost

5. - Scalability: already works great, just don't break it for performance gains
   - Isolation + security: queries between different users run as different users
   - Fault tolerance: keep all of MR's safety nets to work around bad nodes in clusters
   - UDFs: make sure they are user defined and not admin defined

6. First things first: how far can we push Hive as it exists today?

7. Benchmark spec: the TPC-DS benchmark data + query set
   - Query 27 (big joins small): for all items sold in stores located in specified states during a given year, find the average quantity, average list price, average list sales price, and average coupon amount for a given gender, marital status, education and customer demographic.
   - Query 82 (big joins big): list all items and current prices sold through the store channel from certain manufacturers in a given price range that consistently had a quantity between 100 and 500 on hand in a 60-day period.

8. TL;DR: TPC-DS Query 27, Scale=200, 10 EC2 nodes (40 disks)

9. TL;DR II: TPC-DS Query 82, Scale=200, 10 EC2 nodes (40 disks)

10. Forget the actual benchmark
    - First of all, YMMV: software, hardware, setup, tuning
    - Text formats seem to be the staple of all comparisons. Really? Everybody's using them, but only for benchmarks!

11. What did the trick? MapReduce? HDFS? Or is it just Hive?

12. Optional Advice

13. RCFile
    - Binary RCFiles: Hive pushes down column projections
    - Less I/O, less CPU, smaller files

14. Data organization
    - No data system at scale is loaded once and left alone
    - Partitions are essential: data flows into new partitions every day

15. A closer look: now revisiting the benchmark and its results

16. Query 27 - Before
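The slide on RCFile claims that column projection pushdown means less I/O. A toy byte-counting model makes the intuition concrete: in a row-oriented layout every column is read off disk regardless of the query, while a columnar layout (RCFile/ORC style) touches only the projected columns. This is a sketch of the general idea, not actual RCFile code; the function names and column widths are illustrative.

```python
# Toy model of column projection pushdown. Widths are bytes per value.

def bytes_read_row_layout(n_rows, col_widths, projected):
    # Rows are stored contiguously, so every column's bytes are read
    # regardless of which columns the query actually projects.
    return n_rows * sum(col_widths.values())

def bytes_read_columnar(n_rows, col_widths, projected):
    # Each column is stored contiguously, so only the projected
    # columns' bytes need to be read.
    return n_rows * sum(col_widths[c] for c in projected)

# Example: a wide string column dominates the row, but the query only
# needs one narrow numeric column.
widths = {"item_id": 8, "quantity": 8, "description": 100}
row_bytes = bytes_read_row_layout(1_000_000, widths, ["quantity"])
col_bytes = bytes_read_columnar(1_000_000, widths, ["quantity"])
```

With these illustrative numbers the columnar read touches under a tenth of the bytes, which is the "less I/O, less CPU" effect the slide is pointing at.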
17. Before

18. Query 27 - After

19. After

20. Query 82 - Before

21. Query 82 - After

22. What changed?
    - Job count / correct plan
    - Correct data formats
    - Correct data organization
    - Correct configuration

23. What changed? Data formats, data organization, query plan

24. (image slide)

25. Is that all? NO!
    - In Hive: Metastore, RCFile issues, CPU-intensive code
    - In YARN+MR: parallelism, spin-up times, data locality
    - In HDFS: bad disks / deteriorating nodes

26.-27. In Hive (agenda recap)

28. Hive Metastore
    - 1+N select problem: SELECT partitions FROM tables; then, for each needed partition, SELECT * FROM partition ...
    - For query 27, this generates > 5000 queries, with 4-5 seconds lost on each call!
    - Lazy loading or include/join are general solutions
    - DataNucleus/ORM issues: 100K NPEs, try .. catch .. ignore ..
    - Metastore DB schema revisit: denormalize some/all of it?

29. In Hive (agenda recap)

30. RCFile issues
    - RCFiles do not split well: row groups and row-group boundaries, small row groups vs big row groups, sync() vs min split
    - Storage packing: run-length information is lost
    - Unnecessary deserialization costs

31. ORC file format
    - A single file as output of each task
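The "1+N select problem" described for the Hive Metastore is the classic ORM anti-pattern: one query lists the partitions, then one additional query fetches each partition, so N partitions cost 1+N round trips. A minimal sketch of the pattern and the batched fix the slides suggest (all class and method names here are illustrative, not actual Metastore or DataNucleus code):

```python
# Hypothetical model of the 1+N select anti-pattern vs a batched fetch.

class CountingDb:
    """Fake metastore database that counts the queries it receives."""
    def __init__(self, partition_names):
        self.partitions = {p: {"name": p, "location": "/warehouse/" + p}
                           for p in partition_names}
        self.query_count = 0

    def list_partition_names(self):
        self.query_count += 1            # one listing query
        return list(self.partitions)

    def fetch_partition(self, name):
        self.query_count += 1            # one query per partition
        return self.partitions[name]

    def fetch_partitions(self, names):
        self.query_count += 1            # single batched query (IN / join)
        return [self.partitions[n] for n in names]

def load_naive(db):
    # 1 listing query + N per-partition queries = 1+N round trips.
    return [db.fetch_partition(n) for n in db.list_partition_names()]

def load_batched(db):
    # 1 listing query + 1 batched query = 2 round trips, same result.
    return db.fetch_partitions(db.list_partition_names())
```

With 5000 partitions, `load_naive` issues 5001 queries while `load_batched` issues 2, which is why the slides point at lazy loading or include/join strategies as general solutions.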
    - Dramatically simplifies integration with Hive; lowers pressure on the NameNode
    - Support for the Hive type model: complex types (struct, list, map, union) and new types (datetime, decimal)
    - Encoding specific to the column type
    - Split files without scanning for markers
    - Bound the amount of memory required for reading or writing

32. In Hive (agenda recap)

33.-34. CPU intensive code
    - The Hive query engine processes one row at a time: very inefficient in terms of CPU usage
    - Lazy deserialization: layers, object-inspector calls, lots of virtual method calls

35. Tighten your loops

36. Vectorization to the rescue
    - Process a row batch at a time instead of a single row
    - A row batch consists of column vectors; each column vector consists of array(s) of primitive types as far as possible
    - Each operator processes the whole column vector at a time
    - File formats give out vectorized batches for processing
    - Underlying research promises better instruction pipelines and cache usage (mechanical sympathy)

37. Vectorization: preliminary results
    - Functionality: some arithmetic operators and filters using primitive-type columns; a basic integration benchmark proves that the whole setup works
    - Performance: a micro-benchmark shows more than 30x improvement in CPU time
    - Disclaimer: micro-benchmark! Does not include I/O or deserialization costs, or complex and string data types

38.-39. In YARN+MR (agenda recap)
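The contrast between row-at-a-time evaluation and vectorized batches can be sketched in miniature. This is a toy model of the technique, not Hive's actual vectorized operators: the batch size and the selection-vector approach are illustrative, and the real win comes from tight primitive-array loops that Python cannot show directly.

```python
# Row-at-a-time vs vectorized evaluation of "price * qty WHERE qty > 100".

def eval_rows(rows):
    # One row at a time: per-row dispatch and interpretation overhead,
    # the pattern the slides call "very inefficient in terms of CPU".
    out = []
    for price, qty in rows:
        if qty > 100:
            out.append(price * qty)
    return out

def eval_vectorized(price_col, qty_col, batch_size=1024):
    # Process whole column vectors per batch. The filter produces a
    # selection vector; the arithmetic then runs only over selected
    # positions in a tight loop over primitive arrays.
    out = []
    for start in range(0, len(price_col), batch_size):
        p = price_col[start:start + batch_size]
        q = qty_col[start:start + batch_size]
        selected = [i for i in range(len(q)) if q[i] > 100]
        out.extend(p[i] * q[i] for i in selected)
    return out
```

Both paths compute the same answer; the vectorized shape is what enables the instruction-pipeline and cache wins ("mechanical sympathy") the slides cite.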
40. Data locality
    - CombineInputFormat
    - AM interaction with locality
    - Short-circuit reads!
    - Delay scheduling: good for throughput, bad for latency

41. In YARN+MR (agenda recap)

42. Parallelism
    - Can tune it (to some extent) by controlling splits/reducer count
    - Hive doesn't know dynamic cluster status: benchmarks max out clusters, real jobs may or may not
    - Hive does not let you control parallelism, particularly in case of multiple jobs in a query

43. In YARN+MR (agenda recap)

44. Spin-up times
    - AM startup costs
    - Task startup costs
    - Multiple waves of map tasks

45. Apache Tez
    - Generic DAG workflow
    - Container re-use
    - AM pool service

46. AM Pool Service
    - Pre-launches a pool of AMs; jobs are submitted to these pre-launched AMs (saves 3-5 seconds)
    - Pre-launched AMs can pre-allocate containers, so tasks can be started as soon as the job is submitted (saves 2-3 seconds)

47. Container reuse
    - The Tez MapReduce AM supports container reuse
    - Launched JVMs are re-used between tasks: about 4-5 seconds saved in case of multiple waves
    - Allows future enhancements, such as re-using task data structures across splits

48. In HDFS (agenda recap)

49. Speculation / bad disks
    - No cluster remains at 100% forever; bad disks cause latency issues
    - Speculation is one defense, but it is not enough; fault tolerance is a safety net
    - Possible solutions: more feedback from HDFS about stale nodes and bad/slow disks; volume scheduling

50.
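The spin-up arithmetic behind container reuse can be sketched with a toy model: with multiple waves of map tasks, only the first wave needs fresh JVM launches once containers are re-used between tasks. The 4.5-second launch cost below is illustrative (the slides say "about 4-5 seconds"), and the functions are hypothetical, not Tez APIs.

```python
# Toy model of Tez-style container reuse across waves of tasks.

def launches_without_reuse(num_tasks):
    # Every task pays a fresh JVM launch.
    return num_tasks

def launches_with_reuse(num_tasks, num_containers):
    # Only the first wave launches JVMs; later waves re-use warm ones.
    return min(num_tasks, num_containers)

def startup_seconds_saved(num_tasks, num_containers, launch_cost=4.5):
    # Total launch time avoided by re-using containers.
    saved_launches = (launches_without_reuse(num_tasks)
                      - launches_with_reuse(num_tasks, num_containers))
    return saved_launches * launch_cost
```

For example, 400 map tasks running in four waves on 100 containers avoid 300 JVM launches, and the same reasoning applies to the AM pool service: pre-launching AMs moves their startup cost off the query's critical path.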
General guidelines
    - Benchmarking: be wary of benchmarks, including ours! Algebra with X

51. General guidelines, contd.
    - Benchmarks: to repeat, YMMV. Benchmark *your* use case. Decide your problem size:
      if (smallData) { MySQL / Postgres / your smart phone } else { make it work; make it scale; make it faster }
    - If it is (or seems to be) slow, file a bug and spend a little time!
    - Replacing systems without understanding them is an easy way to have an illusion of progress

52. Related talks
    - "Optimizing Hive Queries" by Owen O'Malley
    - "What's New and What's Next in Apache Hive" by Gunther Hagleitner

53. Credits
    - Arun C Murthy, Bikas Saha, Gopal Vijayaraghavan, Hitesh Shah, Siddharth Seth, Vinod Kumar Vavilapalli, Alan Gates, Ashutosh Chauhan, Vikram Dixit, Gunther Hagleitner, Owen O'Malley, Jitendra Nath Pandey
    - Yahoo!, Facebook, Twitter, SAP and Microsoft all contributing

54. Q&A: Thanks!