HBaseCon 2013: General Session



Welcome!
Michael Stack, Software Engineer, Cloudera & HBase PMC Chair, 9:00-9:05am
Conference MC Michael Stack, Chair of the HBaseCon 2013 Program Committee, welcomes you to the conference and offers a preview of the day.

The Apache HBase Community: Best Ever and Getting Better
Amr Awadallah, CTO and Co-founder, Cloudera, 9:05-9:15am
Amr comments on the explosion of interest in Apache HBase over the past few years, how that interest has influenced the Hadoop stack overall, and why Cloudera considers its involvement in the HBase community to be so important.

State of the Apache HBase Union
Michael Stack & Lars Hofhansl, Architect, Salesforce.com, 9:15-9:40am
Release-managers-in-crime Michael and Lars offer a look back, and a look forward, at HBase releases and what they have brought us (and will bring us in the future).

The Apache HBase Ecosystem
Aaron Kimball, Chief Architect, WibiData, 9:40-10:05am
Today, HBase stands where Apache Hadoop stood years ago: a project with a growing and vibrant community in its own right. In this talk, Aaron will give an overview of some of the projects built on top of HBase that you’ll get a chance to learn about during the day, each of which grew out of a need to use HBase for an application that requires real-time atomic access to data. As an example, he’ll present the motivations for Kiji and how it is helping organizations create amazing new applications using HBase and Hadoop.

Overview of Apache HBase at Facebook (Slides Not Available)
Liyin Tang, Software Engineer, Facebook & HBase PMC Member, 10:05-10:30am
In this keynote, you’ll get an overview of how HBase is used at Facebook. Explore Facebook’s applications that use HBase as an OLTP service, which require high reliability, efficiency, and scalability, and learn how HBase can tolerate small network glitches and rack failures. You’ll also learn the use cases for adopting HBase as a batch processing service and various optimizations to scale processing throughput. Finally, learn Facebook’s thoughts about the future of HBase.


1. Hosted by / Welcome: Michael Stack, Software Engineer, Cloudera & HBase PMC Chair

2. Goals of HBaseCon 2013: (1) bring the Apache HBase community together; (2) encourage contributions to the HBase ecosystem; (3) share challenges and solutions for HBase
3. HBaseCon 2013 Session Tracks: Operations, Internals, Ecosystem, Case Studies (Tracks 1 to 4)
4. HBaseCon 2013 Program Committee: Gary Helmling, Lars Hofhansl, Jonathan Hsieh, Doug Meil, Andrew Purtell, Enis Söztutar, Michael Stack (Chair), Liyin Tang (titles on the slide: Architect, Engineer, Software Engineer, Chief Software Architect, Systems Architect, Member of Technical Staff, Software Engineer, Software Engineer)
5. Thank You to Our Sponsors: Community Sponsor, Conference Sponsors, Media Sponsors
6. Visit Sponsors = Chance to Win
7. Conference Notes: please fill out the overall conference survey; the reception is 5:40pm to 8:00pm in the Yerba Buena Foyer; connecting to the internet: wireless network = Marriott Conference, passcode = db075b
8. The Apache HBase Community: Best Ever and Getting Better. Amr Awadallah, CTO and Co-founder, Cloudera (@awadallah)
9. The Apache HBase Community Has Never Been Healthier (charts: JIRA Activity, Commits Activity)
10. The Market for HBase Skills is Bigger than Ever
11. The HBase Ecosystem is Rich and Expanding: HBaseCon 2013 speakers from these companies this year (logos below the dotted line are net-new from 2012!)
12. Top 5 Reasons Cloudera Loves HBase: (1) Its vibrant community is a benchmark for the entire Apache Hadoop ecosystem. (2) It's a first-class citizen inside the Hadoop stack. (3) It allows us to offer support services for which a lot of customers will pay good money. (4) It draws top-drawer engineering talent to Cloudera. (5) It gives us an excuse to host this tremendous conference and throw a big party for the community!
13. Thank You for Attending HBaseCon and Thank You for Contributing!
14. State of the Apache HBase Union: Michael Stack, Software Engineer, Cloudera & HBase PMC Chair, and Lars Hofhansl, Architect, Salesforce.com
15. We are your Release Managers! Lars Hofhansl (0.94.x) and Michael Stack (0.95.x/0.96.x)
16. Introducing...
17. Your PMC...
18. Your Committers...
19. MVP: and the award goes to...
20. Diverse Team* (*http://hbase.apache.org/team-list.html)
21. Deploys: multitenant multifarious feature store, a.k.a. dumping ground (StumbleUpon, Y!, Salesforce); reconciliation store (eBay); time series (OpenTSDB, Salesforce, FB ODS); lots-o-entities store (Flurry, Kiji, Genome); lots-o-entities BLOBs (FB Messages)
22. OLTP & OLAP
23. Dev Rate
24. # of Commits: total files 2,021; total lines of code 832,122; total commits 6,615; authors 39
25. JIRA: 2008-2013
26. JIRA: Adoption
27. JIRA: Opened vs. Closed
28. New Committers (by First Commit) vs. Active Committers (One Commit/Month)
29. Commits/Month over Time (0.94/trunk)
30. 419 JIRAs in 0.94.0; 660 JIRAs in 0.94.1 to 0.94.8
31. Frequent, small releases: train model, 4-6 week cycles
32. Wire compatible between releases; upgrade possible to any point release; test stability
33. Focus on: performance (FB, Salesforce); stability
34. http://www.flickr.com/photos/sysli/3026288256/sizes/o/in/photostream/
35. >1000 so far...
36. http://www.flickr.com/photos/allspaw/5815258929/sizes/o/in/photostream/
37. http://www.flickr.com/photos/38595542@N02/3690830720/sizes/o/in/photostream/
38. Hadoop HDFS fixes; faster recovery: detection, replay, assign
39. System tables; filesystem; up in ZooKeeper; over the wire
40. RPC: implements a protobuf service; specification!; data on the side (encoding, compression); diagram: PB, DATA
41. Snapshots: by table; snapshot, clone, restore, export; inexpensive (just metadata); good for backups, replication, offline processing
42. Compactions: pluggable; tiered; striped; trigger
43. Tests: cluster test module; standalone or cluster; sizeable x data x runtime; borrows test types from all over (Netflix ChaosMonkey, Apache Accumulo linked-list data-loss checker)
44. Miscellaneous: smarter load balancer; revamped metrics; UI; etc.; Hadoop 1.x and 2.x
45. HBase Ecosystem
46. Chasm
47. Kiji (kiji.org): entity-centric, simple model; types, complex, compound types; each cell is schema-versioned; works across MR & REST, etc.; production users; open source
48. Phoenix: SQL, a SQL skin over HBase; coprocessors, custom filters, JDBC driver; https://github.com/forcedotcom/phoenix
49. Next: 1.0?
50. Next: latency resilience/latency tolerance*; bring home the outliers. Related: QoS. (*http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/people/jeff/Berkeley-Latency-Mar2012.pdf)
51. Next: faster scans (OLAP)
52. Next: more databasey!!! Statistics; secondary indexing (take 3); types: serialization, keep sort order
53. [email protected] [email protected]
54. The Apache HBase Ecosystem: Aaron Kimball, Chief Architect, WibiData
55. About me
56. In the beginning there was a search engine, and when building indexes got too hard, along came an elephant to help push:
57. The ASF is an ecosystem
58. And so is Hadoop
59. HBase: the new ecosystem (Phoenix)
60. Now powering
61. Big Data Applications (diagram: customer relations, mobile, web applications; Hadoop & HBase: storage, analytics, serving; real-time model scoring; investigative analytics)
62. Major application targets: MapReduce, Cascading, Crunch
63. Big Data Apps are hard to build: serialization & versioning; deployment; communication between teams; front end, back end, short request, batch, real time. Every Java developer should be able to build Big Data Apps; today it's too hard.
64. Kiji: Kiji is designed to help you build real-time Big Data Applications on Apache HBase; 100% Apache 2 licensed
65. Kiji architecture
66. Leading design decisions: store your data in HBase; encode it using Avro; an entity-centric table design; manage a data dictionary around tables; distribute writes across the cluster
67. Key features: work with big data in rich types with schema evolution; guides users to successful application design; Scala-based modeling language; integration with front-end systems; deployment of real-time model scoring
68. Kiji: go to kiji.org and download the BentoBox: a zero-config Hadoop + HBase + Kiji instance, batteries included; a 15-minute quickstart guide and a tutorial with full source code
69. Come attend! Want a deep dive on Kiji? KijiCon is tomorrow! A 1-day workshop of tutorials & hacking. Register @ kijicon.eventbrite.com
70. Conclusions: each month shows new peak interest in HBase; the ecosystem is growing; open source technologists are working hand in hand to make HBase more accessible; we'd love your help in the community!
71. [email protected] Build big data applications.
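Slide 52 lists "types: serialization, keep sort order" as a next step. The point is that HBase compares row keys as unsigned bytes, so a naive two's-complement encoding sorts -1 (0xFF...) after 0 (0x00...). The sketch below shows one general technique for signed 64-bit integers: flip the sign bit and write big-endian. It is an illustration under that assumption, not HBase's own type API; the function names `encode_int64`/`decode_int64` are hypothetical.

```python
import struct

INT64_MIN = -(1 << 63)
INT64_MAX = (1 << 63) - 1
SIGN_BIT = 1 << 63


def encode_int64(value: int) -> bytes:
    """Hypothetical helper: encode a signed 64-bit int so that unsigned
    lexicographic byte order (as used for HBase row keys) matches numeric order."""
    if not INT64_MIN <= value <= INT64_MAX:
        raise ValueError("value out of int64 range")
    # Take the two's-complement bit pattern, then flip the sign bit so that
    # negatives map to 0x00.. to 0x7F.. and non-negatives to 0x80.. to 0xFF..
    unsigned = (value & 0xFFFFFFFFFFFFFFFF) ^ SIGN_BIT
    # Big-endian puts the most significant byte first, so byte order = numeric order.
    return struct.pack(">Q", unsigned)


def decode_int64(data: bytes) -> int:
    """Inverse of encode_int64."""
    unsigned = struct.unpack(">Q", data)[0] ^ SIGN_BIT
    # Convert the restored two's-complement bit pattern back to a Python int.
    return unsigned - (1 << 64) if unsigned >= SIGN_BIT else unsigned


if __name__ == "__main__":
    values = [42, -7, 0, INT64_MIN, INT64_MAX, 1]
    # Sorting the encoded bytes yields the values in ascending numeric order.
    print([decode_int64(b) for b in sorted(encode_int64(v) for v in values)])
```

The same idea generalizes to other types (for example, floats and variable-length strings need their own order-preserving rules); the sign-bit flip only covers fixed-width signed integers.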
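Slide 66 lists "distribute writes across the cluster" among Kiji's design decisions. One common way to do this in HBase is to prefix each row key with a few bytes of a hash of the entity ID, so sequential IDs land in different regions instead of hammering one. The sketch below assumes an MD5 prefix of a hypothetical length `PREFIX_LEN`; it illustrates the general technique and is not presented as Kiji's actual row-key format.

```python
import hashlib

PREFIX_LEN = 2  # hypothetical: number of hash bytes prepended to each row key


def salted_row_key(entity_id: str, prefix_len: int = PREFIX_LEN) -> bytes:
    """Hypothetical helper: prefix the entity ID with part of its MD5 digest
    so that consecutive IDs scatter across the row-key space."""
    raw = entity_id.encode("utf-8")
    prefix = hashlib.md5(raw).digest()[:prefix_len]
    return prefix + raw


def entity_id_from_row_key(row_key: bytes, prefix_len: int = PREFIX_LEN) -> str:
    """Strip the hash prefix to recover the original entity ID."""
    return row_key[prefix_len:].decode("utf-8")


if __name__ == "__main__":
    # Pretend the table is pre-split into 4 regions on the first key byte.
    counts = [0, 0, 0, 0]
    for i in range(1000):
        key = salted_row_key(f"user{i:06d}")
        counts[key[0] // 64] += 1
    # Sequential IDs are spread over all four pretend regions.
    print(counts)
```

The trade-off is that point lookups stay cheap (the prefix is recomputed from the ID), but a scan over a contiguous range of entity IDs is no longer a single contiguous range of row keys.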
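Slide 43 mentions borrowing the Apache Accumulo linked-list data-loss checker. The idea is that every written node stores the key of a previously written node; a verifier then checks that every referenced key still exists. The sketch below models the table as an in-memory dict; the function names are hypothetical and the real test runs as distributed jobs against a cluster, which this does not attempt.

```python
import random


def write_linked_list(store: dict, n: int, seed: int = 0) -> list:
    """Write n nodes with random 64-bit keys; each node's value is the key of
    the node written just before it. Returns the keys in write order."""
    rng = random.Random(seed)
    keys = []
    prev = None
    for _ in range(n):
        key = rng.getrandbits(64)
        while key in store:  # keep keys unique so the chain stays well-formed
            key = rng.getrandbits(64)
        store[key] = prev
        keys.append(key)
        prev = key
    return keys


def verify(store: dict) -> tuple[set, set]:
    """Return (undefined, unreferenced) key sets."""
    present = set(store)
    referenced = {prev for prev in store.values() if prev is not None}
    undefined = referenced - present      # pointed to but missing: lost data
    unreferenced = present - referenced   # present but nothing points to them
    return undefined, unreferenced


if __name__ == "__main__":
    store = {}
    keys = write_linked_list(store, 1000, seed=1)
    print(verify(store))  # healthy: no undefined keys, only the chain head unreferenced
    del store[keys[500]]  # simulate a lost write
    print(verify(store))  # the lost key now shows up as undefined
```

A healthy chain has no undefined keys and exactly one unreferenced key (the most recently written node); any undefined key is direct evidence that a write was lost.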