I accidentally the Namenode: HDFS reliability at Facebook
Andrew Ryan, Facebook, April 2012
The HDFS Namenode: SPOF by design
▪ Single Point of Failure by design
▪ All metadata operations go through Namenode
▪ Early designers made tradeoffs: features & performance first
[Diagram: Simplified HDFS architecture with the Namenode as SPOF, showing the Namenode, Secondary Namenode, Datanodes, and Clients; the client sketch below makes the dependency concrete]
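To make the dependency concrete, here is a minimal client-side sketch (not from the slides; it uses the stock Hadoop FileSystem API and an example path): every call below is a metadata RPC that only the Namenode can answer, so all of them fail when the Namenode is down, even though the file contents live on Datanodes.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class NamenodeDependencyDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // fs.default.name (in 0.20) points at the one and only Namenode.
        FileSystem fs = FileSystem.get(conf);

        Path dir = new Path("/user/demo");            // example path, not from the slides
        fs.mkdirs(dir);                               // Namenode RPC: create a namespace entry
        FileStatus[] listing = fs.listStatus(dir);    // Namenode RPC: read namespace metadata
        fs.create(new Path(dir, "log.txt")).close();  // Namenode RPCs to allocate the file and its
                                                      // blocks; only the block data goes to Datanodes
        System.out.println("entries under " + dir + ": " + listing.length);
    }
}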
HDFS major use cases at Facebook: Data Warehouse and Facebook Messages

                            Data Warehouse                     Facebook Messages
# of clusters               <10                                10's
Size of clusters            Large (100's – 1000's of nodes)    Small (~100 nodes)
Processing workload         MapReduce batch jobs               HBase transactions
Namenode load               Very heavy                         Very light
End-user downtime impact    None                               Users without Messages
HDFS at Facebook, 2009-2012: some things have changed…

                                      2009         2012
# HDFS clusters                       1            >100
Largest HDFS cluster size (storage)   600TB        >100PB
Largest HDFS cluster size (# files)   10 million   200 million
HDFS cluster types                    MapReduce    MapReduce, HBase, MySQL backups, +more
HDFS at Facebook, 2009-2012: …and some things have not

                                        2009                  2012
Single points of failure in HDFS        Namenode              Namenode
HDFS cluster restart time               60 minutes            60 minutes
Namenode failover method                Manual, complicated   Manual, complicated
SPOF Namenode as a cause of downtime    Unknown               Unknown
Data Warehouse
▪ Storage and querying of structured log data using Hive and Hadoop MapReduce
▪ Composed of dozens of tools/components
▪ A “vigorous and creative” user population
[Diagram: Data Warehouse stack on Hadoop: UI Tools, Workflow (Nocron), Query (Hive), Compute (MapReduce), Storage (HDFS)]
Data Warehouse, all incidents: 41% are HDFS-related
Data Warehouse, SPOF Namenode incidents: 10% are SPOF Namenode
Facebook Messages
[Diagram: Facebook Messages architecture, showing Clients (www, chat, MTA, etc.), Messages Cells (Application Server, HBase/HDFS/ZK), Haystack, Anti-spam, Mail Servers, User Directory Service, and Outbound Mail]
Messages, all incidents: 16% are HDFS-related
Messages, SPOF Namenode incidents: 10% are SPOF Namenode
What would happen if… Instead of this…
[Diagram: Simplified HDFS architecture with the Namenode as SPOF, as shown earlier]
What would happen if… We had this!
[Diagram: Simplified HDFS architecture with a Highly Available Namenode, showing a Primary Namenode, a Standby Namenode, Datanodes, and Clients]
AvatarNode is our solution
[Diagrams: AvatarNode datanode view and AvatarNode client view; the client view is sketched after the list below]
AvatarNode is…
▪ A two-node, highly available Namenode with manual failover
▪ In production today at Facebook
▪ Open-sourced, based on Hadoop 0.20: https://github.com/facebook/hadoop-20
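To make the client view concrete, here is a minimal sketch of the idea. It is an illustration, not the actual AvatarNode code: it assumes ZooKeeper stores the "host:port" of whichever Avatar is currently primary under a known znode, and that clients look it up before connecting. The client shipped in facebook/hadoop-20 wraps this kind of lookup, plus retrying of in-flight calls across a failover, behind its own FileSystem implementation.

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.zookeeper.ZooKeeper;

public class AvatarClientSketch {
    // Look up the current primary Namenode address in ZooKeeper, then connect to it.
    // The znode path and the "host:port" payload format are assumptions for illustration.
    public static FileSystem connectToPrimary(String zkQuorum, String znodePath) throws Exception {
        ZooKeeper zk = new ZooKeeper(zkQuorum, 30000, null);  // no watcher needed for a one-shot read
        try {
            byte[] data = zk.getData(znodePath, false, null); // assumed to hold "host:port" of the primary
            String primary = new String(data, "UTF-8");
            return FileSystem.get(URI.create("hdfs://" + primary), new Configuration());
        } finally {
            zk.close();
        }
    }
}

The point of keeping the lookup in ZooKeeper is that failover state lives in one shared place rather than in every client's configuration.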
AvatarNode does not…
▪ Eliminate the dependency on shared storage for image/edits
▪ Provide instant failover (~1 second per million blocks+files; see the worked example after this list)
▪ Provide automated failover
▪ Guarantee I/O fencing for Primary/Standby (although precautions are taken)
▪ Require ZooKeeper for normal operation (ZooKeeper is only required during failover)
▪ Allow for >2 Namenodes to participate in an HA cluster
▪ Have any special network requirements
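A rough worked example of the failover figure (the block count here is an assumption, not from the slides): the largest cluster above holds about 200 million files, and with a comparable number of blocks that is roughly 400 million files+blocks, so at ~1 second per million a failover would take on the order of 6-7 minutes; far better than a 60-minute cluster restart, but not instant.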
Wrapping up…
▪ The SPOF Namenode is a weak link in HDFS's design
▪ In our services which use HDFS, we estimate we could eliminate:
▪ 10% of service downtime from unscheduled outages
▪ 20-50% of downtime from scheduled maintenance
▪ AvatarNode is Facebook’s solution for 0.20, available today
▪ Other Namenode HA solutions are being worked on in HDFS trunk (HDFS-1623)
Questions?
Sessions will resume at 11:25am