Big data ecosystem 2

Big Data Ecosystem

Hadoop is a batch-oriented distributed system designed for processing large data sets. Hadoop users often find themselves manipulating the HDFS file system directly or writing low-level MapReduce programs from scratch. Several Hadoop subprojects grew out of this need; they provide mechanisms and features that simplify the handling and processing of large data sets. We briefly present a few of them in this section. A full list can be found here: Big Data Ecosystem.

What is HBase?

HBase integrates a key/value storage system with Hadoop, commonly called a key/value store. This Hadoop subproject is inspired by Google's BigTable project.

What is Hive?

Hive provides a relational-style data warehouse on top of the HDFS file system. The project allows developers to write queries in HiveQL, a language close to SQL; the queries are then translated into MapReduce programs that run on the cluster. The advantage is that developers can use a language they already know instead of writing MapReduce programs by hand.

What is Pig?

The Pig project is positioned like Hive in the sense that it provides developers with a high-level language (a DSL, called Pig Latin) dedicated to the analysis of large data volumes. It is aimed at developers who are used to writing scripts in Bash or Python, for example. Furthermore, Pig is extensible: if a function is not available, it can be added through custom functions written in a general-purpose language (Java, Python, ...).

In the same vein as Pig, there is Scalding, which draws on the power of the Scala language to develop MapReduce programs.

What is Sqoop?

Sqoop is a project that connects relational database management systems to Hadoop. It allows you to import data from a database into Hadoop and to export data from Hadoop back to a database.

What is Mahout?

Mahout provides implementations of machine learning algorithms. It provides, for example, algorithms for clustering (data partitioning) and automatic classification in a MapReduce environment.
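To make the Hive description above more concrete, here is a minimal sketch in plain Python of how a SQL-like GROUP BY query can be expressed as map, shuffle and reduce steps. It runs in memory and does not use Hadoop or Hive. The table page_views, its columns, the sample rows and all function names are assumptions made for illustration. It is not the plan that Hive actually generates, only the general idea behind it.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical rows of a table page_views(user_id, country).
# This sample data is an assumption for illustration only.
ROWS = [
    ("u1", "FR"),
    ("u2", "IN"),
    ("u3", "FR"),
    ("u4", "US"),
    ("u5", "IN"),
    ("u6", "FR"),
]


def map_phase(row):
    """Emit one (key, 1) pair per row; the key is the GROUP BY column."""
    _user_id, country = row
    yield (country, 1)


def shuffle_phase(pairs):
    """Group values by key, as a MapReduce framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Aggregate the values of one key; COUNT(*) becomes a sum of ones."""
    return (key, sum(values))


def run_query(rows):
    """In-memory analogue of:
    SELECT country, COUNT(*) FROM page_views GROUP BY country
    """
    mapped = chain.from_iterable(map_phase(row) for row in rows)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(run_query(ROWS))
```

On a real cluster the map and reduce steps would run in parallel on many machines and the shuffle would move data across the network. With HiveQL the developer writes only the query and leaves these steps to the tool.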