Why do I need Hadoop?

Big-Data Hadoop Tutorials - MindScripts Technologies, Pune

DESCRIPTION

MindScripts Technologies is one of the leading Big-Data Hadoop training institutes in Pune, providing a complete Big-Data Hadoop course with Cloudera certification.

Page 1: Big-Data Hadoop Tutorials - MindScripts Technologies, Pune

Why do I need Hadoop?

Business analytics

Business analytics focuses on developing new insights and understanding of business performance based on data and statistical methods.

Problem: Too much data

Big Data!!

Velocity

How fast data is being produced and how fast the data must be processed to meet demand.

Have a look through the analytics lens!

Variability

Data flows can be highly inconsistent, with periodic peaks.

Is something big trending in the social media?

Difference between Variety and Variability

Megabytes, Gigabytes…

Terabyte: To put it in some perspective, a Terabyte could hold about 300 hours of good quality video. A Terabyte could hold 1,000 copies of the Encyclopedia Britannica.

Petabyte: It could hold 500 billion pages of standard printed text.

Exabyte: It has been said that 5 Exabytes would be equal to all of the words ever spoken by mankind.
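
As a rough sanity check on the terabyte figure above, the arithmetic can be sketched in a few lines of Python. The sketch assumes decimal units (1 Terabyte = 10^12 bytes), which the slide does not specify, and the function name is only illustrative.

```python
# Assumption: decimal (SI) units; the slide does not say which convention it uses.
TERABYTE = 10**12  # bytes


def implied_bitrate_mbps(total_bytes, hours):
    """Average bitrate in megabits per second needed to fill total_bytes in the given hours."""
    seconds = hours * 3600
    return total_bytes * 8 / seconds / 10**6


rate = implied_bitrate_mbps(TERABYTE, 300)
print(f"{rate:.1f} Mbit/s")  # prints "7.4 Mbit/s"
```

Under that assumption, "300 hours per Terabyte" corresponds to video encoded at roughly 7.4 Mbit/s.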

Human-Generated Data and Machine-Generated Data

Challenges of Big Data

Sheer size of Big Data

Big Data is unstructured or semi-structured.

No point in just storing big data if we can't process it.

Hadoop enables a computing solution that is:

Scalable: New nodes can be added as needed, and added without needing to change data formats, how data is loaded, how jobs are written, or the applications on top.

Cost effective: Hadoop brings massively parallel computing to commodity servers.

Flexible: Hadoop is schema-less, and can absorb any type of data, structured or not, from any number of sources.

Fault tolerant: When you lose a node, the system redirects work to another location of the data and continues processing without missing a beat.
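
The fault-tolerance point can be illustrated with a toy model. This is a plain-Python sketch, not Hadoop code: the node names, the block-to-replica table, and the `pick_node` helper are all hypothetical, and three replicas per block are assumed because that is the HDFS default replication factor.

```python
# Toy model only: node names and the replica table are made up for illustration.
replicas = {
    "block-1": ["node-a", "node-b", "node-c"],
    "block-2": ["node-b", "node-c", "node-d"],
    "block-3": ["node-a", "node-c", "node-d"],
}


def pick_node(block, live_nodes):
    """Return the first live node that holds a replica of the block."""
    for node in replicas[block]:
        if node in live_nodes:
            return node
    raise RuntimeError(f"all replicas of {block} are unavailable")


live = {"node-a", "node-b", "node-c", "node-d"}
before = {block: pick_node(block, live) for block in replicas}

live.discard("node-a")  # simulate losing one node
after = {block: pick_node(block, live) for block in replicas}

print(before)  # work assigned while every node is up
print(after)   # work redirected to surviving replicas
```

Because every block lives on more than one node, losing `node-a` only changes where the work runs; no block becomes unreachable.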

Power of MapReduce

Course Content

Introduction
Hadoop: Basic Concepts
  What is Hadoop?
  The Hadoop Distributed File System
  How Hadoop MapReduce Works
  Anatomy of a Hadoop Cluster

Hadoop daemons
  Master Daemons
    NameNode
    JobTracker
    Secondary NameNode
  Slave Daemons
    DataNode
    TaskTracker

HDFS ( Hadoop Distributed File System )

Blocks and Splits
  Input Splits
  HDFS Splits
Data Replication
  Hadoop Rack Awareness
  Data high availability
  Data Integrity
  Cluster architecture and block placement
Accessing HDFS
  Java Approach
  CLI Approach
Programming Practices
  Developing MapReduce Programs in
    Local Mode: Running without HDFS and MapReduce
    Pseudo-distributed Mode: Running all daemons on a single node
    Fully distributed Mode: Running daemons on dedicated nodes
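
To make the blocks and replication topics in this outline concrete, here is a small arithmetic sketch. It assumes the commonly cited HDFS defaults: a 128 MB block size (Hadoop 2 and later; Hadoop 1.x used 64 MB) and a replication factor of 3. The function name `hdfs_footprint` is hypothetical and not part of any Hadoop API.

```python
MB = 1024 * 1024


def hdfs_footprint(file_bytes, block_bytes=128 * MB, replication=3):
    """Return (number of blocks, raw bytes stored across the cluster).

    Assumed defaults: 128 MB blocks and 3 replicas. The last block only
    occupies the bytes it actually holds, so raw storage is size x replication.
    """
    blocks = (file_bytes + block_bytes - 1) // block_bytes  # ceiling division
    raw_bytes = file_bytes * replication
    return blocks, raw_bytes


blocks, raw = hdfs_footprint(1000 * MB)
print(blocks, raw // MB)  # prints "8 3000"
```

A 1000 MB file therefore splits into 8 blocks and consumes about 3000 MB of raw disk across the cluster under these assumptions.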

Writing a MapReduce Program
  Examining a Sample MapReduce Program (with several examples)
  Basic API Concepts
  The Driver Code
  The Mapper
  The Reducer
  Hadoop's Streaming API
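
The roles of the driver, the mapper, and the reducer can be sketched with the classic word count. This is a single-process Python simulation for illustration only. It uses neither the Hadoop Java API nor the Streaming API, and the function names (`mapper`, `reducer`, `run_job`) are hypothetical.

```python
from collections import defaultdict


def mapper(line):
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reducer(word, counts):
    """Sum all counts emitted for one word."""
    return word, sum(counts)


def run_job(lines):
    """Driver: run the map, shuffle, and reduce phases in one process."""
    groups = defaultdict(list)
    for line in lines:                     # map phase
        for key, value in mapper(line):
            groups[key].append(value)      # shuffle: group values by key
    # reduce phase, visiting keys in sorted order
    return dict(reducer(k, v) for k, v in sorted(groups.items()))


result = run_job(["big data needs Hadoop", "Hadoop stores big data"])
print(result)
# prints {'big': 2, 'data': 2, 'hadoop': 2, 'needs': 1, 'stores': 1}
```

On a real cluster the map and reduce calls run in parallel on different nodes, and the framework performs the grouping step between them.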

Common MapReduce Algorithms
  Sorting and Searching
  Indexing
  Classification/Machine Learning
  Term Frequency - Inverse Document Frequency
  Word Co-Occurrence
  Hands-On Exercise: Creating an Inverted Index
  Identity Mapper
  Identity Reducer
  Exploring well-known problems using MapReduce applications
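
As a taste of the inverted-index exercise listed above, the idea can be sketched in plain Python. This is not the course's solution: the sample documents are made up and the helper names are hypothetical. The map step emits (word, document) pairs and the reduce step collects the documents for each word.

```python
from collections import defaultdict


def map_doc(doc_id, text):
    """Emit (word, doc_id) for every distinct word in a document."""
    for word in set(text.lower().split()):
        yield word, doc_id


def reduce_postings(word, doc_ids):
    """Collect the sorted list of documents that contain the word."""
    return word, sorted(set(doc_ids))


def build_index(docs):
    """Run map, group by word, then reduce, all in one process."""
    grouped = defaultdict(list)
    for doc_id, text in docs.items():
        for word, found_in in map_doc(doc_id, text):
            grouped[word].append(found_in)
    return dict(reduce_postings(w, ids) for w, ids in grouped.items())


# Made-up sample input
docs = {"doc1": "hadoop stores data", "doc2": "hive queries data"}
index = build_index(docs)
print(index["data"])    # prints ['doc1', 'doc2']
print(index["hadoop"])  # prints ['doc1']
```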

Debugging MapReduce Programs
  Testing with MRUnit
  Logging
  Other Debugging Strategies

Advanced MapReduce Programming
  A Recap of the MapReduce Flow
  The Secondary Sort
  Customized Input Formats and Output Formats
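
The secondary sort is easier to follow with a small example. In Hadoop it is typically built from a composite key, a custom partitioner, and a grouping comparator. The sketch below only mimics the effect in plain Python, sorting on a composite key but grouping on the natural key, and the sample records are made up.

```python
from itertools import groupby
from operator import itemgetter

# Made-up sample records: (station, temperature)
records = [("pune", 31), ("nigdi", 28), ("pune", 24), ("nigdi", 35), ("pune", 29)]

# Sort on the composite key (station, temperature) ...
ordered = sorted(records, key=itemgetter(0, 1))

# ... but group on the natural key (station) only, so each "reducer call"
# receives its values already in ascending order.
by_station = {
    station: [temp for _, temp in group]
    for station, group in groupby(ordered, key=itemgetter(0))
}

print(by_station)  # prints {'nigdi': [28, 35], 'pune': [24, 29, 31]}
```

The point of the technique is that the framework, not the reducer, does the ordering, so the reducer never has to buffer and sort its values in memory.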

Hadoop Ecosystem

HBase
  HBase concepts
  HBase architecture
  Region server architecture
  File storage architecture
  HBase basics
  Column access
  Scans
  HBase use cases
  Install and configure HBase on a multi-node cluster
  Create database, develop and run sample applications
  Access data stored in HBase using clients like Java, Python and Perl
  HBase and Hive integration
  HBase admin tasks
  Defining schema and basic operations

Hive
  Hive concepts
  Hive architecture
  Install and configure Hive on a cluster
  Create database, access it from a Java client
  Buckets
  Partitions
  Joins in Hive
    Inner joins
    Outer joins
  Hive UDF
  Hive UDAF
  Hive UDTF
  Develop and run sample applications in Java/Python to access Hive

Pig
  Pig basics
  Install and configure Pig on a cluster
  Pig vs. MapReduce and SQL
  Pig vs. Hive
  Write sample Pig Latin scripts
  Modes of running Pig
    Running in Grunt shell
    Programming in Eclipse
    Running as Java program
  Pig UDFs
  Pig macros

Flume
  Flume concepts
  Install and configure Flume on a cluster
  Create a sample application to capture logs from Apache using Flume

Sqoop
  Getting Sqoop
  A Sample Import
  Database Imports
  Controlling the Import
  Imports and Consistency
  Direct-mode Imports
  Performing an Export

Contact Us

Address: MindScripts Technologies, 2nd Floor, Siddharth Hall, Near Ranka Jewellers, Behind HP Petrol Pump, Karve Rd, Pune 411004

Address: MindScripts Technologies, C8, 2nd Floor, Sant Tukaram Complex, Pradhikaran, Above Savali Hotel, Opp Nigdi Bus Stand, Nigdi, Pune - 411044

Call: 9595957557, 8805674210, 9764560238, 9767427924, 9881371828

[email protected]