What is Big Data


What is big data? Big data refers to data sets so large that processing them with traditional DBMS systems or applications becomes difficult: sharing, searching, transferring, and analysing the data are all hard. Nowadays unstructured data is growing much faster than relational data, and we still have to handle it. The internet is the primary driver of big data; overall digital growth is second.

Big data characteristics: volume (size), velocity (speed), variety.

Volume: very large volumes of data are being generated today, and the expected future volume is even larger (e.g. 800 petabytes of data).
Velocity: data is generated at high speed, i.e. a large amount of data arrives within a very short time.
Variety: data arrives in many forms (structured and unstructured data, from different sources).

What is semi-structured data? Data that has some pattern, but whose structure we cannot fully define in a fixed schema. For example, JSON or XML documents have a recognisable structure, yet their fields can vary from record to record.

Big data scenarios:

Recommendation: when you search for a book on Amazon, it also shows recommended books, and when you post a resume, job sites recommend matching jobs to you. That data cannot fit in a traditional database, and the recommendations must be produced at search speed.

Telecom: analysing CDRs (call detail records), network usage, and network failures.

Government projects: Aadhaar cards and health schemes.

Retail: shops issue loyalty cards because they want to understand customer buying patterns. Churn analysis means analysing customers who leave a product: a customer who currently buys product X switches to product Y. To avoid churn, the shop, store, or product owner can target better advertising; if you can anticipate this behaviour, you can respond accordingly.

Location-based analysis is useful for targeted promotions.

Existing solutions, and how Hadoop solves these problems.

What is Hadoop? Hadoop is open-source software for distributed storage and processing of data at large scale.

What is the Hadoop ecosystem? Self-standing software built on top of HDFS (Hadoop). Why and how does it work?

MapReduce: what is MapReduce, why use it, and how does it work?
1. MapReduce is a programming model for data processing.
2. A MapReduce program can be written in Java, Python, Ruby, etc. and runs on a Hadoop cluster to process data.
3. MapReduce breaks the work into two phases: a) map and b) reduce.
4. Both map and reduce take key/value pairs as input and produce key/value pairs as output.
5. A program has two functions: a map function and, second, a reduce function.
6. Example: finding the maximum temperature for each year.
7. In this example the map function is the preparation phase, and the reduce function searches for the maximum temperature.
8. The mapper for this example (cleaned up from the notes):

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MaxTemperatureMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private static final int MISSING = 9999;

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        String year = line.substring(15, 19);        // year field of the record
        int airTemp;
        if (line.charAt(87) == '+') {                // skip the leading '+' sign
            airTemp = Integer.parseInt(line.substring(88, 92));
        } else {
            airTemp = Integer.parseInt(line.substring(87, 92));
        }
        String quality = line.substring(92, 93);     // quality code of the reading
        if (airTemp != MISSING && quality.matches("[012345]")) {
            context.write(new Text(year), new IntWritable(airTemp));
        }
    }
}
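The notes do not show the reduce side of this example. A minimal sketch of a reducer that picks the maximum temperature per year (the class and variable names here are illustrative) could look like this:

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MaxTemperatureReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // All temperatures emitted for one year arrive together; keep the largest.
        int maxValue = Integer.MIN_VALUE;
        for (IntWritable value : values) {
            maxValue = Math.max(maxValue, value.get());
        }
        context.write(key, new IntWritable(maxValue));
    }
}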

The job (MapReduce program) is executed through two processes: the JobTracker and the TaskTrackers. The JobTracker coordinates all the jobs and schedules tasks to run on the TaskTrackers (a minimal driver sketch that submits such a job is shown after the list below).

Why Hadoop:
Accessible: we can move all the raw data onto the cluster nodes.
Robust: fault tolerant; if any node fails, the data is still available from replicas stored on other nodes.
Simple: writing code that runs on the cluster is straightforward.
Scalable: we can add any number of nodes to the cluster, and the data remains accessible.
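To run the mapper and reducer above as one job, you also need a small driver that configures and submits the job. This is a minimal sketch; MaxTemperatureMapper and MaxTemperatureReducer refer to the classes shown earlier, and the input/output paths are assumed to come from the command line:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MaxTemperatureDriver {
    public static void main(String[] args) throws Exception {
        // Set up the job: which classes to run and where the data lives.
        Job job = Job.getInstance();
        job.setJarByClass(MaxTemperatureDriver.class);
        job.setJobName("Max temperature");

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.setMapperClass(MaxTemperatureMapper.class);
        job.setReducerClass(MaxTemperatureReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // Submitting the job hands it to the scheduler (the JobTracker in classic
        // MapReduce), which assigns map and reduce tasks to the TaskTrackers.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}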

What are the Hadoop components?

A newly formatted NameNode's storage directory contains: VERSION, fsimage, fstime, and edits.
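For reference, in classic Hadoop these files live under the directory configured by dfs.name.dir; the layout (path shown for illustration) looks roughly like this:

${dfs.name.dir}/
    current/
        VERSION
        edits
        fsimage
        fstime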