BIG DATA By Kay Burn

What is Big Data?

DESCRIPTION

Please take a look at my Big Data presentation on Slideshare, which includes some Big Data market insight, hot technologies and trends for recruitment. Check out my blog as well for information: http://lnkd.in/dFQTpvN. Call me on 01179087000 or email kay.burn@projectpeople.com

Page 1: What is Big Data?

BIG DATA By Kay Burn

Page 2: What is Big Data?

Big Data is the ocean of information we swim in every day: zettabytes of data flowing from our computers, mobile devices, and machine sensors.

With the right solutions, organizations can dive into all data and gain valuable insights that were previously unimaginable.

More data may lead to more accurate analyses and more accurate analysis may lead to more confident decision making. And better decisions can mean greater operational efficiencies, cost reductions and reduced risk.

Page 3: What is Big Data?

Lower data storage costs by keeping less important data on Hadoop clusters and integrating the data warehouse with Hadoop clusters.

Unearth patterns behind leakages and issues by identifying their true causes and catching cases of fraud and abuse.

Make informed decisions by pinpointing product buzz information from social media and the web to lower the cost of bringing a product to market and of the product development lifecycle.

Differentiate from competitors by using insight to align to customers' needs.

Increase the customer base with targeted campaigns via social media and market analysis, and by identifying competitors' customers' complaints on social media in order to target unsatisfied customers.

Sales lead generation by identifying customer needs from web pages, blogs and social media.

Deeper understanding of customers' personalities, personas and profiles from Facebook, LinkedIn and Twitter, which helps to create new service streams.

Accurate action plans for new products, business strategy and complaints.

Page 4: What is Big Data?

Volume: Big Data means enormous volumes of data. Because data is generated by machines, networks and human interaction on systems like social media, the volume of data to be analysed is massive.

Variety refers to the many sources and types of data, both structured and unstructured. We used to store data from sources like spreadsheets and databases. Now data comes in the form of emails, photos, videos, monitoring devices, PDFs, audio, etc.

Velocity deals with the pace at which data flows in from sources like business processes, machines, networks and human interaction with things like social media sites, mobile devices, etc.

Veracity refers to the biases, noise and abnormality in data. Is the data that is being stored and mined meaningful to the problem being analysed?

Page 5: What is Big Data?

[Charts: Big Data Contract Job Landscape in London, 7 days. Two charts of Big Data contract jobs, one sourced from www.jobsite.co.uk and one from www.indeed.co.uk. One chart shows a percentage breakdown across Big Data, Scala, Hadoop, Spark, NoSQL, MongoDB, Cassandra, MapReduce, Cloudera and CouchDB (values as extracted: 23%, 18%, 17%, 11%, 9%, 7%, 7%, 6%, 2%, 0%). The other is a chart on a 0 to 100 axis covering Big Data, NoSQL, MongoDB, Hadoop, Scala, Cassandra, Spark and MapReduce.]

Page 6: What is Big Data?

[Chart: Big Data Contract Candidate Landscape in London, 7 days, www.jobsite.co.uk. Series: Big Data Candidates, axis 0 to 25.]

Project People’s rapidly growing, qualified and clean Big Data Contract Candidate Database

[Chart: Big Data Candidate Landscape. Vertical axis: Number of Candidates, 0 to 6000. Categories as extracted: MongoDB, Big Data, Scala, NoSQL, Hadoop, Cassandra, Spark. Values as extracted: 4918, 3748, 2630, 2232, 1953, 815, 147.]

Page 7: What is Big Data?

Hadoop is a Java-based framework, written almost entirely in Java.

The combination of Hadoop and Java skills is the number one combination in demand among all Hadoop jobs.

Java skills come in handy when writing code for the following in Hadoop:
• MapReduce programming using Java
• User Defined Functions in Pig and Hive scripts of Hadoop applications
• Client applications in HBase

Hadoop is a natural career progression route for Java professionals.

Page 8: What is Big Data?

Hadoop is a free, Java-based programming framework that supports the processing of large data sets in a distributed computing environment.

Instead of relying on expensive, proprietary hardware and different systems to store and process data, Hadoop enables distributed parallel processing of huge amounts of data across inexpensive, industry-standard servers that both store and process the data, and can scale without limits. With Hadoop, no data is too big.

Hadoop was initially inspired by papers published by Google outlining its approach to handling an avalanche of data, and has since become the de facto standard for storing, processing and analysing hundreds of terabytes, and even petabytes of data.
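The map-and-reduce style of processing that Hadoop is known for can be sketched in a few lines of plain Python. This is only an illustration of the idea on a single machine, not Hadoop's API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample input are hypothetical, and a real cluster would run the map and reduce steps on many servers at once.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Emit a (word, 1) pair for every word in one line of input."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, the step a framework performs between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values seen for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over an iterable of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, two "lines" of a much larger file.
    print(word_count(["big data is big", "data flows in"]))
```

Because each line is mapped independently and each word is reduced independently, both steps could in principle be handed to different machines, which is the property the distributed approach relies on.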

Hadoop can provide fast and reliable analysis of both structured and unstructured data.

Imagine you had a file that was larger than your PC's capacity. You could not store that file, right? Hadoop's file system (HDFS) splits a file into blocks and spreads them across many nodes, so you can store files bigger than any single node or server could hold.
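As a rough sketch of how a file larger than one machine can still be stored, the Python below cuts a byte string into fixed-size blocks and spreads them over several nodes. The function names, the tiny block size and the round-robin placement are assumptions made for illustration; Hadoop's own file system uses far larger blocks, its own placement policy, and also keeps several copies of each block.

```python
def split_into_blocks(data: bytes, block_size: int):
    """Cut a byte string into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(blocks, nodes):
    """Assign block indexes to nodes round-robin (a hypothetical policy)."""
    placement = {node: [] for node in nodes}
    for index in range(len(blocks)):
        placement[nodes[index % len(nodes)]].append(index)
    return placement


def reassemble(blocks):
    """Join the blocks back together in their original order."""
    return b"".join(blocks)


if __name__ == "__main__":
    data = b"abcdefghij"  # stand-in for a very large file
    blocks = split_into_blocks(data, block_size=4)
    print(place_blocks(blocks, ["node-a", "node-b"]))
    print(reassemble(blocks) == data)
```

No single node holds the whole file, yet the file can be rebuilt as long as the index of each block is known.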

Page 9: What is Big Data?

There are four categories of NoSQL:
• Key Value
• Document
• Column Family
• Graph

Document databases pair each key with a complex data structure known as a document.

With NoSQL databases you can mix and match to create a database solution that is tailored to the business's needs.

Wide-column stores such as Cassandra and HBase are optimized for queries over large datasets, and store columns of data together, instead of rows.

Graph stores are used to store information about networks, such as social connections.

Key-value stores are the simplest NoSQL databases. Every single item in the database is stored as an attribute name (or "key"), together with its value.
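The four categories can be pictured with ordinary Python data structures. This is a loose sketch of the data models as described on this slide, not of how any particular database stores data on disk; every key, name and value below is hypothetical sample data.

```python
# Key-value: each key maps to a single value.
key_value = {"user:42:name": "Ada"}

# Document: each key maps to a nested, self-describing document.
document = {
    "user:42": {
        "name": "Ada",
        "skills": ["Hadoop", "Scala"],
        "address": {"city": "Bristol"},
    }
}

# Column family: values belonging to the same column are kept together.
column_family = {
    "name": {"user:42": "Ada", "user:43": "Lin"},
    "city": {"user:42": "Bristol", "user:43": "London"},
}

# Graph: nodes plus the connections between them.
graph = {"Ada": {"Lin"}, "Lin": {"Ada", "Sam"}, "Sam": {"Lin"}}


def friends_of_friends(graph, person):
    """Return people two connections away who are not already direct contacts."""
    direct = graph.get(person, set())
    reachable = set()
    for friend in direct:
        reachable |= graph.get(friend, set())
    return reachable - direct - {person}


if __name__ == "__main__":
    print(friends_of_friends(graph, "Ada"))
```

The graph query shows why a connection-oriented model suits social networks: the question "who do my contacts know?" is a short walk over edges rather than a join across tables.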

Page 10: What is Big Data?

Python is an excellent choice for Data Scientists in their day-to-day work, as it provides extensive libraries.

Python is a powerful, flexible, open-source language that is easy to learn, easy to use, and has powerful libraries for data manipulation and analysis.

It is a general-purpose programming language that is also easy to use for analytical and quantitative computing.

Python has been used in scientific computing for many years.

Python is one of the most popular languages in the world, ranking higher than Perl, Ruby, and JavaScript in some language popularity indexes.
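To give a flavour of the day-to-day data work described on this slide, here is a small sketch that uses only Python's standard library. The job adverts, skills and day rates are invented for the example, and the helper names `skill_counts` and `rate_summary` are hypothetical.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical sample data: three made-up contract adverts.
adverts = [
    {"title": "Data Engineer", "skills": ["Hadoop", "Scala", "Spark"], "day_rate": 500},
    {"title": "Data Scientist", "skills": ["Python", "Spark"], "day_rate": 550},
    {"title": "NoSQL Developer", "skills": ["MongoDB", "Python"], "day_rate": 450},
]


def skill_counts(adverts):
    """Count how many adverts mention each skill."""
    return Counter(skill for advert in adverts for skill in advert["skills"])


def rate_summary(adverts):
    """Summarise the advertised day rates."""
    rates = [advert["day_rate"] for advert in adverts]
    return {"mean": mean(rates), "median": median(rates)}


if __name__ == "__main__":
    print(skill_counts(adverts))
    print(rate_summary(adverts))
```

Third-party libraries go much further than this, but the same pattern of loading records, grouping them and summarising them carries over.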

Page 11: What is Big Data?

• It Runs on the Java Virtual Machine

• It is More Concise and Readable than Java

• Easy to Learn and "Exciting"

• Solve functional problems

Object-oriented programming (OOP): This helps produce programs that are easier to read and maintain.

Functional programming: The advantage of pure functions is that they have no side effects. A function takes input and produces output, and that is all. This makes it easier to write correct programs that can scale or be executed in parallel. Scala does not need to know whether the data is structured or unstructured.

Brevity: Less code means fewer bugs and less time spent on maintenance.

Static types: Scala is statically typed and supports more extensive type inference than Java, which means the compiler can work out the types of most expressions without explicit annotations.

Scala, a scalable language specializing in functional and object-oriented programming, has been running on the Java Virtual Machine for several years now, enjoying adoption from enterprises and start-ups alike.
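The point about side effects is not specific to Scala, so it is sketched here in Python for illustration. The function names are hypothetical. `normalise` is pure: its result depends only on its argument. `normalise_and_log` is not, because it also changes shared state.

```python
from concurrent.futures import ThreadPoolExecutor


def normalise(text: str) -> str:
    """Pure function: lower-case the text and collapse runs of whitespace."""
    return " ".join(text.lower().split())


seen = []  # shared state, used only by the impure variant below


def normalise_and_log(text: str) -> str:
    """Impure variant: it also appends to a shared list as a side effect."""
    seen.append(text)
    return normalise(text)


def normalise_all(texts):
    """Apply the pure function to many inputs using a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(normalise, texts))


if __name__ == "__main__":
    print(normalise_all(["  Big   DATA ", "Hadoop"]))
```

Since `normalise` touches nothing outside its own arguments, the calls can run in any order or at the same time without affecting each other. The impure variant gives no such guarantee about the order of entries in `seen`.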

Page 12: What is Big Data?

Run programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk

Write applications quickly in Java, Scala or Python.

Combine SQL, streaming, and complex analytics. Spark powers a stack of high-level tools.

Spark provides simple and easy-to-understand programming APIs that can be used to build applications at a rapid pace in Java, Python or Scala.

Page 13: What is Big Data?

Hadoop Ecosystem Components Example

Page 14: What is Big Data?

Kay Burn

My name is Kayleigh, or Kay for short. I am a Senior Big Data Consultant at Project People, providing global recruitment solutions within Big Data, Data Science, Business Intelligence & Insight.

Call me on 01179087000 or 07803415865 to discuss your next Big Data project. Email kay.burn@projectpeople.com. Check out my blog here: http://kayburn.wix.com/southwest