There is a flood of information online from tweets, feeds, status updates, photos, government, private, and other sources. Just how big is “big data”? This presentation will share examples of big and open data in the cloud: where it comes from, how it’s stored, and what you can do with it. Learn to incorporate real-world online data for your students to analyze using Excel, create data visualizations and infographics, and understand the impact of Data as a Service as a model for cloud computing.
Drinking from the Fire Hose: Tools for Interpreting and Teaching with Big Data
Mark Frydenberg
Bentley University
CourseMateEnhanced
Edition
77 Movies and TV Shows!
What's your Bacon Index?
[Diagram: a graph of people (Kevin, Bob, Ann, Joe, Kim, X) with numbers marking their degrees of separation]
APIs
Friend of a Friend
Social Graph
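A Bacon Index, like a friend-of-a-friend count, is a shortest-path distance in a social graph. The sketch below shows one way to compute it with a breadth-first search. The names come from the slide, but the edges in `GRAPH` and the function name `degrees_of_separation` are assumptions made only for illustration.

```python
from collections import deque

# Hypothetical friendship graph: the names appear on the slide,
# but who is connected to whom is assumed for this example.
GRAPH = {
    "Kevin": ["Bob"],
    "Bob": ["Kevin", "Ann"],
    "Ann": ["Bob", "Joe"],
    "Joe": ["Ann", "Kim"],
    "Kim": ["Joe"],
    "X": [],  # no connections, so no path to anyone
}


def degrees_of_separation(graph, start):
    """Return the number of hops from start to every reachable person."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        for friend in graph.get(person, []):
            if friend not in distances:  # first visit is the shortest path
                distances[friend] = distances[person] + 1
                queue.append(friend)
    return distances


if __name__ == "__main__":
    for name, hops in degrees_of_separation(GRAPH, "Kevin").items():
        print(name, hops)
```

Anyone missing from the result (here, X) has no path to the starting person. Students could replace `GRAPH` with data pulled from a social network API.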
Big Data
'Big data' refers to a collection of tools, techniques and technologies which make it easy to work with data at any scale.
powerof60.com
3 V's
• Volume - the amount of data is larger than conventional relational database infrastructures can handle
• Velocity - the rate at which data is generated, processed, and analyzed in (real) time
• Variety - data formats are unstructured and inconsistent
Volume: How Big is Big Data?
Yottabyte?
Walmart
• Walmart collects more than 2.5 petabytes of data every hour from its customer transactions.
• A petabyte is one quadrillion bytes, or the equivalent of about 20 million filing cabinets’ worth of text.
http://hbr.org/2012/10/big-data-the-management-revolution/ar
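To put these figures in perspective, here is a small back-of-the-envelope calculation. It uses only the numbers quoted above (2.5 petabytes per hour, one quadrillion bytes per petabyte, 20 million filing cabinets per petabyte). The constant and function names are made up for this sketch.

```python
PETABYTE = 10**15  # one quadrillion bytes, as stated on the slide
CABINETS_PER_PETABYTE = 20_000_000  # "about 20 million filing cabinets"

# Bytes of text implied per filing cabinet by the two figures above.
BYTES_PER_CABINET = PETABYTE // CABINETS_PER_PETABYTE


def petabytes_per_day(petabytes_per_hour):
    """Scale an hourly data volume to a full 24-hour day."""
    return petabytes_per_hour * 24


if __name__ == "__main__":
    print("Petabytes per day:", petabytes_per_day(2.5))
    print("Megabytes per filing cabinet:", BYTES_PER_CABINET / 10**6)
```

At the quoted rate, the daily total is 60 petabytes, and each filing cabinet works out to roughly 50 megabytes of text.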
Velocity: Drinking from the Firehose
• Scrutinize 5 million trade events created each day to identify potential fraud
• Analyze 500 million daily call detail records in real-time to predict customer churn faster
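Daily totals are easier to reason about as per-second rates. The sketch below converts the two figures above, assuming the events are spread evenly over a 24-hour day (real traffic is bursty, so peak rates would be higher). The function name `per_second` is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(events_per_day):
    """Average events per second, assuming an even spread over the day."""
    return events_per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    print("Trade events per second:", round(per_second(5_000_000)))
    print("Call records per second:", round(per_second(500_000_000)))
```

That is roughly 58 trade events and about 5,800 call detail records every second, on average.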
A Variety of Big Data Sources
McKinsey&Company Report (2011)
• Data is part of every industry and business function.
• Data creates value.
• Big data becomes a basis of competition and growth.
• Some sectors will achieve greater gains.
• Shortage of people with analytical skills.
• Need policies related to privacy, security, ownership.
3000 tweets per second
Data is disorganized
How does Twitter use its data?
Twitter Visualization
Big Data Technologies
• Hadoop: scalable storage, parallel computation
• NoSQL: distributed querying
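The parallel computation that Hadoop provides follows the MapReduce pattern: a map step emits key/value pairs, a shuffle step groups them by key, and a reduce step combines each group. The sketch below imitates those three phases in plain Python on one machine. It does not use Hadoop or any Hadoop API, and all function names are assumptions chosen for illustration.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents):
    """Run map, shuffle, and reduce in sequence over all documents."""
    pairs = []
    for document in documents:  # on a cluster, each document could be mapped on a different server
        pairs.extend(map_phase(document))
    return reduce_phase(shuffle_phase(pairs))


if __name__ == "__main__":
    print(word_count(["big data", "Big open data"]))
```

Because each call to `map_phase` depends only on its own document, the map work can be split across many servers, which is the property that lets this style of computation scale.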
What this Means
• Change your web page and Google finds it in minutes.
• Ten years ago, you would have to submit a request to Yahoo! to reindex your site.
• All you need is a lot of servers.
• Google has a million of them.
• No problem.
Collaborative Filtering
[Diagram: items linked to "Me" and "You": Black Beauty, Camera Tripod, The Black Stallion]
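Collaborative filtering recommends items to one person based on what similar people chose. The sketch below is a minimal version that measures similarity with Jaccard overlap, which is one common choice and not necessarily what any particular retailer uses. The item titles come from the slide, but which person has which item is assumed, and the function names are hypothetical.

```python
def jaccard(a, b):
    """Similarity of two item sets: shared items divided by all items."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(target_items, other_users):
    """Rank items the target lacks, weighted by each owner's similarity."""
    scores = {}
    for items in other_users.values():
        similarity = jaccard(target_items, items)
        if similarity == 0:
            continue  # nothing in common, so this user's tastes are ignored
        for item in items - target_items:
            scores[item] = scores.get(item, 0.0) + similarity
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda item: (-scores[item], item))


if __name__ == "__main__":
    # Assumed purchase histories, for illustration only.
    me = {"Black Beauty", "Camera Tripod"}
    others = {"You": {"Black Beauty", "The Black Stallion"}}
    print(recommend(me, others))
```

Because "Me" and "You" share Black Beauty, the item that only "You" has becomes a recommendation for "Me".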
Variety: Semantic Web
RelFinder
Unstructured Data
Health Care
Analyzing Big Data
explore.data.gov
Searching Big Data
Fusion Table Visualizations