
Southeast Linuxfest -- Is BIG DATA Like High School Sex?

Is BIG DATA something real? Why do we need it? Well, this is a skeptical view on the subject.

Page 1: Southeast Linuxfest -- Is BIG DATA Like High School Sex?

Is Big Data like High School Sex - Lots of Talk but little action?

Page 2: Southeast Linuxfest -- Is BIG DATA Like High School Sex?

Google Test

● BIG DATA 838,000,000

Page 3: Southeast Linuxfest -- Is BIG DATA Like High School Sex?

Google Hit Test

● BIG DATA 838,000,000
● World Peace 118,000,000

Page 4: Southeast Linuxfest -- Is BIG DATA Like High School Sex?

Google Hit Test

● BIG DATA 838,000,000
● World Peace 118,000,000
● Cure Cancer 20,800,000

Page 5: Southeast Linuxfest -- Is BIG DATA Like High School Sex?

Google Hit Test

● BIG DATA 838,000,000
● World Peace 118,000,000
● Cure Cancer 20,800,000
● Kardashian 64,300,000
● Roswell UFO 259,000
● JFK Conspiracy 3,240,000

Page 6: Southeast Linuxfest -- Is BIG DATA Like High School Sex?

Is Big Data Good?

Page 7: Southeast Linuxfest -- Is BIG DATA Like High School Sex?

Wall Street Journal May 19th 2014

Big Data Banking Is Not Just for Big Banks

By Seth Rosensweig, John Milani and Michael B. Flynn

✔ The key element in success for any project involving Big Data is accepting and embracing decision making with less-than-ideal information.

Page 8: Southeast Linuxfest -- Is BIG DATA Like High School Sex?

What could go wrong there?

Page 9: Southeast Linuxfest -- Is BIG DATA Like High School Sex?

New York Times Nov 2004

What Wal-Mart Knows About Customers' Habits

By CONSTANCE L. HAYS

● HURRICANE FRANCES was on its way, barreling across the Caribbean, threatening a direct hit on Florida's Atlantic coast. Residents made for higher ground, but far away, in Bentonville, Ark., executives at Wal-Mart Stores decided that the situation offered a great opportunity for one of their newest data-driven weapons, something that the company calls predictive technology

● A week ahead of the storm's landfall, Linda M. Dillman, Wal-Mart's chief information officer, pressed her staff to come up with forecasts based on what had happened when Hurricane Charley struck several weeks earlier. Backed by the trillions of bytes' worth of shopper history that is stored in Wal-Mart's computer network, she felt that the company could "start predicting what's going to happen, instead of waiting for it to happen," as she put it

● The experts mined the data and found that the stores would indeed need certain products - and not just the usual flashlights. "We didn't know in the past that strawberry Pop-Tarts increase in sales, like seven times their normal sales rate, ahead of a hurricane," Ms. Dillman said in a recent interview. "And the pre-hurricane top-selling item was beer."

Page 10: Southeast Linuxfest -- Is BIG DATA Like High School Sex?

AR Redneck != FL Redneck

Page 11: Southeast Linuxfest -- Is BIG DATA Like High School Sex?

Define the Problem!

Page 12: Southeast Linuxfest -- Is BIG DATA Like High School Sex?

http://en.wikipedia.org/wiki/Big_data

● Big data is a blanket term for any collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.

Page 13: Southeast Linuxfest -- Is BIG DATA Like High School Sex?

What Parts do you need?

Page 14: Southeast Linuxfest -- Is BIG DATA Like High School Sex?

What sort of Hardware do you need for a Hadoop Cluster?

Here are the recommended specifications for DataNode/TaskTrackers in a balanced Hadoop cluster:

● 12-24 1-4TB hard disks in a JBOD (Just a Bunch Of Disks) configuration

● 2 quad-/hex-/octo-core CPUs, running at least 2-2.5GHz

● 64-512GB of RAM

● Bonded Gigabit Ethernet or 10 Gigabit Ethernet (the more storage density, the higher the network throughput needed)

http://blog.cloudera.com/blog/2013/08/how-to-select-the-right-hardware-for-your-new-hadoop-cluster/

Here are the recommended specifications for NameNode/JobTracker/Standby NameNode nodes. The drive count will fluctuate depending on the amount of redundancy:

● 4–6 1TB hard disks in a JBOD configuration (1 for the OS, 2 for the FS image [RAID 1], 1 for Apache ZooKeeper, and 1 for Journal node)

● 2 quad-/hex-/octo-core CPUs, running at least 2-2.5GHz

● 64-128GB of RAM

● Bonded Gigabit Ethernet or 10 Gigabit Ethernet
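To get a feel for what the DataNode disk counts above mean in storage terms, here is a minimal back-of-the-envelope sketch. It assumes the HDFS default replication factor of 3 and ignores space lost to temp data and filesystem overhead; the function name `datanode_capacity_tb` is made up for illustration.

```python
# Back-of-the-envelope storage per DataNode, using the disk counts above.
# Assumption: HDFS default replication factor of 3; no allowance for
# temp space or filesystem overhead.

def datanode_capacity_tb(disks, tb_per_disk, replication=3):
    """Return (raw_tb, usable_tb) for one DataNode."""
    raw = disks * tb_per_disk
    return raw, raw / replication

low_raw, low_usable = datanode_capacity_tb(12, 1)    # smallest recommended box
high_raw, high_usable = datanode_capacity_tb(24, 4)  # largest recommended box

print(f"Low end:  {low_raw} TB raw, about {low_usable:.0f} TB usable")
print(f"High end: {high_raw} TB raw, about {high_usable:.0f} TB usable")
```

Under those assumptions one node holds somewhere between 4 TB and 32 TB of unique data, so the node count is driven as much by how much data you really have as by anything else.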

Page 15: Southeast Linuxfest -- Is BIG DATA Like High School Sex?

http://www.enterprisetech.com/2013/11/08/cluster-sizes-reveal-hadoop-maturity-curve/

Kaushik says that the average Hadoop cluster size follows a fairly predictable curve. “Our observation is that companies typically experiment with cluster size of under 100 nodes and expand to 200 or more nodes in the production stages. Some of the advanced adopters cluster sizes are over 1,000 nodes.”

Page 16: Southeast Linuxfest -- Is BIG DATA Like High School Sex?

That is over $400K in servers alone!(1)

● Then add in floor space, power, a few DevOps minions, a/c, support contracts, extra cleaning staff, and miscellaneous computer room stuff!

● (1) Yes, you will get a discount if you buy 200
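The slide does not break the $400K figure down. A quick sketch, assuming the 200-node production cluster quoted on the previous slide, shows the per-server price it implies. The $5,000 what-if price and the helper `cluster_cost` are hypothetical, there only to show how fast the total moves.

```python
# What does "over $400K in servers" imply per node?
# Assumption: the 200-node production cluster size quoted on the previous slide.
NODES = 200
SERVER_BUDGET = 400_000  # dollars, from the slide

implied_price = SERVER_BUDGET / NODES
print(f"Implied price per server: ${implied_price:,.0f}")

def cluster_cost(nodes, price_per_node):
    """Server hardware cost only; no power, space, staff or support."""
    return nodes * price_per_node

# Hypothetical what-if: a better-specced box at $5,000 each (made-up price).
print(f"200 nodes at $5,000 each: ${cluster_cost(NODES, 5_000):,}")
```

The implied price works out to $2,000 a box, which is modest for the specifications listed earlier, so "over $400K" is probably a floor rather than an estimate.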

Page 17: Southeast Linuxfest -- Is BIG DATA Like High School Sex?

DataInformed April 2014

What it Takes to Succeed with Big Data

by Thomas H. Davenport

● Jeff Bezos of Amazon is known for saying, “We never throw away data,” simply because it is difficult to know when it may become important for a product or service offering down the road.

Page 18: Southeast Linuxfest -- Is BIG DATA Like High School Sex?

http://en.wikipedia.org/wiki/Big_data

● Big data is difficult to work with using most relational database management systems and desktop statistics and visualization packages, requiring instead "massively parallel software running on tens, hundreds, or even thousands of servers". What is considered "big data" varies depending on the capabilities of the organization managing the set, and on the capabilities of the applications that are traditionally used to process and analyze the data set in its domain. "For some organizations, facing hundreds of gigabytes of data for the first time may trigger a need to reconsider data management options. For others, it may take tens or hundreds of terabytes before data size becomes a significant consideration."

Page 19: Southeast Linuxfest -- Is BIG DATA Like High School Sex?

Digital Landfill

● No, you cannot keep all the data

✔ PCI
✔ IRS
✔ HIPAA
✔ ?

Page 20: Southeast Linuxfest -- Is BIG DATA Like High School Sex?

The Vendors

Page 21: Southeast Linuxfest -- Is BIG DATA Like High School Sex?

IBM @ Hadoop World '13

● Lots of new vendors
● Market will shake out 75% in two years
● Therefore buy IBM as they are an old company

Page 22: Southeast Linuxfest -- Is BIG DATA Like High School Sex?

And How is Big Data at Solving Problems? Really??

Page 23: Southeast Linuxfest -- Is BIG DATA Like High School Sex?

Science Fair Projects

● http://www.csc.com/big_data/success_stories/99789-avis_budget_group_big_data_success_story

– During economic downturn, AVIS decides to focus on customer service.

● http://www.scmp.com/comment/insight-opinion/article/1096811/brain-behind-lady-gagas-big-data

– Lady Gaga asks Facebook & Twitter fans to join mail list

● http://www.clickz.com/clickz/news/2223543/ad-tech-new-york-big-data-drives-business-for-1800-flowers-and-discovery-digital

– 1-800-Flowers remembers important dates such as birthdays

● http://www.datameer.com/learn/videos/us-womens-olympic-cycling-team-big-data-story.html

– Cycling team picks up five seconds!

Page 24: Southeast Linuxfest -- Is BIG DATA Like High School Sex?

Hawthorne Effect

Page 25: Southeast Linuxfest -- Is BIG DATA Like High School Sex?

http://en.wikipedia.org/wiki/Observer%27s_paradox#Westinghouse_effect

● Efficiency engineers in the 1920s and 1930s were trying to determine if improved working conditions such as better lighting improved the performance of production workers. The engineers noted that when they provided better working conditions in the production line, efficiency increased. But when the engineers returned the production line to its original conditions and observed the workers, their efficiency increased again. The engineers determined that it was merely the observation of the factory workers, not the changes in the conditions in production line, that increased the measured efficiency.

Page 26: Southeast Linuxfest -- Is BIG DATA Like High School Sex?
Page 27: Southeast Linuxfest -- Is BIG DATA Like High School Sex?

Is the world full of new data?

Page 28: Southeast Linuxfest -- Is BIG DATA Like High School Sex?

Example of Data Creep

● Gender

– Female

– Male

Page 29: Southeast Linuxfest -- Is BIG DATA Like High School Sex?

Example of Data Creep

● Gender

– Female

– Male

– Null (no data)

Page 30: Southeast Linuxfest -- Is BIG DATA Like High School Sex?

Example of Data Creep

● Gender

– Female

– Male

– Null (no data)

– State of California has 17 official statuses

– Facebook has 50+

Page 31: Southeast Linuxfest -- Is BIG DATA Like High School Sex?

That may be why DBAs Go Bald!

Page 32: Southeast Linuxfest -- Is BIG DATA Like High School Sex?

Let's See How Past Predictions Turned Out as a Guide

Page 33: Southeast Linuxfest -- Is BIG DATA Like High School Sex?

http://www.forbes.com/sites/gilpress/2013/05/09/a-very-short-history-of-big-data/

● 1944 Fremont Rider, Wesleyan University Librarian, publishes The Scholar and the Future of the Research Library. He estimates that American university libraries were doubling in size every sixteen years. Given this growth rate, Rider speculates that the Yale Library in 2040 will have “approximately 200,000,000 volumes, which will occupy over 6,000 miles of shelves… [requiring] a cataloging staff of over six thousand persons.”

Page 34: Southeast Linuxfest -- Is BIG DATA Like High School Sex?

Yale Library Today

● 15,000,000 volumes as of 2014 – 185,000,000 volumes to go!!!
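To see how far off the extrapolation already is, here is a small sketch that runs Rider's curve backwards from his 2040 figure. It uses only the numbers on these slides (16-year doubling, 200,000,000 volumes in 2040, 15,000,000 actual in 2014); the function name `rider_projection` is made up, and the 1944 value it prints is what the curve implies, not a number Rider reported.

```python
# Rider's extrapolation: doubling every 16 years, 200,000,000 volumes by 2040.
# We run that curve backwards; the only inputs are figures from the slides.
DOUBLING_YEARS = 16
TARGET_YEAR, TARGET_VOLUMES = 2040, 200_000_000
ACTUAL_2014 = 15_000_000

def rider_projection(year):
    """Volumes Rider's curve implies for a given year."""
    doublings = (TARGET_YEAR - year) / DOUBLING_YEARS
    return TARGET_VOLUMES / 2 ** doublings

predicted_2014 = rider_projection(2014)
print(f"Curve implies for 1944: {rider_projection(1944):,.0f}")
print(f"Curve implies for 2014: {predicted_2014:,.0f}")
print(f"Actual 2014 is {ACTUAL_2014 / predicted_2014:.0%} of the prediction")
```

On that curve Yale should have been at roughly 65 million volumes by 2014, more than four times the 15 million it actually holds.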

Page 36: Southeast Linuxfest -- Is BIG DATA Like High School Sex?

Is BIG DATA new-ish??

● 1961 Derek Price publishes Science Since Babylon, in which he charts the growth of scientific knowledge by looking at the growth in the number of scientific journals and papers. He concludes that the number of new journals has grown exponentially rather than linearly, doubling every fifteen years and increasing by a factor of ten during every half-century. Price calls this the “law of exponential increase,” explaining that “each [scientific] advance generates a new series of advances at a reasonably constant birth rate, so that the number of births is strictly proportional to the size of the population of discoveries at any given time.”
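Price's two figures, doubling every fifteen years and tenfold every half-century, are the same exponential curve stated twice. A minimal sketch to check that, with a made-up helper name `growth_factor`:

```python
# Price's two claims: doubling every 15 years, tenfold every 50 years.
# Check that they describe the same exponential curve.
DOUBLING_YEARS = 15

def growth_factor(years, doubling_years=DOUBLING_YEARS):
    """Multiplicative growth over `years` at the given doubling time."""
    return 2 ** (years / doubling_years)

half_century = growth_factor(50)
print(f"Growth over 50 years: x{half_century:.2f}")
```

Fifty years is three and a third doublings, which comes out just above a factor of ten, so the two statements agree.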

Page 37: Southeast Linuxfest -- Is BIG DATA Like High School Sex?

http://www.indexmundi.com/g/g.aspx?c=xx&v=25

Birth Rate … dropping

Page 38: Southeast Linuxfest -- Is BIG DATA Like High School Sex?

1971 Arthur Miller

The Assault on Privacy -- “Too many information handlers seem to measure a man by the number of bits of storage capacity his dossier will occupy.”

Page 40: Southeast Linuxfest -- Is BIG DATA Like High School Sex?

So Big Data ain't that new, eh?

● 1997 Michael Lesk publishes “How much information is there in the world?” Lesk concludes that “There may be a few thousand petabytes of information all told; and the production of tape and disk will reach that level by the year 2000. So in only a few years, (a) we will be able [to] save everything–no information will have to be thrown out, and (b) the typical piece of information will never be looked at by a human being.”

Page 41: Southeast Linuxfest -- Is BIG DATA Like High School Sex?

Yeah, Big Data!

● May 2012 Danah Boyd and Kate Crawford publish “Critical Questions for Big Data” in Information, Communication & Society. They define big data as “a cultural, technological, and scholarly phenomenon that rests on the interplay of: (1) Technology: maximizing computation power and algorithmic accuracy to gather, analyze, link, and compare large data sets. (2) Analysis: drawing on large data sets to identify patterns in order to make economic, social, technical, and legal claims. (3) Mythology: the widespread belief that large data sets offer a higher form of intelligence and knowledge that can generate insights that were previously impossible, with the aura of truth, objectivity, and accuracy.”

Page 42: Southeast Linuxfest -- Is BIG DATA Like High School Sex?

Maybe it is not BIG DATA

Page 43: Southeast Linuxfest -- Is BIG DATA Like High School Sex?

Some Examples, please, Dave!

Page 44: Southeast Linuxfest -- Is BIG DATA Like High School Sex?

http://www.correlated.org/

Page 45: Southeast Linuxfest -- Is BIG DATA Like High School Sex?

http://bigthink.com/neurobonkers/the-bad-science-of-satoshi-kanazawa

Page 46: Southeast Linuxfest -- Is BIG DATA Like High School Sex?

http://www.buzzfeed.com/kjh2110/the-10-most-bizarre-correlations
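The bizarre correlations on those sites are easy to manufacture: collect enough unrelated columns and some pair will line up by chance. Here is a minimal sketch of that effect using nothing but random noise. The series count, series length and seed are arbitrary choices for illustration, not anything taken from the linked sites.

```python
# Spurious correlations from pure noise: generate unrelated random series,
# then go looking for the best-correlated pair.
# Assumptions: 100 series of 10 points each and the seed are arbitrary choices.
import random
from itertools import combinations
from math import sqrt

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

random.seed(2014)
series = [[random.gauss(0, 1) for _ in range(10)] for _ in range(100)]

pairs = list(combinations(range(len(series)), 2))
best = max(pairs, key=lambda p: abs(pearson(series[p[0]], series[p[1]])))
best_r = pearson(series[best[0]], series[best[1]])

print(f"Checked {len(pairs)} pairs of meaningless series")
print(f"Strongest correlation: r = {best_r:+.2f} between series {best[0]} and {best[1]}")
```

With only 100 columns there are already 4,950 pairs to test, and the strongest one looks impressive even though every series is noise. More data means more columns, which means more of these.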

Page 47: Southeast Linuxfest -- Is BIG DATA Like High School Sex?
Page 48: Southeast Linuxfest -- Is BIG DATA Like High School Sex?

Maybe for IE Tech Support Engineers

Page 49: Southeast Linuxfest -- Is BIG DATA Like High School Sex?

So what do YOU do?

● Quantify why you really want a BIG DATA project

– Like to buy servers in quantities of 10,000

– You will be at retirement age by the time the project really gets reviewed

– You own stock in disk drive companies

– Your stochastic analysis shows your boss will not understand anyway, so why not!

– Probabilistic study of patterns ROI > Co$t

Page 50: Southeast Linuxfest -- Is BIG DATA Like High School Sex?

Questions, Hopefully Answers

● This slide deck will be on slideshare.net/davestokes
● @stoker