Big Data Seminar Leonard Langsdorf S&P Capital IQ

CIO Seminar on Big Data

Page 1: CIO Seminar on Big Data

Big Data Seminar

Leonard Langsdorf, S&P Capital IQ

Page 2: CIO Seminar on Big Data

Big Data Seminar Overview

Everyone is talking about the power of Big Data, but much like when the Cloud first emerged, there is still a lot to evaluate before deciding whether it makes sense for an individual organization. The session is aimed at sharing thoughts and questions about the potential of, and opportunities within, Big Data.

– What is big data?
– How do I know if I have a big data problem?
– What can I get out of big data?
– What is a data scientist?
– Where do I start?

Page 3: CIO Seminar on Big Data

Why Big Data?

Peter Skomoroch, Work with Big Data @LinkedIn

The accuracy & nature of answers you get on large data sets can be completely different from what you see on small samples. Big data provides a competitive advantage. For the web data sets you describe, it turns out that having 10x the amount of data allows you to automatically discover patterns that would be impossible with smaller samples (think Signal to Noise). The deeper into demographic slices you want to dive, the more data you will need to get the same accuracy.
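The point about demographic slices can be illustrated with a little arithmetic. The sketch below is not from the talk: it assumes the quantity being measured is a simple proportion (for example, a click rate), that rows are independent, and that they split evenly across slices. The function names `standard_error` and `per_slice_error` are hypothetical, chosen for this illustration.

```python
import math


def standard_error(p: float, n: int) -> float:
    """Standard error of a proportion p estimated from n independent samples."""
    if n <= 0:
        raise ValueError("n must be positive")
    return math.sqrt(p * (1.0 - p) / n)


def per_slice_error(p: float, total_rows: int, num_slices: int) -> float:
    """Error within one slice, assuming rows split evenly across slices (an assumption)."""
    return standard_error(p, total_rows // num_slices)


if __name__ == "__main__":
    # Compare a data set with one 10x its size, at increasing slice depth.
    for total in (10_000, 100_000):
        for slices in (1, 10, 100):
            se = per_slice_error(0.5, total, slices)
            print(f"rows={total:>7} slices={slices:>3} standard error={se:.4f}")
```

Under these assumptions, each slice only sees its own share of the rows, so cutting the data into 10 slices takes roughly 10 times the rows to keep the same per-slice accuracy.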

Page 4: CIO Seminar on Big Data

What is Big Data?

Big data usually includes data sets with sizes beyond the ability of commonly used software tools to capture, curate, manage, and process the data within a tolerable elapsed time. Big data sizes are a constantly moving target, as of 2012 ranging from a few dozen terabytes to many petabytes of data in a single data set. With this difficulty, new platforms of "big data" tools are being developed to handle various aspects of large quantities of data.

http://en.wikipedia.org/wiki/Big_data

Page 5: CIO Seminar on Big Data

How do I know if I have a Big Data Problem?

When the size of the data itself becomes part of the problem.

Page 6: CIO Seminar on Big Data

What can I get out of Big Data?

– Amazon recommendation engine
– Google/Yahoo translation
– Updated maps
– Better connections through @LinkedIn
– More relevant data

Page 7: CIO Seminar on Big Data

What is a Data Scientist? 

Data scientists solve complex data problems through employing deep expertise in some scientific discipline. It is generally expected that data scientists are able to work with various elements of mathematics, statistics and computer science, although expertise in these subjects is not required. However, a data scientist is most likely to be an expert in only one or two of these disciplines and proficient in another two or three. There is probably no living person who is an expert in all of these disciplines; if there is, they would be extremely rare. This means that data science must be practiced as a team, where across the membership of the team there is expertise and proficiency across all the disciplines.

http://en.wikipedia.org/wiki/Data_science

Page 8: CIO Seminar on Big Data

Where do I Start?

– Cloud vs. roll your own?
– Appliance solutions?
– Third-party provider?

Page 9: CIO Seminar on Big Data

Technologies & Tools