January 2013 at University of Brighton http://meetup.com/Big-Data-Brighton

Big Data Brighton | Big Data in Academia | Jan 2013


DESCRIPTION

Four talks about Big Data in Academia at Big Data Brighton Jan 2013. Two of the talks' slides are here. I'll upload Miltos' slides when I receive them. Dr Patricia Roberts, Senior Lecturer & Researcher in database design, development and management, University of Brighton - Structured vs Unstructured Data: why structure matters. Simon Wibberley, PhD student in computational linguistics at the Text Analytics Group at the University of Sussex. Real-time text stream analysis, event detection, and entity recognition. Event detection on Twitter.


Page 1: Big Data Brighton | Big Data in Academia | Jan 2013

January 2013 at University of Brighton

http://meetup.com/Big-Data-Brighton

Page 2: Big Data Brighton | Big Data in Academia | Jan 2013

Agenda

• Miltos Petridis, Professor of Computer Science, University of Brighton

• Dr Patricia Roberts, Senior Lecturer & Researcher in database design, development and management, University of Brighton - Structured vs Unstructured Data: why structure matters.

• Simon Wibberley, PhD student in computational linguistics at the Text Analytics Group at the University of Sussex. Real-time text stream analysis, event detection, and entity recognition. Event detection on Twitter.

• Kevin Long, Teradata - Summary and Business context

Page 3: Big Data Brighton | Big Data in Academia | Jan 2013
Page 4: Big Data Brighton | Big Data in Academia | Jan 2013

Big Data

“A new generation of technologies and architectures, designed to economically extract value from very large volumes of a wide variety of data, by enabling high-speed capture, discovery and/or analysis” [1]

New investment initiatives are coming, such as in the US in 2012:

“more than $200 million in new funding through six agencies and departments to improve the nation’s ability to extract knowledge and insights from large and complex collections of digital data” [2]

Page 5: Big Data Brighton | Big Data in Academia | Jan 2013

Knowledge and insights... hmm

Before companies rush to use the technologies they should be asking some questions:

• Can we make any assumptions about the quality of the data we are using?

• Is there a significant difference between structured and unstructured data?

• Can the underlying structure of the data affect what you can do with it?

Page 6: Big Data Brighton | Big Data in Academia | Jan 2013

In this brief talk, I will be examining these questions with reference to my research and recent trends

Page 7: Big Data Brighton | Big Data in Academia | Jan 2013

Can we make any assumptions about the quality of the data we are using?

• One of the problems with the recent explosion in the amount of data is that some data (particularly data collected from social networking sites) is of dubious quality

– A straw poll of my students found that 1 in 5 deliberately enter incorrect data about themselves online to protect their identity

• We might not have any assurance that the data is true or that it is correctly linked to metadata

– Is data typed?

– Is the data related to other data? How is it related?

– Are relationships between data and its meaning being lost?
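
The "Is data typed?" question can be made concrete with a small check. This is a minimal sketch, not part of the talk: the schema `EXPECTED_TYPES`, the field names, and the sample records are all assumptions made up for illustration.

```python
from datetime import date

# Hypothetical schema (an assumption for illustration): field -> expected type.
EXPECTED_TYPES = {"name": str, "birth_date": date, "followers": int}

def type_problems(record):
    """Return a list of human-readable type problems found in one record."""
    problems = []
    for field, expected in EXPECTED_TYPES.items():
        if field not in record:
            problems.append(f"{field}: missing")
        elif not isinstance(record[field], expected):
            actual = type(record[field]).__name__
            problems.append(f"{field}: expected {expected.__name__}, got {actual}")
    return problems

# Made-up sample records: the second one is untyped free text.
records = [
    {"name": "A. Student", "birth_date": date(1992, 5, 17), "followers": 120},
    {"name": "Anon", "birth_date": "01/01/1900", "followers": "lots"},
]

for record in records:
    print(record["name"], type_problems(record))
```

A check like this only catches values of the wrong type. It cannot tell whether a well-typed value is true, which is the harder problem raised by the straw poll.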

Page 8: Big Data Brighton | Big Data in Academia | Jan 2013

A view of different data models [3]

Page 9: Big Data Brighton | Big Data in Academia | Jan 2013

Is there a significant difference between structured and unstructured data?

• How is data structured?

• Does the underlying data model matter?

• What are the options for a data model?

• Over the years many models of data have evolved and most are still in use

• Data models used give insights into assumptions about the semantics of the data

Page 10: Big Data Brighton | Big Data in Academia | Jan 2013

Finding meaning from ‘flat’ data

• A problem with ‘flat’ or unstructured data representations is that it has traditionally been difficult to aggregate and present to users in a way that they can understand

• In contrast, structured data can be summarised easily and its structure represents the meaning of data within an organization

• Data analytics are changing this by presenting accessible information from ‘flat’ data
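
The contrast between the first two points can be sketched in a few lines. This example is not from the talk: the `orders` and `notes` data are invented, and word counting stands in for the simplest possible analysis of ‘flat’ text.

```python
from collections import Counter, defaultdict
import re

# Structured: every field has a name, so grouping and totalling is direct.
orders = [
    {"region": "South", "amount": 120.0},
    {"region": "North", "amount": 80.0},
    {"region": "South", "amount": 40.0},
]
totals = defaultdict(float)
for order in orders:
    totals[order["region"]] += order["amount"]

# Flat: similar facts buried in free text. Without further analytics,
# word counts are about all that can be aggregated.
notes = [
    "South office sold 120 pounds of stock",
    "North sold 80, south sold another 40",
]
words = Counter(w for note in notes for w in re.findall(r"[a-z]+", note.lower()))

print(dict(totals))
print(words.most_common(3))
```

The structured totals answer a business question directly. The word counts show which terms are frequent but not which amount belongs to which region, because that relationship was never recorded explicitly.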

Page 11: Big Data Brighton | Big Data in Academia | Jan 2013

Can the underlying structure of the data affect what you can do with it?

• The short answer from my research is ‘YES’

• How it affects what you can do with the data is the long answer

– It is really easy to store a piece of data but retrieving it (intact with its meaning and its relationships to other data) is more difficult

– When ‘Big Data’ technologies are used to extract knowledge and insights from the data, we should be sure that the technology is not introducing new problems

Page 12: Big Data Brighton | Big Data in Academia | Jan 2013

Impedance mismatch problems

• Moving data from one paradigm to another often causes the meaning to be lost

• Can cause problems for developers who move data from one paradigm to another

• Also a problem for end users who may lose the connections
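
A small illustration of how meaning can be lost when data moves between paradigms. This is a sketch under assumptions, not the speaker's research code: the `student` record and module codes are invented, and CSV stands in for any flat, row-based store.

```python
import csv
import io

# A nested, typed record (hypothetical example data).
student = {
    "id": 7,
    "name": "A. Student",
    "modules": [{"code": "CI101", "mark": 68}, {"code": "CI102", "mark": 72}],
}

# Move it into a flat, row-based paradigm: one CSV row per module.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["student_id", "name", "module_code", "mark"])
for module in student["modules"]:
    writer.writerow([student["id"], student["name"], module["code"], module["mark"]])

# Read it back: every value is now a string, and the "one student has many
# modules" relationship is only implied by the repeated student_id.
buffer.seek(0)
rows = list(csv.DictReader(buffer))
print(rows[0])
print(type(rows[0]["mark"]).__name__)
```

Nothing was deleted in the round trip, yet the types and the explicit nesting are gone. Whoever reads the rows has to know, from outside the data, how to rebuild them.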

Page 13: Big Data Brighton | Big Data in Academia | Jan 2013

A way forward

• Working out goals in your data management

• Understanding the structure of the data you are using, wherever it comes from

• Getting assurance about the quality of the data

• Then having confidence that the knowledge and insights are based in firm foundations

Page 14: Big Data Brighton | Big Data in Academia | Jan 2013

Thank you

Any questions?

Page 15: Big Data Brighton | Big Data in Academia | Jan 2013

References

1. Carter, P. (2011). Big Data Analytics: Future Architectures, Skills and Roadmaps for the CIO. SAS white paper, IDC Go-to-Market Services.

2. Gianchandani, E. (2012). Obama administration unveils $200m big data R&D initiative. The Computing Community Consortium (CCC) Blog.

3. Angles, R. and Gutierrez, C. (2008). Survey of graph database models. ACM Comput. Surv. 40, 1, Article 1 (February 2008).

Page 16: Big Data Brighton | Big Data in Academia | Jan 2013

Event Detection on Twitter

Simon Wibberley

Text Analytics Group

University of Sussex

[email protected]

Page 17: Big Data Brighton | Big Data in Academia | Jan 2013

What are Events? We just don’t know.

Page 18: Big Data Brighton | Big Data in Academia | Jan 2013

Event Categories

Constrained Unconstrained

Well Reported

Poorly Reported

Interesting

Relatively Easy Interesting

Very Tricky

Page 19: Big Data Brighton | Big Data in Academia | Jan 2013

Algorithms

• Query Driven

– Volume / rate analysis of matching data

– Addresses constrained event type

• Data Driven

– Mine stream for interesting data

– Addresses unconstrained event type
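
One way to picture the query-driven approach is a rate check on tweets that match a query. The slides do not say how the rate analysis is done, so this is only a sketch under assumptions: fixed time windows, a mean-plus-k-standard-deviations rule, and the function names, parameters, and sample tweets are all hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev

def matching_counts(tweets, query, window_seconds=60):
    """Count tweets containing `query` in each fixed-size time window."""
    counts = Counter()
    for timestamp, text in tweets:
        if query.lower() in text.lower():
            counts[int(timestamp // window_seconds)] += 1
    return counts

def burst_windows(counts, threshold=3.0, min_history=3):
    """Flag windows whose count is far above the mean of earlier windows."""
    if not counts:
        return []
    flagged, history = [], []
    for window in range(min(counts), max(counts) + 1):
        count = counts.get(window, 0)
        if len(history) >= min_history:
            mu, sigma = mean(history), pstdev(history)
            # max(sigma, 1.0) avoids flagging tiny changes in a quiet stream.
            if count > mu + threshold * max(sigma, 1.0):
                flagged.append(window)
        history.append(count)
    return flagged

# Made-up stream of (seconds, text): quiet for four minutes, then a spike.
tweets = [(30, "unrelated chatter")]
for window, n in enumerate([2, 1, 2, 1, 12]):
    tweets += [(window * 60 + i, "dressage gold for GB") for i in range(n)]

print(burst_windows(matching_counts(tweets, "dressage")))
```

This only works when you already know what to query for, which is why it suits the constrained event type. The data-driven approach has no query to count against.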

Page 20: Big Data Brighton | Big Data in Academia | Jan 2013

GB Dressage Gold

Page 21: Big Data Brighton | Big Data in Academia | Jan 2013

London Riots

Page 22: Big Data Brighton | Big Data in Academia | Jan 2013

London Riots

Page 23: Big Data Brighton | Big Data in Academia | Jan 2013

Event Characterisation

• Fill in unknowns

• Self explanatory for (very) constrained events

• Select representative / well formed Tweet[s]

• Term relevance / clustering

• Topic analysis

• Geo-location / Entity extraction
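
Selecting a representative Tweet by term relevance could look roughly like this. The scoring rule (average document frequency of a tweet's terms), the stopword list, and the sample tweets are assumptions for illustration, not the method used in the talk.

```python
from collections import Counter
import re

# Tiny illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"the", "a", "in", "of", "for", "to", "is", "on", "and"}

def terms(text):
    """Distinct lower-case word tokens (hashtags kept), minus stopwords."""
    return {t for t in re.findall(r"#?\w+", text.lower()) if t not in STOPWORDS}

def representative(tweets):
    """Return the tweet whose terms are, on average, shared by the most tweets."""
    doc_freq = Counter(t for tweet in tweets for t in terms(tweet))

    def score(tweet):
        ts = terms(tweet)
        return sum(doc_freq[t] for t in ts) / len(ts) if ts else 0.0

    return max(tweets, key=score)

# Made-up tweets around one event, plus one unrelated tweet.
tweets = [
    "GB win gold in the dressage!",
    "omg wow gold!!!",
    "Dressage gold for GB at Greenwich",
    "what is on tv tonight",
]

print(representative(tweets))
```

Averaging rather than summing keeps long tweets from winning on length alone. A real system would also need some notion of "well formed", which this sketch does not attempt.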

Page 24: Big Data Brighton | Big Data in Academia | Jan 2013

CASM

• Centre for the Analysis of Social Media

• Collaboration between DEMOS and TAG

• Applying text analytics to social media to answer sociological questions

• OSI funded EU sentiment analysis pilot project

http://www.demos.co.uk/projects/casm/

Page 25: Big Data Brighton | Big Data in Academia | Jan 2013

Ethics

Narrow Broad

Anonymous

Identity Preserving

Stasi

Judiciary

Me!

Social Science

Reffin, J (2012)