Text, Content, and Social Analytics: BI for the New World Seth Grimes Alta Plana Corporation @sethgrimes TDWI – Washington DC July 15, 2011


Presentation by Seth Grimes to the TDWI Washington DC chapter, July 15, 2011


Page 1: Text, Content, and Social Analytics: BI for the New World

Text, Content, and Social Analytics: BI for the New World

Seth Grimes
Alta Plana Corporation

@sethgrimes

TDWI – Washington DC
July 15, 2011

Page 2: Text, Content, and Social Analytics: BI for the New World

Text, Content & Social Analytics

Table of Contents:
1. Principles.
2. Perspectives.
3. Semantics.
4. Text/content analytics.
5. Social.
6. BI for the New World.

Page 3: Text, Content, and Social Analytics: BI for the New World


Imperatives for the 2010s:

Do more with more.

“It’s Not Information Overload. It’s Filter Failure”: Clay Shirky, 2008.

• More sources & types of data.
• Greater data volumes.
• New hardware and methods.

Automate more, more intelligently.
• Analytics.
• Semantics.

Engage. Socialize.

Page 4: Text, Content, and Social Analytics: BI for the New World


I see three categories of data:
1. Quantities, whether measured, observed, or computed.
2. Content, which I’ll characterize as non-quantitative information.
3. Metadata (semantic & structural) describing quantities and content.

• Our concern is content, analytics & fusion.

• Structured/unstructured is a false dichotomy.

• Where do relationships fit?

Page 5: Text, Content, and Social Analytics: BI for the New World


DW & BI relate numbers...

...but by-the-numbers BI doesn’t explain.

Page 6: Text, Content, and Social Analytics: BI for the New World


Questions for business (& government):

What are people saying? What’s hot/trending?

What are they saying about {topic|person|product} X?

... about X versus {topic|person|product} Y?

How has opinion about X and Y evolved?

How has opinion correlated with {our|competitors’|general} {news|marketing|sales|events}?

What’s behind opinion, the root causes?

Who are opinion leaders?

How does sentiment propagate across multiple channels?

Page 7: Text, Content, and Social Analytics: BI for the New World


The answers are here...

But how do you get at them?

Page 8: Text, Content, and Social Analytics: BI for the New World


“In this example, you can quickly see that the Drooling Dog Bar B Q has gotten lots of positive reviews, and if you want to see what other people have said about the restaurant, clicking this result is a good choice.”

-- http://googleblog.blogspot.com/2009/05/more-search-options-and-other-updates.html

“In the recap of [Searchology] from Google’s Matt Cutts, he tells us that: ‘If you sort by reviews, Google will perform sentiment analysis and highlight interesting comments.’”

-- Bill Slawski, “Google's New Review Search Option and Sentiment Analysis,” http://www.seobythesea.com/?p=1488
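To make "sentiment analysis" of reviews concrete, here is a minimal lexicon-based sketch. It is not Google's method, which the quotes above do not describe; the word lists and function names are hypothetical and chosen only for illustration.

```python
import re

# Tiny illustrative lexicons (assumptions, not a real sentiment resource).
POSITIVE = {"good", "great", "excellent", "delicious", "friendly"}
NEGATIVE = {"bad", "slow", "terrible", "bland", "rude"}


def sentiment_score(text):
    """Count positive words minus negative words in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum((w in POSITIVE) - (w in NEGATIVE) for w in words)


def sort_reviews(reviews):
    """Order reviews from most positive to most negative score."""
    return sorted(reviews, key=sentiment_score, reverse=True)
```

A word-counting approach like this ignores negation, sarcasm, and context, which is part of why production systems use richer linguistic and statistical models.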

Page 9: Text, Content, and Social Analytics: BI for the New World


Text Analytics!

More generally...

Page 10: Text, Content, and Social Analytics: BI for the New World


Analytics is a collection of tools and techniques that extract insights from data. Apply or embed analytics within business contexts – collect data and information about customers, markets, suppliers, and business processes – use results to inform, drive, and optimize business decision making – and you harness analytics as a core BI asset.

Page 11: Text, Content, and Social Analytics: BI for the New World


http://www.tropicalisland.de/NYC_New_York_Brooklyn_Bridge_from_World_Trade_Center_b.jpg

$x(t) = t$

$y(t) = \tfrac{1}{2}\, a \left( e^{t/a} + e^{-t/a} \right) = a \cosh(t/a)$

http://en.wikipedia.org/wiki/Seven_Bridges_of_K%C3%B6nigsberg

Analytics seeks structure in “unstructured” sources.
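As a quick numerical check of the curve on this slide, the sketch below evaluates the exponential form and the hyperbolic-cosine form of y(t) and lets you confirm they agree. The function names are hypothetical; the parameter `a` is the scale constant from the formula.

```python
import math


def catenary_exponential(t, a):
    """y(t) written with exponentials: (a / 2) * (e^(t/a) + e^(-t/a))."""
    return 0.5 * a * (math.exp(t / a) + math.exp(-t / a))


def catenary_cosh(t, a):
    """The same curve written as a * cosh(t / a)."""
    return a * math.cosh(t / a)
```

At t = 0 both forms reduce to a, the lowest point of the hanging curve.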

Page 12: Text, Content, and Social Analytics: BI for the New World


“Statistical information derived from word frequency and distribution is used by the machine to compute a relative measure of significance, first for individual words and then for sentences.”

-- H.P. Luhn, The Automatic Creation of Literature Abstracts, IBM Journal, 1958.

Text analytics models text.
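The idea in the Luhn quote, scoring words by frequency and then scoring sentences by the words they contain, can be sketched in a few lines. This is a simplified illustration, not Luhn's actual procedure: the stop-word list, the averaging rule, and all function names are assumptions.

```python
import re
from collections import Counter

# Small illustrative stop-word list (an assumption, not Luhn's).
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "for", "it", "on"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def score_sentences(text):
    """Return (score, sentence) pairs; score is the mean frequency of the
    sentence's non-stop words, counted over the whole text."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    word_freq = Counter(w for w in tokenize(text) if w not in STOP_WORDS)
    scored = []
    for sentence in sentences:
        words = [w for w in tokenize(sentence) if w not in STOP_WORDS]
        score = sum(word_freq[w] for w in words) / len(words) if words else 0.0
        scored.append((score, sentence))
    return scored


def summarize(text, n=1):
    """Pick the n highest-scoring sentences as a crude extract."""
    ranked = sorted(score_sentences(text), key=lambda pair: pair[0], reverse=True)
    return [sentence for _, sentence in ranked[:n]]
```

Like the approach Luhn describes, this treats text as a bag of words and pays no attention to grammar or meaning.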

http://wordle.net

Page 13: Text, Content, and Social Analytics: BI for the New World

Document input and processing

Knowledge handling is key

Desk Set (1957): Computer engineer Richard Sumner (Spencer Tracy), television network librarian Bunny Watson (Katharine Hepburn), and the “electronic brain” EMERAC.

Hans Peter Luhn, “A Business Intelligence System,” IBM Journal, October 1958

Page 14: Text, Content, and Social Analytics: BI for the New World

“This rather unsophisticated argument on ‘significance’ avoids such linguistic implications as grammar and syntax... No attention is paid to the logical and semantic relationships the author has established.”

-- H.P. Luhn

Page 15: Text, Content, and Social Analytics: BI for the New World


My 2009 text-analytics market survey asked, [What information] do you need (or expect to need) to extract or analyze:

• Named entities – people, companies, geographic locations, brands, ticker symbols, etc.: 71%
• Topics and themes: 65%
• Sentiment, opinions, attitudes, emotions: 60%
• Concepts, that is, abstract groups of entities: 58%
• Events, relationships, and/or facts: 55%
• Metadata such as document author, publication date, title, headers, etc.: 53%
• Other entities – phone numbers, e-mail & street addresses: 40%
• Other: 15%

Source: Text Analytics 2009: User Perspectives on Solutions and Providers
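Some of the survey's categories, such as phone numbers and e-mail addresses, are regular enough that simple pattern matching gets surprisingly far. The sketch below is one hedged illustration; the patterns are simplified assumptions (North American phone formats only) and the function name is hypothetical.

```python
import re

# Simplified patterns for illustration; real-world formats vary much more.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\(?\d{3}\)?[-. ]\d{3}[-. ]\d{4}")


def extract_entities(text):
    """Return the e-mail addresses and phone numbers found in the text."""
    return {
        "emails": EMAIL_RE.findall(text),
        "phones": PHONE_RE.findall(text),
    }
```

Named entities, sentiment, and concepts generally need more than regular expressions, for example dictionaries, linguistic rules, or statistical models.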

Page 16: Text, Content, and Social Analytics: BI for the New World


Page 17: Text, Content, and Social Analytics: BI for the New World


From document to DB; an IBM example: “The standard features are stored in the STANDARD_KW table, keywords with their occurrences in the KEYWORD_KW_OCC table, and the text list features in the TEXTLIST_TEXT table. Every feature table contains the DOC_ID as a reference to the DOCUMENT table.”
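The layout in the IBM quote can be sketched with SQLite. Only the table names and the DOC_ID reference come from the quote; every other column name, the column types, and the helper functions are assumptions made for illustration.

```python
import sqlite3


def build_feature_store(conn):
    """Create the four tables; columns other than DOC_ID are hypothetical."""
    conn.executescript(
        """
        CREATE TABLE DOCUMENT (DOC_ID INTEGER PRIMARY KEY, TITLE TEXT);
        CREATE TABLE STANDARD_KW (
            DOC_ID INTEGER REFERENCES DOCUMENT(DOC_ID),
            FEATURE TEXT, VALUE TEXT);
        CREATE TABLE KEYWORD_KW_OCC (
            DOC_ID INTEGER REFERENCES DOCUMENT(DOC_ID),
            KEYWORD TEXT, OCCURRENCES INTEGER);
        CREATE TABLE TEXTLIST_TEXT (
            DOC_ID INTEGER REFERENCES DOCUMENT(DOC_ID),
            TEXT_ITEM TEXT);
        """
    )


def keyword_counts(conn, doc_id):
    """List one document's keywords, most frequent first."""
    cursor = conn.execute(
        "SELECT KEYWORD, OCCURRENCES FROM KEYWORD_KW_OCC "
        "WHERE DOC_ID = ? ORDER BY OCCURRENCES DESC, KEYWORD",
        (doc_id,),
    )
    return cursor.fetchall()
```

Once extracted features sit in tables keyed by DOC_ID, they can be joined and aggregated with ordinary SQL, which is what lets text-derived data feed conventional BI tools.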

Page 18: Text, Content, and Social Analytics: BI for the New World


Ken Jennings, IBM Watson, and Brad Rutter play Jeopardy!

https://secure.wikimedia.org/wikipedia/en/wiki/File:Watson_Jeopardy.jpg

Welcome to the New World.

The Far Side by Gary Larson

Page 19: Text, Content, and Social Analytics: BI for the New World


Diagram labels: Search, BI, Text Analytics, Semantic search, Integrated analytics, Information Access.

In a sense, text analytics, by generating semantics, bridges search and BI to turn Information Retrieval into Information Access.

Page 20: Text, Content, and Social Analytics: BI for the New World


Have we arrived?

2001: A Space Odyssey, Stanley Kubrick

Page 21: Text, Content, and Social Analytics: BI for the New World


http://www.businessweek.com/magazine/content/04_19/b3882029_mz072.htm

En route.

Page 22: Text, Content, and Social Analytics: BI for the New World


Intelligent computing involves:

Big (and little) Data.
• Quantities.
• Content.
• Metadata.

Analytics.
Semantics.
Integration.
Inference.

Page 23: Text, Content, and Social Analytics: BI for the New World


Semantics enables better content production, management & use.

Semantics captures:
• Meaning
• Relationships
• Context
• Understanding
that is, the sense of “unstructured” online, social, and enterprise information, for content consumers and publishers.

Semantics unites data of all types.

Page 24: Text, Content, and Social Analytics: BI for the New World


Content, composites, connections.

Page 25: Text, Content, and Social Analytics: BI for the New World


Content, composites, connections, 2.

Page 26: Text, Content, and Social Analytics: BI for the New World


Content, composites, connections, 3.

Page 27: Text, Content, and Social Analytics: BI for the New World


From connections to influence: What’s wrong with these pictures? (Radian6, Sysomos, Klout)

Page 28: Text, Content, and Social Analytics: BI for the New World


Social analytics:
1. Use social data in analyses (alongside enterprise & online information).
• Content.
• Connections.
2. Bring BI to social analyses.

3rd & 4th senses of social analytics:
3. Adopt agile, collaborative methods.
4. Share your data.

A challenge: Enterprise-social-online data integration.

Page 29: Text, Content, and Social Analytics: BI for the New World

Text, Content, and Social Analytics: BI for the New World

Seth Grimes
Alta Plana Corporation

@sethgrimes

TDWI – Washington DC
July 15, 2011