Presentation by Seth Grimes to the TDWI Washington DC chapter, July 15, 2011
Text, Content, and Social Analytics: BI for the New World
Seth Grimes
Alta Plana Corporation
@sethgrimes
TDWI – Washington DC
July 15, 2011
Text, Content & Social Analytics
Table of Contents:
1. Principles.
2. Perspectives.
3. Semantics.
4. Text/content analytics.
5. Social.
6. BI for the New World.
Imperatives for the 2010s:
Do more with more.
“It’s Not Information Overload. It’s Filter Failure”: Clay Shirky, 2008.
• More sources & types of data.
• Greater data volumes.
• New hardware and methods.
Automate more, more intelligently.
• Analytics.
• Semantics.
Engage. Socialize.
I see three categories of data:
1. Quantities, whether measured, observed, or computed.
2. Content, which I’ll characterize as non-quantitative information.
3. Metadata (semantic & structural) describing quantities and content.
• Our concern is content, analytics & fusion.
• Structured/unstructured is a false dichotomy.
• Where do relationships fit?
DW & BI relate numbers...
...but by-the-numbers BI doesn’t explain.
Questions for business (& government):
What are people saying? What’s hot/trending?
What are they saying about {topic|person|product} X?
... about X versus {topic|person|product} Y?
How has opinion about X and Y evolved?
How has opinion correlated with {our|competitors’|general} {news|marketing|sales|events}?
What’s behind opinion, the root causes?
Who are opinion leaders?
How does sentiment propagate across multiple channels?
The answers are here...
But how do you get at them?
“In this example, you can quickly see that the Drooling Dog Bar B Q has gotten lots of positive reviews, and if you want to see what other people have said about the restaurant, clicking this result is a good choice.”
-- http://googleblog.blogspot.com/2009/05/more-search-options-and-other-updates.html
“In the recap of [Searchology] from Google’s Matt Cutts, he tells us that: ‘If you sort by reviews, Google will perform sentiment analysis and highlight interesting comments.’”
-- Bill Slawski, “Google's New Review Search Option and Sentiment Analysis,” http://www.seobythesea.com/?p=1488
Text Analytics!
More generally...
Analytics is a collection of tools and techniques that extract insights from data. Apply or embed analytics within business contexts – collect data and information about customers, markets, suppliers, and business processes – use results to inform, drive, and optimize business decision making – and you harness analytics as a core BI asset.
http://www.tropicalisland.de/NYC_New_York_Brooklyn_Bridge_from_World_Trade_Center_b.jpg
$x(t) = t$
$y(t) = \tfrac{1}{2} a \left(e^{t/a} + e^{-t/a}\right) = a \cosh(t/a)$
http://en.wikipedia.org/wiki/Seven_Bridges_of_K%C3%B6nigsberg
Analytics seeks structure in “unstructured” sources.
“Statistical information derived from word frequency and distribution is used by the machine to compute a relative measure of significance, first for individual words and then for sentences.”
-- H.P. Luhn, The Automatic Creation of Literature Abstracts, IBM Journal, 1958.
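The idea in the quote can be sketched in a few lines of Python. This is a simplified illustration, not Luhn's exact procedure: it scores each word by its frequency in the document, then scores each sentence by the summed frequency of its non-stopword words divided by sentence length. The stopword list, the naive sentence splitter, and all function names are assumptions made for the example.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real systems use larger lists).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "it", "for",
             "on", "that", "this", "with", "as", "by", "are", "was", "be"}

def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def word_significance(text):
    """Count how often each non-stopword occurs in the whole text."""
    return Counter(w for w in tokenize(text) if w not in STOPWORDS)

def split_sentences(text):
    """Naive split after '.', '!' or '?' (an assumption, not a linguistic parser)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def score_sentences(text):
    """Score each sentence: summed document frequency of its significant
    words, divided by the number of words in the sentence."""
    freq = word_significance(text)
    scored = []
    for sentence in split_sentences(text):
        words = tokenize(sentence)
        if not words:
            continue
        total = sum(freq[w] for w in words if w not in STOPWORDS)
        scored.append((total / len(words), sentence))
    return scored

def summarize(text, n=1):
    """Return the n highest-scoring sentences, kept in original order."""
    scored = score_sentences(text)
    top = sorted(scored, key=lambda pair: pair[0], reverse=True)[:n]
    chosen = {sentence for _, sentence in top}
    return [sentence for _, sentence in scored if sentence in chosen]
```

Calling `summarize(document_text, n=3)` would return a three-sentence "auto-abstract". As Luhn himself notes, an approach of this kind ignores grammar, syntax, and semantic relationships.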
Text analytics models text.
http://wordle.net
Document input and processing
Knowledge handling is key
Desk Set (1957): Computer engineer Richard Sumner (Spencer Tracy), television network librarian Bunny Watson (Katharine Hepburn), and the "electronic brain" EMERAC.
Hans Peter Luhn, “A Business Intelligence System,” IBM Journal, October 1958
“This rather unsophisticated argument on ‘significance’ avoids such linguistic implications as grammar and syntax... No attention is paid to the logical and semantic relationships the author has established.”
-- H.P. Luhn
My 2009 text-analytics market survey asked, [What information] do you need (or expect to need) to extract or analyze:

• Named entities – people, companies, geographic locations, brands, ticker symbols, etc.: 71%
• Topics and themes: 65%
• Sentiment, opinions, attitudes, emotions: 60%
• Concepts, that is, abstract groups of entities: 58%
• Events, relationships, and/or facts: 55%
• Metadata such as document author, publication date, title, headers, etc.: 53%
• Other entities – phone numbers, e-mail & street addresses: 40%
• Other: 15%

Source: Text Analytics 2009: User Perspectives on Solutions and Providers
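Some of the "other entities" in the survey, such as phone numbers and e-mail addresses, follow regular surface patterns, so a pattern-based extractor is one plausible way to pull them from text. The sketch below is only an illustration: the regular expressions are deliberately simplified assumptions (the phone pattern covers only 3-3-4 digit numbers separated by a dash, dot, or space), and `extract_entities` is a hypothetical helper, not part of any product mentioned here.

```python
import re

# Simplified patterns (assumptions): real-world formats vary far more than this.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b")

def extract_entities(text):
    """Return pattern-matched e-mail addresses and phone numbers found in text."""
    return {
        "email": EMAIL_RE.findall(text),
        "phone": PHONE_RE.findall(text),
    }
```

Named entities such as people, companies, and brands generally need more than patterns (dictionaries, statistical models, or both), which is part of why they top the survey list.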
From document to DB; an IBM example: “The standard features are stored in the STANDARD_KW table, keywords with their occurrences in the KEYWORD_KW_OCC table, and the text list features in the TEXTLIST_TEXT table. Every feature table contains the DOC_ID as a reference to the DOCUMENT table.”
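To make the document-to-database step concrete, here is a minimal sketch using Python's built-in sqlite3 module. Only the table names DOCUMENT and KEYWORD_KW_OCC and the DOC_ID reference come from the quote; the other column names, the helper functions, and the use of SQLite are assumptions made for illustration and do not reproduce IBM's actual schema.

```python
import sqlite3

def build_feature_store(conn):
    """Create a DOCUMENT table and one keyword-occurrence feature table.
    Columns other than DOC_ID are hypothetical."""
    conn.executescript("""
        CREATE TABLE DOCUMENT (
            DOC_ID INTEGER PRIMARY KEY,
            TITLE  TEXT
        );
        CREATE TABLE KEYWORD_KW_OCC (
            DOC_ID      INTEGER NOT NULL REFERENCES DOCUMENT(DOC_ID),
            KEYWORD     TEXT    NOT NULL,
            OCCURRENCES INTEGER NOT NULL
        );
    """)

def store_document(conn, title, keyword_counts):
    """Insert one document and its keyword counts; return the new DOC_ID."""
    cur = conn.execute("INSERT INTO DOCUMENT (TITLE) VALUES (?)", (title,))
    doc_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO KEYWORD_KW_OCC (DOC_ID, KEYWORD, OCCURRENCES) VALUES (?, ?, ?)",
        [(doc_id, kw, n) for kw, n in keyword_counts.items()],
    )
    return doc_id

def documents_mentioning(conn, keyword):
    """Join the feature table back to DOCUMENT via DOC_ID, most frequent first."""
    rows = conn.execute(
        """SELECT d.TITLE, k.OCCURRENCES
           FROM KEYWORD_KW_OCC AS k
           JOIN DOCUMENT AS d ON d.DOC_ID = k.DOC_ID
           WHERE k.KEYWORD = ?
           ORDER BY k.OCCURRENCES DESC""",
        (keyword,),
    )
    return rows.fetchall()
```

Once extracted features sit in tables keyed by DOC_ID, ordinary SQL and BI tools can query text-derived data alongside conventional numeric data.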
Ken Jennings, IBM Watson, and Brad Rutter play Jeopardy!
https://secure.wikimedia.org/wikipedia/en/wiki/File:Watson_Jeopardy.jpg
Welcome to the New World.
The Far Side by Gary Larson
Search
BI
Text Analytics
Semantic search
Information Access
Integrated analytics
In a sense, text analytics, by generating semantics, bridges search and BI to turn Information Retrieval into Information Access.
Have we arrived?
2001: A Space Odyssey, Stanley Kubrick
http://www.businessweek.com/magazine/content/04_19/b3882029_mz072.htm
En route.
Intelligent computing involves:
Big (and little) Data.
• Quantities.
• Content.
• Metadata.
Analytics.
Semantics.
Integration.
Inference.
Semantics enables better content production, management & use.
Semantics captures meaning, relationships, context, and understanding: the sense of “unstructured” online, social, and enterprise information, for content consumers and publishers.
Semantics unites data of all types.
Content, composites, connections.
Content, composites, connections, 2.
Content, composites, connections, 3.
From connections to influence: What’s wrong with these pictures? (Radian6, Sysomos, Klout)
Social analytics:
1. Use social data in analyses (alongside enterprise & online information).
   • Content.
   • Connections.
2. Bring BI to social analyses.
3rd & 4th senses of social analytics:
3. Adopt agile, collaborative methods.
4. Share your data.
A challenge: Enterprise-social-online data integration.