S0806637
Michael White © 2012 Word Count: 9,876
Maintaining Reputation through Online Analytics:
Why the Public Relations Industry Must Adapt or Die
Presented as part of the requirement for an award within the Undergraduate Modular
Scheme at the University of Gloucestershire (April 2012)
April 2012 Maintaining Reputation through Online Analytics PUR334
Declaration
This dissertation is the product of my own work. I agree that it
may be made available for reference and photocopying at the discretion of the
University.
Author’s Signature:
Michael White
Date 11/04/2012
Abstract
Over the last 6 years the communication landscape has changed significantly.
The advent of Facebook, Twitter, YouTube and other social networks introduced
a range of additional communication channels. Never has it been more
important for the public relations industry to maintain reputation. Whilst an
understanding of social networking tools now exists within the industry,
confusion still exists surrounding the range of metrics available online and how
these can be utilised to effectively provide Return on Investment (ROI) for
clients. A number of third-party measuring tools now exist, offering similar ‘search,
measure, understand and engage’ solutions.
This report uncovers in-depth alternative solutions to the terms ‘measure and
understand’ for capturing quantitative and qualitative data. These measuring
metrics and techniques may be used by the public relations industry to achieve
their campaigns’ objectives.
Main Findings:
1. ROI is relied upon for reputation management and direct sales.
2. Third party measuring tools exist but are not perfect.
3. The PR industry needs standardisation.
4. Semantic Analysis works but has not yet been perfected.
Acknowledgements
Throughout writing this dissertation concerning online measurement the author
drank approximately 180 cups of coffee, smoked 450 cigarettes and listened to
900 hours’ worth of music. Despite this congenial lifestyle, writing this
dissertation was only made possible by the following people.
The author’s parents: For having hope in their seven-year-old child with
dyslexia who could not read or write.
Lecturer, Practitioner and Extraordinaire, David Phillips: For his guidance
surrounding semantic analytics.
Microsoft Librarian, David K. Stewart: For providing research through the
Microsoft UK library.
Graduated PR student, Michael Healey: Who once interviewed the author
for his own dissertation and was an inspiration for writing this one.
Wikipedia: The author’s unreferenced secret weapon.
Table of Contents
Declaration
Abstract
Acknowledgements
Introduction
1.0 Literature Review
1.1 Public Relations Industry: Adapt or Die
1.2 Web Analytics 2.0
1.3 How to Measure Sales and Relationships
1.4 Introducing the Semantic Web
2.0 Methodology
2.1 Research Sample Design
2.2 Ethical Considerations
3.0 Latent Semantic Indexing Research into Neville Hobson’s Twitter timeline
3.1 LSI Python Script
3.2 Retrieval, Filter and Identification
3.3 Term Count Model and Singular Value Decomposition
3.4 The Results
4.0 Evaluation
4.1 Evaluation of Latent Semantic Indexing
4.2 Bayesian Inference and Other Interpretations
5.0 Conclusion
5.1 ROI is relied upon for reputation management and direct sales
5.2 Third party measuring tools exist but are not perfect
5.3 The PR industry needs standardisation
5.4 Semantic Analysis works but has not yet been perfected
References
Illustrations
Appendix
Introduction
The public relations industry is in a state of rapid change. On the 1st March 2012
the Public Relations Society of America (PRSA) announced the results of a vote
which concluded with their modern definition of PR (White, 2012):
“Public relations is a strategic communication process that builds mutually
beneficial relationships between organizations and their publics.”
This definition is similar to that of the UK’s Chartered Institute of Public Relations
(CIPR) (CIPR, 2012):
“Public relations is about reputation – the result of what you do, what you say
and what others say about you. Public relations is the discipline which looks after
reputation, with the aim of earning understanding and support and influencing
opinion and behaviour. It is the planned and sustained effort to establish and
maintain good will and mutual understanding between an organisation and its
publics”.
The definition of PR given by the Public Relations Consultants Association (PRCA),
a UK organisation, is extremely similar to the CIPR’s (PRCA, 2012):
“Public relations is all about reputation. It’s the result of what you do, what you
say, and what others say about you. It is used to gain trust and understanding
between an organisation and its various publics – whether that’s employees,
customers, investors, the local community – or all those stakeholder groups…”
For this PR society, chartered institute and association the emphasis on
‘reputation’ is clear, but a modern definition of PR must take into consideration
how the growth of digital communication channels provides an opportunity for
the PR industry to expand into additional service areas outside of reputation
management.
Furthermore, a viable method of measuring reputation has not yet been
discovered. Measuring online sentiment levels fulfils the CIPR’s understanding of
reputation being “what others say about you”, but it is not yet possible to align
sentiment with the global values of a brand, product or service.
Whilst the sharp increase in communication channels being made available
across a range of communication platforms will inevitably impact reputation
management, the definition of PR should also be in question. Since 2006 the
introduction of now-popular social networks has made available additional
measurement metrics. Some of these metrics are already being utilised by the
online advertising industry to generate direct sales for their clients, as made
clear by some small public relations agencies that manage online advertising
campaigns for their clients (Jefkins, 2000). This dissertation explores the
possibility of public relations finding additional ways to measure reputation
online and understanding that digital PR is not just concerned with reputation
but also with direct sales.
Within the literature review a succinct but broad assessment of the various
online measurement metrics is made before an in-depth study into how
semantic analysis could be used to measure public relations activities. All
documents associated with the study can be found in the appendices.
1.0 Literature Review
This review seeks to identify, examine and compare key forms of online
measurement. The purpose is to understand the scale of online metrics available
for digital public relations campaigns and the interpretation of data involved. The
information within this literature review will serve as a necessary
foundation for the research presented in section 3.0.
Preparing this review has involved consolidating relevant published texts, gaining
insights through marketing based blogs, examining online journal databases and
keyword searches on micro-blogging platform Twitter. The author’s personal
experiences within the field of public relations and online advertising are also
included.
This review is comprised of the following sections:
Public relations industry: adapt or die
Web Analytics 2.0
How to measure sales and relationships
Introducing the semantic web
1.1 Public Relations Industry: Adapt or Die
Edelman’s annual 8095 report researched 3,100 millennials across 8 different
countries. The Millennial generation accounts for those between the ages of 17
and 32 as of 2010, their behaviours showing a stark difference compared to baby
boomers (born early 1946 – 1964) and generation x (born early 1960 – 1980).
Evidence in the report illustrated the close relationship millennials have with
brands online (Gould, 2010):
28% relied upon brands to make a positive impact in the world
36% relied upon brands to learn about new trends
18% announced they would switch to a competing brand if they were
offered tools to help them in other areas of life
16% relied upon brands to help achieve personal goals
Organisations must ensure that their brands adapt effectively to fit to the online
environment, often referred to as Web 2.0 (Gordon, 2011). The pressure is on
the public relations industry to have the confidence to manage customer
relations on the front lines. The latest PRCA barometer revealed a gloomy
outlook, accurately summarised in a response to the report by Weber
Shandwick’s vice-president (Owens, 2012):
“Clients are saying there is an uncertain market and that we have got to be
smarter with our budgets. We are seeing more quarter-by-quarter release of
budgets – there is a desire for more control”.
PRCA’s barometer revealed a worrying lack of confidence which the public
relations industry is beginning to face on the verge of a possible double-dip
recession. Clients are holding back their budgets and the public relations industry
needs to prove effective ROI. The rise of social networking platforms over the
last 6 years has pressured a vast array of industries to adapt or die. Whilst public
relations agencies, in-house professionals and consultants are all gradually
endorsing social media as part of a wider campaign strategy, knowing strategy
and tactics is not enough. Calculations of performance measurement must reach
a standard which not only upholds the values in the definition of public relations
but will also be endorsed by the CIPR (Chartered Institute of Public Relations). Not
only have the tools which public relations professionals use changed, but the
industry’s very definition must be adapted.
The formula for ROI divides the net return of an investment (its gain minus its
cost) by the cost of the investment (Investopedia, 2011).
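As a minimal sketch of this calculation, assuming the common form of the formula (gain minus cost, divided by cost), ROI can be expressed in a few lines of Python. The function name and the campaign figures are illustrative assumptions and are not taken from any real campaign.

```python
def roi(gain, cost):
    """Return on investment as a fraction: (gain - cost) / cost."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return (gain - cost) / cost


# Hypothetical campaign: 1,000 spent, 1,500 returned.
example = roi(gain=1500, cost=1000)
print(f"ROI: {example:.0%}")  # the fraction 0.5 shown as a percentage
```

What counts as "gain" is the open question for a PR campaign; the arithmetic itself is simple once that value has been agreed with the client.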
The public relations industry has to identify the key values from social media, in
relation to the campaign they are running, in order to conclude the necessary
ROI calculation. Public relations theory is integral to understanding how
communication channels should adapt.
Prominent thought leader Brian Solis announces in his latest book “The End of
Business as Usual” (2011) that the medium is no longer the message, a play on
Marshall McLuhan’s famous coinage from many years before, “the
medium is the message”. Audiences are sharing heavily on social networks, which
is transforming behaviours and, in western society, suggests that the
hypodermic needle theory is ineffective. According to hypodermic needle theory
(also known as magic bullet theory), “the mass media could influence a very
large group of people directly and uniformly by ‘shooting’ or ‘injecting’ them
with appropriate messages designed to trigger a response” (Gupta, 2006, p. 36).
According to Brian Solis, “media channels that compete for our attention are
transforming our behaviours, empowering users to take control of the
information that reaches them… messages are reborn through context and the
relevant experiences of people and organisations we value” (Solis, 2011, p. 15).
Figure 1 - ROI
The public relations industry has reached a critical stage requiring quick but
considered evolution. All content focused industries must evolve much like
natural selection in nature. When British evolutionary biologist, Richard Dawkins,
wrote “Climbing Mount Improbable” he referred to an analogy of creatures
reaching the peak of their evolution which resulted in their fixture in the natural
world or extinction as another creature continued through natural selection. The
same applies for the public relations industry as the internet landscape is shared
with online advertising. The CIPR must protect the industry by defining its role
through the purpose of public relations campaigns. The public relations we see
today may be unrecognisable in three years’ time.
Only three years ago there were many websites designed with landing pages for
users referred through a search engine (Phillips & Young, 2009). Last year
Facebook could have been considered the social hub for many users before
visiting a website. In the last few months Google+’s effect upon the Google
search algorithm has meant an era of social search (Goold, 2012). The landing
page of a website could be considered less significant in an era when online
recommendation has taken place first. This is a powerful factor considering
Edelman’s 8095 report (at the beginning of this chapter), as more millennials
discover through sharing. This is only one of many developments which public
relations has experienced in the 21st Century. In a recent CIPR interview Dr Jon White
provided a quick definition of public relations as a social psychology (CIPR TV,
2011); the public relations industry must understand how to measure and
understand. Discovering ROI measurements starts with evaluation of a
message’s context, which explains relevancy for a public relations campaign. The
industry must adapt, not die.
1.2 Web Analytics 2.0
The public relations industry must adapt or die, which is why measurement is
integral for every business to survive. To understand corporate reputation,
relationships must be measured for success (Paine, 2011). The term Web 2.0 is
frequently used in the context of the evolution of the web, in which websites
provide the facilities for information sharing and collaboration. This form of
communication can be likened to several of Grunig and Hunt’s four models
(Grunig & Hunt, 1984):
1. Press Agentry
Description: One Way Communication. Publicity focused
In Practice: Little research into the audience necessary. Half-truths can be told
with the outcome of behaviour manipulation.
2. Public Information
Description: One Way Communication. Accuracy Necessary.
In Practice: Little research into the audience necessary. Accuracy is essential but
feedback is not measured.
3. Two Way Asymmetric
Description: Feedback used to change attitudes
In Practice: Feedback from the audience used to adapt messages for behavioural
change, not manipulation.
4. Two Way Symmetric
Description: A conversation
In Practice: Removes the need of a journalist as a mediator, allowing
conversation and adaptation from both parties involved.
In terms of online communication channels Web 1.0 describes how messages
were communicated across websites as one way communication through the use
of ‘Press Agentry’ and ‘Public Information’ models. Just as the information was
communicated it could be said that analytics 1.0 were apparent. The metrics
available were based on clickstream data. This data has its
limitations. Avinash Kaushik is the author of the leading research and analytics
blog, Occam’s Razor. Within his latest book “Web Analytics 2.0” he makes the
distinction that clickstream asks the question ‘what?’ rather than ‘why?’
Clickstream data includes (Google Analytics, 2012):
Visits – The total amount of visits to a website
Unique Visits – The unduplicated amount of visits to a website
New visits – A measurement of new visits versus returning visits
Page views – The number of pages viewed on a website
Time – The average amount of time from all visits
Frequency of Visit – The total amount of times a user has returned to a
website
Bounce Rate – The percentage of single-page visits in which the person
left your site from the landing page
Traffic Sources – This includes data from search engines, referring sites
and other traffic sources.
Keywords – This shows the keywords a user typed into a search engine
before arriving on a website.
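Several of the metrics listed above can be derived from a simple log of visits. The following is a minimal sketch only: the session format (a visitor identifier plus a count of pages viewed) and the function name are assumptions made for illustration, and real analytics tools record far richer data.

```python
def summarise_sessions(sessions):
    """Compute basic clickstream metrics from a list of visits.

    Each session is a dict with a hypothetical shape:
    {"visitor": <identifier>, "pages": <number of pages viewed>}.
    """
    visits = len(sessions)
    unique_visitors = len({s["visitor"] for s in sessions})
    page_views = sum(s["pages"] for s in sessions)
    # A bounce is a visit that viewed a single page and then left.
    bounces = sum(1 for s in sessions if s["pages"] == 1)
    bounce_rate = bounces / visits if visits else 0.0
    return {
        "visits": visits,
        "unique_visitors": unique_visitors,
        "page_views": page_views,
        "bounce_rate": bounce_rate,
    }


# Invented sample data: visitor "a" returns for a second visit.
sample = [
    {"visitor": "a", "pages": 1},
    {"visitor": "b", "pages": 3},
    {"visitor": "a", "pages": 2},
    {"visitor": "c", "pages": 1},
]
summary = summarise_sessions(sample)
print(summary)
```

These figures answer ‘what?’ happened on the site; they say nothing about ‘why?’ a visitor bounced.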
What clickstream statistics can mean for a digital public relations campaign is
increased revenue, reduced costs and an improvement in customer satisfaction
(Kaushik, 2010). Google Analytics specialises in clickstream data analysis; it is free
to use and vital for measuring results online. The formula for use depends upon the
values you place in your ROI.
The origin of online collaboration is almost impossible to pinpoint. It may
have begun with Morse code in the 1800s. In practice, collaboration began with the
arrival of two historical events: the CTSS (Compatible Time-Sharing System) and
the invention of HTML (Hypertext Mark-up Language). Both of these
developments are examples of the human and technological developments
which explain where Web 2.0 is today. Email, an early form of which ran on
CTSS, was human communication, and HTML used hyperlinks, which is what
defines the internet as the WWW (World Wide Web). The Barabasi-Albert model
is an algorithm which generates scale-free networks (Barabasi et al, 1999). It is
an example of the interconnected structure of the internet but also of how
humans connect across a social network.
Figure 3 - Barabasi-Albert model
Figure 2 - ROI
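The growth rule behind the Barabasi-Albert model, often called preferential attachment, can be sketched in a few lines of Python. This is a simplified illustration and not the published algorithm verbatim: the starting graph (a small star), the function name and the parameter values are assumptions.

```python
import random
from collections import Counter


def barabasi_albert(n, m, seed=None):
    """Grow a graph to n nodes; each new node links to m existing nodes
    chosen with probability proportional to their current degree."""
    if not 1 <= m < n:
        raise ValueError("need 1 <= m < n")
    rng = random.Random(seed)
    edges = []
    pool = []  # each node appears once per edge end, so choice is degree-weighted
    # Assumed starting point: a star joining node 0 to nodes 1..m.
    for node in range(1, m + 1):
        edges.append((0, node))
        pool += [0, node]
    for new in range(m + 1, n):
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(pool))
        for old in chosen:
            edges.append((new, old))
            pool += [new, old]
    return edges


edges = barabasi_albert(n=18, m=2, seed=1)
degree = Counter(node for edge in edges for node in edge)
print(degree.most_common(3))  # the best-connected nodes act as hubs
```

Because well-connected nodes are more likely to attract each new link, a few hubs emerge, which mirrors how a small number of accounts on a social network gather most of the connections.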
The Barabasi-Albert model above is shown with 18 points of connection. Imagine
the scale of Facebook with its 800 million active users, 800 million points of
connection (Facebook, 2012). Web Analytics 2.0 is made possible through the
transparency which is exhibited through the content sharing across a vast array
of social networking platforms. Content is flowing freely across the internet and
it must be listened to and measured because:
You need to keep track of your stakeholders
You need to provide your client the best ROI
We need the public relations industry to evolve
Over the last 6 years we have not just seen the rise of social networking
platforms but also of third-party measuring tools such as Brandwatch, Radian6 and
Sysomos. These leading self-service social media analytics tools all offer services
playing on a variation of ‘search, measure, understand and engage’ technology.
Organisations which use these tools as part of a social media strategy type in
search terms along with Boolean strings; the results not only show what
customers may be remarking but also allow organisations to plan engagement
tactics. In evaluating this data it is necessary to grasp the definitions upheld by
the industry:
Quantitative: Data that refers to numbers and frequencies (number of updates,
average subscribing rate, etc.)
Qualitative: Data that provides information of meaning (status updates, tweets,
etc.)
Correlation: Works with quantifiable data to find relation between variables.
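To make the idea of correlation concrete, the sketch below computes Pearson's correlation coefficient, one common measure of the relation between two quantitative variables. The choice of Pearson's coefficient, the function name and the weekly figures are all assumptions for illustration; the data is invented.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length (at least 2 points)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Invented example: status updates per week against website visits per week.
weekly_updates = [2, 4, 6, 8]
weekly_visits = [110, 190, 320, 400]
r = pearson(weekly_updates, weekly_visits)
print(round(r, 3))
```

A value close to 1 indicates that the two series rise together; it does not show that the updates caused the visits.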
The exponential growth of social media requires the public relations industry to
consider the correlation of data before the data mining processes of third-party
providers. We are currently heading towards an era of correlation-based digital
public relations where mass sentiment results in reaction-based communication.
1.3 How to measure sales and relationships
Web Analytics 2.0 is concerned with presenting the ‘What?’ and ‘Why?’ behind
clickstream data (Kaushik, 2010). During the 1990s the online advertising
industry witnessed the revolution of ‘one-to-one’ marketing, which is “where
direct response, direct mail, the internet and the interactive opportunities of
digital TV come together” (White, 2000, p. 203). Over the following 10 years the
online advertising industry formed its own standards for measurement. This
involves measuring the metrics below (Gay, Charlesworth and Esen, 2007):
CPC (Cost Per Click)
CPA (Cost Per Action)
CPL (Cost Per Lead)
CTR (Click Through Rate)
CR (Conversion Rate)
CPM (Cost Per Thousand)
Calculations
CTR = CLICKS / IMPRESSIONS
CR = ACTIONS / CLICKS
CPM = (TOTAL COST / IMPRESSIONS)*1000
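The calculations can be sketched as small Python functions. One assumption is made loudly here: conversion rate is taken as completed actions (for example sales) divided by clicks, which is the usual definition. The function names and example figures are illustrative only.

```python
def click_through_rate(clicks, impressions):
    """CTR: the share of impressions that resulted in a click."""
    return clicks / impressions


def conversion_rate(actions, clicks):
    """CR: the share of clicks that resulted in an action (assumed definition)."""
    return actions / clicks


def cost_per_thousand(total_cost, impressions):
    """CPM: the cost of one thousand impressions."""
    return total_cost * 1000 / impressions


# Invented campaign: 100,000 impressions, 1,000 clicks, 50 sales, 200 spent.
print(click_through_rate(1000, 100000))
print(conversion_rate(50, 1000))
print(cost_per_thousand(200, 100000))
```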
Using the above metrics and calculations correlation may then be found between
the advertised product/brand and advertising MPU (Media Placement Units). For
instance a clothing brand may be advertising jeans for males between the ages of
18 – 25. Costs within network advertising can be attributed to individual metrics;
usually the CPC, CPA or CPM. In some instances a hybrid cost method may be
attributed (CPC and CPA) to provide the client with a better ROI.
This advertising campaign is run on the basis of sales, which means an action tag
will be placed on the client’s website attributed with the cost of £5.00; this is a
CPA cost method. For each sale through advertising the client spends £5.00; the RRP
(Recommended Retail Price) of the jeans on the website is £19.99 each. If the
advertising campaign were to run then the graph of results may appear as below.
In the above example the client spent £16,835 running the network advertising
campaign (excluding internal marketing costs), with £5.00 being spent on
each sale through advertising. However, the actual sale price of the jeans on the
client’s website was £19.99, leaving a £14.99 profit gap per sale. Gross revenue was
therefore £67,306.33, leaving net revenue of £50,471.33.
TOTAL SALES – ADVERTISING SPEND = NET PROFIT
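The figures in this example can be checked with a short calculation. The sketch below uses only the numbers given in the text (£16,835 spend, £5.00 per sale, £19.99 per pair of jeans); the variable names are illustrative, and `Decimal` is used so that the pence are not lost to floating-point rounding.

```python
from decimal import Decimal

ad_spend = Decimal("16835.00")    # total advertising spend
cost_per_sale = Decimal("5.00")   # CPA paid for each sale
price = Decimal("19.99")          # sale price of one pair of jeans

sales = ad_spend / cost_per_sale          # number of sales paid for
gross_revenue = sales * price             # total sales value
net_revenue = gross_revenue - ad_spend    # total sales minus advertising spend
margin_per_sale = price - cost_per_sale   # the per-sale profit gap

print(sales, gross_revenue, net_revenue, margin_per_sale)
```

The spend implies 3,367 sales, which reproduces the gross revenue of £67,306.33 and net revenue of £50,471.33 quoted above.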
As stated, the above calculations are set out as an example of how analytics are
used in network advertising to generate sales. These analytics are found through
the same JavaScript method as Google Analytics (and a host of other free
analytics tools); they are not exclusive to advertising.
Figure 3 – Online advertising example statistics
Advertising is tasked with tracking sales based upon clickstream data. Could 2012
be the year when the public relations industry utilises these metrics to not only
raise awareness through social networks but also track sales? It would not be the
first time that the public relations industry has relied upon the advertising industry
for validity. Even though the CIPR does not officially endorse AVE (Ad Value
Equivalency), a research paper published in 2003 by the IPR1 describes the
demand for its use by bosses and clients (Fox, 2003). The calculation for AVE is:
MEASURING COLUMN INCHES * ADVERTISING RATES = EQUIVALENT COST
Or
SECONDS WITHIN BROADCAST MEDIA * ADVERTISING RATES = EQUIVALENT COST
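As a minimal sketch, the AVE calculation is a single multiplication of the space or time gained by the advertising rate for that space or time. The function name, the optional multiplier parameter and the example figures are assumptions for illustration.

```python
def ad_value_equivalency(units, rate_per_unit, multiplier=1.0):
    """AVE: coverage size (column inches or broadcast seconds) times the
    advertising rate per unit, optionally scaled by a multiplier."""
    return units * rate_per_unit * multiplier


# Invented example: 12 column inches at a rate of 150 per inch.
plain = ad_value_equivalency(12, 150)
scaled = ad_value_equivalency(12, 150, multiplier=1.5)
print(plain, scaled)
```

The multiplier shows how easily the headline figure can be inflated without any change in the coverage itself.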
The comparison between advertising and public relations is a cause for concern
as clients may presume an equal outcome of messages’ effect. This is to ignore
the additional multiplier of 1.5 to 1.6 (industry standard rates) which may be
applied to the figure to manipulate ROI for the client. In
essence AVE follows the same calculation as CPM in online advertising – with a
greater concept of accuracy. Within public relations the outcome of relationships
could be measured through symmetrical communication (Childers, 1999) which
assists with:
Understanding the needs of stakeholders
Tracking the effectiveness of messages
Listening to mediators (Journalists, Bloggers and Opinion Leaders) to
provide them with relevant content.
1 The IPR (Institute of Public Relations) gained Chartered status in 2005 making it the
CIPR. (http://publicsphere.typepad.com/mediations/2005/02/ipr_wins_charte.html)
In terms of clickstream data the contextual relevancy of messages is found
through correlation. Patterns within data are evaluated against performance
objectives to assume poor or positive results. With social networks it is possible
to focus upon quantitative and qualitative statistics with the introduction of the
semantic web.
1.4 Introducing the Semantic Web
Today social networks are largely comprised of text based content which
requires an algorithm for detecting linguistics and presenting such data as
qualitative data sets. Semantic analytics are therefore an amalgamation of
text analytics and network ontologies. Recent research presents a dependency
upon RDF (Resource Description Framework)2, a model which allows data sets to
be placed within web pages (RDF Working Group, 2004). This creates a
noteworthy distinction between hyperlinks and RDF (Lee, 2009);
“Like the web of hypertext, the web of data is constructed with documents on
the web. However, unlike the web of hypertext where links are relationships
anchors in hypertext documents written in HTML, for data they links between
arbitrary things described by RDF”3.
The creation of RDF links allows navigation across one data source to many
others; with the addition of a FOAF (Friend of a Friend) data link it is possible to
attribute identification to another author (Lee, 2009). When FOAF is used with
RDF a social network is created between individuals and data sets (Golbeck &
Rothstein, 2008), allowing a significant degree of accuracy between content,
context and a network of relationships.
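RDF describes data as subject, predicate and object triples, and FOAF supplies predicates such as `knows`, `name` and `maker` for describing people and what they create. The sketch below models a handful of triples as plain Python tuples rather than using an RDF library. The `example.org` identifiers, the people and the helper function are hypothetical; only the FOAF namespace and its three term names are real.

```python
FOAF = "http://xmlns.com/foaf/0.1/"   # the FOAF vocabulary namespace
EX = "http://example.org/"            # hypothetical namespace for this sketch

# Each triple is (subject, predicate, object).
triples = [
    (EX + "alice", FOAF + "name", "Alice"),
    (EX + "alice", FOAF + "knows", EX + "bob"),
    (EX + "bob", FOAF + "name", "Bob"),
    (EX + "post1", FOAF + "maker", EX + "alice"),
]


def objects(subject, predicate):
    """Return every object linked from a subject by a given predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


# Follow the links: from a piece of content, to its author, to who they know.
author = objects(EX + "post1", FOAF + "maker")[0]
acquaintances = objects(author, FOAF + "knows")
print(author, acquaintances)
```

Following links from content to author to acquaintances is what ties content, context and a network of relationships together.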
Currently data mining for semantic data is achieved through semantic search
engines (through crawlers) or semantic web browsers. Presenting data in an
understandable format is accomplished through the use of OWL (Web Ontology
Language), a sublanguage for applying additional vocabulary for when data
needs to be processed by machines rather than humans (McGuiness &
Harmelen, 2004).
2 Which is written using XML (Extensible Mark-up Language)
3 This quote, written in 2006 (revised in 2009) by Tim Berners-Lee, founding father of the
World Wide Web, signalled the founding of the Linking Open Data project, which aims to make data freely available to everyone.
The semantic web is made possible through all the above technical elements; the
question is how to utilise the conventions for analytical processing. A research
paper published by University of Georgia and University of Maryland entitled
“Semantic Analytics on Social Networks: Experience in Addressing the Problem of
Conflict of Interest Detection” describes the semantic research method as
follows (Meza, 2005):
1) Obtaining high quality data
Extraction of data from sites which includes metadata extraction from sources to
ensure relevancy.
2) Data preparation
Mostly data clean-up and evaluation
3) Entity disambiguation
Attach relevant data to the correct entity
4) Metadata and ontology representation
Importing or exporting data as RDF/RDFS and OWL.
5) Querying and inference techniques
Data processing to enable semantic analytics and discovery
6) Visualization
Prepare data in a readable format
7) Evaluation
Comparison needed between shown data and other evidence to see if a
correlation appears.
The process of obtaining semantic analytics depends upon the task; research is
new and therefore experimental. Semantic measurement methods require a
re-imagining of the ranking methods which may have been used in the past to
measure blogs simply on the basis of clickstream data, as proposed by Katie
Delahaye Paine (2011); such methods could now be considered archaic.
A prominent semantic measurement method is Latent Semantic
Analysis (LSA), which evaluates underlying meaning and concepts behind
language to build relationships between nouns and adjectives (Puffinware,
2010). Content is no longer king, context is king (Solis, 2011), and LSA provides
contextual relationships which closely illustrate natural language recognition
(Landauer, Foltz & Laham, 1998).
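LSA starts from a term count matrix, in which each row is a term, each column is a document and each cell counts how often the term appears in the document. The sketch below builds only that first stage and compares documents by cosine similarity on the raw counts. The decomposition step that gives LSA its ‘latent’ dimensions is deliberately left out, as it would normally be delegated to a numerical library. The sample sentences and function names are invented for illustration.

```python
from math import sqrt


def term_count_matrix(documents):
    """Return (vocabulary, matrix): one row per term, one column per document."""
    tokenised = [doc.lower().split() for doc in documents]
    vocabulary = sorted({term for doc in tokenised for term in doc})
    matrix = [[doc.count(term) for doc in tokenised] for term in vocabulary]
    return vocabulary, matrix


def cosine(matrix, i, j):
    """Cosine similarity between document columns i and j."""
    col_i = [row[i] for row in matrix]
    col_j = [row[j] for row in matrix]
    dot = sum(a * b for a, b in zip(col_i, col_j))
    norm = sqrt(sum(a * a for a in col_i)) * sqrt(sum(b * b for b in col_j))
    return dot / norm if norm else 0.0


# Invented documents standing in for short status updates.
docs = [
    "pr measures reputation online",
    "reputation online matters to pr",
    "jeans advertising drives sales",
]
vocabulary, matrix = term_count_matrix(docs)
print(cosine(matrix, 0, 1), cosine(matrix, 0, 2))
```

On raw counts, two documents are only similar if they share exact words; the later decomposition step is what allows LSA to relate documents that use different words for the same concept.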
With regards to the semantic web, PR professionals are already ahead of the
game with their knowledge of values behind relationships. Just as Brian Solis
observed that new technologies are adjusting our behaviours (Solis, 2011), the
public relations industry must change how it utilises new
media – adapt or die. This begins with:
Adjusting our terminology from referring to stakeholders as ‘audiences’
to instead ‘publics’, removing the illusion of control that public relations
professionals still believe they have (Grunig, 2009).
Building relationships on a symmetrical basis rather than asymmetrical
(Grunig, 2011).
Understanding that intent is necessary4 so that a stakeholder
understands that a message is relevant (Theaker, 2008) and listening to
feedback.
4 This theory is discussed in the textbook “Human Communication” written by Michael
Burgoon, Frank G. Hunsaker and J. Dawson.
Based upon the nexus of values associated with multiple entities (individuals)
created from semantic analytics a study of linguistic pragmatics can be used to
form the correct rhetoric for stakeholders. The approach considers context of
content online and provides a method for public relations professionals to
provide meaning behind their messages (Mackey, 2005). This allows the
completion of campaign objectives to assist in raising awareness and changing
behaviour (which may even result in direct sales).
2.0 Methodology
As stated in the introduction to this dissertation the author intends to research
into three different areas:
1) The unprecedented growth of digital communication channels.
2) To assess the current usage of online metrics for evaluating web 1.0 and
web 2.0 platforms.
3) To assess the potential usage of semantic analysis for the public relations
industry.
To accurately research each of these areas the author deployed a variety of
different research methods. The main research piece of this dissertation is the
research into semantic analysis, and the research presented in the literature review
will be used to complement and provide perspective for the conclusion of the
research. Due to the modern nature of the research within this
dissertation it was not possible for the author to reference or interview anyone
involved with PR campaigns using semantic measurement as, to the author’s
knowledge, nobody is practising it yet.
The table below outlines the mixture of secondary and primary research utilised,
along with how these align with research aims and objectives. The first research
aim is designed to provide an academic insight into the growth of digital
communication channels and how they are measured. The second research aim
relies heavily upon primary research as it is an experimental piece of research.
Research Aim | Objectives | Secondary | Primary
To assess the current usage of online metrics for evaluating web 1.0 and web 2.0 platforms. | To explore the increasing use of digital communication channels | Literature Review | N/A
 | To explore the symmetry between traditional and digital communication channels | Literature Review | N/A
 | To explore current metrics for online measurement | Literature Review | Observations
 | To explore the potential of the semantic analysis method | Literature Review | N/A
To assess the potential usage of semantic analysis for the public relations industry. | Conducting research into latent semantic indexing (LSI) | Literature Review; Published Texts | Observations; Testing
Figure 4 – Research table
April 2012 Maintaining Reputation through Online Analytics PUR334
26 | P a g e
To make it clear how the range of research methods and the research aims
provide conclusions to the questions posed at the start of this dissertation, the
author has constructed a visual table.
Figure 5 – Research layout (stages shown: Research Methods; Literature Review; Observations and Testing; Findings; Research Evaluation; Conclusion)
Data for the semantic analysis research was collected by extracting tweets from
Neville Hobson’s Twitter timeline and interpreting them both manually and
through a Python script (see footnote 5). This interpretation includes visually
displaying results using a singular value decomposition algorithm. The results of
this research can be found within the conclusion of this dissertation.
5 This script is available to view in the appendices.
2.1 Research sample design
Literature Review
The literature review was conducted to gain an understanding of the progress of
digital communication and the range of metrics available, and to achieve
perspective surrounding semantic analysis. The review was achieved by reading
a wide range of PR publications, practitioners’ published books and wider
reading into online marketing. All material was selected on the basis of its
relevance. This also included using digital communication channels:
Google+
Google Reader
Online Journals
Online Databases
Due to the nature of the research within the literature review no primary sources
of data collection were chosen. However, all secondary evidence was selected
based upon the credentials of its authors.
Data Analysis
Before approaching the research into Latent Semantic Indexing (LSI) it was
important for the author to note the types of data which would be collected:
Quantitative Data
This data takes the form of numerical figures.
Qualitative Data
This data takes the form of letters, words and sentences.
Correlating Data
This involves observing patterns between two or more pieces of data and
presenting these patterns as results. In terms of LSI this could take the form of
contextual patterns.
Each stage of the LSI analysis was learned through additional research, all of
which has been referenced within this dissertation. Stages of this research
have been included within section 3.0 in order to maintain research integrity.
2.2 Ethical considerations
The decision about which online data should be collected for LSI research was
made conscientiously. It was important that the data used had a clear human
source so that patterns could be detected. Neville Hobson’s Twitter timeline was
eventually selected due to its public nature, but care was still taken not to
publish a tweet widely if it had the possibility to distress the original author.
The script used for LSI analysis was not originally programmed by the author of
this dissertation. However, modifications were made to the data input into the
script, slight modifications were made to variables to show appropriate results,
and a correction was made to the script due to an update made to Python 2.7.
This script has been made available in the appendices.
All other material referenced within this dissertation is publicly available.
3.0 Latent Semantic Indexing (LSI)
Research into Neville Hobson’s Twitter timeline
An ideal example for presenting the benefit of Latent Semantic Indexing (LSI) is
to observe how search engines such as Google operate. When a user provides a
search term, an exact lexical match would not be appropriate because words can
have synonyms or several meanings (Duz, 2008). For example, a purely lexical
match on “Cheap gardening spades shop” could return pages about card playing,
gambling, gardening, etc. In reality a Boolean search query would return every
Google-indexed webpage that includes all four words. Instead, Google uses a
version of LSI (among other methods) to understand the patterns of words
across every indexed webpage. This mathematical technique uses Singular Value
Decomposition (SVD) to identify the context between words. The process
assumes that words with similar meanings will be used within the same
contexts, which are discovered through the relationships between words.
Through the contextual basis of word weightages, LSI is able to identify the
category of written documents. For public relations professionals this method,
when delivered through an automated algorithm, re-imagines stakeholder
analysis.
Words that are usually written about a celebrity can be analysed to
understand associated values.
Research into competitors can be done to understand related terms
which can then be targeted in Search Engine Optimisation (SEO)
adaptations.
Understanding the values behind stakeholder groups to craft messages
effectively.
Automatic categorisation of media releases to understand the contexts
they should appear in.
Brand values become something being referred to by users online rather
than fixed in a marketing department.
The possibilities of LSI in public relations will become clear through time. As a
piece of research into LSI this technique has been used within this dissertation to
identify the key themes surrounding Neville Hobson’s Twitter timeline.
Neville Hobson began blogging in 2002, a hobby which grew to incorporate
how a business should communicate using digital communication channels.
Today he has over 25 years’ experience in public relations, marketing
communication and financial relations (Hobson, 2012). His standing is
illustrated by his popular Twitter profile, which had over 10,000 followers (as
of 12/02/2012).
3.1 LSI Python Script
This LSI research was conducted using a modified version of this Python Latent
Semantic Analysis code:
http://www.puffinwarellc.com/index.php/news-and-articles/articles/33-latent-semantic-analysis-tutorial.html?start=2.
The script was run using Python 2.7 with the additional scientific libraries
NumPy and SciPy. Modifications to the script include a change of subject data,
a change of stop words, a display command to print index words and a line to
stop the program automatically closing upon build.
Evaluating Neville Hobson’s Twitter timeline using LSI has involved the following
steps:
1. Retrieve 50 tweets from Hobson’s timeline (9 Feb – 11 Feb 2012).
As a piece of manual LSI research 50 tweets provided an adequate
sample. An automated algorithm could pull hundreds of tweets for
analysis.
2. Filter URLs, hashtags, retweets and numerical values.
LSI is concerned with qualitative data in the form of words out of English
syntax. All the data needs to be associated with Neville Hobson (hence no
retweets).
3. Identify index words.
These are words which occur twice or more in the sample data, are not
stop words (such as ‘it’, ‘the’, ‘a’, ‘if’, etc.) and must carry meaning.
4. Discover correlation using Term Count Model (TCM).
The TCM presents the initial stages of LSI by capturing the frequency of
index words from retrieved data.
5. Apply weightages to index words.
Once the frequency of index words has been discovered an algorithm is
used to apply contextual weightages to words.
6. Visual display of results.
Each index word, with its weightage, is presented in a graph.
Words plotted in certain sections of the graph indicate categories.
7. Interpretation of Results
Understand the data.
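Step 5 does not name the weighting algorithm. The appendix script imports helpers marked as "needed for TFIDF", so one plausible reading is that term frequency–inverse document frequency (TF-IDF) weighting can be applied to the term counts before SVD. The Python 3 sketch below illustrates that weighting under this assumption only; the function name tfidf_weight and the sample counts are hypothetical and are not taken from the script.

```python
from math import log


def tfidf_weight(counts):
    """Apply TF-IDF weighting to a term count matrix.

    counts[i][j] is how often index word i appears in tweet j.
    """
    n_docs = len(counts[0])
    # Total index-word occurrences in each tweet (column sums).
    words_per_doc = [sum(row[j] for row in counts) for j in range(n_docs)]
    # Number of tweets in which each index word appears at least once.
    docs_per_word = [sum(1 for c in row if c > 0) for row in counts]

    weighted = []
    for i, row in enumerate(counts):
        new_row = []
        for j, count in enumerate(row):
            if count == 0:
                # A word absent from a tweet keeps a weight of zero.
                new_row.append(0.0)
                continue
            tf = count / words_per_doc[j]
            idf = log(n_docs / docs_per_word[i])
            new_row.append(tf * idf)
        weighted.append(new_row)
    return weighted


# Hypothetical counts: rows are index words, columns are tweets.
sample = [
    [2, 0, 1],  # 'social'
    [1, 0, 1],  # 'media'
    [0, 1, 0],  # 'snow'
]
weights = tfidf_weight(sample)
```

Under this weighting a word that appears in few tweets receives a higher weight than one spread across many, which is one way of giving the raw frequencies some context.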
3.2 Retrieval, Filter and Identification
Retrieving 50 tweets from Neville Hobson’s Twitter timeline involved a simple
copy and paste into a Word document (see footnote 6). The sample of tweets
was extracted from a single period between 9th and 11th February 2012.
Any tweets which were Re-Tweets (RTs) were disregarded as this research into
LSI requires data unique to Neville Hobson.
The data filtration process is the clean-up needed to extract purely
qualitative data. In the context of data usually found posted on Twitter this
involved removing:
URLs
Hashtags
Re-Tweets (RTs)
Numerical values
@replies to other users
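The clean-up in this research was carried out by hand. As a rough illustration of how the same filtering might be automated, the Python 3 sketch below removes the five items listed above using regular expressions. The patterns and the function names (is_retweet, clean_tweet, filter_timeline) are assumptions made for this example and do not come from the LSI script.

```python
import re

# Patterns for the items the text lists as removed during clean-up.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")
NUMBER_RE = re.compile(r"\b\d[\d,.]*\b")


def is_retweet(tweet):
    """Treat a tweet as a retweet if it starts with the conventional 'RT' marker."""
    return tweet.strip().upper().startswith("RT ")


def clean_tweet(tweet):
    """Strip URLs, @replies, hashtags and numerical values, then tidy whitespace."""
    for pattern in (URL_RE, MENTION_RE, HASHTAG_RE, NUMBER_RE):
        tweet = pattern.sub(" ", tweet)
    return " ".join(tweet.split())


def filter_timeline(tweets):
    """Drop retweets, clean the remaining tweets and discard any left empty."""
    cleaned = (clean_tweet(t) for t in tweets if not is_retweet(t))
    return [t for t in cleaned if t]
```

URLs are removed first so that any "@" or "#" characters inside a link are not treated as separate mentions or hashtags.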
Once the data has been filtered the next stage of LSI is to identify the “index
words” of the document. These are words that appear twice or more within the
captured data. So for instance, if the first tweet contained the word “social” and
the thirtieth also contained “social”, this makes “social” an index word. All index
words are connotative, which means that their meanings can be interpreted
against other index words.
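The rule described above (a word becomes an index word once it occurs twice or more and is not a stop word) can be sketched in Python 3 as follows. This is an illustrative sketch rather than the logic of the script itself: the tokenize and index_words helpers are hypothetical, and STOP_WORDS is only a small subset of the stop words used in this research.

```python
from collections import Counter
import string

# Illustrative subset of the stop words used in this research.
STOP_WORDS = {"on", "just", "to", "for", "great", "i", "and", "a", "good", "is", "the", "of"}


def tokenize(text):
    """Lower-case the text and split it into words with punctuation stripped."""
    words = (w.strip(string.punctuation).lower() for w in text.split())
    return [w for w in words if w]


def index_words(tweets, stop_words=STOP_WORDS, min_count=2):
    """Return, in alphabetical order, non-stop words occurring min_count times or more."""
    counts = Counter(
        word
        for tweet in tweets
        for word in tokenize(tweet)
        if word not in stop_words
    )
    return sorted(word for word, n in counts.items() if n >= min_count)
```

For example, index_words(["Great post on social media", "Social business advice", "Good advice"]) keeps only "advice" and "social", since every other remaining word occurs once.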
6 This document can be found in the appendix
Retrieving the index words of this document involved several forms of
verification. The first stage involved manually reading over Neville Hobson’s
tweets, highlighting index words individually and measuring their frequency of
appearance. To verify this manual process, which is subject to error, an
adjustment to the Python script was made to display the self.keys variable
(line 95) to show the index words:
1. Advice
2. Business
3. Comments
4. Daily
5. Era
6. Event
7. Fun
8. Global
9. Google
10. Hobson
11. Looks
12. Media
13. Morning
14. Networks
15. Neville
16. Perspectives
17. Post
18. Reading
19. Recording
20. Sharing
21. Snow
22. Social
23. Today
In doing so it was possible to identify any “stop words” within the sample data
through the process of elimination. This concerns examining English sentence
syntax to identify coordinating conjunctions, pronouns, adjectives and verbs. For
this sample data this included the omission of the following words:
'on','just','to','for', 'great', 'i', 'between','and','a','good', 'is', 'the', 'of', 'some', 'in',
'other', 'why', 'get', 'by', 'I', 'as', 'use', 'says', 'out', 'too', 'via', 'here', 'it', 'about',
'an', 'at', 'be', 'coming', 'especially', 'I', 'into', 'its', 'make', 'need', 'not', 'one',
'prime', 'still', 'thanks','that', 'we', 'well', 'what', 'will', 'with'.
3.3 Term Count Model and Singular Value Decomposition
There are several stages to measuring the initial results of LSI. These include the
Term Count Model (TCM) and Singular Value Decomposition (SVD). The TCM
marks the initial stage of LSI for understanding the frequency of index word
mentions. LSI works by reducing the structured syntax of language to
individual key words. The TCM places the retrieved data into a count model so
that it is possible to understand how frequently key words appear in each
extracted tweet. This process alone does not result in any viable data but does
allow for SVD to take place later in the process.
The TCM results of Neville Hobson’s Twitter timeline data are shown in figures
6 and 7 (see footnote 7), as the initial spreadsheet table and as a graph. At this
early stage it is already apparent that the key word ‘social’ is by far the most
frequent word.
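A term count model of this kind can be sketched in a few lines of Python 3. The example below is illustrative only: the helper names are hypothetical, and the three tweets are shortened from the sample data in the appendix rather than being the full data set behind figures 6 and 7.

```python
import string


def tokenize(text):
    """Lower-case the text and split it into words with punctuation stripped."""
    words = (w.strip(string.punctuation).lower() for w in text.split())
    return [w for w in words if w]


def term_count_matrix(tweets, keys):
    """Build a matrix with one row per index word and one column per tweet."""
    tokenized = [tokenize(t) for t in tweets]
    return [[tokens.count(key) for tokens in tokenized] for key in keys]


def total_frequency(matrix, keys):
    """Sum each row to give the overall frequency of every index word."""
    return {key: sum(row) for key, row in zip(keys, matrix)}


# Three tweets shortened from the appendix sample, with three index words.
tweets = [
    "things you still need to know about social media social business",
    "Why Social Media Jobs Get Filled By Younger Folks",
    "tips for managing negative comments online. Good advice",
]
keys = ["advice", "media", "social"]
tcm = term_count_matrix(tweets, keys)
totals = total_frequency(tcm, keys)
```

Each row of tcm corresponds to one row of the spreadsheet table, and the row totals are the frequencies that would be plotted in the accompanying graph.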
Figure 6 – TCM
Figure 7 – Visualisation of TCM
7 Larger versions of figures 6 and 7 can be found in the illustrations.
Now that the TCM table has been constructed it is necessary to return to the
Python script to have the selected data broken down into different
dimensions. This process is called Singular Value Decomposition (SVD), an
algorithm used here to show visually the relationship between each key word
and the tweet from which it originates. The number of dimensions available
in SVD depends upon the data sets selected and the purpose of the SVD
process. For evaluating Twitter timeline data three dimensions have been
used. A histogram can be used to understand the importance of each
singular value based upon the data sets used (Puffinware, 2010). The meaning
behind each dimension is as follows:
Dimension 1: The TCM frequency of each index word.
Dimension 2: The X value relationship dimension.
Dimension 3: The Y value relationship dimension.
As the first dimension of SVD simply measures the frequency of each index word
it is not needed here. Therefore dimensions two and three are used for the SVD
model, forming the X and Y axes of a comparative scatter graph: each key word
is plotted at the coordinates given by its dimension two and dimension three
values. As each dimension has been derived by an algorithm which notes each
key word’s relationship with the tweet it originates from, the data should show
clusters of similar words around particular values. For instance ‘advice’,
‘comments’ and ‘sharing’ may closely align with each other and may be
interpreted as a social category.
Figure 8 shows a list of each key word and the associated values under dimension
two and dimension three.
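The way dimensions two and three become scatter graph coordinates can be sketched in Python 3 with NumPy. The appendix script imports svd from scipy.linalg; this sketch uses numpy.linalg.svd instead, which performs the same decomposition. The function name svd_coordinates and the small matrix are hypothetical and are not the data behind figure 8.

```python
import numpy as np


def svd_coordinates(matrix, keys):
    """Return ({index word: (dimension 2, dimension 3)}, singular values)."""
    a = np.asarray(matrix, dtype=float)
    # Rows of u describe the index words; s holds the singular values.
    u, s, vt = np.linalg.svd(a, full_matrices=False)
    if u.shape[1] < 3:
        raise ValueError("need at least three index words and three tweets")
    # Column 0 is dimension one (discarded); columns 1 and 2 give X and Y.
    coords = {key: (float(u[i, 1]), float(u[i, 2])) for i, key in enumerate(keys)}
    return coords, s


# Hypothetical term count matrix: rows are index words, columns are tweets.
keys = ["neville", "hobson", "social", "media", "advice"]
tcm = [
    [1, 0, 0, 1],  # 'neville'
    [1, 0, 0, 1],  # 'hobson' appears in exactly the same tweets
    [0, 2, 1, 0],  # 'social'
    [0, 1, 1, 0],  # 'media'
    [0, 0, 1, 0],  # 'advice'
]
coords, singular_values = svd_coordinates(tcm, keys)
```

In this sketch 'neville' and 'hobson' receive the same coordinates because their rows are identical. The sign of each SVD dimension is arbitrary, so a plot may appear mirrored between runs or libraries without changing which words cluster together.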
Figure 8 – SVD dimensions table
Once these values have been plotted using a Microsoft Excel spreadsheet the
results appear as shown in figure 9.
3.4 The Results
As expected, certain key words have aligned more closely with some others,
depending upon their relationship with the tweets from which they originated.
This explains why the individual key words ‘Neville’, ‘Hobson’ and ‘Daily’ have
aligned to form their original simple sentence again, as each word appears in
exactly the same tweets. The original fifty tweets have not been included on
this graph as their sheer number would have made it impossible to interpret
the key word results, and their presence would not help to fulfil the research
task of this dissertation. If smaller data sets had been used (perhaps evaluating
a handful of newspaper articles) then the original terms would have had a
meaningful value when compared against the extracted key words. The final
stage of this LSI research concludes with a manual interpretation of the
weighted key word sets.
Figure 9 – Visual SVD
Figure 10 – Visual SVD with categorisation
This final stage requires manual interpretation of the categories which are
present as a result of SVD and LSI research. The three circled categories could be
classed as the following:
Red: Broadcasting
These three words are loosely based around the application of
broadcasting.
Blue: Community
Without a doubt these key words are all associated with community
activities and social business. Notice how all four of the words are to do
with the creation and sharing of information on Twitter. This may also
show that Neville Hobson has some influence as a user on Twitter.
Yellow: Authority & Teaching
This could also be labelled as a social category, but with respect to Neville
Hobson’s timeline it shows that he has authority and a teaching role. Notice
how ‘comments’, ‘reading’ and ‘advice’ are closely weighted on the scatter
graph, which may indicate that some tweets are about commenting on and
publishing articles.
4.0 Evaluation
4.1 Evaluation of Latent Semantic Indexing
Despite the apparent success of the research within this dissertation concerning
LSI the author must note there are five important areas of improvement needed
with this system.
Small data set
For this research 50 tweets were captured for analysis which has left
words such as “era”, “networks” and “snow” uncategorised when
weighted by SVD. A larger data sample would provide increased accuracy
and depth into Neville Hobson’s online activity.
Shared meaning
LSI is unable to understand that some words may be spelt exactly the
same but differ in meaning. Whilst the word ‘reading’ was categorised
under “Authority & Teaching”, the sentence it originated from may
actually have meant the location Reading. In order for LSI to understand
the actual meanings behind words an additional research process would
need to be used before SVD.
The clean-up process
Extracting tweets from Twitter for analysis is a process which requires a
large amount of data clean-up. For an automated process an algorithm
would need to be constructed in order to identify hashtags, URLs and
@replies. As LSI can be implemented on a number of different digital
communication channels, separate algorithms would need to be
constructed to implement the different data clean-up processes.
Interpretation
LSI represents patterns of words. Within this example we can see how the
words “Neville”, “Hobson” and “Daily” have all been attributed the same
weightage through SVD, as all three words appear only in the same
tweets. As LSI identifies patterns of co-occurrence rather than meaning,
this leaves the word “Morning” entirely separate from “Daily” even
though both share close meaning. In the same way the words “social” and
“networks” have been grouped differently even though the two words are
frequently used together to describe one term, “social networks”.
Therefore LSI provides a pattern but additional interpretation is needed
to identify word categories.
Automation is key
The research into LSI in this dissertation is extremely basic in comparison
to the large data sets that would exist within a PR agency or in-house
environment. It has taken a month for the author to fully understand the
process of LSI and to process a small data set from Neville Hobson’s
Twitter timeline. For this measurement process to be used professionally,
an automated system would need to be constructed which can quickly
crawl, extract, clean up and process data. Despite extensive research the
author could not find an organisation or agency offering these services.
4.2 Bayesian Inference and Other Interpretations
The key stage of LSI is concerned with the nature of the SVD which takes place.
For the research within this dissertation the author has approached key word
weighting based upon a three dimensional analysis, discarding the first
dimension for more accurate results. However, curating the results of LSI can
take many forms, all of which take place after the SVD process. These processes
have not been applied to the processed data within this research piece due to
the small data set, but they are listed below.
Bayesian inference
Bayesian inference is a mathematical method used to estimate to what extent
a notion is true or false. Rather than the strict true-or-false values of Boolean
logic, it assigns each notion a probability which is updated as evidence is
observed (MS Research, 1998), and processes of this kind work in the
background of many computing solutions. As Radford (1998) puts it, “all forms
of uncertainty are expressed in terms of probability”. The system therefore
works upon a posteriori (see footnote 8) justifications, which make it well
suited to curating the results of LSI. If an LSI system used an advanced Bayesian
inference script then the LSI algorithm could be completely automated, based
upon an initial human evaluation of categorising key words against sub-set
categories.
Benefits: Fully automated system; Machine learning environment.
Considerations: Advanced script needed; Risk of misinterpretation of
words.
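To make the idea of updating a probability with evidence concrete, the Python 3 sketch below applies Bayes' rule to the ‘Reading’ ambiguity noted in section 4.1. It is an illustration only: the function name bayes_posterior is hypothetical and all three probabilities are invented, not measured from the Twitter data.

```python
def bayes_posterior(prior, p_evidence_if_true, p_evidence_if_false):
    """Return P(hypothesis | evidence) using Bayes' rule."""
    numerator = prior * p_evidence_if_true
    denominator = numerator + (1.0 - prior) * p_evidence_if_false
    if denominator == 0:
        raise ValueError("evidence has zero probability under both hypotheses")
    return numerator / denominator


# Hypothesis: 'reading' in a tweet refers to the location Reading.
# All three probabilities below are invented for illustration.
prior_location = 0.5      # belief before looking at the surrounding words
p_car_if_location = 0.6   # chance of seeing 'car' nearby if it is the town
p_car_if_activity = 0.2   # chance of seeing 'car' nearby if it is the activity

posterior = bayes_posterior(prior_location, p_car_if_location, p_car_if_activity)
```

With these invented figures, seeing ‘car’ near ‘reading’ raises the belief that the location is meant from 0.5 to about 0.75. An automated system would need such probabilities to be estimated from human-labelled examples.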
Natural language analysis
This process would involve taking the end results of LSI and then putting them
through a further process so that each key word is categorised under certain
concepts. For instance the word ‘Reading’ can be defined as either an activity
or a location. This would be achieved by manually weighting the word closer to
one of the two concepts by reinforcing its relationship with nearby words. For
instance if ‘Reading’ and ‘Car’ were to appear within the same sentence then
natural language analysis would result in ‘Reading’ being a location in this
instance. This is a process which works upon the basis of probability, which
means it could be used in parallel with Bayesian inference.
Benefits: Fully automated system; Machine learning environment; More
accurate results.
Considerations: Advanced script needed; Risk of multiple languages;
Unknown semantic concepts.
8 The term ‘a posteriori’ is Latin for “from the later” and in philosophy describes
knowledge gained from empirical evidence or experience.
Manual weighting system
The simplest way to curate the results from LSI would be to employ a manual
weighting system. This would involve users of a partially automated LSI
programme making judgements concerning the results of analysis. This may
take the form of a star based rating system, a numbered relevance system (1 –
10) or manually grouping certain results together under their own set categories.
Benefits: Easy to set up.
Considerations: Time consuming; Risk of human error; No machine
learning.
5.0 Conclusion
1. ROI is relied upon for Reputation Management and Direct Sales
The public relations industry has always deployed an algorithm in order
to understand how a client receives their ROI. In the past this has
involved the use of AVE models but online it is necessary for the CIPR to
invoke a standardisation for practitioners to utilise.
2. Third party measuring tools exist but are not perfect
Clickstream data exists to answer the ‘What?’ and ‘Why?’ questions
behind data. However, there is a range of third party measuring tools
that capture this clickstream data and use their own algorithms to
provide sentiment levels. These programmes can be used, but only at a
professional’s own discretion, as the calculations for sentiment are not
usually publicly available.
3. The PR industry needs standardisation
The Online Advertising industry has been used as an example within this
dissertation to show how that particular industry has applied its own
standardisation to online metrics. In this respect the public relations
industry is years behind: not only is there no standard for measuring
traditional PR, but a standard does not yet exist for digital public
relations either. As the Chartered body, the CIPR must organise standard
measurement metrics so that services can be better understood by clients
and by the agencies offering services.
4. Semantic Analysis works but has not yet been perfected
The research into LSI shows how this measurement method could be
utilised by PR professionals to measure reputation online. As of the
publication of this dissertation the author is aware of no organisations
that offer this form of measurement. However, this may change in the
next couple of years. This form of measurement is already being utilised
by Google to deliver their search results and will most likely be used by
the PR industry to measure their activities online. A bigger research study
would be required to really show how LSI could revolutionise the digital
PR industry.
References
1. Barabasi, et al. (1999) ‘Emergence of Scaling in Random Networks’,
Science Journal, 509-512 [online]. Available at: http://www.sciencemag.org/content/286/5439/509.full (Accessed: 26 January 2012)
2. Childers, L. (1999) ‘Guidelines for Measuring Relationships in Public
Relations’, the Institute for Public Relations. University of Florida.
3. CIPR TV. (2011) ‘CIPR TV Discusses Broadcast PR and the PR 2020 Report’. Retrieved January 26, 2012 from YouTube: http://www.youtube.com/watch?v=pzUYBEm-E6w&feature=youtu.be
4. CIPR. (2012) ‘What is PR?’ Retrieved April 04, 2012 from CIPR website:
http://www.cipr.co.uk/content/careers-cpd/careers-pr/what-pr
5. Duz, M. (2008) ‘Latent Semantic Indexing LSI Explained’. Retrieved April
04, 2012 from SEO blog: http://www.seo-blog.com/latent-semantic-indexing-lsi-explained.php
6. Facebook. (2012) ‘Statistics’. Retrieved January 26, 2012 from Facebook: https://www.facebook.com/press/info.php?statistics
7. Fox, J. B. (2003) ‘A Discussion of Advertising Value Equivalency (AVE)’, The
Institute for Public Relations. University of Florida.
8. Gay, R. Charlesworth, A and Esen, R. (2007) Online Marketing: a
customer-led approach. Oxford: Oxford University Press.
9. Golbeck, J. and Rothstein, M. (2008) ‘Linking Social Networks on the Web with FOAF: A Semantic Web Case Study’. University of Maryland.
10. Google Analytics. (2012) ‘Google Analytics Product Tour’. Retrieved
January 26, 2012 from Google Analytics website: http://www.google.com/analytics/tour.html
11. Goold, P. (2012) ‘Google’s ‘Search, plus Your World’ Highlights the
Additional Benefits of Social Activity, says Punch Communications’. Retrieved January 26, 2012 from Yahoo News website: http://news.yahoo.com/google-search-plus-world-highlights-additional-benefits-social-081625020.html
12. Gordon, A. (2011) Public Relations. Oxford: Oxford University Press.
13. Gould, D. (2010) ‘8095 Report: For Millennials, Brand Preference is a
Form of Self Expression’. Retrieved January 26, 2012 from PSFK website:
http://www.psfk.com/2010/10/8095-report-for-millennials-brand-preference-is-a-form-of-self-expression.html
14. Grunig, E. J. and Hunt, T. T. (1984). Managing Public Relations. United
States: Holt, Rinehart & Winston.
15. Grunig, J. E. (2009). Paradigms of global public relations in an age of digitalisation. Prism 6(2): http://praxis.massey.ac.nz/prisms_on-line_journ.html
16. Gupta, O. (2006). Encyclopaedia of Journalism and Mass Communication.
India: Isha Books
17. Hobson, N. (2012) ‘About’. Retrieved April 04, 2012 from Neville Hobson’s
blog: http://www.nevillehobson.com/about/
18. Investopedia. (2011) ‘Return on Investment – ROI’. Retrieved January 26, 2012 from Investopedia website: http://www.investopedia.com/terms/r/returnoninvestment.asp#axzz1jzpgEI6N
19. Jefkins, F. (2000) Advertising. Edinburgh: Pearson Education Limited.
20. Kaushik, A. (2010) Web Analytics 2.0: The art of online accountability & science of customer centricity. Indiana: Wiley Publishing
21. Landauer, K. T., Foltz, W. P. and Laham, D. (1998) ‘An Introduction to
Latent Semantic Analysis’, Discourse Processes Journal, 25, 259-284 [online] Available: http://lsa.colorado.edu/papers/dp1.LSAintro.pdf (Accessed: 26 January 2012)
22. Lee, B. T. (2009) ‘Linked Data’. Retrieved January 26, 2012 from W3
website: http://www.w3.org/DesignIssues/LinkedData.html
23. Mackey, S. (2005) ‘Rhetorical Theory of Public Relations: Opening the door to semiotic and pragmatism approaches’, The Annual Meeting of Australian and New Zealand Communication Association. Deakin University.
24. McGuinness, L. D. and Harmelen, V. F. (2004) ‘OWL Web Ontology
Language Overview’. Retrieved January 26, 2012 from W3 website: http://www.w3.org/TR/owl-features/
25. Meza, A. B, et al. (2005) ‘Semantic Analytics on Social Networks:
Experiences in Addressing the Problem of Conflict of Interest Detection’. University of Georgia & University of Maryland.
26. MS Research. (1998) ‘Basics of Bayesian Inference and Belief Networks’. Retrieved April 07, 2012 from Microsoft Research website: http://research.microsoft.com/en-us/um/redmond/groups/adapt/msbnx/msbnx/basics_of_bayesian_inference.htm
27. Owens, J. (2012) ‘PRCA Trends Barometer Reveals Concerns About
Industry Outlook’. Retrieved January 26, 2012 from PR Week: http://www.prweek.com/news/rss/1112783/PRCA-trends-barometer-reveals-concerns-industry-outlook/
28. Paine, D. K. (2007) ‘How to set benchmarks in social media: Exploratory
research for social media, lessons learned’. KDPaine & Partners.
29. Paine, D. K. (2011) Measure what Matters: Online Tools for Understanding Customers, Social Media, Engagement, and Key Relationships. New Jersey: John Wiley & Sons.
30. Phillips, D. and Young. P. (2009) Online Public Relations: A practical guide
to developing an online strategy in the world of social media (2nd Ed). London: Kogan Page.
31. PRCA. (2012) ‘What is PR?’ Retrieved April 04, 2012 from PRCA website:
http://www.prca.org.uk/What_is_PR
32. Puffinware. (2010) ‘Latent Semantic Analysis (LSA) Tutorial’. Retrieved January 26, 2012 from iMetaSearch website: http://www.puffinwarellc.com/index.php/news-and-articles/articles/33-latent-semantic-analysis-tutorial.html
33. Radford, M. (1998) ‘Philosophy of Bayesian Inference’. Retrieved April 07, 2012 from Toronto University website: http://www.cs.toronto.edu/~radford/res-bayes-ex.html
34. RDF Working Group. (2004) ‘Resource Description Framework (RDF)’.
Retrieved January 26, 2012 from W3C website: http://www.w3.org/RDF/
35. Solis, B. (2011) The End of Business as Usual: Rewire the way you work to succeed in the consumer revolution. New Jersey: John Wiley & Sons.
36. Theaker, A. (2008). The Public Relations Handbook (3rd Ed). Oxon:
Routledge
37. White, M. (2012) ‘Considering PRSA’s Definition of PR’. Retrieved April 04,
2012 from Michael White’s blog:
http://www.mikewhite.co.uk/2012/03/19/considering-prsas-definition-of-pr/
38. White, R. (2000) Advertising (4th Ed). Berkshire: McGraw-Hill Publishing
Company.
Illustrations
Figure 1: ROI
Figure 2: Barabasi-Albert model
Figure 3: Online advertising example statistics
April 2012 Maintaining Reputation through Online Analytics PUR334
52 | P a g e
Figure 4: Research table
Research Aim Objectives Secondary Primary
To assess the
current usage of
online metrics for
evaluating web
1.0 and web 2.0
platforms.
To explore the
increasing use of
digital
communication
channels
Literature Review N/A
To explore the
symmetry between
traditional and
digital
communication
channels
Literature Review N/A
To explore current
metrics for online
measurement
Literature Review Observations
To explore the
potential of the
semantic analysis
method
Literature Review N/A
To assess the
potential usage
of semantic
analysis for the
public relations
industry.
Conducting
research into latent
semantic indexing
(LSI)
Literature Review
Published Texts
Observations
Testing
April 2012 Maintaining Reputation through Online Analytics PUR334
53 | P a g e
Figure 5: Research layout
Figure 6: TCM
Figure 7: Visualisation of TCM
Figure 8: SVD Dimensions table
Figure 9: Visual SVD
Figure 10: Visual SVD with categorisation
Appendix
Copy of Python LSI Tweet Analysis Code
from numpy import zeros
from scipy.linalg import svd
#following needed for TFIDF
from math import log
from numpy import asarray, sum
titles = ["Firefox on Win just updated to version. Critical security fix",
"March release for Samsung Galaxy S II Android update. Anywhere between and days away",
"Asus Transformer Prime review stars and a good q: What is the point of the Prime",
"yw. Great post, some good contribs to the issue in the other comments",
"Added to the conversation on Guardian post about recording phone interviews for podcasts",
"Why Social Media Jobs Get Filled By Younger Folks: Infographic",
"Viewpoint: V for Vendetta and the rise of Anonymous. Great read",
"FTW",
"I tend to take a power strip with sockets and only one adapter",
"things you still need to know about social media social business. Spot on. especially",
"tips for managing negative comments online. Good advice",
"thanks, Kerry, good refocus",
"The Neville Hobson Daily is out",
"Breakfast supplies",
"U.S. Air Force May Buy 18,000 Apple IPad2s for Flight Crews. Businessweek via",
"we can do that, Ellee, would be fun",
"Morning. Beautiful, sunny and, terrific start to the weekend",
"Thinking that Google Hangouts is a pretty neat tool, especially the screen sharing feature",
"An imaginative approach to a difficult (macabre, perhaps) topic to talk about - what happens to your digital conte",
"fyi re Feel free to RT",
"thanks. A good story. Almost as good as",
"hi Sylvie. Not aware of any recent surveys on smallbiz and use of social networks in Australia",
"of UK small businesses use social networks for business, says survey",
"Global perspectives on social media",
"Blog Global perspectives on social media",
"The Neville Hobson Daily is out",
"Google is getting into the music hardware business says the",
"The FTSE social media index. Ranking methodology explained, too. Via",
"Texas Jury Strikes Down Patent Trolls Claim to Own the Interactive Web. Good result",
"What does (and doesn't) on Twitter and Facebook. Hard to get English plainer than this",
"there's a good shoe shiner in the enclosed courtyard at Devonshire Square, EC",
"yes, same here, not much traffic coming in to Reading from the A4 east",
"Ads coming to the LinkedIn mobile app",
"Seeing that Harry Redknapp is still a news headline. Come on, FA, just give him the job",
"of course a lot of snow is a relative expression",
"Driving into Reading shortly should be fun",
"Morning. Quite a bit of snow out there. Well, an inch or two anyway",
"uksnow Will it settle? Looks unlikely although tomorrow morning will tell",
"The tone of life on social networking sites Behavioural study by Pew, interesting findings",
"File Sharing in the Post MegaUpload Era Mainly, staggeringly less efficient.",
"End of an era: Kodak discontinues its camera business",
"I suspect is the one to ask that: is anyone recording the Google session at",
"Looks a must-be-there event: Google at",
"we need to make that happen",
"that looks a great event, Holly, thanks But I won't be in London that day unfortunately",
"Many thanks to for his superb insight & advice on social media monitoring today <= my pleasure",
"Windows Consumer Preview due February: why it's not called beta",
"The Neville Hobson Daily is out! Top stories today via",
"wrestles with microblog revenue plan user loyalty, monetize",
"we'll make it work"
]
stopwords = ['on','just','to','for', 'great', 'i', 'between','and','a','good', 'is', 'the', 'of', 'some', 'in', 'other', 'why', 'get', 'by', 'I', 'as', 'use', 'says', 'out', 'too', 'via', 'here', 'it', 'about', 'an', 'at', 'be', 'coming', 'especially', 'I', 'into', 'its', 'make', 'need', 'not', 'one', 'prime', 'still', 'thanks','that', 'we', 'well', 'what', 'will', 'with']
ignorechars = ''',:'!'''
class LSA(object):
    """Latent semantic analysis over a list of short documents (tweets)."""

    def __init__(self, stopwords, ignorechars):
        self.stopwords = stopwords
        self.ignorechars = ignorechars
        self.wdict = {}   # word -> indices of the documents it appears in
        self.dcount = 0   # number of documents parsed so far

    def parse(self, doc):
        # Lower-case each word, strip the ignored punctuation, skip stopwords,
        # and record the current document index against every remaining word.
        words = doc.split()
        for w in words:
            w = w.lower().translate(str.maketrans('', '', self.ignorechars))
            if w in self.stopwords:
                continue
            elif w in self.wdict:
                self.wdict[w].append(self.dcount)
            else:
                self.wdict[w] = [self.dcount]
        self.dcount += 1

    def build(self):
        # Keep only words recorded more than once, then fill the
        # term-by-document count matrix A (rows = words, columns = documents).
        self.keys = sorted(k for k in self.wdict if len(self.wdict[k]) > 1)
        self.A = zeros([len(self.keys), self.dcount])
        for i, k in enumerate(self.keys):
            for d in self.wdict[k]:
                self.A[i, d] += 1

    def calc(self):
        # Singular value decomposition: A = U * diag(S) * Vt
        self.U, self.S, self.Vt = svd(self.A)

    def TFIDF(self):
        # Optional TF-IDF re-weighting of the count matrix (not called below).
        WordsPerDoc = sum(self.A, axis=0)
        DocsPerWord = sum(asarray(self.A > 0, 'i'), axis=1)
        rows, cols = self.A.shape
        for i in range(rows):
            for j in range(cols):
                if WordsPerDoc[j] > 0:  # documents with no retained words stay all-zero
                    self.A[i, j] = (self.A[i, j] / WordsPerDoc[j]) * log(float(cols) / DocsPerWord[i])

    def printA(self):
        print(self.keys)
        print('Here is the count matrix')
        print(self.A)

    def printSVD(self):
        print('Here are the singular values')
        print(self.S)
        print('Here are the first 3 columns of the U matrix')
        print(-1 * self.U[:, 0:3])
        print('Here are the first 3 rows of the Vt matrix')
        print(-1 * self.Vt[0:3, :])


# Parse every tweet, build the count matrix, then run and print the SVD.
mylsa = LSA(stopwords, ignorechars)
for t in titles:
    mylsa.parse(t)
mylsa.build()
mylsa.printA()
mylsa.calc()
mylsa.printSVD()
input("\n\nPress The Enter Key To Exit")
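
The listing above defines a TFIDF method but never calls it, so the SVD is computed on raw word counts. As a minimal sketch of what that weighting would do, the pure-Python example below applies the same formula, (count of the word in a tweet / retained words in that tweet) x log(number of tweets / tweets containing the word), to a small made-up matrix. The helper name tfidf_weights and the three-tweet sample data are illustrative assumptions and are not part of the dissertation code.

```python
from math import log


def tfidf_weights(counts):
    """Re-weight a term-by-document count matrix (a list of rows) with TF-IDF.

    Hypothetical helper for illustration only. It mirrors the formula in
    LSA.TFIDF: (count / words in document) * log(documents / documents
    containing the word).
    """
    n_rows = len(counts)
    n_docs = len(counts[0])
    # Column totals: how many retained words each document holds.
    words_per_doc = [sum(counts[i][j] for i in range(n_rows)) for j in range(n_docs)]
    # Row-wise document frequency: in how many documents each word occurs.
    docs_per_word = [sum(1 for c in row if c > 0) for row in counts]

    weighted = []
    for i, row in enumerate(counts):
        if docs_per_word[i] == 0:
            # A word that never occurs carries no weight.
            weighted.append([0.0] * n_docs)
            continue
        idf = log(float(n_docs) / docs_per_word[i])
        weighted.append([
            (c / words_per_doc[j]) * idf if words_per_doc[j] else 0.0
            for j, c in enumerate(row)
        ])
    return weighted


# Assumed toy data: rows are words, columns are tweets.
terms = ["google", "media", "social"]
counts = [
    [1, 0, 0],   # "google" appears only in tweet 0
    [0, 1, 1],   # "media" appears in tweets 1 and 2
    [1, 1, 1],   # "social" appears in every tweet
]

weights = tfidf_weights(counts)
for term, row in zip(terms, weights):
    print(term, [round(v, 3) for v in row])
```

In this toy matrix "social" occurs in all three tweets, so its weight is log(3/3) = 0 everywhere, while "google", which occurs in only one tweet, receives the largest weight. In the appendix class, the equivalent step would be to call mylsa.TFIDF() after mylsa.build() and before mylsa.calc(); whether that improves the groupings for this tweet sample is not tested here.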