CONSTEL ANALYTICS A VISUAL TOOL FOR WEB ANALYTICS
STUDENT: PIERRE VANHULST
SUPERVISOR: DENIS LALANNE
APRIL 2014
DEPARTMENT OF INFORMATICS – MASTER PROJECT REPORT
Département d’Informatique - Departement für Informatik • Université de Fribourg - Universität
Freiburg • Boulevard de Pérolles 90 • 1700 Fribourg • Switzerland
phone +41 (26) 300 84 65 fax +41 (26) 300 97 31 [email protected] http://diuf.unifr.ch
Constel Analytics Abstract
ABSTRACT
Web Analytics is a significant tool in the hands of webmasters: knowing who your users are and how
they fare on your website is definitely useful, as it can help with both marketing and usability. Still,
the basic assumption of this master thesis is that web analytics remains largely underexploited,
notably because of the way data are displayed: it is hard to get something out of the amount of
collected data. While many users know how to gather basic statistics, fewer use modern systems to
their full extent.
The first part of this project focuses on defining what Web Analytics is and, in the light of this
definition, what its closely related field of research, known as “Data Visualization”, is. Based on this,
we reviewed a set of existing web analytics solutions in order to assess whether their visualizations
could be more informative and make complex relationships easier to understand. As we concluded
that there was room for improvement, we selected one of the reviewed solutions, Google Analytics,
as a base and used its data to build an application.
This application, called “Constel Analytics”, had one main goal: displaying the interactions
between the pages of a website in a comprehensible way. Amongst its side objectives, it also aimed
at being easy to adopt, adapt and deploy for most webmasters. Using open-source technologies at
its core, Constel Analytics was evaluated on two Swiss institutional websites. While the results of the
evaluation highlighted a few limitations, the visualization was globally successful and remains open
to further development.
KEYWORDS
Web Analytics, Visualizations, Visual Analytics, Communication, Google Analytics, Piwik, Data-Driven
Documents, Force-directed layout
TABLE OF CONTENT
INTRODUCTION
  Table of content
  Motivation
  Aims
  Methods
  Structure of this document
BACKGROUND
  Table of content
  Web Analytics
    Definition
    The scope of web analytics
  Data visualization
    Definition & aims
    Example of data Visualizations
  Visual Tools for Web Analytics
    Definition & taxonomy
    Technological survey
    Wrap-up of the review
  Summary
CONSTEL ANALYTICS
  Table of content
  Aims and design
    Aims
    Design
  Technologies
    Data source
    Visualization
    Programming language & Framework
    Other technologies
  Software Architecture
    Structure of the application
    Data Processing
    View & visualization
  Summary
EVALUATION
  Table of content
  Setting up the evaluation for this project
    Selection of the websites for this evaluation
    Evaluation protocol
  Performances
    Evaluation 1: Unifr “Course offerings”
    Evaluation 2: HEP of Canton Vaud
  Usefulness & Usability
    Evaluation 1: Unifr “Course offerings”
    Evaluation 2: HEP of Canton Vaud
  Evaluation with the ten activities
  Summary
DISCUSSION
  Table of content
  Information management
    Various UI improvements
    Detection of groups
    Categorization of the interactions
  New features
    Sitemap comparison
    Visits typology
    Other sources of data
  Summary
CONCLUSION
  Table of content
  Wrap up
  Conclusion
  Future work
REFERENCES
APPENDIX A “BILLION DOLLAR-O-GRAM VISUALIZATION”
APPENDIX B “DIJKSTRA’S IMPLEMENTATION”
APPENDIX C “EVALUATIONS’ SCREENSHOTS”
  Evaluation 1
  Evaluation 2
APPENDIX D “INSTALLATION GUIDE”
  Requirements
  Downloading the source code
  Deployment & configuration
    Hosting account configuration
    Website configuration
APPENDIX E “SOURCE CODE”
LIST OF FIGURES
Figure 1 - Web Analytics Management Lifecycle.
Figure 2 - "Le digital, un écosystème complexe"
Figure 3 - 10 main activities of Data Visualization's users.
Figure 4 - "Cartesian layout" of a Dendrogram
Figure 5 - "Radial layout" of a Dendrogram
Figure 6 - "Chicago Lobbyists Force-Directed Graph Visualization", by Christopher Manning
Figure 7 - "Among the Oscar Contenders, a Host of Connections" by Mike Bostock
Figure 8 - IEA's Sankey Diagram
Figure 9 - "Map of Napoleon's Russian Campaign", by Charles Minard.
Figure 10 - "Treemap of votes by county, state and locally predominant recipient in the US Presidential Elections of 2012"
Figure 11 - "The Billion Dollar-o-Gram 2013"
Figure 12 - Example of Funnel from Adobe Marketing Cloud (formerly Omniture) in 2009
Figure 13 - Heatmap by AT Internet
Figure 14 - Example of "Heatmap" from Crazy Egg
Figure 15 - Yandex.Metrica "link's map"
Figure 16 - Final categorization of the Web Analytics tool
Figure 17 - High-level diagram of Constel Analytics
Figure 18 - In-details diagram of Constel Analytics
Figure 19 - Main components of Google Analytics APIs
Figure 20 - OAuth 2.0 workflow with Google APIs
Figure 21 - D3 selections
Figure 22 - enter() selection in console log
Figure 23 - Final representation of the example
Figure 24 - "Multi-Force Foci Layout"
Figure 25 – In-details diagram of Constel Analytics with technologies
Figure 26 - File tree for VTWABundle
Figure 27 - Constel Analytics UI split in four parts
Figure 28 - Difference between linear and logarithmic scale in nodes’ representation.
Figure 29 - Difference between constant Link Distance (blue) and function Link Distance (orange)
Figure 30 - Constel Analytics Toolbox
Figure 31 - Example of search
Figure 32 - Example of zoom over a large graph
Figure 33 - Example of path
Figure 34 - Example of Highlight
Figure 35 - Example of "Closer related nodes" Function.
Figure 36 - Example of Minimal Weight's slider
Figure 37 – Example of Filters
Figure 38 – Example of Timelapse
Figure 39 - Constel Analytics, development website "L'Organisation Très Secrète" between 2014-02-15 and 2014-03-06
Figure 40 - Example of subgraphs
Figure 41 - Individual page tracking on a low-traffic website
Figure 42 - Constel Analytics, development website "L'Organisation Très Secrète" from 2013-01-19 to 2014-03-10
Figure 43 - Overlapping branches with no connection
Figure 44 - Homepage of "Course Offerings"
Figure 45 - The page for the "Information systems" course
Figure 46 - HEPL Homepage
Figure 47 - Initial visualization for "Course Offerings" from 2014-01-29 to 2014-02-17
Figure 48 - Main visualization for Eval. 1, 2014-02-15 to 2014-03-06
Figure 49 - Country filters for Eval. 1
Figure 50 - 30 days timelapse, Eval. 1
Figure 51 - It is necessary to scroll down the page in order to view "Current page" information.
Figure 52 - Difference between expected traffic and actual traffic in Eval. 1
Figure 53 - Main visualization for Eval. 2, 2014-03-27 to 2014-04-15.
Figure 54 - New Visitors segment for Eval. 2
Figure 55 - Filtered results for France for Eval. 2
Figure 56 - Timelapse (30 days) for Eval. 2
Figure 57 - Possible future layout for Constel Analytics
Figure 58 - "Target render" for groups’ detection
Figure 59 – “Target render” for interactions’ categorization
Figure 60 - Sketch for a possible "site map" visualization
Figure 61 - Sketch for Visitors' typology.
Figure 62 - New Visitors visualization, Eval. 1
Figure 63 - Mobile and Tablet traffic for Eval. 1.
Figure 64 - Referral traffic for Eval. 1
Figure 65 - Swiss traffic for Eval. 1.
Figure 66 - Spain (left) and Italy (right) filtered results, Eval. 1
Figure 67 – Country filters for Eval. 2.
Figure 68 - Where to find the necessary information to configure HappyR Google Analytics
LIST OF TABLES
Table 1 – Web Analytics tools sorted by Business Model (April 2014)
Table 2 – Web Analytics tools sorted by Features (April 2014)
Table 3 – Academic Web Analytics / Web Usage Mining tools sorted by Visualizations (April 2014)
Table 4 - Symbols used by Constel Analytics
Table 5 – Default parameters for MainController
Table 6 – Methods in AnalyticsRequestService
Table 7 – Main characteristics of use cases
Table 8 – Insights acquired thanks to the Main Visualization, Eval. 1
Table 9 – Insights acquired thanks to the Segments, Eval. 1
Table 10 – Insights acquired thanks to the Filters, Eval. 1
Table 11 – Insights acquired thanks to Timelapse, Eval. 1
Table 12 – Insights acquired thanks to the Main Visualization, Eval. 2
Table 13 – Insights acquired thanks to the Segments, Eval. 2
Table 14 – Insights acquired thanks to the Filters, Eval. 2
Table 15 – Insights acquired thanks to Timelapse, Eval. 2
Table 16 – Different types of interactions
ACKNOWLEDGEMENTS
Many people were involved in this project and I’d like to thank them with these few lines.
The first is of course Dr. Denis Lalanne, who personally supported this thesis for half a year. His
insights and knowledge were inseparable from its success. Along with him, the whole DIVA group
provided me with the opportunity to study web analytics: thanks for this and sorry for all the days
I’ve spent squatting different desks!
The staff of the University of Fribourg and of the HEP of Canton Vaud were also highly involved in
this project. Thanks to Nicolas Fretigny, Samuel Crausaz, Serge Keller, Barbara Fournier and Bertrand
Mure for their time, their enthusiasm and the relevance of their remarks during the evaluation.
Thanks to Mr. Philippe Schmid for his advice regarding the quality of Constel Analytics’ code: while
not all of his ideas had been taken into account at the time of writing, they will be very soon.
Some other people also deserve my thanks.
Brice Bottégal, for his concise and precise article that guided my later research on large
web analytics tools. It is not easy for a student to actually test all of these, so having a starting
point like this one was really helpful.
David, for his professional proofreading. Thanks for the extra hours of work!
Francesca, for our study of Facebook Insights (which didn’t make it into this document). I hope
you learned as much as I did.
Mathias, for his wonderful algorithms. Geez, you always have an answer for everything!
Nicole, for her quality proofreading. Sorry for the eyes, I hope my English mishtakesh didn’t
hurt too much!
Finally, it goes without saying that my family deserves all my gratitude as well. Supporting me all
these months – or even years – was not a piece of cake every day. Thank you!
INTRODUCTION
“The price of light is less than the cost of darkness”. - Arthur C. Nielsen, Market Researcher & Founder of ACNielsen
TABLE OF CONTENT
INTRODUCTION
  Table of content
  Motivation
  Aims
  Methods
  Structure of this document
MOTIVATION
More than half of modern websites use Google Analytics [1]. From marketing teams to usability
engineers [2], everyone knows about the benefits of web analytics and wants to understand how
their website is used and by whom. But while gathering basic data is easy, getting more relevant
insights – such as how many users drop off during a registration process – is a much harder task.
Most webmasters are simply unable to find the information or to produce advanced reports.
For the most part, the problem is not getting the data, since web analytics tools already gather
formidable amounts of them, but displaying them. This is where visualization comes in: displaying
data in a comprehensible way is a new and dynamic field of research from which many findings have
already been put into application. Until now, however, most web analytics tools seem to be confined
to simplistic visualizations.
AIMS
This thesis has several objectives, sorted hierarchically: one global objective split into
sub-objectives.
Discovering the impact of data visualization on web analytics
This is the main objective of this thesis: to understand how much a visualization can
influence the understanding of complex information. It implies several sub-objectives, such as
clearly defining web analytics and data visualization.
o Reviewing the features and visualizations of current web analytics tools
In order to challenge our main assumption, it is necessary to review the visualizations
currently used in web analytics tools and discuss how efficient they are and what
information they can display.
o Implementing a transversal application to visualize complex information
Finally, this thesis aims to evaluate the relevance of new, more advanced data
visualizations in the context of web analytics. An application will be developed with
two ideas in mind: displaying unusual and relevant information while being able to
adapt to various websites. The actual efficiency of this visualization will be assessed
during an evaluation.
METHODS
In order to achieve these objectives, definitions of web analytics and data visualization were
compiled from the scientific literature. A survey of current web analytics tools was conducted on
this basis, challenging the main assumption of this document while also considering their general
features. A web application was then developed, using one of the advanced visualizations as its core
feature, with data collected by other web analytics tools and accessed through their APIs. Finally, a
qualitative evaluation on two institutional websites was conducted to understand what gains this
application could bring and how efficiently it could convey complex information that regular web
analytics software would not be able to show.
STRUCTURE OF THIS DOCUMENT
This document contains four main sections. The first one – Background – introduces the definitions
and explains our survey of current web analytics tools. The second one presents Constel Analytics,
the application developed in the context of this Master thesis. The third section gives an evaluation
of this application, testing its performance as well as its usability and relevance. Finally, the
Discussion section, the last part of this document, proposes several short-term improvements to the
initial version of Constel Analytics, based on the evaluators’ feedback.
BACKGROUND
“Nothing in all the world is more dangerous than sincere ignorance and conscientious stupidity”. - Martin Luther King, Jr.
This chapter covers the background aspects of this project. Definitions and examples of the different
fields of research involved are presented, with the aim of giving a better understanding of what
this project is about, as the later sections of this document refer to those definitions extensively.
In the light of those definitions, a selection of visual tools for web analytics is then presented.
Finally, several conclusions are drawn regarding the efficiency of their visualizations.
TABLE OF CONTENT
BACKGROUND
  Table of content
  Web Analytics
    Definition
    The scope of web analytics
  Data visualization
    Definition & aims
    Example of data Visualizations
  Visual Tools for Web Analytics
    Definition & taxonomy
    Technological survey
    Wrap-up of the review
  Summary
WEB ANALYTICS
DEFINITION
Web Analytics is a field of research that has built its own nomenclature to define some of its key
aspects. This chapter will only cover the essential points.
Wikipedia defines Web Analytics in the following way:
“Web analytics is the measurement, collection, analysis and reporting of internet data for purposes
of understanding and optimizing web usage.” [3]
This definition, initially given by the Web Analytics Association (WAA) [4], makes a distinction
between the methods (“measurement, collection, analysis and reporting of internet data”) and the
aims (“understanding and optimizing web usage”). While the aims seem clear, the methods
require some more discussion.
Figure 1 - Web Analytics Management Lifecycle. Measurement establishes the data that need to be collected, collection gathers the raw data from
the website, analysis computes dimensions out of the collected data and reporting displays the data in a comprehensible way.
Figure 1 illustrates the Web analytics management lifecycle within an entity (company,
association, ...). Each step influences the next one, the whole cycle working iteratively.
The measurement step is about establishing a Web analytics strategy prior to data collection: a
company should know which data are relevant and how they will support the understanding and the
optimization of its web usage.
The second step of a web analysis – collection – aims at gathering raw data that will serve as a basis
for the reports. This collection can take multiple forms which are usually sorted into two categories:
log file analysis and page tagging.
Most servers produce log files which contain the history of all the received connections (as well as
error reports produced because of those connections). Each type of server has its own way of
recording connections and most of them allow users to configure and customize it. Log file analysis
is an interesting option because it does not require any code or modification on the analyzed
website, the data being taken directly from the server. However, it suffers from several limitations, such
as the relative poverty of the collected data (no information about the users’ technologies nor
about actions that do not generate requests to the server), caching (cached pages do not trigger
requests to the server and thus are not taken into account) or the fact that the data are not always
accessible to the final users (as in the case of shared hosting, for instance).
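As a minimal sketch of log file analysis, the Python snippet below counts successful page requests from a few access-log lines. It assumes that the server writes its log in the Common Log Format; the sample lines, the regular expression and the name count_page_views are illustrative assumptions and do not come from any of the tools discussed in this document.

```python
import re
from collections import Counter

# Assumed input format (Common Log Format):
# host ident user [date] "method path protocol" status size
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<date>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)

def count_page_views(lines):
    """Count successful GET requests per path; lines that do not parse are skipped."""
    views = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue
        if match["method"] == "GET" and match["status"].startswith("2"):
            views[match["path"]] += 1
    return views

# Hypothetical sample data, for illustration only.
SAMPLE_LOG = [
    '192.0.2.1 - - [10/Apr/2014:10:00:00 +0200] "GET /index.html HTTP/1.1" 200 5120',
    '192.0.2.2 - - [10/Apr/2014:10:00:05 +0200] "GET /contact.html HTTP/1.1" 200 2048',
    '192.0.2.1 - - [10/Apr/2014:10:00:09 +0200] "GET /index.html HTTP/1.1" 200 5120',
    '192.0.2.3 - - [10/Apr/2014:10:00:12 +0200] "GET /missing.html HTTP/1.1" 404 312',
]

print(count_page_views(SAMPLE_LOG))
```

The sketch also makes the limitations listed above visible: a page served from a cache never produces a line in the log, and nothing in a line describes what the visitor did inside the page.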
Faced with these problems, page tagging has established itself as a de facto standard. The idea
is to add a snippet – or “tag” – to each page of the analyzed website. This
snippet gathers a certain amount of data and sends them to the web analytics system which, in
turn, stores them one way or another (relational database, log files, ...). It is possible to collect
precise information about the visitors’ computer equipment (browsers, operating systems, ...) or
about certain events on the page (like a click on a button that does not request a resource from the web
server). Those “tags” can be more or less complex depending on the system that is used and might,
in some cases, slow down a website (in particular when the servers are busy). It must be noted that
nothing prevents web analytics tools from using both methods in parallel: those systems are known
as “hybrids”.
Once the raw data are recorded, they are aggregated in the analysis phase: they are used to
calculate several dimensions, for example the number of views for each page or the number of visitors using
a particular browser. Some web analytics tools (mostly free tools offered as “Software as a
Service”, or “SaaS”) do not give access to the raw data, which are, at best, retained on the hosting
company’s servers and, at worst, deleted after a few weeks. Most of the data available to end users
are thus aggregated data.
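The aggregation performed during the analysis phase can be sketched as follows. The record layout (visitor, page, browser) and the sample hits are hypothetical; real tools store many more fields and use their own schemas.

```python
from collections import Counter

# Hypothetical raw hits, as a page tag might report them.
RAW_HITS = [
    {"visitor": "v1", "page": "/home", "browser": "Firefox"},
    {"visitor": "v1", "page": "/news", "browser": "Firefox"},
    {"visitor": "v2", "page": "/home", "browser": "Chrome"},
    {"visitor": "v3", "page": "/home", "browser": "Firefox"},
]

def views_per_page(hits):
    """Number of views for each page."""
    return Counter(hit["page"] for hit in hits)

def visitors_per_browser(hits):
    """Number of distinct visitors for each browser."""
    unique = {(hit["visitor"], hit["browser"]) for hit in hits}
    return Counter(browser for _, browser in unique)

print(views_per_page(RAW_HITS))
print(visitors_per_browser(RAW_HITS))
```

Once only the two counters are kept, it is no longer possible to tell which visitor viewed which page: this is the information that is lost when a tool exposes aggregated data only.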
The presentation of those data during the last step, reporting, is a discipline in itself. This discipline is a
prominent part of this project: in order to make this colossal amount of data understandable,
advanced visualization techniques are used. Chapter 2.2 covers a definition and some examples of
those techniques.
THE SCOPE OF WEB ANALYTICS
Extending the scope of web analytics is one of the major trends identified during the review
proposed in chapter 2.3. The analysis is not limited to the scope of the website: we are speaking
about web usage in general. In the past, companies typically had a single website whose
interconnections with the rest of the Internet were limited, but the rise of social networks and
smartphones has changed the situation. The Facebook page of a company might have more impact
on the success of an advertising campaign than an isolated website. Mobile applications must also
be taken into account, regardless of their nature: either an actual independent application or a pseudo-mobile
version of a website available for download. In order to build a global overview of a modern
and large company, it is no longer possible to be limited to a simple website: we must understand
how it interacts with all the other channels of communication.
This new scope of web analytics explains the rise of tools that take into account different channels,
amongst other things.
Figure 2 - "Le digital, un écosystème complexe" [5]. Many new devices connect to various channels. All this must be monitored in order to
understand how modern web users behave online.
Similarly, a change is occurring with objects connecting to the Internet: computers were our only
way to browse the web fifteen years ago, and now a multitude of objects connect, sometimes
without their owners noticing it. Figure 2 illustrates this new challenge: smartphones, already well-established
in our habits, are one of the numerous examples of this new ecosystem, and others will
soon follow, along with new trends in computer science (such as wearable computing, pervasive
computing, ...). Hence, taking into account those connections and understanding their nature (either
human or not) is a new issue for Web Analytics. The leaders of the industry are already building
systems able to handle those changes.
DATA VISUALIZATION
DEFINITION & AIMS
As explained in the introduction of this document, the amount of data collected globally is
overwhelming our ability to assimilate knowledge. In other words, we are almost too efficient at
gathering data, biting off more than we can chew [6].
One of the culprits could be our difficulty in reading those data easily. However, sight is known to be
the fastest of the five senses, with the highest bandwidth: when data are presented in a
relevant way, it is possible to understand complex concepts with a few visuals. “Data Visualization”
is the study of the methods that present those data in the most efficient way, taking into
consideration aesthetics as well as functionality.
Making those data comprehensible requires understanding what their users are looking for. Robert
Amar, James Eagan and John Stasko think it is possible to classify the goals of visualizations’ users.
They have defined ten categories of activities that can be represented on three different axes
(Figure 3) [7].
Figure 3 - The 10 main activities of Data Visualization's users. They are distributed on three axes.
A successful visualization supports most of those activities, as long as they fit the context in which it
was created. They can be used either to gather requirements during a visualization’s
development or to evaluate it afterwards.
EXAMPLE OF DATA VISUALIZATIONS
Several visualizations were reviewed during this project because of their potential to display web
analytics data.
A Dendrogram is a tree layout used to represent clusters usually produced by hierarchical clustering
[8]. In order to build clusters, it is necessary either to split a set of data with a top-down greedy
algorithm or to merge its values with the opposite bottom-up approach, both being quite costly operations (O(n^3) or even
O(2^n)) [9].
Dendrograms look like trees, being split into two branches at several levels, with all the individual
values of a dataset being arranged at the bottom. The closer the values are, the more strongly they are
correlated. It is possible to quantify the distance between two values by looking at the height of the
deepest branch to which they both belong.
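To make the bottom-up approach concrete, here is a minimal sketch of a naive agglomerative clustering on one-dimensional values. The choice of single linkage (the distance between two clusters is the smallest distance between their members) and the sample values are assumptions made for the illustration; the recorded heights are the ones a dendrogram would draw.

```python
def single_linkage(values):
    """Naive bottom-up clustering of 1-D values.

    Returns the merges in order, each as (cluster_a, cluster_b, height),
    where height is the smallest distance between a member of each cluster.
    """
    clusters = [(value,) for value in values]
    merges = []
    while len(clusters) > 1:
        best = None
        # Look for the two closest clusters (exhaustive search).
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                distance = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or distance < best[0]:
                    best = (distance, i, j)
        distance, i, j = best
        merges.append((clusters[i], clusters[j], distance))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges

# Hypothetical visit durations, in minutes.
DURATIONS = [1, 2, 9, 10, 25]
for left, right, height in single_linkage(DURATIONS):
    print(left, right, height)
```

With these values, the pairs (1, 2) and (9, 10) are merged first at a low height, and the isolated value 25 joins the tree last, at the greatest height.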
Dendrograms can be displayed radially, taking advantage of the usual Radial Tree’s pros such as
better space optimization [10]. Below are two examples of Dendrogram, one being Cartesian – or
hierarchical - while the other is radial. They both represent the Flare1 class hierarchy.
Figure 4 - "Cartesian layout" of a Dendrogram [11]. It is hardly readable because of how far it stretches.
Figure 4 points out how difficult it can be for a large Cartesian tree diagram to use space efficiently,
while in Figure 5, the graph takes nearly half the space for the same information. To temper this, it
must be noted that Cartesian layouts are easier to interpret [12]: the final choice between the two
options depends on the usage.
In the context of Web Analytics, dendrograms could be used to represent clusters produced from
the collected data, like the nature of a website’s visits. See subchapter 5.2.3 for more details.
1 Flare is an ActionScript library to produce data visualizations. It partly inspired the development of D3. [43]
Figure 5 - "Radial layout" of a Dendrogram [11]. It optimizes the available space much better than Cartesian layout.
Network graphs are used to represent a network of connections composed of nodes and links.
Those visualizations can be used to illustrate a multitude of situations – from the traditional
“traveling salesman problem” to the relationships between protagonists of a story. The distribution
of nodes across the graph can be computed through different methods. Amongst these methods,
Force-directed graph drawing is one of the most popular [13].
At a theoretical level, Force-directed graphs display a set of nodes and a set of edges influenced by
at least two forces: one that draws connected nodes closer like a spring and another one that makes
them repel each other like electrically charged particles. The idea is that nodes that are highly
connected will be drawn to each other, while isolated nodes will be repelled further. Force-Directed
graphs are “living” graphs: the two forces influence nodes and edges iteratively, until the graph
reaches an equilibrium in which nodes and edges do not move anymore. Further forces can be
added to the graph, in order to influence the display. While the algorithms used to draw Force-Directed
graphs are easy to understand, their running time is high (usually O(n^3)) [14], as each
iteration needs to compute the forces for each pair of nodes. However, splitting the graph and
computing the repulsive force only when a pair of nodes is close, while ignoring pairs with distant
nodes, is an efficient optimization of the running time.
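The two forces described above can be sketched in a few lines of Python. This is a naive illustration, not the algorithm of any particular library: the constants, the inverse-square repulsion, the fixed number of iterations and the sample pages are all assumptions, and the repulsion is computed for every pair of nodes without the optimization mentioned above.

```python
import math

def force_layout(nodes, edges, iterations=200, spring_length=1.0,
                 spring_k=0.1, repulsion_k=0.5, step=0.1):
    """Return {node: [x, y]} after a fixed number of force iterations."""
    count = len(nodes)
    # Deterministic start: nodes evenly spaced on a unit circle.
    pos = {
        node: [math.cos(2 * math.pi * i / count), math.sin(2 * math.pi * i / count)]
        for i, node in enumerate(nodes)
    }
    for _ in range(iterations):
        force = {node: [0.0, 0.0] for node in nodes}
        # Repulsion: every pair of nodes pushes apart (inverse-square law).
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
                dist = max(math.hypot(dx, dy), 1e-9)
                push = repulsion_k / dist ** 2
                force[a][0] += push * dx / dist
                force[a][1] += push * dy / dist
                force[b][0] -= push * dx / dist
                force[b][1] -= push * dy / dist
        # Attraction: each edge behaves like a spring with a rest length.
        for a, b in edges:
            dx, dy = pos[b][0] - pos[a][0], pos[b][1] - pos[a][1]
            dist = max(math.hypot(dx, dy), 1e-9)
            pull = spring_k * (dist - spring_length)
            force[a][0] += pull * dx / dist
            force[a][1] += pull * dy / dist
            force[b][0] -= pull * dx / dist
            force[b][1] -= pull * dy / dist
        # Move every node a small step along its resulting force.
        for node in nodes:
            pos[node][0] += step * force[node][0]
            pos[node][1] += step * force[node][1]
    return pos

def distance(pos, a, b):
    return math.hypot(pos[a][0] - pos[b][0], pos[a][1] - pos[b][1])

# Hypothetical pages: three are linked to each other, "legal" is isolated.
PAGES = ["home", "news", "contact", "legal"]
LINKS = [("home", "news"), ("news", "contact"), ("home", "contact")]
layout = force_layout(PAGES, LINKS)
print(distance(layout, "home", "news"), distance(layout, "home", "legal"))
```

In this sketch the three linked pages settle into a compact triangle while the isolated page, which is only subject to repulsion, keeps drifting away from them.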
On a practical level, Force-Directed graphs are used to show the interconnections between several
data points. Force-Directed graphs have proven to be highly understandable for everyone, which
explains why they are increasingly popular on the Internet. One interesting side of force-directed
layouts is that they naturally cluster data while arranging the points, while other layout methods
such as Balloon Trees require preprocessing.
Below are two examples of Force-Directed graphs.
1. Chicago Lobbyists Force-Directed Graph Visualization: developed by Christopher Manning
[15] as an attempt to display the 50 highest-paid lobbyists in Chicago and their relationships. It is
possible to isolate and display precise information about a node by hovering over it with the
cursor (Figure 6).
Figure 6 - "Chicago Lobbyists Force-Directed Graph Visualization", by Christopher Manning [15]. An example of Force-Directed graph used to
display the highest paid lobbyists in Chicago.
2. Among the Oscar Contenders, a Host of Connections: developed by Mike Bostock [16], this
visualization displays the interconnections between the actors nominated for the 2013
Oscars. Clicking on the name of a movie displays the related actors and the names of those
who were nominated for an Oscar (Figure 7).
Figure 7 - "Among the Oscar Contenders, a Host of Connections" by Mike Bostock [16]. Another example of Force-directed graph: esthetically
pleasant, it provides a clear map of the relationship between actors and directors.
Network diagrams, and particularly Force-directed layouts, can be used to display relationships
between several components. Web Analytics could use them to display the interactions between the
pages of a website (which is the main purpose of the application developed during this project, see
section 3 for more).
Sankey diagrams are used to display the steps through which one or several flows pass. Sources
from which the flows emanate are placed at the left of the graph, their targets are placed at the
right of the graph, and intermediate steps are in the middle. The flows take the shape of arrows,
whose width indicates their intensity. It is possible for a flow to return to a previous step,
though this tends to make the diagram much less readable for users.
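The data behind such a diagram is a list of weighted links. As a sketch, the function below turns visitor paths into links of the form (step, source, target) with the number of visitors as weight; the paths are hypothetical and the idea of keying each link by its step is an assumption made to keep every flow moving from left to right.

```python
from collections import Counter

def sankey_links(paths):
    """Aggregate visitor paths into {(step, source, target): number of visitors}."""
    links = Counter()
    for path in paths:
        for step, (source, target) in enumerate(zip(path, path[1:])):
            links[(step, source, target)] += 1
    return links

# Hypothetical visitor paths through a website.
PATHS = [
    ["/home", "/news", "/contact"],
    ["/home", "/news"],
    ["/home", "/contact"],
]

for (step, source, target), visitors in sorted(sankey_links(PATHS).items()):
    print(step, source, "->", target, visitors)
```

With this representation, a page visited twice by the same visitor appears once per step, which avoids backward flows at the cost of repeating the page in the diagram.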
Sankey diagrams can be used in conjunction with other visualizations, like actual maps or bar charts.
There are several examples of Sankey diagrams, mostly in the field of energy transfers (which was
their original purpose). We selected two: one that shows an advanced use of Sankey
diagrams and another that shows how it is possible to combine a flow going through different steps
with geographical information.
1. IEA Sankey web application: the International Energy Agency offers a web application that
allows its users to see how energy is produced and distributed, thanks to an interactive
Sankey Diagram. Clicking on any of the intermediate steps displays a pie chart which provides
a clearer breakdown of the flows going through it. The application provides many tools to
play with the visualization, such as the possibility to select different periods to visualize and
see how they evolve through time, to rearrange the flows manually or to filter data by
country (Figure 8).
Figure 8 - IEA's Sankey Diagram [17]. Types of energy are at the left border, they go through intermediate steps before being distributed between
their different usages at the right of the graph.
2. Charles Minard’s Map of Napoleon’s Russian Campaign: this map is a historical example of
a Sankey Diagram (drawn before the term existed) mixed with geographical mapping. The brown arrow
displays the number of soldiers entering Russia while the black arrow depicts the number of
soldiers leaving Russia. Other metrics, like the temperature measured at certain geographical
locations, are also displayed (Figure 9).
Figure 9 - "Map of Napoleon's Russian Campaign", by Charles Minard. The brown line shows the soldiers entering Russia, the black line shows those who
leave Russia. The graph displays other data, such as the temperature.
Unsurprisingly, Sankey Diagrams are used to display the flow of visitors on a website. Google
provides a brilliant example of such a use, which will be discussed in the next chapter.
A Treemap is a tree layout whose main purpose is to give a better sense of the size of each element. A
Treemap looks like a group of labelled, nested rectangles organized hierarchically [18]. The global
window represents the root of the tree, the first group of rectangles represents its first branches
and the inner rectangles represent deeper levels. Ideally, the rectangles have similar aspect ratios and are
arranged by order of importance (the largest being at the top left while the smallest is at the bottom
right) but the final result depends on the algorithm: there are several of them and none can
guarantee both perfect order and ratio. Thanks to its dense layout, a Treemap is not only
effective in managing space, but also offers its viewers the unique ability to compare geographically
distant nodes [10].
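One of the simplest treemap algorithms, usually called "slice-and-dice", can be sketched as follows: each rectangle is divided among its children in proportion to their size, alternating between horizontal and vertical cuts at each level. The tree of visits by continent and country is hypothetical, and the tuple layout (label, size, children) is an assumption of this sketch.

```python
def slice_and_dice(node, x, y, width, height, depth=0, out=None):
    """Lay out node = (label, size, children); returns a list of (label, x, y, w, h)."""
    if out is None:
        out = []
    label, size, children = node
    out.append((label, x, y, width, height))
    total = sum(child[1] for child in children)
    offset = 0.0
    for child in children:
        share = child[1] / total
        if depth % 2 == 0:
            # Even depth: children are placed side by side, from left to right.
            slice_and_dice(child, x + offset * width, y,
                           share * width, height, depth + 1, out)
        else:
            # Odd depth: children are stacked from top to bottom.
            slice_and_dice(child, x, y + offset * height,
                           width, share * height, depth + 1, out)
        offset += share
    return out

# Hypothetical number of visits by continent and country.
TREE = ("World", 100, [
    ("Europe", 60, [("Switzerland", 40, []), ("France", 20, [])]),
    ("Asia", 40, []),
])

for rectangle in slice_and_dice(TREE, 0, 0, 100, 50):
    print(rectangle)
```

The area of every rectangle is proportional to its size, but nothing keeps the aspect ratios close to one another: this is the kind of trade-off between order and ratio mentioned above, and slice-and-dice tends to produce elongated rectangles on deep trees.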
The example below (Figure 10) shows how a Treemap can immediately give a sense of sizes when it
comes to electoral results. In this example, blue rectangles represent Democrats and red
Republicans during the 2012 presidential election.
Another example follows (Figure 11) [19], showing how Treemapping can be used to display
financial measures. This visualization is interactive and can be rearranged by category or size. As the
whole visualization is “vertically impracticable” for this document, only a part of it is shown. The full
version can be found in Appendix A.
Figure 10 - "Treemap of votes by county, state and locally predominant recipient in the US Presidential Elections of 2012" [18]. Red is Republican
votes, blue is Democrat votes.
Figure 11 - "The Billion Dollar-o-Gram 2013" [19]. The full-size graph can be found in Appendix A.
Treemaps could be used by a Web Analytics tool to display some of the visitors’ information, such as
geographical data (Continent, Country, Province and City) or technical data (Browser’s family,
Browser and Browser’s version).
Web Analytics tools already use a few unusual visualizations to represent certain specific
information. As these visualizations were not taken into account for the application developed
within this project, we will not discuss them in detail and simply present them with an example.
FUNNEL
Funnel visualization is generally used to describe how many visitors went through a process
(generally, a Goal) and how many left at each step.
Figure 12 - Example of Funnel from Adobe Marketing Cloud (formerly Omniture) in 2009. It shows the Order process on a website: 328'998 visitors
get to the Customer Information’s pages, only 38.5% of them get to the Billing Information’s pages and 45.7% of those remaining reach the Orders’
page.
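The figures shown by a funnel can be derived from the number of visitors counted at each step. The sketch below computes, for every step after the first, the share of visitors who continued and the number who left; the step names and counts are hypothetical and are not taken from Figure 12.

```python
def funnel_report(steps):
    """steps: ordered list of (name, visitors). Returns one entry per transition."""
    report = []
    for (name, count), (_, previous) in zip(steps[1:], steps):
        report.append({
            "step": name,
            "continued": count / previous,  # share of the previous step that went on
            "left": previous - count,       # visitors lost before this step
        })
    return report

# Hypothetical order process.
ORDER_PROCESS = [
    ("Cart", 1000),
    ("Customer information", 400),
    ("Billing information", 100),
]

for entry in funnel_report(ORDER_PROCESS):
    print(entry)
```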
HEAT MAP
Heat maps are matrices using colors to show the intensity of a relationship. Usually, “warm” colors
indicate a strong relationship. They can be used to display clusters similarly to other visualizations
presented above.
Figure 13 - Heatmap by AT Internet, showing the hours of connection on several websites for different European countries.
HITMAP
While most tools do not make a distinction between using a heat map to visualize clicks on a
website and using it to visualize clusters in a matrix, we found it necessary to show the difference in
use between the two. We define a Hitmap, also known as a Clickmap, as a heat map applied directly to
a webpage. It allows users to see where visitors clicked the most.
Figure 14 - Example of "Heatmap" from Crazy Egg. The blue areas mean that there were a few clicks on the links, while warmer (green, yellow and
red) areas indicate more clicks.
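A Hitmap can be thought of as a grid laid over the page, each cell counting the clicks that fell into it; the counts are then mapped to colors. The following sketch only performs the counting. The page size, the cell size and the click coordinates (integer pixels) are hypothetical.

```python
def bin_clicks(clicks, page_width, page_height, cell=100):
    """Count clicks per grid cell; returns rows of counts, top row first."""
    columns = -(-page_width // cell)   # ceiling division
    rows = -(-page_height // cell)
    grid = [[0] * columns for _ in range(rows)]
    for x, y in clicks:
        # Clicks outside the page are ignored.
        if 0 <= x < page_width and 0 <= y < page_height:
            grid[y // cell][x // cell] += 1
    return grid

# Hypothetical click positions on a 300 x 200 pixel page.
CLICKS = [(10, 10), (20, 30), (150, 20), (250, 180), (999, 999)]

for row in bin_clicks(CLICKS, 300, 200):
    print(row)
```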
VISUAL TOOLS FOR WEB ANALYTICS
DEFINITION & TAXONOMY
According to the previous chapters, “web analytics tools” could be defined as software that
manages all four steps of web analytics while reporting the aggregated data in a visually
understandable fashion.
Our initial assumption was that these tools were usually too complex for regular users, mostly
because of the way the data are displayed. In order to challenge this assumption, we surveyed as
many web analytics tools as possible. Our study led us to analyze not only the visualizations
proposed, but also the key features and the marketing targets of web analytics products. Our
results are presented in two parts.
The Business Model of a web analytics tool is our first criterion to sort them. By “Business model”, we
mean “nature of the tool”, “targets of the tool” and “data storage”.
Our review found four possible “natures” for the tools: open-source, free, paying and academic. It is
possible for a tool to provide several options and thus be part of several groups.
The tools’ targets have been defined according to the range of their features, their pricing options
and their communication. We classified targets into four groups:
- Personal: individuals looking for basic insights about a small website
- SMEs: small or medium-sized companies, including associations and foundations, able to pay a
monthly fee for the tool but needing higher-level features
- Large companies: national or multinational organizations, monitoring multiple channels on
multiple websites, or even mobile applications
- Scientists: individuals interested in some specific, prototype features
Data storage is another critical point to be taken into account. SaaS tools store their data on
their own servers, implying possible privacy issues. On the contrary, On-premise tools can be
installed directly in the users’ infrastructure.
The second part of this study focuses on the actual features proposed by the tools. As it became
clear that there was a difference between academic projects – focusing on selected and innovative
features – and public tools – providing a full array of functionalities – we decided to treat the two
groups separately in this part.
We assume that public tools come with several features by default, such as goal setting, real-time
data or complete APIs. When one of those features is particularly developed or proposes
something new, an indication is present in the cell “other notable features”. The differentiating
criteria for public tools are:
- Collection method: either logfile analysis or page-tagging. Page-tagging can take the form of
JavaScript tagging (most usual) or others (such as PHP-tagging in the case of self-hosted
systems).
- Advanced visualizations: whether or not the tool provides some non-standard visualizations,
standard visualizations including “bar charts”, “pie charts”, “line charts” and “two-dimensional
plots”.
- Data exploration: the different options available to explore the data from the UI. Filters,
segments, time lapse (several periods being visualized one after the other), datasets
comparisons, search and sorting are parts of this point. “Data-crossing” means that the UI
allows users to split records according to a secondary dimension like “All visitor’s countries
by visitor’s browsers”.
- Visitors’ information: the range of information the tool can offer about visitors. It can be
either technical information (browsers, ...), geographical information (e.g. country), age,
gender or centers of interest (sport, cinema, ...).
- Other notable features: a list of particular features that could differentiate the software from
the competition.
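The “data-crossing” criterion listed above can be illustrated with a small cross-tabulation: records are first grouped by a primary dimension, then split by a secondary one. The visit records and the function name cross are hypothetical; the sketch only shows the principle, not how any of the reviewed tools implements it.

```python
from collections import Counter, defaultdict

def cross(records, primary, secondary):
    """Count records for each (primary, secondary) pair, grouped by primary value."""
    table = defaultdict(Counter)
    for record in records:
        table[record[primary]][record[secondary]] += 1
    return {key: dict(counts) for key, counts in table.items()}

# Hypothetical visits.
VISITS = [
    {"country": "Switzerland", "browser": "Firefox"},
    {"country": "Switzerland", "browser": "Chrome"},
    {"country": "Switzerland", "browser": "Firefox"},
    {"country": "France", "browser": "Chrome"},
]

# "All visitors' countries by visitors' browsers"
print(cross(VISITS, "country", "browser"))
```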
Academic projects will be sorted according to their collection method, their proposed visualizations
and a small description of their purpose (instead of having a full list of features).
The selection of tools for this study is not exhaustive because of the number of Web Analytics
tools that exist. Although more than 40 tools were identified, only 20 of them made their way into the
tables in the following chapters, as we tried to keep the most representative tools for each category.
Keeping in mind that it was not possible to actually access all of this software – especially the tools
reputed to be the most powerful and professional – we relied on other studies as much as possible
to describe the way they work [20] [21]. Other sources include marketing materials – which are
equally essential as they point out the intended targets – web articles [22] and
demonstrations of the products.
In our efforts to find as many tools from as many sources as possible, several academic projects
were reviewed as well [23], as it was assumed their visualizations could be different from the public
tools’ visualizations.
TECHNOLOGICAL SURVEY
Tools | Nature | Targets | Data storage
Web Usability Probe [24] [25] | Academic | Scientists | On-premise
WebQuilt [26] | Academic | Scientists | On-premise
WebPUM [27] | Academic | Scientists | On-premise
Labroche, Lesot and Yaffi’s Web Usage Mining and Visualization Tool [28] | Academic | Scientists | On-premise
Yandex.Metrica | Free | All | SaaS
AT Internet | Free + Paying | Large companies, Personal | SaaS
Clicky | Free + Paying options | SMEs, Personal | SaaS
Google Analytics | Free + Paying options | All, Large companies (premium) | SaaS
ShinyStat | Free + Paying options | All | SaaS
Woopra | Free + Paying options | All | SaaS
Open Web Analytics | Open-Source | All | On-premise
AWStats | Open-Source | SMEs, Personal | On-premise
Piwik | Open-source + Paying options | All | SaaS, On-premise
Adobe Marketing Cloud (Omniture) | Paying | Large companies | SaaS, On-premise
IBM Coremetrics & Unica | Paying | Large companies | SaaS, On-premise
comScore Digital Analytix | Paying | Large companies | SaaS, On-premise
Webtrends Analytics | Paying | Large companies | SaaS, On-premise
Mint | Paying | SMEs, Personal | On-premise
Mouseflow | Paying | SMEs, Personal | SaaS
Crazy Egg | Paying | SMEs, Personal | SaaS
Table 1 – Web Analytics tools sorted by Business Model (April 2014)
Table 1 is sorted according to the nature of the tool, as it is the most relevant criterion of this first
comparison. The second most important criterion is definitely the targets of the tool, as they tend to
shape both its communication and its features.
Our study made it quite clear that most of the “large companies”-oriented tools were branded as
marketing tools: it would seem that this niche is more promising than others. They are paying tools,
providing additional services and support and while SaaS are usually preferred to On-premise [5],
the option of installing the tool directly in the customer’s infrastructure almost always exists.
Tools aiming at smaller companies and private users are usually more neutral in their
communication, though marketing is still their primary orientation. While many of those tools are
offered for free with paying options, there are important differences between their pricing
models. Most drastically limit the free version (either by allowing a very limited number of visits
to be taken into account or by restricting many features) and offer a wide range of plans, which
makes comparisons harder. Some SME-oriented tools limit themselves to a smaller range of
features whose implementation is particularly advanced (Crazy Egg, Mouseflow).
As for the academic projects, it would seem that a higher proportion aims to improve websites’
usability or business intelligence rather than marketing. Most are “Web Mining”-oriented, providing
techniques to find patterns out of data and to predict behavior, something that can eventually lead
to marketing usages. While we do not believe that this study is enough to assert that this is an
actual trend, one could assume that the barrier between marketing and computer science in the
academic world is stronger than the one between decision support and computer science, despite
some companies’ attempts to reduce it [29]. This should be confirmed by further studies.
Tools | Collection method | Advanced visualizations | Data exploration | Visitors information | Other notable features
Adobe Marketing Cloud (formerly Omniture) | Page-tagging | Funnel, Hitmap, Heatmap, Path visualization | Data-crossing, Filters, Segmentations | Geographical, Technical, Age, Gender, Interest | AB Testing, Mobile features, Social features
AT Internet | Page-tagging | Funnel, Heat map, Hitmap, Sankey | Data-crossing, Filters, Segmentations | Geographical, Age, Gender, Technical, Interests | AB Testing / Multivariate, Mobile features, Soft Tagging, Social features, Multi-channels tracking
AWStats | Logfile analysis | - | Data-crossing, Filters, Segmentations | Geographical, Technical | Web compression data, HTTP Status Code
Clicky | Page-tagging | Hitmap, Funnel | Filters, Segmentations | Geographical, Technical | AB Testing, Uptime monitoring, Individual tracking
comScore Digital Analytix | Page-tagging | Hitmap | Data-crossing, Filters, Segmentations | Geographical, Technical, Age, Gender, Interest | Individual tracking, Multi-channels tracking, AB Testing / Multivariate, Mobile features, Social features, Persona-driven dashboards, Raw data manipulation
Crazy Egg | Page-tagging | Hitmap (several variants, including « Confetti » which maps clicks and referrers) | Filters | - | Scrollmap (showing where visitors abandon scrolling)
Google Analytics | Page-tagging | Sankey, Funnel, Hitmap | Data-crossing, Filters, Segmentations | Geographical, Age, Gender, Interests, Technical | AB Testing, Multi-channels tracking, Mobile features, Social features, SiteSpeed, Individual tracking (premium)
IBM Coremetrics & Unica | Page-tagging | Heat map, Funnels, Channel Venn | Data-crossing, Filters, Segmentations | Geographical, Technical, Age, Gender, Interest | AB Testing, Mobile features, Social features, Automatic marketing recommendations
Mint | Page-tagging | - | - | Geographical, Technical | Large set of plugins (Pepper)
Mouseflow | Page-tagging | Hitmap (actual replay of the users’ session) | - | - | Session replay
Open Web Analytics | Page-tagging | Funnel | Data-crossing, Filters, Segmentations | Geographical, Technical | Individual tracking, Mouse tracking
Piwik | Page-tagging, Logfile Analysis | Sankey | Filters, Segmentations | Geographical, Technical | Individual tracking
ShinyStat | Page-tagging | Path visualization | Data-crossing, Segmentations | Geographical, Technical | B2B analysis
Webtrends Analytics | Page-tagging | Use of infographics to display data, Funnel, Path visualization, Hitmap | Data-crossing, Filters, Segmentations | Geographical, Technical, Age, Gender, Interest | AB Testing / Multivariate, Mobile features, Social features, Import data from third-party
Woopra | Page-tagging | Funnel | Filters, Segmentations | Geographical, Technical | Individual tracking, Live chat with visitors
Yandex.Metrica | Page-tagging | Sankey, Funnel, Hitmap | Filters, Segmentations | Geographical, Age, Gender, Interests, Technical | Mobile features
Table 2 – Web Analytics tools sorted by Features (April 2014)
“Large-companies”-oriented tools focus a lot on multi-channel analysis, which seems to be the
future of Web Analytics as suggested in chapter 2.1.2. They are also quite similar in terms of
features, distinguishing themselves on services, implementation and pricing.
When it comes to visualizations, very few of these tools dare to think outside the box: most limit
themselves to standard “dashboard visualizations” like pie charts, bar charts, line charts or, at best, two-dimensional
axes diagrams using two metrics to compare dimensions. Larger tools usually
provide a way to display visitors’ paths (either with a Sankey diagram or with something similar), a Hitmap
and Funnels. As stated in 2.3.1.3, our review might be incomplete as we could not use all the services
listed above. However, we can assert that visualization is a very secondary selling argument for most
of these tools: it would seem that instead of working and communicating on visualizations, larger
tools prefer to offer means of customizing reports for their customers. Webtrends Analytics 10
seems to be the most innovative when it comes to data visualization and made an interesting step
towards end-users by facilitating the creation of infographics.
Of all these tools, Google Analytics and Yandex.Metrica stand out for two reasons: they are free and
they offer professional, advanced features to everyone. In particular, their use of Sankey
diagrams to display the flow of visitors is interesting (see Figure 15 for an example), though both
implementations suffer from some problems: Google Analytics’ Sankey makes pages appear
several times instead of allowing traffic to go back, while Yandex.Metrica’s stretches horizontally,
making it hard to read.
When it comes to open-source projects, Piwik and Open Web Analytics are both progressive
systems. While OWA proposes several advanced features like mouse tracking and hitmaps, Piwik
benefits from a more complete API that is suitable for near-professional plugins.
Figure 15 - Yandex.Metrica "link's map". It stretches horizontally, but allows flows to go back to a previous page if needed.
Tool | Collection method | Visualizations proposed | Notes
WebQuilt | Logfile analysis | Network visualization | Is not intended for large audience. Up to 20 to 100 participants would go through tasks prepared by a web designer. Framework that can serve as a basis for advanced visualizations.
Labroche, Lesot and Yaffi’s Web Usage Mining and Visualization Tool | Page-tagging | Network visualization | Full system, from collection to report. Aims to display the usage of a website by reading its web log. Based on Leader Ant algorithm to process data.
WebPUM | Logfile analysis | Network visualization, Line charts | Full system, from collection to report. Fitted for prediction of future users’ movement by the use of a graph-partitioning algorithm and a similarity of subsequences.
Web Usability Probe | Page-tagging | Frequencies | Web Usability Probe aims at making Usability testing easier by letting the designers record their “ideal path” on their website and comparing it to the users’ paths.
Table 3 – Academic Web Analytics / Web Usage Mining tools sorted by Visualizations (April 2014)
Comparing these academic projects with commercial tools revealed something of interest: among
other things, network visualizations are almost always used to display how pages or users interact with
the analyzed website. This is somewhat surprising, as these visualizations are not used in
commercial software.
WRAP-UP OF THE REVIEW
Our study led us to consider several categories of tools. These categories determine which features
those tools are likely to propose. Figure 16 below illustrates our final categorization.
Figure 16 - Final categorization of the Web Analytics tools, based on our previous analysis. The 20 reviewed tools are distributed as the leaves of the
branches.
Academic tools have a few prototype features. They tend to propose network diagrams to visualize
their data. We made a distinction between web mining tools (which predict and cluster users’
behavior) and web analytics tools (which display users’ behavior).
Close to academic projects, tools with specific features are becoming more common. In order to
stand out, they focus on these features by presenting a few innovative ideas, and are sold to
companies that cannot afford larger solutions.
Also aiming at SMEs, some tools offer most of the usual end-user features of web analytics while
being much cheaper than larger software. As they cannot offer all the detailed information of
high-end competitors, they try to stand out by specializing in a whole dimension (marketing
for Woopra, self-hosting and plugins for Mint) and target small and medium companies
looking for web analytics tools and support without being able to invest thousands of USD each
month.
Larger tools offer similar sets of features and visualizations. They offer professional support and
differentiate themselves with advanced features aimed at large companies, such as manipulation of
raw data (Digital Analytix), high modularity (Adobe Marketing Cloud) or tag management (AT
Internet).
Free tools have different faces and are hard to describe as a whole. Google Analytics leads the way
with its very large market share (over 80%), with the idea that free users joining large companies
would go for Google Analytics Premium rather than the competition. All the other tools, including
the paid tools aiming at SMEs, compare themselves with Google Analytics. Yandex.Metrica aimed
at matching Google Analytics’ features. Piwik set out to free web analytics by providing a similar
set of features while being self-hosted, just like Open Web Analytics, whose UI is clearly inspired
by Google Analytics’.
SUMMARY
Web analytics is defined as a way to optimize web usage by analyzing reports based on web data
gathered through different methods. The efficiency of those reports depends on how clear their
visualizations are, which in turn is a field of research in itself known as Data Visualization: efficient
visualizations must support several users’ activities that have been identified and classified.
Knowing this, a review of the available visual tools for web analytics has been conducted and we
concluded that current commercial software is slow to adopt advanced visualizations while
academic projects already seem more inclined to do so.
The following sections describe a possible solution to this problem.
CONSTEL ANALYTICS
“It feels like we're all suffering from information overload or data glut. And the good news is there might be an easy solution to that, and that is
using our eyes more”. - David McCandless, data journalist and information designer
This chapter covers all the aspects of the application built during this project. It presents the reasons
behind its development, the way it was envisioned, the technologies selected for its realization and
its architecture.
TABLE OF CONTENT
CONSTEL ANALYTICS
Table of content
Aims and design
  Aims
  Design
Technologies
  Data source
  Visualization
  Programming language & Framework
  Other technologies
Software Architecture
  Structure of the application
  Data Processing
  View & visualization
Summary
AIMS AND DESIGN
Amongst the points that were raised in the previous chapter, data visualization in Web Analytics
tools is one of the most problematic: with the existing tools, webmasters can answer simple
questions like “what are the most popular browsers on my website”, but their understanding of
advanced data, like “how do all the pages of a given website work together”, is limited.
In order to overcome this recurrent issue, an application has been developed during this project. Its
name is "Constel Analytics", because of the shape of the visualization that it generates. This chapter
presents its aims and its design, without regard for the technologies required to achieve them.
AIMS
Constel Analytics aims at making the relations between pages of a given website easier to see.
Instead of redeveloping a collection system – which would be out of the scope of this project - it
relies on data gathered by others and retrieves them through their APIs. Data are then processed
and presented in a more meaningful way.
Constel Analytics has two types of objectives: design objectives and implementation objectives.
Design objectives are the fundamentals of Constel Analytics. The final evaluation of the application
was based upon them (see section 4).
Interactions between pages: the main objective of Constel Analytics is to offer advanced
visualizations in the context of web analytics. During this project, the first visualization of the
application will be developed: the visualization of interactions between pages. It must allow
users to understand how their website’s traffic works.
Independence from Data sources: Constel Analytics should be able to use data from
different sources. Thus, the Data Processing part must be as loosely coupled as possible from
the visualization, so that the application can be adapted without redeveloping large parts of
its code.
Ease of installation: Constel Analytics must be usable by as many webmasters as possible;
it must therefore:
o Rely on common technologies so that it can be installed without requiring complex
manipulations on the server
o Propose easy installation and configuration
Scalability and adaptability: Constel Analytics will evolve beyond this initial project. Other
contributors must be able to handle it and improve it.
DESIGN
This chapter covers the theoretical aspects of Constel Analytics' design.
Figure 17 - High-level diagram of Constel Analytics.
At a conceptual level, Constel Analytics works as follows (Figure 17):
0. A website regularly sends data to a web analytics tool (called “Data Source” in Figure
17).
1. A user visits Constel Analytics and asks for a report of the traffic on her website according
to different criteria (dimension, filters, period, ...).
2. Constel Analytics queries the data source.
3. The data source sends the data to Constel Analytics, which processes them.
4. Constel Analytics returns a visualization of the data to the user.
Constel Analytics thus performs two distinct tasks: it first processes the data received from the data
source, then displays the processed result. If the same data are requested twice (multiple queries with
the same criteria), the application must be able to keep them in cache so that they can be reused without delay.
Figure 18 - In-details diagram of Constel Analytics
Figure 18 shows a more detailed schema of Constel Analytics.
1. The main controller receives the user’s request.
2. The main controller checks if the required data are already cached.
a. If that is the case, it sends the data to the view.
b. Otherwise, it calls the Data Processing module which queries the Data Source and
processes the results before returning them to the Controller. The controller sends
the processed data to the view.
3. The main controller renders a visualization, using the view.
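The cache check described in steps 1 to 3 can be illustrated with a minimal sketch. It is not the application's actual controller (which runs on the server): the names createController, processData and render are hypothetical, and the sketch assumes that the request criteria form a flat object.

```javascript
// Hypothetical sketch of the controller's cache logic. The function names
// and the flat "criteria" object are assumptions made for illustration.
function createController(processData, render) {
  var cache = {}; // processed data, keyed by the serialized request criteria

  return function handleRequest(criteria) {
    // Serialize with sorted keys so that equivalent requests share one entry.
    var key = JSON.stringify(criteria, Object.keys(criteria).sort());
    if (!(key in cache)) {
      // Cache miss: query the data source and process the results.
      cache[key] = processData(criteria);
    }
    // Cached or freshly processed, the data are always sent to the view.
    return render(cache[key]);
  };
}
```

With this shape, the Data Processing module is only called when the criteria have not been seen before, which matches case 2.b above.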
The traffic between the pages of a website is defined as the exchange between each pair of pages of
the said website. By "exchange", we mean "a visitor who passes from a page X to a page Y". Two
elements are required to compute this:
The list of pages. Pages are identified by their title or their URI, depending on which one is unique
(some websites use the same title for several pages, while others use several URIs for
the same page). A page has five properties:
o a title
o a measured metric, usually the number of times that the page was viewed
(“pageviews”), for a given period
o one or several URLs
o a group to which the page belongs. Groups are defined by the hierarchical level of the
URIs. For instance, "/forum/index.php" and "/forum/" would be in the same group,
while "/blog/index.php" would be in another one. It implies that all URIs related to
one Page Title should be in the same hierarchical level.
o a weight equal to the number of links between this page and the others.
The list of interactions from one page to another. An interaction is defined by its unique pair
of pages (source and target), and has one more property: the intensity of the interaction.
Concretely, the intensity represents how many times visitors went from source to target.
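As an illustration of the "group" and "weight" properties, the sketch below derives both from raw data. It relies on two assumptions that the text does not fix: the group is taken to be the first path segment of the URI, and the weight counts every interaction in which the page appears as source or target. The names groupOf and computeWeights are hypothetical.

```javascript
// Assumption: the "hierarchical level" of a URI is its first path segment.
function groupOf(uri) {
  var segments = uri.split("/").filter(function (s) { return s.length > 0; });
  var isDirectory = uri.charAt(uri.length - 1) === "/";
  // "/" and root-level files such as "/index.php" belong to the root group.
  if (segments.length === 0 || (segments.length === 1 && !isDirectory)) {
    return "/";
  }
  return segments[0];
}

// Assumption: the weight of a page is the number of interactions that
// involve it, either as source or as target.
function computeWeights(pages, interactions) {
  var weights = new Map();
  pages.forEach(function (page) { weights.set(page.title, 0); });
  interactions.forEach(function (interaction) {
    [interaction.source, interaction.target].forEach(function (title) {
      if (weights.has(title)) {
        weights.set(title, weights.get(title) + 1);
      }
    });
  });
  return weights;
}
```

Under these assumptions, "/forum/index.php" and "/forum/" fall in the same group, while "/blog/index.php" falls in another one, as in the example above.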
By default, Constel Analytics focuses on the most important pages of a website. Therefore, the list of
pages is built based on the measured metric: for instance, the most viewed pages are taken into
account first. The same applies to transitions, which are sorted by intensity. Note that if a transition is
very common but connects two pages that are not in the list of most viewed pages, it will be
ignored when processing data. This case is very rare; it could occur on sites where the
distribution of visits is very homogeneous.
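The selection rule can be sketched as follows. This is an illustrative reading of the paragraph rather than the application's code: the function name selectTop, its parameters and the property names (pageviews, source, target, intensity) are assumptions.

```javascript
// Keep the most viewed pages, then keep only the strongest interactions
// whose source and target both belong to the retained pages.
function selectTop(pages, interactions, maxPages, maxInteractions) {
  var topPages = pages
    .slice() // copy, so that the caller's array is not reordered
    .sort(function (a, b) { return b.pageviews - a.pageviews; })
    .slice(0, maxPages);

  var kept = new Set(topPages.map(function (page) { return page.title; }));

  var topInteractions = interactions
    .filter(function (i) { return kept.has(i.source) && kept.has(i.target); })
    .sort(function (a, b) { return b.intensity - a.intensity; })
    .slice(0, maxInteractions);

  return { pages: topPages, interactions: topInteractions };
}
```

A very common transition between two pages that are outside the retained list is dropped by the filter step, which is the rare case mentioned above.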
Once these two elements are processed by Constel Analytics, data can be visualized.
As this project aims at displaying the traffic between pages, network diagrams are the natural
choice: pages will be represented as vertices and interactions as edges.
The force-directed layout method has been selected for two reasons:
Its distribution of nodes on the graph is immediately understandable by a majority of
people. Moreover, a force-directed graph can easily display hundreds of nodes, unlike other
methods. From a technical point of view, however, the processing time required to display
such a diagram can be problematic and should be minimized using a QuadTree.
Force-directed layouts automatically generate clusters: some of the other visualizations, like
Balloon Tree, might be easier to understand but would impose more constraints (in the
Balloon Tree example, a child can have only one parent) and require preprocessing.
Force-directed visualizations are one step closer to Visual Analytics compared to other
network drawing methods.
The available data gathered from the Data Source will be represented by the following symbols:
Information | Criterion
Page | Node, represented by a circle
Interactions | Curves between nodes
Measured metrics of a page | Size of the node (logarithmic scale)
Intensity of an interaction | Opacity of the link, redundant with proximity of the nodes
Group of a page | Color of the node
Weight of a page | Opacity of the border of the node
Table 4 - Symbols used by Constel Analytics
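Two of these mappings can be sketched as plain functions. The formulas are assumptions made for illustration (the text only states that the node size follows a logarithmic scale); the names nodeRadius and linkOpacity, as well as the minimum opacity of 0.1, are hypothetical.

```javascript
// Size of a node on a logarithmic scale, between minRadius and maxRadius.
function nodeRadius(pageviews, maxPageviews, minRadius, maxRadius) {
  if (maxPageviews <= 0) {
    return minRadius; // nothing to compare against
  }
  // log(1 + x) keeps pages without any view at the minimum radius.
  var ratio = Math.log(1 + pageviews) / Math.log(1 + maxPageviews);
  return minRadius + ratio * (maxRadius - minRadius);
}

// Opacity of a link, proportional to its intensity. The floor of 0.1 is an
// assumed value that keeps weak links faintly visible.
function linkOpacity(intensity, maxIntensity) {
  if (maxIntensity <= 0) {
    return 0.1;
  }
  return Math.max(0.1, intensity / maxIntensity);
}
```

The logarithmic scale prevents a few very popular pages from dwarfing all the others: a page with 1% of the maximum number of views still receives a clearly visible radius.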
The design phase and the evaluation phase have highlighted the importance of addressing different
groups of needs in order to make the visualization useful. These partially correspond to the ten
activities discussed in chapter 2.2.1:
Finding datapoints: this requirement includes the ability to discover information about the
pages (title, number of views, ...), as well as the ability to explore the graph and unravel
entangled vertices and edges.
Retrieving values: some users need to immediately identify a specific page, in order to
compare its actual position to its expected position.
Arranging datapoints: clustering and sorting nodes is necessary to have a better view of how
the website works. As for correlation, it requires the ability to compare different sets of
data.
Constel Analytics addresses those needs with a set of tools.
FINDING DATAPOINTS: ZOOM, MINIMAL WEIGHT SLIDER
Visualizations with many nodes and many edges can be hard to read. In order to let the users
explore the graph, Constel Analytics proposes two zooming options. The first one works like a
magnifying glass: a part of the graph is magnified, but the rest of the graph is still visible in order to
keep a sense of the global context. The “magnifying glass effect” is interesting when it comes to
small areas, but fails to reorganize the whole graph. Therefore, a global zoom has been added in
order to spread or to narrow the whole graph.
Another feature, the minimal weight slider, allows users to filter pages and links according to their
importance, supporting the “find extremum” and “filter” activities. As for the anomalies, we expect the
force-directed graph to show them clearly thanks to the position of the nodes (a node that should not
belong to a cluster is easy to spot).
RETRIEVING VALUES: PATH & SEARCH
Constel Analytics offers a dynamic search field that highlights nodes whose title or URIs match the
searched string.
Similarly, a user may want to identify a path between two pages. To meet this need, the Path Finder
feature was implemented. It identifies the most likely path between two pages (source and
target).
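One possible way of computing a "most likely path" is sketched below. The text does not specify how the Path Finder defines likelihood, so this sketch rests on an assumption: the probability of following a link is its intensity divided by the total outgoing intensity of its source page, and the likelihood of a path is the product of these probabilities. Under that assumption, Dijkstra's algorithm on the costs -log(p) finds the most probable path. The function name mostLikelyPath is hypothetical.

```javascript
// Hypothetical sketch: most probable path, using Dijkstra on -log(p) costs.
function mostLikelyPath(interactions, source, target) {
  // Total outgoing intensity per page, used to turn intensities into probabilities.
  var outgoing = new Map();
  interactions.forEach(function (i) {
    outgoing.set(i.source, (outgoing.get(i.source) || 0) + i.intensity);
  });

  var cost = new Map([[source, 0]]);
  var previous = new Map();
  var done = new Set();

  while (true) {
    // Pick the cheapest unvisited page (linear scan, fine for small graphs).
    var current = null;
    cost.forEach(function (c, page) {
      if (!done.has(page) && (current === null || c < cost.get(current))) {
        current = page;
      }
    });
    if (current === null) { return null; } // the target cannot be reached
    if (current === target) { break; }
    done.add(current);

    interactions.forEach(function (i) {
      if (i.source !== current || i.intensity <= 0) { return; }
      // -log(p) is additive, so the cheapest path is the most probable one.
      var c = cost.get(current) - Math.log(i.intensity / outgoing.get(current));
      if (!cost.has(i.target) || c < cost.get(i.target)) {
        cost.set(i.target, c);
        previous.set(i.target, current);
      }
    });
  }

  // Walk back from the target to the source.
  var path = [target];
  while (path[0] !== source) { path.unshift(previous.get(path[0])); }
  return path;
}
```

Other definitions are possible, for example the path whose weakest link is the strongest; the choice changes which path is reported.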
ARRANGING DATAPOINTS: FILTERS, SEGMENTS & TIMELAPSE
While the force-directed layout will take care of the sorting and clustering of nodes and links,
Constel Analytics proposes different tools to compare several graphs: filters, segments and
timelapse.
Filters and segments operate in a similar way: they can limit the visualization according to certain
criteria. Filters generally represent a single dimension: one can for example display the visualization
only for visitors coming from a particular country, or visitors using a particular technology (browser,
operating system, ...). Segments represent a more "complex" variant of the filters and must be
defined directly in the data source’s interface. They work as a set of metrics and dimensions limiting
the data displayed to a particular population.
Finally, the timelapse feature allows users to visualize the evolution of the traffic over different
periods. Several graphs are generated and the webmaster can compare them in order to find out if
the visitors behave differently during the selected periods.
TECHNOLOGIES
DATA SOURCE
Chapter 2.3 describes a list of Web Analytics tools that could be used as a Data Source for
Constel Analytics. While that chapter focuses mostly on their use as complete and independent Web
Analytics tools, their APIs were also reviewed. Those reviews suggested that, of all the candidates,
Google Analytics would be the best initial choice for two reasons:
It is widely used, thus allowing Constel Analytics to reach a large group of potential users.
It provides powerful and complete APIs, giving access to a vast amount of data with only a
few requests.
Google splits Analytics APIs into four main components (Figure 19):
Figure 19 - Main components of Google Analytics APIs
The Collection component, which focuses on “how to gather data”. APIs provide different
ways of sending interaction data to Google Analytics. Figure 19 shows that the new
analytics.js passes through “Measurement Protocol” like Android and iOS SDKs, while the
older ga.js does not. This “Measurement Protocol” is a standard way of sending data to
Google Analytics: it can be used to send custom data through HTTP requests.
The Configuration component contains all the user settings of a given user’s account. The
APIs allow developers to retrieve the segments or the goals which were configured by the
user.
The Processing component which computes the reports accessible in Google Analytics
regular interface. Obviously, Google does not provide APIs to influence the way this
processing is done.
The Reporting component which takes care of the processed data and how to display them.
The Reporting APIs retrieve the processed data and are able to access almost all of the
dimensions proposed by Google Analytics regular interface.
OAuth is a standard born from the efforts of various American companies which wanted some
interoperability between their applications. The first version was released as a complete protocol in
2010. It led to a second version, quite different as it leaves some flexibility in its implementation
(similar to a Framework rather than to a strict protocol). Google, Twitter, Microsoft and Facebook
are among the most popular companies that use OAuth 2.0.
Each company has its own implementation of OAuth 2.0, which may vary on different dimensions
such as the type of data needed for the application to issue an access token. In the case of Google, it
is important to distinguish between the account hosting the application that accesses
data from Google and the user account whose data will be used by the application. They can be
different: for example, an application can be hosted on a Google developer’s account without being
limited to its data.
Figure 20 shows how Google implemented OAuth 2.0 for their APIs.
Figure 20 - OAuth 2.0 workflow with Google APIs
1. When an application wants to access data from Google, it first requests an access token by
declaring, through the URL of the request, what kind of access it requires.
2. The user of the application is asked to log in and confirm that she wants to let the application
access her data.
3. Google returns an authorization code.
4. The application uses this authorization code to ask for a token.
5. Google returns two tokens: an access token used to access Google APIs and a refresh token
that must be stored in order to ask for a new access token later.
6. The application can now access Google APIs.
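Steps 1 and 4 of this workflow can be sketched as follows. The application itself performs these steps in PHP; the JavaScript below only illustrates the shape of the two requests. The parameter names follow the OAuth 2.0 authorization code flow; the endpoint URL and the read-only Analytics scope are given as commonly documented by Google and should be treated as assumptions to check against the current documentation. The function names are hypothetical.

```javascript
// Assumed endpoint: verify against Google's current OAuth 2.0 documentation.
var AUTH_ENDPOINT = "https://accounts.google.com/o/oauth2/auth";

function encodeQuery(params) {
  return Object.keys(params).map(function (name) {
    return encodeURIComponent(name) + "=" + encodeURIComponent(params[name]);
  }).join("&");
}

// Step 1: URL to which the user is redirected to log in and grant access.
function buildAuthorizationUrl(clientId, redirectUri) {
  return AUTH_ENDPOINT + "?" + encodeQuery({
    response_type: "code", // ask for an authorization code (step 3)
    client_id: clientId,
    redirect_uri: redirectUri,
    scope: "https://www.googleapis.com/auth/analytics.readonly",
    access_type: "offline" // also ask for a refresh token (step 5)
  });
}

// Step 4: body of the POST request that exchanges the code for tokens.
function buildTokenRequestBody(code, clientId, clientSecret, redirectUri) {
  return encodeQuery({
    grant_type: "authorization_code",
    code: code,
    client_id: clientId,
    client_secret: clientSecret,
    redirect_uri: redirectUri
  });
}
```

The client secret must stay on the server: only the authorization URL of step 1 is ever exposed to the browser.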
From a more technical point of view, the last step returns a “client object” to the PHP application.
This object can be used to initialize an analytics service object.
Once authenticated, requesting data from the Google Analytics API is straightforward. The first step is to
create an analytics service object from the client object received at the end of the
authentication. Then, it is possible to select a website affiliated to the user’s account and query the
Reporting APIs for data (see “Available dimensions & metrics” below).
While Google’s resources are accessible for free, the company imposes an access limit – known as
“quota” - which varies depending on the service requested. In the case of Google Analytics, two
limitations are to be taken into account:
The maximum number of results returned by a request cannot exceed 10,000 rows. It is
however possible to overcome this limitation by sending a second, similar query starting
with the 10,001st result.
An application using Google Analytics’ data is limited to 10,000 requests a day. It is possible
to go beyond this limit by paying a fee depending on how many requests are made.
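The row limit can be worked around with a simple paging loop, sketched below. The callback fetchPage is hypothetical: it stands for one call to the reporting API and is assumed to take a 1-based start index and a maximum number of rows, and to return an array of rows.

```javascript
var MAX_RESULTS = 10000; // row limit per request, as stated above

// fetchPage(startIndex, maxResults) is a hypothetical callback that wraps
// one API request and returns an array of rows.
function fetchAllRows(fetchPage) {
  var rows = [];
  var startIndex = 1; // rows are assumed to be numbered from 1
  while (true) {
    var page = fetchPage(startIndex, MAX_RESULTS);
    rows = rows.concat(page);
    if (page.length < MAX_RESULTS) {
      return rows; // a short page means that there is nothing left to fetch
    }
    startIndex += MAX_RESULTS; // next request starts at 10,001, then 20,001, ...
  }
}
```

Each iteration consumes one request of the daily quota, so the loop is worth guarding with an upper bound in a real deployment.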
These two limitations are not a problem for Constel Analytics because, as presented in chapter 3.1.2,
Constel Analytics requires only two main objects from Google Analytics: the list of pages and the list
of interactions. A Force-Directed graph with more than 1,000 nodes and 10,000 interactions is almost
unreadable, so each object fits in a single request and a graph needs about two queries. As more than
5,000 graphs are unlikely to be generated in one day, the limit of 10,000 queries per day will probably
never be exceeded.
Google Analytics provides around 200 dimensions and metrics, sorted by categories [30]. They range
from visitors' information (country, city, returning vs new, number of visits, ...) to site speed (page
loading time, DOM interactive time, ...). Since Reporting APIs allow developers to cross several
dimensions and metrics when requesting them, the amount of accessible information is
considerable. For instance, requesting the "pagePath" dimension with the "pageviews" metric returns a list
of page URIs and the number of times they have been visited. But by crossing "pagePath" and
"previousPagePath", the API will return every unique pair of "pagePath" and "previousPagePath"
with the number of times this connection has been followed.
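The two requests could be described as follows. The sketch assumes the query parameters of the Core Reporting API (ids, start-date, end-date, dimensions, metrics, sort) and the "ga:" prefix on dimension and metric names; it also assumes that the API returns each row as an array of strings and marks the first page of a visit with the previous page "(entrance)". The function names are hypothetical.

```javascript
// Parameters of the two queries needed to build a graph (assumed names).
function buildQueries(profileId, startDate, endDate) {
  var common = {
    ids: "ga:" + profileId,
    "start-date": startDate,
    "end-date": endDate,
    metrics: "ga:pageviews",
    sort: "-ga:pageviews" // most viewed first
  };
  return {
    pages: Object.assign({ dimensions: "ga:pagePath" }, common),
    interactions: Object.assign(
      { dimensions: "ga:previousPagePath,ga:pagePath" }, common)
  };
}

// Convert rows of the form [previousPagePath, pagePath, pageviews].
function rowsToInteractions(rows) {
  return rows
    .map(function (row) {
      return { source: row[0], target: row[1], intensity: parseInt(row[2], 10) };
    })
    // Assumption: "(entrance)" marks the first page of a visit and is not a page.
    .filter(function (interaction) { return interaction.source !== "(entrance)"; });
}
```

The resulting objects carry the source, target and intensity properties of an interaction as defined in the design chapter.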
It is important to note that Google Analytics APIs do not allow developers to retrieve all the visited
pages of every visit. Therefore it is not possible for a third-party application to build a Sankey
diagram similar to Google Analytics' Behavior Flow.
This limitation implies serious risks of misunderstanding the visualization because of a phenomenon
called memory loss: it is not possible to track individuals on the graph and thus distinguishing hub
nodes from popular nodes is much harder.
As an example, let us consider two highly-connected webpages from a photographer: one is the
“Legal information” page, accessible from everywhere as it is part of the footer, and the other is the
main “Photostream” that serves as an index for all the pictures. It can be assumed that visitors will
go on the “Legal information” page to check how they are allowed to use the pictures displayed on
the website and then go back to the previous page. If the website has a large traffic, then different
visitors will get to the “Legal information” page from possibly many different pages. The
“Photostream”, however, will likely act as a real hub, visitors going back to it to select new pictures
to display. The clear difference of use between the two pages will have to be deduced by the users –
assuming that they know the website. A partial solution to this problem is presented in Chapter
5.1.5.
VISUALIZATION
Visualization is the cornerstone of Constel Analytics, therefore it was necessary to find a technology
that would support it. Data-Driven Documents, better known as D3, is an open-source JavaScript
library developed by Mike Bostock, Jeff Heer and Vadim Ogievetsky [31]. It aims at binding data and
DOM elements easily. D3 is regarded as a successor of Protovis, as it was developed by the same
people, and benefits from the enthusiasm of the academic world. Its openness offered the
possibility to analyze the way it works in order to ensure that it would suit the project. D3 is
distributed under the BSD license.
From a technical point of view, D3 produces DOM elements, usually in-line SVG images. These visual
elements are bound to data – mostly JavaScript arrays or JSON - used to generate them. D3 has a
unique way of binding a set of data to a visualization.
We believe that in this case, examples speak more than theory: below is a typical D3 snippet,
followed by a short explanation.
1. var dataArray = [23, 43, 58];
2. d3.select("svg").selectAll(".nodes")
3. .data(dataArray)
4. .enter()
5. .append("circle")
6. .attr("cx", function(d) { return d; }).attr("cy", function(d) { return d; }).attr("r", function(d) { return d / 10; })
7. .attr("fill", "red");
Line 1: An array of data is created. It contains simple integers that will be used to feed the
visualization.
Line 2: D3 selects a set of DOM elements – in this example, all the “.nodes” elements. It is possible
that there exist no “.nodes” elements. This is not a problem for D3, as it simply selects an empty set
in this case.
Line 3: Once the set is selected, D3 binds a set of data to it through the “data()” function. In this
case, the three values of dataArray are bound to the empty “.nodes” set. When doing so, three
selections are created (Figure 21):
The enter selection, which contains all the new data whose keys do not exist in the former
data
The update selection, which contains all the former data whose keys are also found in the new data
The exit selection, which contains all the former data with no corresponding key in the new data
Figure 21 - D3 selections. The Enter selection contains all the new data, the Exit selection contains all the data that aren’t present in the new set
and the Update selection contains all the data that are present in both the former set and the new set.
Obviously, all the data sent from dataArray are new, since there are no data in the “.nodes” set so
far.
Line 4: D3 selects all the new data (from the “enter” set). Figure 22 shows the result of this line in the
Firefox developer console. Three objects are created, with the __data__ attribute corresponding to the
data sent from dataArray.
Figure 22 - enter() selection in console log. The data are bound (__data__), but they aren’t visible on the page yet.
These objects have no visual representation for now, but the following lines handle this...
Line 5: For each new element added to the set, D3 uses the append() function to create “circle”
DOM elements.
Line 6: All the generated circles receive “cx” and “cy” position attributes and an “r” radius attribute,
each derived from their __data__.
Line 7: All the circles generated are filled with red color. Figure 23 shows the final representation of
this short code example: the three dots are visible in the upper part of the window, and the lower
part shows the console results. A SVGCircleElement has been created for each of the data sent from
dataArray.
Figure 23 - Final representation of the example. The three nodes are visible at the top of the window. The log shows that data are bound to visual
elements (SVGCircleElement).
It is possible to update data using the update selection, or to remove data using the exit selection,
just as with the enter selection.
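The way data are split between the three selections can be illustrated without D3. The sketch below is a plain JavaScript illustration of a keyed join, not D3's implementation (by default, D3 joins data by index unless a key function is supplied); the function name join is hypothetical.

```javascript
// Split data into enter, update and exit sets according to their keys.
function join(formerData, newData, key) {
  var formerKeys = new Set(formerData.map(function (d) { return key(d); }));
  var newKeys = new Set(newData.map(function (d) { return key(d); }));
  return {
    // New data whose key did not exist before: elements to create.
    enter: newData.filter(function (d) { return !formerKeys.has(key(d)); }),
    // Data whose key exists on both sides: elements to refresh.
    update: newData.filter(function (d) { return formerKeys.has(key(d)); }),
    // Former data whose key has disappeared: elements to remove.
    exit: formerData.filter(function (d) { return !newKeys.has(key(d)); })
  };
}
```

In the example discussed above, the former set is empty, so the three values of dataArray all end up in the enter set.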
On top of its data-binding feature, D3 offers several layout methods to compute how the
data should be spread across the surface of a DOM element. Amongst those layout methods is an
implementation of the Force-Directed layout that will be used in this project [32]. As D3 is a
renowned academic project, its implementation of the Force-Directed layout was not questioned.
However, it is necessary to understand the way D3 manages the Force-Directed layout in order to
guarantee that the visual result will match the expectations (see footnote 2).
By default, D3 uses three forces to build a Force-Directed graph:
1. A “gravity” force to keep nodes within a sphere, in order to prevent them from being pushed
out of the SVG area.
2. A “charge”, generally used to repulse nodes. Its value can be either negative (for repulsion)
or positive (for attraction). To save computation time, D3 stores the nodes in a quadtree and
uses the Barnes-Hut approximation: the charge is computed node by node for nearby nodes only,
while distant nodes are treated as groups.
3. An attraction force between linked nodes.
2 Full source code can be found here: https://github.com/mbostock/d3/blob/master/src/layout/force.js
D3 also uses a “cooling parameter” known as “alpha”, which progressively decreases toward 0
with each iteration of the Force-Directed layout computation (in D3 version 3, the layout starts with
an alpha of 0.1). The intensity of each iteration is also related to this alpha value, which means that
the first iterations have more influence on the graph than the last ones. D3’s Force-Directed layout
remains active as long as alpha has not reached 0.
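The cooling behaviour can be illustrated with a few lines of plain JavaScript. This is only a minimal sketch, not D3's code: the function name ticksUntilCool is hypothetical, and the starting value (0.1), the decay factor (0.99) and the stop threshold (0.005) are assumptions based on the version 3 source referenced above, to be checked against the version actually in use.

```javascript
// Counts how many iterations a geometric cooling schedule needs before
// alpha falls below a stop threshold.
// Assumed constants (to be verified against the D3 source): start at 0.1,
// multiply by 0.99 at each tick, stop below 0.005.
function ticksUntilCool(alpha, decay, threshold) {
  var ticks = 0;
  while (alpha >= threshold) {
    alpha *= decay; // each tick slightly lowers the "temperature"
    ticks += 1;
  }
  return ticks;
}

var defaultTicks = ticksUntilCool(0.1, 0.99, 0.005);
var earlyStopTicks = ticksUntilCool(0.1, 0.99, 0.01);
console.log(defaultTicks, earlyStopTicks);
```

Under these assumptions, a higher stop threshold ends the computation after fewer iterations, at the price of a slightly less settled graph.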
D3’s implementation of the force-directed layout does not take into account any potential weight of the
edges: they are not even required to have one. This might cause a problem in Constel Analytics,
since strong interactions between pages are supposed to be immediately visible. However, two
options can be used to take hypothetical edge weights into account:
- The “Link distance” method sets a target distance between linked nodes (20 by
default). It can be either a constant (all the links will have the same target distance) or
a function, in which case a different distance can be set for each link. This
“target link distance” is a weak constraint.
- The Force-Directed layout in D3 uses an event called “tick”, which is triggered at each iteration of
the graph’s computation. Defining a function called during this event allows additional
forces to be applied to the graph, amongst other things. For instance, it is possible to add
a specific attraction towards different spots of the graph (see Figure 24 for an example: four
divergent forces attract different groups of nodes in opposite directions).
Figure 24 - "Multi-Force Foci Layout" [33]. Force-directed graph with additional forces to attract points.
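The idea behind Figure 24 can be sketched as a function applied at each tick. This is a minimal illustration under stated assumptions: the names attractToFoci, foci and the group property are hypothetical, and the 0.1 factor is an arbitrary strength chosen for the example.

```javascript
// Moves each node a small step toward the focus of its group.
// `nodes`, `foci` and the `group` property are hypothetical names.
function attractToFoci(nodes, foci, alpha) {
  var k = 0.1 * alpha; // the pull weakens as the layout cools down
  nodes.forEach(function (node) {
    var focus = foci[node.group];
    node.x += (focus.x - node.x) * k;
    node.y += (focus.y - node.y) * k;
  });
  return nodes;
}

// In D3 version 3, this would typically be called from the tick listener:
// force.on("tick", function (e) { attractToFoci(nodes, foci, e.alpha); });
var moved = attractToFoci(
  [{ x: 0, y: 0, group: 0 }, { x: 50, y: 50, group: 1 }],
  [{ x: 100, y: 0 }, { x: 50, y: 50 }],
  0.1
);
console.log(moved);
```

Because the step is proportional to alpha, the additional force fades out together with the rest of the layout instead of fighting it indefinitely.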
PROGRAMMING LANGUAGE & FRAMEWORK
An easy way to make Constel Analytics accessible to as many people as possible is to develop it as a web
application. Thus, most end users do not even have to install any software to use it: they just have
to go to a URL.
Amongst the common programming languages on the web, PHP is a server-side scripting language
born in 1995 and maintained by the PHP Group. It remains extremely widespread, with a market share
of 81.8%, far ahead of the 17.8% of ASP.NET (its main rival) [34]. Being both open-source and
widespread, PHP is the most suitable programming language for Constel Analytics.
Despite its success, PHP suffers from a certain unpopularity within the programming community: its
weak typing and its numerous shortcuts are known to encourage poor programming practices. Thus,
in order to save some development time and impose a standard structure on this project, it was
decided early on that a Framework should be used. There exist several PHP Frameworks and one of
the most popular is Symfony 2 (several prominent PHP applications use Symfony, such as the latest
version of Drupal). This popularity can be explained by its high modularity, its active developers’
community and the support it receives from large companies [35]. Since Symfony also offers
several modules that make the integration of the Google API easier, it definitely seemed like a good
choice for Constel Analytics. Symfony is distributed under the terms of the MIT license.
Like many other Frameworks, Symfony 2 has a Model-View-Controller approach. It extended this
approach in order to achieve more flexibility.
At the root of a Symfony project lies the “app” folder which contains utilities (including a “console”
file which allows managing the application through a terminal), configuration files and the
application’s cache.
A Symfony 2 application is structured into bundles: independent components which are designed to be
removable without making the whole application unusable. Many developers working with Symfony
are swift to share their non-confidential bundles, thus contributing to a wide
range of open-source extensions that facilitate the creation of other applications. Those can be
installed using Composer. Bundles can have their own Model, View and Controller components and
work together thanks to a powerful routing system provided by the Framework.
Symfony 2 uses a specific nomenclature and this project will rely on it. Mainly, controller methods are known
as “actions” and objects that are used globally by all the bundles are called “services” (which belong
to a “Service container” callable from everywhere).
OTHER TECHNOLOGIES
Other technologies were used during the course of this project. As they are not directly related to
the main goals of Constel Analytics, they will be only summarily presented here:
- Twitter Bootstrap: Bootstrap is a popular Front-end Framework for web applications. It is
provided by Twitter under the MIT license, and offers convenient JavaScript and CSS
libraries to style a website. Constel Analytics uses it in order to get a decent-looking UI
with little effort.
- jQuery: A well-known JavaScript library, jQuery is a requirement for Twitter Bootstrap
to work. It provides various handy functions that make JavaScript easier to write and offers a large
number of plug-ins for different purposes. Constel Analytics relies on jQuery to carry out
Ajax requests and also uses the jQuery UI plug-in to display HTML sliders. It is distributed
under an MIT license.
- Twig: Twig is a PHP templating system developed by SensioLabs. One of its main purposes is
to make the development of Views easier by using a more concise syntax than PHP’s. Symfony
uses it by default and thus, this project does too. It uses a BSD license.
- Git: Git is a revision control system that was used during this project to keep track of the
changes. The repository was hosted on Bitbucket and will be opened once the code is
cleaned up to a sufficient level. Git uses the GNU General Public License v2.
SOFTWARE ARCHITECTURE
In the previous chapters, the aims of Constel Analytics were presented, as well as the technological
means that will help materialize them. This chapter focuses on the actual structure of Constel
Analytics.
STRUCTURE OF THE APPLICATION
Below is an updated diagram of Constel Analytics, adding the different technologies presented in
the previous chapter (Figure 25). PHP and Symfony act as a basis for the whole project, Twig and D3 take care
of the view, the cache uses data in JSON format and Google Analytics serves as a Data Source, which
implies using OAuth for the authentication.
Figure 25 – In-details diagram of Constel Analytics with technologies
Most of Constel Analytics’ code is contained within a bundle called “VTWABundle” residing in the
“src” folder of the Symfony 2 application (“VTWA” stands for “Visual tool for web analytics”, the
development name of the project). Figure 26 shows the file tree of the bundle.
Figure 26 - File tree for VTWABundle
This first version of Constel Analytics relies on one single controller, called MainController. This
controller has three different actions:
- indexAction(), which displays the homepage of Constel Analytics.
- interactionsAction(), which displays the main visualization.
- ajaxAction(), which is called by Ajax requests when refreshing the visualization.
interactionsAction() is the central action of Constel Analytics: it processes the GET data containing
the visualization’s parameters and checks whether the required JSON data already exist in the cache. If they
do not, it initializes the AnalyticsRequest service and saves its output. It then renders the final web
page thanks to the View interactions.html.twig. ajaxAction() does a similar job but simply returns
raw JSON data for new visualizations without reloading the page. When no value is provided, the
default parameters for a request are:
Parameter | Description | Default value
$startDate | The day from which results are fetched | 20 days ago
$endDate | The day up to which results are fetched | Today
$segment | The segment applied to the results | None
$filter | The filters applied to the results | None
$selectedFilter | The dimension selected to display at the bottom of the page | Country
$metric | The metric used to display the results | Page views
$maxPages | The maximum number of pages on the graph | 100
$maxInteractions | The maximum number of interactions on the graph | 1000
Table 5 – Default parameters for MainController
The DependencyInjection folder contains two files which take care of the configuration parameters for
Constel Analytics. Namely, Constel Analytics can take up to four parameters:
- json_cache_path is the path to the cache directory. By default, the cache is saved under
the %kernel.root_dir%/var/vtwa/storage directory.
- site_domain_name is the name of the website’s domain, used by Constel Analytics to
redirect to the website while visualizing the different paths associated with a page.
- max_pagelevel is the maximum level up to which Constel Analytics should go when defining
to which group a page belongs. The default value is “1”.
- distinction allows users to define whether the URI or the Title should be used to distinguish
pages. The default value is “title”.
The Resources folder groups many files, including:
- the services’ configuration file
- all the view files
- all the public resources that are sent to the /web/ directory of Symfony 2 (CSS and JavaScript
libraries, for the most part)
At the bottom of the tree, the services directory contains the definition of AnalyticsRequestService,
which handles the connection to Google Analytics and processes the data sent to the main
controller.
DATA PROCESSING
The Data Processing module takes the form of a Symfony 2 service, vtwa.analytics_request, so that
it can be called from everywhere in the application. The service’s definition, found in the file
services.yml, uses HappyR³’s happyr.google.analytics.page_statistics as a parent, since it asks for
the same base arguments, while requiring additional arguments corresponding to three of the four
parameters of the bundle: the path of the cache’s directory, the maximum level for groups and the
distinction used to identify pages.
The definition of the class itself is close to the PageStatisticsService class from HappyR’s Google
Analytics bundle, for it was initially conceived as a simple extension of this class. However, in order to
guarantee more independence in the development of the class, it was decided not to inherit from the
PageStatisticsService class.
Below is a description of the methods defined in AnalyticsRequestService:
Name of the method | Arguments taken | Description | Return value
hasAccessToken() | void | Checks if an access token already exists. Similar to HappyR’s method. | The token or false
saveAccessToken() | void | Saves the generated OAuth 2.0 access token. | void
in_array_r() | $needle: the searched term; $haystack: the array; $parent = null: the previous array | This method is an extension of the original in_array() function of PHP. Looks inside nested arrays recursively for a given $needle. | The key of the array’s parent, the key of the array if there is no parent, or false
parseDimensionOrMetrics() | $haystack: an array of dimensions or metrics | Translates a set of dimensions or metrics into their Google Analytics counterparts. | A string with all the dimensions or metrics, starting with “ga:” (prefix for all Google Analytics dimensions and metrics) and separated by semicolons
saveJson() | Various variables corresponding to the request’s parameters | Saves the processed results into a JSON file in the cache directory. The name of the file is a unique md5 hash built upon all the parameters of the request. | A string, the content of the file
getSegments() | void | Gets all the segments defined in the Google Account. Though this does not change anything inside the code, this request uses the Configuration APIs. | An array of segments
getFilters() | $startDate: the day from which results are fetched; $endDate: the day up to which results are fetched; $dimensions: the dimension selected for filters; $segment: a segment applied to the results; $metric: the metric measured | Gets the first 50 results of the dimension selected for filters, sorted by the metric used to generate the visualization. | An array of filters
getAnalyticsData() | $dimensions, $metric; $maxResults = 10000: the maximum number of results; $sort = null: the sort order of the results; $filters = null: a list of filters; $startDate, $endDate, $segment | This method queries Google Analytics with the specified parameters. | An array of results from Google Analytics
getPagesByPath() | $startDate, $endDate, $segment, $metric, $maxResults, $filter | This method calls getAnalyticsData() to request a list of pages, differentiated by their pagePath. It also computes to which group each page belongs. | An array of pages
getPagesByTitle() | $startDate, $endDate, $segment, $metric, $maxResults, $filter | Similarly to getPagesByPath(), this method requests the list of pages, but differentiated by pageTitle instead. | An array of pages
getPages() | $startDate, $endDate, $segment, $metric, $maxResults, $filter | This method simply forwards the request to getPagesByPath() or getPagesByTitle() depending on the “distinction” configuration of Constel Analytics. | The result of getPagesByPath() or getPagesByTitle()
getInteractions() | $pagesList: an array of pages produced by getPages(); $startDate, $endDate, $segment, $metric, $maxResults, $filter | Generates a list of interactions, based on the list of pages previously produced and sent as an argument. | An array of interactions defined by the source page, the target page and the amount of metric generated for this interaction
Table 6 – Methods in AnalyticsRequestService
³ HappyR (for Happy Recruiting) is a recruiting company which kindly provides some of its Symfony bundles for free. [44]
When MainController calls AnalyticsRequestService, it first asks for the list of the segments through
the getSegments() method and the list of the filters through the getFilters() method. The results are
not cached as they do not require excessive processing time. Then, if no cache file exists for the
required parameters, MainController calls the getPages() method, followed by the getInteractions()
method. It generates an array with those values, encodes it into JSON and calls the saveJson()
method.
Once this whole process is done, MainController renders the results thanks to the
interactions.html.twig View, described in the following subchapter.
VIEW & VISUALIZATION
The file interactions.html.twig extends layout.html.twig, which contains the basis for all the pages of
Constel Analytics (HTML headers, global structure of each page, ...). The User Interface of the main
visualization is split into four levels, as can be seen in Figure 27.
Figure 27 - Constel Analytics UI split in four parts. From top to bottom: global navigation menu (red), parameters (orange), visualization (green) and
advanced tools (blue).
The upper part (red) is occupied by a global navigation menu. Ultimately, it is supposed to lead to
the different sections of Constel Analytics but at the time of this writing, the only working sections
are the Homepage and “Interactions between pages”, the other items working as placeholders.
Below, in orange, are the parameters that require a page reload, which is triggered
by clicking on the “Analyze” button to the right. These options set the period, an optional
segment, the metric and the maximum number of pages and interactions. Those
data are sent to the Main Controller via the GET method.
The green part is the visualization itself. Next to the graph are three boxes that are supposed to be
visible at all times during its exploration because they provide useful information and utilities to the
users: a toolbox (described in a dedicated subchapter below), a “General Information” box which
provides the users with a few references when they need to compare quantitative data and a “Current
page” box which displays all the information of the hovered node (title of the page, value of the
metric, the different URIs related to the page and the list of all its interactions).
The blue part groups all the advanced Ajax tools. All of them are described in detail below.
Since the graph can be redrawn on several occasions, the code which generates it is encapsulated
within a function called generateForceGraph() that can be called at any time. This function requires
two parameters:
- JSON data serving as a basis for the visualization
- The ID of the DOM object that will host the visualization
generateForceGraph() can be split into three phases, described below.
NODES GENERATION
Nodes are represented using simple SVGCircleElements. Their radius is defined according to the
amount of the selected metric for the visualization, ranging from ~2 (smallest node) to 12 (largest node)
according to the following function:
    // "largest" is the highest metric value among all the nodes
    .attr("r", function (d) {
        return 2 + (Math.log(d.value) / Math.log(largest)) * 10;
    })
A logarithmic scale is used because most websites tend to have fewer than half a dozen very
important pages, with the others being far behind, which made it difficult to distinguish small pages
from average pages on a linear scale. With a logarithmic scale, the largest pages still stand out as they tend to be
much more central and connected in the graph. The difference between the result of a linear scale and
the result of a logarithmic scale can be seen in Figure 28.
Figure 28 - Difference between linear and logarithmic scale in nodes’ representation.
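The effect of the logarithmic scale on the radius can be checked with a standalone version of the radius function. This is a sketch only: the name nodeRadius is hypothetical, and the two guards (values below 1, a largest value of 1) are additions that do not appear in the original snippet.

```javascript
// Maps a metric value to a circle radius on a logarithmic scale.
// Values below 1 are clamped to 1 so that the logarithm never goes negative
// (this guard is an addition, not part of the original snippet).
function nodeRadius(value, largest) {
  var v = Math.max(1, value);
  var max = Math.max(2, largest); // avoids dividing by Math.log(1) === 0
  return 2 + (Math.log(v) / Math.log(max)) * 10;
}

var radii = [1, 100, 10000].map(function (v) {
  return nodeRadius(v, 10000);
});
console.log(radii);
```

With a largest page at 10000 views, a page with 100 views receives a radius of about 7, halfway between the smallest and the largest node, whereas a linear scale would leave it barely larger than the smallest one.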
The “mouseover” event is used to display all the information regarding the hovered node. It mainly
goes through a previously produced list of connections between each node (following Christopher
Manning’s example with the Chicago Lobbyists’ visualization [15]) and indicates those which are
connected in the “Current page” box under the “Page connections” panel.
As clicking on nodes can trigger several actions, the “mousedown” event was used to define the
priorities of these actions: PathFinder goes first, followed by the Highlight feature.
LINKS GENERATION
The links are represented using SVGPathElements (or SVG Paths) in order to be more aesthetically
pleasing than the usual straight lines. However, doing so requires an additional step while building the
lists of nodes and edges, since paths require an intermediate point between the start and the end.
Following Mike Bostock’s ideas [36], a new set of links, called bilinks, is generated, containing the
source node, the intermediate node and the target node. For the specific needs of our visualization, the
value of the link is also added to the set.
    var nodes = data.nodes.slice(); // copy, so that intermediate nodes can be appended
    var links = [];                 // regular links, passed to the force layout
    var bilinks = [];               // source / intermediate / target triplets, used for drawing

    data.links.forEach(function (link) {
        var s = nodes[link.source],
            t = nodes[link.target],
            i = {}, // intermediate node, without title
            v = link.value;

        nodes.push(i);
        links.push({source: s, target: i, value: v}, {source: i, target: t, value: v});
        bilinks.push({source: s, intermediate: i, target: t, value: v});
    });
We thus end up with three new sets: one set of nodes with additional entries without titles that will
serve as intermediate points, a set of bilinks and a set of regular links pointing either to the
intermediate point instead of the former target, or from the intermediate point instead of the
source. While the set of regular links is passed to the force-directed layout, the bilinks
are used to display the links once the data are bound. Among the attributes, the path’s opacity
ranges from ~10% to 80% depending on the intensity of the link.
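How a bilink could be turned into an SVG path is sketched below. The path string uses the standard SVG “S” (smooth curve) command with the intermediate node as control point, which is one common way of drawing such curved links. The function names and the linear opacity mapping are assumptions made for this illustration: the report only states the 10% to 80% range.

```javascript
// Builds the SVG path data for one bilink: a curve from the source to the
// target that uses the intermediate node as control point.
function bilinkPath(b) {
  return "M" + b.source.x + "," + b.source.y +
         "S" + b.intermediate.x + "," + b.intermediate.y +
         " " + b.target.x + "," + b.target.y;
}

// Hypothetical opacity mapping: 0.1 for the weakest link, 0.8 for the strongest.
function linkOpacity(value, mostIntense) {
  return 0.1 + 0.7 * (value / mostIntense);
}

var examplePath = bilinkPath({
  source: { x: 0, y: 0 },
  intermediate: { x: 5, y: 10 },
  target: { x: 10, y: 0 },
  value: 42
});
console.log(examplePath, linkOpacity(42, 100));
```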
FORCE-DIRECTED LAYOUT PARAMETERS
The parameters used by the Force-Directed layout in Constel Analytics can be found below.
    var force = self.force = d3.layout.force()
        .linkDistance(function (o) {
            // The stronger the link, the shorter its target distance
            var divisor = 1 + o.value / mostIntense;
            return 10 / divisor;
        })
        .on("tick", tick)
        .charge(c)
        .nodes(nodes)
        .links(links)
        .size([width, height])
        .linkStrength(1)
        .gravity(0.1)
        .start();
In this graph, linkDistance is set as a function. By doing so, D3 goes through all the links of the graph
and computes their target link distance according to an anonymous function. The purpose here is
to bring closer the nodes that are linked by strong interactions. In order to achieve this, a base
distance of 10 is divided by a number ranging from ~1 to 2 according to the
link’s intensity: thus, this function sets the target link distance of the most intense link to 5.
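The arithmetic can be verified with a standalone version of the same formula. The function name targetLinkDistance is hypothetical; the computation itself is the one used in the linkDistance callback above.

```javascript
// Target distance of a link: a base distance of 10 divided by a number
// between 1 (negligible link) and 2 (most intense link of the graph).
function targetLinkDistance(value, mostIntense) {
  var divisor = 1 + value / mostIntense;
  return 10 / divisor;
}

var strongest = targetLinkDistance(100, 100); // most intense link
var weakest = targetLinkDistance(1, 100);     // link a hundred times weaker
console.log(strongest, weakest);
```

The most intense link ends up with a target distance of 5, while a link a hundred times weaker stays close to 10: the whole range of intensities is compressed into a factor of two, which helps explain why the effect remains subtle.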
In a large and heavily interconnected graph, this does not have as much impact as intended because of all
the other constraints. Figure 29 shows how this constraint works on a simple graph: in this example,
each node is connected to the other two. The edge between B and C has a weight of 100 while the
other two have a weight of 1. Two visualizations are overlaid in this figure: in blue, the
visualization with a constant Link Distance (10) and in orange, the visualization with the function
described above. While there are a few differences, they are likely to go unnoticed in a wider graph,
which prompted the development of an additional feature to gather nodes linked by heavy edges
(see “Toolbox” below).
Figure 29- Difference between constant Link Distance (blue) and function Link Distance (orange)
On the tick event, the function tick() is called. This function has three objectives:
- Enabling / disabling the Progress Bar.
- Checking if the CloserRelatedNodes option is enabled (see toolbox below), in which case it
brings linked nodes closer.
- Displaying the computed position of each node and each link once the alpha reaches 0.01 (a
threshold low enough to make sure the position will be precise). At this point, it stabilizes
the graph by stopping the computation.
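The structure of such a tick listener can be sketched as follows. This is not the code of Constel Analytics: makeTick, render and setProgress are hypothetical names, the initial alpha of 0.1 is an assumption, and the force object is only expected to expose a stop() method, as the D3 version 3 layout does. The “Closer” step is left out.

```javascript
// Builds a tick listener that updates a progress indicator, then renders
// once and stops the layout when alpha drops to the threshold.
// `render` and `setProgress` are hypothetical callbacks.
function makeTick(force, render, setProgress, threshold) {
  var startAlpha = 0.1; // assumed initial alpha of the layout
  return function tick(e) {
    // Progress goes from 0 (start) to 1 (threshold reached).
    var progress = Math.min(1, (startAlpha - e.alpha) / (startAlpha - threshold));
    setProgress(progress);
    if (e.alpha <= threshold) {
      render();     // draw nodes and links at their computed positions
      force.stop(); // freeze the graph
    }
  };
}

// Usage with a stand-in for the layout, so that the sketch runs without D3.
var log = [];
var fakeForce = { stop: function () { log.push("stop"); } };
var tick = makeTick(
  fakeForce,
  function () { log.push("render"); },
  function (p) { log.push(p); },
  0.01
);
tick({ alpha: 0.1 });
tick({ alpha: 0.01 });
console.log(log);
```

Rendering only once, at the end, avoids redrawing every SVG element at each iteration, which matters on large graphs.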
The charge is computed according to a heuristic function designed after several measurements with
the development website. The idea was to find the optimal charge for 10 different sets of nodes and
derive a function able to output close results.
    var n = data.nodes.length,
        charge = -Math.round(311 / Math.sqrt(n));
Further evaluations showed that this function does not work well for graphs with a different ratio of
nodes/links, mainly because it does not take into account the number of links, which influences how
the links are spread.
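A few values make the behaviour of this heuristic more concrete. The wrapper name heuristicCharge is hypothetical; the formula is the one given above.

```javascript
// Charge applied to every node, as a function of the number of nodes only.
function heuristicCharge(nodeCount) {
  return -Math.round(311 / Math.sqrt(nodeCount));
}

var charges = [25, 100, 400].map(heuristicCharge);
console.log(charges);
```

For 25, 100 and 400 nodes, the charge is -62, -31 and -16 respectively: multiplying the number of nodes by four halves the repulsion, whatever the number of links.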
Figure 30 - Constel Analytics Toolbox
The toolbox (Figure 30), situated in the upper right corner of the graph, offers several tools to
interact with the main visualization. Those tools do not require new requests to be processed and
act exclusively through JavaScript.
The graph's progress bar is located at the top of the Toolbox. It restarts when the Graph is redrawn
using Ajax requests, and also runs when the Graph is redrawn after a change of zoom (see
below). All this is achieved using Twitter Bootstrap's built-in “Progress bar” feature, which relies on CSS
and HTML.
The search field allows users to find one or several pages within the graph. Each "keyup" event in
the field calls a JavaScript function which checks the URLs and the Title of each node. If the input string is
present, the matching nodes are highlighted with a wide red border (see Figure 31 for an example).
Figure 31 - Example of search. The red, thick borders indicate the two pages matching the desired string.
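The matching step of the search can be sketched as a simple filter. The node shape (a title and a list of URIs), the function name and the case-insensitive comparison are assumptions made for this example.

```javascript
// Returns the nodes whose title or one of whose URIs contains the query.
// The node shape ({ title, uris }) and the case-insensitive comparison
// are assumptions made for this sketch.
function findMatchingNodes(nodes, query) {
  var needle = query.trim().toLowerCase();
  if (needle === "") return []; // an empty field highlights nothing
  return nodes.filter(function (node) {
    var haystack = [node.title].concat(node.uris);
    return haystack.some(function (text) {
      return String(text).toLowerCase().indexOf(needle) !== -1;
    });
  });
}

var pages = [
  { title: "Course offerings", uris: ["/courses", "/courses/index.php"] },
  { title: "Contact", uris: ["/contact"] },
  { title: "Home", uris: ["/", "/index.php"] }
];
var matches = findMatchingNodes(pages, "index");
console.log(matches.map(function (n) { return n.title; }));
```

In the application, the returned nodes would then receive the wide red border, for instance by toggling a CSS class on the corresponding circles.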
The row situated under the progress bar offers four tools:
“Zoom”, which works as a magnifying glass when the mouse cursor passes over the graph.
Clicking somewhere on the graph “freezes” the zoom in place.
Based on a D3 module, this zoom distorts the graph as if it were seen through a fisheye lens.
Figure 32 - Example of zoom over a large graph. The green part is magnified.
When generating the force-directed graph, the displacement of each node is precomputed.
The “mouseover” event is used to track the movement of the cursor over the graph. When a node
enters the radius of the magnifying glass, its new position is determined by a function taking
into account its precomputed position and its proximity to the center of the glass, in order to
generate a progressive zooming effect. In order to preserve readability, the radius of
the nodes remains the same: the zooming effect only changes the distance separating each
pair of nodes. Figure 32 shows this effect on a relatively highly-connected graph.
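The kind of function involved can be sketched as follows. The formula follows the general shape of the circular fisheye distortion published as a D3 plug-in, but it is an assumption that Constel Analytics uses these exact constants; the function name and the parameter values are chosen for the example.

```javascript
// Computes the displayed position of a node under a circular fisheye lens.
// Nodes outside the lens keep their position; nodes inside are pushed away
// from the focus, more strongly when they are close to it.
function fisheyePosition(node, focus, radius, distortion) {
  var dx = node.x - focus.x,
      dy = node.y - focus.y,
      dist = Math.sqrt(dx * dx + dy * dy);
  if (dist === 0 || dist >= radius) return { x: node.x, y: node.y };
  var k0 = Math.exp(distortion);
  k0 = k0 / (k0 - 1) * radius;
  var k1 = distortion / radius;
  // Magnification factor: largest near the focus, equal to 1 on the lens border.
  var k = k0 * (1 - Math.exp(-dist * k1)) / dist * 0.75 + 0.25;
  return { x: focus.x + dx * k, y: focus.y + dy * k };
}

var focus = { x: 0, y: 0 };
var inside = fisheyePosition({ x: 50, y: 0 }, focus, 200, 2);
var outside = fisheyePosition({ x: 300, y: 0 }, focus, 200, 2);
console.log(inside, outside);
```

Only the positions change: as described above, the radius of the circles is left untouched, so that the magnified area spreads the nodes apart without inflating them.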
“Path”, for “PathFinder”, establishes the most likely path between two nodes. The
user just has to click on a source node and then on a target node. If a path exists between the
two, a purple path is drawn while all the other links are hidden (Figure 33).
Figure 33 - Example of path. All the links are hidden in order to keep the path more visible.
Dijkstra's algorithm was used to compute the shortest and most likely path (see Appendix B
for details of implementation). It is important to note that the PathFinder feature does not
highlight a real user’s path. It computes which path would be the most likely out of the
pages’ traffic, but it does not mean that anyone actually took that path.
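The principle can be sketched with a compact version of Dijkstra's algorithm. This is not the implementation of Appendix B: the cost of an edge is assumed here to be the inverse of its value (so that heavy interactions are "short"), links are treated as directed, and all names are hypothetical.

```javascript
// Finds the cheapest path between two pages with Dijkstra's algorithm.
// Assumption: the cost of a link is 1 / value, so that links carrying a lot
// of traffic are preferred. Node identifiers are strings.
function mostLikelyPath(links, sourceId, targetId) {
  var adjacency = {};
  links.forEach(function (l) {
    (adjacency[l.source] = adjacency[l.source] || [])
      .push({ to: l.target, cost: 1 / l.value });
  });

  var dist = {}, previous = {}, visited = {};
  dist[sourceId] = 0;

  while (true) {
    // Pick the unvisited node with the smallest tentative distance.
    var current = null;
    Object.keys(dist).forEach(function (id) {
      if (!visited[id] && (current === null || dist[id] < dist[current])) {
        current = id;
      }
    });
    if (current === null) return null; // target unreachable
    if (current === targetId) break;
    visited[current] = true;

    (adjacency[current] || []).forEach(function (edge) {
      var candidate = dist[current] + edge.cost;
      if (dist[edge.to] === undefined || candidate < dist[edge.to]) {
        dist[edge.to] = candidate;
        previous[edge.to] = current;
      }
    });
  }

  // Walk back from the target to the source.
  var path = [targetId];
  while (path[0] !== sourceId) path.unshift(previous[path[0]]);
  return path;
}

var traffic = [
  { source: "home", target: "courses", value: 100 },
  { source: "courses", target: "contact", value: 50 },
  { source: "home", target: "contact", value: 1 }
];
var likelyPath = mostLikelyPath(traffic, "home", "contact");
console.log(likelyPath);
```

In this example the direct link from "home" to "contact" exists but carries very little traffic, so the path through "courses" is preferred. As stated above, such a result is a plausible route derived from aggregated traffic, not a recorded user journey.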
“Highlight”, enabled by default, allows users to click on a node in order to highlight the
nodes that are linked to it. By holding Shift while clicking, it is possible to select several nodes
and their respective links. Figure 34 shows an example of highlight with two nodes selected.
Figure 34 - Example of Highlight. Two nodes are selected, showing only their relatives.
In order to find which nodes are connected to the targeted node, an array of relationships
between nodes is built during the generation of the graph. Each time a node is selected, a
function reviews all the nodes and checks if they're connected.
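Such an array of relationships is often built as a lookup table keyed by pairs of node identifiers. The sketch below illustrates that idea; the key format and the function names are assumptions, not the code of Constel Analytics.

```javascript
// Builds a lookup table of the existing links, keyed by "source,target".
function buildLinkedIndex(links) {
  var linked = {};
  links.forEach(function (l) {
    linked[l.source + "," + l.target] = true;
  });
  return linked;
}

// Two nodes are related if a link exists in either direction.
function isConnected(linked, a, b) {
  return a === b || !!linked[a + "," + b] || !!linked[b + "," + a];
}

// Reviews all the nodes and keeps those related to the selected one.
function relatedNodes(nodeIds, linked, selected) {
  return nodeIds.filter(function (id) {
    return isConnected(linked, selected, id);
  });
}

var index = buildLinkedIndex([
  { source: "home", target: "courses" },
  { source: "contact", target: "home" }
]);
var related = relatedNodes(["home", "courses", "contact", "news"], index, "home");
console.log(related);
```

Building the table once during the generation of the graph keeps each selection cheap: the check for a given pair of nodes is a simple property lookup.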
“Closer” stands for “Closer related nodes”. Once it is activated, the user is required to click on
the “redraw” button at the bottom of the box. The idea behind this option is to give more
graphical importance to the heavy interactions, overcoming one of the possible shortcomings
of the Force-Directed layout. Figure 35 shows a small Force-Directed graph before the use of
“Closer” (left) and after (right).
At each tick, a function passes through all the nodes and checks which nodes they are linked to.
It then brings them closer by moving the source node towards the target node depending
on the intensity of the interaction. As it stands now, this feature overrides every other constraint
and tends to generate a lot of overlapping nodes. Additional constraints could prevent this from
happening.
Figure 35 - Example of "Closer related nodes" Function. Left is regular function, right shows how “Closer related nodes” impacts the nodes’
position.
The “Minimal Weight” slider hides the least important nodes of the graph (their importance being
relative to the metric selected: pageviews, by default). Its purpose is to make the graph clearer by
removing potentially distracting points. It is also a good way to check which nodes are the
most prominent on the graph, as this is not always obvious at a glance.
The value visible under the slider is the base-10 logarithm of the selected minimal weight. Each time
the slider moves, a function is called. For every node, it checks whether the base-10 logarithm of
the node's value is smaller than the value of the slider, in which case the node is hidden (see the example
in Figure 36).
Figure 36 - Example of Minimal Weight's slider. The smaller nodes are hidden.
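The test applied to each node can be sketched as follows. The function name and the node shape are hypothetical; Math.log10 is the standard base-10 logarithm of recent JavaScript engines.

```javascript
// Keeps the nodes whose base-10 logarithm of the metric value reaches the
// value selected on the slider; the others would be hidden.
function visibleNodes(nodes, sliderValue) {
  return nodes.filter(function (node) {
    return Math.log10(node.value) >= sliderValue;
  });
}

var sample = [
  { title: "Home", value: 5000 },
  { title: "Contact", value: 50 },
  { title: "Legal notice", value: 5 }
];
var kept = visibleNodes(sample, 2); // slider on 2, i.e. at least 100 page views
console.log(kept.map(function (n) { return n.title; }));
```

With the slider set to 2, only the pages with at least 100 page views remain visible, which in this sample leaves the homepage alone.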
The second slider and the “redraw” button are related: they allow the user to redefine the level of zoom of
the whole graph, as opposed to the Fisheye magnifying glass that only magnifies a part of it. The
value visible under the slider corresponds to the absolute value of the graph’s current charge. Once a
new value is selected, a click on the “redraw” button restarts the computation of the graph (a process
that takes as much time as the initial computation).
Filters are presented using bar charts (Figure 37). They are located at the bottom of the page and
displayed according to the user’s choice of dimension when generating the graph, sorted by
descending order of importance. As with the node’s radius, a linear scale would have presented the
risk of having a few long bars for the main values of the selected dimension, the others being left far
behind with indistinguishable changes of size. A logarithmic scale has been used instead so that this
case would not present itself.
Figure 37 – Example of Filters. It can take up to 50 values and displays them with a logarithmic scale.
The user can click on one or several bars in order to select the desired values. Clicking the “Filter the
results” button then initializes an Ajax request to ajaxAction() from the Main Controller. The result is
sent to the generateForceGraph() function.
In this initial release of Constel Analytics, the timelapse feature is limited to four different periods
being displayed as four separate, smaller graphs. In order to improve the readability of the graphs,
hovering over a node with the mouse highlights it in the other three graphs as well.
The timelapse option is present at the bottom of the page. It asks the user to choose which lapse
she wants to display (day, week or month). Once the option is selected, a click on the “Get the data”
button launches an anonymous function. It computes the periods for the four graphs, then proceeds
to launch successive Ajax requests to ajaxAction(). Each result is forwarded to the
generateForceGraph() function, specifying a new DIV each time (Figure 38).
Figure 38 – Example of Timelapse. Four graphs display different periods of time.
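The computation of the four periods can be sketched as follows. The report does not specify how the periods are laid out: this example assumes consecutive periods ending on a given day, a month of 30 days, and hypothetical function names.

```javascript
// Length of each lapse in days (the 30-day month is an assumption).
var DAYS_PER_LAPSE = { day: 1, week: 7, month: 30 };

// Formats a date as YYYY-MM-DD, based on UTC to avoid time zone shifts.
function toIsoDay(date) {
  return date.toISOString().slice(0, 10);
}

// Returns `count` consecutive periods, from the oldest to the most recent,
// the last one ending on `endDate`.
function computePeriods(endDate, lapse, count) {
  var msPerDay = 24 * 60 * 60 * 1000;
  var length = DAYS_PER_LAPSE[lapse];
  var periods = [];
  for (var i = count - 1; i >= 0; i--) {
    var end = new Date(endDate.getTime() - i * length * msPerDay);
    var start = new Date(end.getTime() - (length - 1) * msPerDay);
    periods.push({ start: toIsoDay(start), end: toIsoDay(end) });
  }
  return periods;
}

var periods = computePeriods(new Date(Date.UTC(2014, 3, 28)), "week", 4);
console.log(periods);
```

Each period would then be sent to ajaxAction() as a start date and an end date, and each response drawn in its own DIV.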
SUMMARY
Constel Analytics’s main objective is to facilitate the understanding of interactions between the
pages of a website. The application must also be as loosely coupled to the data source as possible, in
order to be adapted for other sources during further developments. Furthermore, the application
should also be easy to install and deploy.
Supporting these aims, the design of Constel Analytics implies the use of an external Data Source
that is queried by Constel Analytics. The application processes the data, stores it and displays an
interactive Force-Directed graph.
To achieve this design, various technologies have been selected. Google Analytics serves as a Data
source thanks to its popularity and its APIs, the D3 library takes care of the force-directed
visualization and the backend part is managed by PHP and Symfony2.
Concretely, Constel Analytics takes the form of a Symfony 2 bundle providing a service that queries
Google Analytics and a View that displays the data along with advanced tools to explore them.
EVALUATION
“The beginning of thought is in disagreement - not only with others but also with ourselves.”
- Eric Hoffer, moral and social philosopher
This section covers how Constel Analytics was evaluated. The first chapter briefly describes the
website that served as a base during the development and explains the reasons for the choice of
test subjects and methodology. The second chapter focuses on performance: what were the
technical problems met during the tests on different websites? The third and last chapter presents
the insights and the comments made by the evaluators during the interviews: what did they
discover thanks to Constel Analytics, what were the usability issues identified and, globally, what
are their feelings towards the application?
TABLE OF CONTENT
EVALUATION
  Table of Content
  Setting up the evaluation for this project
    Selection of the websites for this evaluation
    Evaluation protocol
  Performances
    Evaluation 1: Unifr “Course offerings”
    Evaluation 2: HEP of Canton Vaud
  Usefulness & Usability
    Evaluation 1: Unifr “Course offerings”
    Evaluation 2: HEP of Canton Vaud
  Evaluation with the ten activities
  Summary
Constel Analytics Evaluation - Setting up the evaluation for this project
SETTING UP THE EVALUATION FOR THIS PROJECT
Once the first version of Constel Analytics was developed, two evaluations were conducted to
challenge both its relevance and its performance. For the most part, the feedback received during the
evaluations will lead to further development, whose issues and implementation are described
in section 5.
SELECTION OF THE WEBSITES FOR THIS EVALUATION
During its development, Constel Analytics was built using data from a personal website with modest
traffic (about 1'500 page views distributed over 300 pages in 20 days). This website, which we will
call the “development website” and which has 6 years of Google Analytics data, is split into four
parts: a blog, a forum, a wiki and an index linking to each of these sections. It was the first website
used for an informal evaluation of Constel Analytics.
Figure 39 - Constel Analytics, development website "L'Organisation Très Secrète" between 2014-02-15 and 2014-03-06
Figure 39 above presents the interactions between pages of the development website over a period
of 20 days. Firstly, it can be observed that the blog (blue) represents a slight majority of the traffic
linked to the main graph. The forum (light orange) receives almost as much traffic. The wiki
(green) remains rather small and the index (bright orange and light blue) works, unsurprisingly, as the
hub of the website.
Nodes in the periphery, particularly numerous, are isolated pages that were visited by users with
little interest in the rest of the site: most of the time, they find the website following a specific
search on the web and do not want to browse beyond the landing page. Logically, the majority of
these points come from the forum, as forums usually host sections dealing with various subjects
which are unrelated to the main point of the website. Finally, some peripheral nodes are related to
each other in the form of sub-graphs (Figure 40) that represent specific and detached topics.
Figure 40 - Example of subgraphs. Users arrived through a search engine, read a few related pages and left the website.
The website’s low traffic makes it possible to isolate a few individual paths (see Figure 41 for an
example). Among others, it is possible to observe that a particular user visited a succession of
thematically related pages attached to the main graph.
Figure 41 - Individual page tracking on a low-traffic website. The branch visible here comes from the main graph.
Figure 42 - Constel Analytics, development website "L'Organisation Très Secrète" from 2013-01-19 to 2014-03-10
Development showed that a graph composed of 1'000 nodes and 10'000 links is
almost unusable (Figure 42): the rendering still makes it possible to distinguish the main trends of the
analyzed website, but the necessary processing time is too high. It systematically exceeds the
30-second maximum script execution time generally allocated by standard PHP installations.
Issues with node positions have been observed as well: because the space in which the graph unfolds
is too limited, two unrelated branches frequently overlap, creating the false
impression that their pages are linked (see Figure 43 for an example). This problem
was particularly prominent during the evaluation of the HEP of Canton Vaud’s website (see
subchapter 4.3.2).
Figure 43 - Overlapping branches with no connection. The blue branch goes through the orange branch, possibly giving the impression that they are
related when they are not.
Despite those issues, it was apparent that Constel Analytics makes it possible to identify the
structure of the development website efficiently and to visualize its traffic quickly. However, this did
not mean that the same would be true for all websites, as this graph was unique to the development
website. Since Constel Analytics is a tool that must adapt to all sites, it was necessary to evaluate it
with other data. This evaluation had two distinct objectives:
- Challenging the relevance of the force-directed visualization with various users
- Testing the flexibility and the performances of Constel Analytics on websites whose size and
purposes are different
Two institutional websites were selected for this evaluation: the new “Course Offerings” section of
the University of Fribourg’s website and the entire website of the HEP of Canton Vaud.
The first choice for this evaluation, the website of the University of Fribourg, has high traffic as
well as various sections (which we will call “sub-sites”) available in different languages. This evaluation
was set up with the collaboration of the University’s webmasters, Nicolas Fretigny and Samuel
Crausaz, and with Serge Keller, scientific collaborator in the Communication and Media service.
Figure 44 - Homepage of "Course Offerings"
The different sub-sites of the University of Fribourg are not bound to the same Google Analytics
account, so it was not possible to evaluate all of them. However, the webmasters
wished to visualize the performance of “Course Offerings”, a new sub-site deployed
online in November 2013 (its homepage is shown in Figure 44). Available in three languages
(French, German and English), this website aims at attracting new students (Swiss or foreign) while
encouraging current Bachelor students to remain in Fribourg for their Master’s studies.
Figure 45 - The page for the "Information systems" course
The most important part of this sub-site is the course offerings: all of the available
study programmes are described, both at Bachelor and Master level. In addition to their informative
nature, the course pages (see Figure 45 for an example) are intended to lead Bachelor students to the
Master course pages, thanks to a tab highlighting this possibility. The link to apply to the
University of Fribourg is also highlighted, facilitating the transition from information research to
registration.
The sub-site also offers two additional sections drawing attention to the benefits of living in
Fribourg:
- “Life in Fribourg”, aimed at foreign students, offers general information about Fribourg and
its geographical, sociological and economic situation. The objective is to present all the
extra-curricular advantages of being in Fribourg as a student.
- “Organisation of Studies” describes the rules of the University. General information about the
Bologna system and its local implementation can be found there, along with administrative
specificities of the University of Fribourg.
Finally, “Course Offerings” has a last section called “More Ressources”, which offers various
links pointing to other important sub-sites of the University of Fribourg (sub-sites of each Faculty,
course calendar, ...).
Since “Course Offerings” is available in three languages, visualizing this structure with Constel
Analytics was one of the main points of interest of this evaluation. This particularity would also
put to the test the different methods that allow users to isolate parts of the graph for further
analysis.
Figure 46 - HEPL Homepage
The second evaluation of Constel Analytics was conducted on the website of the Haute École Pédagogique
of Canton Vaud (also known as “HEPL”, which stands for “Haute École Pédagogique de Lausanne”).
The High School is currently improving its visibility in Switzerland and around the world, notably
thanks to a new website online since November 2011. Its traffic is high, as is its total
number of pages. This evaluation was conducted with the help of Philippe Schmid, Head of Information
Service, Guillaume Vanhulst, Rector of the HEP of Canton Vaud, Barbara Fournier, Head of
Communication, and Bertrand Mure, Project Manager for the HEP Vaud website.
The new website of HEPL has several portals aimed at different audiences. The one that was
analyzed is the main portal, dedicated to all common visitors. It is split into four main sections
(visible in Figure 46):
- “Mission et organisation” describes the aims of the High School and the way it works. Job
offers and international information are also found in this section.
- “Formation” describes the various course offerings of the HEPL.
- “Recherche” contains all the information related to the projects conducted within the HEPL as
well as publications.
- “Actualités et agenda” covers all the events organized by or with the HEPL.
In addition, an extranet offers practical information to the students and HEPL employees.
The current website is known for its high verticality: finding information about a particular
collaborator or a specific publication requires browsing through many pages.
HEPL’s main portal is aimed at various visitors:
- Future students looking for information about the school
- Current students looking for information about their curriculum or daily courses
- Future employees of the High School who want to know what the HEPL could offer
- Current employees who could be interested in events or might want to change their
personal information
- External scientists interested in HEPL’s publications
Therefore the objective of Constel Analytics varies greatly depending on the visitors. The evaluation
was carried out with members of the administration, whose perspective is rather global: their purpose
was simply to observe the traffic and check whether it changes according to several events.
EVALUATION PROTOCOL
Both evaluations were conducted following the same protocol. Since Constel Analytics is a prototype,
it was more relevant to conduct qualitative evaluations rather than quantitative ones. Indeed, the latter
would have required resources beyond the scope of this project, as evidenced by the annual BELIV
workshop (Beyond Time and Errors: Novel Evaluation Methods for Visualization), which regularly
draws attention to the fact that quantitative evaluations for data visualization are hard to set up [37].
Moreover, at this stage of the development, qualitative evaluations can yield more relevant
information.
The protocol of the evaluations was as follows:
1. Initial phase: discussions with the evaluators to present the project and set up the objectives
described above, and access to the Google Analytics data
2. Performance tests and correction of the potential impediments that would have blocked the
rest of the evaluation
3. Familiarization of the evaluators with Constel Analytics
4. Interview and live demonstration of Constel Analytics
After configuring Constel Analytics, the next step is to test whether the application displays the data
correctly. Specifically, the processing speed and data integrity are verified. Bugs and
optimization problems affecting either are corrected so that the rest of the
evaluation can be carried out in acceptable conditions. This step is also an opportunity to make some
preliminary findings on the information that can be retrieved with Constel Analytics.
Once these two steps are completed, Constel Analytics is deployed online so that evaluators can
access it.
The evaluators have several days to familiarize themselves with Constel Analytics, with the help of a
description of the different features.
The interview then takes place, with the evaluators attending together as a panel for the discussion. On this
occasion, the relevance of each group of features of Constel Analytics is tested. Using Constel
Analytics, evaluators aim to find at least three pieces of information for each group of features.
These groups are as follows:
- Main visualization
- Segments
- Filters
- Timelapse
The speed and ease with which the evaluators are able to find this information is assessed
qualitatively. During the interview, the evaluators are also invited to give their opinion freely.
The following sections provide the results obtained when applying the protocol described above.
Constel Analytics Evaluation - Performances
PERFORMANCES
Observations regarding the performances of Constel Analytics were mostly made during step 2 of
the evaluation protocol.
Below is a table summarizing the main characteristics of each use case:
           | Evaluation 1: Unifr “Course Offerings”                 | Evaluation 2: HEP of Canton Vaud
Purpose    | Description of the course offerings for the University | Main institutional portal of the High School
Audiences  | Students of the University                             | Various audiences (students, employees, ...)
Traffic    | Average: ~800 different pages seen in 20 days          | High: over 1’000 different pages seen in 20 days
Deployment | University’s server                                    | Dedicated shared hosting
Table 7 – Main characteristics of use cases
EVALUATION 1: UNIFR “COURSE OFFERINGS”
From a technical point of view, the configuration went smoothly. However, configuring
Constel Analytics requires more steps than a standard web application
found on the Internet. More problematic, the fact that it is necessary to use the Google Cloud
Console (to create a Client ID), Google Analytics (to retrieve the account ID and website tracking
code) and the Symfony configuration file makes this operation quite error-prone.
One main issue was identified while testing Constel Analytics on “Course Offerings”: different pages
have the same title. Namely, a course given in both the Master and Bachelor curricula has the same
title, which made both pages appear as one in the force-directed visualization, a problem that
compromises the integrity of the data. Figure 47 shows that only a few nodes are visible in the initial
visualization, Master and Bachelor pages being merged. In order to overcome this limitation, a
configuration option was implemented to specify whether the title or the URL is used to
identify pages (using "title" by default). The Analytics Service was modified to support both ways of
gathering data. Moreover, using URLs saves one request to Google Analytics as there is no longer any
need to distinguish the titles from the URLs. On the other hand, using URLs instead of
titles creates one node for each search, which makes the graph somewhat less readable. In order
to keep track of pages with similar titles, it was decided that hovering over a node would change the
fill color of all nodes with the same name. Once implemented, this option introduced a number of
bugs related to the selection of nodes (some functions made use of the title to identify nodes). For
instance, the pathfinder feature no longer works.
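The title/URL option described above can be illustrated with a minimal Python sketch. It is not the PHP code of the Analytics Service: the function names, the `node_key` parameter and the sample URLs are assumptions made for illustration only. The sketch shows how keying nodes by title merges the Bachelor and Master pages of a course, how keying by URL keeps them apart, and how nodes sharing a title could be looked up for hover highlighting.

```python
def build_nodes(pages, node_key="title"):
    """Group page records into graph nodes keyed by title or by URL.

    `node_key` is a hypothetical stand-in for the configuration option
    described in the text ("title" by default).
    """
    if node_key not in ("title", "url"):
        raise ValueError("node_key must be 'title' or 'url'")
    nodes = {}
    for page in pages:
        key = page[node_key]
        node = nodes.setdefault(
            key, {"id": key, "title": page["title"], "pageviews": 0}
        )
        node["pageviews"] += page["pageviews"]
    return nodes


def same_title_nodes(nodes, node_id):
    """Return the ids of all nodes sharing the title of `node_id`."""
    title = nodes[node_id]["title"]
    return sorted(n["id"] for n in nodes.values() if n["title"] == title)


# Illustrative data only: these URLs and counts are invented.
pages = [
    {"url": "/en/bachelor/information-systems", "title": "Information Systems", "pageviews": 40},
    {"url": "/en/master/information-systems", "title": "Information Systems", "pageviews": 25},
    {"url": "/en/life-in-fribourg", "title": "Life in Fribourg", "pageviews": 10},
]

by_title = build_nodes(pages, "title")  # Bachelor and Master pages are merged
by_url = build_nodes(pages, "url")      # one node per URL
```

With URL keys, `same_title_nodes(by_url, "/en/master/information-systems")` returns both course pages, which is the set a hover handler would recolor.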
Figure 47 - Initial visualization for "Course Offerings" from 2014-01-29 to 2014-02-17. Only four colors are visible; Master and Bachelor course
pages are merged.
The processing time was deemed reasonable, though the generation of multiple graphs (as with the
timelapse feature) might lead to execution time errors. Ajax requests were adapted to limit the
risk of such situations.
The deployment was supposed to take place on the University’s servers. However, Symfony 2’s
requirements were higher than expected and it was decided to deploy the prototype on the
author’s personal website instead.
EVALUATION 2: HEP OF CANTON VAUD
As in the previous evaluation, configuration for the HEPL website did not pose any technical
problem, although several usability issues were identified (see next chapter for more information).
However, a few more problems were observed:
- The number of visited pages was much higher than anticipated. HEPL had over 1’000 pages
visited in 20 days, which made the visualization excessively slow to display. It could take up
to 3 minutes to display a graph, while most PHP installations limit execution time to 30 seconds.
Besides a few basic optimizations (removing useless loops through the sets of results), a new
option was introduced to let the user decide how many pages should be displayed. By
default, Constel Analytics only displays up to 100 pages and 1’000 interactions, which greatly
reduces the display time without losing the main structure of the website.
- Every node had the same color. The first hierarchical level of the website's URLs is actually
always the same ("cms"). As this might be the case for many other websites, it was decided to
modify the way that groups are determined. Instead of getting this information from Google
Analytics, Constel Analytics now discovers the hierarchical levels by itself, splitting the full
path at each slash (“/”). Working this way carries a risk of having too
many groups for a readable visualization (the HEPL website has more than 40 different
hierarchical levels). A configuration option was thus implemented in order to specify up to
which level Constel Analytics should analyze the URL (1 by default).
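The two options described in this list can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the PHP implementation of Constel Analytics: the function names, the `depth`, `max_pages` and `max_links` parameters, and the sample paths are hypothetical. The first function derives a node's group from the first levels of its URL path; the second keeps the most viewed pages and then the heaviest interactions between them.

```python
def url_group(path, depth=1):
    """Return the group of a page: its first `depth` path segments."""
    segments = [s for s in path.split("/") if s]
    return "/" + "/".join(segments[:depth])


def truncate_graph(page_views, links, max_pages=100, max_links=1000):
    """Keep the most viewed pages, then the heaviest links between them.

    `page_views` maps a page id to its view count; `links` is a list of
    (source, target, weight) tuples. Both structures are assumptions.
    """
    top = sorted(page_views, key=page_views.get, reverse=True)[:max_pages]
    kept = set(top)
    inner = [link for link in links if link[0] in kept and link[1] in kept]
    inner.sort(key=lambda link: link[2], reverse=True)
    return top, inner[:max_links]


# Illustrative data only.
page_views = {"/a": 50, "/b": 30, "/c": 5}
links = [("/a", "/b", 12), ("/b", "/c", 3), ("/a", "/c", 1)]
top_pages, top_links = truncate_graph(page_views, links, max_pages=2, max_links=10)
```

With a first level that is always "cms", a depth of 1 puts every page in the same group, while a depth of 2 separates the sections; deeper levels produce more groups and therefore more colors.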
Once again, deployment was problematic. Following the suggestion of the evaluators, the
deployment was supposed to take place on hosting specifically purchased from a private
service provider. However, the server's configuration was not up to date: a problem between
Symfony 2 and libxml2 prevented the application from working properly. Therefore it was decided
to deploy Constel Analytics on the author's website.
USEFULNESS & USABILITY
Feedback regarding the relevance and the usability of Constel Analytics was mainly collected
during step 4. However, a few observations were also made during step 2. The results are split into
three parts:
- observations made during step 2
- insights acquired by the evaluators during the interview
- remarks and suggestions of the evaluators
EVALUATION 1: UNIFR “COURSE OFFERINGS”
Several usability issues were identified during the performance evaluation.
- It is easy to overlook one of the steps during the application’s configuration. Being forced to use
several administration panels can confuse users and impede the installation of
Constel Analytics.
- The visualization works as if there were three similar parts (one for each language).
Unfortunately, equivalent pages in different languages do not have the same color. For instance,
“Information System” in English and “Systèmes d’information” in French are not related in
any way. It is hard to keep track of the evolution of a single page over the three different
languages.
Below are the observations made by the evaluators during the Interview. Additional screenshots of
this evaluation can be found in Appendix C. The interview lasted one hour and a half, during which
the three evaluators went through all four features of Constel Analytics.
Constel Analytics Evaluation - Usefulness &Usability
MAIN VISUALIZATION
Figure 48 - Main visualization for Eval. 1, 2014-02-15 to 2014-03-06. The three groups are visible: French is at the top left of the graph, German is at
the right of the graph and English is at the bottom of the graph.
Figure 48 shows a possible representation for the “Course Offerings” website. It is limited to 250
pages and 10'000 interactions. Evaluators found the following information, sorted by chronological
order of discovery:
Insight 1: Three main areas
Description: Expected behavior. Unsurprisingly, three areas clearly stand out. They represent the three languages in which the website is offered. The German and the French parts are similar in terms of size and structure while the English part is slightly smaller.
How did they discover this: Simply by looking at the three main groups. The structure was deemed similar because of the two recurrent entry points in each group and the fact that the color groups seem to match. Size was assessed qualitatively.
Ease and speed: This insight was acquired without effort as soon as the graph appeared.

Insight 2: Sections are well divided
Description: Expected behavior. Users of the website tend to explore it section by section. “Life in Fribourg” and “Organisation of studies” are only loosely tied to the rest of the graph, for instance. However, a few branches have connections with “Life in Fribourg”, as there were links pointing to this section in their description.
How did they discover this: It appears that, besides the colors used to represent courses, other colors are clustered into individual branches without many interactions with the others.
Ease and speed: The information was acquired after a few minutes, as the evaluators started to move the mouse cursor over the graph and check details.

Insight 3: Bachelor students do not care about Master courses
Description: Unexpected behavior. The evaluators expected the Bachelor students to read Master course pages after reading the Bachelor course pages they were after, but it seems that this did not happen.
How did they discover this: If the students had behaved in the expected way, there would have been some visible branches in the graph in which Bachelor pages precede Master pages. The evaluators deemed that this was not the case.
Ease and speed: This information was not easy to obtain, as it required the evaluators to understand which colors were attributed to Master and Bachelor pages for each language. Then they had to double-check whether those branches existed or not.
Table 8 – Insights acquired thanks to the Main Visualization, Eval. 1
SEGMENTS
The evaluators selected three segments in order to challenge the interest of this feature.
Insight 4 (segment: New visitors): German-speaking visitors seemed to be more represented
Description: Unexpected behavior. By selecting this segment, evaluators hoped to highlight potential differences between new visitors and returning visitors. At first glance, it seemed that German-speaking visitors are more represented in this category. This unexpected observation can be explained by the fact that the evaluators, being responsible for the implementation of this new site, are French-speaking. They have frequently visited the site, possibly skewing the Returning Visitors segment in favor of French speakers.
How did they discover this: The size of each group was estimated at a glance.
Ease and speed: This insight was acquired quickly, without much effort.

Insight 5 (segment: Tablet and mobile): no difference in the structure of the traffic
Description: Unexpected behavior. Evaluators expected mobile visitors to behave differently from regular visitors. It did not seem so.
How did they discover this: By comparing the non-segmented graph to this one, it appeared that there was no difference in the patterns of the visits.
Ease and speed: This information required the evaluators to check several nodes and assess the global distribution of the graph. This process was arguably quick.

Insight 6 (segment: Referral traffic): English articles lead to other languages’ sections
Description: Unexpected behavior. It seems that people landing on the website through external links tend to switch to another language. A few hypotheses were made regarding this: perhaps some French- and German-speaking visitors reached the Unifr website through an English article and switched to their mother tongue after the first page.
How did they discover this: The evaluators noted that the English part is less distinct and spread across the French and the German groups.
Ease and speed: This information was found by assessing the global distribution of the graph, which was done quickly.
Table 9 – Insights acquired thanks to the Segments, Eval. 1
FILTERS
Various observations were made using the Filters’ bar chart visualization and the filters’ application
on the main visualization.
Figure 49 - Country filters for Eval. 1. It is easy to see that China comes right after the main western countries.
Insight 7 (filter: China): second after Western countries
Description: Unexpected behavior. China comes right after Western countries in terms of visits for the analyzed period.
How did they discover this: The bars were compared (Figure 49).
Ease and speed: It was fast and simple, as all it required was a glance at the order of the bars.

Insight 8 (filter: Switzerland): majority of French-speakers
Description: Expected behavior. It was assumed that a majority of visitors were French-speakers, notably because of the evaluators being French-speakers.
How did they discover this: The sizes of the French parts were compared across the main visualization, the filtered visualization for Switzerland and the filtered visualization for Germany.
Ease and speed: Constel Analytics does not provide quantitative information, but the French group was arguably larger in Switzerland.

Insight 9 (filter: Spain and Italy): more French-speakers than others
Description: Expected behavior. Latin countries have more visitors on the French part of the website than on the German part. In the case of Spain, French is clearly dominant.
How did they discover this: By checking the filtered results for Italy and Spain, it was observed that the French group was larger.
Ease and speed: The German and English parts were clearly smaller.
Table 10 – Insights acquired thanks to the Filters, Eval. 1
Other dimensions were also used: the webmasters were interested to see whether the browsers had an
impact on the way visitors navigate through the website. More specifically, several features were
disabled for older versions of Internet Explorer. Unfortunately, this could not be confirmed.
TIMELAPSE
The timelapse was used in order to detect changes from month to month. Mainly, the webmasters
had introduced a few more pages for late subscriptions to the University and wanted to check
whether those were going to appear.
Figure 50 - 30 days timelapse, Eval 1. It is possible to see slight differences in the size of three groups over the months.
Insight 10 (period: 30 days): the late subscriptions do not appear
Description: Expected behavior. Fribourg has the specificity of accepting late subscriptions. Several pages were introduced to allow these. They did not appear in the graph, reinforcing the idea that late subscriptions are definitely a rare case.
How did they discover this: After using the dynamic search for the name of the pages, it appeared that they were not present.
Ease and speed: A simple search was enough to ascertain this.

Insight 11 (period: 30 days): more French-speakers at the start
Description: Unexpected behavior. The number of French-speaking visitors was proportionally higher during the first months of “Course Offerings”. Once again, this might be explained by the fact that the evaluators were French-speakers themselves.
How did they discover this: The size of the French group decreased over the months, as shown in Figure 50.
Ease and speed: A glance was enough to find this insight.

Insight 12 (period: 30 days): size of the English part varies according to calendars
Description: Expected behavior. The size of the English part varies from month to month, according to other countries’ calendars. It is assumed that foreign students start looking into applying around the end of their semester.
How did they discover this: The size of the English group varied over the months.
Ease and speed: Unlike the previous observation, this one was not clear enough. It was only assumed that the English group varied in size.
Table 11 – Insights acquired thanks to Timelapse, Eval. 1
Figure 51 - It is necessary to scroll down the page in order to view "Current page" information.
The majority of comments that emerged during the interview focused on access to information.
- In the case of a screen with a low resolution or a high zoom level, the box showing
the name of the current node disappears below the bottom edge of the browser window, forcing the
user to scroll down the page (Figure 51). This information should be visible at all times.
- Evaluators repeatedly tried to click on the nodes to freeze the information in the
"current page" box: they thought that clicking on a node would disable the display of
information on mouseover.
- From one visualization to another, the color and the size of the nodes are not consistent.
This is mainly a problem in the case of the timelapse feature, since the only way to identify a
node present on several graphs is to hover over it with the mouse cursor.
- A single page accessed through the same URI with and without a trailing slash appears as two
different nodes on the graph. For example, the URIs “/fr” and “/fr/” are considered as two
different pages. That explains the presence of two main entry points (blue nodes) for each
language.
- The value shown under the “minimal weight” slider, the size of the nodes and the height of
the filters’ bars all use a logarithmic scale, but there is no indication of this anywhere. The
evaluators were misled repeatedly.
- The intensity of the interactions is hardly visible. It is also impossible to see the direction of
an interaction.
- When Google Analytics returns no result, Constel Analytics displays generic PHP error
messages outside of the layout.
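The duplicate entry points caused by trailing slashes could be avoided by normalizing URIs before building the graph. The following Python sketch is one possible approach, not something the prototype is reported to do: the function names and the sample counts are assumptions, and query strings or letter case are deliberately left untouched.

```python
def normalize_uri(uri):
    """Strip trailing slashes so that '/fr' and '/fr/' map to one node.

    The root path '/' is preserved.
    """
    stripped = uri.rstrip("/")
    return stripped or "/"


def merge_page_views(page_views):
    """Sum the view counts of URIs that normalize to the same page."""
    merged = {}
    for uri, views in page_views.items():
        key = normalize_uri(uri)
        merged[key] = merged.get(key, 0) + views
    return merged


# Illustrative counts only.
merged = merge_page_views({"/fr": 10, "/fr/": 4, "/de/": 7})
```

Links would need the same treatment on both their source and their target so that interactions with the two variants are merged as well.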
In addition to the remarks on the current features of Constel Analytics, evaluators suggested adding
a printable version of the main visualization, so that it can be presented statically. The ability to
zoom in on a selected branch in order to see its details was also proposed: the standard visualization
would present only a hundred points, and the user would be free to zoom in on a portion of the
graph in order to show other nodes.
Overall, the evaluators were interested in Constel Analytics and were satisfied with the discoveries
made during the test. However, they noted that the application is meant for exploration and is not
optimal for reporting.
The evaluation led to interesting findings that can, however, be questioned. While it is possible that
some deductions are wrong (which would be a problem of interpretation rather than a problem with
the tool), Constel Analytics itself possibly misled the evaluators. Findings #4 and #8, for instance, are
far from confirmed as the French and the German groups arguably had the same size. Without
quantitative information, Constel Analytics only leaves us with assumptions that cannot be verified.
And while finding #5 stated that there was no difference between mobile/tablet traffic and desktop
traffic, a closer look reveals that the English part is slightly smaller and that the structure of each
language is different (fewer interconnections between the branches). Additional measures must be
provided to prevent such misinterpretations.
Figure 52 - Difference between expected traffic and actual traffic in Eval. 1. If visitors went from different search pages to the Master course's page,
then the node will not be related to the Bachelor course's page as much as we expected during the evaluation.
One of the evaluators’ main questions was whether Bachelor students visit Master’s courses. Finding
#3 stated that it was not the case, because it was assumed that there should have been branches
consisting of three nodes (search for courses -> Bachelor course’s page -> Master course’s page).
However, assuming that some Master students went directly from search to the Master course’s
page, then the page would not look like a separate branch anymore and could be positioned in a
more ambiguous way. Figure 52 illustrates this issue.
This problem could be solved by limiting the number of links coming out of each node, so that only the
most significant interactions are taken into account.
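As an illustration of this idea, the following minimal sketch keeps only the heaviest outgoing links of each page. It is not taken from Constel Analytics: the link shape (`source`, `target`, `weight`), the function name `keepStrongestOutgoingLinks` and the limit of two links per node are assumptions made for the example.

```javascript
// Hypothetical link shape: { source: "pageA", target: "pageB", weight: 12 },
// where weight is the number of visits going from source to target.
function keepStrongestOutgoingLinks(links, maxPerNode) {
  // Group the links by their source page.
  const bySource = new Map();
  for (const link of links) {
    if (!bySource.has(link.source)) bySource.set(link.source, []);
    bySource.get(link.source).push(link);
  }
  // For each source page, keep only the maxPerNode heaviest links.
  const kept = [];
  for (const outgoing of bySource.values()) {
    outgoing.sort((a, b) => b.weight - a.weight);
    kept.push(...outgoing.slice(0, maxPerNode));
  }
  return kept;
}

const sampleLinks = [
  { source: "search", target: "bachelor", weight: 40 },
  { source: "search", target: "master", weight: 5 },
  { source: "search", target: "contact", weight: 12 },
  { source: "bachelor", target: "master", weight: 3 },
];
const prunedLinks = keepStrongestOutgoingLinks(sampleLinks, 2);
console.log(prunedLinks.map((l) => `${l.source} -> ${l.target}`));
```

With this sample, the weak "search -> master" link is dropped, so the Master page stays attached to the Bachelor branch instead of being pulled toward the search page.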
EVALUATION 2: HEP OF CANTON VAUD
It was found that graphs with many interactions tend to spread across wider distances. Being the
most interconnected graph of the evaluation, HEPL’s visualization highlighted the weakness of the
current charge-computation function used by Constel Analytics. As this function does not take the
number of interactions into account, the charge was constantly overestimated and many nodes were
pushed outside the SVG area.
The evaluation lasted one hour with two evaluators going through the four features. Unfortunately,
we were not able to collect information for all the features of Constel Analytics. Instead,
additional suggestions were given by the evaluators (see 4.3.2.3) regarding the evolution of the
application. All the screenshots of this evaluation can be found in Appendix C.
MAIN VISUALIZATION
Figure 53 - Main visualization for Eval. 2, 2014-03-27 to 2014-04-15.
Insight 1: Structure of the traffic matches the expectations (expected behavior)
Description: Globally, the traffic on the website matches what the webdesigners had planned. There are several branches leading to specific Course Offerings and a dedicated part for students’ information (regarding internships, legal aspects, ...).
How did they discover this: By looking at the graph, it is easy to spot the different branches, mainly the Course Offerings (light blue) and the students’ information part (orange and violet nodes at the top of the graph).
Ease and speed: This observation was made as soon as the meaning of the nodes’ colors was identified.

Insight 2: Few interactions between branches (expected behavior)
Description: There is not much traffic between the courses’ descriptions, which was intended as each branch matches specific needs.
How did they discover this: There are no links between the different branches of the graph.
Ease and speed: This observation was made along with the previous one.

Table 12 – Insights acquired thanks to the Main Visualization, Eval. 2
SEGMENTS
Figure 54 - New Visitors segment for Eval. 2. The student's information part is much smaller.
Insight 3 (segment: New Visitors): students’ information part is less important (expected behavior)
Description: The students’ information part is smaller because this section concerns current students of the school, who are supposed to visit the website frequently.
How did they discover this: By comparing the size of the students’ section with the main, non-segmented visualization.
Ease and speed: Discovered quickly as it was the most anticipated change.

Insight 4 (segment: New Visitors): “Accès rapide” is not used to browse through the website
Description: The website has a “quick access” section that is supposed to lead visitors to their destination quickly. It does not seem that those pages are really used. This observation could have been made in the main visualization as well.
How did they discover this: Browsing through the nodes was necessary to figure this out.

Table 13 – Insights acquired thanks to the Segments, Eval. 2
FILTERS
Figure 55 - Filtered results for France for Eval. 2. Most of the visible branches represent successions of pages in the Recherche section.
Insight 5 (filter: France): fewer visitors looking at the course offering (expected behavior)
Description: Foreign visitors tend to visit the publications, seminar reports and events more than the Swiss visitors. It seems logical: the further they live from Switzerland, the less likely they are to be looking for training or a job at HEP of Canton Vaud.
How did they discover this: The branches are different. Course offerings are still present but are slightly smaller, while new branches are formed with events, agendas and other activities.
Ease and speed: It was quickly observed that the structure of the graph was different. Understanding the actual difference required a few searches, as colors had changed and did not match the previous graphs.

Insight 6 (filter: South Africa): not much success despite diplomatic attempts (expected behavior)
Description: The High School negotiated agreements with Burkina Faso and Mozambique during the past years. It does not seem that those agreements brought many visitors from these countries to the website, which can be explained by the lack of infrastructure there.
How did they discover this: By looking at the list of countries: Burkina Faso is at position #36 while Mozambique simply does not appear.
Ease and speed: Observing the bar chart was enough to get this insight.

Table 14 – Insights acquired thanks to the Filters, Eval. 2
TIMELAPSE
Figure 56 - Timelapse (30 days) for Eval. 2. The node with the thick red border on the third graph is Fukushima's event. The white node present on
the first three graphs is the main page of Freinet's event (related nodes are in salmon in the first two graphs and in violet in the third).
Insight 7 (period: 30 days): Freinet’s exhibition generated traffic (expected behavior)
Description: The HEP of Canton Vaud held several events during the last months, among which the visit of the last inhabitant of Fukushima and an exhibition about Freinet. The latter was prepared over several months with a dedicated section of the website, which generated substantial traffic.
How did they discover this: The section dedicated to Freinet’s exhibition appears in the January, February and March graphs. The Fukushima event appears in March.
Ease and speed: A search was enough to locate the pages on all four graphs.

Table 15 – Insights acquired thanks to Timelapse, Eval. 2
As HEP of Canton Vaud does not currently have a specific Web Analytics strategy, the two evaluators
thought of this application as a possible solution for their needs: in this context, they regarded it as
somewhat limited and provided many suggestions regarding the features that could make it more
useful.
This visualization does not currently provide information that could help improve a website. As it
stands now, Constel Analytics shows the general traffic of a website, and as long as the
webdesigners did a decent job, it is unlikely to reveal any weaknesses besides the most obvious
ones.
Amongst the suggestions for improvement, the possibility to zoom in on a group of pages and see
further details about them was the most important. The visualization would act as a visual
dashboard to reach different sections with more details about individual users’ tracking and
population’s composition.
Another point raised by the evaluators is the possibility to analyze several Google Analytics accounts,
as each of the HEP’s portals has a separate ID. This would help see how those portals interact
with each other (we assume the visualization would look similar to the three groups of the
previous evaluation).
Besides these suggestions, some of the UI issues identified during evaluation 1 were also spotted.
The fact that the colors of the groups change and that the layout is recomputed each time the page is
refreshed, rather than being frozen, was deemed problematic.
EVALUATION WITH THE TEN ACTIVITIES
In addition to the remarks made by the evaluators during the interviews, a review of the
visualization can be done in the light of the ten activities discussed in chapter 2.2.
Characterize distribution: Force-directed layouts are best at supporting characterization of
the distribution. It is easy to understand the structure of the traffic with a few glances,
something that would be near-impossible with usual visualizations found in current web
analytics tools. Ironically, this observation was made by the author of this document while
he was conducting the evaluations rather than by the evaluators themselves.
Find anomalies: Anomalies in the context of the traffic of a website can take different
shapes, like a webpage being related to unexpected sections (or the opposite) or several
connections being stronger than anticipated. Spotting such anomalies is possible thanks to
several factors, like the grouping of the pages, which visually indicates to which group each
page belongs or the opacity of the interactions. In practice however, it might be hard to spot
the few nodes that do not belong to a group in a very large graph.
Find extremum: Extrema can be found by merely looking at the graph. The largest
node usually stands out well, as does the most intense interaction. Local extrema are
harder to spot, but this activity is supported by an efficient “minimal weight” slider: this is
one of the most valuable pieces of information given by Constel Analytics, as it clusters pages and
offers a way to retrieve the most and the least important of them in each cluster.
Retrieve value: Retrieving specific values can be hard at first glance, since there is no
indication of page names unless users hover over nodes with their mouse. The search
supports this activity as intended, but a few textual indications could also be added
directly on the graph.
Cluster: Clustering nodes works as well as the characterization of the distribution: the
force-directed layout groups nodes as intended.
Sort: Sorting nodes is not possible with a Force-directed graph as the algorithm computes
the position of each node.
Correlate: Correlation in Constel Analytics would consist of comparisons between several
datasets, such as “did the mobility section of our website gain visibility following the latest
votes?”. With its ability to compare several datasets, Constel Analytics provides basic
tools for correlation. However, evaluations have pointed out that their implementation is
still far from perfect.
Filter: Filtering is possible thanks to the dedicated eponymous feature. However, it could go
further by allowing users to select several dimensions and metrics.
Compute derived value: Computing derived values is hardly possible, as statistics about the
data are lacking. The Force-directed layout does not help here either. Several solutions
are considered in the following section.
Determine range: Determining ranges is not possible by looking at the graph itself.
Computed values in the nearby panels must be consulted in order to get quantitative
information regarding the exact ranges.
SUMMARY
In order to assess Constel Analytics, qualitative evaluations were conducted, as they were
deemed better suited than quantitative evaluations, given the complexity of the latter. Two
institutional websites with different structures were selected to serve as a basis for this
assessment.
The evaluation protocol consisted first in configuring Constel Analytics, testing its performance and
deploying it online. In a second step, evaluators familiarized themselves with the application before
being interviewed during a live demonstration, so as to gather feedback on both the usability and
the relevance of Constel Analytics.
Most of the technical evaluation worked well, though some of the application’s secondary aims were
not reached, as evidenced by the risky deployment that required a backup solution in both cases.
The feedback from the two assessments largely overlaps. Constel Analytics is a relevant and
interesting application whose main goal – providing a visualization of a website’s traffic – is reached.
However, it requires some more work to be really useful to a website. It also suffers from usability
problems and a lack of quantitative information, not to mention the risk of bias induced by the nature
of the visualization currently implemented.
Some of the observations have already led to changes in Constel Analytics. The solutions that have
not been implemented yet are described in the following section.
DISCUSSION
“A person who never made a mistake never tried anything new.” - Albert Einstein, scientist
On the basis of the feedback collected during the evaluations, several ideas for improving
Constel Analytics were discussed in order to make it go beyond its initial limitations. As these ideas
were explored further, several potential implementations emerged. This chapter discusses
those implementations, which take the form of short-term improvements that could be implemented
immediately.
TABLE OF CONTENT
Discussion
  Information management
    Various UI improvements
    Detection of groups
    Categorization of the interactions
  New features
    Sitemap comparison
    Visits typology
    Other sources of data
  Summary
INFORMATION MANAGEMENT
In the feedback from the evaluations, several points were raised regarding the efficiency of the
current visualization, as it appeared to lack information (mostly, but not only, quantitative
information). The solutions discussed below focus on how to improve the Force-directed graph in order
to make it more efficient.
VARIOUS UI IMPROVEMENTS
Figure 57 - Possible future layout for Constel Analytics. The General information box is hidden by default, leaving more room for the tools.
Information about the current page is directly visible next to the cursor.
Evaluations led to the conclusion that the User Interface of Constel Analytics needs several
changes in order to improve its usability. Among others, it is necessary to rethink the layout of the
visualization, as the boxes to the right tend to take too much space and sometimes require scrolling
in order to be visible. Placing this information directly on the graph could be a solution, as it
would not require so much space on the page. Figure 57 shows how much space could be saved by
moving the general information into a lightbox (it is not useful to keep it visible all the time) and using
tooltips to display information about the hovered page.
Other solutions to improve the readability of the graph include:
Displaying only interactions between groups of pages (should the “detection of
communities” point be implemented)
Providing a way to see an interaction’s actual intensity instead of simply guessing it from the
opacity of the link
Comparing datasets by making the main graph evolve rather than showing four different
graphs at once
DETECTION OF GROUPS
The two evaluations identified how problematic it could be for users to clearly detect groups – or
communities - of pages in large Force-Directed graphs. The way pages are currently grouped (using
their path) is undeniably useful since it allows users to “find landmarks” by mapping their site’s
structure on the visualization. However, it does not help make actual groups of pages stand out. In
highly-interconnected graphs, figuring out which are the different groups and how they interact
together can be tricky and may mislead users. By using a community-detection algorithm, it would be
possible to automatically group pages according to their proximity. Those groups could then be
compared using the Timelapse feature.
Figure 58 shows a Force-directed graph with manually detected groups, based on an
additional website that was used during the development of the project (see footnote 4). It presents
the way that groups could be displayed on a graph, by encapsulating them into larger transparent sets
with colorized borders. Naming of those groups could be either automatic (by taking the name of the
most prominent page or by checking similarities between names) or manual.
Figure 58 - "Target render" for groups’ detection. This is the visualization of Nid du Phénix’s website (a defunct store in Fribourg), which had a
forum (light blue at the top of the graph) and several sections of products that were manually highlighted. Ideally, the application should be able to
group the pages automatically and allow users to name the groups by themselves.
Automatic group detection could rely on an algorithm such as the Girvan-Newman algorithm [38].
Because of its relative complexity, group detection should be offered as an option located in the toolbox,
so that users can choose whether to use it.
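To give an idea of what such a detection involves, here is a minimal sketch of the Girvan-Newman approach: repeatedly compute the betweenness of every edge, remove the edge with the highest score, and stop once the graph has split into the desired number of groups. The sketch treats the graph as undirected and unweighted, which is a simplification of the weighted interactions used by Constel Analytics; the function names, the `targetGroups` stopping rule and the sample graph are assumptions made for the example.

```javascript
// Assumption: undirected, unweighted graph; node ids are strings without "|".
const edgeKey = (a, b) => (a < b ? `${a}|${b}` : `${b}|${a}`);

function buildAdjacency(nodes, edges) {
  const adjacency = new Map(nodes.map((n) => [n, new Set()]));
  for (const [a, b] of edges) {
    adjacency.get(a).add(b);
    adjacency.get(b).add(a);
  }
  return adjacency;
}

function connectedComponents(adjacency) {
  const seen = new Set();
  const components = [];
  for (const start of adjacency.keys()) {
    if (seen.has(start)) continue;
    const component = [];
    const stack = [start];
    seen.add(start);
    while (stack.length > 0) {
      const v = stack.pop();
      component.push(v);
      for (const w of adjacency.get(v)) {
        if (!seen.has(w)) {
          seen.add(w);
          stack.push(w);
        }
      }
    }
    components.push(component.sort());
  }
  return components;
}

// Edge betweenness via breadth-first search from every node (Brandes' method).
function edgeBetweenness(adjacency) {
  const score = new Map();
  for (const s of adjacency.keys()) {
    const preds = new Map(), sigma = new Map(), dist = new Map(), delta = new Map();
    for (const v of adjacency.keys()) {
      preds.set(v, []); sigma.set(v, 0); dist.set(v, -1); delta.set(v, 0);
    }
    sigma.set(s, 1);
    dist.set(s, 0);
    const queue = [s];
    for (let i = 0; i < queue.length; i++) {
      const v = queue[i];
      for (const w of adjacency.get(v)) {
        if (dist.get(w) < 0) {
          dist.set(w, dist.get(v) + 1);
          queue.push(w);
        }
        if (dist.get(w) === dist.get(v) + 1) {
          sigma.set(w, sigma.get(w) + sigma.get(v)); // count shortest paths
          preds.get(w).push(v);
        }
      }
    }
    // Accumulate dependencies from the farthest nodes back to the source.
    for (let i = queue.length - 1; i >= 0; i--) {
      const w = queue[i];
      for (const v of preds.get(w)) {
        const c = (sigma.get(v) / sigma.get(w)) * (1 + delta.get(w));
        score.set(edgeKey(v, w), (score.get(edgeKey(v, w)) || 0) + c);
        delta.set(v, delta.get(v) + c);
      }
    }
  }
  return score; // every pair is counted in both directions, which keeps the ranking intact
}

function girvanNewman(nodes, edges, targetGroups) {
  const adjacency = buildAdjacency(nodes, edges);
  let groups = connectedComponents(adjacency);
  while (groups.length < targetGroups) {
    const score = edgeBetweenness(adjacency);
    if (score.size === 0) break; // no edge left to remove
    let best = null;
    for (const entry of score) if (best === null || entry[1] > best[1]) best = entry;
    const [a, b] = best[0].split("|");
    adjacency.get(a).delete(b);
    adjacency.get(b).delete(a);
    groups = connectedComponents(adjacency);
  }
  return groups;
}

// Two triangles of pages joined by a single bridge (c-d).
const pageIds = ["a", "b", "c", "d", "e", "f"];
const pageLinks = [["a", "b"], ["b", "c"], ["a", "c"], ["c", "d"], ["d", "e"], ["e", "f"], ["d", "f"]];
const communities = girvanNewman(pageIds, pageLinks, 2);
console.log(communities);
```

Recomputing the betweenness after every removal is what makes the method costly on large graphs, which supports the idea of exposing it as an optional tool rather than running it by default.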
4 This is the visualization of nid-du-phenix.ch, a now defunct website for an entertainment store in Fribourg that closed in 2009.
As it stands now, Constel Analytics also fails to provide quantitative data that could be used for
weekly or monthly reports. Automatically grouping communities of webpages would allow Constel
Analytics to display their weight (either as a percentage or as a number), which would in turn allow
webmasters to know whether a particular population has grown or not over the last period. This
would help support the “Compute derived values” activity, which was set aside in the first version of
the application.
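A minimal sketch of this derived value is given below, assuming that each page already carries a group label (for instance produced by a community-detection step) and a pageview count. The field names `group` and `pageviews`, the function names and the sample figures are hypothetical.

```javascript
// Hypothetical page shape: { group: "forum", pageviews: 300 }.
function groupShares(pages) {
  const totals = new Map();
  let grandTotal = 0;
  for (const page of pages) {
    totals.set(page.group, (totals.get(page.group) || 0) + page.pageviews);
    grandTotal += page.pageviews;
  }
  // Express each group both as an absolute number and as a percentage.
  const shares = {};
  for (const [group, total] of totals) {
    shares[group] = {
      pageviews: total,
      percent: grandTotal === 0 ? 0 : (100 * total) / grandTotal,
    };
  }
  return shares;
}

// Change of each group's share between two periods, in percentage points.
function shareChange(previousShares, currentShares) {
  const change = {};
  for (const group of Object.keys(currentShares)) {
    const before = previousShares[group] ? previousShares[group].percent : 0;
    change[group] = currentShares[group].percent - before;
  }
  return change;
}

const lastMonth = groupShares([
  { group: "forum", pageviews: 500 },
  { group: "products", pageviews: 500 },
]);
const thisMonth = groupShares([
  { group: "forum", pageviews: 300 },
  { group: "forum", pageviews: 100 },
  { group: "products", pageviews: 500 },
  { group: "products", pageviews: 100 },
]);
const groupEvolution = shareChange(lastMonth, thisMonth);
console.log(thisMonth, groupEvolution);
```

In this sample the forum drops from 50% to 40% of the traffic, which is the kind of figure a weekly or monthly report could quote directly.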
CATEGORIZATION OF THE INTERACTIONS
Network visualizations suffer from several pitfalls, amongst which is the risk of misunderstanding
the nature of a link. Concretely, evaluations have shown that Constel Analytics can mislead its users
because the graph suffers from memory loss and from a lack of information about the nature of the
interactions. While the memory loss issue could only be solved by using a Data Source that allows
querying visits’ individual path, the second can be addressed by performing a small modification on
Constel Analytics. As it stands now, the application categorizes nodes according to their group. It is
possible to do the same with links.
As an example based on the dimensions available in Google Analytics, one could distinguish
four types of interactions, according to the time spent by the user on the destination
page. This would change the second query (see subchapter 3.3.2 for information about the Data
Processing) by requesting a new metric (ga:avgTimeOnPage).
Nature of interaction | Characteristics | Description
Regular interactions | Average or long time on page | Read an article, check an image, write a message, ...
HUB / Navigation interactions | Short time on page | On a gallery website, visitors go back to the photostream to select different pictures to see.
Bounce interactions | Shortest time on page | Clicked on a page, figured out that it was not the desired information, went back to the previous page.
Exit interactions | Exit page (no time recorded) | Leave the website, with or without having found the desired information.
Table 16 – Different types of interactions
The exact definition of what is a short or long time spent on a page depends on the website’s
average (a histogram could be used to display this information as well).
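The categorization of Table 16 could be sketched as follows. Only the ga:avgTimeOnPage metric comes from the text above; the interaction shape, the `isExit` flag and the two thresholds (10% and 50% of the website's average time on page) are assumptions chosen for illustration and would need to be tuned for each website.

```javascript
// Hypothetical interaction shape:
// { source, target, avgTimeOnPage (seconds spent on the target), isExit }
function categorizeInteraction(interaction, siteAverageSeconds, thresholds = { bounce: 0.1, hub: 0.5 }) {
  // Exit interactions have no recorded time on the destination page.
  if (interaction.isExit) return "exit";
  // Compare the time on the destination page with the website's average.
  const ratio = interaction.avgTimeOnPage / siteAverageSeconds;
  if (ratio < thresholds.bounce) return "bounce";
  if (ratio < thresholds.hub) return "hub";
  return "regular";
}

const siteAverage = 60; // seconds, assumed website-wide average time on page
const sampleInteractions = [
  { source: "index", target: "chapter-1", avgTimeOnPage: 90, isExit: false },
  { source: "chapter-1", target: "index", avgTimeOnPage: 20, isExit: false },
  { source: "index", target: "legal", avgTimeOnPage: 3, isExit: false },
  { source: "chapter-9", target: "(exit)", avgTimeOnPage: 0, isExit: true },
];
const interactionCategories = sampleInteractions.map((i) => categorizeInteraction(i, siteAverage));
console.log(interactionCategories);
```

Expressing the thresholds relative to the website's average keeps the rule usable on websites with very different reading times.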
Figure 59 shows how this categorization could work, using different colors for the categories of
interactions, although it would also be possible to represent them with symbols at the end of the link. In
this example, C is clearly a central page linking a few other pages (like an index linking to all the
chapters of a book). D is often used as an exit page (which means that it could be the end of a
book). Thanks to those categories, it is possible to understand how, and likely why, visitors
move from one page to another.
Figure 59 – “Target render” for interactions’ categorization. Black (faded) links represent the bouncing visits, blue links represent the Hub visits, the
white link represents the exit and orange links are regarded as “regular”.
This solution would however make the graph less readable, as there could be up to 4 edges between
each pair of nodes. This problem could be solved by displaying those links only in certain
circumstances and without recomputing the whole graph (a click on a node, a certain level of
zoom, ...).
NEW FEATURES
SITEMAP COMPARISON
One of the remarks that emerged often during the evaluations was that Constel Analytics
does not directly show anomalies in the website’s traffic: these have to be discovered by exploring
the graph. One proposed solution is to compare the actual traffic with the ideal sitemap (or site
index) built by the webdesigner.
Figure 60 - Sketch for a possible "site map" visualization. The thickness of a line represents the traffic: the thicker it is, the more intense the traffic is.
In order for Constel Analytics to display such a map, the application would have to analyze the
whole tree structure of the website and sort all its path levels. It would then be possible to show the
map as a tree layout, whose branches would indicate the intensity of the traffic (either with color,
opacity or size). This visualization would make anomalies in the traffic immediately visible, while the
force-directed layout would offer the advantage of shaping the whole site according to its actual
traffic rather than according to the webdesigner’s intent. Figure 60 shows an example of how such
a visualization could work.
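The analysis of the path levels could be sketched as follows: each page path is split into its levels and inserted into a tree, and every node accumulates the traffic of the pages located at or below it. The page shape (`path`, `pageviews`), the function names and the sample paths are assumptions made for the example; the resulting tree could then be handed to a tree or radial tree layout.

```javascript
// Hypothetical page shape: { path: "/courses/bachelor", pageviews: 120 }.
function buildSitemapTree(pages) {
  const root = { name: "/", pageviews: 0, children: new Map() };
  for (const page of pages) {
    // Split the path into its levels, ignoring empty segments.
    const segments = page.path.split("/").filter((s) => s.length > 0);
    let node = root;
    node.pageviews += page.pageviews;
    for (const segment of segments) {
      if (!node.children.has(segment)) {
        node.children.set(segment, { name: segment, pageviews: 0, children: new Map() });
      }
      node = node.children.get(segment);
      // Each level accumulates the traffic of everything below it.
      node.pageviews += page.pageviews;
    }
  }
  return root;
}

// Print the tree with one indentation step per path level.
function printTree(node, depth = 0) {
  console.log(`${"  ".repeat(depth)}${node.name} (${node.pageviews})`);
  for (const child of node.children.values()) printTree(child, depth + 1);
}

const sitemapTree = buildSitemapTree([
  { path: "/courses", pageviews: 10 },
  { path: "/courses/bachelor", pageviews: 120 },
  { path: "/courses/master", pageviews: 30 },
  { path: "/students/internships", pageviews: 50 },
]);
printTree(sitemapTree);
```

The accumulated `pageviews` value of each node is what a branch's color, opacity or thickness would encode in the tree visualization.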
The use of a radial tree layout could be relevant, as large websites tend to have a lot of pages to
display. But even so, the granularity of such a tree would have to remain relatively low, as a tree
with over 1’000 leaves would be clearly impossible to read.
VISITS TYPOLOGY
During the conception phase of Constel Analytics, it was initially intended to build a visualization
about visits and visitors. The idea behind this was to sort the different types of visitors of a given
website, classifying them into different clusters according to their similarity.
The main question posed by this feature is "what is visitors' similarity?". Visitors can be sorted
according to various dimensions or metrics: most tools already provide information about
demographics, technology and others. An innovative way of sorting users would be to check the
succession of pages they visited and the time they spent on each page. Finding similarities
between the progressions of visitors could indicate different usages (and thus different users) of the
website. The algorithm developed by N. Labroche, M.-J. Lesot and L. Yaffi [28] could be used and
extended to take into account the time spent by the users on each page, as this metric also
reveals their interests [39].
Using a clustering algorithm, groups of visitors could be visualized as a dendrogram, which would allow
users to name the clusters and save them as segments for further use (Figure 61 shows an example of the
visualization with named groups of users).
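As a rough illustration of how such a dendrogram could be obtained, the sketch below measures the similarity of two visits with a plain edit distance over their page sequences and merges the closest visits first (single-linkage agglomerative clustering). This is a generic stand-in, not the algorithm of reference [28], and it ignores the time spent on each page; the function names and the sample visits are assumptions.

```javascript
// Levenshtein edit distance between two visits, each given as an array of page names.
function editDistance(a, b) {
  const rows = a.length + 1;
  const cols = b.length + 1;
  const d = Array.from({ length: rows }, () => new Array(cols).fill(0));
  for (let i = 0; i < rows; i++) d[i][0] = i;
  for (let j = 0; j < cols; j++) d[0][j] = j;
  for (let i = 1; i < rows; i++) {
    for (let j = 1; j < cols; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      d[i][j] = Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost);
    }
  }
  return d[a.length][b.length];
}

// Single-linkage agglomerative clustering; each merge is one level of the dendrogram.
function singleLinkage(visits) {
  let clusters = visits.map((_, index) => [index]);
  const merges = [];
  while (clusters.length > 1) {
    let best = { i: 0, j: 1, distance: Infinity };
    for (let i = 0; i < clusters.length; i++) {
      for (let j = i + 1; j < clusters.length; j++) {
        // Distance between two clusters = distance between their closest members.
        let closest = Infinity;
        for (const p of clusters[i]) {
          for (const q of clusters[j]) {
            closest = Math.min(closest, editDistance(visits[p], visits[q]));
          }
        }
        if (closest < best.distance) best = { i, j, distance: closest };
      }
    }
    const merged = clusters[best.i].concat(clusters[best.j]);
    merges.push({ members: merged, distance: best.distance });
    clusters = clusters.filter((_, k) => k !== best.i && k !== best.j).concat([merged]);
  }
  return merges;
}

const sampleVisits = [
  ["home", "courses", "bachelor"],
  ["home", "courses", "master"],
  ["home", "events", "freinet", "gallery"],
];
const visitMerges = singleLinkage(sampleVisits);
console.log(visitMerges);
```

The two course-oriented visits are merged first (distance 1), and the event-oriented visit only joins them at distance 3; cutting the dendrogram between those levels would yield two candidate segments.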
Implementing this feature is not possible with the free version of Google Analytics as its APIs cannot
retrieve all the visits’ steps. Using alternatives like Piwik would be a solution.
Figure 61 - Sketch for Visitors' typology. 7 visitors are distributed at various levels, depending on the similarities of their visits’ steps.
OTHER SOURCES OF DATA
While Google Analytics is the most widely used web analytics tool, it is not the only one and several
companies might want to use Constel Analytics with other sources of data. In particular, companies
which regard data privacy as a sensitive issue are usually reluctant to use external tools and would
rather go for self-hosted systems like Piwik or Open Web Analytics.
Piwik being one of the most popular choices after Google Analytics, we studied the possibility of
using it as a Data Source. It benefits from its openness and from the fact that the data are stored
locally, which can be an advantage for performance as much as for privacy. Moreover, Piwik provides
complete APIs to access its data either internally or through RESTful services, which makes it eligible
as a source of data for Constel Analytics. Finally, Piwik offers far more details about individual
visits, which opens new perspectives for the Visits’ typology feature.
However, a few limitations prevent the integration of Piwik in Constel Analytics. Namely, two main
problems were identified:
The current APIs do not provide a convenient way to list the interactions between pages, as
it would require one request for each page. From a performance perspective, this is not
possible considering the fact that those requests are likely to be handled through the RESTful
APIs.
Piwik can sort pages according to their URL or according to their Titles, but there are no links
between the two.
In order to integrate Piwik, two options have been considered. The first one is to access Piwik’s
database directly and use the raw data. The second one is to contribute to Piwik and extend its API.
Both solutions should be analyzed in further work.
SUMMARY
The evaluations led to discussions about how to solve the different issues identified. Solutions were
classified in two groups, one covering the improvements of the current visualization and the other
including future features not necessarily related to the force-directed graph.
Regarding the main visualization, grouping pages more efficiently through the use of a
community-detection algorithm has been considered. In order to give more information about the links,
using the average time spent on the destination page could be useful, as it would help in particular to
distinguish hub pages from highly connected pages that require more time to read. Some
redesign suggestions were also made regarding the UI, after it was observed that it was not suited
to all of the users’ tasks.
As for the new features, the possibility to display the website’s map as a tree has been presented.
Webmasters would find anomalies in the traffic of their website more easily than with the force-directed
graph. The possibility to use different sources of data has been discussed, showing that some APIs
would not suit the force-directed visualizations easily. Finally, the possibility to cluster visitors
instead of pages has been addressed: while the algorithms exist and would likely work, the current
Google Analytics APIs do not provide sufficient information to categorize users’ behavior according
to their pattern.
CONCLUSION
“If all difficulties were known at the outset of a long journey, most of us would never start out at all”.
- Dan Rather, journalist and former news anchor
This section serves as a conclusion to this project. A first chapter summarizes how this master thesis
was conducted. Then, a synthesis is proposed, summarizing the limitations and the weaknesses of Constel
Analytics while putting in perspective the valuable insights acquired during its development and
evaluation. Finally, we present the possible evolutions of Constel Analytics: beyond
the short-term improvements described in the previous section, what could be the future of this
application?
TABLE OF CONTENT
Conclusion
  Wrap up
  Conclusion
  Future work
WRAP UP
This master thesis worked as a step-by-step discovery of web analytics, starting with theoretical
discussions, going through an overview and a classification of current tools and proposing a new
take on how collected data can be reported.
First, the definition of web analytics was discussed, providing a quick overview of the four parts
of the web analytics management lifecycle. We also stressed the importance of the evolution of web
analytics: it started with the simple need to measure the audience of a website and grew into a whole
new dimension, tracking users’ behaviors from many different devices over many different channels. As
we hypothesized that data visualization in current web analytics tools could be a problem, a
description of data visualization was also given, explaining how a visualization could be evaluated.
We reviewed four different data visualizations that could be used in the context of web analytics,
and mentioned others that are already in use. Then, we proposed a survey of 20 web
analytics tools, from which a taxonomy emerged to categorize them according to several criteria,
such as their pricing and targets.
The application developed during this project aimed at making interactions between the pages
of a website easier to see. We based it on Google Analytics’ data and used open-source technologies
for its development: PHP and Symfony as a base and D3 for its visualization. In order to evaluate the
application, two Swiss institutional websites were selected. We challenged Constel Analytics in two
steps: first by testing its performance and flexibility by deploying it online, then by evaluating its
relevance and usability with evaluators. Finally, we provided some ideas on how the application
could be improved.
CONCLUSION
This project started with the assumption that visualizations in current public web analytics tools
were simplistic and could be replaced by more advanced versions. One of the first contributions of
this thesis was to provide a classification of web analytics tools based on their targets and nature, as
we believed this was the main difference between them.
Among the findings of this study, a particular fact has been highlighted: many academic Web
Analytics and Web Mining projects use network diagrams to represent their data while none of the
existing commercial solutions do, as if advanced visualizations did not suit public software. Constel
Analytics filled the gap: network diagrams with a force-directed layout add clear value to the
massive amounts of data already collected by commercial Web Analytics solutions. While
we believe that extensive evaluations would be necessary to declare that this is undeniably true, the
present study made a clear step in this direction.
We hope that the evolution of Web Analytics in the years to come will take interest in those
advanced visualizations and algorithms, using them in order to make analyses both understandable
for more people and more efficient at predicting users’ behavior.
FUTURE WORK
Regarding academic work in the field of web analytics, we believe that this thesis might serve
as a base for further evaluations of the relevance of network diagrams and other unusual
visualizations in web analytics.
As for the application developed during this thesis, it is bound to be improved. Evaluations and
discussions led to an observation: Constel Analytics can evolve in two different directions.
The first one is the direction of visual analytics, a discipline that seeks to support analytic reasoning
through automatically processed data presented in an interactive way [40]. By using a force-directed
graph, Constel Analytics already made the first steps in that direction: it can be regarded as a
computationally enhanced visualization as defined in Bertini and Lalanne’s taxonomy [41]. The
short-term improvements proposed in section 5 would turn it into an actual visual analytics tool that
would not only display information, but also process data in order to detect patterns in visitors’
habits or in the way pages interact with each other.
The second direction is communication. Evaluators noted that Constel Analytics was not able to
provide clear insights that anyone could understand instantly. Allowing the
application to produce advanced reports that highlight complex findings while remaining clear for
everyone is a necessary step if it is to be integrated into, or developed as, public software. To this
end, the use of infographics is considered.
Beyond those two perspectives of evolution, the scope of web analytics is also bound to grow in
the next few years, as explained in chapter 2.1.3. Eventually, Constel Analytics will have to handle
several types of channels, showing not only how webpages interact with each other, but rather
how the components of the whole web ecosystem work together. This will raise several concerns about
the display of different types of nodes on a single visualization while keeping relevant derived values.
REFERENCES
1. W3Techs. Usage Statistics and Market Share of traffic analysis tools for websites, April 2014.
W3Techs. [Online] April 2014. [Cited: April 11, 2014.]
http://w3techs.com/technologies/overview/traffic_analysis/all.
2. Ivory, Melody Y. and Hearst, Marti A. The State of the Art in Automating Usability Evaluation of
User Interfaces. ACM Computing Surveys, Vol. 33, No. 4. Berkeley : ACM, Inc, 2001, pp. 470-516.
3. Wikipedia. Web Analytics - Wikipedia, the free encyclopedia. Wikipedia. [Online] November 3,
2013. [Cited: April 8, 2014.] https://en.wikipedia.org/wiki/Web_analytics.
4. Web Analytics Association. Web Analytics Definitions. Digital Analytics Association. [Online]
September 22, 2008. [Cited: April 24, 2014.]
http://www.digitalanalyticsassociation.org/Files/PDF_standards/WebAnalyticsDefinitions.pdf.
5. Bottégal, Brice. Définition et histoire du Web analytics. Blog web analytics. [Online] January 7,
2013. [Cited: April 10, 2014.] http://www.bricebottegal.com/definition-histoire-web-
analytics/#more-226.
6. Visual Analytics: How Much Visualization and How Much Analytics? Keim, Daniel A., Mansmann,
Florian and Thomas, Jim. s.l. : ACM, 2009, ACM SIGKDD Explorations Newsletter , pp. 5-8.
7. Low-Level Components of Analytic Activity in Information Visualization. Amar, Robert, Eagan,
James and Stasko, John. Minneapolis : IEEE, 2005, 2005 IEEE Symposium on Information
Visualization (InfoVis 2005), p. 15.
8. Wikipedia. Dendrogram - Wikipedia, the free encyclopedia. Wikipedia, the free encyclopedia.
[Online] November 22, 2013. [Cited: April 8, 2014.] https://en.wikipedia.org/wiki/Dendrogram.
9. —. Hierarchical clustering - Wikipedia, the free encyclopedia. Wikipedia, the free encyclopedia.
[Online] April 2, 2014. [Cited: April 8, 2014.] https://en.wikipedia.org/wiki/Hierarchical_clustering.
10. Teoh, Soon Tee. A Study on Multiple Views for Tree Visualization. [book auth.] San Jose State
University. Visualization and Data Analysis. San Jose : SPIE Press, 2007, pp. 99-110.
11. Bostock, Mike. Cluster Dendrogram. mbostock’s blocks. [Online] December 19, 2012. [Cited:
April 08, 2014.] http://bl.ocks.org/mbostock/4339607.
12. Skau, Drew. Battle of the Charts: Why Cartesian Wins Against Radial. visual.ly. [Online] June 7,
2012. [Cited: April 8, 2014.] http://blog.visual.ly/cartesian-vs-radial-charts/.
13. Dragos, Sanda Maria and Beldean, Alina Mihaela. Analysing Web Usage with Force-Directed
Graphs. Studia Univ. Babes-Bolyai, Informatica. 2013, pp. 75-85.
14. Wikipedia. Force-directed graph drawing - Wikipedia, the free encyclopedia. Wikipedia, the free
encyclopedia. [Online] January 16, 2014. [Cited: April 10, 2014.]
https://en.wikipedia.org/wiki/Force-based_algorithms.
15. Manning, Christopher. Chicago Lobbyists Force-Directed Graph Visualization.
christophermanning's blocks. [Online] bl.ocks.org, January 17, 2012. [Cited: April 9, 2014.]
http://bl.ocks.org/christophermanning/1625629.
16. Bostock, Mike. Among the Oscar Contenders, a Host of Connections - Interactive Feature -
NYTimes.com. The New York Times. [Online] The New York Times, February 20, 2013. [Cited: April 9,
2014.] http://www.nytimes.com/interactive/2013/02/20/movies/among-the-oscar-contenders-a-
host-of-connections.html?_r=0.
17. IEA. IEA Sankey Diagram. IEA. [Online] [Cited: April 10, 2014.]
18. Wikipedia. Treemapping - Wikipedia, the free encyclopedia. Wikipedia, the free encyclopedia.
[Online] March 29, 2014. [Cited: April 8, 2014.] https://en.wikipedia.org/wiki/Treemapping.
19. McCandless, David, et al. Billion Dollar-o-Gram 2013 | Information is Beautiful. Information is
Beautiful. [Online] April 2013. [Cited: April 9, 2014.]
http://www.informationisbeautiful.net/visualizations/billion-dollar-o-gram-2013/.
20. Ruuskanen, Aleksi. IBM Coremetrics - Web Analytics and Digital Marketing Optimization.
Helsinki : Helsinki Metropolia University of Applied Sciences, 2013.
21. Prescott, LeeAnn. Enterprise Web Analytics Tools: The Marketer's Guide. [ed.] Karen Burka and
Claire Schoen. s.l. : Digital Marketing Depot, 2012.
22. Demers, Tom. Web Analytics Softwares Comparison: Identifying The Right Web Analytics Tool
For Your Business. Search Engine Land. [Online] May 10, 2013. [Cited: April 10, 2014.]
http://searchengineland.com/web-analytics-software-comparison-identifying-the-right-web-
analytics-tools-for-your-business-149373.
23. Singh, Brijendra and Singh, Hemant Kumar. Web Data Mining Research: A Survey. Proceedings
of IEEE International Conference on Computational Intelligence and Computing Research.
Coimbatore : s.n., 2010.
24. Carta, Tonia, Paternò, Fabio and de Santana, Vagner Figuerêdo. Web Usability Probe: A Tool for
Supporting Remote Usability Evaluation of Web Sites. Human-Computer Interaction – INTERACT
2011 . Lisbon : IFIP International Federation for Information Processing, 2011, pp. 349-357.
25. Burzacca, Paolo and Paternò, Fabio. Remote Usability Evaluation of Mobile Web Applications.
[book auth.] Masaaki Kurosu. Human-Computer Interaction, Part I. Berlin Heidelberg : Springer-
Verlag, 2013, pp. 241-248.
26. Hong, Jason I, et al. WebQuilt: A Proxy-based Approach to Remote Web Usability Testing. ACM
Transactions on Information Systems, Vol. 19, No. 3, July 2001. Berkeley : ACM, Inc., 2001, pp. 263–
285.
27. Jalali, Mehrdad, et al. WebPUM: A Web-based recommendation system to predict user's future
movements. Expert Systems with Applications. s.l. : Elsevier, 2010, pp. 6201-6212.
28. Labroche, Nicolas, Lesot, Marie-Jeanne and Yaffi, Lionel. A New Web Usage Mining and
Visualization Tool. 19th IEEE International Conference on Tools with Artificial Intelligence. Patras :
IEEE Computer Society, 2007.
29. Ovsyannykov, Igor. Computer Science and Marketing: A Developing Relationship.
Inspirationfeed. [Online] Inspirationfeed, August 10, 2012. [Cited: April 10, 2014.]
http://inspirationfeed.com/articles/technology-articles/computer-science-and-marketing-a-
developing-relationship/.
30. Google. Dimensions & Metrics Reference - Google Analytics. Google Developers. [Online] [Cited:
April 20, 2014.] https://developers.google.com/analytics/devguides/reporting/core/dimsmets.
31. Wikipedia. D3.js - Wikipedia, the free encyclopedia. Wikipedia, the free encyclopedia. [Online]
April 4, 2014. [Cited: April 13, 2014.] https://en.wikipedia.org/wiki/Data-Driven_Documents.
32. Bostock, Mike. d3/src/layout/force.js - mbostock/d3. GitHub. [Online] [Cited: April 20, 2014.]
https://github.com/mbostock/d3/blob/master/src/layout/force.js.
33. —. Multi-Foci Force Layout. mbostock’s blocks. [Online] June 12, 2011. [Cited: April 15, 2014.]
http://bl.ocks.org/mbostock/1021841.
34. W3Techs. Usage Statistics and Market Share of Server-side Programming Languages for
Websites, April 2014. W3Techs. [Online] April 2014. [Cited: April 10, 2014.]
http://w3techs.com/technologies/overview/programming_language/all.
35. SensioLabs. High Performance PHP Framework for Web Development - Symfony. Symfony.
[Online] [Cited: April 13, 2014.] http://symfony.com/.
36. Bostock, Mike. Curved Links. mbostock’s blocks. [Online] bl.ocks.org, January 23, 2013. [Cited:
April 9, 2014.] http://bl.ocks.org/mbostock/4600693.
37. BELIV. BELIV 2014. BELIV. [Online] [Cited: April 11, 2014.] http://beliv.cs.univie.ac.at/.
38. Girvan, Michelle and Newman, Mark. Community structure in social and biological networks.
Proceedings of the National Academy of Sciences of the United States of America. [Online] April 6,
2002. [Cited: April 11, 2014.] http://www.pnas.org/content/99/12/7821.full.
39. Web Systems Evaluation on Users' Behaviour Modelling. Robal, Tarmo and Kalja, Ahto. Tallinn :
s.n., 2009, Volume 187: Databases and Information Systems V, pp. 41-52.
40. Wikipedia. Visual analytics - Wikipedia, the free encyclopedia. Wikipedia, the free encyclopedia.
[Online] April 10, 2014. [Cited: April 12, 2014.] https://en.wikipedia.org/wiki/Visual_analytics.
41. Surveying the complementary role of automatic data analysis and visualization in knowledge
discovery. Bertini, Enrico and Lalanne, Denis. s.l. : ACM, 2009, Proceedings of the ACM SIGKDD
Workshop on Visual Analytics and Knowledge Discovery: Integrating Automated Analysis with
Interactive Exploration, pp. 12-20.
42. Bostock, Mike. Sankey Diagram. Mike Bostock. [Online] May 22, 2012. [Cited: April 10, 2014.]
http://bost.ocks.org/mike/sankey/.
43. UC Berkeley Visualization Lab. Flare. Data Visualization for the Web. [Online] [Cited: April 20,
2014.] http://flare.prefuse.org/.
44. Happy Recruiting. Maximizing Global Potential. Happy Recruiting. [Online] [Cited: April 20,
2014.] http://happyrecruiting.se.
APPENDIX A “BILLION DOLLAR-O-GRAM VISUALIZATION”
APPENDIX B “DIJKSTRA’S IMPLEMENTATION”
Dijkstra’s algorithm has been implemented for Constel Analytics. Its main function, hungryPath(),
initially used a heuristic “hungry” function, but it was found that using Dijkstra’s algorithm would not
cause any significant performance loss. hungryPath() is called by the findPath() function, which sets
up the proper parameters and styles the nodes and links if a path is found.
// node (the D3 selection of nodes), isTarget() and isConnected() are defined
// elsewhere in the application. border and processed are keyed by node name;
// each entry is [node, cumulated weight, parent entry].
function hungryPath(end, border, processed, bestPath) {
    var bestWeightOfTheRound = 0;
    var bestOfTheRound = null;

    // Find the unprocessed node that is best reached from the current border
    node.each(function (d) {
        if (!processed[d.name]) {
            for (var key in border) {
                var connectionValue = isTarget(border[key][0], d);
                if (connectionValue) {
                    var linkWeight = connectionValue / border[key][0].value * border[key][1];
                    if (linkWeight > bestWeightOfTheRound) {
                        bestWeightOfTheRound = linkWeight;
                        bestOfTheRound = [d, bestWeightOfTheRound, border[key]];
                    }
                }
            }
        }
    });
    if (!bestOfTheRound) {
        // No unprocessed node can be reached anymore: there is no path
        return false;
    }

    // Add the new winner to the lists, with its own weight
    // (not the last weight computed in the loop above)
    var bestIndex = bestOfTheRound[0].name;
    border[bestIndex] = bestOfTheRound;
    processed[bestIndex] = bestOfTheRound;

    // Remove previous border node if not connected to any unprocessed node anymore.
    // isConnected() is assumed here to take two nodes.
    var previousBorderNode = bestOfTheRound[2][0];
    var isBorderStillConnected = false;
    node.each(function (d) {
        if (!processed[d.name] && isConnected(d, previousBorderNode)) {
            isBorderStillConnected = true;
        }
    });
    if (!isBorderStillConnected) {
        delete border[previousBorderNode.name];
    }

    if (bestOfTheRound[0] == end) {
        // Destination reached: rebuild the path as [nodes, links]
        return rollDownPath(bestOfTheRound, [[], []]);
    }
    return hungryPath(end, border, processed, bestPath);
}

// Walks back from the last entry to the start, collecting nodes and links
function rollDownPath(lastNode, finalArray) {
    finalArray[0].push(lastNode[0]);
    if (lastNode[2]) {
        finalArray[1].push([lastNode[2][0].name, lastNode[0].name]);
        return rollDownPath(lastNode[2], finalArray);
    }
    return finalArray;
}
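To make the principle easier to follow outside of the application, the sketch below applies the
same idea to a plain object of transition counts: at each round, the unprocessed page with the
highest cumulated probability is selected, which is Dijkstra’s algorithm with products of
probabilities instead of sums of distances. The function mostProbablePath, the links structure and
the use of the share of outgoing traffic as link probability are assumptions made for this
illustration, not the code shipped with Constel Analytics.

```javascript
// Hypothetical, self-contained sketch of a most-probable-path search.
// links maps a source page to an object {targetPage: transitionCount}.
function mostProbablePath(links, start, end) {
    var best = Object.create(null);   // best[page]: highest probability found so far
    var parent = Object.create(null); // parent[page]: previous page on that path
    var done = Object.create(null);   // pages whose probability is final
    best[start] = 1;

    while (true) {
        // Select the unprocessed page with the highest known probability.
        var current = null;
        for (var page in best) {
            if (!done[page] && (current === null || best[page] > best[current])) {
                current = page;
            }
        }
        if (current === null) return null; // end cannot be reached
        if (current === end) break;
        done[current] = true;

        // Assumption: link probability = share of the source's outgoing traffic.
        var out = links[current] || {};
        var total = 0;
        for (var t in out) total += out[t];
        if (total === 0) continue;

        for (var target in out) {
            var p = best[current] * out[target] / total;
            if (!(target in best) || p > best[target]) {
                best[target] = p;
                parent[target] = current;
            }
        }
    }

    // Walk back from end to start to rebuild the path.
    var path = [];
    for (var n = end; n !== undefined; n = parent[n]) path.unshift(n);
    return { path: path, probability: best[end] };
}

var result = mostProbablePath({
    home: { about: 30, news: 70 },
    news: { contact: 50, home: 50 },
    about: { contact: 30 }
}, 'home', 'contact');
```

Because every link probability is at most 1, a page selected in one round can never be improved
later, which is the same argument that makes Dijkstra’s algorithm valid for non-negative distances.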
APPENDIX C “EVALUATIONS’ SCREENSHOTS”
EVALUATION 1
Figure 62 - New Visitors visualization, Eval. 1. The German part (bottom) was deemed slightly larger than in the previous graph.
Figure 63 - Mobile and Tablet traffic for Eval.1. The overall graph was considered similar. It must be noted, though, that the English part is smaller
and that the German part is more interconnected than the French part.
Figure 64 - Referral traffic for Eval.1. The English part (right) is less distinct than in previous graphs.
Figure 65 - Swiss traffic for Eval.1. The French part (right) was regarded as larger than the German part. It seems clear that it stretches more than
the German part, but it is hard to say if the total size and the total number of its nodes are really higher.
Figure 66 - Spain (left) and Italy (right) filtered results, Eval. 1. The largest group for Spain is French, while Italy has two arguably similar groups
(English and French).
EVALUATION 2
Figure 67 – Country filters for Eval.2. Mozambique does not appear, Burkina Faso is far behind other countries.
APPENDIX D “INSTALLATION GUIDE”
REQUIREMENTS
The following are required to run Constel Analytics:
• Apache server, with URL rewriting allowed
• PHP 5.3.3 or above, with safe mode disabled (“unsafe mode”)
• JSON, PHP-XML and ctype installed
• The date.timezone setting configured in php.ini
• A Google account to host the application and a Google account connected to a Google
Analytics account (it can be the same account)
• (optional) Composer installed in order to update the bundles
• (optional) Git installed in order to take advantage of versioning
Constel Analytics requires only one third-party bundle in addition to those included in the
Standard Installation of Symfony: HappyR Google Analytics bundle, version 1.2.2. Please check that
none of your application’s bundles will conflict with this particular version.
DOWNLOADING THE SOURCE CODE
Constel Analytics is hosted on Bitbucket using Git. The URL of the project is as follows (it can be used
to clone the repository with HTTPS):
https://bitbucket.org/pvanhulst/constel-analytics.git
The source code includes a whole Symfony 2 application. At the time of this writing, Constel
Analytics has not been released as an independent bundle, but this is planned for the months following
its publication.
DEPLOYMENT & CONFIGURATION
Deployment takes place in a standard fashion for a Symfony 2 project: it is recommended to update
bundles, perform the database migration and clean up the cache. Since Constel Analytics does not
require a database for now, cleaning up the cache and updating the bundles is enough. The
application can be transferred by any means, from a simple FTP transfer to more sophisticated
tools such as source control.
The configuration is divided into two stages: the hosting account configuration and the website
configuration.
HOSTING ACCOUNT CONFIGURATION
During this step of the configuration, Constel Analytics will be linked to a Google Account
responsible for its operation. Quotas will be recorded on this account.
1. Access Google Cloud Console (https://cloud.google.com/console)
2. Create a new project
3. Create a new Client ID. The redirect URI is
http(s)://your.path.to.symfony/admin/google/analytics/oauth2callback
The “config.yml” file of Symfony (/app/config directory) contains the essential configuration
information for the application. At this step of the configuration, the bundle “happy_r_google_api”
must be configured as follows:
happy_r_google_api:
    application_name: APPLICATION NAME FROM THE GOOGLE CLOUD CONSOLE
    oauth2_client_id: CLIENT ID FROM THE GOOGLE CLOUD CONSOLE
    oauth2_client_secret: CLIENT SECRET FROM THE GOOGLE CLOUD CONSOLE
    oauth2_redirect_uri: AUTHORIZED REDIRECT URI FROM THE GOOGLE CLOUD CONSOLE
    developer_key: API KEY FOR BROWSER APPLICATIONS FROM THE GOOGLE CLOUD CONSOLE
    site_name: WEBSITE’S NAME (UNRELATED TO GOOGLE CLOUD CONSOLE)
WEBSITE CONFIGURATION
During the second step of the configuration, a Google Analytics profile will be linked in order to
access its data. This step has to be repeated each time users want to see the data of another website.
In the long run, Constel Analytics will be adapted so that it can keep several websites in memory.
In “config.yml”, configure the “happy_r_google_analytics” bundle as follows:
happy_r_google_analytics:
    profile_id: PROFILE ID
    token_file_path: %kernel.root_dir%/var/happyr/storage/
    host: HOST
    tracker_id: TRACKER ID
    tracker_enabled: false
The information can be found on the main page of Google Analytics, as shown in Figure 68.
Figure 68 - Where to find the necessary information to configure HappyR Google Analytics
Once “happy_r_google_analytics” has been configured, configure “vtwa” as follows:
vtwa:
    json_cache_path: %kernel.root_dir%/var/vtwa/storage/
    site_name: DOMAIN NAME OF THE ANALYZED WEBSITE
    max_pagelevel: 3
    distinction: path
json_cache_path, max_pagelevel and distinction have default values. Refer to section 3.3 for more
information about those parameters.
Finally, empty the /app/cache directory, the HappyR Google Analytics storage directory and Constel
Analytics’ cache directory (/app/var/vtwa/storage by default).
APPENDIX E “SOURCE CODE”
The enclosed CD contains the source code of Constel Analytics with the latest commit at the time of
this writing (9db8684, 2014-04-16). For an up-to-date version please check the repository (see
previous appendix).