
Constel Analytics - unifr.ch · 2014-06-10 · Constel Analytics Introduction - Motivation - XI - ACKNOWLEDGEMENTS Many people were involved in this project and I’d like to thank


CONSTEL ANALYTICS A VISUAL TOOL FOR WEB ANALYTICS

STUDENT: PIERRE VANHULST

SUPERVISOR: DENIS LALANNE

APRIL 2014

DEPARTMENT OF INFORMATICS – MASTER PROJECT REPORT

Département d’Informatique - Departement für Informatik • Université de Fribourg - Universität Freiburg • Boulevard de Pérolles 90 • 1700 Fribourg • Switzerland

phone +41 (26) 300 84 65 fax +41 (26) 300 97 31 [email protected] http://diuf.unifr.ch


ABSTRACT

Web Analytics is a significant tool in the hands of webmasters: knowing who your users are and how they fare on your website helps with both marketing and usability. Still, the basic assumption of this master thesis is that web analytics remains largely underexploited, notably because of the way data are displayed: it is hard to extract meaning from the sheer amount of collected data. While many users know how to gather basic statistics, far fewer use modern systems to their full extent.

The first part of this project focuses on defining what Web Analytics is and, in the light of this definition, what the closely related field of research known as “Data Visualization” is. Based on this, we reviewed a set of existing web analytics solutions in order to assess whether their visualizations could be more informative and make complex relationships easier to understand. As we concluded that there was room for improvement, we selected one of the reviewed solutions, Google Analytics, as a base and used its data to build an application.

This application, called “Constel Analytics”, had one main goal: displaying the interactions between the pages of a website in a comprehensible way. Among its side objectives, it also aimed to be easy for most webmasters to adopt, adapt and deploy. Built on open-source technologies, Constel Analytics was evaluated on two Swiss institutional websites. While the results of the evaluation highlighted a few limitations, the visualization was successful overall and remains open to further development.

KEYWORDS

Web Analytics, Visualizations, Visual Analytics, Communication, Google Analytics, Piwik, Data-Driven Documents, Force-directed layout


TABLE OF CONTENT

INTRODUCTION
  Table of content
  Motivation
  Aims
  Methods
  Structure of this document

BACKGROUND
  Table of content
  Web Analytics
    Definition
    The scope of web analytics
  Data visualization
    Definition & aims
    Example of data Visualizations
  Visual Tools for Web Analytics
    Definition & taxonomy
    Technological survey
    Wrap-up of the review
  Summary

CONSTEL ANALYTICS
  Table of content
  Aims and design
    Aims
    Design
  Technologies
    Data source
    Visualization
    Programming language & Framework
    Other technologies
  Software Architecture
    Structure of the application
    Data Processing
    View & visualization
  Summary

EVALUATION
  Table of content
  Setting up the evaluation for this project
    Selection of the websites for this evaluation
    Evaluation protocol
  Performances
    Evaluation 1: Unifr “Course offerings”
    Evaluation 2: HEP of Canton Vaud
  Usefulness & Usability
    Evaluation 1: Unifr “Course offerings”
    Evaluation 2: HEP of Canton Vaud
  Evaluation with the ten activities
  Summary

DISCUSSION
  Table of content
  Information management
    Various UI improvements
    Detection of groups
    Categorization of the interactions
  New features
    Sitemap comparison
    Visits typology
    Other sources of data
  Summary

CONCLUSION
  Table of content
  Wrap up
  Conclusion
  Future work

REFERENCES

APPENDIX A “BILLION DOLLAR-O-GRAM VISUALIZATION”

APPENDIX B “DIJKSTRA’S IMPLEMENTATION”

APPENDIX C “EVALUATIONS’ SCREENSHOTS”
  Evaluation 1
  Evaluation 2

APPENDIX D “INSTALLATION GUIDE”
  Requirements
  Downloading the source code
  Deployment & configuration
    Hosting account configuration
    Website configuration

APPENDIX E “SOURCE CODE”


LIST OF FIGURES

Figure 1 - Web Analytics Management Lifecycle
Figure 2 - "Le digital, un écosystème complexe"
Figure 3 - 10 main activities of Data Visualization's users
Figure 4 - "Cartesian layout" of a Dendrogram
Figure 5 - "Radial layout" of a Dendrogram
Figure 6 - "Chicago Lobbyists Force-Directed Graph Visualization", by Christopher Manning
Figure 7 - "Among the Oscar Contenders, a Host of Connections" by Mike Bostock
Figure 8 - IEA's Sankey Diagram
Figure 9 - "Map of Napoleon's Russian Campaign", by Charles Minard
Figure 10 - "Treemap of votes by county, state and locally predominant recipient in the US Presidential Elections of 2012"
Figure 11 - "The Billion Dollar-o-Gram 2013"
Figure 12 - Example of Funnel from Adobe Marketing Cloud (formerly Omniture) in 2009
Figure 13 - Heatmap by AT Internet
Figure 14 - Example of "Heatmap" from Crazy Egg
Figure 15 - Yandex.Metrica "link's map"
Figure 16 - Final categorization of the Web Analytics tool
Figure 17 - High-level diagram of Constel Analytics
Figure 18 - In-details diagram of Constel Analytics
Figure 19 - Main components of Google Analytics APIs
Figure 20 - OAuth 2.0 workflow with Google APIs
Figure 21 - D3 selections
Figure 22 - enter() selection in console log
Figure 23 - Final representation of the example
Figure 24 - "Multi-Force Foci Layout"
Figure 25 - In-details diagram of Constel Analytics with technologies
Figure 26 - File tree for VTWABundle
Figure 27 - Constel Analytics UI split in four parts
Figure 28 - Difference between linear and logarithmic scale in nodes’ representation
Figure 29 - Difference between constant Link Distance (blue) and function Link Distance (orange)
Figure 30 - Constel Analytics Toolbox
Figure 31 - Example of search
Figure 32 - Example of zoom over a large graph
Figure 33 - Example of path
Figure 34 - Example of Highlight
Figure 35 - Example of "Closer related nodes" Function
Figure 36 - Example of Minimal Weight's slider
Figure 37 - Example of Filters
Figure 38 - Example of Timelapse
Figure 39 - Constel Analytics, development website "L'Organisation Très Secrète" between 2014-02-15 and 2014-03-06
Figure 40 - Example of subgraphs
Figure 41 - Individual page tracking on a low-traffic website
Figure 42 - Constel Analytics, development website "L'Organisation Très Secrète" from 2013-01-19 to 2014-03-10
Figure 43 - Overlapping branches with no connection
Figure 44 - Homepage of "Course Offerings"
Figure 45 - The page for the "Information systems" course
Figure 46 - HEPL Homepage
Figure 47 - Initial visualization for "Course Offerings" from 2014-01-29 to 2014-02-17
Figure 48 - Main visualization for Eval. 1, 2014-02-15 to 2014-03-06
Figure 49 - Country filters for Eval. 1
Figure 50 - 30 days timelapse, Eval. 1
Figure 51 - It is necessary to scroll down the page in order to view "Current page" information
Figure 52 - Difference between expected traffic and actual traffic in Eval. 1
Figure 53 - Main visualization for Eval. 2, 2014-03-27 to 2014-04-15
Figure 54 - New Visitors segment for Eval. 2
Figure 55 - Filtered results for France for Eval. 2
Figure 56 - Timelapse (30 days) for Eval. 2
Figure 57 - Possible future layout for Constel Analytics
Figure 58 - "Target render" for groups’ detection
Figure 59 - “Target render” for interactions’ categorization
Figure 60 - Sketch for a possible "site map" visualization
Figure 61 - Sketch for Visitors' typology
Figure 62 - New Visitors visualization, Eval. 1
Figure 63 - Mobile and Tablet traffic for Eval. 1
Figure 64 - Referral traffic for Eval. 1
Figure 65 - Swiss traffic for Eval. 1
Figure 66 - Spain (left) and Italy (right) filtered results, Eval. 1
Figure 67 - Country filters for Eval. 2
Figure 68 - Where to find the necessary information to configure HappyR Google Analytics


LIST OF TABLES

Table 1 - Web Analytics tools sorted by Business Model (April 2014)
Table 2 - Web Analytics tools sorted by Features (April 2014)
Table 3 - Academic Web Analytics / Web Usage Mining tools sorted by Visualizations (April 2014)
Table 4 - Symbols used by Constel Analytics
Table 5 - Default parameters for MainController
Table 6 - Methods in AnalyticsRequestService
Table 7 - Main characteristics of use cases
Table 8 - Insights acquired thanks to the Main Visualization, Eval. 1
Table 9 - Insights acquired thanks to the Segments, Eval. 1
Table 10 - Insights acquired thanks to the Filters, Eval. 1
Table 11 - Insights acquired thanks to Timelapse, Eval. 1
Table 12 - Insights acquired thanks to the Main Visualization, Eval. 2
Table 13 - Insights acquired thanks to the Segments, Eval. 2
Table 14 - Insights acquired thanks to the Filters, Eval. 2
Table 15 - Insights acquired thanks to Timelapse, Eval. 2
Table 16 - Different types of interactions


ACKNOWLEDGEMENTS

Many people were involved in this project and I’d like to thank them in these few lines.

The first is of course Dr. Denis Lalanne, who personally supported this thesis for half a year. His insights and knowledge were essential to its success. Along with him, the whole DIVA group gave me the opportunity to study web analytics: thanks for this, and sorry for all the days I spent squatting at different desks!

The staff of the University of Fribourg and of the HEP of Canton Vaud were also highly involved in this project. Thanks to Nicolas Fretigny, Samuel Crausaz, Serge Keller, Barbara Fournier and Bertrand Mure for their time, their enthusiasm and the relevance of their remarks during the evaluation. Thanks to Mr. Philippe Schmid for his advice regarding the quality of Constel Analytics’ code: while not all of his ideas had been taken into account at the time of writing, they will be very soon.

Some other people also deserve my thanks.

• Brice Bottégal, for his concise and precise article that guided my later research into large web analytics tools. It is not easy for a student to actually test all of them, so having a starting point like this one was really helpful.

• David, for his professional proofreading. Thanks for the extra hours of work!

• Francesca, for our study of Facebook Insights (which didn’t make it into this document). I hope you learned as much as I did.

• Mathias, for his wonderful algorithms. Geez, you always have an answer for everything!

• Nicole, for her quality proofreading. Sorry for the eyes, I hope my English mishtakesh didn’t hurt too much!

Finally, it goes without saying that my family deserves all my gratitude as well. Supporting me through all these months (or even years) was not always a piece of cake. Thank you!


INTRODUCTION

“The price of light is less than the cost of darkness.” - Arthur C. Nielsen, Market Researcher & Founder of ACNielsen

TABLE OF CONTENT

INTRODUCTION
  Table of content
  Motivation
  Aims
  Methods
  Structure of this document


MOTIVATION

More than half of modern websites use Google Analytics [1]. From marketing teams to usability engineers [2], everyone knows about the benefits of web analytics and wants to understand how their website is used and by whom. But while gathering basic data is easy, getting more relevant insights, such as how many users drop off during a registration process, is a much harder task. Most webmasters are simply unable to find the information or to produce advanced reports.

For the most part, the problem is not so much getting the data, as web analytics tools already gather formidable amounts of them, but how to display them. This is where visualization comes in: displaying data in a comprehensible way is a new and dynamic field of research, and many of its findings have already been put into practice. Until now, however, most web analytics tools seem to be confined to simplistic visualizations.

AIMS

This thesis has several objectives. We sorted them hierarchically: one global objective, split into sub-objectives.

• Discovering the impact of data visualization on web analytics

  This is the main objective of this thesis: to understand how much a visualization can influence the understanding of complex information. It implies several sub-objectives, such as clearly defining web analytics and data visualization.

  o Reviewing the features and visualizations of current web analytics tools

    In order to challenge our main assumption, it is necessary to review the visualizations currently used in web analytics tools and to discuss how efficient they are and what information they can display.

  o Implementing a transversal application to visualize complex information

    Finally, this thesis aims to evaluate the relevance of new, more advanced data visualizations in the context of web analytics. An application will be developed with two ideas in mind: displaying unusual and relevant information, and being able to adapt to various websites. The actual efficiency of this visualization will be assessed during an evaluation.

METHODS

In order to achieve these objectives, the definitions of web analytics and data visualization were compiled from the scientific literature. A survey of current web analytics tools was conducted on this basis, trying to challenge the main assumption of this document while also taking an interest in their general features. Finally, a web application was developed, using one of the advanced visualizations as its core feature, with data collected by other web analytics tools and accessed through their APIs. A qualitative evaluation on two institutional websites was conducted to understand what gains this application could bring and how efficiently it could convey complex information that regular web analytics software would not be able to show.

STRUCTURE OF THIS DOCUMENT

This document contains four main sections. The first one, Background, introduces the definitions and explains our survey of current web analytics tools. The second one presents Constel Analytics, the application developed in the context of this Master thesis. The third section gives an evaluation of this application, testing its performance as well as its usability and relevance. Finally, several short-term improvements to the initial version of Constel Analytics, based on the feedback of the evaluators, are offered in the Discussions section, the last part of this document.

BACKGROUND

“Nothing in all the world is more dangerous than sincere ignorance and conscientious stupidity”. - Martin Luther King, Jr.

This chapter covers the background aspects of this project. Definitions and examples of the different fields of research involved will be presented, with the aim of giving a better understanding of what this project is about, as the later sections of this document refer to those definitions extensively.

In the light of those definitions, a selection of visual tools for web analytics will be presented. Finally, several conclusions will be drawn regarding the efficiency of their visualizations.

TABLE OF CONTENTS

BACKGROUND
  Table of contents
  Web Analytics
    Definition
    The scope of web analytics
  Data visualization
    Definition & aims
    Example of data Visualizations
  Visual Tools for Web Analytics
    Definition & taxonomy
    Technological survey
    Wrap-up of the review
  Summary

WEB ANALYTICS

DEFINITION

Web Analytics is a field of research that built its own nomenclature to define some of its key aspects. This chapter will only cover the essential points.

Wikipedia defines Web Analytics in the following way:

“Web analytics is the measurement, collection, analysis and reporting of internet data for purposes

of understanding and optimizing web usage.” [3]

This definition, initially given by the Web Analytics Association (WAA) [4], makes a distinction between the methods (“measurement, collection, analysis and reporting of internet data”) and the aims (“understanding and optimizing web usage”). While the aims seem clear, the methods require further discussion.

Figure 1 - Web Analytics Management Lifecycle. Measurement establishes the data that need to be collected, collection gathers the raw data from the website, analysis computes dimensions out of the collected data and reporting displays the data in a comprehensible way.

Figure 1 illustrates the Web analytics management lifecycle within an entity (company,

association, ...). Each step influences the next one, the whole cycle working iteratively.

The measurement step is about establishing a Web analytics strategy prior to data collection: a

company should know which data are relevant and how they will support the understanding and the

optimization of its web usage.

The second step of a web analysis – collection – aims at gathering raw data that will serve as a basis

for the reports. This collection can take multiple forms which are usually sorted into two categories:

log file analysis and page tagging.

Most servers produce log files which contain the history of all the received connections (as well as the error reports produced because of those connections). Each type of server has its own way of recording connections and most of them allow users to configure and customize it. Log file analysis is an interesting option because it does not require any code or modification on the analyzed website, the data being taken directly from the server. However, it suffers from several limitations: the collected data are relatively poor (no information about the users’ technologies, nor about actions that do not generate requests to the server); caching distorts the counts (cached pages are served without a request to the server and thus are not taken into account); and the data are not always accessible to the final users (as in the case of shared hosting, for instance).
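To make the idea of log file analysis more concrete, here is a minimal sketch in Python. It assumes the server writes entries in the widespread Common Log Format; the pattern, the function name parse_log_line and the sample entry are illustrative assumptions, not part of any tool reviewed in this document.

```python
import re

# Pattern for the Common Log Format (an assumption: each server type has
# its own, configurable way of recording connections).
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
)


def parse_log_line(line):
    """Return the fields of one log entry as a dict, or None if it does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    # The request line normally reads "METHOD PATH PROTOCOL".
    parts = fields["request"].split()
    fields["path"] = parts[1] if len(parts) == 3 else None
    return fields


# Hypothetical entry, for illustration only.
sample = ('192.0.2.1 - - [10/Apr/2014:13:55:36 +0200] '
          '"GET /index.html HTTP/1.1" 200 2326')
entry = parse_log_line(sample)
```

Note that such an entry only describes the request itself, which illustrates why log files say nothing about actions that never reach the server.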

Faced with these problems, page tagging is a solution that eventually imposed itself as a quasi-standard. The idea here is to add a snippet, or “tag”, to each page of the analyzed website. This snippet gathers a certain amount of data and sends them to the web analytics system which, in turn, stores them one way or another (relational database, log files, ...). It is possible to collect precise information about the visitors’ computer equipment (browsers, operating systems, ...) or about certain events on the page (like a click on a button that does not request a resource from the web server). Those “tags” can be more or less complex depending on the system that is used and might, in some cases, slow down a website (in particular when the servers are busy). It must be noted that nothing prevents web analytics tools from using both methods in parallel: those systems are known as “hybrids”.

Once the raw data are recorded, they are aggregated in the analysis phase: they are used to calculate several dimensions, e.g. the number of views for each page or the number of visitors using a particular browser. Some of the web analytics tools (mostly free tools offered as “Software as a Service”, or “SaaS”) do not give access to the raw data, which are, at best, retained on the hosting company’s servers and, at worst, deleted after a few weeks. Most of the data available to the end users are thus aggregated data.
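The aggregation performed during the analysis phase can be sketched as follows. The record layout (visitor, page, browser) and the sample hits are hypothetical; real tools store far richer records.

```python
from collections import Counter

# Hypothetical raw hits, as a collection step might record them.
raw_hits = [
    {"visitor": "v1", "page": "/home", "browser": "Firefox"},
    {"visitor": "v1", "page": "/contact", "browser": "Firefox"},
    {"visitor": "v2", "page": "/home", "browser": "Chrome"},
    {"visitor": "v3", "page": "/home", "browser": "Firefox"},
]


def views_per_page(hits):
    """Number of views for each page."""
    return Counter(hit["page"] for hit in hits)


def visitors_per_browser(hits):
    """Number of distinct visitors for each browser."""
    # A visitor is counted once per browser, however many pages were viewed.
    unique = {(hit["visitor"], hit["browser"]) for hit in hits}
    return Counter(browser for _, browser in unique)


page_views = views_per_page(raw_hits)
browser_visitors = visitors_per_browser(raw_hits)
```

Once only page_views and browser_visitors are kept, the individual hits can no longer be recovered, which is the situation of end users who only have access to aggregated data.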

The presentation of those data during the last step, reporting, is a discipline in itself. This discipline is a prominent part of this project: in order to make this colossal amount of data understandable, advanced visualization techniques are used. Chapter 2.2 covers a definition and some examples of those techniques.

THE SCOPE OF WEB ANALYTICS

Extending the scope of web analytics is one of the major trends identified during the review proposed in chapter 2.3. The analysis is not limited to the scope of the website: we are speaking about web usage in general. In the past, companies typically had a single website whose interconnections with the rest of the Internet were limited, but the rise of social networks and smartphones has changed the situation. The Facebook page of a company might have more impact

on the success of an advertising campaign than an isolated website. Mobile applications must also be taken into account, regardless of their nature: either an actual independent application or a pseudo-mobile version of a website available for download. In order to build a global overview of a large, modern company, it is no longer possible to be limited to a single website: we must understand how it interacts with all the other channels of communication.

This new scope of web analytics explains the rise of tools that take into account different channels,

amongst other things.

Figure 2 - "Le digital, un écosystème complexe" [5]. Many new devices connect to various channels. All this must be monitored in order to

understand how modern web users behave online.

Similarly, a change is occurring with the objects that connect to the Internet: fifteen years ago, computers were our only way to browse the web, and now a multitude of objects connect, sometimes without their owners noticing it. Figure 2 illustrates this new challenge: smartphones, already well-established in our habits, are one of the numerous examples of this new ecosystem, and others will soon follow, along with new trends in computer science (such as wearable computing, pervasive computing, ...). Hence, taking into account those connections and understanding their nature (human or not) is a new issue for Web Analytics. The leaders of the industry are already building systems able to handle those changes.

DATA VISUALIZATION

DEFINITION & AIMS

As explained in the introduction of this document, the amount of data collected globally is

overwhelming our ability to assimilate knowledge. In other words, we are almost too efficient at

gathering data, biting off more than we can chew [6].

One of the culprits could be our difficulty in reading those data easily. However, sight is known to have the fastest and the largest bandwidth of the five senses: when data are presented in a relevant way, it is possible to understand complex concepts with a few visuals. “Data Visualization” is the study of the methods for presenting those data in the most efficient way, taking into consideration esthetics as well as functionality.

Making those data comprehensible requires understanding what their users are looking for. Robert Amar, James Eagan and John Stasko think it is possible to classify the goals of visualizations’ users. They have defined ten categories of activities that can be represented along three different axes (Figure 3) [7].

Figure 3 - The 10 main activities of data visualization users. They are distributed on three axes.

A successful visualization supports most of those activities, as long as they fit the context in which it was created. The categories can be used either to gather requirements during a visualization’s development or to evaluate it afterwards.

EXAMPLE OF DATA VISUALIZATIONS

Several visualizations were reviewed during this project because of their potential to display web

analytics data.

A Dendrogram is a tree layout used to represent clusters, usually produced by hierarchical clustering [8]. In order to build clusters, it is necessary to split a set of data using either a top-down greedy algorithm or the opposite bottom-up approach, both being quite costly operations (O(n^3) or even O(2^n)) [9].

Dendrograms look like trees, splitting into two branches at several levels, with all the individual values of a dataset arranged at the bottom. The closer two values are, the more strongly they are correlated. It is possible to quantify the distance between two values by looking at the height of the lowest branching point to which they both belong.
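As an illustration of the bottom-up approach, the following Python sketch clusters a handful of one-dimensional values. It is a deliberately naive version, assuming single linkage (the distance between two clusters is the distance between their closest members); the function name and the sample values are hypothetical. The heights it records are those a dendrogram would draw.

```python
def single_linkage(values):
    """Naive bottom-up clustering of 1-D values.

    Returns the merges as (cluster_a, cluster_b, height) tuples, where
    height is the distance at which the two clusters were joined.
    """
    clusters = [(value,) for value in values]
    merges = []
    while len(clusters) > 1:
        best = None
        # Examine every pair of clusters and keep the closest one.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                dist = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or dist < best[0]:
                    best = (dist, i, j)
        dist, i, j = best
        merges.append((clusters[i], clusters[j], dist))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# Hypothetical values: two tight groups that are far from each other.
merges = single_linkage([1.0, 2.0, 6.0, 7.5])
heights = [height for _, _, height in merges]
```

Each of the n - 1 merges rescans all pairs of values, which gives the roughly cubic running time mentioned above. The last merge joins the two groups at the greatest height, which is how the distance between two values is read from the tree.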

Dendrograms can be displayed radially, benefiting from the usual advantages of Radial Trees, such as better space optimization [10]. Below are two examples of Dendrograms, one being Cartesian (or hierarchical) while the other is radial. They both represent the Flare¹ class hierarchy.

Figure 4 - "Cartesian layout" of a Dendrogram [11]. It is hardly readable because of how far it stretches.

Figure 4 points out how difficult it can be for a large Cartesian tree diagram to use space efficiently,

while in Figure 5, the graph takes nearly half the space for the same information. To temper this, it

must be noted that Cartesian layouts are easier to interpret [12]: the final choice between the two

options depends on the usage.

In the context of Web Analytics, dendrograms could be used to represent clusters produced from

the collected data, like the nature of a website’s visits. See subchapter 5.2.3 for more details.

1 Flare is an ActionScript library to produce data visualizations. It partly inspired the development of D3. [43]

Figure 5 - "Radial layout" of a Dendrogram [11]. It optimizes the available space much better than the Cartesian layout.

Network graphs are used to represent a network of connections composed of nodes and links. Those visualizations can be used to illustrate a multitude of situations, from the traditional “traveling salesman problem” to the relationships between the protagonists of a story. The distribution of nodes across the graph can be computed through different methods. Amongst these methods, Force-directed graph drawing is one of the most popular [13].

At a theoretical level, Force-directed graphs display a set of nodes and a set of edges influenced by at least two forces: one that draws connected nodes closer, like a spring, and another one that makes nodes repel each other, like electrically charged particles. The idea is that nodes that are highly connected will be drawn to each other, while isolated nodes will be pushed further away. Force-Directed graphs are “living” graphs: the two forces influence nodes and edges iteratively, until the graph reaches an equilibrium in which nodes and edges do not move anymore. Further forces can be added to the graph in order to influence the display. While the algorithms used to draw Force-Directed graphs are easy to understand, their running time is high (usually O(n^3)) [14], as each iteration needs to compute the forces for each pair of nodes. However, splitting the graph and computing the repulsive force only when a pair of nodes is close, while ignoring pairs of distant nodes, is an efficient optimization of the running time.
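The iterative process described above can be sketched in a few lines of Python. This is only an illustration, not the algorithm of any particular library: the force formulas are borrowed from the Fruchterman-Reingold model (attraction d²/k along edges, repulsion k²/d between all pairs), and the parameters k, step and iterations are assumptions chosen for the example. The optimization for distant nodes is not included.

```python
import math


def force_layout(nodes, edges, iterations=200, k=1.0, step=0.05):
    """Return {node: [x, y]} after a fixed number of force iterations.

    nodes is a list of labels, edges a list of (node, node) pairs.
    k is the ideal edge length; step is the fraction of the net force
    applied at each iteration (both illustrative parameters).
    """
    # Start on a circle so that the run is deterministic.
    pos = {}
    for i, node in enumerate(nodes):
        angle = 2 * math.pi * i / len(nodes)
        pos[node] = [math.cos(angle), math.sin(angle)]

    for _ in range(iterations):
        disp = {node: [0.0, 0.0] for node in nodes}
        # Repulsion: every pair of nodes pushes apart, like charged particles.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                dist = math.hypot(dx, dy) or 1e-9
                force = k * k / dist
                disp[a][0] += force * dx / dist
                disp[a][1] += force * dy / dist
                disp[b][0] -= force * dx / dist
                disp[b][1] -= force * dy / dist
        # Attraction: each edge pulls its two ends together, like a spring.
        for a, b in edges:
            dx = pos[a][0] - pos[b][0]
            dy = pos[a][1] - pos[b][1]
            dist = math.hypot(dx, dy) or 1e-9
            force = dist * dist / k
            disp[a][0] -= force * dx / dist
            disp[a][1] -= force * dy / dist
            disp[b][0] += force * dx / dist
            disp[b][1] += force * dy / dist
        # Move every node a small step along its net force.
        for node in nodes:
            pos[node][0] += step * disp[node][0]
            pos[node][1] += step * disp[node][1]
    return pos


def distance(layout, a, b):
    return math.hypot(layout[a][0] - layout[b][0], layout[a][1] - layout[b][1])


# Hypothetical three-page site: "a" links to "b", and "b" links to "c".
layout = force_layout(["a", "b", "c"], [("a", "b"), ("b", "c")])
```

In the resulting layout, the connected nodes "a" and "b" end up closer to each other than the unconnected nodes "a" and "c". The double loop over all pairs is the costly part of each iteration.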

On a practical level, Force-Directed graphs are used to show the interconnections between several

data points. Force-Directed graphs have proven to be highly understandable for everyone, which

explains why they are increasingly popular on the Internet. One interesting side of force-directed

layouts is that they naturally cluster data while arranging the points, while other layout methods

such as Balloon Trees require preprocessing.

Below are two examples of Force-Directed graphs.

1. Chicago Lobbyists Force-Directed Graph Visualization: developed by Christopher Manning [15] as an attempt to display the 50 highest-paid lobbyists in Chicago and their relationships. It is

possible to isolate a node and display precise information about it by hovering over it with the cursor (Figure 6).

Figure 6 - "Chicago Lobbyists Force-Directed Graph Visualization", by Christopher Manning [15]. An example of Force-Directed graph used to

display the highest paid lobbyists in Chicago.

2. Among the Oscar Contenders, a Host of Connections: developed by Mike Bostock [16], this visualization displays the interconnections between the actors nominated for the 2013 Oscars. Clicking on the name of a movie displays the related actors and the names of those who were nominated for an Oscar (Figure 7).

Figure 7 - "Among the Oscar Contenders, a Host of Connections" by Mike Bostock [16]. Another example of Force-directed graph: esthetically

pleasant, it provides a clear map of the relationship between actors and directors.

Network diagrams, and particularly Force-directed layouts, can be used to display relationships between several components. Web Analytics could use them to display the interactions between the pages of a website (which is the main purpose of the application developed during this project, see section 3 for more).

Sankey diagrams are used to display the steps through which one or several flows pass. Sources

from which the flows emanate are placed at the left of the graph, their targets are placed at the

right of the graph, and intermediate steps are in the middle. The flows take the shape of arrows, whose width indicates their intensity. It is possible for a flow to loop back to where it came from, though this tends to make the diagram much less readable for users.

Sankey diagrams can be used in conjunction with other visualizations, like actual maps or bar charts.

There are several examples of Sankey diagrams, mostly in the field of energy transfers (which was historically their original purpose). We selected two: one that shows an advanced use of Sankey diagrams and another that shows how a flow going through different steps can be mapped onto geographical information.

1. IEA Sankey web application: the International Energy Agency offers a web application that allows its users to see how energy is produced and distributed, thanks to an interactive Sankey Diagram. Clicking on any of the intermediate steps displays a pie chart which provides a clearer breakdown of the flows going through it. The application provides many tools to play with the visualization, such as the possibility to select different periods and see how the flows evolve through time, to rearrange the flows manually or to filter data by country (Figure 8).

Figure 8 - IEA's Sankey Diagram [17]. Types of energy are at the left border, they go through intermediate steps before being distributed between

their different usages at the right of the graph.

2. Charles Minard’s Map of Napoleon’s Russian Campaign: this map is a historical example of a Sankey Diagram (drawn before the name existed) mixed with geographical mapping. The brown arrow displays the number of soldiers entering Russia while the black arrow depicts the number of soldiers leaving Russia. Other metrics, like the temperature measured at certain geographical locations, are also displayed (Figure 9).

Figure 9 - "Map of Napoleon's Russian Campaign", by Charles Minard. The brown line shows the soldiers in Russia, the black line shows those who leave Russia. The graph displays other data, such as the temperature.

Unsurprisingly, Sankey Diagrams are used to display the flow of visitors on a website. Google provides a brilliant example of such a use, which will be discussed in the next chapter.

A Treemap is a tree layout whose main purpose is to give a better sense of the size of each element. A Treemap looks like a group of labelled, nested rectangles organized hierarchically [18]. The global window represents the root of the tree, the first group of rectangles represents its first branches and the inner rectangles represent deeper levels. Ideally, the rectangles have similar aspect ratios and are arranged by order of importance (the largest being at the top left while the smallest is at the bottom right), but the final result depends on the algorithm: there are several of them and none can guarantee both perfect order and ratio. Thanks to its dense layout, the Treemap is not only effective in managing space, but also offers its viewers the unique ability to compare geographically distant nodes [10].
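To show how nested rectangles can be derived from a hierarchy, here is a minimal Python sketch of the "slice-and-dice" layout, the simplest of the treemap algorithms. It is chosen for brevity, not because any tool in this document is known to use it. The node structure (label, size, children), the function name and the sample hierarchy are assumptions, and each parent's size is assumed to equal the sum of its children's sizes.

```python
def slice_and_dice(node, x, y, width, height, vertical=True, out=None):
    """Lay out a (label, size, children) tree inside a rectangle.

    Returns a list of (label, x, y, width, height) tuples. Children share
    the parent's rectangle in proportion to their size, and the cutting
    direction alternates at each level.
    """
    if out is None:
        out = []
    label, size, children = node
    out.append((label, x, y, width, height))
    total = sum(child[1] for child in children)
    offset = 0.0
    for child in children:
        share = child[1] / total
        if vertical:
            # Cut the rectangle into side-by-side columns.
            slice_and_dice(child, x + offset * width, y,
                           share * width, height, False, out)
        else:
            # Cut the rectangle into stacked rows.
            slice_and_dice(child, x, y + offset * height,
                           width, share * height, True, out)
        offset += share
    return out


# Hypothetical visit counts by continent and country.
tree = ("World", 100, [
    ("Europe", 60, [("Switzerland", 20, []), ("France", 40, [])]),
    ("Asia", 40, []),
])
rectangles = {label: rect for label, *rect in slice_and_dice(tree, 0, 0, 100, 50)}
```

Slice-and-dice preserves the order of the elements but tends to produce elongated rectangles on larger datasets, which illustrates the trade-off between order and aspect ratio mentioned above.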

The example below (Figure 10) shows how a Treemap can immediately give a sense of sizes when it

comes to electoral results. In this example, blue rectangles represent Democrats and red

Republicans during the 2012 presidential election.

Another example follows (Figure 11) [19], showing how Treemapping can be used to display

financial measures. This visualization is interactive and can be rearranged by category or size. As the

whole visualization is “vertically impracticable” for this document, only a part of it is shown. The full

version can be found in Appendix A.

Figure 10 - "Treemap of votes by county, state and locally predominant recipient in the US Presidential Elections of 2012" [18]. Red is Republican

votes, blue is Democrat votes.

Figure 11 - "The Billion Dollar-o-Gram 2013" [19]. The full-size graph can be found in Appendix A.

Treemaps could be used by a Web Analytics tool to display some of the visitors’ information, such as geographical data (Continent, Country, Province and City) or technical data (Browser’s family, Browser and Browser’s version).

Web Analytics tools already use a few unusual visualizations to represent certain specific information. As these visualizations were not taken into account for the application developed within this project, we will not discuss them in detail and will simply present each of them with an example.

FUNNEL

Funnel visualization is generally used to describe how many visitors went through a process

(generally, a Goal) and how many left at each step.

Figure 12 - Example of Funnel from Adobe Marketing Cloud (formerly Omniture) in 2009. It shows the Order process on a website: 328'998 visitors get to the Customer Information pages, only 38.5% of them get to the Billing Information pages and 45.7% of the remaining visitors reach the Orders page.
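The percentages shown in a funnel are simple ratios between consecutive steps. The Python sketch below illustrates the computation; the function name, the step names and the visitor counts are hypothetical and are not those of Figure 12.

```python
def funnel_rates(steps):
    """Share of visitors continuing from each step to the next.

    steps is an ordered list of (name, visitor_count) pairs; the result is
    a list of (from_step, to_step, rate) tuples.
    """
    return [
        (from_name, to_name, to_count / from_count)
        for (from_name, from_count), (to_name, to_count) in zip(steps, steps[1:])
    ]


# Hypothetical order process.
steps = [("Cart", 1000), ("Billing", 400), ("Order", 100)]
rates = funnel_rates(steps)
overall = steps[-1][1] / steps[0][1]
```

Here 40% of the visitors go from the cart to the billing page and 25% of those complete the order, so that one visitor in ten goes through the whole process.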

HEAT MAP

Heat maps are matrices using colors to show the intensity of a relationship. Usually, “warm” colors indicate a strong relationship. They can be used to display clusters, similarly to other visualizations presented above.

Figure 13 - Heatmap by AT Internet, showing the hours of connection on several websites for different European countries.

HITMAP

While most tools do not make a distinction between using a heat map to visualize clicks on a website and using it to visualize clusters in a matrix, we found it necessary to distinguish the two uses. We define a Hitmap, also known as a Clickmap, as a heatmap applied directly on a webpage. It allows users to see where visitors clicked the most.

Figure 14 - Example of "Heatmap" from Crazy Egg. The blue areas indicate few clicks on the links, while warmer (green, yellow and red) areas indicate more clicks.

VISUAL TOOLS FOR WEB ANALYTICS

DEFINITION & TAXONOMY

According to the previous chapters, “web analytics tools” could be defined as software that manages all four steps of web analytics while reporting the aggregated data in a visually understandable fashion.

Our initial assumption was that these tools were usually too complex for regular users, mostly because of the way the data are displayed. In order to challenge this assumption, we surveyed as many web analytics tools as possible. Our study led us to analyze not only the visualizations proposed, but also the key features and the marketing targets of web analytics products. Our results will be presented in two parts.

The Business Model of a web analytics tool is our first criterion to sort them. By “Business model”, we mean “nature of the tool”, “targets of the tool” and “data storage”.

Our review found four possible “natures” for the tools: open-source, free, paying and academic. It is

possible for a tool to provide several options and thus be part of several groups.

The tools’ targets have been defined according to the range of their features, their pricing options and their communication. We classified targets into four groups:

• Personal: individuals looking for basic insights about a small website
• SMEs: small or medium-sized companies, including associations and foundations, able to pay a monthly fee for the tool but needing higher-level features
• Large companies: national or multinational organizations, monitoring multiple channels on multiple websites, or even mobile applications
• Scientists: individuals interested in some specific, prototype features

Data storage is another critical point to be taken into account. SaaS tools store their data on the vendor’s servers, implying possible privacy issues. On-premise tools, on the contrary, can be installed directly in the users’ infrastructure.

The second part of this study focuses on the actual features proposed by the tools. As it became clear that there was a difference between academic projects, focusing on selected and innovative features, and public tools, providing a full array of functionalities, we decided to treat the two groups separately for this part.

We assume that public tools come with several features by default, such as goal setting, real-time data or complete APIs. When one of those features is particularly developed or proposes something new, an indication is present in the cell “other notable features”. The differentiating criteria for commercial tools are:

• Collection method: either log file analysis or page tagging. Page tagging can take the form of JavaScript tagging (most usual) or others (such as PHP tagging in the case of self-hosted systems).

• Advanced visualizations: whether or not the tool provides some non-standard visualizations, standard visualizations including “bar charts”, “pie charts”, “line charts” and “two-dimensional plots”.

• Data exploration: the different options available to explore the data from the UI. Filters, segments, time lapse (several periods being visualized one after the other), dataset comparisons, search and sorting are part of this point. “Data-crossing” means that the UI allows users to split records according to a secondary dimension, like “All visitors’ countries by visitors’ browsers”.

• Visitors’ information: the range of information the tool can offer about visitors. It can be either technical information (browsers, ...), geographical information (e.g. country), age, gender or centers of interest (sport, cinema, ...).

• Other notable features: a list of particular features that could differentiate the software from the competition.
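The “data-crossing” criterion mentioned in the list above amounts to a cross-tabulation of two dimensions. A minimal Python sketch follows, with hypothetical visit records and a hypothetical helper named cross:

```python
from collections import defaultdict


def cross(records, primary, secondary):
    """Count records for each (primary, secondary) pair of dimension values."""
    table = defaultdict(lambda: defaultdict(int))
    for record in records:
        table[record[primary]][record[secondary]] += 1
    # Convert to plain dicts for easier display and comparison.
    return {key: dict(counts) for key, counts in table.items()}


# Hypothetical visits, for illustration only.
visits = [
    {"country": "Switzerland", "browser": "Firefox"},
    {"country": "Switzerland", "browser": "Chrome"},
    {"country": "Switzerland", "browser": "Firefox"},
    {"country": "France", "browser": "Chrome"},
]
countries_by_browsers = cross(visits, "country", "browser")
```

The result splits each country's visits by browser, which corresponds to the “All visitors’ countries by visitors’ browsers” example.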

Academic projects will be sorted according to their collection method, their proposed visualizations

and a small description of their purpose (instead of having a full list of features).

The selection of the tools for this study is not exhaustive because of the number of Web Analytics tools that exist. Although more than 40 tools were identified, only 20 of them have made their way to the tables in the following chapters, as we tried to keep the most representative tools for each category.

Keeping in mind that it was not possible to actually access all of these tools, especially those reputed to be the most powerful and professional, we relied on other studies as much as possible to describe the way they work [20] [21]. Other sources include marketing materials (which are equally essential, as they point out the intended targets of the tools), web articles [22] and demonstrations of the products.

In our efforts to find as many tools from as many sources as possible, several academic projects

were reviewed as well [23], as it was assumed their visualizations could be different from the public

tools’ visualizations.

TECHNOLOGICAL SURVEY

| Tools | Nature | Targets | Data storage |
|---|---|---|---|
| Web Usability Probe [24] [25] | Academic | Scientists | On-premise |
| WebQuilt [26] | Academic | Scientists | On-premise |
| WebPUM [27] | Academic | Scientists | On-premise |
| Labroche, Lesot and Yaffi’s Web Usage Mining and Visualization Tool [28] | Academic | Scientists | On-premise |
| Yandex.Metrica | Free | All | SaaS |
| AT Internet | Free + Paying | Large companies, Personal | SaaS |
| Clicky | Free + Paying options | SMEs, Personal | SaaS |
| Google Analytics | Free + Paying options | All, Large companies (premium) | SaaS |
| ShinyStat | Free + Paying options | All | SaaS |
| Woopra | Free + Paying options | All | SaaS |
| Open Web Analytics | Open-source | All | On-premise |
| AWStats | Open-source | SMEs, Personal | On-premise |
| Piwik | Open-source + Paying options | All | SaaS, On-premise |
| Adobe Marketing Cloud (Omniture) | Paying | Large companies | SaaS, On-premise |
| IBM Coremetrics & Unica | Paying | Large companies | SaaS, On-premise |
| comScore Digital Analytix | Paying | Large companies | SaaS, On-premise |
| Webtrends Analytics | Paying | Large companies | SaaS, On-premise |
| Mint | Paying | SMEs, Personal | On-premise |
| Mouseflow | Paying | SMEs, Personal | SaaS |
| Crazy Egg | Paying | SMEs, Personal | SaaS |

Table 1 – Web Analytics tools sorted by Business Model (April 2014)

Table 1 is sorted according to the nature of the tool, as it is the most relevant criterion of this first

comparison. The second most important criterion is definitely the targets of the tool, as they tend to

shape both its communication and its features.

Our study made it quite clear that most of the tools oriented towards large companies are branded as marketing tools: it would seem that this niche is more promising than others. They are paid tools that provide additional services and support. While SaaS is usually preferred to On-premise [5], the option of installing the tool directly in the customer’s infrastructure almost always exists.

Tools aiming at smaller companies and private users are usually more neutral in their communication, though marketing is still their primary orientation. While many of those tools are offered for free with paying options, there are important differences between their pricing models. Most drastically limit the free version (either by taking only a very limited number of visits into account or by restricting many features) and offer a wide range of plans, which makes comparisons harder. Some SME-oriented tools limit themselves to a smaller range of features whose implementation is particularly advanced (Crazy Egg, Mouseflow).

As for the academic projects, it would seem that a higher proportion aims to improve websites’

usability or business intelligence rather than marketing. Most are “Web Mining”-oriented: they provide techniques to find patterns in data and to predict behavior, something that can eventually lead to marketing uses. While we do not believe that this study is enough to assert that this is an

actual trend, one could assume that the barrier between marketing and computer science in the

academic world is stronger than the one between decision support and computer science, despite

some companies’ attempts to reduce it [29]. This should be confirmed by further studies.


| Tools | Collection method | Advanced visualizations | Data exploration | Visitors’ information | Other notable features |
|---|---|---|---|---|---|
| Adobe Marketing Cloud (formerly Omniture) | Page-tagging | Funnel, Hitmap, Heatmap, Path visualization | Data-crossing, Filters, Segmentations | Geographical, Technical, Age, Gender, Interest | AB Testing, Mobile features, Social features |
| AT Internet | Page-tagging | Funnel, Heatmap, Hitmap, Sankey | Data-crossing, Filters, Segmentations | Geographical, Age, Gender, Technical, Interests | AB Testing / Multivariate, Mobile features, Soft Tagging, Social features, Multi-channels tracking |
| AWStats | Logfile analysis | - | Data-crossing, Filters, Segmentations | Geographical, Technical | Web compression data, HTTP Status Code |
| Clicky | Page-tagging | Hitmap, Funnel | Filters, Segmentations | Geographical, Technical | AB Testing, Uptime monitoring, Individual tracking |
| comScore Digital Analytix | Page-tagging | Hitmap | Data-crossing, Filters, Segmentations | Geographical, Technical, Age, Gender, Interest | Individual tracking, Multi-channels tracking, AB Testing / Multivariate, Mobile features, Social features, Persona-driven dashboards, Raw data manipulation |
| Crazy Egg | Page-tagging | Hitmap (several variants, including “Confetti”, which maps clicks and referrers) | Filters | - | Scrollmap (showing where visitors abandon scrolling) |
| Google Analytics | Page-tagging | Sankey, Funnel, Hitmap | Data-crossing, Filters, Segmentations | Geographical, Age, Gender, Interests, Technical | AB Testing, Multi-channels tracking, Mobile features, Social features, SiteSpeed, Individual tracking (premium) |
| IBM Coremetrics & Unica | Page-tagging | Heatmap, Funnels, Channel Venn | Data-crossing, Filters, Segmentations | Geographical, Technical, Age, Gender, Interest | AB Testing, Mobile features, Social features, Automatic marketing recommendations |
| Mint | Page-tagging | - | - | Geographical, Technical | Large set of plugins (Pepper) |
| Mouseflow | Page-tagging | Hitmap (actual replay of the users’ session) | - | - | Session replay |
| Open Web Analytics | Page-tagging | Funnel | Data-crossing, Filters, Segmentations | Geographical, Technical | Individual tracking, Mouse tracking |
| Piwik | Page-tagging, Logfile analysis | Sankey | Filters, Segmentations | Geographical, Technical | Individual tracking |
| ShinyStat | Page-tagging | Path visualization | Data-crossing, Segmentations | Geographical, Technical | B2B analysis |
| Webtrends Analytics | Page-tagging | Use of infographics to display data, Funnel, Path visualization, Hitmap | Data-crossing, Filters, Segmentations | Geographical, Technical, Age, Gender, Interest | AB Testing / Multivariate, Mobile features, Social features, Import data from third-party |
| Woopra | Page-tagging | Funnel | Filters, Segmentations | Geographical, Technical | Individual tracking, Live chat with visitors |
| Yandex.Metrica | Page-tagging | Sankey, Funnel, Hitmap | Filters, Segmentations | Geographical, Age, Gender, Interests, Technical | Mobile features |

Table 2 – Web Analytics tools sorted by Features (April 2014)

“Large-companies”-oriented tools focus a lot on multi-channel analysis, which seems to be the

future of Web Analytics as suggested in chapter 2.1.2. They are also quite similar in terms of

features, distinguishing themselves on services, implementation and pricing.

When it comes to visualizations, very few of these tools dare to think outside the box: most limit themselves to standard “dashboard visualizations” such as pie charts, bar charts, line charts or, at best, two-dimensional plots that use two metrics to compare dimensions. Larger tools usually provide a way to display visitors’ paths (either with a Sankey diagram or with something similar), a Hitmap and Funnels. As stated in 2.3.1.3, our review might be incomplete, as we could not use all the services listed above. However, we can assert that visualization is a very secondary selling point for most of these tools: it would seem that instead of working and communicating on visualizations, larger software packages prefer to offer their customers means of customizing reports. Webtrends Analytics 10 seems to be the most innovative when it comes to data visualization and made an interesting step towards end-users by facilitating the creation of infographics.

Of all these tools, Google Analytics and Yandex.Metrica stand out for two reasons: they are free and

they propose professional and advanced features for everyone. Particularly, their use of Sankey

diagrams to display the flow of visitors is interesting (see Figure 15 for an example), though both

implementations tend to suffer from some problems: Google Analytics’ Sankey makes pages appear


several times instead of allowing traffic to go back, while Yandex.Metrica’s stretches horizontally, making it hard to read.

When it comes to open-source projects, Piwik and Open Web Analytics are both progressive

systems. While OWA proposes several advanced features like mouse tracking and hitmaps, Piwik

benefits from a more complete API that is suitable for near-professional plugins.

Figure 15 - Yandex.Metrica "link's map". It stretches horizontally, but allows flows to go back to a previous page if needed.

| Tools | Collection method | Visualizations proposed | Notes |
|---|---|---|---|
| WebQuilt | Logfile analysis | Network visualization | Is not intended for large audiences. Up to 20 to 100 participants would go through tasks prepared by a web designer. Framework that can serve as a basis for advanced visualizations. |
| Labroche, Lesot and Yaffi’s Web Usage Mining and Visualization Tool | Page-tagging | Network visualization | Full system, from collection to report. Aims to display the usage of a website by reading its web log. Based on the Leader Ant algorithm to process data. |
| WebPUM | Logfile analysis | Network visualization, Line charts | Full system, from collection to report. Fitted for prediction of future users’ movement by the use of a graph-partitioning algorithm and a similarity of subsequences. |
| Web Usability Probe | Page-tagging | Frequencies | Web Usability Probe aims at making usability testing easier by letting the designers record their “ideal path” on their website and comparing it to the users’ paths. |

Table 3 – Academic Web Analytics / Web Usage Mining tools sorted by Visualizations (April 2014)


Comparing these academic projects with commercial tools revealed something of interest: network visualizations are used almost systematically to display how pages or users interact with the analyzed website. This seems somewhat surprising, as these visualizations are not used in commercial software.

WRAP-UP OF THE REVIEW

Our study led us to consider several categories of tools. These categories determine which features

those tools are likely to propose. Figure 16 below illustrates our final categorization.

Figure 16 - Final categorization of the Web Analytics tools, based on our previous analysis. The 20 reviewed tools are distributed as the leaves of the branches.

Academic tools have a few prototypal features. They tend to propose network diagrams to visualize their data. We made a distinction between web mining tools (which predict and cluster users’ behavior) and web analytics tools (which display users’ behavior).

Close to Academic projects, tools with specific features are becoming more common. In order to stand out, they concentrate on these features, presenting a few innovative ideas, and are sold to companies that cannot afford larger solutions.

Also aiming at SMEs, some tools propose most of the usual end-user features of web analytics while being much cheaper than larger software. As they cannot offer all the detailed information of the high-end competition, they try to stand out by specializing in one whole dimension (Marketing for Woopra, self-hosting and plugins for Mint). They target small and medium companies that are looking for web analytics tools and support but cannot invest thousands of USD each month.


Larger tools propose a similar set of features and visualizations. They offer professional support and differentiate themselves with advanced features aimed at large companies, such as manipulation of raw data (Digital Analytix), high modularity (Adobe Marketing Cloud) or tag management (AT Internet).

Free tools have different faces and are hard to depict as a whole. Google Analytics leads the way with its very large market share (over 80%), the idea being that free users who join large companies will choose Google Analytics Premium rather than the competition. All the other tools, including the paying tools aiming at SMEs, compare themselves with Google Analytics. Yandex.Metrica aimed at matching Google Analytics’ features. Piwik set out to free web analytics by providing a similar set of features while being self-hosted, just like Open Web Analytics, whose UI is clearly inspired by that of Google Analytics.

SUMMARY

Web analytics is defined as a way to optimize web usage by analyzing reports based on web data gathered through different methods. The efficiency of those reports depends on how clear their visualizations are, which in turn is a field of research in itself, known as Data visualization: efficient visualizations must support several user activities that have been identified and classified.

Knowing this, a review of the available visual tools for web analytics was conducted. We concluded that current commercial software is slow to adopt advanced visualizations, while academic projects already seem more inclined to do so.

The following sections describe a possible solution to this problem.


CONSTEL ANALYTICS

“It feels like we're all suffering from information overload or data glut. And the good news is there might be an easy solution to that, and that is

using our eyes more”. - David McCandless, data journalist and information designer

This chapter covers all the aspects of the application built during this project. It presents the reasons

behind its development, the way it was envisioned, the technologies selected for its realization and

its architecture.

TABLE OF CONTENT

CONSTEL ANALYTICS
  Table of content
  Aims and design
    Aims
    Design
  Technologies
    Data source
    Visualization
    Programming language & Framework
    Other technologies
  Software Architecture
    Structure of the application
    Data Processing
    View & visualization
  Summary


AIMS AND DESIGN

Amongst the points raised in the previous chapter, data visualization in Web Analytics tools is one of the most problematic: with existing tools, webmasters can answer simple questions like “what are the most popular browsers on my website”, but their understanding of more advanced questions, like “how do all the pages of a given website work together”, is limited.

In order to overcome this recurrent issue, an application has been developed during this project. Its name is "Constel Analytics", because of the shape of the visualization that it generates. This chapter presents its aims and its design, without regard for the technologies required to achieve them.

AIMS

Constel Analytics aims at making the relations between pages of a given website easier to see.

Instead of redeveloping a collection system – which would be out of the scope of this project - it

relies on data gathered by others and retrieves them through their APIs. Data are then processed

and presented in a more meaningful way.

Constel Analytics has two types of objectives: design objectives and implementation objectives.

Design objectives are the fundamentals of Constel Analytics. The final evaluation of the application

was based upon them (see section 4).

Interactions between pages: the main objective of Constel Analytics is to offer advanced

visualizations in the context of web analytics. During this project, the first visualization of the

application will be developed: the visualization of interactions between pages. It must allow

users to understand how their website’s traffic works.

Independence from Data sources: Constel Analytics should be able to use data from

different sources. Thus, the Data Processing part must be as loosely coupled as possible from

the visualization, so that the application can be adapted without redeveloping large parts of

its code.

Ease of installation: Constel Analytics must be usable by as many webmasters as possible, therefore it must:

o Rely on common technologies so that it can be installed without requiring complex

manipulations on the server

o Propose easy installation and configuration


Scalability and adaptability: Constel Analytics will evolve beyond this initial project. Other

contributors must be able to handle it and improve it.

DESIGN

This chapter covers the theoretical aspects of Constel Analytics' design.

Figure 17 - High-level diagram of Constel Analytics.

At a conceptual level, Constel Analytics works as follows (Figure 17):

0. A website regularly sends data to a web analytics tool (called “Data Source” in Figure 17).

1. A user visits Constel Analytics and asks for a report of the traffic on her website according to different criteria (dimension, filters, period, ...).

2. Constel Analytics queries the Data Source.

3. The data source sends the data to Constel Analytics, which processes them.

4. Constel Analytics returns a visualization of the data to the user.

Constel Analytics thus performs two distinct tasks: it first processes the data received from the data source, then displays the processed result. If the same data are requested twice (multiple queries on the same criteria), the application must be able to keep them in cache and reuse them without delay.


Figure 18 - In-details diagram of Constel Analytics

Figure 18 shows a more detailed schema of Constel Analytics.

1. The main controller receives the user’s request.

2. The main controller checks if the required data are already cached.

a. If that is the case, it sends the data to the view.

b. Otherwise, it calls the Data Processing module which queries the Data Source and

processes the results before returning them to the Controller. The controller sends

the processed data to the view.

3. The main controller renders a visualization, using the view.
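The steps above can be sketched as a controller that only calls the Data Processing module on a cache miss. This is a minimal illustration, not the application's actual code: the class name, the `process_query` callable and the shape of the returned data are assumptions, and the rendering step is left out.

```python
class MainController:
    """Serves processed data, querying the data source only on a cache miss."""

    def __init__(self, process_query):
        # process_query stands in for the Data Processing module: it takes
        # the report criteria and returns processed data (hypothetical API).
        self._process_query = process_query
        self._cache = {}

    def report(self, dimension, filters=(), period=None):
        key = (dimension, tuple(sorted(filters)), period)
        if key not in self._cache:
            # Step 2b: query the data source and process the results.
            self._cache[key] = self._process_query(dimension, filters, period)
        # Step 2a: hand the data to the view (rendering omitted here).
        return self._cache[key]


calls = []


def fake_processing(dimension, filters, period):
    """Placeholder for the real module; records how often it is called."""
    calls.append((dimension, period))
    return {"pages": [], "interactions": []}


controller = MainController(fake_processing)
first = controller.report("pagePath", period=("2014-03-01", "2014-03-31"))
second = controller.report("pagePath", period=("2014-03-01", "2014-03-31"))
```

Keying the cache on the full set of criteria means that changing any filter or period triggers a new query, while repeating a request is served from memory.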

The traffic between the pages of a website is defined as the exchange between each pair of pages of

the said website. By "exchange", we mean "a visitor who passes from a page X to a page Y". Two

elements are required to compute this:

The list of pages. Pages are identified by their Title or by their URI, depending on which one is unique (some websites use the same title for several pages, while others use several URIs for the same page). A page has five properties:

o a title

o a measured metric, usually the number of times that the page was viewed (“pageviews”), for a given period

o one or several URIs

o a group to which the page belongs. Groups are defined by the hierarchical level of the

URIs. For instance, "/forum/index.php" and "/forum/" would be in the same group,

while "/blog/index.php" would be in another one. It implies that all URIs related to

one Page Title should be in the same hierarchical level.

o a weight equal to the number of links between this page and the others.


The list of interactions from one page to another. An interaction is defined by its unique pair

of pages (source and target), and has one more property: the intensity of the interaction.

Concretely, the intensity represents how many times visitors went from source to target.
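The two elements described above could be modelled as follows. This is a sketch under assumptions: the class and function names are invented for the example, and the group is taken to be the directory part of the URI path, which is one possible reading of "hierarchical level".

```python
from dataclasses import dataclass, field


def group_of(uri):
    """Directory part of a URI path, used here as the page's group."""
    return uri.rsplit("/", 1)[0] or "/"


@dataclass
class Page:
    title: str
    metric: int                      # e.g. pageviews over the period
    uris: list = field(default_factory=list)
    weight: int = 0                  # number of links touching the page

    @property
    def group(self):
        # All URIs of a page are expected to share one hierarchical level.
        return group_of(self.uris[0])


@dataclass(frozen=True)
class Interaction:
    source: str                      # title of the source page
    target: str                      # title of the target page
    intensity: int                   # visitors who went from source to target


def assign_weights(pages, interactions):
    """Set each page's weight to the number of interactions it takes part in."""
    by_title = {page.title: page for page in pages}
    for page in pages:
        page.weight = 0
    for link in interactions:
        for title in {link.source, link.target}:
            if title in by_title:
                by_title[title].weight += 1


forum = Page("Forum", 1200, ["/forum/index.php", "/forum/"])
blog = Page("Blog", 800, ["/blog/index.php"])
links = [Interaction("Forum", "Blog", 150), Interaction("Blog", "Forum", 90)]
assign_weights([forum, blog], links)
```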

By default, Constel Analytics focuses on the most important pages of a website. Therefore, the list of pages is built based on the measured metric: for instance, the most viewed pages are taken into account first. The same goes for interactions (also called transitions), which are sorted by intensity. Note that if a transition is very common but links pages that are not in the list of most viewed pages, it will be ignored when processing data. This case is very rare and could occur with sites where the distribution of visits is very homogeneous.
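A minimal sketch of this selection is given below. The function name, the two limits and the sample figures are assumptions; the sketch also assumes that a transition is kept only when both of its pages are in the list.

```python
def select_for_display(pageviews, transitions, max_pages, max_links):
    """Keep the most viewed pages, then the strongest links between them."""
    top_pages = sorted(pageviews, key=pageviews.get, reverse=True)[:max_pages]
    kept = set(top_pages)
    candidates = [
        (pair, count)
        for pair, count in transitions.items()
        if pair[0] in kept and pair[1] in kept
    ]
    candidates.sort(key=lambda item: item[1], reverse=True)
    return top_pages, candidates[:max_links]


# Hypothetical site with a fairly homogeneous distribution of visits.
pageviews = {"Home": 900, "Blog": 500, "Forum": 400, "Legal": 350, "Imprint": 340}
transitions = {
    ("Home", "Blog"): 300,
    ("Home", "Forum"): 250,
    ("Legal", "Imprint"): 330,   # very common, but outside the top pages
    ("Blog", "Legal"): 5,
}
pages, links = select_for_display(pageviews, transitions, max_pages=3, max_links=10)
```

In this example the transition from "Legal" to "Imprint" is the most intense of all, yet it is dropped because neither page is among the three most viewed ones.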

Once these two elements are processed by Constel Analytics, data can be visualized.

As this project aims at displaying the traffic between pages, network diagrams are the natural choice: pages will be represented as vertices and interactions as edges.

The force-directed layout method has been selected for two reasons:

Its distribution of nodes on the graph is immediately understandable by a majority of people. Moreover, a force-directed graph can easily display hundreds of nodes, unlike other methods. From a technical point of view, however, the processing time required to display such a diagram can be problematic and should be reduced using a quadtree.

Force-directed layouts automatically generate clusters: some of the other visualizations, like Balloon Tree, might be easier to understand but would impose more constraints (in the Balloon Tree example, a child can have only one parent) and require preprocessing. Force-directed visualizations are one step closer to Visual Analytics compared to other Network Drawing methods.
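To make the principle concrete, here is a naive force-directed layout in the spirit of the Fruchterman-Reingold scheme: every pair of nodes repels, linked nodes attract, and a decreasing "temperature" caps each move. It is not the layout code used by Constel Analytics; the function names and parameter values are assumptions, and the quadtree optimization mentioned above is deliberately left out, so each iteration compares all pairs of nodes.

```python
import math


def force_layout(nodes, edges, positions, k=1.0, iterations=100, t0=1.0):
    """Return new positions after a fixed number of force iterations."""
    pos = {n: list(positions[n]) for n in nodes}
    for step in range(iterations):
        temperature = t0 * (1.0 - step / iterations)
        disp = {n: [0.0, 0.0] for n in nodes}
        # Repulsion between every pair of nodes (quadratic cost).
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx = pos[u][0] - pos[v][0]
                dy = pos[u][1] - pos[v][1]
                d = max(math.hypot(dx, dy), 1e-9)
                f = k * k / d
                disp[u][0] += dx / d * f
                disp[u][1] += dy / d * f
                disp[v][0] -= dx / d * f
                disp[v][1] -= dy / d * f
        # Attraction along edges.
        for u, v in edges:
            dx = pos[u][0] - pos[v][0]
            dy = pos[u][1] - pos[v][1]
            d = max(math.hypot(dx, dy), 1e-9)
            f = d * d / k
            disp[u][0] -= dx / d * f
            disp[u][1] -= dy / d * f
            disp[v][0] += dx / d * f
            disp[v][1] += dy / d * f
        # Move each node, limiting the step to the current temperature.
        for n in nodes:
            length = math.hypot(disp[n][0], disp[n][1])
            if length > 0:
                scale = min(length, temperature) / length
                pos[n][0] += disp[n][0] * scale
                pos[n][1] += disp[n][1] * scale
    return {n: tuple(p) for n, p in pos.items()}


def distance(layout, u, v):
    return math.hypot(layout[u][0] - layout[v][0], layout[u][1] - layout[v][1])


# Two linked pages starting far apart, two unlinked pages starting close.
linked = force_layout(["a", "b"], [("a", "b")], {"a": (0.0, 0.0), "b": (10.0, 0.0)})
apart = force_layout(["a", "b"], [], {"a": (0.0, 0.0), "b": (0.1, 0.0)})
```

Linked pages are pulled together while unlinked ones drift apart, which is what produces the clusters mentioned above.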

The available data gathered from the Data Source will be represented by the following symbols:

| Information | Criterion |
|---|---|
| Page | Node, represented by circles |
| Interactions | Curves between nodes |
| Measured metrics of a page | Size of the node (logarithmic scale) |
| Intensity of an interaction | Opacity of the link, redundant with proximity of the nodes |
| Group of a page | Color of the node |
| Weight of a page | Opacity of the border of the node |

Table 4 - Symbols used by Constel Analytics
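The mappings of Table 4 can be expressed as small scale functions. The sketch below is only illustrative: the pixel range, the opacity range and the use of a linear scale for opacity are assumptions, the table itself only stating that node size follows a logarithmic scale.

```python
import math


def node_radius(metric, max_metric, r_min=4.0, r_max=30.0):
    """Logarithmic scale: the radius grows with log(1 + metric)."""
    if max_metric <= 0:
        return r_min
    share = math.log1p(metric) / math.log1p(max_metric)
    return r_min + (r_max - r_min) * share


def opacity(value, max_value, low=0.15, high=1.0):
    """Linear scale, usable for link intensity and for node border (weight)."""
    if max_value <= 0:
        return low
    return low + (high - low) * (value / max_value)
```

With a logarithmic scale, a page viewed 100 times out of a maximum of 10,000 still gets roughly half of the available radius range, so less popular pages remain visible next to the home page.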

The design phase and the evaluation phase have highlighted the importance of addressing different groups of needs in order to make the visualization useful. These partially correspond to the ten activities discussed in chapter 2.2.1:


Finding datapoints: this requirement includes the ability to discover information about the pages (title, number of views, ...), as well as the ability to explore the graph and untangle overlapping vertices and edges.

Retrieving values: some users need to immediately identify a specific page, in order to

compare its actual position to its expected position.

Arranging datapoints: clustering and sorting nodes is necessary to have a better view of how

the website works. As for correlation, it requires the ability to compare different sets of

data.

Constel Analytics addresses those needs with a set of tools.

FINDING DATAPOINTS: ZOOM, MINIMAL WEIGHT SLIDER

Visualizations with many nodes and many edges can be hard to read. In order to let the users

explore the graph, Constel Analytics proposes two zooming options. The first one works like a

magnifying glass: a part of the graph is magnified, but the rest of the graph is still visible in order to

keep a sense of the global context. The “magnifying glass effect” is interesting when it comes to

small areas, but fails to reorganize the whole graph. Therefore, a global zoom has been added in

order to spread or to narrow the whole graph.

Another feature allows users to filter pages and links according to their importance, supporting the “find extremum” and “filter” activities. As for anomalies, we expect the force-directed graph to show them clearly thanks to the position of the nodes (a node that should not belong to a cluster is easy to spot).
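The importance filter could work along the following lines. This is a hedged sketch: the function name and the two thresholds are assumptions, as is the choice to hide a link as soon as one of its pages is hidden.

```python
def apply_minimum(page_metric, link_intensity, min_metric=0, min_intensity=0):
    """Keep pages and links at or above the thresholds set by the sliders."""
    pages = {page for page, metric in page_metric.items() if metric >= min_metric}
    links = {
        pair: count
        for pair, count in link_intensity.items()
        if count >= min_intensity and pair[0] in pages and pair[1] in pages
    }
    return pages, links


# Hypothetical data.
page_metric = {"Home": 900, "Blog": 500, "Legal": 20}
link_intensity = {("Home", "Blog"): 300, ("Blog", "Legal"): 5, ("Home", "Legal"): 40}
visible_pages, visible_links = apply_minimum(
    page_metric, link_intensity, min_metric=100, min_intensity=10
)
```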

RETRIEVING VALUES: PATH & SEARCH

Constel Analytics offers a dynamic search field, highlighting nodes whose title or URIs match the searched string.

Similarly, a user may want to identify a path between two pages. To meet this need, the Path Finder feature was implemented. It identifies the most likely path between two pages (source and target).
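One way to define "most likely" is sketched below; it is an assumption, not necessarily the method used by the Path Finder. Each transition is given the probability intensity / total outgoing intensity of its source page, and the path that maximizes the product of these probabilities is found with Dijkstra's algorithm on the costs -log(p).

```python
import heapq
import math
from collections import defaultdict


def most_likely_path(interactions, source, target):
    """Return the list of pages on the most probable path, or None."""
    out_total = defaultdict(float)
    for (src, _dst), count in interactions.items():
        out_total[src] += count
    graph = defaultdict(list)
    for (src, dst), count in interactions.items():
        if count > 0:
            graph[src].append((dst, -math.log(count / out_total[src])))

    best = {source: 0.0}
    previous = {}
    queue = [(0.0, source)]
    while queue:
        cost, page = heapq.heappop(queue)
        if page == target:
            path = [page]
            while page in previous:
                page = previous[page]
                path.append(page)
            return path[::-1]
        if cost > best.get(page, math.inf):
            continue  # stale queue entry
        for nxt, weight in graph[page]:
            new_cost = cost + weight
            if new_cost < best.get(nxt, math.inf):
                best[nxt] = new_cost
                previous[nxt] = page
                heapq.heappush(queue, (new_cost, nxt))
    return None


# Hypothetical intensities: most visitors go to the blog first, but very few
# of them continue to the contact page.
interactions = {
    ("home", "blog"): 80,
    ("home", "forum"): 20,
    ("blog", "contact"): 10,
    ("blog", "home"): 90,
    ("forum", "contact"): 90,
    ("forum", "home"): 10,
}
path = most_likely_path(interactions, "home", "contact")
```

Here the route through the forum (0.2 × 0.9 = 0.18) is more likely than the one through the blog (0.8 × 0.1 = 0.08), even though the first hop to the blog is the more popular one.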

ARRANGING DATAPOINTS: FILTERS, SEGMENTS & TIMELAPSE

While the force-directed layout will take care of the sorting and clustering of nodes and links,

Constel Analytics proposes different tools to compare several graphs: filters, segments and

timelapse.

Filters and segments operate in a similar way: they can limit the visualization according to certain

criteria. Filters generally represent a single dimension: one can for example display the visualization

only for visitors coming from a particular country, or visitors using a particular technology (browser,

operating system, ...). Segments represent a more "complex" variant of the filters and must be

defined directly in the data source’s interface. They work as a set of metrics and dimensions limiting

the data displayed to a particular population.


Finally, the timelapse feature allows users to visualize the evolution of the traffic over different

periods. Several graphs are generated and the webmaster can compare them in order to find out if

the visitors behave differently during the selected periods.
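For the timelapse, the selected range has to be cut into consecutive periods, each of which yields one graph. A possible helper is sketched below; the function name and the fixed period length in days are assumptions.

```python
from datetime import date, timedelta


def consecutive_periods(start, end, days):
    """Split [start, end] into consecutive periods of at most `days` days."""
    if days < 1:
        raise ValueError("days must be at least 1")
    periods = []
    current = start
    while current <= end:
        last = min(current + timedelta(days=days - 1), end)
        periods.append((current, last))
        current = last + timedelta(days=1)
    return periods


# One graph per week of March 2014; the last period is shorter.
weeks = consecutive_periods(date(2014, 3, 1), date(2014, 3, 31), days=7)
```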

TECHNOLOGIES

DATA SOURCE

Chapter 2.3 describes a list of Web Analytics tools that could be used as a Data Source for Constel Analytics. While that chapter focuses mostly on their use as complete and independent Web Analytics tools, their APIs were also reviewed. Those reviews led us to conclude that, of all the candidates, Google Analytics would be the best initial choice for two reasons:

It is widely used, thus allowing Constel Analytics to reach a large group of potential users.

It provides powerful and complete APIs, giving access to a vast amount of data with only a few requests.

Google splits Analytics APIs into four main components (Figure 19):

Figure 19 - Main components of Google Analytics APIs

The Collection component, which focuses on “how to gather data”. APIs provide different

ways of sending interaction data to Google Analytics. Figure 19 shows that the new

analytics.js passes through “Measurement Protocol” like Android and iOS SDKs, while the

older ga.js does not. This “Measurement Protocol” is a standard way of sending data to

Google Analytics: it can be used to send custom data through HTTP requests.

The Configuration component contains all the user settings of a given user’s account. The

APIs allow developers to retrieve the segments or the goals which were configured by the

user.

The Processing component computes the reports accessible in Google Analytics’ regular interface. Google does not provide APIs to influence the way this processing is done.


The Reporting component takes care of the processed data and of how to display them. The Reporting APIs retrieve the processed data and are able to access almost all of the dimensions proposed by Google Analytics’ regular interface.
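As an illustration of the Reporting component, the sketch below builds a query for the Core Reporting API (version 3, the version available at the time of writing) asking for pageviews per pair of previous page and current page. The choice of dimensions, the profile ID and the function name are assumptions made for the example; the request is only assembled, not sent, and a real call would also need the OAuth access token described next.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

REPORTING_ENDPOINT = "https://www.googleapis.com/analytics/v3/data/ga"


def page_transition_query(profile_id, start, end, country=None, max_results=500):
    """Build the URL of a report listing transitions between pages."""
    params = {
        "ids": f"ga:{profile_id}",
        "start-date": start,
        "end-date": end,
        "metrics": "ga:pageviews",
        "dimensions": "ga:previousPagePath,ga:pagePath",
        "sort": "-ga:pageviews",
        "max-results": str(max_results),
    }
    if country is not None:
        # A filter restricts the report to a single dimension value.
        params["filters"] = f"ga:country=={country}"
    return f"{REPORTING_ENDPOINT}?{urlencode(params)}"


# "12345678" is a placeholder profile (view) ID.
url = page_transition_query("12345678", "2014-03-01", "2014-03-31", country="Switzerland")
query = parse_qs(urlsplit(url).query)
```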

OAuth is a standard born from the efforts of various American companies which wanted some interoperability between their applications. The first version was published as a complete protocol in 2010 (RFC 5849). It led to a second version, OAuth 2.0 (RFC 6749, 2012), which is quite different as it leaves some flexibility in its implementation (it is a framework rather than a strict protocol). Google, Twitter, Microsoft and Facebook are among the most popular companies that use OAuth 2.0.

Each company has its own implementation of OAuth 2.0, which may vary in aspects such as the type of data the application needs to provide in order to be issued an access token. In the case of Google, it is important to distinguish between the account hosting the application that accesses data from Google and the user account whose data will be used by the application. They can be different: for example, an application can be hosted on a Google developer’s account without being limited to its data.

Figure 20 shows how Google implemented OAuth 2.0 for their APIs.

Figure 20 - OAuth 2.0 workflow with Google APIs

1. When an application wants to access data from Google, it first requests authorization by declaring, through the URL of the request, what kind of access it requires.

2. The user of the application is asked to log in and confirm that she wants to let the application

access her data.

3. Google returns an authorization code.

4. The application uses this authorization code to ask for a token.

5. Google returns two tokens: an access token used to access Google APIs and a refresh token

that must be stored in order to ask for a new access token later.

6. The application can now access Google APIs.
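Steps 1 and 4 of this workflow can be sketched as follows. This is only an illustration in JavaScript, not the PHP code of Constel Analytics: the endpoint URLs and the scope are assumptions taken from Google's public OAuth 2.0 documentation, and the client ID, client secret and redirect URI are placeholders. No network request is made.

```javascript
// Sketch of steps 1 and 4 of the workflow (no network call is made).
// Endpoints and scope are assumptions based on Google's public documentation.
const AUTH_ENDPOINT = "https://accounts.google.com/o/oauth2/v2/auth";
const TOKEN_ENDPOINT = "https://oauth2.googleapis.com/token";
const ANALYTICS_SCOPE = "https://www.googleapis.com/auth/analytics.readonly";

// Step 1: the kind of access required is declared in the URL (scope).
function buildAuthorizationUrl(clientId, redirectUri) {
  const url = new URL(AUTH_ENDPOINT);
  url.searchParams.set("client_id", clientId);
  url.searchParams.set("redirect_uri", redirectUri);
  url.searchParams.set("response_type", "code"); // ask for an authorization code
  url.searchParams.set("scope", ANALYTICS_SCOPE);
  url.searchParams.set("access_type", "offline"); // also ask for a refresh token
  return url.toString();
}

// Step 4: form body to POST to TOKEN_ENDPOINT to exchange the code for tokens.
function buildTokenRequestBody(code, clientId, clientSecret, redirectUri) {
  return new URLSearchParams({
    code: code,
    client_id: clientId,
    client_secret: clientSecret,
    redirect_uri: redirectUri,
    grant_type: "authorization_code",
  }).toString();
}
```

The user would be redirected to the URL returned by buildAuthorizationUrl(), and the authorization code received on the redirect URI would then be passed to buildTokenRequestBody().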


From a more technical point of view, the last step returns a “client object” to the PHP application. This object can be used to initialize an analytics service object.

Once authenticated, requesting data from the Google Analytics API is straightforward. The first step is to create an analytics service object from the client object received at the end of the authentication. Then, it is possible to select a website affiliated with the user’s account and query the Reporting APIs for data (see “Available dimensions & metrics” below).

While Google’s resources are accessible for free, the company imposes an access limit, known as a “quota”, which varies depending on the service requested. In the case of Google Analytics, two limitations are to be taken into account:

- The maximum number of results returned by a request cannot exceed 10,000 rows. It is however possible to overcome this limitation by sending a second, similar query starting with the 10,001st result.

- An application using Google Analytics’ data is limited to 10,000 requests a day. It is possible to go beyond this limit by paying a fee depending on how many requests are made.
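The way the row limit can be worked around is sketched below: the function lists the starting positions of the successive queries needed to retrieve a given number of rows. The function name is hypothetical; in the Reporting API these values would presumably be passed as the "start-index" and "max-results" query parameters.

```javascript
// Start indexes (1-based) needed to page through `totalResults` rows
// when a single request returns at most `maxResults` rows.
function pageStartIndexes(totalResults, maxResults = 10000) {
  const starts = [];
  for (let start = 1; start <= totalResults; start += maxResults) {
    starts.push(start);
  }
  return starts;
}

console.log(pageStartIndexes(25000)); // [ 1, 10001, 20001 ]
```

Each additional page costs one request and therefore also counts towards the daily quota.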

These two limitations are not a problem for Constel Analytics because, as presented in chapter 3.1.2, Constel Analytics requires only two main objects from Google Analytics: the list of pages and the list of interactions. A Force-Directed graph with more than 1,000 nodes and 10,000 interactions is almost unreadable, so each graph fits within the row limit and needs only two main queries. Since more than 5,000 graphs are unlikely to be generated in one day, the limit of 10,000 queries per day will probably never be exceeded.

Google Analytics provides around 200 dimensions and metrics, sorted by categories [30]. They range from visitors' information (country, city, returning vs new, number of visits, ...) to site speed (page loading time, DOM interactive time, ...). Since the Reporting APIs allow developers to cross several dimensions and metrics in a single request, the amount of accessible information is considerable. For instance, requesting the "pagePath" dimension with the "pageviews" metric returns a list of page URIs and the number of times they have been visited. By crossing "pagePath" and "previousPagePath", the API returns every unique pair of "pagePath" and "previousPagePath" with the number of times this transition occurred.
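As an illustration, the query string of such a crossed request could be assembled as follows. This is a minimal sketch, assuming the parameter names of the Core Reporting API (ids, start-date, end-date, dimensions, metrics) and comma-separated lists; the function name and the view ID are hypothetical.

```javascript
// Builds the query string of a Reporting API request.
// "ga:" is the prefix of Google Analytics dimensions and metrics.
function buildReportQuery({ viewId, startDate, endDate, dimensions, metrics }) {
  const prefixed = (names) => names.map((n) => "ga:" + n).join(",");
  return new URLSearchParams({
    ids: "ga:" + viewId,
    "start-date": startDate,
    "end-date": endDate,
    dimensions: prefixed(dimensions),
    metrics: prefixed(metrics),
  }).toString();
}

// One row per (previous page, page) pair, with its number of page views.
const interactionsQuery = buildReportQuery({
  viewId: "12345678", // hypothetical view (profile) ID
  startDate: "2014-03-01",
  endDate: "2014-03-20",
  dimensions: ["previousPagePath", "pagePath"],
  metrics: ["pageviews"],
});
```

Replacing the dimensions with ["pagePath"] alone would yield the simple list of pages with their page views.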

It is important to note that the Google Analytics APIs do not allow developers to retrieve the full sequence of pages visited during each visit. Therefore it is not possible for a third-party application to build a Sankey Diagram similar to Google Analytics’ Behavior Flow.

This limitation implies a serious risk of misreading the visualization because of a phenomenon called memory loss: it is not possible to track individuals on the graph, and distinguishing Hub nodes from merely popular nodes thus becomes much harder.
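The memory loss can be made concrete with a small sketch. The sessions below are hypothetical; the function reduces them to the aggregated (previous page, page) counts, which is the only form in which the data are available. Two different behaviors end up producing exactly the same counts.

```javascript
// Counts every (previous page -> page) transition in a list of sessions,
// which is the only information the aggregated data exposes.
function pairCounts(sessions) {
  const counts = {};
  for (const pages of sessions) {
    for (let i = 1; i < pages.length; i++) {
      const key = pages[i - 1] + " -> " + pages[i];
      counts[key] = (counts[key] || 0) + 1;
    }
  }
  return counts;
}

// Visitors come back to the page they came from...
const returning = [["A", "Legal", "A"], ["B", "Legal", "B"]];
// ...or move on to a different page: same pairs, different behavior.
const crossing = [["A", "Legal", "B"], ["B", "Legal", "A"]];

console.log(pairCounts(returning));
console.log(pairCounts(crossing));
```

Both calls report one transition for each of the four pairs, so the graph built from them cannot tell the two situations apart.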


As an example, let us consider two highly connected webpages of a photographer’s website: one is the “Legal information” page, accessible from everywhere as it is part of the footer, and the other is the main “Photostream” that serves as an index for all the pictures. It can be assumed that visitors will go to the “Legal information” page to check how they are allowed to use the pictures displayed on the website and then go back to the previous page. If the website has heavy traffic, then different visitors will get to the “Legal information” page from possibly many different pages. The “Photostream”, however, will likely act as a real hub, with visitors going back to it to select new pictures to display. The clear difference in use between the two pages will have to be deduced by the users, assuming that they know the website. A partial solution to this problem is presented in Chapter 5.1.5.

VISUALIZATION

Visualization is the cornerstone of Constel Analytics; it was therefore necessary to find a technology that would support it. Data-Driven Documents, better known as D3, is an open-source JavaScript library developed by Mike Bostock, Jeff Heer and Vadim Ogievetsky [31]. It aims at binding data and DOM elements easily. D3 is regarded as a successor of Protovis, as it was developed by the same people, and benefits from the enthusiasm of the academic world. Its openness offered the possibility to analyze the way it works in order to ensure that it would suit the project. D3 is distributed under the BSD license.

From a technical point of view, D3 produces DOM elements, usually inline SVG elements. These visual elements are bound to the data, mostly JavaScript arrays or JSON, used to generate them. D3 has a unique way of binding a set of data to a visualization.

We believe that in this case an example is more telling than theory: below is a typical D3 snippet, followed by a short explanation.

1. var dataArray = [23,43,58];

2. d3.selectAll(".nodes")

3. .data(dataArray)

4. .enter()

5. .append("circle")

6. .attr("cx", function(d) { return d }).attr("cy", function(d) { return d }).attr("r", function(d) { return d/10 })

7. .attr("fill", "red");

Line 1: An array of data is created. It contains simple integers that will be used to feed the

visualization.

Line 2: D3 selects a set of DOM elements – in this example, all the “.nodes” elements. It is possible

that there exist no “.nodes” elements. This is not a problem for D3, as it simply selects an empty set

in this case.


Line 3: Once the set is selected, D3 assigns it a set of data through the “data()” function. In this case, the three values of dataArray are bound to the empty “.nodes” set. When doing so, three selections are created (Figure 21):

- The enter selection, which contains all the new data whose keys do not exist in the former data

- The update selection, which contains all the data whose keys are found in both the former and the new data

- The exit selection, which contains all the former data with no corresponding key in the new data

Figure 21 - D3 selections. The Enter selection contains all the new data, the Exit selection contains all the data that aren’t present in the new set

and the Update selection contains all the data that are present in both the former set and the new set.

Obviously, all the data sent from dataArray are new, since there are no data in the “.nodes” set so

far.
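The logic of the three selections can be reproduced without D3 in a few lines. The sketch below is only an illustration of the principle, not D3's actual code; the function name is hypothetical and the key is made explicit here, whereas D3 matches data by index when no key function is given.

```javascript
// Splits two data sets into D3-like enter / update / exit groups.
// `key` returns the identifier used to match old and new data.
function partitionByKey(oldData, newData, key) {
  const oldKeys = new Set(oldData.map(key));
  const newKeys = new Set(newData.map(key));
  return {
    enter: newData.filter((d) => !oldKeys.has(key(d))),  // only in the new data
    update: newData.filter((d) => oldKeys.has(key(d))),  // in both sets
    exit: oldData.filter((d) => !newKeys.has(key(d))),   // only in the former data
  };
}

const identity = (d) => d;
// Nothing was bound before: every datum is in the enter group.
const first = partitionByKey([], [23, 43, 58], identity);
// A later update keeps 43 and 58, adds 71 and drops 23.
const second = partitionByKey([23, 43, 58], [43, 58, 71], identity);
```

In the first call the enter group holds the three values, as in the example above; in the second call, 71 enters, 43 and 58 are updated and 23 exits.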

Line 4: D3 selects all the new data (from the “enter” set). Figure 22 shows the result of this line in the Firefox developer console. Three objects are created, with the __data__ attribute corresponding to the data sent from dataArray.

Figure 22 - enter() selection in console log. The data are bound (__data__), but they aren’t visible on the page yet.


These objects have no visual representation for now, but the following lines handle this...

Line 5: For each new element added to the set, D3 uses the append() function to create “circle”

DOM elements.

Line 6: All the generated circles receive an x position (“cx”), a y position (“cy”) and a radius (“r”) derived from their __data__.

Line 7: All the generated circles are filled in red. Figure 23 shows the final representation of this short code example: the three dots are visible in the upper part of the window, and the lower part shows the console results. An SVGCircleElement has been created for each of the data sent from dataArray.

Figure 23 - Final representation of the example. The three nodes are visible at the top of the window. The log shows that data are bound to visual

elements (SVGCircleElement).

It is possible to update data using the update selection, or to remove data using the exit selection,

just as with the enter selection.

On top of its data-binding feature, D3 offers several layout methods to compute how the data should be spread across the surface of a DOM element. Amongst those layout methods is an implementation of the Force-Directed layout that will be used in this project [32]. As D3 is a renowned academic project, its implementation of the Force-Directed layout was not questioned. However, it is necessary to understand the way D3 manages this layout in order to guarantee that the visual result will match the expectations (see footnote 2).

By default, D3 uses three forces to build a Force-Directed graph:

1. A “gravity” force that keeps nodes within a sphere, in order to prevent them from being pushed out of the SVG area.

2. A “charge”, generally used to repulse nodes. Its value can be either negative (for repulsion) or positive (for attraction). To keep the computation tractable, nodes are stored in a quadtree and the charge of distant groups of nodes is approximated with the Barnes-Hut simulation.

3. An attraction force between linked nodes.

2 Full source code can be found here: https://github.com/mbostock/d3/blob/master/src/layout/force.js

D3 also uses a “cooling parameter” known as “alpha”, which progressively decreases towards 0 with each iteration of the Force-Directed layout computation. The intensity of each iteration is also related to this alpha value, which means that the first iterations have more influence on the graph than the last ones. D3’s Force-Directed layout remains active as long as alpha has not reached 0.
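The effect of such a cooling parameter can be sketched as a geometric decay. The function below is a hypothetical illustration, not D3's code: the initial value, the decay factor and the threshold under which alpha is treated as 0 are assumptions that should be checked against force.js.

```javascript
// Number of iterations before a geometrically decaying alpha is treated as 0.
function coolingSchedule(initialAlpha, decay, threshold) {
  let alpha = initialAlpha;
  let ticks = 0;
  while (alpha >= threshold) {
    alpha *= decay; // each iteration is weaker than the previous one
    ticks += 1;
  }
  return { ticks: ticks, finalAlpha: alpha };
}

// Assumed values (0.1, 0.99, 0.005), to be checked against force.js.
const schedule = coolingSchedule(0.1, 0.99, 0.005);
console.log(schedule.ticks, schedule.finalAlpha);
```

With these assumed values the layout would run for roughly 300 iterations before stopping, the first ones moving the nodes far more than the last ones.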

D3’s implementation of the Force-Directed layout does not take into account the potential weight of the edges; they are not even required to have one. This might cause a problem in Constel Analytics, since strong interactions between pages are supposed to be immediately visible. However, two options can be used to take hypothetical edge weights into account:

- The “Link distance” method sets a target distance between linked nodes (20 by default). It can be either a constant (all the links will have the same target distance) or a function, in which case a different distance can be set for each link. This “target link distance” is a weak constraint.

- The Force-Directed layout in D3 uses an event called “tick” which is triggered at each iteration of the graph’s computation. Defining a function called during this event allows additional forces to be applied to the graph, amongst other things. For instance, it is possible to add a specific attraction towards different spots of the graph (see Figure 24 for an example: four divergent forces attract different groups of nodes in opposite directions).

Figure 24 - "Multi-Force Foci Layout" [33]. Force-directed graph with additional forces to attract points.
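The principle of such an additional force can be sketched in plain JavaScript. The function below is a hypothetical illustration modelled on the multi-foci idea of Figure 24: at each tick, every node is moved a little closer to the focus of its group, the step being scaled by the cooling parameter. The node and focus structures are assumptions.

```javascript
// One "tick" of an extra force pulling each node toward the focus of its group.
// `strength` scales the pull; `alpha` is the layout's cooling parameter.
function pullTowardFoci(nodes, foci, alpha, strength) {
  nodes.forEach(function (node) {
    const focus = foci[node.group];
    node.x += (focus.x - node.x) * strength * alpha;
    node.y += (focus.y - node.y) * strength * alpha;
  });
}

// Hypothetical data: two groups attracted to opposite corners.
const foci = [{ x: 0, y: 0 }, { x: 100, y: 100 }];
const nodes = [{ x: 50, y: 50, group: 0 }, { x: 50, y: 50, group: 1 }];
pullTowardFoci(nodes, foci, 0.1, 1);
console.log(nodes);
```

In D3, a function of this kind would be called from the handler registered on the “tick” event, with the alpha value provided by the event.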

PROGRAMMING LANGUAGE & FRAMEWORK

An easy way to make Constel Analytics accessible to as many people as possible is to develop it as a web application. Thus, end users do not have to install any software to use it: they just have to go to a URL.


Amongst the common programming languages on the web, PHP is a server-side scripting language born in 1995 and maintained by the PHP Group. It remains extremely widespread, with a market share of 81.8%, far ahead of the 17.8% of ASP.NET (its main rival) [34]. Being both open-source and widespread, PHP is the most suitable programming language for Constel Analytics.

Despite its success, PHP suffers from a certain unpopularity within the programming community: its weak typing and its numerous shortcuts are known to encourage poor programming practices. Thus, in order to save some development time and impose a standard structure on this project, it was decided early on that a Framework should be used. There exist several PHP Frameworks, and one of the most popular is Symfony 2 (several prominent PHP applications use Symfony, such as the latest version of Drupal). This popularity can be explained by its high modularity, its active developer community and the support it receives from large companies [35]. Since Symfony also offers several modules that make the integration of Google APIs easier, it seemed like a good choice for Constel Analytics. Symfony is distributed under the terms of the MIT license.

Like many other Frameworks, Symfony 2 follows a Model-View-Controller approach, which it extends in order to achieve more flexibility.

At the root of a Symfony project lies the “app” folder, which contains utilities (including a “console” file which allows managing the application through a terminal), configuration files and the application’s cache.

A Symfony 2 application is structured into bundles: independent components designed to be removable without making the whole application unusable. Many developers working with Symfony readily share their non-confidential bundles, thus contributing to a wide range of open-source extensions that facilitate the creation of other applications. Those can be installed using Composer. Bundles can have their own Model, View and Controller components and work together thanks to a powerful routing system provided by the Framework.

Symfony 2 uses a specific nomenclature and this project will rely on it. Mainly, controller methods are known as “actions” and objects that are used globally by all the bundles are called “services” (which belong to a “Service container” callable from everywhere).

OTHER TECHNOLOGIES

Other technologies were used during the course of this project. As they are not directly related to

the main goals of Constel Analytics, they will be only summarily presented here:

- Twitter Bootstrap: Bootstrap is a popular Front-end Framework for web applications. It is provided by Twitter under the MIT license, and offers convenient JavaScript and CSS libraries to style a website. Constel Analytics uses it in order to get a decent-looking UI with little effort.

- jQuery: A well-known JavaScript library, jQuery is a requirement for Twitter Bootstrap to work. It provides various handy functions that make JavaScript easier to write and offers a large number of plug-ins for different purposes. Constel Analytics relies on jQuery to carry out Ajax requests and also uses the jQuery UI plug-in to display HTML sliders. It is distributed under the MIT license.

- Twig: Twig is a PHP templating system developed by SensioLabs. One of its main purposes is to make the development of Views easier by using a more concise syntax than PHP’s. Symfony uses it by default and thus, this project does too. It is distributed under a BSD license.

- Git: Git is a revision control system that was used during this project to keep track of the changes. The repository was hosted on Bitbucket and will be opened once the code is cleaned up to a sufficient level. Git is distributed under the GNU General Public License v2.

SOFTWARE ARCHITECTURE

In the previous chapters, the aims of Constel Analytics were presented, as well as the technological means that will help materialize them. This chapter focuses on the actual structure of Constel Analytics.

STRUCTURE OF THE APPLICATION

Below is an updated diagram of Constel Analytics, adding the different technologies presented in the previous chapter (Figure 25). PHP and Symfony act as a basis for the whole project, Twig and D3 take care of the view, the cache stores data in JSON format and Google Analytics serves as a Data Source, which implies using OAuth for the authentication.

Figure 25 – Detailed diagram of Constel Analytics with technologies

Most of Constel Analytics’ code is contained within a bundle called “VTWABundle” residing in the “src” folder of the Symfony 2 application (“VTWA” stands for “Visual tool for web analytics”, the development name of the project). Figure 26 shows the file tree of the bundle.


Figure 26 - File tree for VTWABundle

This first version of Constel Analytics relies on one single controller, called MainController. This controller has three different actions:

- indexAction(), which displays the homepage of Constel Analytics.

- interactionsAction(), which displays the main visualization.

- ajaxAction(), which is called by Ajax requests when refreshing the visualization.

interactionsAction() is the central action of Constel Analytics: it processes the GET data containing the visualization’s parameters and checks whether the required JSON data already exist in the cache. If they do not, it initializes the AnalyticsRequest service and saves its output. It then renders the final web page through the View interactions.html.twig. ajaxAction() does a similar job but simply returns raw JSON data for new visualizations without reloading the page. When no value is provided, the default parameters for a request are:

| Parameter | Description | Default value |
|---|---|---|
| $startDate | The day from which results are fetched | 20 days ago |
| $endDate | The day up to which results are fetched | Today |
| $segment | The segment applied to the results | None |
| $filter | The filters applied to the results | None |
| $selectedFilter | The dimension selected to display at the bottom of the page | Country |
| $metric | The metric used to display the results | Page views |
| $maxPages | The maximum number of pages on the graph | 100 |
| $maxInteractions | The maximum number of interactions on the graph | 1000 |

Table 5 – Default parameters for MainController
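The way such defaults complete an incoming request can be sketched as follows. This is a JavaScript illustration only, the controller itself being written in PHP: the function name, the date format and the identifiers "country" and "pageviews" are assumptions standing for the “Country” and “Page views” defaults of Table 5.

```javascript
// Fills in missing request parameters with the defaults of Table 5.
// Dates are formatted as YYYY-MM-DD (an assumed format).
function withDefaults(params, today) {
  const start = new Date(today.getTime());
  start.setUTCDate(start.getUTCDate() - 20); // 20 days ago
  const iso = (d) => d.toISOString().slice(0, 10);
  const defaults = {
    startDate: iso(start),
    endDate: iso(today),
    segment: null,
    filter: null,
    selectedFilter: "country",
    metric: "pageviews",
    maxPages: 100,
    maxInteractions: 1000,
  };
  // Values provided in the request take precedence over the defaults.
  return Object.assign({}, defaults, params);
}

const request = withDefaults({ maxPages: 50 }, new Date(Date.UTC(2014, 3, 21)));
console.log(request);
```

Here only maxPages is provided, so the period defaults to the 20 days preceding 21 April 2014 and every other parameter keeps its default value.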

The DependencyInjection folder contains two files which take care of the configuration parameters for Constel Analytics. Namely, Constel Analytics can take up to four parameters:


- json_cache_path is the path to the cache directory. By default, the cache is saved under the %kernel.root_dir%/var/vtwa/storage directory.

- site_domain_name is the name of the website’s domain, used by Constel Analytics to link to the website when displaying the different paths associated with a page.

- max_pagelevel is the maximum level up to which Constel Analytics should go when defining to which group a page belongs. The default value is “1”.

- distinction defines whether the URI or the title should be used to distinguish pages. The default value is “title”.

The Resources folder groups many files, including:

- the services’ configuration file

- all the view files

- all the public resources that are sent to the /web/ directory of Symfony 2 (CSS and JavaScript libraries, for the most part)

At the bottom of the tree, the services directory contains the definition of AnalyticsRequestService, which handles the connection to Google Analytics and processes the data sent to the main controller.

DATA PROCESSING

The Data Processing module takes the form of a Symfony 2 service, vtwa.analytics_request, so that it can be called from everywhere in the application. The service’s definition, found in the file services.yml, uses HappyR’s happyr.google.analytics.page_statistics (see footnote 3) as a parent, since it asks for the same base arguments, while requiring additional arguments corresponding to three of the four parameters of the bundle: the path of the cache directory, the maximum level for groups and the distinction used to identify pages.

The definition of the class itself is close to the PageStatisticsService class from HappyR’s Google Analytics bundle, for it was initially conceived as a simple extension of this class. However, in order to guarantee more independence in the development of the class, it was decided not to inherit from the PageStatisticsService class.

Below is a description of the methods defined in AnalyticsRequestService:

| Name of the method | Arguments taken | Description | Return value |
|---|---|---|---|
| hasAccessToken() | void | Checks if an access token already exists. Similar to HappyR’s method. | The token or false |
| saveAccessToken() | void | Saves the generated OAuth 2.0 access token. | void |
| in_array_r() | $needle: the searched term; $haystack: the array; $parent = null: the previous array | This method is an extension of the original in_array() function of PHP. Looks inside nested arrays recursively for a given $needle. | The key of the array’s parent, the key of the array if there is no parent, or false |
| parseDimensionOrMetrics() | $haystack: an array of dimensions or metrics | Translates a set of dimensions or metrics into their Google Analytics counterparts. | A string with all the dimensions or metrics, starting with “ga:” (prefix for all Google Analytics dimensions and metrics) and separated by semicolons |
| saveJson() | Various variables corresponding to the request’s parameters | Saves the processed results into a JSON file in the cache directory. The name of the file is an md5 hash built upon all the parameters of the request. | A string, the content of the file |
| getSegments() | void | Gets all the segments defined in the Google account. Though this does not change anything inside the code, this request uses the Configuration APIs. | An array of segments |
| getFilters() | $startDate: the day from which results are fetched; $endDate: the day up to which results are fetched; $dimensions; $segment: a segment applied to the results; $metric: the metric measured | Gets the first 50 results of the dimension selected for filters, sorted by the metric used to generate the visualization. | An array of filters |
| getAnalyticsData() | $dimensions, $metric; $maxResults = 10000: the maximum number of results; $sort = null: the sort order of the results; $filters = null: a list of filters; $startDate, $endDate, $segment | Queries Google Analytics with the specified parameters. | An array of results from Google Analytics |
| getPagesByPath() | $startDate, $endDate, $segment, $metric, $maxResults, $filter | Calls getAnalyticsData() to request a list of pages, differentiated by their pagePath. It also computes to which group each page belongs. | An array of pages |
| getPagesByTitle() | $startDate, $endDate, $segment, $metric, $maxResults, $filter | Similarly to getPagesByPath(), requests the list of pages, but differentiated by pageTitle instead. | An array of pages |
| getPages() | $startDate, $endDate, $segment, $metric, $maxResults, $filter | Simply forwards the request to getPagesByPath() or getPagesByTitle() depending on the “distinction” configuration of Constel Analytics. | The result of getPagesByPath() or getPagesByTitle() |
| getInteractions() | $pagesList: an array of pages produced by getPages(); $startDate, $endDate, $segment, $metric, $maxResults, $filter | Generates a list of interactions, based on the list of pages previously produced and sent as an argument. | An array of interactions defined by the source page, the target page and the amount of metric generated for this interaction |

Table 6 – Methods in AnalyticsRequestService

3 HappyR (for Happy Recruiting) is a recruiting company which kindly provides some of its Symfony bundles for free. [44]

When MainController calls AnalyticsRequestService, it first asks for the list of segments through the getSegments() method and the list of filters through the getFilters() method. The results are not cached as they do not require excessive processing time. Then, if no cache file exists for the required parameters, MainController calls the getPages() method, followed by the getInteractions() method. It builds an array with those values, encodes it into JSON and calls the saveJson() method.

Once this whole process is done, MainController renders the results through the interactions.html.twig View, described in the following subchapter.
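The principle of a cache keyed by the request’s parameters can be sketched as follows. This is a JavaScript illustration of the idea, not the PHP code of saveJson(): the function name and the way the parameters are serialized before hashing are assumptions.

```javascript
const crypto = require("crypto");

// Derives a cache file name from the request parameters: identical
// parameters always map to the same file, so the file's existence
// tells whether Google Analytics must be queried again.
function cacheFileName(params) {
  // Sort keys so that the parameter order does not change the hash.
  const canonical = JSON.stringify(
    Object.keys(params).sort().map((k) => [k, params[k]])
  );
  return crypto.createHash("md5").update(canonical).digest("hex") + ".json";
}

const a = cacheFileName({ startDate: "2014-03-01", endDate: "2014-03-20", metric: "pageviews" });
const b = cacheFileName({ metric: "pageviews", endDate: "2014-03-20", startDate: "2014-03-01" });
console.log(a === b); // true: same parameters, same cache file
```

Changing any parameter, such as the period or the metric, produces a different file name and therefore triggers a new request.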

VIEW & VISUALIZATION

The file interactions.html.twig extends layout.html.twig, which contains the basis for all the pages of

Constel Analytics (HTML headers, global structure of each page, ...). The User Interface of the main

visualization is split into four levels, as can be seen in Figure 27.

Figure 27 - Constel Analytics UI split in four parts. From top to bottom: global navigation menu (red), parameters (orange), visualization (green) and

advanced tools (blue).


The upper part (red) is occupied by a global navigation menu. Ultimately, it is supposed to lead to the different sections of Constel Analytics but, at the time of writing, the only working sections are the Homepage and “Interactions between pages”, the other items serving as placeholders.

Below, in orange, are the parameters whose modification requires a page reload, which is triggered by clicking on the “Analyze” button to the right. These options are the period, an optional segment, the metric and the maximum number of pages and interactions. They are sent to the Main Controller via the GET method.

The green part is the visualization itself. Next to the graph are three boxes that are supposed to be visible at all times during its exploration because they provide useful information and utilities to the users: a toolbox (described in a dedicated subchapter below), a “General Information” box which provides the users with a few references when they need to compare quantitative data and a “Current page” box which displays all the information about the hovered node (title of the page, value of the metric, the different URIs related to the page and the list of all its interactions).

The blue part groups all the advanced Ajax tools. All of them are described in detail below.

Since the graph can be redrawn on several occasions, the code which generates it is encapsulated within a function called generateForceGraph() that can be called at any time. This function requires two parameters:

- JSON data serving as a basis for the visualization

- The ID of the DOM object that will host the visualization

generateForceGraph() can be split into three phases, described below.

NODES GENERATION

Nodes are represented using simple SVGCircleElements. Their radius is defined according to the value of the metric selected for the visualization, ranging from about 2 (smallest node) to 12 (largest node) according to the following function:

.attr("r", function (d) {
    return 2 + (Math.log(d.value) / Math.log(largest)) * 10;
})

A logarithmic scale is used because most websites tend to have fewer than half a dozen very important pages, with the others being far behind, which, with a linear scale, made it difficult to distinguish small pages from average pages. With a logarithmic scale, the largest pages still stand out as they tend to be much more central and connected in the graph. The difference between the result of a linear scale and the result of a logarithmic scale can be seen in Figure 28.


Figure 28 - Difference between linear and logarithmic scale in nodes’ representation.
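The difference between the two scales can also be checked numerically. The sketch below reuses the logarithmic formula of the radius function; the linear counterpart and the page values are hypothetical and only serve the comparison. Values are assumed to be at least 1, and the largest value greater than 1, so that the logarithms are defined and non-zero.

```javascript
// Radius of a node, following the logarithmic formula quoted above.
// `value` is the node's metric and `largest` the highest value in the graph.
function logRadius(value, largest) {
  return 2 + (Math.log(value) / Math.log(largest)) * 10;
}

// Linear counterpart over the same 2 to 12 range, for comparison only.
function linearRadius(value, largest) {
  return 2 + (value / largest) * 10;
}

// Hypothetical site: one dominant page and two much smaller ones.
[10000, 100, 10].forEach(function (value) {
  console.log(value, logRadius(value, 10000), linearRadius(value, 10000));
});
```

With a linear scale the pages with 100 and 10 views both get a radius very close to 2, whereas the logarithmic scale gives them radii of about 7 and 4.5, which makes them distinguishable.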

The “mouseover” event is used to display all the information regarding the hovered node. It mainly

goes through a previously produced list of connections between each node (following Christopher

Manning’s example with the Chicago Lobbyists’ visualization [15]) and indicates those which are

connected in the “Current page” box under the “Page connections” panel.

As clicking on nodes can trigger several actions, the “mousedown” event was used to define the

priorities of these actions: PathFinder goes first, followed by the Highlight feature.

LINKS GENERATION

The links are represented using SVGPathElements (or SVG Paths) in order to be more aesthetically pleasing than the usual straight lines. However, doing so requires an additional step while building the lists of nodes and edges, since the paths require an intermediate point between the start and the end. Following Mike Bostock’s idea [36], a new set of links, called bilinks, is generated, containing the source node, the intermediate node and the target node. For our visualization’s specific needs, the value of the link is also added to the set.

    var nodes = data.nodes.slice();   // copy, so intermediate nodes can be appended
    var links = [];                   // half-links passed to the force layout
    var bilinks = [];                 // source / intermediate / target triples used for drawing

    data.links.forEach(function (link) {
        var s = nodes[link.source],
            t = nodes[link.target],
            i = {},                   // intermediate node: an empty object without title
            v = link.value;

        nodes.push(i);
        links.push({source: s, target: i, value: v}, {source: i, target: t, value: v});
        bilinks.push({source: s, intermediate: i, target: t, value: v});
    });


We thus end up with three new sets: a set of nodes with additional, untitled entries that serve as intermediate points, a set of bilinks, and a set of regular links that point either to the intermediate point instead of the former target, or from the intermediate point instead of the former source. While the set of regular links is passed to the force-directed layout for its generation, the bilinks are used to display the links once the data are bound. Among the attributes, the path’s opacity ranges from ~10% to 80% depending on the intensity of the link.
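The way a bilink can be turned into a curved SVG path may be sketched as follows. The path syntax follows the usual curved-link approach; the linear opacity mapping and the sample coordinates are assumptions, as the text only gives the opacity range.

```javascript
// Builds the "d" attribute of an SVG path for one bilink: start at the source,
// then draw a smooth cubic curve (S command) that uses the intermediate point
// as control point and ends at the target.
function bilinkPath(b) {
    return "M" + b.source.x + "," + b.source.y +
           "S" + b.intermediate.x + "," + b.intermediate.y +
           " " + b.target.x + "," + b.target.y;
}

// Assumed linear mapping from link intensity to an opacity between 0.1 and 0.8.
function bilinkOpacity(value, mostIntense) {
    return 0.1 + 0.7 * (value / mostIntense);
}

// Invented positions, as the force layout would assign them.
var example = {
    source: {x: 0, y: 0},
    intermediate: {x: 40, y: 25},
    target: {x: 100, y: 0},
    value: 50
};

console.log(bilinkPath(example));
console.log(bilinkOpacity(example.value, 100));
```

Because the intermediate node is itself part of the simulation, the curvature of each path changes with the layout instead of being fixed in advance.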

FORCE-DIRECTED LAYOUT PARAMETERS

The parameters used by the Force-Directed layout in Constel Analytics can be found below.

    var force = self.force = d3.layout.force()
        .linkDistance(function (o) {
            // divisor ranges from ~1 (weak link) to 2 (most intense link)
            var divisor = 1 + o.value / mostIntense;
            return 10 / divisor;
        })
        .on("tick", tick)
        .charge(c)
        .nodes(nodes)
        .links(links)
        .size([width, height])
        .linkStrength(1)
        .gravity(0.1)
        .start();

In this graph, linkDistance is set as a function. D3 then goes through all the links of the graph and computes the target distance of each one by calling this anonymous function. The purpose here is to bring nodes linked by strong interactions closer together. To achieve this, a base distance of 10 is divided by a number ranging from ~1 to 2 according to the link’s intensity: this function thus sets the target distance of the most intense link to 5.

In a large and heavily interconnected graph, this does not have as much impact as intended because of all the other constraints. Figure 29 shows how this constraint works on a simple graph: in this example, each node is connected to the other two. The edge between B and C has a weight of 100 while the other two have a weight of 1. Two visualizations are overlaid in this figure: in blue, the visualization with a constant link distance (10) and, in orange, the visualization with the function described above. While there are a few differences, they are likely to go unnoticed in a larger graph, which prompted the development of an additional feature to gather nodes linked by heavy edges (see “Toolbox” below).
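The target distances of this three-node example can be worked out with the same rule as in the layout parameters. This is a minimal sketch; the name of the third node ("A") is an assumption, as the text only names B and C.

```javascript
// Same rule as in the layout parameters: a base distance of 10 divided by a
// divisor between ~1 and 2, depending on the link's share of the largest value.
function targetDistance(value, mostIntense) {
    var divisor = 1 + value / mostIntense;
    return 10 / divisor;
}

// Edge weights of the three-node example; the node name "A" is assumed.
var edges = [
    {name: "A-B", value: 1},
    {name: "A-C", value: 1},
    {name: "B-C", value: 100}
];
var mostIntense = 100;

edges.forEach(function (e) {
    console.log(e.name, targetDistance(e.value, mostIntense).toFixed(2));
});
```

The heavy edge gets a target distance of 5 and the two light edges about 9.9, which is why the difference remains modest once the other forces of the layout are added.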


Figure 29 - Difference between constant Link Distance (blue) and function Link Distance (orange)

On the tick event, the function tick() is called. This function has three objectives:

- Enabling / disabling the Progress Bar.
- Checking if the CloserRelatedNodes option is enabled (see toolbox below), in which case it brings linked nodes closer.
- Displaying the computed position of each node and each link once the alpha reaches 0.01 (a threshold low enough to make sure the position will be precise). At this point, it stabilizes the graph by stopping the computation.
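The three objectives above could be organized as in the following sketch. It is not the actual tick() of Constel Analytics: the helper names (setProgress, pullCloser, render), the progress mapping and the starting alpha of 0.1 are assumptions; only the 0.01 threshold and the call to stop() reflect the text.

```javascript
// Returns a tick handler. All parameter names are hypothetical:
//   force   - object exposing stop(), like a D3 force layout
//   options - { closerRelatedNodes: boolean }
//   ui      - { setProgress(fraction), pullCloser(), render() }
function makeTick(force, options, ui) {
    var ALPHA_THRESHOLD = 0.01;   // value quoted in the text
    var START_ALPHA = 0.1;        // alpha assumed at the start of the simulation

    return function tick(e) {
        // 1. Progress bar: fraction of the cooling already done (assumed mapping).
        var progress = 1 - (e.alpha - ALPHA_THRESHOLD) / (START_ALPHA - ALPHA_THRESHOLD);
        ui.setProgress(Math.max(0, Math.min(1, progress)));

        // 2. Optional extra constraint.
        if (options.closerRelatedNodes) {
            ui.pullCloser();
        }

        // 3. Draw once, when the layout has cooled down, then freeze it.
        if (e.alpha <= ALPHA_THRESHOLD) {
            ui.render();
            force.stop();
        }
    };
}
```

Drawing only once, at the end, avoids updating the DOM at every tick, at the price of hiding the intermediate states of the layout behind the progress bar.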

The charge is computed according to a heuristic function designed after several measurements with

the development website. The idea was to find the optimal charge for 10 different sets of nodes and

derive a function able to output close results.

    var n = data.nodes.length,
        charge = -Math.round(311 / Math.sqrt(n));

Further evaluations showed that this function does not work well for graphs with a different ratio of nodes to links, mainly because it does not take into account the number of links, which influences how the links are spread.
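To give an idea of the values produced by this heuristic, the function can be evaluated for a few graph sizes. The sizes below are invented for the example.

```javascript
// Heuristic quoted in the text: the repulsion weakens as the number of nodes grows.
function heuristicCharge(nodeCount) {
    return -Math.round(311 / Math.sqrt(nodeCount));
}

// Invented graph sizes.
[10, 100, 300, 1000].forEach(function (n) {
    console.log(n + " nodes -> charge " + heuristicCharge(n));
});
```

The charge goes from -98 for 10 nodes to -10 for 1'000 nodes, regardless of how many links connect them, which is consistent with the limitation noted above.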

Figure 30 - Constel Analytics Toolbox


The toolbox (Figure 30), located in the upper right corner of the graph, offers several tools to interact with the main visualization. These tools do not require new requests to be processed and act exclusively through JavaScript.

The graph's progress bar is located at the top of the toolbox. It restarts when the graph is redrawn following an Ajax request, and is also set in motion when the graph is redrawn after a change of zoom (see below). All this is achieved using Twitter Bootstrap’s built-in “Progress bar” component, which relies on CSS and HTML.

The search field allows users to find one or several pages within the graph. Each "keyup" event in the field calls a JavaScript function which checks each node's URL and title. If either contains the input string, the matching nodes are highlighted with a wide red border (see Figure 31 for an example).
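The matching step of the search can be sketched as a simple filter. The field names "url" and "title" and the case-insensitive comparison are assumptions, since the text does not specify them.

```javascript
// Returns the nodes whose URL or title contains the query.
// Case-insensitive matching and the field names "url" and "title" are assumptions.
function findMatchingNodes(nodes, query) {
    var needle = query.trim().toLowerCase();
    if (needle === "") {
        return [];   // an empty field highlights nothing
    }
    return nodes.filter(function (node) {
        return node.url.toLowerCase().indexOf(needle) !== -1 ||
               node.title.toLowerCase().indexOf(needle) !== -1;
    });
}

// Invented pages.
var pages = [
    {url: "/blog/2014/d3-tips", title: "D3 tips"},
    {url: "/forum/topic/42", title: "Questions about D3"},
    {url: "/wiki/Symfony", title: "Symfony notes"}
];

console.log(findMatchingNodes(pages, "d3").length);
```

In the application, the returned nodes would then receive the wide red border, while the border of all other nodes is reset.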


Figure 31 - Example of search. The red, thick borders indicate the two pages matching the desired string.

The row located under the progress bar offers four tools:

“Zoom”, which works as a magnifying glass when the mouse cursor passes over the graph. Clicking somewhere on the graph “freezes” the zoom in place. Based on a D3 module, this zoom distorts the graph as if it were seen through a fisheye lens.

Figure 32 - Example of zoom over a large graph. The green part is magnified.


When the force-directed graph is generated, the displacement of each node is precomputed. The “mouseover” event is used to track the movement of the cursor over the graph. When a node enters the radius of the magnifying glass, its new position is determined by a function that takes into account its precomputed position and its proximity to the center of the glass, in order to generate a progressive zooming effect. To preserve readability, the radius of the nodes remains the same: the zooming effect only changes the distance separating each pair of nodes. Figure 32 shows this effect on a relatively highly connected graph.
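The principle of such a magnifying glass can be illustrated with a common circular fisheye formulation. The text does not give the function actually used, so the formula, the radius of 100 and the distortion of 2 are assumptions made for this sketch.

```javascript
// Returns a function that maps a node's resting position to its magnified
// position around the focus point. The formula and the parameter values
// are illustrative assumptions.
function makeFisheye(radius, distortion) {
    var e = Math.exp(distortion);
    var k0 = e / (e - 1) * radius;
    var k1 = distortion / radius;

    return function (node, focus) {
        var dx = node.x - focus.x,
            dy = node.y - focus.y,
            dist = Math.sqrt(dx * dx + dy * dy);

        // Nodes outside the glass (or exactly under the cursor) do not move.
        if (dist === 0 || dist >= radius) {
            return {x: node.x, y: node.y};
        }

        // Scale factor: largest near the focus, falling to 1 at the rim.
        var k = k0 * (1 - Math.exp(-dist * k1)) / dist * 0.75 + 0.25;
        return {x: focus.x + dx * k, y: focus.y + dy * k};
    };
}

var fisheye = makeFisheye(100, 2);
var cursor = {x: 200, y: 200};

console.log(fisheye({x: 250, y: 200}, cursor));   // inside the glass: pushed outwards
console.log(fisheye({x: 400, y: 200}, cursor));   // outside the glass: unchanged
```

Only the positions are transformed; the radius of each node is left untouched, in line with the readability constraint mentioned above.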

“Path”, for “PathFinder”, allows users to establish the most likely path between two nodes. The user just has to click on a source node and on a target node. If a path exists between the two, it is drawn in purple while all the other links are hidden (Figure 33).

Figure 33 - Example of path. All the links are hidden in order to keep the path more visible.

Dijkstra's algorithm was used to compute the shortest and most likely path (see Appendix B

for details of implementation). It is important to note that the PathFinder feature does not

highlight a real user’s path. It computes which path would be the most likely out of the

pages’ traffic, but it does not mean that anyone actually took that path.
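The idea can be sketched with a compact version of Dijkstra's algorithm. The actual implementation is described in Appendix B; the version below is only an illustration, and it assumes that links are directed and that the cost of a link is the inverse of its value, so that heavily used links count as "short".

```javascript
// Dijkstra's algorithm on a directed, weighted graph.
// Assumption: the cost of a link is 1 / value, so heavily used links are "short".
// nodeCount: number of nodes; links: [{source, target, value}] with numeric indices.
function mostLikelyPath(nodeCount, links, start, goal) {
    var dist = [], prev = [], done = [], i;
    for (i = 0; i < nodeCount; i++) {
        dist.push(Infinity);
        prev.push(-1);
        done.push(false);
    }
    dist[start] = 0;

    for (var step = 0; step < nodeCount; step++) {
        // Pick the closest node that has not been settled yet.
        var u = -1;
        for (i = 0; i < nodeCount; i++) {
            if (!done[i] && (u === -1 || dist[i] < dist[u])) { u = i; }
        }
        if (u === -1 || dist[u] === Infinity) { break; }   // the rest is unreachable
        done[u] = true;

        // Relax every link leaving u.
        links.forEach(function (l) {
            if (l.source === u && dist[u] + 1 / l.value < dist[l.target]) {
                dist[l.target] = dist[u] + 1 / l.value;
                prev[l.target] = u;
            }
        });
    }

    if (dist[goal] === Infinity) { return null; }   // no path between the two nodes
    var path = [];
    for (var n = goal; n !== -1; n = prev[n]) { path.unshift(n); }
    return path;
}

// Invented traffic: 0 -> 1 -> 3 is busier than the direct link 0 -> 3.
var traffic = [
    {source: 0, target: 1, value: 50},
    {source: 1, target: 3, value: 40},
    {source: 0, target: 3, value: 2},
    {source: 0, target: 2, value: 10}
];

console.log(mostLikelyPath(4, traffic, 0, 3));
console.log(mostLikelyPath(4, traffic, 3, 0));
```

In this example the detour through node 1 is preferred to the rarely used direct link, and no path is returned in the opposite direction since no link leaves node 3.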

“Highlight”, enabled by default, allows users to click on a node in order to highlight the

nodes that are linked to it. By holding Shift while clicking, it is possible to select several nodes

and their respective links. Figure 34 shows an example of highlight with two nodes selected.

Figure 34 - Example of Highlight. Two nodes are selected, showing only their relatives.


In order to find which nodes are connected to the targeted node, an array of relationships

between nodes is built during the generation of the graph. Each time a node is selected, a

function reviews all the nodes and checks if they're connected.
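A relationship lookup of this kind can be sketched as follows. The key encoding ("source,target") and the function names are assumptions; the text only states that an array of relationships is built and then reviewed for every node.

```javascript
// Builds a lookup of connected node pairs while the graph is generated.
// Keys have the form "sourceIndex,targetIndex"; this encoding is an assumption.
function buildRelationships(links) {
    var linked = {};
    links.forEach(function (l) {
        linked[l.source + "," + l.target] = true;
    });
    return linked;
}

// Two nodes are related if a link exists in either direction.
function isConnected(linked, a, b) {
    return a === b || linked[a + "," + b] === true || linked[b + "," + a] === true;
}

// Returns the indices of all nodes to keep highlighted for a selection.
function relativesOf(linked, nodeCount, selected) {
    var result = [];
    for (var i = 0; i < nodeCount; i++) {
        if (isConnected(linked, selected, i)) { result.push(i); }
    }
    return result;
}

// Invented graph with four nodes.
var relationships = buildRelationships([
    {source: 0, target: 1},
    {source: 2, target: 0},
    {source: 2, target: 3}
]);

console.log(relativesOf(relationships, 4, 0));
```

Building the lookup once makes each check a constant-time operation, so reviewing all the nodes at every selection stays cheap even on larger graphs.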

“Closer” stands for “Closer related nodes”. Once it is activated, the user has to click on the “redraw” button at the bottom of the box. The idea behind this option is to give more graphical importance to the heavy interactions, overcoming one of the possible shortcomings of the Force-Directed layout. Figure 35 shows a small Force-Directed graph before the use of “Closer” (left) and after (right).

At each tick, a function goes through all the nodes and checks which nodes they are linked to. It then brings them closer by moving the source node towards the target node, depending on the intensity of the interaction. As it stands now, this feature overrides every other constraint and tends to generate a lot of overlapping nodes. Additional constraints could prevent this from happening.
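One possible form of this per-tick adjustment is sketched below. The exact rule used by Constel Analytics is not given in the text: the step size (a strength factor multiplied by the link's share of the largest value) is an assumption chosen for illustration.

```javascript
// Moves each link's source node a little towards its target node.
// The step size (strength * value / mostIntense) is an assumed rule.
function pullCloser(links, mostIntense, strength) {
    links.forEach(function (l) {
        var k = strength * (l.value / mostIntense);   // 0 = no move, 1 = land on the target
        l.source.x += (l.target.x - l.source.x) * k;
        l.source.y += (l.target.y - l.source.y) * k;
    });
}

// Invented positions: two sources at the same distance from their targets.
var a = {x: 0, y: 0}, b = {x: 100, y: 0};
var c = {x: 0, y: 50}, d = {x: 100, y: 50};
var pairs = [
    {source: a, target: b, value: 100},   // heavy interaction
    {source: c, target: d, value: 1}      // light interaction
];

pullCloser(pairs, 100, 0.1);
console.log(a.x, c.x);
```

Since nothing in this rule enforces a minimum distance, repeated ticks let heavily linked nodes end up on top of each other, which matches the overlapping behaviour described above.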

Figure 35 - Example of the "Closer related nodes" function. Left: the regular layout; right: how “Closer related nodes” impacts the nodes’ position.

The “Minimal Weight” slider hides the least important nodes of the graph (their importance being relative to the selected metric, pageviews by default). Its purpose is to make the graph clearer by removing potentially distracting points. It is also a good way to check which nodes are the most prominent in the graph, as this is not always obvious at a glance.

The value visible under the slider is the base-10 logarithm of the selected minimal weight. Each time the slider moves, a function is called. For every node, it checks whether the base-10 logarithm of the node's value is smaller than the value of the slider, in which case the node is hidden (see Figure 36 for an example).
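The test applied to each node can be written in a few lines. This is a minimal sketch: the field name "value" and the sample pageview counts are assumptions.

```javascript
// A node is hidden when log10(value) is below the slider position.
// The field name "value" is assumed; Math.log10 requires an ES2015 engine.
function isHidden(node, sliderValue) {
    return Math.log10(node.value) < sliderValue;
}

// Invented pageview counts.
var weighted = [
    {title: "Home", value: 5000},
    {title: "Blog post", value: 120},
    {title: "Old forum thread", value: 7}
];

// Slider at 2, i.e. a minimal weight of 10^2 = 100 pageviews.
var visible = weighted.filter(function (node) {
    return !isHidden(node, 2);
});

console.log(visible.map(function (node) { return node.title; }));
```

Working on the logarithm lets a short slider cover several orders of magnitude, from pages seen a handful of times to pages seen thousands of times.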


Figure 36 - Example of Minimal Weight's slider. The smaller nodes are hidden.

The second slider and the “redraw” button are related: they make it possible to redefine the zoom level of the whole graph, as opposed to the Fisheye magnifying glass, which only magnifies a part of it. The value visible under the slider corresponds to the absolute value of the graph’s current charge. Once a new value is selected, a click on the “redraw” button restarts the computation of the graph (a process that takes as much time as the initial computation).

Filters are presented using bar charts (Figure 37). They are located at the bottom of the page and displayed according to the dimension chosen by the user when generating the graph, sorted in descending order of importance. As with the nodes’ radius, a linear scale would have risked producing a few long bars for the main values of the selected dimension, with the others left far behind and showing indistinguishable differences in size. A logarithmic scale is used instead to avoid this.

Figure 37 – Example of Filters. It can take up to 50 values and displays them with a logarithmic scale.

The user can click on one or several bars in order to select the desired values. Clicking the “Filter the

results” button then initializes an Ajax request to ajaxAction() from the Main Controller. The result is

sent to the generateForceGraph() function.

In this initial release of Constel Analytics, the timelapse feature is limited to four different periods, displayed as four separate, smaller graphs. To make the graphs easier to read, hovering over a node with the mouse highlights it in the other three graphs as well.

The timelapse option is located at the bottom of the page. It asks the user to choose which lapse she wants to display (day, week or month). Once the option is selected, a click on the “Get the data” button launches an anonymous function. It computes the periods for the four graphs, then launches successive Ajax requests to ajaxAction(). Each result is forwarded to the generateForceGraph() function, specifying a new DIV each time (Figure 38).
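The computation of the four periods could look like the sketch below. The text does not say how the periods are derived, so several choices are assumptions: the periods are consecutive, the last one ends on a given date, and a month is approximated by 30 days.

```javascript
var DAY_MS = 24 * 60 * 60 * 1000;
var LAPSE_DAYS = {day: 1, week: 7, month: 30};   // a 30-day month is an assumption

// Formats a UTC timestamp as YYYY-MM-DD.
function toIsoDate(ms) {
    return new Date(ms).toISOString().slice(0, 10);
}

// Returns `count` consecutive periods, oldest first, the last one ending on endDate.
// endDate is a "YYYY-MM-DD" string; lapse is "day", "week" or "month".
function computePeriods(endDate, lapse, count) {
    var parts = endDate.split("-").map(Number);
    var end = Date.UTC(parts[0], parts[1] - 1, parts[2]);
    var length = LAPSE_DAYS[lapse];
    var periods = [];

    for (var i = count - 1; i >= 0; i--) {
        var periodEnd = end - i * length * DAY_MS;
        periods.push({
            start: toIsoDate(periodEnd - (length - 1) * DAY_MS),
            end: toIsoDate(periodEnd)
        });
    }
    return periods;
}

console.log(computePeriods("2014-03-10", "week", 4));
```

Each of the four periods would then be sent in its own Ajax request, and each result drawn into its own container.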


Figure 38 – Example of Timelapse. Four graphs display different periods of time.

SUMMARY

Constel Analytics’ main objective is to facilitate the understanding of interactions between the pages of a website. The application must also be as loosely coupled to the data source as possible, so that it can be adapted to other sources in further developments. Furthermore, the application should be easy to install and deploy.

To support these aims, the design of Constel Analytics relies on an external data source that is queried by the application. The application processes the data, stores it and displays an interactive Force-Directed graph.

To achieve this design, various technologies were selected. Google Analytics serves as the data source thanks to its popularity and its APIs, the D3 library takes care of the force-directed visualization, and the backend is managed by PHP and Symfony2.

Concretely, Constel Analytics takes the form of a Symfony2 bundle providing a service that queries Google Analytics and a view that displays the data along with advanced tools to explore them.


EVALUATION

“The beginning of thought is in disagreement - not only with others but also with ourselves.”

- Eric Hoffer, moral and social philosopher

This section covers how Constel Analytics was evaluated. The first chapter briefly describes the website that served as a basis during the development and explains the choice of test subjects and methodology. The second chapter focuses on performance: what technical problems were encountered during the tests on different websites? The third and last chapter presents the insights and comments of the evaluators during the interviews: what did they discover thanks to Constel Analytics, which usability issues were identified and, overall, how do they feel about the application?

TABLE OF CONTENT

Setting up the evaluation for this project
  Selection of the websites for this evaluation
  Evaluation protocol
Performances
  Evaluation 1: Unifr “Course offerings”
  Evaluation 2: HEP of Canton Vaud
Usefulness & Usability
  Evaluation 1: Unifr “Course offerings”
  Evaluation 2: HEP of Canton Vaud
Evaluation with the ten activities
Summary


SETTING UP THE EVALUATION FOR THIS PROJECT

Once the first version of Constel Analytics was developed, two evaluations were conducted to challenge both its relevance and its performance. For the most part, the feedback received during the evaluations will result in further developments, whose issues and implementation are described in section 5.

SELECTION OF THE WEBSITES FOR THIS EVALUATION

During its development, Constel Analytics was based on a personal website with modest traffic (about 1'500 pageviews distributed over 300 pages in 20 days). This website, which we will call the “development website” and for which 6 years of Google Analytics data are available, is split into four parts: a blog, a forum, a wiki and an index linking the sections together. It was the first website to serve for an informal evaluation of Constel Analytics.

Figure 39 - Constel Analytics, development website "L'Organisation Très Secrète" between 2014-02-15 and 2014-03-06

Figure 39 above presents the interactions between pages of the development website over a period of 20 days. Firstly, it can be observed that the blog (blue) represents a slim majority of the traffic linked to the main graph. The forum (light orange) receives almost as much traffic. The wiki (green) remains rather small and the index (bright orange and light blue) works, unsurprisingly, as the hub of the website.

The nodes in the periphery, which are particularly numerous, are isolated pages that were visited by users with little interest in the rest of the site: most of the time, these users find the website through a specific web search and do not browse beyond the landing page. Logically, the majority of these nodes come from the forum, as forums usually host sections dealing with various subjects that are unrelated to the main topic of the website. Finally, some peripheral nodes are related to each other in the form of sub-graphs (Figure 40) that represent specific and detached topics.


Figure 40 - Example of subgraphs. Users arrived through a search engine, read a few related pages and left the website.

The website’s low traffic makes it possible to isolate a few individual paths (see Figure 41 for an example). Among others, it is possible to observe that a particular user visited a succession of thematic pages related to each other, these being attached to the main graph.

Figure 41 - Individual page tracking on a low-traffic website. The branch visible here comes from the main graph.

Figure 42 - Constel Analytics, development website "L'Organisation Très Secrète" from 2013-01-19 to 2014-03-10


The development showed that a graph composed of 1'000 nodes and 10'000 links is almost unusable (Figure 42): the rendering makes it possible to distinguish the major trends of the analyzed website, but the necessary processing time is too high: it systematically exceeds the 30 seconds of maximum script execution time generally allocated by standard PHP installations.

Issues with the nodes’ positions were observed as well: because the space within which the graph unfolds is too limited, two unrelated branches frequently overlap, creating the false impression that their pages are linked (see Figure 43 for an example). This problem was particularly prominent during the evaluation of the HEP of Canton Vaud’s website (see subchapter 4.3.2).

Figure 43 - Overlapping branches with no connection. The blue branch goes through the orange branch, possibly giving the impression that they are related while this is not the case.

Despite those issues, it was apparent that Constel Analytics makes it possible to identify the structure of the development website efficiently and to visualize its traffic quickly. However, this did not mean that the same would be true for all websites, as this graph was unique to the development website. Since Constel Analytics is a tool that must adapt to all sites, it was necessary to evaluate it with other data. This evaluation had two distinct objectives:

- Challenging the relevance of the force-directed visualization with various users

- Testing the flexibility and the performance of Constel Analytics on websites whose sizes and purposes are different

Two institutional websites were selected for this evaluation: the new “Course offerings” section of the University of Fribourg’s website and the entire website of the HEP of Canton Vaud.

The first choice for this evaluation, the website of the University of Fribourg, benefits from heavy traffic as well as various sections (which we will call “sub-sites”) available in different languages. This evaluation was set up with the collaboration of the University’s webmasters, Nicolas Fretigny and Samuel Crausaz, and with Serge Keller, scientific collaborator in the Communication and Media service.


Figure 44 - Homepage of "Course Offerings"

The different sub-sites of the University of Fribourg are not bound to the same Google Analytics account: it was thus not possible to evaluate all of them. However, the webmasters wished to visualize the performance of “Course Offerings”, a new sub-site put online in November 2013 (its homepage is shown in Figure 44). Available in three languages (French, German and English), this website aims to attract new students (Swiss or foreign) while encouraging current Bachelor students to remain in Fribourg for their Master’s studies.

Figure 45 - The page for the "Information systems" course


The most important part of this sub-site is the course offerings: all the available study programmes are described, at both Bachelor and Master level. In addition to their informative nature, the courses’ pages (see Figure 45 for an example) are intended to lead Bachelor students to the Master courses’ pages, thanks to a tab highlighting this possibility. The link to apply to the University of Fribourg is also highlighted, facilitating the transition from information gathering to registration.

The sub-site also offers two additional sections drawing attention to the benefits of living in Fribourg:

- “Life in Fribourg”, aimed at foreign students, offers general information about Fribourg and its geographical, sociological and economic situation. The objective is to present all the extra-curricular advantages of being a student in Fribourg.
- “Organisation of Studies” describes the rules of the University. General information about the Bologna system and its local implementation can be found there, along with the administrative specificities of the University of Fribourg.

Finally, “Course Offerings” offers a last section called “More Ressources”, which provides links to other important sub-sites of the University of Fribourg (sub-sites of each Faculty, course calendar, ...).

Since “Course Offerings” is translated into three languages, visualizing this multilingual structure with Constel Analytics was one of the main points of interest of this evaluation. This particularity would also challenge the different methods allowing users to isolate certain parts of the graph for further analysis.

Figure 46 - HEPL Homepage


The second evaluation of Constel Analytics was conducted on the website of the Haute École Pédagogique of Canton Vaud (also known as “HEPL”, which stands for “Haute École Pédagogique de Lausanne”). The High School is currently improving its visibility in Switzerland and around the world, notably thanks to a new website that has been online since November 2011. Its traffic is high, as is its total number of pages. This evaluation was conducted with the help of Philippe Schmid, Head of Information Service, Guillaume Vanhulst, Rector of the HEP of Canton Vaud, Barbara Fournier, Head of Communication, and Bertrand Mure, Project Manager for the HEP Vaud website.

The new website of the HEPL has several portals aimed at different audiences. The one that was analyzed is the main portal, intended for all general visitors. It is split into four main sections (visible in Figure 46):

- “Mission et organisation”, describing the aims of the High School and the way it works. Job offers and international information are also found in this section.
- “Formation”, describing the various course offerings of the HEPL.
- “Recherche”, which contains all the information related to the projects conducted within the HEPL, as well as publications.
- “Actualités et agenda”, which covers all the events organized by or with the HEPL.

In addition, an extranet offers practical information to the students and HEPL employees.

The current website is known for its high verticality: finding information about a particular collaborator or a specific publication requires browsing through many pages.

The HEPL’s main portal targets various visitors:

- Future students looking for information about the school
- Current students looking for information about their curriculum or daily courses
- Future employees of the High School who want to know what the HEPL could offer
- Current employees who could be interested in events or might want to change their personal information
- External scientists interested in the HEPL’s publications

Therefore, the objective of Constel Analytics varies greatly depending on the visitors considered. The evaluation was conducted with members of the administration whose vision is rather global: their purpose was simply to observe the traffic and check whether it changes in response to particular events.

EVALUATION PROTOCOL

Both evaluations were conducted following the same protocol. Since Constel Analytics is a prototype, it was more relevant to conduct qualitative evaluations rather than quantitative ones. Indeed, the latter would have required resources beyond the scope of this project, as evidenced by the recurring BELIV workshop (Beyond Time and Errors: Novel Evaluation Methods for Visualization), which regularly draws attention to the fact that quantitative evaluations for data visualization are hard to set up [37].


Moreover, at this stage of the development, qualitative evaluations can retrieve more relevant

information.

The protocol of the evaluations was as follows:

1. Initial phase: discussions with the evaluators to present the project and set up the objectives described above; access to the Google Analytics data
2. Performance tests and correction of the potential impediments that would have blocked the rest of the evaluation
3. Familiarization of the evaluators with Constel Analytics
4. Interview and live demonstration of Constel Analytics

After configuring Constel Analytics, the next step is to test whether the application displays the data correctly. Specifically, the processing speed and data integrity are verified. Bugs and optimization problems affecting either are corrected so that the rest of the evaluation can be carried out in acceptable conditions. This step is also an opportunity to make some preliminary findings on the information that can be retrieved thanks to Constel Analytics.

Once these two steps are completed, Constel Analytics is deployed online so that evaluators can

access it.

The evaluators have several days to familiarize themselves with Constel Analytics, with the help of a

description of the different features.

The interview then takes place, with the evaluators attending together as a panel for the discussion. On this occasion, the relevance of each group of features of Constel Analytics is tested. Using Constel Analytics, evaluators aim to find at least three pieces of information for each group of features.

These groups are as follows:

- Main visualization
- Segments
- Filters
- Timelapse

The speed and ease with which the evaluators are able to find this information are assessed qualitatively. During the interview, the evaluators are also invited to give their opinion freely.

The following sections present the results obtained when applying the protocol described above.


PERFORMANCES

Observations regarding the performances of Constel Analytics were mostly made during step 2 of

the evaluation protocol.

Below is a table summarizing the main characteristics of each use case:

           | Evaluation 1: Unifr “Course Offerings”                 | Evaluation 2: HEP of Canton Vaud
Purpose    | Description of the course offerings for the University | Main institutional portal of the High School
Audiences  | Students of the University                             | Various audiences (students, employees, ...)
Traffic    | Average: ~800 different pages seen in 20 days          | High: more than 1’000 different pages seen in 20 days
Deployment | University’s server                                    | Dedicated shared hosting

Table 7 – Main characteristics of use cases

EVALUATION 1: UNIFR “COURSE OFFERINGS”

From a technical point of view, the configuration went smoothly. However, the number of steps required to configure Constel Analytics exceeds that of a typical web application found on the Internet. More problematically, the fact that it is necessary to use the Google Cloud Console (to create a Client ID), Google Analytics (to retrieve the account ID and website tracking code) and the Symfony configuration file makes this operation quite error-prone.

One main issue was identified while testing Constel Analytics on “Course Offerings”: different pages have the same title. Namely, a course given in both the Master and Bachelor curricula has the same title, which made both pages appear as one in the force-directed visualization, a problem that compromises the integrity of the data. Figure 47 shows that only a few nodes are visible in the initial visualization, Master and Bachelor pages being fused. In order to overcome this limitation, a configuration option was implemented to specify whether the title or the URL is used to identify pages (using "title" by default). The Analytics Service was modified to support both ways of gathering data. Moreover, using URLs saves one request to Google Analytics, as there is no longer any need to differentiate the titles from the URLs. On the other hand, using URLs instead of titles creates one node for each search, which makes the graph somewhat less readable. In order to keep track of pages with similar titles, it was decided that hovering over a node would change the fill color of all similarly named nodes. Once implemented, this option introduced a number of bugs related to the selection of nodes (some functions made use of the title to identify nodes). For instance, the pathfinder feature no longer works.
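
The difference between the two identification modes can be illustrated with a short sketch. It is written in Python for brevity and is not the application's actual code: the function names (`build_nodes`, `same_title_ids`) and the record layout are assumptions made for the example. It shows why two pages sharing a title collapse into one node in "title" mode, and how similarly named nodes can still be retrieved for highlighting in "url" mode.

```python
def build_nodes(pageviews, identify_by="title"):
    """Aggregate page-view records into graph nodes keyed by title or by URL."""
    if identify_by not in ("title", "url"):
        raise ValueError("identify_by must be 'title' or 'url'")
    nodes = {}
    for view in pageviews:
        key = view[identify_by]
        # Records sharing the same key are merged into a single node.
        node = nodes.setdefault(key, {"id": key, "title": view["title"], "views": 0})
        node["views"] += view["views"]
    return nodes


def same_title_ids(nodes, node_id):
    """Return the ids of all nodes sharing the title of `node_id` (e.g. for hover highlighting)."""
    title = nodes[node_id]["title"]
    return sorted(key for key, node in nodes.items() if node["title"] == title)


# Hypothetical sample data: the same course exists in two curricula.
views = [
    {"title": "Information Systems", "url": "/en/bachelor/is", "views": 10},
    {"title": "Information Systems", "url": "/en/master/is", "views": 4},
    {"title": "Home", "url": "/en", "views": 20},
]
by_title = build_nodes(views)                   # 2 nodes: the two course pages are fused
by_url = build_nodes(views, identify_by="url")  # 3 nodes: one per URL
```

In such a design, any feature that selects nodes would need to rely on the node id rather than on the title, which is the kind of assumption that caused the selection bugs mentioned above.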

Figure 47 - Initial visualization for "Course Offerings" from 2014-01-29 to 2014-02-17. Only four colors are visible; Master and Bachelor course pages are merged.

The processing time was deemed reasonable, though the generation of multiple graphs (as with the timelapse feature) might exceed the execution time limit and produce errors. Ajax requests were adapted to limit the risk of such situations.
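
The text does not detail how the Ajax requests were adapted. One common approach, assumed here purely for illustration, is to split the timelapse into one request per period so that each graph is generated within its own execution time budget. The function name `timelapse_periods` is hypothetical.

```python
from datetime import date, timedelta


def timelapse_periods(start, end, days=30):
    """Split [start, end] into consecutive windows of at most `days` days each."""
    if days < 1:
        raise ValueError("days must be at least 1")
    periods = []
    cursor = start
    while cursor <= end:
        # The last window is shortened so that it never goes past `end`.
        stop = min(cursor + timedelta(days=days - 1), end)
        periods.append((cursor, stop))
        cursor = stop + timedelta(days=1)
    return periods


# Each (first_day, last_day) pair would then be sent as a separate request.
windows = timelapse_periods(date(2014, 1, 1), date(2014, 3, 31), days=30)
```

Issuing the requests one after the other, rather than asking the server to build every graph in a single call, keeps each response below the execution time limit at the cost of a few more round trips.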

The deployment should have occurred on the University’s servers. However, Symfony 2’s

requirements were higher than expected and it was decided to deploy the prototype on the

author’s personal website instead.

EVALUATION 2: HEP OF CANTON VAUD

As in the previous evaluation, the configuration for the HEPL website did not pose any technical problem, while several usability issues were identified (see next chapter for more information). However, a few more problems were observed.

The number of visited pages was much higher than anticipated. HEPL had over 1’000 pages visited in 20 days, which made the visualization excessively slow to display. It could take up to 3 minutes to display a graph, while most PHP installations limit execution time to 30 seconds. Besides a few basic optimizations (removing useless loops through the sets of results), a new option was introduced to let the user decide how many pages should be displayed. By default, Constel Analytics only displays up to 100 pages and 1’000 interactions, which greatly speeds up the display time without losing the main structure of the website.

Every node had the same color. The first hierarchical level of the website's URLs is actually always the same ("cms"). As this might be the case for many other websites, it was decided to modify the way groups are determined. Instead of getting this information from Google Analytics, Constel Analytics now discovers the hierarchical levels by itself, splitting the full path at each slash (“/”). Working this way carries the risk of producing too many groups for a readable visualization (the HEPL website has more than 40 different hierarchical levels). A configuration option was thus implemented in order to specify up to which level Constel Analytics should analyze the URL (1 by default).
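
The two adjustments described above can be sketched as follows. This is a minimal illustration in Python rather than the application's actual Symfony code: the function names (`limit_graph`, `group_for_path`), the data shapes and the handling of query strings are assumptions made for the example; only the default values (100 pages, 1’000 interactions, level 1) come from the text.

```python
def limit_graph(page_views, interactions, max_pages=100, max_interactions=1000):
    """Keep the most viewed pages and the heaviest interactions between them.

    page_views:   dict mapping a page path to its number of views
    interactions: list of (source, target, weight) tuples
    """
    ranked = sorted(page_views, key=page_views.get, reverse=True)
    kept = set(ranked[:max_pages])
    # Drop interactions that involve a page that is no longer displayed.
    links = [(s, t, w) for (s, t, w) in interactions if s in kept and t in kept]
    links.sort(key=lambda link: link[2], reverse=True)
    return kept, links[:max_interactions]


def group_for_path(path, depth=1):
    """Derive the group of a page from the first `depth` segments of its URL path."""
    segments = [s for s in path.split("?")[0].split("/") if s]
    if not segments:
        return "/"
    return "/" + "/".join(segments[:depth])


# With a site whose first level is always "cms", depth=2 separates the groups.
print(group_for_path("/cms/formation/bachelor", depth=2))  # /cms/formation
```

The `depth` parameter plays the role of the configuration option mentioned above: a low value keeps the number of groups (and therefore colors) small, while a higher value separates sections that share a common prefix.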

Once again, deployment was problematic. Following the suggestion of the evaluators, the deployment was supposed to take place on a hosted website specifically purchased from a private service provider. However, the server's configuration was not up to date: a problem between Symfony 2 and libxml2 prevented the application from working properly. It was therefore decided to deploy Constel Analytics on the author's website.

USEFULNESS & USABILITY

Feedback regarding the relevance and the usability of Constel Analytics was mainly collected during step 4. However, a few observations were also made during step 2. The results are split into three parts:

observations made during step 2

insights acquired by the evaluators during the interview

remarks and suggestions of the evaluators

EVALUATION 1: UNIFR “COURSE OFFERINGS”

Several usability issues were identified during the performance evaluation.

It is easy to neglect one of the steps during the application’s configuration. Being forced to use several administration panels can confuse users and impede the installation of Constel Analytics.

The visualization works as if there were three similar parts (one for each language). Unfortunately, similar pages in different languages do not have the same color. For instance, “Information System” in English and “Systèmes d’information” in French are not related by any means. It is hard to keep track of the evolution of a single page over the three different languages.

Below are the observations made by the evaluators during the interview. Additional screenshots of this evaluation can be found in Appendix C. The interview lasted one hour and a half, during which the three evaluators went through all four features of Constel Analytics.

MAIN VISUALIZATION

Figure 48 - Main visualization for Eval. 1, 2014-02-15 to 2014-03-06. The three groups are visible: French is at the top left of the graph, German is at

the right of the graph and English is at the bottom of the graph.

Figure 48 shows a possible representation for the “Course Offerings” website. It is limited to 250 pages and 10'000 interactions. The evaluators found the following information, listed in chronological order of discovery:

| # | Insight | Description | How did they discover this | Ease and speed |
|---|---|---|---|---|
| 1 | Three main areas | Expected behavior. Unsurprisingly, three areas clearly stand out. They represent the three languages in which the website is offered. The German and the French parts are similar in terms of size and structure, while the English part is slightly smaller. | Simply by looking at the three main groups. The structure was deemed similar because of the two recurrent entry points in each group and the fact that the color groups seem to match. Size was assessed qualitatively. | This insight was acquired without effort as soon as the graph appeared. |
| 2 | Sections are well divided | Expected behavior. Users of the website tend to explore it section by section. “Life in Fribourg” and “Organisation of studies” are loosely tied to the rest of the graph, for instance. However, a few branches have connections with “Life in Fribourg”, as there were links pointing to this section in their description. | It appears that besides the colors used to represent courses, other colors are clustered into individual branches without many interactions with the others. | The information was acquired after a few minutes, as the evaluators started to move the mouse cursor over the graph and check details. |
| 3 | Bachelor students do not care about Master courses | Unexpected behavior. The evaluators expected the Bachelor students to read Master course pages after reading the Bachelor course pages they were after, but this does not seem to be the case. | If the students had behaved in the expected way, there would have been some visible branches in the graph in which Bachelor pages precede Master pages. The evaluators deemed that this was not the case. | This information was not easy to obtain, as it required the evaluators to understand which colors were attributed to Master and Bachelor pages for each language. They then had to double-check whether those branches existed or not. |

Table 8 – Insights acquired thanks to the Main Visualization, Eval. 1

SEGMENTS

The evaluators selected three segments in order to test the usefulness of this feature.

| # | Segment: insight | Description | How did they discover this | Ease and speed |
|---|---|---|---|---|
| 4 | New visitors: German-speaking visitors seemed to be more represented | Unexpected behavior. By selecting this segment, the evaluators hoped to highlight potential differences between new visitors and returning visitors. At first glance, it seemed that German-speaking visitors are more represented in this category. This unexpected observation can be explained by the fact that the evaluators, being responsible for the implementation of this new site, are French. They have frequently visited the site, possibly influencing the Returning Visitors segment in favor of French-speakers. | The size of each group was estimated at a glance. | This insight was acquired quickly, without much effort. |
| 5 | Tablet and mobile: no difference in the structure of the traffic | Unexpected behavior. The evaluators expected mobile visitors to behave differently from regular visitors. It did not seem so. | By comparing the non-segmented graph to this one, it appeared that there was no difference in the patterns of the visits. | This information required the evaluators to check several nodes and assess the global distribution of the graph. This process was arguably quick. |
| 6 | Referral traffic: English articles lead to other languages’ sections | Unexpected behavior. It seems that people landing on the website through external links tend to switch to another language. A few hypotheses were made regarding this: perhaps some French- and German-speaking visitors reached the Unifr website through an English article and switched to their mother tongue after the first page. | The evaluators noted that the English part is less distinct and spread across the French and the German groups. | This information was found by assessing the global distribution of the graph, which was done quickly. |

Table 9 – Insights acquired thanks to the Segments, Eval. 1

FILTERS

Various observations were made using the Filters’ bar chart visualization and the filters’ application

on the main visualization.

Figure 49 - Country filters for Eval. 1. It is easy to see that China comes right after the main western countries.

| # | Filter: insight | Description | How did they discover this | Ease and speed |
|---|---|---|---|---|
| 7 | China: second after Western countries | Unexpected behavior. China comes right after Western countries in terms of visits for the analyzed period. | The bars were compared (Figure 49). | It was fast and simple, as all it required was a glance at the order of the bars. |
| 8 | Switzerland: majority of French-speakers | Expected behavior. It was assumed that a majority of visitors were French-speakers, notably because the evaluators are French-speakers themselves. | The sizes of the French parts were compared across the main visualization, the filtered visualization for Switzerland and the filtered visualization for Germany. | Constel Analytics does not provide quantitative information, but the French group was arguably larger in Switzerland. |
| 9 | Spain and Italy: more French-speakers than others | Expected behavior. Latin countries have more visitors on the French part of the website than on the German part. In the case of Spain, French is clearly dominant. | By checking the filtered results for Italy and Spain, it was observed that the French group was larger. | The German and English parts were clearly smaller. |

Table 10 – Insights acquired thanks to the Filters, Eval. 1

Other dimensions were also used: the webmasters were interested in seeing whether the browser had an impact on the way visitors navigate through the website. More specifically, several features were disabled for older versions of Internet Explorer. Unfortunately, this could not be confirmed.

TIMELAPSE

The timelapse was used in order to detect changes from month to month. Mainly, the webmasters

had introduced a few more pages for late subscriptions to the University and wanted to check

whether those were going to appear.

Figure 50 - 30 days timelapse, Eval 1. It is possible to see slight differences in the size of three groups over the months.

| # | Period: insight | Description | How did they discover this | Ease and speed |
|---|---|---|---|---|
| 10 | 30 days: the late subscriptions do not appear | Expected behavior. Fribourg has the particularity of accepting late subscriptions. Several pages were introduced to allow these. They did not appear in the graph, reinforcing the idea that late subscriptions are definitely a rare case. | After using the dynamic search for the names of the pages, it appeared that they were not present. | A simple search was enough to establish this. |
| 11 | 30 days: more French-speakers at the start | Unexpected behavior. The number of French-speaking visitors was proportionally higher during the first months of “Course Offerings”. Once again, this might be explained by the fact that the evaluators were French-speakers themselves. | The size of the French group decreased over the months, as shown in Figure 50. | A glance was enough to find this insight. |
| 12 | 30 days: size of the English part varies according to calendars | Expected behavior. The size of the English part varies from month to month, according to other countries’ calendars. It is assumed that foreign students start looking into applications around the end of their semester. | The size of the English group varied over the months. | Unlike the previous observation, this one was not clear enough. It was only assumed that the English group varied in size. |

Table 11 – Insights acquired thanks to Timelapse, Eval. 1

Figure 51 - It is necessary to scroll down the page in order to view "Current page" information.

The majority of comments that emerged during the interview focused on access to information.

In the case of a screen with a low resolution or a high zoom level, the box showing the name of the current node falls below the visible area of the browser window, forcing the user to scroll down the page (Figure 51). This information should be visible at all times.

Evaluators have repeatedly tried to click on the nodes to stabilize the information in the

"current page" box: they thought that clicking on a node would disable the display of

information at mouseover.

From one visualization to another, the color and the size of the nodes are not consistent. This is mainly a problem in the case of the timelapse feature, since the only way to identify a node present on several graphs is to hover over it with the mouse cursor.

A single page reached through the same URI with and without a trailing slash appears as two different nodes on the graph. For example, the URIs “/fr” and “/fr/” are considered two different pages. This explains the presence of two main entry points (blue nodes) for each language.

The value shown under the “minimal weight” slider, the size of the nodes and the height of

the filters’ bars all use a logarithmic scale, but there is no indication of this anywhere. The

evaluators were misled repeatedly.

The intensity of the interactions is hardly visible. It is also impossible to see the direction of

an interaction.

In case Google Analytics returns no result, Constel Analytics displays generic PHP error

messages outside of the layout.
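
Among the remarks above, the duplicated entry points caused by trailing slashes lend themselves to a simple illustration. The sketch below, in Python, assumes that a path with and without a trailing slash always designates the same page, which holds for the case reported here but is not guaranteed on every web server. The function names `normalize_path` and `merge_views` are hypothetical.

```python
def normalize_path(path):
    """Treat '/fr' and '/fr/' as the same page by dropping trailing slashes."""
    stripped = path.rstrip("/")
    return stripped or "/"  # the site root stays "/"


def merge_views(page_views):
    """Sum the views of paths that only differ by a trailing slash."""
    merged = {}
    for path, views in page_views.items():
        key = normalize_path(path)
        merged[key] = merged.get(key, 0) + views
    return merged


# Hypothetical figures: both spellings of the French entry point are merged.
print(merge_views({"/fr": 10, "/fr/": 5, "/de": 3}))  # {'/fr': 15, '/de': 3}
```

Applying such a normalization before building the graph would leave a single entry point per language, with a weight equal to the sum of both variants.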

In addition to the remarks on the current features of Constel Analytics, the evaluators suggested adding a printable version of the main visualization, so that it can be presented statically. The ability to zoom in on a selected branch in order to see its details has also been proposed: the standard visualization would present only a hundred points, and the user would be free to zoom in on a portion of the graph in order to show other nodes.

Overall, the evaluators were interested in Constel Analytics and were satisfied with the discoveries made during the test. However, they noted that the application is meant for exploration and is not optimal for producing a report.

The evaluation led to interesting findings that can nevertheless be questioned. While it is possible that some deductions are wrong (which would be a problem of interpretation rather than a problem of the tool), Constel Analytics itself possibly misled the evaluators. Findings #4 and #8, for instance, are far from confirmed, as the French and the German groups arguably had the same size. Without quantitative information, Constel Analytics only leaves us with assumptions that cannot be verified. And while finding #5 stated that there was no difference between mobile/tablet traffic and desktop traffic, a closer look reveals that the English part is slightly smaller and that the structure of each language is different (fewer interconnections between the branches). Additional measures must be provided to prevent such misinterpretations.

Figure 52 - Difference between expected traffic and actual traffic in Eval. 1. If visitors went from different search pages to the Master course's page,

then the node will not be related to the Bachelor course's page as much as we expected during the evaluation.

One of the evaluators' main questions was whether Bachelor students visit Master course pages. Finding #3 stated that it was not the case, because it was assumed that there should have been branches consisting of three nodes (search for courses -> Bachelor course’s page -> Master course’s page). However, assuming that some Master students went directly from search to the Master course’s page, the page would no longer look like a separate branch and could be positioned in a more ambiguous way. Figure 52 illustrates this issue.

This problem could be solved by limiting the number of links coming out of each node, so that only the significant interactions are taken into account.
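
The proposed limitation can be sketched as follows. This is only an illustration in Python of the idea stated above, not an implemented feature: the function name `strongest_out_links`, the tuple layout and the sample weights are assumptions made for the example.

```python
from collections import defaultdict


def strongest_out_links(interactions, max_out=2):
    """Keep, for each source page, only its `max_out` heaviest outgoing links.

    interactions: list of (source, target, weight) tuples
    """
    by_source = defaultdict(list)
    for source, target, weight in interactions:
        by_source[source].append((source, target, weight))
    kept = []
    for links in by_source.values():
        links.sort(key=lambda link: link[2], reverse=True)
        kept.extend(links[:max_out])
    return kept


# Hypothetical weights for the situation illustrated in Figure 52.
links = [
    ("search", "bachelor", 40),
    ("search", "master", 6),
    ("search", "contact", 2),
    ("bachelor", "master", 15),
]
print(strongest_out_links(links, max_out=1))
# [('search', 'bachelor', 40), ('bachelor', 'master', 15)]
```

With a limit of one link per node, the weaker direct link from the search page to the Master page is dropped and the three-node branch becomes visible again. The counterpart is that the discarded links are real traffic, so the limit would have to be adjustable by the user.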

EVALUATION 2: HEP OF CANTON VAUD

It has been found that graphs with a lot of interactions tend to spread across wider distances. Being the most interconnected graph of the evaluation, HEPL’s visualization highlighted the weakness of the current charge-computation function used by Constel Analytics. As this function does not take into account the number of interactions, the charge was constantly overestimated and many nodes were pushed outside the SVG area.
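
The following sketch illustrates, under explicit assumptions, what a charge computation that accounts for the number of interactions could look like. The formula is hypothetical and is not the function used by Constel Analytics: it simply weakens the repulsion as the average number of interactions per node grows. A second helper shows the usual safeguard of clamping node positions to the drawing area. All names and the base value of -120 are assumptions made for the example.

```python
def charge_for_graph(node_count, interaction_count, base_charge=-120.0):
    """Hypothetical repulsion strength that weakens as the graph gets denser."""
    if node_count <= 0:
        return base_charge
    density = interaction_count / node_count  # average interactions per node
    return base_charge / (1.0 + density)


def clamp_to_area(x, y, width, height, radius=0.0):
    """Keep the centre of a node inside a width x height drawing area."""
    cx = min(max(x, radius), width - radius)
    cy = min(max(y, radius), height - radius)
    return cx, cy


print(charge_for_graph(100, 300))                 # -30.0
print(clamp_to_area(-5, 900, 800, 600, radius=10))  # (10, 590)
```

Clamping alone would keep every node visible but tends to pile nodes up against the borders, which is why it is combined here with a weaker charge for dense graphs.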

The evaluation lasted one hour, with two evaluators going through the four features. Unfortunately, we were not able to collect 12 pieces of information covering all the features of Constel Analytics. Instead, additional suggestions were given by the evaluators (see 4.3.2.3) regarding the evolution of the application. All the screenshots of this evaluation can be found in Appendix C.

MAIN VISUALIZATION

Figure 53 - Main visualization for Eval. 2, 2014-03-27 to 2014-04-15.

| # | Insight | Description | How did they discover this | Ease and speed |
|---|---|---|---|---|
| 1 | Structure of the traffic matches the expectations | Expected behavior. Globally, the traffic on the website matches what the web designers had planned. There are several branches leading to specific course offerings, and a dedicated part for student information (regarding internships, legal aspects, ...). | By looking at the graph, it is easy to spot the different branches, mainly the course offerings (light blue) and the student information part (orange and violet nodes at the top of the graph). | This observation was made as soon as the meaning of the nodes’ colors was identified. |
| 2 | Few interactions between branches | Expected behavior. There is little traffic between the course descriptions, which was intended, as each branch matches specific needs. | There are no links between the different branches of the graph. | This observation was made along with the previous one. |

Table 12 – Insights acquired thanks to the Main Visualization, Eval. 2

SEGMENTS

Figure 54 - New Visitors segment for Eval. 2. The student's information part is much smaller.

| # | Segment: insight | Description | How did they discover this | Ease and speed |
|---|---|---|---|---|
| 3 | New Visitors: student information part is less important | Expected behavior. The student information part is smaller because this section concerns current students of the school, who are supposed to visit the website frequently. | By comparing the size of the student section with the main, non-segmented visualization. | Discovered quickly, as it was the most anticipated change. |
| 4 | New Visitors: “Accès rapide” is not used to browse through the website | The website has a “quick access” section that is supposed to lead visitors to their destination quickly. It does not seem that those pages are really used. This observation could have been made in the main visualization as well. | Browsing through the nodes was necessary to figure this out. | |

Table 13 – Insights acquired thanks to the Segments, Eval. 2

FILTERS

Figure 55 - Filtered results for France for Eval. 2. Most of the visible branches represent successions of pages in the Recherche section.

| # | Filter: insight | Description | How did they discover this | Ease and speed |
|---|---|---|---|---|
| 5 | France: fewer visitors looking at the course offerings | Expected behavior. Foreign visitors tend to visit the publications, seminar reports and events more than the Swiss visitors. This seems logical: the further they live from Switzerland, the less likely they are to be looking for training or a job at the HEP of Canton Vaud. | The branches are different. Course offerings are still present but are slightly smaller, while new branches are formed, with events, agendas and other activities. | It was quickly observed that the structure of the graph was different. Understanding the actual difference required a few searches, as colors had changed and did not match the previous graphs. |
| 6 | South Africa: not much success despite diplomatic attempts | Expected behavior. The High School negotiated agreements with Burkina Faso and Mozambique during the past years. It does not seem that those agreements brought many visitors from these countries to the website, which can be explained by the lack of infrastructure there. | By looking at the list of countries: Burkina Faso is at position #36 while Mozambique simply does not appear. | Observing the bar chart was enough to get this insight. |

Table 14 – Insights acquired thanks to the Filters, Eval. 2

TIMELAPSE

Figure 56 - Timelapse (30 days) for Eval. 2. The node with the thick red border on the third graph is Fukushima's event. The white node present on the first three graphs is the main page of Freinet's event (related nodes are in salmon in the first two graphs and in violet in the third).

| # | Period: insight | Description | How did they discover this | Ease and speed |
|---|---|---|---|---|
| 7 | 30 days: Freinet’s exhibition generated traffic | Expected behavior. The HEP of Canton Vaud held several events during the last months, among which the visit of the last inhabitant of Fukushima and an exhibition about Freinet. The latter was prepared over several months with a dedicated section of the website, which generated substantial traffic. | The section dedicated to Freinet’s exhibition appears in the January, February and March graphs. The Fukushima event appears in March. | A search was enough to locate the pages on all four graphs. |

Table 15 – Insights acquired thanks to Timelapse, Eval. 2

As the HEP of Canton Vaud does not currently have a specific Web Analytics strategy, the two evaluators considered this application as a possible solution for their needs: in this context, they regarded it as somewhat limited and provided many suggestions regarding the features that could make it more useful.

This visualization does not currently provide information that could help improve a website. As it stands now, Constel Analytics shows the general traffic of a website, and as long as the web designers did a decent job, it is unlikely to reveal any weaknesses besides the most obvious ones.

Amongst the suggestions for improvement, the possibility to zoom in on a group of pages and see further details about them was the most important. The visualization would act as a visual dashboard to reach different sections with more details about individual users’ tracking and the composition of the population.

Another point raised by the evaluators is the possibility to analyze several Google Analytics accounts, as each of the HEP’s portals has a separate ID. This would help see how those portals interact with each other (we assume the visualization would look similar to the three groups of the previous evaluation).

Besides these suggestions, some of the UI issues identified during evaluation 1 were also spotted. The fact that the colors of the groups change and that the layout is recomputed each time the page is refreshed, rather than being frozen, was deemed problematic.

EVALUATION WITH THE TEN ACTIVITIES

In addition to the remarks made by the evaluators during the interviews, a review of the

visualization can be done in the light of the ten activities discussed in chapter 2.2.

Characterize distribution: Force-directed layouts are best at supporting characterization of

the distribution. It is easy to understand the structure of the traffic with a few glances,

something that would be near-impossible with usual visualizations found in current web

analytics tools. Ironically, this observation was made by the author of this document while

he was conducting the evaluations rather than by the evaluators themselves.

Find anomalies: Anomalies in the context of the traffic of a website can take different

shapes, like a webpage being related to unexpected sections (or the opposite) or several

connections being stronger than anticipated. Spotting such anomalies is possible thanks to

several factors, like the grouping of the pages, which visually indicates to which group each

page belongs or the opacity of the interactions. In practice however, it might be hard to spot

the few nodes that do not belong to a group in a very large graph.

Find extremum: Finding extrema is possible by merely looking at the graph. The largest node usually stands out clearly, as does the most intense interaction. Local extrema are harder to spot, but this activity is supported by an efficient “minimal weight” slider: this is one of the most valuable pieces of information given by Constel Analytics, as it clusters pages and offers a way to retrieve the most and the least important of them in each cluster.

Retrieve value: Retrieving specific values can be hard at first glance, since there is no indication of the pages’ names unless the users hover over nodes with their mouse. The search supports this activity as intended, but a few textual indications could be added directly on the graph.

Cluster: Clustering nodes works as well as the characterization of the distribution: the force-directed layout groups nodes as intended.

Sort: Sorting nodes is not possible with a force-directed graph, as the algorithm computes the position of each node.

Correlate: Correlation in Constel Analytics would consist of comparisons between several datasets, such as “did the mobility section of our website gain visibility following the latest votes?”. With its ability to compare several datasets, Constel Analytics provides basic tools for correlation. However, the evaluations have pointed out that their implementation is still far from perfect.

Filter: Filtering is possible thanks to the dedicated eponymous feature. However, it could go

further by allowing users to select several dimensions and metrics.

Compute derived value: Computing derived values is hardly possible, as statistics about the data are lacking. The force-directed layout does not help here either. Several solutions are considered in the following section.

Determine range: Determining ranges is not possible by looking at the graph itself.

Computed values in the nearby panels must be consulted in order to get quantitative

information regarding the exact ranges.
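
As an illustration of the "Find extremum" activity discussed in the list above, the sketch below shows how the most and the least visited pages of each group could be retrieved once a minimal weight has been applied. It is a simplified Python model, not the application's code: the function name `extrema_by_group`, the tuple layout and the sample figures are assumptions made for the example.

```python
from collections import defaultdict


def extrema_by_group(nodes, min_weight=0):
    """Return {group: (least_visited, most_visited)} for nodes at or above `min_weight`.

    nodes: list of (page, group, weight) tuples
    """
    visible = defaultdict(list)
    for page, group, weight in nodes:
        if weight >= min_weight:  # same role as the "minimal weight" slider
            visible[group].append((weight, page))
    return {group: (min(items)[1], max(items)[1]) for group, items in visible.items()}


# Hypothetical weights for two language groups.
nodes = [
    ("/fr", "fr", 120),
    ("/fr/courses", "fr", 45),
    ("/fr/contact", "fr", 3),
    ("/de", "de", 80),
    ("/de/kurse", "de", 9),
]
print(extrema_by_group(nodes, min_weight=5))
# {'fr': ('/fr/courses', '/fr'), 'de': ('/de/kurse', '/de')}
```

Displaying such values next to the graph would also address part of the "Determine range" activity, since the lowest and highest weights of each group become explicit.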

SUMMARY

In order to assess Constel Analytics, qualitative evaluations were conducted because they were deemed better suited than quantitative evaluations, given the complexity of the latter. Two institutional websites with different structures were selected to serve as a basis for this assessment.

The evaluation protocol consisted first in configuring Constel Analytics, testing its performance and deploying it online. In a second step, the evaluators familiarized themselves with the application before being interviewed during a live demonstration, so as to gather feedback on both the usability and the relevance of Constel Analytics.

Most of the technical evaluation worked well, though some of the application’s secondary aims are

not reached, as evidenced by the risky deployment that required a backup solution in both cases.

The feedback from the two assessments largely overlaps. Constel Analytics is a relevant and interesting application whose main goal, providing a visualization of a website’s traffic, is reached. However, it requires some more work to be really useful to a website. It also suffers from usability problems and a lack of quantitative information, not to mention the risk of bias induced by the nature of the visualization currently implemented.

Some of the observations have already led to changes in Constel Analytics. The solutions that have not been implemented yet are described in the following section.


DISCUSSION

“A person who never made a mistake never tried anything new.” - Albert Einstein, scientist

On the basis of the feedback collected during the evaluations, several ideas for improving Constel Analytics were discussed in order to take it beyond its initial limitations. As these ideas were explored further, several potential implementations emerged. This chapter discusses those implementations, which take the form of short-term improvements that could be undertaken immediately.

TABLE OF CONTENTS

DISCUSSION
  Table of contents
  Information management
    Various UI improvements
    Detection of groups
    Categorization of the interactions
  New features
    Sitemap comparison
    Visits typology
    Other sources of data
  Summary


INFORMATION MANAGEMENT

Among the feedback from the evaluations, several points were raised regarding the efficiency of the current visualization, as it appeared to lack information (mostly, but not only, quantitative information). The solutions discussed below focus on how to improve the Force-directed graph in order to make it more efficient.

VARIOUS UI IMPROVEMENTS

Figure 57 - Possible future layout for Constel Analytics. The General information box is hidden by default, leaving more room for the tools. Information about the current page is directly visible next to the cursor.

Evaluations led to the conclusion that the User Interface of Constel Analytics might need several changes in order to improve its usability. Among others, it is necessary to rethink the layout of the visualization, as the boxes to the right tend to take up too much space and sometimes require scrolling in order to be visible. Displaying this information directly on the graph could be a solution, as it would not require as much space on the page. Figure 57 shows how much space could be saved by hiding the general information in a lightbox (it is not useful to keep it visible all the time) and using tooltips to display information about the hovered page.

Other solutions to improve the readability of the graph include:

• Displaying only interactions between groups of pages (should the “detection of groups” point be implemented)
• Providing a way to see an interaction’s actual intensity instead of simply guessing it from the opacity of the link
• Comparing datasets by making the main graph evolve rather than showing four different graphs at once

DETECTION OF GROUPS

The two evaluations showed how difficult it can be for users to clearly detect groups – or communities – of pages in large Force-directed graphs. The way pages are currently grouped (using their path) is undeniably useful, since it allows users to “find landmarks” by mapping their site’s structure onto the visualization. However, it does not help actual groups of pages stand out. In highly interconnected graphs, figuring out what the different groups are and how they interact can be tricky and can mislead users. By using a community-detection algorithm, it would be possible to automatically group pages according to their proximity. Those groups could then be compared with the time-lapse features.

Figure 58 shows a Force-directed graph in which groups were detected manually, based on an additional website that was used during the development of the project⁴. It presents the way groups could be displayed on a graph, by enclosing them in larger transparent sets with colored borders. Groups could be named either automatically, by taking the name of the most prominent page or by checking similarities between names, or manually.

Figure 58 - "Target render" for groups’ detection. This is the visualization of Nid du Phénix’s website (a defunct store in Fribourg), which had a forum (light blue at the top of the graph) and several sections of products that were manually highlighted. Ideally, the application should be able to group the pages automatically and allow users to name the groups themselves.

Automatic group detection could rely on an algorithm such as the Girvan-Newman algorithm [38]. Because of its relative complexity, group detection should be offered as an option located in the toolbox, so that users can choose whether to use it.
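
To make the idea more concrete, the following is a minimal sketch of one Girvan-Newman split, written in Python with the standard library only. It assumes an unweighted, undirected graph of pages (the intensity of the traffic on each link is ignored), and the function names and page paths are hypothetical: none of this is part of Constel Analytics. The principle is to repeatedly remove the link with the highest edge betweenness until the graph breaks into one more group.

```python
from collections import deque


def components(adj):
    """Return the connected components of an undirected graph as a list of sets."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for nb in adj[node]:
                if nb not in comp:
                    comp.add(nb)
                    queue.append(nb)
        seen |= comp
        comps.append(comp)
    return comps


def edge_betweenness(adj):
    """Edge betweenness (Brandes-style) for an unweighted, undirected graph.

    Every pair of nodes is visited from both ends, so all scores are doubled;
    this does not change which edge ranks highest.
    """
    score = {}
    for source in adj:
        # Breadth-first search counting the shortest paths from `source`.
        dist, paths, preds, order = {source: 0}, {source: 1.0}, {}, []
        queue = deque([source])
        while queue:
            node = queue.popleft()
            order.append(node)
            for nb in adj[node]:
                if nb not in dist:
                    dist[nb] = dist[node] + 1
                    paths[nb] = 0.0
                    queue.append(nb)
                if dist[nb] == dist[node] + 1:
                    paths[nb] += paths[node]
                    preds.setdefault(nb, []).append(node)
        # Walk back from the farthest nodes, sharing credit along shortest paths.
        credit = {node: 0.0 for node in order}
        for node in reversed(order):
            for pred in preds.get(node, []):
                share = paths[pred] / paths[node] * (1.0 + credit[node])
                edge = frozenset((pred, node))
                score[edge] = score.get(edge, 0.0) + share
                credit[pred] += share
    return score


def girvan_newman_split(edges):
    """Remove the most 'between' links until the graph gains one more group."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    target = len(components(adj)) + 1
    while len(components(adj)) < target:
        score = edge_betweenness(adj)
        if not score:
            break  # no link left to remove
        a, b = max(score, key=score.get)
        adj[a].discard(b)
        adj[b].discard(a)
    return components(adj)


# Hypothetical example: a forum and a shop linked by a single navigation link.
links = [("/forum", "/forum/rules"), ("/forum", "/forum/topics"),
         ("/forum/rules", "/forum/topics"), ("/forum", "/shop"),
         ("/shop", "/shop/books"), ("/shop", "/shop/games"),
         ("/shop/books", "/shop/games")]
groups = girvan_newman_split(links)
```

Calling the function again on each resulting group would yield finer communities. Since the betweenness is recomputed after every removal, the cost grows quickly with the number of links, which supports keeping the feature optional.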

⁴ This is the visualization of nid-du-phenix.ch, a now defunct website for an entertainment store in Fribourg that closed in 2009.

As it stands now, Constel Analytics also fails to provide quantitative data that could be used for weekly or monthly reports. Automatically grouping communities of webpages would allow Constel Analytics to display their weight (either as a percentage or as a number), which would in turn allow webmasters to know whether a particular population has grown or not over the last period. This would help support the “Compute derived values” activity, which was set aside in the first version of the application.
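
The weight of a group and its evolution between two periods can be illustrated with a short Python sketch. It assumes that the weight of a group is its share of the total pageviews, which is only one possible definition; the function names and the figures are hypothetical.

```python
def group_weights(pageviews, groups):
    """Return each group's share of the total pageviews, as a percentage."""
    total = sum(pageviews.values())
    weights = {}
    for name, pages in groups.items():
        views = sum(pageviews.get(page, 0) for page in pages)
        weights[name] = 100.0 * views / total if total else 0.0
    return weights


def weight_change(previous, current):
    """Difference in percentage points between two periods, per group."""
    return {name: current[name] - previous.get(name, 0.0) for name in current}


# Hypothetical groups and pageviews for two consecutive periods.
groups = {"forum": ["/forum/a", "/forum/b"], "shop": ["/shop/x"]}
last_month = group_weights({"/forum/a": 30, "/forum/b": 20, "/shop/x": 50}, groups)
this_month = group_weights({"/forum/a": 10, "/forum/b": 10, "/shop/x": 60}, groups)
change = weight_change(last_month, this_month)  # forum: -25.0, shop: +25.0
```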

CATEGORIZATION OF THE INTERACTIONS

Network visualizations suffer from several pitfalls, among which is the risk of misunderstanding the nature of a link. Concretely, evaluations have shown that Constel Analytics can mislead its users because the graph suffers from memory loss and from a lack of information about the nature of the interactions. While the memory loss issue could only be solved by using a Data Source that allows querying the individual path of each visit, the second issue can be addressed by a small modification to Constel Analytics. As it stands now, the application categorizes nodes according to their group. It is possible to do the same with links.

As an example based on the dimensions available in Google Analytics, one could distinguish four types of interactions according to the time spent by the user on the destination page. This would change the second query (see subchapter 3.3.2 for information about the Data Processing) by requesting an additional metric (ga:avgTimeOnPage).

| Nature of interaction | Characteristics | Description |
|---|---|---|
| Regular interactions | Average or long time on page | Read an article, check an image, write a message, ... |
| HUB / Navigation interactions | Short time on page | On a gallery website, visitors go back to the photostream to select different pictures to see. |
| Bounce interactions | Shortest time on page | Click on a page, realize it does not hold the desired information, go back to the previous page. |
| Exit interactions | Exit page (no time recorded) | Leave the website, with or without having found the desired information. |

Table 16 – Different types of interactions

The exact definition of what counts as a short or long time spent on a page depends on the website’s average (a histogram could be used to display this information as well).
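
The categories of Table 16 could be derived from the average time on the destination page as in the following Python sketch. The thresholds (10% and 50% of the website’s average) are arbitrary assumptions chosen for illustration, as are the function and parameter names; suitable values would have to be determined for each website.

```python
from enum import Enum


class Interaction(Enum):
    REGULAR = "regular"
    HUB = "hub"
    BOUNCE = "bounce"
    EXIT = "exit"


def categorize(avg_time_on_page, site_average, is_exit=False,
               bounce_ratio=0.1, hub_ratio=0.5):
    """Classify a link from the average time (in seconds) on its destination.

    `bounce_ratio` and `hub_ratio` are hypothetical thresholds expressed as
    fractions of the site-wide average time on page.
    """
    if is_exit or avg_time_on_page is None:
        return Interaction.EXIT      # exit page: no time recorded
    if avg_time_on_page < bounce_ratio * site_average:
        return Interaction.BOUNCE    # shortest time on page
    if avg_time_on_page < hub_ratio * site_average:
        return Interaction.HUB       # short time on page
    return Interaction.REGULAR       # average or long time on page


# With a site-wide average of 60 seconds:
examples = [categorize(2, 60), categorize(20, 60),
            categorize(45, 60), categorize(None, 60)]
# BOUNCE, HUB, REGULAR, EXIT
```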

Figure 59 shows how this categorization could work, using different colors for the categories of interactions (they could also be represented with symbols at the end of the link). In this example, C is clearly a central page linking to a few other pages (like an index linking to all the chapters of a book). D is often used as an exit page (which means that it could be the end of a book). Thanks to those categories, it is possible to understand how – and likely why – visitors move from one page to another.


Figure 59 – “Target render” for interactions’ categorization. Black (faded) links represent the bouncing visits, blue links represent the Hub visits, the white link represents the exit and orange links are regarded as “regular”.

This solution would however make the graph less readable, as there could be up to four edges between each pair of nodes. This problem could be solved by displaying those links only in certain circumstances (a click on a node, a certain level of zoom, ...) and without recomputing the whole graph.

NEW FEATURES

SITEMAP COMPARISON

One of the remarks that often emerged during the evaluations was that Constel Analytics does not directly show anomalies in the website’s traffic – these have to be discovered by exploring the graph. One proposed solution is to compare the actual traffic with the ideal sitemap – or site index – built by the web designer.

Figure 60 - Sketch for a possible "site map" visualization. The thickness of a line represents the traffic: the thicker the line, the more intense the traffic.

In order for Constel Analytics to display such a map, the application would have to analyze the whole tree structure of the website and sort all its path levels. It would then be possible to show the map as a tree layout, whose branches would indicate the intensity of the traffic (with color, opacity or size). This visualization would make anomalies in the traffic immediately visible, while the force-directed layout would offer the advantage of shaping the whole site according to its actual traffic rather than according to the web designer’s intent. Figure 60 shows an example of how such a visualization could work.

The use of a radial tree layout could be relevant, as large websites tend to have a lot of pages to display. Even so, the granularity of such a tree would have to remain relatively low, as a tree with over 1,000 leaves would clearly be impossible to read.
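
The analysis of the path levels can be sketched in Python as follows. The sketch assumes that the traffic of a branch is the sum of the pageviews of all the pages located under it, and that granularity is limited by cutting the tree at a given depth; both choices, as well as the names and figures used, are hypothetical.

```python
def build_sitemap(pageviews):
    """Build a nested tree from page paths, summing traffic at every path level."""
    root = {"name": "/", "traffic": 0, "children": {}}
    for path, views in pageviews.items():
        root["traffic"] += views
        node = root
        for segment in [s for s in path.split("/") if s]:
            node = node["children"].setdefault(
                segment, {"name": segment, "traffic": 0, "children": {}})
            node["traffic"] += views
    return root


def prune(node, max_depth, depth=0):
    """Keep only the first `max_depth` levels so that the tree stays readable."""
    if depth >= max_depth:
        return {"name": node["name"], "traffic": node["traffic"], "children": {}}
    return {"name": node["name"], "traffic": node["traffic"],
            "children": {name: prune(child, max_depth, depth + 1)
                         for name, child in node["children"].items()}}


# Hypothetical pageviews per page path.
tree = build_sitemap({"/": 100, "/products/books": 40,
                      "/products/games": 10, "/forum": 25})
overview = prune(tree, max_depth=1)  # only the first level below the root
```

In this example, the "products" branch carries 50 pageviews even though no page is located at "/products" itself, which is what the thickness of a branch would convey in a tree layout.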

VISITS TYPOLOGY

During the design phase of Constel Analytics, the initial intention was to build a visualization about visits and visitors. The idea was to sort the different types of visitors of a given website, classifying them into clusters according to their similarity.

The main question raised by this feature is "what is visitors' similarity?". Visitors can be sorted according to various dimensions or metrics: most tools already provide information about demographics, technology and others. An innovative way of sorting users would be to check the succession of pages they visited and the time they spent on each page. Finding similarities between the progressions of visitors could indicate different usages – and thus different users – of the website. The algorithm developed by N. Labroche, M.-J. Lesot and L. Yaffi [28] could be used and extended to take into account the time spent by users on each page, as this metric also reveals their interests [39].

Using a clustering algorithm, groups of visitors could be visualized as a dendrogram, and users could name the clusters and save them as segments for further use (Figure 61 shows an example of the visualization with named groups of users).

Implementing this feature is not possible with the free version of Google Analytics as its APIs cannot

retrieve all the visits’ steps. Using alternatives like Piwik would be a solution.

Figure 61 - Sketch for Visitors' typology. 7 visitors are distributed at various levels, depending on the similarities of their visits’ steps.
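
As an illustration of how visits could be compared and grouped, the Python sketch below clusters visits according to the succession of pages visited. It does not implement the algorithm of Labroche et al. [28]: it substitutes a generic sequence similarity (the standard library's difflib.SequenceMatcher) and a simple average-linkage agglomerative clustering, and it ignores the time spent on each page. The names, the threshold and the visits are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def visit_distance(a, b):
    """Distance in [0, 1] between two page sequences (0 means identical)."""
    return 1.0 - SequenceMatcher(None, a, b, autojunk=False).ratio()


def cluster_visits(visits, threshold=0.5):
    """Agglomerative (average-linkage) clustering of visits.

    Returns the final clusters (lists of visit ids) and the merge history,
    which holds the information a dendrogram would draw.
    """
    clusters = [[vid] for vid in visits]
    merges = []

    def linkage(c1, c2):
        pairs = [(x, y) for x in c1 for y in c2]
        return sum(visit_distance(visits[x], visits[y]) for x, y in pairs) / len(pairs)

    while len(clusters) > 1:
        # Find the two closest clusters.
        (i, j), dist = min(
            (((i, j), linkage(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1])
        if dist > threshold:
            break  # remaining clusters are too different to be merged
        merges.append((clusters[i], clusters[j], dist))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters, merges


# Hypothetical visits: two shoppers and two forum readers.
visits = {
    "v1": ["/", "/products", "/products/books"],
    "v2": ["/", "/products", "/products/games"],
    "v3": ["/forum", "/forum/topic-1", "/forum/topic-2"],
    "v4": ["/forum", "/forum/topic-1"],
}
clusters, merges = cluster_visits(visits)
```

Every pair of clusters is compared at each step, so this sketch only suits small samples of visits; it is meant to show the kind of data a dendrogram would be built from, not a production approach.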


OTHER SOURCES OF DATA

While Google Analytics is the most widely used web analytics tool, it is not the only one and several

companies might want to use Constel Analytics with other sources of data. In particular, companies

which regard data privacy as a sensitive issue are usually reluctant to use external tools and would

rather go for self-hosted systems like Piwik or Open Web Analytics.

Since Piwik is one of the most popular choices after Google Analytics, we studied the possibility of using it as a Data Source. It benefits from its openness and from the fact that the data are stored locally, which can be an advantage for performance as much as for privacy. Moreover, Piwik provides complete APIs to access its data either internally or through RESTful services, which makes it eligible as a source of data for Constel Analytics. Finally, Piwik offers many more details about individual visits, which opens new perspectives for the Visits’ typology feature.

However, a few limitations prevent the integration of Piwik in Constel Analytics. Namely, two main

problems were identified:

• The current APIs do not provide a convenient way to list the interactions between pages, as this would require one request for each page. From a performance perspective, this is not feasible, considering that those requests are likely to be handled through the RESTful APIs.
• Piwik can sort pages according to their URLs or according to their titles, but there is no link between the two.

In order to integrate Piwik, two options have been considered. The first one is to access Piwik’s

database directly and use the raw data. The second one is to contribute to Piwik and extend its API.

Both solutions should be analyzed in further work.

SUMMARY

The evaluations led to discussions about how to solve the different issues identified. Solutions were

classified in two groups, one covering the improvements of the current visualization and the other

including future features not necessarily related to the force-directed graph.

Regarding the main visualization, grouping pages more efficiently through a community-detection algorithm was considered. In order to give more information about the links, using the average time spent on the destination page could be useful, as it would help in particular to distinguish hub pages from highly connected pages that require more time to read. Some redesign suggestions were also made regarding the UI, after it was observed that it was not suited to all of the users’ tasks.

As for the new features, the possibility of displaying the website’s map as a tree was presented. Webmasters would find anomalies in the traffic of their website more easily than with the force-directed graph. The possibility of using different sources of data was also examined, showing that some APIs would not easily suit force-directed visualizations. Finally, the possibility of clustering visitors instead of pages was addressed: while the algorithms exist and would likely work, the current Google Analytics APIs do not provide sufficient information to categorize users’ behavior according to their patterns.


CONCLUSION

“If all difficulties were known at the outset of a long journey, most of us would never start out at all”.

- Dan Rather, journalist and former news anchor

This section serves as a conclusion to this project. A first chapter summarizes how this master thesis was conducted. Then, a synthesis is proposed, summarizing the limitations and weaknesses of Constel Analytics while putting into perspective the valuable insights acquired during its development and evaluation. Finally, we present the possible evolutions of Constel Analytics: beyond the short-term improvements described in the previous section, what could be the future of this application?

TABLE OF CONTENTS

CONCLUSION
  Table of contents
  Wrap up
  Conclusion
  Future work


WRAP UP

This master thesis worked as a step-by-step discovery of web analytics, starting with theoretical discussions, going through an overview and a classification of current tools and proposing a new take on how collected data can be reported.

First, the definition of web analytics was discussed, providing a quick overview of the four parts of the web analytics management lifecycle. We also stressed the importance of the evolution of web analytics: it started with the simple need to measure the audience of a website and grew into a whole new dimension, tracking users’ behaviors from many different devices over many different channels. As we hypothesized that data visualization in current web analytics tools could be a problem, a description of data visualization was also given, explaining how a visualization can be evaluated.

We reviewed four different data visualizations that could be used in the context of web analytics, and discussed others that are already being used. Then, we proposed a survey of 20 web analytics tools, from which a taxonomy emerged to categorize them according to several criteria, such as their pricing and targets.

The application developed during this project aimed at making interactions between the pages of a website easier to see. We based it on Google Analytics’ data and used open-source technologies for its development: PHP and Symfony as a base and D3 for its visualization. In order to evaluate the application, two Swiss institutional websites were selected. We challenged Constel Analytics in two steps: first by testing its performance and flexibility by deploying it online, then by evaluating its relevance and usability with evaluators. Finally, we provided some ideas on how the application could be improved.

CONCLUSION

This project started with the assumption that visualizations in current public web analytics tools were simplistic and might be replaced by more advanced versions. One of the first contributions of this thesis was to provide a classification of web analytics tools based on their targets and nature, as we believed this was the main difference between them.

Among the findings of this study, one fact stands out: many academic Web Analytics and Web Mining projects use Network diagrams to represent their data while none of the existing commercial solutions do, as if advanced visualizations did not suit public software. Constel Analytics filled the gap: network diagrams – with a Force-directed layout method – offer a clear added value to the massive data already collected by commercial Web Analytics solutions. While we believe that extensive evaluations would be necessary to declare this undeniably true, the present study took a clear step in this direction.

We hope that the evolution of Web Analytics in the years to come will take interest in those

advanced visualizations and algorithms, using them in order to make analyses both understandable

for more people and more efficient at predicting users’ behavior.


FUTURE WORK

Regarding academic work in the field of web analytics, we believe that this thesis might serve as a basis for further evaluations of the relevance of network diagrams and other unusual visualizations in web analytics.

As for the application developed during this thesis, it is bound to be improved. Evaluations and discussions led to an observation: Constel Analytics can evolve in two different directions.

The first one is the direction of visual analytics, a discipline that seeks to support analytic reasoning through automatically processed data presented in an interactive way [40]. By using a force-directed graph, Constel Analytics has already taken the first steps in that direction: it can be regarded as a computationally enhanced Visualization as defined in Bertini and Lalanne’s taxonomy [41]. The short-term improvements proposed in section 5 would turn it into an actual visual analytics tool that would not only display information, but also process data in order to detect patterns in visitors’ habits or in the way pages interact.

The second direction is communication. Evaluators noted that Constel Analytics was not able to provide clear insights that anyone could understand instantly. Allowing the application to produce advanced reports that highlight complex findings while remaining clear for everyone is a necessary step should it be integrated into – or developed as – public software. To this end, the use of infographics is considered.

Beyond those two perspectives of evolution, the scope of web analytics is also bound to grow in the next few years, as explained in chapter 2.1.3. Eventually, Constel Analytics will have to handle several types of channels – showing not only how webpages interact with each other, but how the components of the whole web ecosystem work together. This will raise several concerns about displaying different types of nodes on a single visualization while keeping relevant derived values.


REFERENCES

1. W3Techs. Usage Statistics and Market Share of traffic analysis tools for websites, April 2014.

W3Techs. [Online] April 2014. [Cited: April 11, 2014.]

http://w3techs.com/technologies/overview/traffic_analysis/all.

2. Ivory, Melody Y. and Hearst, Marti A. The State of the Art in Automating Usability Evaluation of

User Interfaces. ACM Computing Surveys, Vol. 33, No. 4. Berkeley : ACM, Inc, 2001, pp. 470-516.

3. Wikipedia. Web Analytics - Wikipedia, the free encyclopedia. Wikipedia. [Online] November 3,

2013. [Cited: April 8, 2014.] https://en.wikipedia.org/wiki/Web_analytics.

4. Web Analytics Association. Web Analytics Definitions. Digital Analytics Association. [Online]

September 22, 2008. [Cited: April 24, 2014.]

http://www.digitalanalyticsassociation.org/Files/PDF_standards/WebAnalyticsDefinitions.pdf.

5. Bottégal, Brice. Définition et histoire du Web analytics. Blog web analytics. [Online] January 7, 2013. [Cited: April 10, 2014.] http://www.bricebottegal.com/definition-histoire-web-analytics/#more-226.

6. Visual Analytics: How Much Visualization and How Much Analytics? Keim, Daniel A., Mansmann,

Florian and Thomas, Jim. s.l. : ACM, 2009, ACM SIGKDD Explorations Newsletter , pp. 5-8.

7. Low-Level Components of Analytic Activity in Information Visualization. Amar, Robert, Eagan,

James and Stasko, John. Minneapolis : IEEE, 2005, 2005 IEEE Symposium on information

Visualization (InfoVis 2005), p. pp.15.

8. Wikipedia. Dendrogram - Wikipedia, the free encyclopedia. Wikipedia, the free encyclopedia.

[Online] November 22, 2013. [Cited: April 8, 2014.] https://en.wikipedia.org/wiki/Dendrogram.

9. —. Hierarchical clustering - Wikipedia, the free encyclopedia. Wikipedia, the free encyclopedia.

[Online] April 2, 2014. [Cited: April 8, 2014.] https://en.wikipedia.org/wiki/Hierarchical_clustering.

10. Teoh, Soon Tee. A Study on Multiple Views for Tree Visualization. [book auth.] San Jose State

University. Visualization and Data Analysis. San Jose : SPIE Press, 2007, pp. 99-110.

11. Bostock, Mike. Cluster Dendrogram. mbostock’s blocks. [Online] December 19, 2012. [Cited:

April 08, 2014.] http://bl.ocks.org/mbostock/4339607.

12. Skau, Drew. Battle of the Charts: Why Cartesian Wins Against Radial. visual.ly. [Online] June 7,

2012. [Cited: April 8, 2014.] http://blog.visual.ly/cartesian-vs-radial-charts/.

13. Dragos, Sanda Maria and Beldean, Alina Mihaela. Analysing Web Usage with Force-Directed

Graphs. Studia Univ. Babes-Bolyai, Informatica. 2013, pp. 75-85.


14. Wikipedia. Force-directed graph drawing - Wikipedia, the free encyclopedia. Wikipedia, the free encyclopedia. [Online] January 16, 2014. [Cited: April 10, 2014.] https://en.wikipedia.org/wiki/Force-based_algorithms.

15. Manning, Christopher. Chicago Lobbyists Force-Directed Graph Visualization.

christophermanning's blocks. [Online] bl.ocks.org, January 17, 2012. [Cited: April 9, 2014.]

http://bl.ocks.org/christophermanning/1625629.

16. Bostock, Mike. Among the Oscar Contenders, a Host of Connections - Interactive Feature - NYTimes.com. The New York Times. [Online] The New York Times, February 20, 2013. [Cited: April 9, 2014.] http://www.nytimes.com/interactive/2013/02/20/movies/among-the-oscar-contenders-a-host-of-connections.html?_r=0.

17. IEA. IEA Sankey Diagram. IEA. [Online] [Cited: April 10, 2014.]

18. Wikipedia. Treemapping - Wikipedia, the free encyclopedia. Wikipedia, the free encyclopedia.

[Online] March 29, 2014. [Cited: April 8, 2014.] https://en.wikipedia.org/wiki/Treemapping.

19. McCandless, David, et al. Billion Dollar-o-Gram 2013 | Information is Beautiful. Information is

Beautiful. [Online] April 2013. [Cited: April 9, 2014.]

http://www.informationisbeautiful.net/visualizations/billion-dollar-o-gram-2013/.

20. Ruuskanen, Aleksi. IBM Coremetrics - Web Analytics and Digital Marketing Optimization.

Helsinki : Helsinki Metropolia University of Applied Sciences, 2013.

21. Prescott, LeeAnn. Enterprise Web Analytics Tools: The Marketer's Guide. [ed.] Karen Burka and

Claire Schoen. s.l. : Digital Marketing Depot, 2012.

22. Demers, Tom. Web Analytics Software Comparison: Identifying The Right Web Analytics Tools For Your Business. Search Engine Land. [Online] May 10, 2013. [Cited: April 10, 2014.] http://searchengineland.com/web-analytics-software-comparison-identifying-the-right-web-analytics-tools-for-your-business-149373.

23. Singh, Brijendra and Singh, Hemant Kumar. Web Data Mining Research: A Survey. Proceedings

of IEEE International Conference on Computational Intelligence and Computing Research.

Coimbatore : s.n., 2010.

24. Carta, Tonia, Paternò, Fabio and de Santana, Vagner Figuerêdo. Web Usability Probe: A Tool for

Supporting Remote Usability Evaluation of Web Sites. Human-Computer Interaction – INTERACT

2011 . Lisbon : IFIP International Federation for Information Processing, 2011, pp. 349-357.

25. Burzacca, Paolo and Paternò, Fabio. Remote Usability Evaluation of Mobile Web Applications. [book auth.] Masaaki Kurosu. Human-Computer Interaction, Part I. Berlin Heidelberg : Springer-Verlag, 2013, pp. 241-248.


26. Hong, Jason I, et al. WebQuilt: A Proxy-based Approach to Remote Web Usability Testing. ACM Transactions on Information Systems, Vol. 19, No. 3, July 2001. Berkeley : ACM, Inc., 2001, pp. 263–285.

27. Jalali, Mehrdad, et al. WebPUM: A Web-based recommendation system to predict user's future movements. Expert Systems with Applications. s.l. : Elsevier, 2010, pp. 6201-6212.

28. Labroche, Nicolas, Lesot, Marie-Jeanne and Yaffi, Lionel. A New Web Usage Mining and

Visualization Tool. 19th IEEE International Conference on Tools with Artificial Intelligence. Patras :

IEEE Computer Society, 2007.

29. Ovsyannykov, Igor. Computer Science and Marketing: A Developing Relationship. Inspirationfeed. [Online] Inspirationfeed, August 10, 2012. [Cited: April 10, 2014.] http://inspirationfeed.com/articles/technology-articles/computer-science-and-marketing-a-developing-relationship/.

30. Google. Dimensions & Metrics Reference - Google Analytics. Google Developers. [Online] [Cited:

April 20, 2014.] https://developers.google.com/analytics/devguides/reporting/core/dimsmets.

31. Wikipedia. D3.js - Wikipedia, the free encyclopedia. Wikipedia, the free encyclopedia. [Online]

April 4, 2014. [Cited: April 13, 2014.] https://en.wikipedia.org/wiki/Data-Driven_Documents.

32. Bostock, Mike. d3/src/layout/force.js - mbostock/d3. GitHub. [Online] [Cited: April 20, 2014.]

https://github.com/mbostock/d3/blob/master/src/layout/force.js.

33. —. Multi-Foci Force Layout. mbostock’s block. [Online] June 12, 2011. [Cited: April 15, 2014.]

http://bl.ocks.org/mbostock/1021841.

34. W3Techs. Usage Statistics and Market Share of Server-side Programming Languages for

Websites, April 2014. W3Techs. [Online] April 2014. [Cited: April 10, 2014.]

http://w3techs.com/technologies/overview/programming_language/all.

35. SensioLabs. High Performance PHP Framework for Web Development - Symfony. Symfony.

[Online] [Cited: April 13, 2014.] http://symfony.com/.

36. Bostock, Mike. Curved Links. mbostock’s block. [Online] bl.ocks.org, January 23, 2013. [Cited:

April 9, 2014.] http://bl.ocks.org/mbostock/4600693.

37. BELIV. BELIV 2014. BELIV. [Online] [Cited: April 11, 2014.] http://beliv.cs.univie.ac.at/.

38. Girvan, Michelle and Newman, Mark. Community structure in social and biological networks.

Proceedings of the National Academy of Sciences of the United States of America. [Online] April 6,

2002. [Cited: April 11, 2014.] http://www.pnas.org/content/99/12/7821.full.

39. Robal, Tarmo and Kalja, Ahto. Web Systems Evaluation on Users' Behaviour Modelling. Tallinn : s.n., 2009, Volume 187: Databases and Information Systems V, pp. 41-52.

40. Wikipedia. Visual analytics - Wikipedia, the free encyclopedia. Wikipedia, the free encyclopedia. [Online] April 10, 2014. [Cited: April 12, 2014.] https://en.wikipedia.org/wiki/Visual_analytics.

41. Bertini, Enrico and Lalanne, Denis. Surveying the complementary role of automatic data analysis and visualization in knowledge discovery. Proceedings of the ACM SIGKDD Workshop on Visual Analytics and Knowledge Discovery: Integrating Automated Analysis with Interactive Exploration. s.l. : ACM, 2009, pp. 12-20.

42. Bostock, Mike. Sankey Diagram. Mike Bostock. [Online] May 22, 2012. [Cited: April 10, 2014.]

http://bost.ocks.org/mike/sankey/.

43. UC Berkeley Visualization Lab. Flare. Data Visualization for the Web. [Online] [Cited: April 20,

2014.] http://flare.prefuse.org/.

44. Happy Recruiting. Maximizing Global Potential. Happy Recruiting. [Online] [Cited: April 20,

2014.] http://happyrecruiting.se.

APPENDIX A “BILLION DOLLAR-O-GRAM VISUALIZATION”

APPENDIX B “DIJKSTRA’S IMPLEMENTATION”

Dijkstra’s algorithm has been implemented for Constel Analytics. Its main function, hungryPath(), initially used a greedy (“hungry”) heuristic, but it turned out that using Dijkstra’s algorithm does not cause any significant performance loss. hungryPath() is called by the findPath() function, which sets up the proper parameters and styles the nodes and links if a path is found.

    function hungryPath(end, border, processed, bestPath) {
        var bestWeightOfTheRound = 0;
        var bestOfTheRound = null;

        // Look for the unprocessed node that is reached with the highest
        // weight from any node of the border
        node.each(function (d) {
            if (!processed[d.name]) {
                for (var key in border) {
                    var connectionValue = isTarget(border[key][0], d);
                    if (connectionValue) {
                        // Share of the border node's value going to d, multiplied
                        // by the weight accumulated up to the border node
                        var linkWeight = connectionValue / border[key][0].value * border[key][1];
                        if (linkWeight > bestWeightOfTheRound) {
                            bestWeightOfTheRound = linkWeight;
                            bestOfTheRound = [d, bestWeightOfTheRound, border[key]];
                        }
                    }
                }
            }
        });

        if (!bestOfTheRound) {
            // No unprocessed node can be reached anymore: there is no path
            return false;
        }

        // Add the new winner to the lists, together with its own weight
        var bestIndex = bestOfTheRound[0].name;
        border[bestIndex] = bestOfTheRound;
        processed[bestIndex] = bestOfTheRound;

        // Remove the previous border node if it is not connected to any
        // unprocessed node anymore
        var previousNode = bestOfTheRound[2][0];
        var isBorderStillConnected = false;
        node.each(function (d) {
            if (!processed[d.name] && isConnected(d, previousNode)) {
                isBorderStillConnected = true;
            }
        });
        if (!isBorderStillConnected) {
            delete border[previousNode.name];
        }

        if (bestOfTheRound[0] == end) {
            // The end node has been reached: rebuild the path backwards
            return rollDownPath(bestOfTheRound, [[], []]);
        }
        return hungryPath(end, border, processed, bestPath);
    }

    // Follows the chain of predecessors from lastNode back to the start.
    // finalArray[0] receives the nodes of the path, finalArray[1] the
    // [source name, target name] pairs of its links.
    function rollDownPath(lastNode, finalArray) {
        finalArray[0].push(lastNode[0]);
        if (lastNode[2]) {
            finalArray[1].push([lastNode[2][0].name, lastNode[0].name]);
            return rollDownPath(lastNode[2], finalArray);
        }
        return finalArray;
    }
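The listing above depends on the D3 selection and on helpers defined elsewhere in the application (node, isTarget(), isConnected()). To illustrate the underlying idea in isolation, the following is a minimal stand-alone sketch of a Dijkstra-style search for the most likely path, where the weight of a path is the product of the transition shares along its links. It is not taken from the Constel Analytics source: the function name mostLikelyPath, the plain-object graph format and the use of the total outgoing traffic as the denominator are assumptions made for this example.

```javascript
// Hypothetical stand-alone sketch; names and data layout are assumptions,
// not part of the Constel Analytics code base.
// graph: { nodeName: { targetName: transitionCount, ... }, ... }
function mostLikelyPath(graph, start, end) {
    var best = {};      // highest known probability of reaching each node
    var previous = {};  // predecessor of each node on its best known path
    var done = {};      // nodes whose best probability is final
    best[start] = 1;

    while (true) {
        // Pick the unprocessed node with the highest probability so far
        var current = null;
        Object.keys(best).forEach(function (name) {
            if (!done[name] && (current === null || best[name] > best[current])) {
                current = name;
            }
        });
        if (current === null) {
            return null; // the end node cannot be reached
        }
        if (current === end) {
            break;
        }
        done[current] = true;

        // Total outgoing traffic, used to turn counts into shares
        var targets = graph[current] || {};
        var total = 0;
        Object.keys(targets).forEach(function (t) {
            total += targets[t];
        });

        // Relax every outgoing link of the current node
        Object.keys(targets).forEach(function (t) {
            var candidate = best[current] * targets[t] / total;
            if (!done[t] && (best[t] === undefined || candidate > best[t])) {
                best[t] = candidate;
                previous[t] = current;
            }
        });
    }

    // Walk the predecessors back from the end to rebuild the path
    var path = [end];
    while (path[0] !== start) {
        path.unshift(previous[path[0]]);
    }
    return { path: path, probability: best[end] };
}

// Usage with made-up transition counts
var exampleGraph = {
    home: { a: 8, b: 2 },
    a: { goal: 1, b: 9 },
    b: { goal: 10 },
    goal: {}
};
console.log(mostLikelyPath(exampleGraph, 'home', 'goal').path);
// path found: home, a, b, goal
```

Because every share is at most 1, the weight of a path can only decrease as it grows, which is what allows the node with the highest weight to be considered final at each round, as in the classical algorithm with additive, non-negative distances.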

APPENDIX C “EVALUATIONS’ SCREENSHOTS”

EVALUATION 1

Figure 62 - New Visitors visualization, Eval. 1. The German part (bottom) was deemed slightly larger than in the previous graph.

Figure 63 - Mobile and Tablet traffic for Eval.1. The overall graph was considered similar. It must be noted, though, that the English part is smaller

and that the German part is more interconnected than the French part.

Figure 64 - Referral traffic for Eval.1. The English part (right) is less distinct than in previous graphs.

Figure 65 - Swiss traffic for Eval. 1. The French part (right) was regarded as larger than the German part. It clearly stretches more than the German part, but it is hard to say whether the total size and the total number of its nodes are really higher.

Figure 66 - Spain (left) and Italy (right) filtered results, Eval. 1. The largest group for Spain is French, while Italy has two arguably similar groups

(English and French).

EVALUATION 2

Figure 67 – Country filters for Eval. 2. Mozambique does not appear, and Burkina Faso is far behind other countries.

APPENDIX D “INSTALLATION GUIDE”

REQUIREMENTS

The following setup is required to run Constel Analytics:

- Apache server with URL rewriting allowed
- PHP 5.3.3 or above, in unsafe mode (safe mode disabled)
- JSON, PHP-XML and ctype extensions installed
- date.timezone setting configured in php.ini
- A Google account to host the application and a Google account connected to a Google Analytics account (it can be the same account)
- (optional) Composer installed in order to update the bundles
- (optional) Git installed in order to take advantage of the versioning

Constel Analytics requires only one third-party bundle in addition to those proposed in the Standard Installation of Symfony: the HappyR Google Analytics bundle, version 1.2.2. Please check that none of your application’s bundles conflicts with this particular version.

DOWNLOADING THE SOURCE CODE

Constel Analytics is hosted on Bitbucket using Git. The URL of the project, which can be used to clone the repository over HTTPS, is as follows:

https://bitbucket.org/pvanhulst/constel-analytics.git

The source code includes a complete Symfony 2 application. At the time of this writing, Constel Analytics has not been released as an independent bundle, but this is planned for the months following its publication.

DEPLOYMENT & CONFIGURATION

Deployment follows the standard procedure for a Symfony 2 project: it is recommended to update the bundles, perform the database migration and clear the cache. Since Constel Analytics does not require a database for now, clearing the cache and updating the bundles is enough. The application can be transferred in any manner, from a simple FTP transfer to more sophisticated tools such as source control systems.

The configuration is divided into two stages: the hosting account configuration and the website

configuration.

HOSTING ACCOUNT CONFIGURATION

During this step of the configuration, Constel Analytics will be linked to a Google Account

responsible for its operation. Quotas will be recorded on this account.

1. Access the Google Cloud Console (https://cloud.google.com/console).
2. Create a new project.
3. Create a new Client ID. The redirect URI is http(s)://your.path.to.symfony/admin/google/analytics/oauth2callback

The “config.yml” file of Symfony (/app/config directory) contains the essential configuration

information for the application. At this step of the configuration, the bundle “happy_r_google_api”

must be configured as follows:

    happy_r_google_api:
        application_name: APPLICATION NAME FROM THE GOOGLE CLOUD CONSOLE
        oauth2_client_id: CLIENT ID FROM THE GOOGLE CLOUD CONSOLE
        oauth2_client_secret: CLIENT SECRET FROM THE GOOGLE CLOUD CONSOLE
        oauth2_redirect_uri: AUTHORIZED REDIRECT URI FROM THE GOOGLE CLOUD CONSOLE
        developer_key: API KEY FOR BROWSER APPLICATION FROM THE GOOGLE CLOUD CONSOLE
        site_name: WEBSITE’S NAME (UNRELATED TO GOOGLE CLOUD CONSOLE)

WEBSITE CONFIGURATION

During the second step of the configuration, a Google Analytics account is linked in order to access its data. This step has to be repeated each time users want to see the data of another website. In the long run, Constel Analytics will be adapted so that it can remember several websites.

In “config.yml”, configure the “happy_r_google_analytics” bundle as follows:

    happy_r_google_analytics:
        profile_id: PROFILE ID
        token_file_path: %kernel.root_dir%/var/happyr/storage/
        host: HOST
        tracker_id: TRACKER ID
        tracker_enabled: false

The information can be found on the main page of Google Analytics, as shown in Figure 68.

Figure 68 - Where to find the necessary information to configure HappyR Google Analytics

Once “happy_r_google_analytics” has been configured, configure “vtwa” as follows:

    vtwa:
        json_cache_path: %kernel.root_dir%/var/vtwa/storage/
        site_name: DOMAIN NAME OF THE ANALYZED WEBSITE
        max_pagelevel: 3
        distinction: path

json_cache_path, max_pagelevel and distinction have default values. Refer to section 3.3 for more

information about those parameters.

Finally, empty the /app/cache directory, the HappyR Google Analytics storage directory and the Constel Analytics cache directory (/app/var/vtwa/storage by default).

APPENDIX E “SOURCE CODE”

The enclosed CD contains the source code of Constel Analytics with the latest commit at the time of

this writing (9db8684, 2014-04-16). For an up-to-date version please check the repository (see

previous appendix).
