www.featurespace.co.uk REPORT 3: PREDICTING PROBLEM GAMBLERS: Analysis of industry data Gambling machines research program 28 November 2014 Authors: David Excell, Georgiy Bobashev, Daniel Gonzalez-Ordonez, Heather Wardle, Tom Whitehead, Robert J. Morris, Paul Ruddle



Contents

Executive Summary
Introduction
  About the research
    Policy context
    The research process
  About this Report
  About the Research Organisations
  Unique Contribution
Definitions & Assumptions
Gaming Machines in Great Britain
Methodology
  Predictive Models
    Terminology
    Model Development Approach
  Target Variable
  Data Pre-processing
  Model Complexity vs Model Information
  Measuring Model Performance
Data
  Industry Data
    Data Request
    Data Received
    Data Quality
  Survey Data
  Proxy Sessions
Measurement of Harm Markers
  Between-Session Markers
  Within-Session Markers
Data Analysis Results
  Baseline
    Player Baseline
    Session Baseline
    Summary
  Player Analysis (Registered Play)
    Using Between Session Markers
    Incorporating Within-Session Markers
  Session Analysis (Unregistered Play)
  Additional Experiments
    Removing multiple loyalty cards
    PGSI Problem Gambling Threshold
    […]
    Predicting PGSI Screening Question Responses
    Gambling Type Analysis
    Factor Group Analysis
    Debit Card Usage
Discussion
  Can we identify harm?
  Research Implications
    Multiple Variables
    Registered vs Non-Registered Play
    Mandatory ABB Limits
  Research Limitations
  Future Research
Recommendations
Conclusion
About Featurespace
About RTI
References
Document Information
  Document History
Appendix A - Calculating Proxy Sessions
Appendix B - Measurement of Harm Markers
  Between Session Metrics
    1) Frequency of Play
    2) Duration of Play
    3) Net Expenditure
    4) Levels of Play Engagement
    5) Number of Activities/Games Types Undertaken
    6) Chasing
  Within Session Metrics
    1) Debit Card Payment Reloading and Switching
    2) Debit Card Payment Decline
    3) Variability In Staking Behaviour
    4) Use of Autoplay
    5) Play of Multiple Machines Simultaneously
    6) Stake Size
    7) Game Volatility
    8) Way Game Played (e.g. number of bets per stake)
    9) Cash-Out
Appendix C - Representativeness of Loyalty Card Data
Appendix D - Candidate Predictive Modelling Approaches Explored by RTI
Appendix E - Example of transformed between session variables


Executive Summary

The Responsible Gambling Trust has been challenged by the Responsible Gambling Strategy Board to answer the following two questions:

• Is it possible to distinguish between harmful and non-harmful gaming machine play?

• If so, what measures might limit harmful play without impacting those who do not exhibit harmful behaviours?

Focusing on the first of these questions, this report confirms that it is possible to distinguish between harmful and non-harmful gaming machine play. The research effort undertaken to deliver this report required the skilful processing and analysis of a large dataset using machine learning methods. By focusing on problem gambling behaviour

survey data, this advanced technological approach has produced a step-change in the way gambling behaviour and specifically problem gambling behaviour is understood.

Furthermore, new insights into gambling behaviours have been identified and are detailed in this report. For example, researchers have discovered which Problem Gambling Severity Index questions are most predictive of problem gambling behaviour;

indicative of problem gambling; and the need to consider a range of variables when attempting to distinguish between problem and non-problem gamblers. This has fundamental implications for operationalising the research and developing intervention strategies, the most critical of which is that a focus on a single factor such as reduction of stake size will not effectively prevent or reduce gambling harm.

From this research it is not possible to state categorically whether only gaming machine play predominantly contributes to problem gambling status, or whether this is accounted for by participation in multiple forms of gambling. Readers should not assume that problem gambling status is causally and predominantly related to gaming machine play.

Indeed, given the complexity of problem gambling and gambling behaviours in general, the researchers have concluded that any corporate responsibility strategy must take a balanced, rounded approach. That is, by factoring in the environment, the individual player, and the product being played to provide a complete view, rather than focusing on a single variable, the gambling industry will be able to significantly improve both the detection rate of problem gamblers and the minimisation of gambling-related harm.

Finally, consideration has been given to the second question (what measures might limit harmful play without impacting those who do not exhibit harmful behaviours?), although answering it does not form a significant portion of this report. The researchers conclude from their analysis that operators will face trade-offs in delivering harm minimisation interventions, as some non-problem gamblers will inevitably receive interventions which are unnecessary. Therefore all interventions must be carefully evaluated in a live environment to measure their effectiveness.


Introduction

About the research

Policy context

This report forms part of a series of research projects commissioned by the Responsible Gambling Trust to explore the extent to which industry data generated by machines in bookmakers can be used to identify harmful patterns of play. In recent years, there have been increasing c[…] better understand how consumers play machines. It is hoped that, by analysing transactional data, it will be possible to identify patterns of play that indicate someone is experiencing problems or harm from their engagement in gambling. Industry and regulators alike are keen to see if this is possible. If so, a potential new range of responsible gambling measures, tailored towards and intervening with the individual, could be developed.

To date, regulation of machines tends to be conducted at a high level, making generalisations that focus on restrictions of stake, prize, speed and number of machines. There is no regulation that is tailored towards individual players. The Gambling Commission (the industry regulator) considers that a mix of macro (i.e., stakes and prizes) and micro (i.e., the individual) regulatory approaches may be

patterns of play and, if so, what types of interventions could be introduced that intercede with gamblers experiencing problems. A further concern is to ensure that any individual-led policies intervene with those experiencing problems, whilst allowing those who are not experiencing problems to play without onerous intervention.

The objectives set by the Responsible Gambling Strategy Board (RGSB) for the broader research programme were:

• Can harmful and non-harmful gaming machine play be distinguished?, and

• If so, what measures might limit harmful play without impacting those who do not exhibit harmful behaviours?

The RGSB (Responsible Gambling Strategy Board) is the body responsible for setting strategic objectives for gambling research and policy and advising the Gambling Commission on these issues. To meet these objectives, a series of research projects was planned by the research team, a consortium of NatCen Social Research, Featurespace, Geofutures, and RTI International. These projects focus mainly on the first objective, though consideration is also given to the second. Other research projects (called contextual projects in the broader research programme) contribute to the second objective; for example, by looking at how people understand certain types of player messaging (see Collins et al, 2014).

The research process

To meet the objectives set by the RGSB, a number of project stages were undertaken and three related reports have been published. The project stages are shown in Figure 1.


Figure 1 - Research project stages

In meeting the objective set by the RGSB, the first step was to consider what patterns of play might indicate that someone was experiencing harm. This involved a theoretical review, a rapid evidence review and consultation with key stakeholders to develop a set of metrics (or markers) which may exist within industry data and might indicate that someone was experiencing harm. The results of this stage are published in Wardle, Parke and Excell (2014), called Report 1 in this series (see Wardle, Parke & Excell, Report 1: Theoretical Markers of Harm).

The next step was to consider whether the markers of harm identified from the review were actually evident in industry-collected data. This part of the research was

Markers section of this report.

Early analysis suggested that some of the markers of harm identified in the review could be measured using industry data and therefore further exploration of the data was warranted. A critical question for this research centred on examining the potential patterns of harm identified in theoretical markers of harm from existing academic literature and expert opinion. It was necessary to determine if these theoretical markers of harm were actually patterns of play exhibited by those experiencing harm from gambling. A crucial aspect of this is determining the extent to which potential patterns of harm differentiate between those experiencing harm and those who do not. To explore this, more detail is needed about the player and the extent to which they are experiencing gambling-related problems. This information can only be obtained by communicating directly with players.

The study by NatCen, which is documented in Report 2, fills that gap. It reports survey findings from individuals who have loyalty cards for Ladbrokes, William Hill, or Paddy Power. Using loyalty card holders as a sampling frame for a survey meant that we could link their survey responses with data collected and recorded for their loyalty card. The loyalty cards for bookmakers operate in much the same way as other loyalty cards (like Tesco Clubcards or Nectar cards) where every transaction (where the card is used) is recorded for an individual. This means it is possible to track the frequency and duration of time spent on machines, so long as individuals used their loyalty card when playing. Using this data has considerable benefits over traditional survey approaches, as it is widely accepted that estimates of gambling expenditure obtained through surveys are inaccurate.
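The linkage described above, joining survey responses to the play recorded against the same loyalty card, can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used by the research team: the card identifiers, field names and session records are all hypothetical, and only session count and total duration are derived.

```python
from datetime import datetime

# Hypothetical survey responses keyed by loyalty card number (not real data).
survey = {"card-001": {"pgsi": 9}, "card-002": {"pgsi": 0}}

# Hypothetical machine sessions recorded against a card: (card, start, end).
sessions = [
    ("card-001", "2013-09-02 12:00", "2013-09-02 12:45"),
    ("card-001", "2013-09-03 18:10", "2013-09-03 18:40"),
    ("card-002", "2013-09-02 09:00", "2013-09-02 09:10"),
    ("card-999", "2013-09-02 10:00", "2013-09-02 10:05"),  # no survey response
]

FMT = "%Y-%m-%d %H:%M"


def link(survey, sessions):
    """Attach session count and total minutes played to each surveyed card."""
    linked = {}
    for card, start, end in sessions:
        if card not in survey:
            continue  # play by card holders who were not surveyed is skipped
        row = linked.setdefault(
            card, {"pgsi": survey[card]["pgsi"], "sessions": 0, "minutes": 0.0}
        )
        duration = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
        row["sessions"] += 1
        row["minutes"] += duration.total_seconds() / 60
    return linked


linked = link(survey, sessions)
print(linked["card-001"])  # {'pgsi': 9, 'sessions': 2, 'minutes': 75.0}
```

Play that was not recorded against a card cannot be linked in this way, which is why the frequency and duration figures only hold when individuals used their loyalty card.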


The study documented in this report has one primary aim: to discover if, by combining the survey results obtained by NatCen from loyalty card holders with the data held by the industry, it is possible to distinguish between harmful and non-harmful play through the identification of specific patterns in a player's behaviour. It is important to note that a majority of surveyed loyalty card holders participated in multiple forms of gambling. Therefore it is not possible to state categorically whether only gaming machine play predominantly contributed to their problem gambling status, or whether that status can be accounted for by participation in multiple forms.

Within that goal, the aims of this report are to:

• provide a description of the data and context to the gambling environment from which the data was obtained;

• provide an overview of the methodology used to analyse the data;

• report on the initial findings that led to the conclusion that industry data was suitable for further analysis to understand harmful play;

• report on the analytics performed and provide a discussion on how the results should be interpreted, and suggest recommendations on how the results could be used and further explored.

Report 2 and Report 3 (this report) should be viewed together and it is these two reports in combination which aim to meet the research objectives set out by the RGSB.

About this Report

The objective of this report is to describe the findings of the research undertaken and ensure that the results can be understood by a wide range of interested stakeholders. A significant proportion of the research effort undertaken to deliver this report has been the sophisticated manipulation and analysis of a large volume of data using state-of-the-art processing systems and statistical algorithms. A summary of the technical

review of the analytical techniques and processes described has been purposely excluded as it would not enhance the understanding of the research for a majority of the readers.

About the Research Organisations

The research documented in this report has principally been carried out by an international team of researchers from Featurespace and RTI International, in tandem with invaluable input and guidance from NatCen. Featurespace took on the role of working with the industry to obtain and process their data, and then worked in conjunction with RTI to analyse the patterns within the data. NatCen undertook the loyalty card surveys and provided analysis of the responses to help guide the data analysis.

With the time-limited nature of this research program, having two organisations working in parallel with varying backgrounds in the application of data analytics enabled different approaches to be explored independently. Results were then compared and the differences and similarities understood. This also provided additional oversight in terms of quality assurance to ensure valid results were produced.

Unique Contribution

This report makes a unique contribution to the understanding of problem gambling. The contributions are summarised below:

• It is the first time that the five largest operators in Great Britain have made their data available for analysis by independent researchers.

• It is the first time anywhere in the world that land-based industry data from multiple operators has been analysed alongside a problem gambling screening score obtained by interviewing individuals. This has provided an incredibly rich data set which has the potential to unlock a whole range of new research initiatives. A number of studies have been completed previously, but their samples have been significantly smaller.

• The research is based on a significant sample size (n = 3,988) compared to existing studies which have been limited to, at most, a few hundred individuals.


Definitions & Assumptions

These terms have the following meaning within this document:

ABB: Association of British Bookmakers (http://www.abb.uk.com/)

AUC: Area Under the Curve. This is a measure of the accuracy of the predictive model. A model which makes random guesses would have an AUC score of 0.5. A model which is 100% accurate would have an AUC score of 1.0. A full description is provided in the Methodology section. When calculating the improvement of a model, the minimum value of 0.5 for an AUC score is taken into consideration. Therefore, if Model A has an AUC score of 0.55 and Model B has an AUC score of 0.60, we would calculate the improvement as ((0.60-0.50)-(0.55-0.50))/(0.55-0.50) = 100%.

B2: A category of game available on the Gaming Machines studied in this research project. The key characteristics of a B2 game are that the maximum stake is £100, the play cycle must last at least 20 seconds and the maximum prize is £500.

B3: A category of game available on the Gaming Machines studied in this research project. The key characteristics of a B3 game are that the maximum stake is £2, the play cycle must last at least 2.5 seconds and the maximum prize is £500.

Baseline: Within this report we have developed baseline models which are used as a reference point for comparing the improvements generated by the models created in this report.

Bet: This relates to a single game on the Gaming Machine where the player has staked a particular amount.

DCMS: Department for Culture, Media and Sport

False Negative: A false negative is a problem gambler who has been incorrectly identified as a non-problem gambler.

False Positive: A false positive is a non-problem gambler who has been incorrectly identified as a problem gambler.

Gaming Machine: A touch screen electronic gaming machine featuring both B2 and B3 category games as defined by DCMS regulation.¹

High volatility game: A game in which the prizes are infrequent and higher amounts […]

LBO: Licenced Betting Office

Low volatility game: A game in which the prizes are frequent and lower amounts than […]

Month(ly): A calendar month, for example, 1-Sept to 30-Sept 2013

PGSI: Problem Gambling Severity Index

Playing Day: Day of the week in which a registered player has had at least one session on a machine.

RGSB: Responsible Gambling Strategy Board (http://www.rgsb.org.uk/)

ROC: Receiver Operating Characteristic. This is a commonly used graphical method to show the performance of a predictive model. A full description is provided in the Methodology section.

Sensitivity: […] that is, the proportion of correctly identified problem gamblers.

Session: A continuous period of machine activity from a player.

Specificity: Equ[…] that is, the proportion of correctly identified non-problem gamblers.

Stake: Amount of money the customer is risking on a bet.

Time Periods: All time periods in this report are shown either in units of days, or in the format D.hh:mm:ss, where D is the total number of whole days and hh, mm and ss represent the number of hours, minutes and seconds respectively.

True Negative: A true negative is a non-problem gambler who has been correctly identified.

True Positive: A true positive is a problem gambler who has been correctly identified.

Week(ly): The period from Monday to Sunday.

1 These Gaming Machines are colloquially and unofficially known as FOBTs, or Fixed Odds Betting Terminals. A more detailed description of the type of games available on Gaming Machines, and the differentiation between B2 and B3 games, will be provided in the next report.
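The AUC improvement calculation and the sensitivity and specificity terms defined above can be illustrated with a short Python sketch. It is not the code used in the research: the function names and the confusion-matrix counts are hypothetical, and the sensitivity and specificity formulas are the standard textbook ones, TP/(TP+FN) and TN/(TN+FP), assumed here to match the definitions in this report.

```python
def auc_improvement(auc_a: float, auc_b: float, chance: float = 0.5) -> float:
    """Relative improvement of model B over model A, measured above chance (AUC 0.5)."""
    gain_a = auc_a - chance
    gain_b = auc_b - chance
    if gain_a <= 0:
        raise ValueError("model A must score above chance")
    return (gain_b - gain_a) / gain_a


def sensitivity(tp: int, fn: int) -> float:
    """Proportion of problem gamblers correctly identified (standard definition)."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Proportion of non-problem gamblers correctly identified (standard definition)."""
    return tn / (tn + fp)


# Worked example from the AUC definition: Model A = 0.55, Model B = 0.60.
improvement = auc_improvement(0.55, 0.60)
print(f"{improvement:.0%}")  # prints "100%"

# Hypothetical counts, for illustration only.
print(sensitivity(tp=80, fn=20))    # 0.8
print(specificity(tn=900, fp=100))  # 0.9
```

Measuring improvement above 0.5 rather than above zero reflects that a model guessing at random already reaches an AUC of 0.5, so only the margin above that level represents predictive value.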


Gaming Machines in Great Britain

The focus of the Machines Research Programme commissioned by the Responsible Gambling Trust was to understand if harmful and non-harmful play could be distinguished on the gaming machines operated by Licenced Betting Offices in Great Britain (England, Scotland and Wales).

Great Britain has one of the most diverse gambling markets in the world, with a wide range of gambling channels, including Retail (e.g. Casinos, Licenced Betting Offices, Bingo Halls, and Arcades), Internet, Mobile, and Telephone betting. There is also a wide range of products offered, including, but not limited to, Sports Betting, Casino Games, Poker, Bingo, Lottery and Scratch Cards.

Licenced Betting Offices (LBOs) are retail premises that offer facilities to place a bet; that is, making or accepting a bet on the outcome of a race, competition, or other event. As of October 2014, there are 9,508 registered LBO premises in Great Britain, a majority of which are located on high streets and in residential areas. The five largest operators of LBOs are Betfred, Gala Coral, Ladbrokes, Paddy Power and William Hill. Within their retail premises, these operators generate […]

[…] Licenced Betting Office is restricted to four physical Gaming Machines. Each of the five operators also provides remote gambling services. In many town centres, it is not uncommon to see several LBOs located within a short distance of each other. For a detailed analysis of the spatial distribution of the LBOs, please refer to the contextual report generated by Geofutures published as part of this research programme.

There are two primary gaming machine suppliers in the Great Britain LBO market: Scientific Games and Inspired Gaming. The Gaming Machines are modern gaming terminals offering graphically rich content across a number of different game types. Images of machines from the two primary suppliers are shown in Figure 2.

Figure 2 - Gaming Machine terminals that generated the data studied in this research. Inspired gaming terminals are shown on the left and a Scientific Games terminal shown on the right.

The games offered by a Gaming Machine can fall into the following categories defined by regulation: B2, B3, B4, C and D. A significant proportion of the stakes placed on the gaming machines are from games that fall into the B2 or B3 categories. Full technical standards for these categories can be found on the Gambling Commission's website², but a summary of the salient points is provided below:

• B2 Category Games. These games have a maximum stake of £100 and a maximum prize of £500. The game cycle must last at least 20 seconds. The most popular style of B2 Category game is Roulette.

• B3 Category Games. These games have a maximum stake of £2 and a maximum prize of £500. The game cycle must last at least 2.5 seconds. The typical B3 style […]

The analysis completed as part of this research did not explicitly look at the difference between B2 and B3 playing characteristics. However, the inputs into the predictive models included metrics about the proportion of bets on the different content categories.
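As an illustration of the kind of input mentioned above, the proportion of a player's bets falling in each content category could be derived along the following lines. This is a minimal sketch with hypothetical player identifiers and bet records; it does not reproduce the actual feature definitions used in the models.

```python
from collections import Counter, defaultdict

# Hypothetical bet records: (player_id, game_category). Not real data.
bets = [
    ("p1", "B2"), ("p1", "B2"), ("p1", "B3"), ("p1", "B2"),
    ("p2", "B3"), ("p2", "B3"),
]


def category_proportions(records):
    """Return, for each player, the share of their bets in each game category."""
    counts = defaultdict(Counter)
    for player_id, category in records:
        counts[player_id][category] += 1
    return {
        player: {cat: n / sum(c.values()) for cat, n in c.items()}
        for player, c in counts.items()
    }


proportions = category_proportions(bets)
print(proportions["p1"])  # {'B2': 0.75, 'B3': 0.25}
```

Expressing the split as proportions rather than raw counts keeps the metric comparable between players who place very different numbers of bets.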

One of the key contributors to the successful completion of this research programme was access to player card data. There is no regulatory requirement for Gaming Machine operators to monitor which players are using their gaming machine products. Therefore, at the time when this research was commissioned, the player cards had been implemented as loyalty card schemes to facilitate player insight and marketing. This has meant that the operators have independently implemented their own schemes with different degrees of data capture and data quality. However, there are some commonalities when the same Gaming Machine supplier has been used between

been run by Ladbrokes since 2008. Both Paddy Power and William Hill introduced their loyalty schemes in 2013, and Gala Coral introduced their Coral Connect scheme early in 2014. An example of the player cards provided by the industry is shown below in Figure 3. Some of the loyalty card programmes bridge the gap between retail and remote gambling, enabling the players to transfer funds between the channels. For the time period studied in this research, loyalty card data was only available from Ladbrokes, Paddy Power and William Hill.

2 http://www.gamblingcommission.gov.uk/shared_content_areas/gaming_machines_technical_stan.aspx


Figure 3 - Example Loyalty Cards from Ladbrokes, William Hill, Gala Coral and Paddy Power.


Methodology

In this section, a brief overview is provided of the methodology and techniques used to analyse the data to answer the research question. The aim of this section is to provide enough background knowledge to the reader to aid in the interpretation of the results presented later in the report. We also explore some of the complexities of the data modelling approach in the context of distinguishing between harmful and non-harmful gaming machine play. The approach used to examine the performance of the predictive models is described, along with how different trade-offs can be made when applying predictive models in an operational environment.

The data analysed in this process presented some unique challenges that needed to be considered when designing the methodology. The key challenges were:

• Data volume: Just under 10 billion data records were provided for this analysis. Analysing this volume of data required consideration of how to store, access, and process it efficiently and accurately.

• Data skewness: When investigating the data there is often a significant difference between the mean and median values. This shows that there is a small number of extreme values which can alter our perception of what the majority of customers are doing.

• Representativeness of Registered Players: When comparing sessions generated by registered and non-registered players, we observed that registered sessions are not fully representative of the entire data set: they are often longer and involve more money and bets. Detail of this analysis is provided in Appendix C.
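The skew described above can be illustrated with a minimal sketch. The session losses below are invented purely for illustration and are not taken from the industry data; the point is only that a single extreme value moves the mean a long way while leaving the median close to the typical session.

```python
from statistics import mean, median

# Hypothetical session losses in pounds (illustrative values only):
# nine small sessions and one extreme session.
session_losses = [2, 3, 4, 5, 5, 6, 8, 10, 12, 945]

mean_loss = mean(session_losses)      # pulled upwards by the single extreme value
median_loss = median(session_losses)  # stays close to the typical session

print(f"mean = £{mean_loss:.2f}, median = £{median_loss:.2f}")
```

Reporting the median alongside an upper percentile, rather than the mean alone, is one way to describe both typical and extreme play in data of this shape.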

Predictive Models

In its simplest form, a predictive model takes a range of characteristics as inputs and looks at how well a prediction can be made from them. A predictive model can be as

be male. This predictive model is making an assumption that all males have hair shorter than 5cm. In the real world, inaccurate predictions would be produced by only applying this rule. Increasing the number of input characteristics and the complexity of interpreting the relationships within the data allow accuracy to be improved.

Terminology

To measure the quality of a predictive model, the target that we are trying to predict needs to be defined. For this research, we have defined our target as predicting problem gamblers. In predictive modelling terminology, a positive means that we have identified someone as a problem gambler and a negative means that we have identified someone as a non-problem gambler. When these predictions are compared with the actual labels, we have four metrics to quantify the quality of the output:

• True Positive: The correct identification of a problem gambler.

• True Negative: The correct identification of a non-problem gambler.

• False Positive: The incorrect identification of a non-problem gambler as a problem gambler.

• False Negative: The incorrect identification of a problem gambler as a non-problem gambler.

In this report, these results are presented as a rate. This enables us to understand the proportion of problem gamblers/non-problem gamblers that would be identified (either correctly or incorrectly). The rates are defined as:

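True Positive Rate (TPR): $\mathrm{TPR} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}$

True Negative Rate (TNR): $\mathrm{TNR} = \frac{\mathrm{TN}}{\mathrm{TN} + \mathrm{FP}}$

False Positive Rate (FPR): $\mathrm{FPR} = \frac{\mathrm{FP}}{\mathrm{FP} + \mathrm{TN}}$

False Negative Rate (FNR): $\mathrm{FNR} = \frac{\mathrm{FN}}{\mathrm{FN} + \mathrm{TP}}$

Here TP, TN, FP and FN are the numbers of true positives, true negatives, false positives and false negatives respectively.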

The objective of the predictive model is to maximise the true positive and true negative rates while minimising the false positive and false negative rates. It is useful to note that there is a relationship between the rates, such that:

• True Positive Rate = 100% − False Negative Rate

• True Negative Rate = 100% − False Positive Rate

As these variables are related, if we achieve a high true positive rate we also have a low false negative rate.
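The four rates and the relationships between them can be checked with a short sketch. The function name and the ten player labels below are hypothetical and chosen only for illustration; the arithmetic follows the standard definitions of the rates.

```python
def classification_rates(actual, predicted):
    """Return TPR, TNR, FPR and FNR for two equal-length lists of booleans.

    True means "problem gambler"; False means "non-problem gambler".
    Assumes at least one actual positive and one actual negative.
    """
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)          # correctly flagged
    tn = sum(1 for a, p in pairs if not a and not p)  # correctly not flagged
    fp = sum(1 for a, p in pairs if not a and p)      # wrongly flagged
    fn = sum(1 for a, p in pairs if a and not p)      # missed
    return {
        "TPR": tp / (tp + fn),
        "TNR": tn / (tn + fp),
        "FPR": fp / (fp + tn),
        "FNR": fn / (fn + tp),
    }

# Hypothetical labels for ten players: four problem gamblers, six non-problem gamblers.
actual = [True, True, True, True, False, False, False, False, False, False]
predicted = [True, True, True, False, True, False, False, False, False, False]

rates = classification_rates(actual, predicted)
print(rates)
```

In this made-up example three of the four problem gamblers are identified, so the true positive rate is 75% and the false negative rate is the remaining 25%.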

In Report 2 of this research programme, and also in other problem gambling literature, the terms sensitivity and specificity are used. These terms have the following definitions using the terminology introduced above:

• Sensitivity = True Positive Rate

• Specificity = True Negative Rate


Model Development Approach

The process of turning raw data into a predictive model takes a number of defined steps. This process is described below:

1. Data Validation and Preparation: In this step, the received data was validated to ensure it was consistent with expectations (e.g. the format, number of records, etc.) and converted into a common format so that the data from each supplier could be analysed as a whole, rather than independently. In this step the proxy session algorithm was also used to assign individual machine events to a player session.

2. Variable Calculation: In this step, a number of session and player level variables were calculated. The majority of the variables calculated were based on the theoretical markers of harm identified in Report 1. Analysis of each of the variables is provided in Appendix B of this report.

3. Data Pre-processing: In this step the variables were transformed using a number of approaches to help the predictive models distinguish between different types of behaviour. More detail on these transformation approaches is provided below.

4. Dataset Selection: When verifying or testing a predictive model it is important that the data used to build the model is not included. The entire data set is therefore divided into three separate datasets for training, verification, and testing. Depending on the analysis, players or sessions were randomly allocated to the different datasets. In this research project, the training dataset contained 50% of the data, and the remaining data was split equally between the verification and test datasets, which received 25% each.

5. Model Training: In this step, the predictive modelling algorithms analyse the available data and determine which player patterns are most likely to relate to the target variable: in this case, our problem gambling label. In this process a number of models were generated, relating to different algorithms, parameters for those algorithms, input data, and transformations to the data. A range of predictive modelling algorithms used by RTI is described in Appendix D.

6. Model Validation: Within the model validation phase, we examined all of the predictive models which had been trained to see which had the best performance on the validation data set.

7. Model Testing: To confirm the accuracy of the model and to help understand its capabilities, it was tested on the final test dataset.

8. Cross Validation: Finally, to ensure that the original random allocation of data

repeated the training and testing phases using the model specification that delivered the best accuracy. The validation process used was a 10-fold cross validation. Cross validation is the process whereby the data is

of positive and negative examples. K-1 buckets are then used to train the model and then the accuracy of the model is tested on the remaining bucket.

testing.
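The dataset selection and cross validation steps above can be sketched as follows. This is an illustrative sketch only, not the code used in the research: the function names, the random seed and the way buckets are balanced are all assumptions, and the balancing of positive and negative examples across buckets is inferred from the partial description above.

```python
import random

def split_players(player_ids, seed=0):
    """Randomly allocate players: 50% training, 25% verification, 25% test."""
    ids = list(player_ids)
    random.Random(seed).shuffle(ids)
    n_train = len(ids) // 2
    n_verify = len(ids) // 4
    return (ids[:n_train],
            ids[n_train:n_train + n_verify],
            ids[n_train + n_verify:])

def stratified_folds(labels, k=10, seed=0):
    """Deal example indices into k buckets, handling positives and negatives
    separately so that each bucket receives a similar mix of both."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in (True, False):
        indices = [i for i, label in enumerate(labels) if label == cls]
        rng.shuffle(indices)
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds

def cross_validation_rounds(folds):
    """Yield (training indices, held-out indices): each round trains on
    K-1 buckets and tests on the one remaining bucket."""
    for held_out in range(len(folds)):
        training = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield training, folds[held_out]

# Hypothetical example: 1,000 players, of whom the first 250 are labelled positive.
train_ids, verify_ids, test_ids = split_players(range(1000))
labels = [i < 250 for i in range(1000)]
folds = stratified_folds(labels, k=10)
rounds = list(cross_validation_rounds(folds))
```

Because every example is held out exactly once across the ten rounds, the averaged result does not depend on one particular random allocation of data.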

The process described above has been designed to avoid one of the key issues of predictive modelling, over-fitting, which is a particular risk when there is a large number of input variables, such as were present in this research project.

The concept of over-fitting when building a predictive model relates to the performance of the model when it is applied to new data that was not used in the training process.


Ideally, when a model is built it will learn patterns which define a generalised relationship between the inputs (gambling data) and the target output (problem gambling). When generalised relationships are identified the model should produce consistent performance across the test data and new data as it is processed. In an over-fitted model, the patterns learnt exist only within the training data. The model is then expecting to see these same patterns in new data and, when they are not present, performance will be impacted.

Target Variable

In this report we examine two different outcomes:

• Whether someone is a problem gambler

• Whether a session of play comes from a problem gambler.

To define a problem gambler, we are utilising the Problem Gambling Severity Index (PGSI) obtained from the survey results in the loyalty card data. Participants in the survey who scored 8 or above on this index are labelled as problem gamblers. A description of the questions asked in the survey and how the score is derived is provided in the following section.

Determining whether someone is a problem gambler is important for understanding the utility of using loyalty cards to collect data across multiple gambling sessions. To determine whether someone is a problem gambler we have analysed all of the data associated with that player and generated a single prediction. To measure the accuracy of this prediction, we compare the predicted label to the actual label for that player.

Determining whether a session comes from a problem gambler is important for understanding how well problem gambling can be identified when a loyalty card is not used. A majority of the data currently being generated by the gaming machines is not linked to a loyalty card, highlighting the need for this analysis. To determine whether a session of play comes from a problem gambler we analyse the activity associated with that session and generate a single prediction. To measure the accuracy of this prediction we compare the predicted label to the problem gambling label for the player who generated the session. The limitation with this approach is that we are making an assumption that every time a problem gambler plays on the machines they are exhibiting problem gambling behaviours.

Data Pre-processing

Before producing the predictive models, a number of pre-processing tasks were tested to see if they could improve the performance of the model. The pre-processing tasks involved transforming the input values so that they could be compared in different ways. This is often required, as it is in our case, when there is a non-linear relationship between the input data and what is being predicted. As an example, a £2 increase in stake might have a different predictive ability if the increase is from £2 to £4, compared to an increase from £20 to £22. This effect is further confounded in our case where the data is highly skewed with a small number of significant outliers.


A range of transformations have been tested in the model building process. Each of these is listed below with a brief explanation:

• Unmodified: No modification is made to the underlying variable and it is fed directly into the model.

• Absolute Value: When a variable takes on negative and positive values, the absolute value can be taken so that the range runs from zero to a positive value. This can also change the meaning of the value. For instance, if a player has three sessions with net expenditure of -£200, -£5 and £250, taking the absolute value transforms this data to £200, £5 and £250. For the original data set we could interpret this as two losing sessions and one winning session. With the transformation the interpretation changes to one session with minimal change in financial outcome and two sessions with a large change in financial outcome.

• Winsorized: This is a process where extreme values or outliers are replaced with less extreme values (typically a chosen percentile of the data) to prevent those values from dominating the patterns learnt by the model.

• Log: For variables where there is a large number of samples that take on small values, and there is a small number of samples which take on high values, taking the log of these values can provide a more informative scale for a predictive algorithm. As an example, if we had the raw values 10, 100 and 1000, taking the base-10 log would give 1, 2 and 3 as inputs into the predictive algorithm.

• Zero Indicator: For variables that have a dominating proportion of zeroes (e.g. number of games at the highest stake, amount won in a session, etc.), the variable is converted into a categorical variable that has a category for zeroes, and the rest of the values are aggregated based on the quartiles of the remaining data.

• Grouped: This is the process where a variable which can take on many values (such as the amount won on a game) is reduced down into a smaller number of groups (such as a small, medium and large win). In statistics, this
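Several of the transformations listed above can be sketched in a few lines. This is an illustrative sketch under stated assumptions rather than the pre-processing code used in the research: the function names are hypothetical, the winsorizing limits are passed in directly rather than derived from percentiles, and the quartiles use the default method of Python's `statistics.quantiles`.

```python
import math
from statistics import quantiles

def absolute(values):
    """Absolute Value: discard the sign, keeping only the size of the change."""
    return [abs(v) for v in values]

def winsorize(values, lower, upper):
    """Winsorized: clip extreme values to the given limits (assumed here to be
    pre-computed, e.g. chosen percentiles of the data)."""
    return [min(max(v, lower), upper) for v in values]

def log10_transform(values):
    """Log: base-10 logarithm; assumes every value is strictly positive."""
    return [math.log10(v) for v in values]

def zero_indicator(values):
    """Zero Indicator: category 0 for zeroes, categories 1 to 4 for the
    quartile of each non-zero value among the non-zero values."""
    non_zero = [v for v in values if v != 0]
    q1, q2, q3 = quantiles(non_zero, n=4)

    def category(v):
        if v == 0:
            return 0
        return 1 + (v > q1) + (v > q2) + (v > q3)

    return [category(v) for v in values]

net_expenditure = absolute([-200, -5, 250])       # the three sessions from the text
clipped = winsorize([1, 2, 3, 1000], lower=1, upper=10)
logged = log10_transform([10, 100, 1000])
categories = zero_indicator([0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8])
```

Each function returns a new list, so the same raw variable can be fed to a model in several transformed forms and the versions compared.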

Model Complexity vs Model Information

When building a predictive model, there is often a trade-off to be made between the underlying complexity of the patterns and the degree of explanation that can be extracted from the model. In this research, we have used a hierarchy of models based on different predictive algorithms that will lead us to a more accurate separation between problem and non-problem gamblers.

Measuring Model Performance

To understand the performance of the predictive models we have used the Receiver Operating Characteristic (ROC) curve. The ROC curve was first used in World War II for the analysis of radar signals and today is commonly used to evaluate the performance of machine learning techniques. Figure 4 provides an example ROC curve with the performance of three models included. The false positive rate is plotted on the horizontal axis and the true positive rate is plotted on the vertical axis. This enables us to compare the performance of correctly identifying problem gamblers (the true positive rate) against the impact of incorrectly identifying non-problem gamblers as problem gamblers (the false positive rate). The points on each of the curves correspond to the performance of the model at different thresholds. A threshold is used to make the discrimination between a problem and non-problem gambler (e.g. players scoring higher than the threshold are labelled as problem gamblers). The higher thresholds are on the left of the curve, and decrease as you follow each of the curves to the right. In an ideal world, we would produce a model as close as possible to point A in the figure. This is a model operating position which has a true positive rate of 100% and a false positive rate of 0%, perfectly distinguishing between problem and non-problem gamblers. Generating models that perform near this point is very rare. At points B and C, we are operating at the two extremes of either predicting everyone to be a non-problem gambler (point B) or predicting everyone to be a problem gambler (point C).

Model 3 has been included in this diagram to show what would be achieved by a model whose output was a random guess. If a model has a similar performance to Model 3, it shows that the input variables could not be used by the predictive modelling algorithm to make an informed prediction. Ideally, we want our models to deliver a performance as far away from this as possible. Models 1 and 2 are two models that are able to make informed predictions. In this case Model 1 is outperforming Model 2. This can be seen in the figure as the curve corresponding to Model 1 is higher than the curve for Model 2.

By having Models 1 and 2 on the same ROC curve it is possible to see the impact of the improved accuracy between the models. If previously we had been using Model 2 to identify problem gamblers, we could have decided to operate at point D. By looking at the vertical and horizontal axes we can see that this point generated a true positive rate of 60% and a false positive rate of 30%. That is, 60% of the problem gamblers were correctly identified and 30% of our non-problem gamblers were incorrectly identified as problem gamblers.

If we now want to move to our more accurate Model 1, we have two choices. Firstly, we could decide to move to point E. This would enable us to identify the same proportion of problem gamblers (60%) but instead we would reduce the incorrect classification of non-problem gamblers from 30% to 10%. Alternatively, we could decide to move to point F. This would enable us to keep the same false positive rate (30%), but we would be correctly identifying a higher proportion of problem gamblers (improving from 60% to 90%).

Alternatively, it would be possible to trigger one intervention for customers that have a score at or above point E (where we are more confidently identifying problem gamblers) and then a second, potentially softer intervention for customers who fall between the boundaries of points E and F.

Finally, to compare the different models it is useful to have a single figure which describes their performance. In this report we have used the Area Under the Curve (AUC) metric. This value ranges from 1 (a model which produces no errors) to 0.5, the performance of Model 3. In our example, Model 1 has an AUC value of 0.85 and Model 2 has an AUC value of 0.70.
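The construction of a ROC curve and its AUC can be sketched directly from a list of model scores. The function names, scores and labels below are hypothetical, and the trapezoidal rule is one common way to compute the area; the report does not state which implementation was used.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) points, sweeping the
    threshold from high to low. Players scoring at or above the threshold
    are flagged as problem gamblers."""
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]  # threshold above every score: nobody is flagged
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
        points.append((fp / n_neg, tp / n_pos))
    return points  # the last point is (1.0, 1.0): everybody is flagged

def area_under_curve(points):
    """Area under the ROC points using the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical model scores for eight players (True = problem gambler).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [True, True, False, True, False, True, False, False]

points = roc_points(scores, labels)
auc = area_under_curve(points)
print(f"AUC = {auc:.4f}")
```

The first and last points returned correspond to the two extremes described above (nobody flagged and everybody flagged), and a model that gives every player the same score produces only those two points, giving an AUC of 0.5.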


Figure 4 - Example Receiver Operating Characteristic Curve. [Plot of True Positive Rate (vertical axis, 0% to 100%) against False Positive Rate (horizontal axis, 0% to 100%) for Model 1, Model 2 and Model 3 (Random), with operating points A to F marked.]


Data

In this section, a description of the data used to complete this research is provided. This covers both the data obtained from the industry along with the loyalty card survey

One key conclusion about the data used as part of this project is that although it is significant in volume (over 9.5 billion individual gaming machine events³), the breadth of the variables in each of these events is limited. For each event we only know the time when the event took place, the location where it took place, the type of event (cash in, cash out, bet, win), the game being played, and the value of the transaction (e.g. the amount staked or the amount won).

Industry Data

The data used to generate this report was supplied by the five major Licensed Betting Offices (LBOs) in the UK (Betfred, Coral, Ladbrokes, Paddy Power and William Hill), and their gaming machine suppliers (Inspired Gaming and Scientific Games). The relationship between the LBOs and the gaming machine suppliers is shown in the table below:

Gaming Machine Supplier    Licensed Betting Offices
Inspired Gaming            1. Betfred  2. Paddy Power  3. William Hill
Scientific Games           1. Coral  2. Ladbrokes

Data Request Timeframe

The time period covered by the data used in the research is 10 months from 1 September 2013 to 30 June 2014. For the initial evaluation on the suitability of the

available (1 September 2013 to 30 November 2013).

It is important to note that the initial sample of loyalty card holders to survey was drawn from the first 3 months of data used in the evaluation phase.

Data Attributes

The primary set of data requested from the industry (where available) was:

1. Players: The attributes recorded for each registered player.

2. Shops: The attributes for each store which contains a Gaming Machine,

3. Machine Events: The transactional data captured on each Gaming Machine, for example: records relating to players putting money in and taking money out of the machine, placing bets, and associated winnings.

3 A machine event refers to an action which is recorded on the machine, such as a note being inserted, a bet placed or money being won.


4. Games: The games available on each Gaming Machine, in particular the legal category of the game, the type of game, and the theoretical RTP (Return to Player) at different stake levels.

The secondary data request comprised:

1. Payment data: Transactions relating to debit card transactions (both failed and successful) which are used to fund activity on the machines.

2. Self-Exclusion: Registered players who have self-excluded.

3. Player Limits: Registered players who have specified limits on their play.

4. Responsible Gambling: Any information relating to players receiving/viewing literature related to problem gambling.

5. Online Transfer: Transfers of funds between the gaming machines and a player's online account.

6. Sports Trading Data: The number of bets, turnover and winnings per day per shop.

7. Customer Contact: Data associated with contact with registered players, e.g. marketing material, complaints, etc.

8. Promotions: Data relating to player bonuses, free bets, etc.

9. Player Surveys: Information from surveys of players.

10. Market reviews: Reports and literature relating to the impact of promotions.

Data Received

Featurespace received varying levels of granularity in the data from each of the operators based on the underlying sophistication of their Gaming Machine offering. Rather than detail the specifics of the data received from individual operators, general parameters are listed below. Overall, we received data relating to:

• 333,091 uniquely identifiable customers⁴

• 8,289 unique shops

• 32,650 unique Gaming Machines

• 9,550,448,367 analysed machine events, including 6,768,053,704 bets

• 661 different games⁵

Specifically for the surveyed customers we had:

• 3,988 loyalty card holders

• 4,374 unique shops that these players gambled at

• 524,277 gaming machine sessions

• 35,668,298 bets placed

4 This does not mean that the study corresponds to 333,091 individuals. An individual person may have relationships with more than one operator or may have used different loyalty cards during the three month period.

5 This is the number of unique games across the suppliers, but they may have very similar games across both (e.g. Roulette will be included twice, once for its Inspired Gaming implementation and once for its Scientific Games implementation).


Some important notes about the received data:

• During the period for which we obtained data, Betfred did not operate a loyalty scheme. Therefore no Betfred customers were included in the loyalty card survey.

• During the initial 3 months for which data was requested, Coral did not operate a loyalty scheme but had an internal system for recording repeat activity of customers. These labels have been included when calculating the between session metrics for the evaluation of the industry data. No Coral customers were included in the loyalty card survey.

• Paddy Power and William Hill introduced their loyalty card schemes in 2014 and therefore some of the behaviours associated with the early period of received data may not be indicative of long term customer usage. The loyalty scheme used by both Paddy Power and William Hill collects minimal information about the player (e.g. only their mobile phone and/ on registration, so are not guaranteed to be accurate.

After receiving the data, Featurespace transformed the data from the gaming machine suppliers and the operators into a common format so that data could be used to calculate metrics across the entire data set.

Data Quality

In general, no significant data quality issues have been identified that would invalidate the results produced as part of this research. Minor issues were experienced, and a brief description is provided below. In most of the cases, we have been able to either work around the issue or have excluded the problematic data from the analysis. In future research, resolving these data issues may enable the performance of the predictive models to be improved.

• Debit Cards: It was known at the start of the project that there was not a precise method for matching transactions on the gaming machines to debit card transactions recorded by the electronic point of sale systems. A precise match is not possible as these two systems work independently. After a player has made a deposit, either the full or partial amount is manually transferred onto the machine by a member of staff. To identify debit card transactions used to fund machine play, we searched for machine transactions that occurred in the same shop, at approximately the same time and for approximately the same value. This requirement to search introduces a degree of error within the data.

• Online Transfers: We needed to further clarify with the operators how transfers of funds between the gaming machines and online accounts can be accurately matched. Within the time-frame of this research, this was not explored.

• Operator and Supplier Data Matching: We have experienced issues with matching identifiers for players and shops in the data supplied by the machine suppliers and the operators. This meant that for a small subset of the data a full set of attributes could not be obtained.

• Timestamps: We were made aware by one of the suppliers that they experience errors with the timestamps associated with the recorded machine

of the research presented in this report.


It is also worth noting that some attributes are likely to be over-represented by loyalty card holders. From the contextual research completed as part of the overall research programme, we know that approximately one in three individuals has multiple loyalty cards. However, and more importantly, players do not always use their loyalty card when they play on the machines, so we are only capturing a portion of their gaming machine activity.

Survey Data

To complete the data set required to undertake the research, the Loyalty Card Survey data generated by NatCen was merged with the industry data. Full details of the survey process and a complete analysis of the survey data are provided in Report 2.

The most important component of the loyalty card survey data that is used in this report is the Problem Gambling Severity Index (PGSI). The PGSI is generated by asking the loyalty card holder the following nine questions about their gambling activity over the last 12 months:

1. How often have you bet more than you could really afford to lose?

2. How often have you needed to gamble with larger amounts of money to get the same feeling of excitement?

3. How often have you gone back another day to try to win back the money you lost?

4. How often have you borrowed money or sold anything to get money to gamble?

5. How often have you felt that you might have a problem with gambling?

6. How often have people criticized your betting or told you that you had a gambling problem, regardless of whether or not you thought it was true?

7. How often have you felt guilty about the way you gamble or what happens when you gamble?

8. How often has your gambling caused you any health problems, including stress or anxiety?

9. How often has your gambling caused any financial problems for you or your household?

For each question the participant can select from the answers in the table below. The table also shows the score associated with each response.

Response            Score
Almost always       3
Most of the time    2
Sometimes           1
Never               0

Table 1 - PGSI Question Responses and Scores

To generate the PGSI score, the individual scores from the responses to the nine questions are summed together. PGSI scores therefore range from 0 to 27. The PGSI specification provides thresholds with which to label different severities of gambling related risk, as listed in Table 2. This table also shows the number of survey participants that fall into each category.


Gambler Types          PGSI Score     Survey Participants
Problem Gambler        8 or above     951
Moderate Risk          Between 3-7    1025
Low Risk               Between 1-2    923
Non-Problem Gambler    0              1089

Table 2 - Gambler Types as defined by the PGSI Score and the number of survey participants that fall into each category.
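The scoring in Table 1 and the thresholds in Table 2 can be expressed as a short sketch. The function and variable names are hypothetical; the response scores and category boundaries are taken from the two tables.

```python
# Score for each response, as listed in Table 1.
RESPONSE_SCORES = {
    "Never": 0,
    "Sometimes": 1,
    "Most of the time": 2,
    "Almost always": 3,
}

def pgsi_score(responses):
    """Sum the scores of the nine PGSI responses (possible range 0 to 27)."""
    if len(responses) != 9:
        raise ValueError("The PGSI requires exactly nine responses")
    return sum(RESPONSE_SCORES[response] for response in responses)

def gambler_type(score):
    """Map a PGSI score to the gambler type thresholds listed in Table 2."""
    if score >= 8:
        return "Problem Gambler"
    if score >= 3:
        return "Moderate Risk"
    if score >= 1:
        return "Low Risk"
    return "Non-Problem Gambler"

# Hypothetical participant: "Sometimes" to five questions, "Never" to the rest.
responses = ["Sometimes"] * 5 + ["Never"] * 4
score = pgsi_score(responses)
label = gambler_type(score)
print(score, label)
```

The hypothetical participant scores 5 and falls in the Moderate Risk group; a participant answering "Almost always" to every question would reach the maximum of 27.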

Proxy Sessions

The gaming machines operated by the LBOs in Great Britain do not require a player to insert their loyalty card before they begin playing. It is possible for the player to insert or withdraw their card at any point of their session. Therefore, if a session is only defined as the period when the card was actually inserted, some player activity may be excluded from our analysis. To overcome this problem, an algorithm was developed to predict when sessions started and ended on the machine. A session generated by this process is referred to as a proxy session. A loyalty card player is then associated with each proxy session where their card was inserted for at least one event within the calculated proxy session.

Using a proxy session to identify player sessions has limitations but on the whole we

research. While full details of how the proxy sessions are determined are provided in Appendix A, the key variables used in this process are the balance of the machine, the time since the last event on the machine and the type of event taking place. For example, a cash-in event is more likely to indicate the start of a session than is a stake event.
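To make the idea concrete, the sketch below splits one machine's events into sessions using the three variables named above. It is emphatically not the algorithm described in Appendix A: the rules (a new session starts after a long idle gap, or when cash is inserted into a machine whose balance was zero), the 300-second gap, the field names and the event data are all assumptions made for illustration.

```python
def split_into_proxy_sessions(events, max_gap_seconds=300):
    """Split one machine's time-ordered events into proxy sessions.

    Each event is a dict with 'time' (seconds), 'type', 'balance' (machine
    balance after the event) and an optional 'card' (loyalty card id).
    Hypothetical rules: start a new session after a long idle gap, or when
    cash is inserted while the previous balance was zero.
    """
    sessions = []
    previous = None
    for event in events:
        idle_too_long = (previous is not None
                         and event["time"] - previous["time"] > max_gap_seconds)
        fresh_cash_in = (previous is not None
                         and event["type"] == "cash_in"
                         and previous["balance"] == 0)
        if previous is None or idle_too_long or fresh_cash_in:
            sessions.append([])
        sessions[-1].append(event)
        previous = event
    return sessions

def cards_in_session(session):
    """Loyalty cards present for at least one event within the session."""
    return {event["card"] for event in session if event.get("card")}

# Hypothetical events on a single machine.
events = [
    {"time": 0, "type": "cash_in", "balance": 20},
    {"time": 30, "type": "bet", "balance": 18, "card": "A123"},
    {"time": 60, "type": "bet", "balance": 0},
    {"time": 90, "type": "cash_in", "balance": 10},
    {"time": 120, "type": "bet", "balance": 5},
    {"time": 1000, "type": "bet", "balance": 3},
]
sessions = split_into_proxy_sessions(events)
session_lengths = [len(session) for session in sessions]
```

In this made-up example the card is present for only one event of the first session, yet all three events of that session are attributed to the card holder, which is the behaviour the proxy session approach is intended to provide.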


Measurement of Harm Markers

The first step in this research programme generated a list of the many theoretical markers of harm or patterns of play that might indicate that someone had problems with their gambling. The evidence review suggested that these were all plausible but it was not known what these patterns look like in practice. The second step of the research programme was to complete a preliminary analysis of industry data to see if it was possible to calculate these metrics from industry data and if these metrics showed sufficient statistical variance to suggest that by applying predictive modelling we would be able to distinguish between harmful and non-harmful play.

The preliminary data analysis contained 2.6 billion data records of gaming data from the period 1 September 2013 to 30 November 2013. The markers in this report are broken down into two categories: 1) between-session markers, which capture longer-term behaviour across multiple gaming sessions; and 2) within-session markers, which capture behaviour within a single gaming session.

From this preliminary analysis we did find that it was possible to calculate a large proportion of the metrics and that they demonstrated sufficient statistical variance to give us confidence that predictive modelling would be successful.

A summary of the characteristics found for each of the markers is provided below. A full description of this analysis is provided in Appendix B. Within the scope of the preliminary analysis we also examined the representativeness of registered sessions (where a loyalty card is present) compared to unregistered sessions (where a loyalty card is not present). We found that registered sessions are not fully representative of all sessions, in that they are often longer and involve more money and bets. A full description of this analysis is provided in Appendix C.

Between-Session Markers

These metrics included frequency and duration of play, net expenditure, levels of play engagement, number of activities/games types undertaken, and chasing. Based on these metrics, we can construct a view of the behaviour of a typical player compared to that of a 90th percentile player (those who experience the most losses). This provides a snap-shot view of both average and extreme play. Use of player loyalty cards, some of which are employed only once, pulls these figures downwards, so the values may seem low.
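The "typical" and "90th percentile" views described above can be computed with standard order statistics. The sketch below is a minimal illustration using Python's `statistics` module; the session counts are invented and the choice of the `inclusive` quantile method is an assumption, since the report does not say how its percentiles were interpolated.

```python
import statistics


def typical_and_extreme(values):
    """Return (median, 90th percentile) for a list of per-player values."""
    median = statistics.median(values)
    # quantiles(n=10) returns the nine decile cut points; the last one
    # is the 90th percentile.
    p90 = statistics.quantiles(values, n=10, method="inclusive")[-1]
    return median, p90


# Invented session counts for eleven players over a three-month period.
session_counts = [1, 1, 2, 3, 4, 5, 6, 8, 12, 40, 95]
median, p90 = typical_and_extreme(session_counts)
print(median, p90)
```

Because many cards are used only once, a real sample would contain a large block of very small values, which is what drags the median down relative to the 90th percentile.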


| Marker | Results for a typical player in a 3-month period |
| --- | --- |
| Frequency of Play | 5 sessions |
| Duration of Play | Average session length 0:12:53 |
| Net Expenditure | Loss of £24.33 |
| Number of Activities/Games Types Undertaken | Usually 1 game per session, with 70% of bets placed on a favourite game and 87% of bets placed on a B2 game. |
| Chasing | Average of 3 losing sessions, but correlation between a winning/losing session and behaviours in subsequent sessions not yet established. |

Table 3 - Values for the Between Session Markers at the median.

| Marker | Results for variables at the 90th percentile in a 3-month period |
| --- | --- |
| Frequency of Play | 40 sessions |
| Duration of Play | Average session length 1:08:06 |
| Net Expenditure | Loss of £776.09 |
| Number of Activities/Games Types Undertaken | Usually 3 unique games per session and 17 unique games over the entire period |
| Chasing | Average of 26 losing sessions, but correlation between a winning/losing session and behaviours in subsequent sessions not yet established |

Table 4 - Values for the Between Session Markers at the 90th percentile.

Examination of these markers has revealed:

• A majority of the players exhibit minimal values (such as low values of stakes, session length and games played). For example, a typical player plays 5 times in a three-month period, while only 1 in 10 will play 40 times. The median player loss is £24.33 over a three-month period, whereas 1 in 10 will lose £776.09.

• For all metrics, there are only a small number of cases where the metrics take extreme values. For some of these cases it is difficult to determine exact usage: the card may be used by one person, or shared, or mistakenly left in the machine.

• For most of the variables we generally see an exponential distribution as the values increase. This means that small values are very common and large values a long way from the average are very rare. As an illustration, values around 15 might be five times more common than values around 25, and values around 25 five times more common than those around 35.
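The "five times more common every ten units" illustration corresponds to an exponential decay of the form exp(-rate * x). The short sketch below works out the implied decay rate and confirms that two such steps compound to a factor of 25. The function names are hypothetical and the numbers are those of the illustration only, not fitted values from the data.

```python
import math


def decay_rate(ratio, step):
    """Rate of exp(-rate * x) when frequency falls by `ratio` every `step` units."""
    return math.log(ratio) / step


def relative_frequency(x, reference, rate):
    """Frequency of values near x relative to values near `reference`."""
    return math.exp(-rate * (x - reference))


rate = decay_rate(ratio=5, step=10)
at_25 = relative_frequency(25, 15, rate)   # one step away from 15
at_35 = relative_frequency(35, 15, rate)   # two steps away from 15
print(round(rate, 3), round(at_25, 3), round(at_35, 3))
```

Under this reading, values around 35 are 25 times rarer than values around 15, which is why the extreme tail contains so few players.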


Within-Session Markers

These metrics included debit card reloading and switching, variability in staking behaviour, stake size, game volatility, and the way a given game is played. Again, we can construct views of a typical and a 90th percentile player based on these metrics.

| Marker | Typical results for individual sessions over a 3-month period |
| --- | --- |
| Debit Card Reloading and Switching | Approximately 2% of sessions involve a debit card. |
| Variability in Staking Behaviour | The median total value from 8 bets was £29 |
| Stake Size | Average stake £3.53, with a median minimum of £1.80 and median maximum of £5.40. 16 bets typically staked at lower amount. |
| Game Volatility | Low volatility games preferred (67% of sessions) |
| Way Game is Played | 0:03:52 session length; player cashes in £12.30 and loses £3.50 and is likely to play a single game only. |

Table 5 - Values for the Within Session Markers at the median.

| Marker | Typical results for individual sessions at the 90th percentile over a 3-month period |
| --- | --- |
| Debit Card Reloading and Switching | Approximately 2% of sessions involve a debit card. |
| Variability in Staking Behaviour | The median total value from 86 bets was £400 |
| Stake Size | Average stake £21.18, with a minimum of £10.00 and maximum of £37.60. |
| Way Game is Played | 0:23:36 session length; player cashes in £100.00 and loses £60.00 and is likely to play a single game only. |

Table 6 - Values for the Within Session Markers at the 90th percentile.

For the Within Session Markers, player behaviour for both typical and 90th percentile players has been constructed for those markers where results were conclusive. In summary:

• Again, the majority of players exhibit minimum values for most variables. A typical player places 8 bets over a session totalling £29 in stakes, while only 1 in 10 players will place 86 bets totalling £400 over a session. Examination of stake size strengthens this observation: a typical stake for the majority of players is £3.53, or at most £5.40, while 1 in 10 players will stake £21.18, or a maximum of £37.60.

• By examining over 80 variables and the range of values observed for each, Featurespace was able to construct a very detailed picture of the variety of activity possible within the machines. This means that, for a large majority of the variables measured for each of the markers, there is sufficient variation which enables the behaviour of sessions to be differentiated and characterised.

In summary, we successfully demonstrated in this preliminary analysis that:

• It is possible to use industry data to measure markers of theoretical harm.

• The distribution of values derived from these markers shows potential for being able to differentiate between harmful and non-harmful gaming machine play.


Data Analysis Results

In this section results are presented in four main parts:

1. Baseline: We establish baseline models against which our new predictive models can be compared, measuring how well they perform in predicting problem gamblers or problem sessions.

2. Player Analysis: We present predictive models developed to identify whether someone is experiencing gambling problems or not.

3. Session Analysis: We present predictive models showing whether a session is likely to be from a problem gambler or not.

4. Additional experiments: We present the results of further investigations into the data to understand which elements of problem gambling are more predictive than others.

A discussion of the results is contained in the next section.

Throughout the results, a positive refers to a problem gambler and a negative to a non-problem gambler. Therefore a true positive implies the correct identification of a problem gambler and a false positive implies the incorrect classification of a non-problem gambler as a problem gambler. Likewise, a true negative implies the correct identification of a non-problem gambler and a false negative implies the incorrect classification of a problem gambler as a non-problem gambler.
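The four outcomes defined above translate directly into the four rates reported in the tables later in this section. The sketch below is a minimal illustration with invented labels and flags; the function name and the toy data are assumptions, not part of the study.

```python
def detection_rates(is_problem_gambler, flagged):
    """Return the four rates given true labels and model flags (both booleans)."""
    pairs = list(zip(is_problem_gambler, flagged))
    tp = sum(1 for y, f in pairs if y and f)          # problem gambler, flagged
    fn = sum(1 for y, f in pairs if y and not f)      # problem gambler, missed
    fp = sum(1 for y, f in pairs if not y and f)      # non-problem gambler, flagged
    tn = sum(1 for y, f in pairs if not y and not f)  # non-problem gambler, not flagged
    return {
        "true_positive_rate": tp / (tp + fn),
        "false_negative_rate": fn / (tp + fn),
        "false_positive_rate": fp / (fp + tn),
        "true_negative_rate": tn / (fp + tn),
    }


# Invented example: 4 problem gamblers and 6 non-problem gamblers.
labels = [True, True, True, True, False, False, False, False, False, False]
flags = [True, False, False, False, True, False, False, False, False, False]
rates = detection_rates(labels, flags)
print(rates)
```

Note that the true positive and false negative rates always sum to one, as do the false positive and true negative rates, so each table row is fully determined by two of its four columns.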

When contextualising the results presented, it is important to remember that the players analysed in this research represent a heavily skewed subset of very engaged loyalty card holders. Therefore the performance of the models is conservative, and if models of this type are operationalised higher accuracy rates would be expected.

Baseline

To be able to interpret the results of the analysis, a baseline is required with which a comparison can be made. The baseline has been established using principles from the Code of Conduct rolled out by the Association of British Bookmakers (ABB) in March 2014 (see footnote 6).

This Code itself consists of multiple elements of harm minimisation, one of which is reiterated here:

Providing customers with new tools such as mandatory time and money based reminders, the ability to set spend and time limits on gaming machines and to request machine session data;

The mandatory reminders have been set to trigger when a player exceeds £250 of spend or 30 minutes of session length.

6 http://www.abb.uk.com/code-of-conduct/


Spend is defined as the total amount of money which has been deposited into the machine. When a limit is exceeded a pop-up is generated, but the player can still continue playing. In the remainder of this subsection we define a player to have received this intervention if they experience this mandatory pop-up.

Cash-In refers to the total amount of cash that has been loaded into the machine. In the figures below we will refer to Cash-In rather than spend, so that it is consistent with the terminology used throughout the report.

The selection of this Code as a baseline is advantageous as it has been implemented with the same gambling products and environment in which this analysis took place. It is acknowledged that the Code of Conduct was derived from best practice rather than a quantitative analysis similar to that which has been conducted as part of this research programme. Furthermore, the analysis performed on the Code is not intended as an evaluation of the Code. Featurespace have been engaged by the Responsible Gambling Trust in a separate project to complete an early impact study of the Code. The outcome of this evaluation will be published after the release of this research.

The baseline has been established at both a player and a session level to match the two types of models described in this report. For both models we have examined the performance of session cash-in and length, independently and in combination. When combining the two variables, Logistic Regression has been used to produce a single score to which a threshold can be applied to distinguish between problem gamblers and non-problem gamblers.
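To make the combination step concrete, the sketch below fits a two-variable logistic regression by plain gradient descent and applies a threshold to the resulting score. It is a minimal illustration only: the data are invented, the feature scaling (cash-in in hundreds of pounds, length in hours), the learning rate, the number of epochs and the 0.5 threshold are all assumptions, and the report does not state which fitting procedure or software was used.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def score(weights, row):
    """Logistic score in (0, 1); weights[0] is the intercept."""
    return sigmoid(weights[0] + sum(w * x for w, x in zip(weights[1:], row)))


def fit_logistic(rows, labels, learning_rate=0.1, epochs=2000):
    """Fit intercept and coefficients by batch gradient descent on the log-loss."""
    weights = [0.0] * (len(rows[0]) + 1)
    n = len(rows)
    for _ in range(epochs):
        gradient = [0.0] * len(weights)
        for row, label in zip(rows, labels):
            error = score(weights, row) - label
            gradient[0] += error
            for i, x in enumerate(row):
                gradient[i + 1] += error * x
        weights = [w - learning_rate * g / n for w, g in zip(weights, gradient)]
    return weights


# Invented players: (average cash-in in £100s, average session length in hours).
rows = [
    (0.10, 0.05), (0.20, 0.10), (0.30, 0.30), (0.15, 0.20), (0.50, 0.15),
    (0.80, 0.40), (1.20, 0.30), (0.40, 0.60), (2.00, 0.50), (0.90, 0.20),
]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]  # 1 = problem gambler (toy labels)

weights = fit_logistic(rows, labels)
scores = [score(weights, row) for row in rows]
threshold = 0.5  # hypothetical operating point
flagged = [s >= threshold for s in scores]
print([round(s, 2) for s in scores])
```

Moving `threshold` up or down trades true positives against false positives, which is the trade-off the detection-rate figures in this section display.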

There are also some key differences to bear in mind when reviewing the results below in the context of the ABB implementation:

• The player baseline figures are based on average values over all of the player's sessions. The ABB pop-up is triggered when one session goes above the £250 limit. As an example, if a player had two sessions, one at £200 and the other at £275, the average cash-in value would be £237.50. In our baseline analysis it would be inferred that this player would not receive the pop-up, when in fact they would have received the pop-up on one of their sessions.

• The session baseline figures look at the results of each session independently and therefore report on the percentage of sessions that would receive the pop-up message. Within the analysis presented it is not possible to infer the proportion of problem gamblers that received the pop-up. For example, it might be possible that a higher proportion of problem gamblers triggers the pop-up.
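The first of these differences can be shown in a few lines. The sketch below contrasts the averaged rule used in the player baseline with the per-session rule used by the ABB pop-up, using the £200 and £275 example from the text. The function names are hypothetical.

```python
ABB_CASH_IN_LIMIT = 250.0  # pounds, the spend threshold quoted in the text


def flagged_by_average(session_cash_ins, limit=ABB_CASH_IN_LIMIT):
    """Player baseline rule: compare the average session cash-in to the limit."""
    return sum(session_cash_ins) / len(session_cash_ins) > limit


def flagged_by_any_session(session_cash_ins, limit=ABB_CASH_IN_LIMIT):
    """ABB-style rule: the pop-up fires if any single session exceeds the limit."""
    return any(cash_in > limit for cash_in in session_cash_ins)


sessions = [200.0, 275.0]
average = sum(sessions) / len(sessions)
print(average, flagged_by_average(sessions), flagged_by_any_session(sessions))
```

The averaged rule therefore understates how many players would actually see the pop-up, which is worth keeping in mind when reading the player baseline figures.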

Each of these points can be further clarified within the scope of the data available, but this analysis is outside the scope of this particular report. The subtle nuances of the implementation of the baseline and its interpretation are important to understand when contextualising the results presented in this report.

To benchmark the analytical models developed we will look at comparative detection rates of the new models compared to the baseline (e.g. we will be able to see how many more problem gamblers could be detected at the same false positive rate). As a summary measure, the AUC is reported for each model to enable an immediate comparison. Models that generate higher AUC values are more accurate than those with lower values.
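For readers unfamiliar with the AUC, one equivalent way to compute it is as the probability that a randomly chosen problem gambler receives a higher score than a randomly chosen non-problem gambler, with ties counted as one half. The sketch below implements that pairwise definition on invented scores; it is an illustration of the metric, not the software used in the study.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Invented model scores: three problem gamblers followed by three non-problem gamblers.
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
labels = [1, 1, 1, 0, 0, 0]
model_auc = auc(scores, labels)
random_auc = auc([0.5] * 6, labels)  # an uninformative model scores everyone the same
print(round(model_auc, 3), random_auc)
```

An uninformative model scores 0.5 and a perfect one scores 1.0, so AUC values such as 0.52 or 0.62 should be read against that 0.5 floor.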


Player Baseline

To generate the baseline model for players we initially looked at the session cash-in value and its predictive power. We then took the second element, session length, investigating its predictive power. Finally the two elements were combined to see what performance could be obtained.

To investigate session cash-in we calculated each player's average session cash-in amount over the entire length of their available history. Figure 5 shows the true positive and true negative rates for different average session cash-in values. This figure enables us to assess the accuracy of correctly identifying problem (the true positive rate) and non-problem (the true negative rate) gamblers at different threshold values. To obtain the best performance for this model, we want the two detection rates to be at their highest together; this is the point where the lines cross. The cross-over point is a useful metric to compare the predictive power of individual variables. For this variable, the cross-over point occurs at a detection accuracy just below 60% and for an average session cash-in value of £30.
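The cross-over point described above can be located by sweeping a threshold and finding where the true positive and true negative rates are closest. The sketch below does this on a handful of invented average cash-in values; the data, the threshold grid and the function names are assumptions chosen only to show the mechanics.

```python
def rates_at_threshold(values, labels, threshold):
    """Flag a player when their value exceeds the threshold; return (TPR, TNR)."""
    positives = [v for v, y in zip(values, labels) if y]
    negatives = [v for v, y in zip(values, labels) if not y]
    tpr = sum(v > threshold for v in positives) / len(positives)
    tnr = sum(v <= threshold for v in negatives) / len(negatives)
    return tpr, tnr


def crossover_threshold(values, labels, thresholds):
    """Threshold at which the true positive and true negative rates are closest."""
    def gap(threshold):
        tpr, tnr = rates_at_threshold(values, labels, threshold)
        return abs(tpr - tnr)
    return min(thresholds, key=gap)


# Invented average session cash-in values (£): five non-problem, five problem gamblers.
values = [10, 20, 25, 35, 60, 15, 30, 40, 80, 120]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

best = crossover_threshold(values, labels, range(0, 130, 10))
print(best, rates_at_threshold(values, labels, best))
```

A variable with no discriminating power crosses at 50%, so the height of the cross-over above 50% gives a quick, single-number comparison between variables.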

Figure 5 - Analysis of average Session Cash-In to identify problem gamblers.

The mandatory limit set by the ABB of £250 shows that 1.3% of problem gamblers would receive the pop-up and 99.2% of non-problem gamblers would not receive the pop-up; however, 0.8% of the non-problem gamblers would receive the pop-up. In contrast, if a threshold of £100 was selected, the proportion of correctly identified problem gamblers would increase from 1.3% to 10.7%, while the proportion of correctly identified non-problem gamblers would reduce from 99.2% to 94.3%. This means that we are able to accurately identify more problem gamblers, but at the same time we are also incorrectly labelling more non-problem gamblers as problem gamblers.

[Figure 5 chart: Detection Rates against Average Player Session Cash-In. x-axis: Average Session Cash-In Value (£), 0 to 300; y-axis: rate, 0 to 1; series: true positive rate, true negative rate.]

The next step in looking at the baseline model for players was to examine session length. This is shown in Figure 6. In this case we see that the mandatory threshold of 30 minutes identifies a higher proportion of problem gamblers (14.5%); however, a lower proportion (86.4%) of non-problem gamblers would avoid being treated, giving rise to a 13.6% false positive rate. The cross-over point for this variable occurs at around 13 minutes, but at a lower overall detection rate than for the average session cash-in amount presented above (55% compared to 60%). This tells us that average session cash-in is a better indicator of problem gamblers than session length.

Figure 6 - Analysis of the average session length to identify problem gamblers.

So far we have looked at the performance of each measure individually, whereas logistic regression allows us to look at them together. This is shown in Figure 7. The Logistic Regression model produces an output between 0 and 1. The selection of a threshold will produce the detection rates shown in this figure. The cross-over point for this model is at a detection rate of 60%. This shows that by combining the session length variable with the total cash-in variable there is no improvement in our model performance.

[Figure 6 chart: Detection rates and Average Session Length. x-axis: Time (minutes), 0 to 60; y-axis: rate, 0 to 1; series: true positive rate, true negative rate.]

Figure 7 - Analysis of using average session cash-in and average session length in a predictive model to identify problem gamblers

Now we have developed three different player baseline models. These can be compared by plotting the false positive rates for each model against the true positive rate, see Figure 8. As a reminder, the objective is to get the models to be as close to the top left hand corner of the chart as possible. Figure 8 shows that the average session total cash-in variable performs better than the session length variable, which is only performing marginally above the random model. By combining these two variables in our logistic regression model we only get a marginal uplift in performance compared to just using the average session cash-in variable alone.

It is useful in this example to point out how the ROC curve can be used to assess the performance of the models and the trade-offs which need to be made. In the analysis above we concluded that average session total cash-in variable was very similar to the combined Logistic Regression model. The ROC curve corroborates this finding, but it does show that the Logistic Regression model has a marginal increase in the true positive rate in the lower range of false positive rates (between 20% and 40%).

[Figure 7 chart: Detection Rate Baseline Logistic Regression Player Model. x-axis: Logistic Regression Score, 0 to 1; y-axis: rate, 0 to 1; series: true positive rate, true negative rate.]

Figure 8 - Comparison of Baseline Player Models

[Figure 8 chart: Player Baseline Model Comparison. x-axis: False Positive Rate, 0 to 1; y-axis: True Positive Rate, 0 to 1; series: Average Session Cash-In, Average Session Length, Combined Logistic Regression Model (AUC=0.62), Random.]

Session Baseline

Now that the player baseline model has been developed, we use the same principle to develop a session baseline model. For the session model we are now measuring our accuracy in determining whether a session is generated by a problem gambler or not. As each player has generated many sessions, we are making many more predictions. It is expected that the performance of a session model will be lower than that of the player model because, although a session may be from a problem gambler, they might not exhibit problematic play in every session that they play.

The first step is to look at the session cash-in amount. Figure 9 shows the true positive and true negative rates based on this variable. From the figure we can see that at the ABB limit of £250, 4.0% of problem gambler sessions would receive the intervention and 97.4% of non-problem gambler sessions would avoid the intervention. The point of cross-over between the two rate curves is just above 50%, indicating that this variable has minimal power to discriminate between problem and non-problem gamblers.

Figure 9 - Session Cash-In

The next step in building our session baseline model is to look at session length. The performance of this variable is shown in Figure 10. At the ABB limit of 30 minutes 12.9% of problem gambler sessions receive the intervention and 87.1% of non-problem gambler sessions avoid the intervention. Again the cross-over point between the two rate curves is only marginally above 50%, indicating minimal discrimination between the two categories of players.

[Figure 9 chart: Detection Rates and Session Cash-In. x-axis: Value (£), 0 to 300; y-axis: rate, 0 to 1; series: true positive rate, true negative rate.]

Figure 10 - Session Length

Finally, the two variables were combined in a logistic regression model to investigate their combined predictive power. The performance of this model is shown in Figure 11. Compared to the figures presented previously in this section, the true positive and true negative rate lines show a sharp jump between 0 and 1. This indicates that these variables provide limited discriminatory power. To obtain the same true positive rate as the session length model (12.9%) using the logistic regression model, a threshold of around 0.225 should be selected. This corresponds to a true negative rate of 89.4%, a 2.3 percentage point improvement on the 87.1% achieved by session length alone.

[Figure 10 chart: Detection Rates and Session Length. x-axis: Session length (minutes), 0 to 60; y-axis: rate, 0 to 1; series: true positive rate, true negative rate.]

Figure 11 - Logistic Regression Session Model

Now that we have our three models we can compare their performance. This is shown in Figure 12. This comparison shows that the accuracy of each metric is poor, only marginally better than the random model.

Figure 12 - Comparison of Baseline Session Models

[Figure 11 chart: Detection Rate Baseline Logistic Regression Player Model (title as extracted). x-axis: Score logistic regression, 0 to 0.4; y-axis: rate, 0 to 1; series: true positive rate, true negative rate.]

[Figure 12 chart: Baseline Session Model Comparison. x-axis label as extracted: Score RF global, 0 to 1; y-axis: 0 to 1; series: Session Cash-In, Session Length, Combined Logistic Regression Model (AUC=0.52), Random.]

Summary

In this section we have constructed two baseline models using the two variables session cash-in and session length to identify problem gambling. We have found that predicting problem gambling at the player level is more effective than at the session level. We have also found that session cash-in is a better predictor than session length. When these two variables are combined, the session length variable only provides a minimal uplift over using the cash-in variable on its own.

Table 7 and Table 8 summarise the two baseline models and their detection rates. The first row of each table corresponds to a threshold on the two models which generates a 10% true positive rate. The second row of each table corresponds to a threshold on the two models which generates a 25% true positive rate. From these tables we can then compare the false positive rates between the models to see what proportion of non-problem gamblers would be wrongly identified.

| Player Model | True Positive Rate | False Positive Rate | True Negative Rate | False Negative Rate |
| --- | --- | --- | --- | --- |
| Baseline Player Model (low) | 10% | 5% | 95% | 90% |
| Baseline Player Model (medium) | 25% | 15% | 85% | 75% |

Table 7 - Summary of Baseline Player Model Performance

| Session Model | True Positive Rate | False Positive Rate | True Negative Rate | False Negative Rate |
| --- | --- | --- | --- | --- |
| Baseline Session Model (low) | 10% | 8% | 92% | 90% |
| Baseline Session Model (medium) | 25% | 22% | 78% | 75% |

Table 8 - Summary of Baseline Session Model Performance

These results highlight the challenge of distinguishing problem and non-problem gamblers and the trade-offs that need to be made. Using the baseline player model, when detecting 25% of the true problem gamblers 15% of the non-problem gamblers will be incorrectly identified. In the following sections we show how the model accuracy can be improved by using additional variables.
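The rows of Table 7 and Table 8 are produced by fixing a target true positive rate and then reading off the false positive rate at the matching threshold. The sketch below shows one way to do that on invented scores. The function names and data are hypothetical, and the tie-handling convention (flag when the score is at or above the threshold) is an assumption.

```python
import math


def threshold_for_target_tpr(scores, labels, target_tpr):
    """Highest score cut-off that still flags at least `target_tpr` of the positives."""
    positives = sorted((s for s, y in zip(scores, labels) if y), reverse=True)
    needed = max(1, math.ceil(target_tpr * len(positives)))
    return positives[needed - 1]


def false_positive_rate(scores, labels, threshold):
    """Share of negatives whose score is at or above the threshold."""
    negatives = [s for s, y in zip(scores, labels) if not y]
    return sum(s >= threshold for s in negatives) / len(negatives)


# Invented scores: four problem gamblers, then eight non-problem gamblers.
scores = [0.9, 0.7, 0.4, 0.2, 0.8, 0.5, 0.3, 0.1, 0.05, 0.6, 0.15, 0.25]
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

low = threshold_for_target_tpr(scores, labels, 0.25)
medium = threshold_for_target_tpr(scores, labels, 0.50)
print(low, false_positive_rate(scores, labels, low))
print(medium, false_positive_rate(scores, labels, medium))
```

Comparing two models at the same target true positive rate, as the tables do, isolates the cost of each model in wrongly flagged non-problem gamblers.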


Player Analysis (Registered Play)

The aim of the analysis in this section is to improve upon the baseline player model to provide a more accurate prediction of a problem gambler. When building the player model we are able to analyse how the player's behaviour varies across the different sessions that they complete. The player model can be applied to players who use a loyalty card. To build the improved player model we first look at the variables that can be measured across time (i.e. the between session variables defined in Appendix B). We then add variables that are measured within an individual session (i.e. the within session variables defined in Appendix B). This enables us to investigate more subtle changes in a player's behaviour as a whole.

Using Between Session Markers

To obtain an initial view of how well the theoretical markers of harm are able to identify problem gamblers, a predictive model was built as described in the Methodology section. When building this model we used all of the between session variables, as the modelling technique would not be adversely impacted by the large number of variables and would automatically increase the weight of those that are the most predictive. The performance of the predictive model is shown in Figure 13. This figure also includes the baseline player model, which only utilises the average session cash-in and session length variables. The AUC score of the new model is 0.69 compared to the baseline model AUC score of 0.62. This represents a 58% improvement in overall accuracy in favour of the new model.
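The report does not state how the 58% figure is derived. One reading that reproduces it, offered here as an assumption rather than the authors' stated method, is to measure each AUC as its gain over the 0.5 achieved by a random model and then compare those gains. The sketch below shows that calculation next to the plain relative change in AUC for contrast; the function name is hypothetical.

```python
def uplift_over_random(new_auc, baseline_auc, random_auc=0.5):
    """Relative improvement of the AUC gain above a random model (assumed reading)."""
    return (new_auc - random_auc) / (baseline_auc - random_auc) - 1.0


uplift = uplift_over_random(0.69, 0.62)
plain_relative_change = (0.69 - 0.62) / 0.62
print(round(uplift * 100, 1), round(plain_relative_change * 100, 1))
```

On this reading the gain above random rises from 0.12 to 0.19, which is roughly 58% larger, whereas the raw AUC itself rises by about 11%.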

When comparing the two models, in particular in the range of true positive values between 10% and 60%, it can be seen that the new model generates an additional 10-15 percentage points in true positive accuracy compared to the baseline. As an example, if we look at the performance of the baseline model at the point indicated by the red circle on Figure 13, a true positive rate of 40% is achieved for a false positive rate of 22%. The new player model is able to maintain the same level of false positives, but instead identify 50% of the problem gamblers. Alternatively, if the same true positive rate of 40% was maintained, the false positive rate could be reduced from 23% to 17%.


Figure 13 - Player Predictive Model (Between Session markers)

To understand which measures of activity have the biggest influence when distinguishing between problem and non-problem gamblers, the model is rebuilt repeatedly, each time removing one of the inputs and measuring the degradation in accuracy. The input variables whose removal causes the largest degradation in model performance are the most influential in the model's decision. The results of this analysis, showing the top 13 most influential inputs, are shown below in Figure 14.
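The remove-one-input procedure described above can be sketched generically. In the example below a nearest-centroid classifier stands in for the real predictive model purely to keep the code short; the model, the three feature names and the data are all assumptions for illustration, and accuracy is measured on the training rows rather than on a held-out set.

```python
def centroid_model_accuracy(rows, labels):
    """Toy stand-in model: assign each row to the nearer class centroid."""
    def centroid(cls):
        members = [r for r, y in zip(rows, labels) if y == cls]
        return [sum(column) / len(members) for column in zip(*members)]

    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    c0, c1 = centroid(0), centroid(1)
    predictions = [
        1 if squared_distance(r, c1) < squared_distance(r, c0) else 0 for r in rows
    ]
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def drop_one_importance(rows, labels, names):
    """Accuracy lost when each input is removed and the model is rebuilt."""
    full_accuracy = centroid_model_accuracy(rows, labels)
    importance = {}
    for i, name in enumerate(names):
        reduced = [r[:i] + r[i + 1:] for r in rows]
        importance[name] = full_accuracy - centroid_model_accuracy(reduced, labels)
    return importance


# Invented players: [playing days, weekly loss (£), latest hour played].
names = ["playing_days", "weekly_loss", "latest_hour"]
rows = [
    [2, 20, 14], [4, 40, 12], [6, 10, 16], [4, 30, 14],      # non-problem (toy)
    [16, 30, 14], [18, 20, 13], [14, 40, 15], [12, 10, 14],  # problem (toy)
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

importance = drop_one_importance(rows, labels, names)
print(importance)
```

In this toy data only the number of playing days separates the two groups, so removing it is the only change that degrades accuracy. Correlated inputs can mask one another under this procedure, because removing one of a co-linear pair leaves the other to carry the same information.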

The most influential input is the number of playing days, i.e. on how many days the player played on the gaming machines. Interestingly, the next seven most important features relate to the financial characteristics of the player, including both how much the player has lost and the impact of how much has been won over a weekly aggregation. These financial features are likely to be co-linear with each other, that is, describing broadly the same behaviour but in subtly different ways. Towards the end of the top 13, features that describe the number of losing sessions, the number of different games played and the number of sessions in a week appear. Noticeably absent from the list of important inputs are elements that describe the session length, the number of bets placed, the game selection across B2/B3 content or stake size.

[Figure 13 chart: Player Model using Between Session Markers. x-axis: False Positive Rate, 0 to 1; y-axis: True Positive Rate, 0 to 1; series: Player Model (Between Session Marker, AUC=0.69), BaseLine (AUC=0.62), Random.]

Figure 14 - Input variable importance for Player Model (Between Session Markers)

Incorporating Within-Session Markers

Having looked at the between session markers, it was then important to see if any within session markers could be used to enhance the prediction of problem gambling. This was done by incorporating additional inputs that can be derived from the within-session metrics calculated. Additional work was done to examine the potential of transforming the metrics so that they provided more discriminatory power that could be utilised by the predictive modelling algorithms. Details of this work are provided in Appendix E.

To incorporate the within-session metrics, we have examined their characteristics over time. To increase the resolution we have conducted the extremes analysis by comparing characteristics of the earliest and the latest 30 sessions. We did not find any statistical differences in their characteristics. Examples of the trends are shown in Figure 15 and Figure 16.

The incorporation of time-series data into the model is difficult, as there is a trade-off between building a profile of regular player behaviour and then identifying potential sessions of binge behaviours. We are still making a prediction for the player at the end of their entire history. To investigate the time series nature of the data further, it would be constructive to be able to identify periods in which a player's gambling was more problematic than others.

[Figure 14 chart: horizontal bar chart. x-axis: Mean Decrease in Model Accuracy, 0 to 4. Input variables in the order extracted from the chart: Number of Sessions Per Week; Maximum Daily Total Win; Maximum Session Different Games; Average Player Loss (Session); Number of Losing Sessions; Average Daily Player Loss; Average Weekly Net Position; Average Daily Player Total Stake; Player Loss; Average Session Total Win; Average Daily Player Loss; Maximum Weekly Total Winnings; Number of Playing Days.]

Figure 15 - Time series of the trends of session-level probabilities of three non-problem gamblers. The most recent time corresponds to the first observation, i.e. the timeline is extending into the past.

Figure 16 - Time series of the trends of session-level probabilities of three problem gamblers. The most recent time corresponds to the first observation, i.e. the timeline is extending into the past.

To further improve the model accuracy we identified a subset of the most predictive variables and experimented with different transformations. The subset of variables included 17 between session metrics and 12 within-session metrics. A list of the 17 between session variables and their most effective transformation is provided below. The 12 within-session metrics are provided in Table 10 in the following section.

• Average loss during a session (unmodified)
• Deposit after Winning vs. Loss (binned by quartiles)
• Maximum monthly total pay (binned by quartiles)
• Minimal value of the proportion of session cash out (binned by 3 categories: missing, below the sample average, above the sample average)
• Maximum session total (log transformed)
• Maximal gap between bets in a day (log transformed and winsorized)
• Maximal total session played (log transformed and winsorized)
• Maximal number of stakes with high volatility (unmodified)
• Maximum deposit per session (unmodified)
• Number of days played (winsorized and not)
• Number of sessions lost money (unmodified)
• Number of sessions lost money per week (unmodified)
• Number of sessions per day (unmodified)
• Number of stake levels (winsorized and not)
• Earliest hour played (unmodified)
• Latest hour played (unmodified)
• Mean hour of play (unmodified)
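The three kinds of transformation named in the list above (log transform, winsorizing and binning by quartiles) can be sketched as follows. The percentile limits used for winsorizing, the use of `log1p` and the `inclusive` quantile method are assumptions for this example; the report's exact settings are given in Appendix E and are not reproduced here.

```python
import math
import statistics


def winsorize(values, lower_pct=5, upper_pct=95):
    """Clamp values to the given percentiles (the 5/95 limits are assumed)."""
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    low, high = cuts[lower_pct - 1], cuts[upper_pct - 1]
    return [min(max(v, low), high) for v in values]


def log_transform(values):
    """log(1 + v), which keeps zero values defined."""
    return [math.log1p(v) for v in values]


def bin_by_quartiles(values):
    """Replace each value with its quartile number, 1 (lowest) to 4 (highest)."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return [1 + (v > q1) + (v > q2) + (v > q3) for v in values]


# Invented metric with one extreme value, e.g. a maximal gap between bets.
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1000]
print(winsorize(values))
print([round(v, 2) for v in log_transform(values)])
print(bin_by_quartiles(values))
```

All three transformations reduce the influence of the rare extreme values noted earlier in the report, either by capping them, compressing them, or replacing them with a rank-based category.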

For the final piece of analysis on the player model we looked for the smallest subset of variables that could produce a model with good predictive power. The Occam's razor principle was applied to do this. This principle essentially provides a framework for deciding if a variable should be added to the predictive model. It states that although a complex solution may generate optimal performance, in the absence of evidence, the fewer assumptions that are made, the less likely it is for these assumptions to be incorrect. The minimal set of variables produced from this analysis was:

• Minimal amount of cash out (unmodified)
• Number of stake levels (winsorized)
• Number of days played (winsorized)
• Played at or later than 9pm (unmodified)

A number of alternative models were also generated in this process. These models included the average within-session problem gambling score, session frequency and stake variability variables. However, their performance on the test set was slightly lower than that of the subset listed above. Future work should focus on the analysis of alternative models and the identification of the most interpretable and actionable models with similar predictive ability.

Finally, Figure 17 compares the performance of the player model, after incorporating the within-session markers, to the baseline and random models. This final model is only slightly more predictive (AUC=0.70) than the model which used only the between-session variables (AUC=0.69). Compared to the baseline model, which has an AUC of 0.62, the player model represents a 66% improvement in overall model accuracy.
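The section does not spell out how the percentage improvements are calculated. One reading that reproduces the quoted figures is to measure each model's AUC above the 0.5 level of a random model and take the ratio. The sketch below works through that assumption; the function name is hypothetical.

```python
def uplift_over_random(auc_model, auc_baseline, auc_random=0.5):
    """Relative improvement in AUC above the random level (assumed definition)."""
    return (auc_model - auc_random) / (auc_baseline - auc_random) - 1.0


# Player model vs baseline: (0.70 - 0.5) / (0.62 - 0.5) - 1, roughly 0.67
player_uplift = uplift_over_random(0.70, 0.62)

# Session model vs baseline: (0.63 - 0.5) / (0.52 - 0.5) - 1 = 5.5, i.e. 550%
session_uplift = uplift_over_random(0.63, 0.52)
```

Under this reading, a large percentage uplift can arise from a baseline that is barely better than random, which is worth keeping in mind when comparing the player and session results.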


Figure 17 - Performance of the final enhanced player model after inclusion of the within-session markers

To illustrate the performance improvement of the enhanced model, Table 7 from the Baseline model section has been extended to include comparative performance metrics at the 10% and 25% true positive rates. This comparison is shown in Table 9. From the table we can see that at a 10% true positive rate the false positive rate has reduced from 5% to 3%. Likewise, at the 25% true positive rate the false positive rate drops from 15% to 9%.

Player Model             | True Positive Rate | False Positive Rate | True Negative Rate | False Negative Rate
Baseline Model (low)     | 10%                | 5%                  | 95%                | 90%
Enhanced Model (low)     | 10%                | 3%                  | 97%                | 90%
Baseline Model (medium)  | 25%                | 15%                 | 85%                | 75%
Enhanced Model (medium)  | 25%                | 9%                  | 91%                | 75%

Table 9 - Comparison of the Enhanced Player Model to the Baseline Player Model
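The four rates in the table are linked: the true negative rate is one minus the false positive rate, and the false negative rate is one minus the true positive rate. A small sketch, using hypothetical counts chosen only to reproduce the "Enhanced Model (low)" row, makes the relationship explicit.

```python
def classification_rates(tp, fp, tn, fn):
    """Return the four rates reported in the table from confusion-matrix counts."""
    tpr = tp / (tp + fn)  # share of problem gamblers correctly flagged
    fpr = fp / (fp + tn)  # share of non-problem gamblers incorrectly flagged
    return {"tpr": tpr, "fpr": fpr, "tnr": 1.0 - fpr, "fnr": 1.0 - tpr}


# Hypothetical counts: 100 problem gamblers (10 flagged) and
# 1,000 non-problem gamblers (30 flagged).
rates = classification_rates(tp=10, fp=30, tn=970, fn=90)
```

Because the two pairs are complements, each row of the table is fully described by its true positive and false positive rates.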

[Figure 17 chart: "Player Model Comparison", ROC curves. x-axis: False Positive Rate (0% to 100%); y-axis: True Positive Rate (0% to 100%). Series: Enhanced Player Model (AUC=0.70), Baseline (AUC=0.62), Random.]

To further understand the performance of the enhanced model, we compared the range of scores generated by the model for problem and non-problem gamblers. Figure 18 shows the proportion of problem and non-problem gamblers that fall into different bands of problem gambling score. From this figure we can see that the biggest proportion (9%) of non-problem gamblers have a score around 0.19. Not surprisingly, the problem gamblers on average have a higher score than the non-problem gamblers. The threshold for making this […] problem gambling score is also around 0.24, indicating the unbiased nature of the […] model, then it would receive a prediction in favour of the bias held by the model (e.g. to have a tendency towards either problem or non-problem gamblers).

Figure 18 - Problem Gambling scores generated for Problem and Non-Problem Gamblers

[Figure 18 chart: "Player's Problem Gambling Scores". x-axis: Problem Gambling Score (0.00 to 0.78); y-axis: Proportion of Players (0 to 0.12). Series: Non Problem Gambler, Problem Gambler.]

Session Analysis (Unregistered Play)

In this section the goal was to identify whether we could predict problem gambling behaviour using only the data available from a single session. This analysis is important as it demonstrates what is possible when a player interacts with a machine without a player card. To perform this analysis we selected each of the sessions generated by the loyalty card players. The sessions generated by a problem gambler were labelled as problem gambling sessions, and the remaining sessions were labelled as non-problem gambling sessions.
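The labelling rule described above can be sketched as follows. The data layout (pairs of session and player identifiers, and a lookup of PGSI scores per player) and all names are hypothetical; only the rule itself, that a session inherits the problem gambler label of the player who generated it, comes from the text.

```python
PGSI_PROBLEM_THRESHOLD = 8  # problem gamblers score 8 or above on the PGSI screen


def label_sessions(sessions, pgsi_by_player, threshold=PGSI_PROBLEM_THRESHOLD):
    """Label each session 1 (problem gambling session) or 0 (non-problem).

    sessions: iterable of (session_id, player_id) pairs (hypothetical layout).
    pgsi_by_player: dict mapping player_id to that player's PGSI score.
    """
    return {
        session_id: 1 if pgsi_by_player[player_id] >= threshold else 0
        for session_id, player_id in sessions
    }


# Hypothetical example: player "a" scores 12, player "b" scores 3.
labels = label_sessions(
    [("s1", "a"), ("s2", "a"), ("s3", "b")],
    {"a": 12, "b": 3},
)
```

Note that every session of a problem gambler receives the positive label, whether or not that particular session showed harmful play, so the session labels are noisier than the player labels.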

After inspecting all of the data, the 12 most influential variables were identified. These variables are presented in Table 10, listed in order of their predictive power together with the transformation that was applied.

Variable                                 | Transformation
Average Proportion of Cash Out           | grouped
Session Start Time                       | grouped
Number of Stakes of High Volatile Games  | unmodified
Minimum Stake Amount                     | grouped
Value of Non-Debit Card Cash-            | grouped
Number of Stakes of Low Volatile Games   | grouped
Number of Different Games Played         | grouped
Variance in Stake                        | grouped
Number of Different Stake Amounts        | winsorized
Amount Cashed Out                        | zero indicator
Number of different Games                | winsorized
Minimum Stake Amount                     | log

Table 10 - List of most influential variables and their transformation sorted by their predictive power.

When building the predictive models, we examined the individual variables that could be derived from the data. As an example, Figure 19 shows the likelihood of observing a problem gambling session at different hours of the day. This figure illustrates one of the dilemmas with the data: there is an increased chance of observing a problem gambling session at the beginning or the end of the day, but the number of sessions that take place during these periods is significantly lower. Therefore, although there is a significant association, the predictive ability is still weak.
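The dilemma described above can be made concrete with a toy calculation. The hourly session counts below are entirely hypothetical and are not taken from the study. They only show how a rule that flags late-night sessions can be right most of the time (high precision) while still finding very few of the problem gambling sessions (low recall).

```python
def precision_and_recall(sessions_by_hour, flagged_hours):
    """Evaluate a rule that flags every session starting in flagged_hours.

    sessions_by_hour maps hour -> (problem_sessions, non_problem_sessions).
    """
    flagged_problem = sum(sessions_by_hour[h][0] for h in flagged_hours)
    flagged_total = sum(sum(sessions_by_hour[h]) for h in flagged_hours)
    all_problem = sum(problem for problem, _ in sessions_by_hour.values())
    return flagged_problem / flagged_total, flagged_problem / all_problem


# Hypothetical counts: busy daytime hours, very few sessions at 23:00.
example_counts = {12: (200, 800), 18: (250, 750), 23: (30, 20)}
precision, recall = precision_and_recall(example_counts, [23])
```

In this made-up example, 60% of the flagged sessions belong to problem gamblers, yet the rule captures only about 6% of all problem gambling sessions.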

Figure 19 - The likelihood of a problem gambler playing increases late at night; however, the total number of sessions played decreases substantially during this period.

[Figure 19 chart: x-axis: Hour of Day (7 to 23); y-axis: Probability (0 to 0.9). Series: Non Problem Gambler (Probability), Problem Gambler (Probability).]

The results for the session model are shown in Figure 20. The performance of the enhanced session model (AUC = 0.63) has improved compared to the baseline session model (AUC = 0.52) by 550%. Although we have identified a significant improvement against the baseline model, the accuracy is lower than that achieved for the player model.

Figure 20 - Performance of the Enhanced Session Model compared to the Baseline.

To illustrate the performance improvement of the enhanced session model, Table 8 from the Baseline model section has been extended to include comparative performance metrics at the 10% and 25% true positive rates. This comparison is shown in Table 11. From the table we can see that at a 10% true positive rate, the false positive rate has reduced by more than half from 8% to 3%. Likewise, at the 25% true positive rate the false positive rate drops from 22% to 12%.

Session Model                    | True Positive Rate | False Positive Rate | True Negative Rate | False Negative Rate
Baseline Session Model (low)     | 10%                | 8%                  | 92%                | 90%
Enhanced Session Model (low)     | 10%                | 3%                  | 97%                | 90%
Baseline Session Model (medium)  | 25%                | 22%                 | 78%                | 75%
Enhanced Session Model (medium)  | 25%                | 12%                 | 88%                | 75%

Table 11 - Comparison of the Enhanced Session Model to the Baseline Session Model

[Figure 20 chart: "Comparison of Session Models", ROC curves. x-axis: False Positive Rate (0% to 100%); y-axis: True Positive Rate (0% to 100%). Series: Enhanced Session Model (AUC=0.63), Baseline (AUC=0.52), Random.]

Additional Experiments

Up until now, we have focused the analytics on distinguishing between problem and non-problem gamblers using the definition provided by the PGSI screen; that is, problem gamblers have a score of 8 or above and non-problem gamblers have a score below 8. In this section we have sliced the data in other ways to see what impact this has on being able to distinguish between different groups of players.

Removing multiple loyalty card holders and 'at risk' gamblers

One of the questions in the loyalty card survey asked the participant how many loyalty cards they owned. If a participant has more than one loyalty card, then we know that we

es.

For the first part of this experiment we only included the players who said they had one loyalty card. The hypothesis behind this experiment was that the quality of data would be improved, as the players included would have a higher proportion of their gaming

assume that it will be a perfect

card, or not use their loyalty card all of the time.

For the second part of this experiment we also excluded the players who had a PGSI score between 1 and 7 ('at risk' players), who were previously labelled as non-problem gamblers. The hypothesis behind this experiment was that we would get a clearer differentiation between problem and non-problem gambling behaviour.

The results of these experiments are shown in Figure 21. For the first experiment, the performance curve for the single loyalty card model had a slightly different shape to the player model, but overall the AUC metric demonstrated that the performance was similar. However, for the second experiment we generated an AUC metric of 0.74, a 27% improvement over the player model.

The result of this second experiment is a really interesting finding. If we focus our attention on the bottom left-hand corner of this graph, then with the Single Loyalty Card Model (light blue line) operating at a true positive rate of 16%, a false positive rate of 8% would be achieved. If we used the model generated by the second part of this experiment (the orange line) and operated at the same true positive rate (16%), the false positive rate would be reduced by roughly two-thirds, to 2.6%.

-included in the second model. Reminding ourselves of the definition of a false positive, that is a non-problem gambler who is classified as a problem gambler, we can see that

- who were labelled as non-problem gamblers from the model the number of non-problem gamblers who have been classified as problem

-risk players, we can infer that a -

classified as problem gamblers.

This is an important finding, as it demonstrates that if a majority of the false positives (in this case, potentially two- -these players is not as significant as triggering an intervention to a non-problem gambler with a PGSI score of 0.


Figure 21 - Model Accuracy when only Players with a Single Loyalty Card are included and Players who have an 'at risk' PGSI score are excluded.

PGSI Problem Gambling Threshold

The specification for the Problem Gambling Severity Index prescribes that gamblers who score 8 or above should be categorised as problem gamblers. On further analysis of how this threshold was found, it was determined from a sample of 109 students, of which only 7 out of 9 students had a score of 8 or above. In our sample we have 951 players who score 8 or above, which enables us to investigate further how well the model can distinguish between different levels of PGSI score. Figure 22 shows the number of players with PGSI scores of 8 and above. The graph shows a gradual tailing off in the number of players towards the highest scores on the PGSI scale.

Figure 23 shows the performance of different models when we modify the threshold used to distinguish between problem gamblers and non-problem gamblers. In this case, problem gamblers are defined as those at or above the threshold level and non-problem gamblers as those below the threshold. From this analysis there are two interesting results:

• The highest AUC score is achieved using a threshold of 19, performing particularly well in the true positive range between 50% and 90%. At this threshold, the […]-3788. Further work needs to be carried out to understand why this threshold performs so well and to ensure this is not an anomaly with the data used.
• At the highest threshold applied, 23, we have a comparatively high true positive rate of 26% for a small false positive rate of 1.4%. At this threshold 73 players are labelled as problem gamblers. Although there is a reduced set of players in the positive class, this result illustrates that extreme forms of problem gambling can be identified relatively accurately.

[Figure 21 chart: ROC curves. x-axis: False Positive Rate (0 to 1); y-axis: True Positive Rate (0 to 1). Series: Single Loyalty Card and PGSI between 1 and 7 excluded (AUC: 0.74), Single Loyalty Card (AUC: 0.67), Player Model (AUC: 0.67), Random.]
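The threshold experiment discussed above amounts to relabelling players at each candidate PGSI threshold and re-measuring how well model scores separate the two classes. The sketch below shows the relabelling step and a rank-based AUC calculation (the probability that a randomly chosen positive scores higher than a randomly chosen negative). It is a simplified illustration with hypothetical function names: the report appears to build a separate model per threshold, whereas this sketch only re-scores a fixed set of model outputs.

```python
def labels_for_threshold(pgsi_scores, threshold):
    """Label players 1 if their PGSI score is at or above the threshold, else 0."""
    return [1 if score >= threshold else 0 for score in pgsi_scores]


def auc(model_scores, labels):
    """AUC as the share of positive/negative pairs ranked correctly (ties count half)."""
    positives = [s for s, y in zip(model_scores, labels) if y == 1]
    negatives = [s for s, y in zip(model_scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))
```

As the threshold rises, the positive class shrinks (73 players at a threshold of 23 in this study), so AUC estimates at high thresholds rest on few positive examples and should be read with that in mind.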

Figure 22 - Histogram showing the number of players who have PGSI Scores of 8 and above.

[Figure 22 chart: histogram. x-axis: PGSI Score (8 to 27); y-axis: Number of Players (0 to 140).]

Figure 23 - Model Performance using different thresholds to label Problem Gamblers

[Figure 23 chart: ROC curves. x-axis: False Positive Rate (0 to 1); y-axis: True Positive Rate (0 to 1). Series: Random, PGSCORE=11 (AUC: 0.67), PGSCORE=15 (AUC: 0.66), PGSCORE=19 (AUC: 0.75), PGSCORE=23 (AUC: 0.68).]

To continue on from the previous section, in which the performance of the analytical models is compared when different thresholds are used to distinguish between problem and non-problem gamblers, […] when the non-problem gamblers are always identified by a PGSI score of 0, but the threshold used to define a problem gambler is increased through the range of possible values from 1 to 27.

The result of this analysis is shown in Figure 24. From this figure we can see a general pattern of the ability to predict problem gamblers improving as the PGSI score threshold increases. The maximum AUC score is 0.77 and occurs at thresholds 13, 16 and 19. Interestingly, these thresholds are all three points away from each other, which is the maximum score given to some questions in the PGSI screen.

Figure 24 - Model Performance for different thresholds for defining Problem Gamblers

Predicting PGSI Screening Question Responses

The previous experiment led us to examine whether particular PGSI screening question responses were more predictable from the data than others. For this analysis we […] PGSI Screening Questions is repeated below:

1. How often have you bet more than you could really afford to lose?
2. How often have you needed to gamble with larger amounts of money to get the same feeling of excitement?
3. How often have you gone back another day to try to win back the money you lost?
4. How often have you borrowed money or sold anything to get money to gamble?
5. How often have you felt that you might have a problem with gambling?
6. How often have people criticized your betting or told you that you had a gambling problem, regardless of whether or not you thought it was true?
7. How often have you felt guilty about the way you gamble or what happens when you gamble?
8. How often has your gambling caused you any health problems, including stress or anxiety?
9. How often has your gambling caused any financial problems for you or your household?
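For reference, the way responses to the nine questions above combine into a PGSI score can be sketched as below. This assumes each item is scored from 0 to 3, which is consistent with the 0 to 27 range and the three-point maximum per question mentioned in this report. The group names follow the report's usage (score 0, 'at risk' for 1 to 7, problem gambler for 8 or above) and the function names are hypothetical.

```python
def pgsi_total(responses):
    """Sum the nine item scores (each assumed to be an integer from 0 to 3)."""
    if len(responses) != 9:
        raise ValueError("the PGSI screen has nine questions")
    if any(r not in (0, 1, 2, 3) for r in responses):
        raise ValueError("each response must be scored 0, 1, 2 or 3")
    return sum(responses)


def pgsi_group(total, problem_threshold=8):
    """Map a total score to the groups used in this report."""
    if total >= problem_threshold:
        return "problem gambler"
    if total >= 1:
        return "at risk"
    return "non-problem (score 0)"
```

In the main analysis the 'at risk' group is treated as non-problem gamblers; the additional experiments vary that choice.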

The results of our analysis are presented in Figure 25. Here we can see that questions 2, 6, 8 and 9 are the most predictable, with questions 3 and 4 being the most difficult. Questions 8 and 9 are two of the questions most closely related to gambling-related harm, so it is encouraging that these are highly predictable. Interestingly, question 3, which relates to chasing behaviour, is the second hardest question to predict from the available data.

[Figure 24 chart: x-axis: PGSI Threshold (0 to 30); y-axis: Predictive Model AUC Value (0.50 to 0.80).]

Figure 25 - Predictability of individual PGSI Screening Questions

Gambling Type Analysis

Within the Loyalty Card Survey Report, analysis was completed to identify four clusters describing different gambling types. Detailed descriptions of the clusters can be found in Report 2. A brief description is provided below:

• Cluster 1 - Lowest engaged gamblers
• Cluster 2 - Moderately engaged gamblers
• Cluster 3 - Substantially engaged gamblers
• Cluster 4 - Heaviest engaged gamblers

The results of distinguishing problem gamblers from non-problem gamblers for each of the four clusters are shown in Figure 26. From the overall AUC metrics for each of the clusters, problem and non-problem gamblers are more easily distinguished in Cluster 1 and Cluster 2. These clusters have the lowest levels of engagement across different forms of gambling.

gaming machines, if they have a low engagement across other forms of gambling, then for this group we will be analysing a higher overall proportion of their entire gambling activity. Conversely, for more engaged players, the gaming machine activity that we are analysing will be a lower proportion of their overall gambling activity. This result provides some evidence that to minimise problem gambling, the entire range of gambling products needs to be considered.

For the most engaged gamblers in the study, Cluster 4, it is interesting to observe that in the bottom left hand corner for true positive rate at approximately 15% we are able to reduce the false positives substantially compared to the other gambling types.

[Figure 25 chart: x-axis: PGSI Question (Q1 to Q9); y-axis: Predictive Model AUC (0.5 to 0.66).]

Figure 26 - Predicting problem gambling amongst different gambling types.

Factor Group Analysis

In Report 2, exploratory factor analysis of the PGSI screen was conducted to explore the different types of harm that people may be experiencing. This analysis resulted in two factors being identified:

• Factor Group 1 relates to harmful gambling actions and includes chasing losses, gambling with more money to get the same excitement and betting more than one can afford to lose.
• Factor Group 2 relates to harmful gambling consequences and includes items such as people criticising behaviour, health impacts, financial difficulties or feeling guilty about what happens when the participant gambles.

Within this report we have built a predictive model for the two factors to understand if players associated with one of the factors are more predictable than the other. The results produced by these models are shown in Figure 27. This analysis shows that Factor Group 1, harmful gambling actions, is more predictable (AUC=0.63) than Factor Group 2, harmful gambling consequences (AUC=0.60) by 30%.

[Figure 26 chart: ROC curves. x-axis: False Positive Rate (0 to 1); y-axis: True Positive Rate (0 to 1). Series: Cluster 1 (AUC: 0.67), Cluster 2 (AUC: 0.67), Cluster 3 (AUC: 0.65), Cluster 4 (AUC: 0.63), Random.]

Figure 27 - Predictive Models based on PGSI Response Factor Groups

Debit Card Usage

Towards the end of the time available to complete the research project, we had some success in matching over-the-counter debit card transactions with cash-in events on the gaming machines. The matching rate achieved on the transactions was about 67%. A perfect match was not expected, as not all transfers from an employee will be due to a debit card transaction. From the debit card transactions we calculated six new variables to consider in our player model. The variables were:

• Total amount deposited with a debit card across all sessions.
• Total number of deposits with a debit card across all sessions.
• Number of sessions where a debit card was used.
• Maximum amount deposited with a debit card in a session.
• Maximum number of deposits with a debit card in a session.
• Average value deposited with a debit card in a session.
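As an illustration, the six variables listed above could be derived for one player from a list of matched debit card deposits. The input layout (pairs of session identifier and amount) and the function name are hypothetical. The average is taken here over sessions in which a debit card was used, which is one possible reading of the last variable.

```python
from collections import defaultdict


def debit_card_variables(deposits):
    """Compute the six debit card variables for one player.

    deposits: iterable of (session_id, amount) pairs, one per matched
    debit card deposit (hypothetical layout).
    """
    per_session = defaultdict(list)
    for session_id, amount in deposits:
        per_session[session_id].append(amount)

    if not per_session:  # the player never used a debit card
        return {
            "total_amount": 0, "total_deposits": 0, "sessions_with_debit": 0,
            "max_amount_in_session": 0, "max_deposits_in_session": 0,
            "avg_amount_per_session": 0,
        }

    session_totals = [sum(amounts) for amounts in per_session.values()]
    session_counts = [len(amounts) for amounts in per_session.values()]
    return {
        "total_amount": sum(session_totals),
        "total_deposits": sum(session_counts),
        "sessions_with_debit": len(per_session),
        "max_amount_in_session": max(session_totals),
        "max_deposits_in_session": max(session_counts),
        # Assumption: averaged over sessions where a debit card was used.
        "avg_amount_per_session": sum(session_totals) / len(per_session),
    }
```

For example, deposits of 20 and 30 in one session and 100 in another give a total of 150 across 3 deposits and 2 sessions, with a per-session average of 75.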

Of the 3,988 players included in this analysis, 1,394 used a debit card in at least one session. To measure the predictive power of these variables, we added them to our between-session markers and built a new predictive model. The performance of the between-session marker predictive model with and without the debit card variables is shown in Figure 28. The AUC value for both models was 0.69, showing that these variables did not improve the model.

Only a limited amount of time was available to explore the use of this data. It is still believed that, with further exploration, this data will help improve the accuracy of the model.

[Figure 27 chart: "Predictive Models based on PGSI Response Factor Groups", ROC curves. x-axis: False Positive Rate (0 to 1); y-axis: True Positive Rate (0 to 1). Series: Factor Group 1 (AUC: 0.63), Factor Group 2 (AUC: 0.60), Random.]

Figure 28 - Player model performance using debit card variables

[Figure 28 chart: "Player Model Performance with Debit Card Usage", ROC curves. x-axis: False Positive Rate (0 to 1); y-axis: True Positive Rate (0 to 1). Series: Random, Between Session Variable Player Model (AUC: 0.69), Between Session Variable Player Model with Debit Card Variables (AUC: 0.69).]

Discussion

The key research objective of this study was to determine if it was possible to distinguish between harmful and non-harmful gaming machine play. The Problem Gambling Severity Index (PGSI) score was obtained for 3,988 loyalty card holders (who agreed to data linkage) and was used as a proxy for identifying harmful play. Our analysis shows that it is possible to distinguish between problem gamblers and non-problem gamblers. The key findings are that:

• For the player model, distinguishing between problem and non-problem players was 66% more accurate than the baseline model.
• For the session model, distinguishing between problem and non-problem gambling sessions was 550% more accurate than the baseline model. However, the baseline model only performed slightly better than random.
• […]-positive rate can be significantly improved. This indicates that many of the […]
• Increasing the PGSI score threshold used to distinguish between problem and non-problem gamblers improves detection accuracy. The biggest uplift was found for a PGSI score threshold of 19.
• The responses to each of the questions within the PGSI screen demonstrate different levels of predictability. The most predictable questions are:
  o Q2: How often have you needed to gamble with larger amounts of money to get the same feeling of excitement?
  o Q6: How often have people criticized your betting or told you that you had a gambling problem, regardless of whether or not you thought it was true?
  o Q8: How often has your gambling caused you any health problems, including stress or anxiety?
• The two least predictable questions are:
  o Q3: How often have you gone back another day to try to win back the money you lost?
  o Q4: How often have you borrowed money or sold anything to get money to gamble?
• Problem gamblers who had the lowest engagement across multiple gambling products, but did gamble on gaming machines, were more distinguishable than players who engaged in many gambling products.
• Players who matched the factor for harmful gambling actions were more distinguishable than players who matched the factor for harmful gambling consequences.

Note that the baseline models used to calculate the uplift in model accuracy have been constructed using cash-in and session length. These variables were selected as they are used in the current ABB code of conduct to generate pop-ups. From this research it is not possible to determine the impact of gaming machine play on problem gambling status, as the participants in the study participated in many different forms of gambling.

The players studied in this research represent the more engaged subsection of gaming machine players. Therefore the results presented in this document are likely to represent conservative estimates with regard to the accuracy of distinguishing problem and non-problem players. If these results are operationalised we would expect a reduced false positive rate; that is, fewer non-problem gamblers would be incorrectly identified as problem gamblers, as in general their activity would be less involved.

Can we identify harm?

In this research programme we demonstrated that it is possible to identify harm via the proxy of the Problem Gambling Severity Index. Prior to this research taking place, the Association of British Bookmakers (ABB) had implemented a code of conduct which triggered pop-ups on the gaming machine when either a cash-in limit or a time limit was reached. A baseline model was constructed using these two variables, which delivered an AUC of 0.62. After incorporating additional data elements, a predictive model was developed which delivered an AUC of 0.70. This represented a performance increase of 66%. By applying a stricter definition of problem and non-problem gamblers ([…] define a problem gambler), the highest model performance of 0.77 was achieved. This represents a 125% improvement against our original baseline model.

If we look at other applications of predictive analytics in the gaming sector, we generally see higher detection accuracies. For example, remote operators can identify fraud with an AUC of 0.90 or more, and customer churn⁷ with an AUC of 0.85 or more. But to provide context we need to consider what these models are trying to achieve.

To begin with, in the remote environment we have the ability to measure a player's activity in much more precise detail and cover many more interactions. In comparison, the gaming machine environment allows only a small proportion of customers to be monitored. Not all of the interactions are captured, and uncertainties exist about defining players

When identifying fraud, we have a precise definition of what we are identifying; for example, will this player generate a deposit transaction which will then result in a chargeback from their credit card issuer? Similarly, in dealing with customer churn we have a precise definition: will the particular customer use our product or service in the foreseeable future (such as 2 weeks, 3 months, etc.)? These predictions are heavily biased towards the actions which are taking place within that environment, e.g. "Is the player defrauding me?" or "Is the player not likely to return to my product or service?"

If we compare this to the task of predicting harmful play, we have a broad definition of someone who is spending more time or money than they can afford. To provide a more precise definition we are using the Problem Gambling Severity Index, and in particular treating players who score 8 and above as problem gamblers. The screen has been […] to detect individuals in the general population who have a gambling problem, or who are at risk of developing a problem⁸ […] threshold of 8 is generally well accepted, but is only drawn from a sample of 148 interviews of clinical respondents to a sub-sample drawn from an initial survey. The […] defined as in our other examples.

⁷ Predicting customer churn is an application of predictive analytics where customers who are at risk of no longer gambling with the particular operator are identified. Customers identified by this model are generally targeted with retention offers to keep them as customers.
⁸ Page 7, http://classes.uleth.ca/201201/hlsc3700a/The%20Canadian%20Problem%20Gambling%20Index.pdf

This imperfection has two consequences in the model building process. Firstly, in the model building stage, decisions are made both in the design of input variables and in the training process, which is guided by these labels. Secondly, when evaluating the impact we are again using imprecise labels, so there could be circumstances where the model

The assessment of accuracy in this project is measured against the player, not the

point: a problem gambler may only exhibit harmful play in some forms of gambling. At the same time, the PGSI screen is a broad assessment of their overall activity across multiple products. Therefore if their gambling machine behaviour is not harmful, this increases the error of the prediction. There is a counter-argument that if a player is a problem gambler, this behaviour will be exhibited in all forms of their play.

However, given all of these challenges, we have shown that it is possible to identify problem gamblers. Whilst mode […] improvement to current methods of identification. In […] from predictive models deployed for fraud and customer churn in a remote environment where richer data and a more precise target variable are available.

As a research team, we therefore believe that there is a bright outlook for the application of behavioural analytics to enhance the social responsibility strategy of operators to protect against harmful play on gaming machines. However, operators will need to make trade-offs when identifying problem gamblers, as some non-problem gamblers will also be identified and will therefore receive mistargeted interventions.

Research Implications

From the research that has been completed there are a number of important implications that should be considered.

Multiple Variables

This research demonstrated that a combination of variables needs to be considered in order to distinguish adequately between problem and non-problem gamblers. It was not possible to identify problem gamblers accurately through one variable alone. This suggests that, to help mitigate the impact of problem gambling, the focus should shift away from regulating particular parameters, such as stake size, towards a balanced, rounded approach which considers the player, the product and the environment.

Registered vs Non-Registered Play

Being able to identify problem gamblers from registered play was more successful than identifying problematic sessions in non-registered play. The research team had been in doubt that non-registered play would yield actionable analysis. However, as universal

understand what operators can do in this situation to help fortify their responsible gambling efforts. It was demonstrated that compared to the baseline measurement, it was possible to more accurately identify problematic sessions. Although the accuracy was less than for registered play, we need to acknowledge that not all gaming sessions are likely to exhibit problematic play. The method for measuring accuracy of this model assumed that all sessions generated by problem gamblers exhibited problematic play, leading to a conservative accuracy estimate. Therefore it would be premature to dismiss the potential value of within-session markers to minimise harm on non-registered play.

Mandatory ABB Limits

Through the development of the baseline models some preliminary analysis on the ABB mandatory limits was possible. Through this analysis it was demonstrated that the mandatory limits set within this code are too high. For example, by looking at the average session cash-in value, at a threshold of £250 only 1.3% of the problem gamblers would have been identified.

Research Limitations

There are a number of limitations to the conclusions that can be drawn from this research program. Most importantly, from this research it is not possible to determine the impact of gaming machine play on overall problem gambling status, as the participants in the study engaged in many different forms of gambling.

Limitations have had an impact on our understanding of how the predictive models would perform when operationalised, or decreased our ability to provide better discrimination between the problem and non-problem gamblers. Three of the key limitations are highlighted below.

• Heavily engaged participants: The sample of loyalty card surveys included in this study represented a heavily skewed subset of gaming machine players and their associated sessions. The impact of this is that the performance of the models developed in this report is likely to be a conservative estimate of how they would perform in practice (in particular for the false-positive rates).

• From the loyalty card survey we know that only 49% of the participants either always or almost always used their loyalty card. We also know that on average the participants engaged in 4.8

activity that is studied in this research is limited. With a more complete view of a pladeveloped.

• Incomplete view of a player's gambling decisions: The data that was available for this research project only provided transactional details of the interactions that took place on the machine. This excluded information relating to the selection of bets within the game. Being able to incorporate the risks being taken by a player on each bet is likely to generate a more predictive model.

Future Research

This initial research has just scratched the surface of what is possible. With further research, accuracy can be improved to reduce the impact of necessary trade-offs when operationalising the predictive models described in this report. There are numerous areas of investigation that are likely to improve accuracy. Four of the key areas are discussed below:

• Better target variable: The PGSI screen has been used as a proxy in this analysis for identifying harmful gaming machine play. In this report we have highlighted several deficiencies with this screen which are likely to contribute to inaccuracies of the developed models. These deficiencies include the choice of threshold for determining a problem gambler and questions that refer to harmful gambling consequences rather than harmful gambling actions. It is unrealistic to expect that a model built on analysing gaming machine data will accurately predict harmful gambling consequences. By developing a target variable which is more closely aligned to harmful gaming machine behaviour, we are likely to generate a significantly more accurate model and be able to more accurately target interventions.

• More variables: Through the process of delivering this research programme we have learnt significantly more about how players interact with gaming machines. For example, in the patterns of play report we see that higher stakes and spend typically occur on mixed B2 and B3 content sessions. Therefore there is significant scope to develop additional variables based on our enhanced knowledge of gaming machine behaviour.

• Improved operational definitions of existing variables: When reviewing the variables that were most informative to the predictive models we were surprised that variables relating to some markers were absent, for example game volatility and chasing. It would be premature to rule these markers out as being important for identifying harmful play. Through further analysis of the data, the way these variables are defined could be improved, providing further insight into how the machines are used and facilitating improvements to the predictive models.

• Improved measurement of existing variables: A large range of variables was considered in this research project. However, there was not sufficient time to further investigate how the measurement of these variables could be improved. For example, with the debit-card data, by working more closely with the operators it may be possible to provide higher matching rates and also incorporate declined debit card transactions. Improvements could also be made by grouping together gaming machine sessions into a higher level proxy. This would enable more behaviours around movement between machines in a venue to be further understood.

Whilst delivering this research program a number of questions came out of discussions with the research team when considering how the predictive models could be operationalised. The main questions are summarised below:

• What is the impact of delivering a harm minimisation intervention to a non-problem gambler? In this research project, we have identified that trade-offs need to be made when using a predictive model to identify problem gamblers, in that some non-problem gamblers will be incorrectly identified as problem gamblers. The impact, both commercially from an operator's perspective, and from an enjoyment perspective of a player, is not yet fully understood. Understanding the nature of any potential impact would enable operators to make a more informed decision when deciding how to operationalise the predictive models.

• Developed targeted interventions based on problem gambling behavioural subtypes. From the complexities of the developed models, we know that problematic gambling takes multiple forms. Further exploring the behaviours of the different types of problem gamblers would enable targeted interventions to be developed, which through testing and evaluation would enhance our ability to minimise gambling related harm.

• How could behaviour across multiple gambling products be utilised? The research presented in this report only considers gambling activity on gambling machines. We know that the survey participants engage in multiple forms of gambling. Understanding how players transition between different forms of gambling would provide useful insight into how harmful gambling play could be reduced for a player across multiple gambling products.

Recommendations

When considered against previous gambling research that has been conducted in the last 40 years, this research program (executed over six months) represents one of the largest step changes in knowledge about problem gambling behaviour. One of the key ways in which this has been achieved is through the collaboration of a number of organisations with a variety of backgrounds. Based on what has been learnt through the analysis of the data, and to maintain the momentum gained from this research, we have identified 5 key recommendations:

• Live Trials: The results presented in this research show that it is possible to identify problem gamblers using behavioural modelling. To further validate this result, it is important to operationalise the results of this research. It would be a missed opportunity if all of the learning remained locked in the pages of reports geneenables problem gamblers to be identified, but rather a variety of factors combine to enable problem gambling behaviours to be identified. It follows that interventions will therefore be ineffective if they focus on addressing one particular variable, such as stake size. The identification of harmful play on a gaming machine is only one step to a final objective of being able to generate interventions on the machine which would not only be effective in reducing problem gambling, but also not detract from the experience of non-problematic gamblers. Being able to focus interventions on players who are most likely to be at risk means that different types of interventions can be targeted to players and their performance evaluated. It is important that this is done in a test-and-learn cycle so that our understanding of the efficacy of various types of interventions continues to evolve.

• Continued Industry Involvement: One of the key successes of this research project has been the involvement of the industry. Our analysis of gaming machine play has only scratched the surface of the full range of gambling products available to consumers. By enriching our understanding beyond gaming machines, further insight into how to ensure a safer gambling environment for gamblers will be gained. For example, when a player self-excludes, or complains, it would be useful to understand why the player decided to exclude and to get permission to use their data for research.

• Treatment Provider Involvement: One of the fundamental aspects of this

loyalty card data. Such a time-consuming data collection process would be further enriched with data available from a variety of treatment providers. To enable the continued understanding of problem gambling behaviour, building a usable knowledge base from treatment providers and gaining consent to obtain and link industry data would be an invaluable asset.

• Continued Data Exploration: Analysis of industry data has only taken place at the final stages of the research program. Further exploration into the potential of what this data provides will help improve the models and uncover additional insights into gambling behaviour.

• Review Screening Tools: We now have a significant amount of information on which to further analyse the PGSI screening tools, in particular the weighting and scoring of the individual questions. More development of these screening tools will enable further understanding of some of the deeper relationships between the individual questions and the behaviour observed from the players.

Conclusion

The objective of this research program was to answer the following two challenges from the Responsible Gambling Strategy Board:

• Is it possible to distinguish between harmful and non-harmful gaming machine play?

• If so, what measures might limit harmful play without impacting those who do not exhibit harmful behaviours?

The focus of this research has been to answer the first question. To meet this goal, we have worked with the industry to obtain data relating to the activity of players on their machines, as well as loyalty card holders. This enables players to be screened for problem gambling using the PGSI screen. Report 2 in this series of documents describes the methodology and results of that exercise. This report describes the methodology and results of combining the PGSI screening data with the industry data to identify problem gamblers.

The results of the analysis show that using the PGSI Screen as a proxy for measuring harmful play, it is possible to distinguish between harmful and non-harmful gaming machine play. This is the first time that this type of analysis has been performed on a large sample across multiple operators and therefore this result marks a significant step forward in the progress towards understanding problem gambling and more general gambling behaviours.

To measure the accuracy of the analytical models, a baseline metric was produced based on the current measures used by the Association of British Bookmakers to generate pop-up interventions on their gaming machines. The AUC value for measuring accuracy of this baseline model was 0.62. The best model that used the PGSI definition of problem gambling generated an AUC score of 0.70. By using this model, 10-15% more problem gamblers would be identified with the same false positive rate; that is, there would be no increase in non-problem gamblers being flagged as problem gamblers. When looking at individual sessions, the baseline model generated an AUC value of 0.52. The model built during this research generated an AUC value of 0.63. This effectively enables the proportion of detected problem gamblers to be improved by 15%, or, alternatively, the proportion of non-problem gamblers identified as problem gamblers to be reduced by 15%, a significant improvement when compared to previous methods of measurement.
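The comparison above rests on two quantities: the AUC of a scoring model, and the share of problem gamblers detected at a fixed false positive rate. The sketch below shows one way both can be computed from scores and labels. It is illustrative only: the function names and the toy data are assumptions made for this example and are not taken from the research data or from the models built in the study.

```python
def split_by_label(scores, labels):
    """Separate scores of problem gamblers (label 1) from non-problem gamblers (label 0)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    return pos, neg


def auc(scores, labels):
    """AUC as the chance that a random positive outscores a random negative (ties count half)."""
    pos, neg = split_by_label(scores, labels)
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def tpr_at_fpr(scores, labels, max_fpr):
    """Highest true positive rate achievable while keeping the false positive rate <= max_fpr."""
    pos, neg = split_by_label(scores, labels)
    best = 0.0
    for threshold in set(scores):
        fpr = sum(n >= threshold for n in neg) / len(neg)
        if fpr <= max_fpr:
            tpr = sum(p >= threshold for p in pos) / len(pos)
            best = max(best, tpr)
    return best


# Hypothetical toy data: 1 = problem gambler, 0 = non-problem gambler.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
baseline_scores = [0.9, 0.4, 0.3, 0.2, 0.8, 0.5, 0.1, 0.1]
model_scores = [0.9, 0.7, 0.6, 0.2, 0.8, 0.3, 0.1, 0.1]

for name, scores in [("baseline", baseline_scores), ("model", model_scores)]:
    print(name, auc(scores, labels), tpr_at_fpr(scores, labels, max_fpr=0.25))
```

Holding the false positive rate fixed and comparing the true positive rate mirrors the way the uplift over the baseline is expressed in this report.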

Additional experiments were run on the data which produced some interesting findings. The first was that increasing the PGSI threshold from 8 to higher values produced an early uplift in being able to identify a subset of problem gamblers very accurately. For example, at a PGSI threshold of 23, 26% of the problem gamblers could be identified with a false positive rate of 1.4%. The objective of the model was also modified to distinguish between players who had a problem gambling score of 0 (i.e. a non-problem gambler), and those who had a problem gambling score above a particular threshold. The accuracy of the predictive models was measured for the range of possible PGSI scores. At thresholds of 13, 16 and 19, an AUC score of 0.77 was achieved. This is a significant uplift against earlier models and provides an interesting insight for further analysis of the PGSI screen.

When investigating the predictability of individual PGSI screening questions, the following three were the most predictive:

• Q2: How often have you needed to gamble with larger amounts of money to get the same feeling of excitement?

• Q6: How often have people criticized your betting or told you that you had a gambling problem, regardless of whether or not you thought it was true?

• Q8: How often has your gambling caused you any health problems, including stress or anxiety?

The two worst performing questions are:

• Q3: How often have you gone back another day to try to win back the money you lost?

• Q4: How often have you borrowed money or sold anything to get money to gamble?

The result of Q3 being the second least predictive has a potential ramification for the ability to be able to identify chasing behaviour from this gambling product.

Finally, after reflecting on the results of this research, we have made the following 5 recommendations to take these positive results forward, both within the industry and for further research:

• Live Trials: The results of this research should be evaluated in a live environment so the effectiveness can be more accurately measured;

• Continued Industry Involvement: The collaboration with the industry in the execution of this project has been one of the keys to its success. This relationship between industry and researchers should be further developed to other operators within the industry;

• Treatment Provider Involvement: The ability to accurately identify problem gambling behaviour within the data has been a fundamental component of this research. To further enrich this data set would enable research to continue to evolve as gambling behaviours and products evolve;

• Continued Data Exploration: This research project was limited by time constraints, rather than exhausting the range of ideas that the research team has for investigating the data. Through further investigation, additional insights will appear to enable an enriched understanding of gambling behaviour, and in particular harmful gambling behaviour;

• Review Screening Tools: This research has highlighted some of the limitations of the existing screening tools. Using this data set, it would be possible to further analyse the PGSI screening tool and potentially identify new weights for defining gambling categories or alternatively different weights for each of the questions.

About Featurespace

Featurespace is a UK technology company at the vanguard of predictive analytics, pioneering the next level of data analysis: Adaptive Behavioural Analytics. We combine the very latest research in statistics and data analysis with a unique method of modelling human behaviour. Its core ARIC™ technology is a revolutionary approach to accurately predicting what individuals and dynamic groups of people will do, in real time. Featurespace has deployed a series of award-winning products for fraud and risk management, as well as customer insight and retention, and is recognised as an industry authority on responsible gambling and player protection.

To find out more, visit http://www.featurespace.co.uk/

About RTI

RTI International is one of the world's leading research institutes, dedicated to improving the human condition by turning knowledge into practice. Its staff of more than 3,700 provides research and technical expertise to governments and businesses in more than 75 countries in the areas of health and pharmaceuticals, education and training, surveys and statistics, advanced technology, international development, economic and social policy, energy and the environment, and laboratory and chemistry services. RTI has established itself as a central player in expanding knowledge about the consequences of substance abuse and the efficacy of programs that combat it. Its substance use and mental health research program emphasizes the development of improved methods of measuring substance abuse and its consequences in high-risk populations. RTI uses innovative predictive analytics methods to evaluate the impact of policies and interventions.

RTI gambling analytics team: Georgiy Bobashev, Robert J. Morris, Paul Ruddle

References

Hoffer L., Bobashev G.V., Morris R.J. (2011) Simulating patterns of heroin addiction within the social context of a local heroin market. In Gutkin B. & Ahmed S. (Eds.), The computational neuroscience of drug addiction. Springer Verlag. pp. 313-331

Bobashev G., Liao D., Hampton J., Helzer J. (2014) Individual patterns of alcohol use. Addictive Behaviors 39(5), 934-940

Fagerström K. Time to first cigarette; the best single indicator of tobacco dependence? Monaldi Arch Chest Dis. 2003;59:91-4.

Transdisciplinary Tobacco Use Research Center (TTURC) Tobacco Dependence. Baker TB, Piper ME, McCarthy DE, et al. Time to first cigarette in the morning as an index of ability to quit smoking: implications for nicotine dependence. Nicotine Tob. Res. 2007;9:S555-70.

Document Information

Document History

| Version | Date | Modified By | Comments |
|---|---|---|---|
| 0.1 | 31-October-2014 | David Excell | First draft report sent to RGT for peer review by MROP2. |
| 0.2 | 28-November-2014 | David Excell | Final version for publication on the RGT website. |

Appendix A - Calculating Proxy Sessions

To calculate the proxy sessions, Featurespace developed an algorithm to score each

To identify the optimal threshold for determining the start of a new session, the accuracy of the score was measured against sessions defined by the use of a loyalty card. The result of this process is shown below in Figure 29.

Figure 29 - ROC Curve for the Proxy Session detection Algorithm

For the results presented in this report, the threshold of 0.35 was selected. This threshold delivers a true positive rate of 87.3% with a corresponding false positive rate of 11.8%. If a lower threshold was chosen (moving further right along the ROC curve), the false positive rate would increase, resulting in an overall reduction in the reported session lengths. We are confident that the setting selected provides an optimal equilibrium between short and long sessions.
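The trade-off described above can be illustrated by sweeping the threshold and recomputing both rates against card-defined session starts. This is a minimal sketch under stated assumptions: `rates_at_threshold` is a hypothetical helper, the scores and the true session-start flags are invented for illustration, and the report does not describe how the score itself is calculated.

```python
def rates_at_threshold(scores, is_true_start, threshold):
    """Return (true positive rate, false positive rate) when events scoring
    above `threshold` are flagged as the start of a new session."""
    flagged = [score > threshold for score in scores]
    true_positives = sum(f and t for f, t in zip(flagged, is_true_start))
    false_positives = sum(f and not t for f, t in zip(flagged, is_true_start))
    positives = sum(is_true_start)
    negatives = len(is_true_start) - positives
    return true_positives / positives, false_positives / negatives


# Hypothetical event scores and whether a loyalty card marked a real session start.
scores = [0.58, 0.08, 0.04, 0.00, 0.38, 0.04, 0.36, 0.10, 0.58, 0.20]
is_true_start = [True, False, False, False, True, False, False, False, True, False]

for threshold in (0.40, 0.35, 0.15):
    tpr, fpr = rates_at_threshold(scores, is_true_start, threshold)
    print(f"threshold={threshold:.2f}  TPR={tpr:.1%}  FPR={fpr:.1%}")
```

In this toy data, lowering the threshold never reduces the true positive rate but adds false session starts, which is what splits long sessions into shorter ones.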

To provide an illustration of how the Proxy Session process works, a sample of activity from one gaming machine is provided in Table 12. In this table, the first 6 columns represent data that has been provided by the industry. Of particular interest is the first column, which indicates the events when the player has their card inserted into a machine. The last three columns illustrate the data which is added when calculating the proxy sessions. Each of these columns is defined as:

• Proxy Session Score: the score to which a threshold can be applied to determine if a new session has started.

• Session ID: the unique identifier assigned to the session.

• Proxy Session PlayerID: the player ID that is now associated with each of the events based on the extent of the newly defined session.

[Figure 29: ROC curve titled "Indicative Proxy Session Detection Accuracy". x-axis: False Positive Rate (0% to 100%); y-axis: True Positive Rate (0% to 100%). Annotated operating point generating a true positive rate of 87.3% and a false positive rate of 11.8%.]

The table has been shaded so that alternate sessions are highlighted in a different colour. The derived columns have been shaded in a darker colour. A Proxy Session Score above the threshold of 0.35 has been used to define a new session. In this particular example we can see that in the second session (with ID 987655), we have extended the player ID to include the Cash Out transaction. It is interesting to note that in this example there was a 9 minute gap of inactivity between the player putting the money into the machine and then deciding to take it all out.

It is also interesting in this example to observe how choosing a higher threshold could have impacted the analysis. For example, if a threshold of 0.4 had been selected, that is, only proxy session scores above 0.4 are used to indicate new sessions, then the first

resulted in sessions 987655 and 987656 being merged and player 123456 being mapped to the following 4 stakes (where Action = Play).

| PlayerID | Timestamp | Value | Balance | Action | Game | Proxy Session Score | Session ID | Proxy Session PlayerID |
|---|---|---|---|---|---|---|---|---|
| | 09:18 | -1160 | 1180 | Play | Roulette | 0.00 | 987654 | |
| | 09:18 | 720 | 1900 | Win | Roulette | 0.00 | 987654 | |
| | 09:18 | -1160 | 740 | Play | Roulette | 0.00 | 987654 | |
| | 09:18 | 1260 | 2000 | Win | Roulette | 0.00 | 987654 | |
| | 09:19 | -1160 | 840 | Play | Roulette | 0.00 | 987654 | |
| | 09:19 | 1440 | 2280 | Win | Roulette | 0.00 | 987654 | |
| | 09:19 | -1160 | 1120 | Play | Roulette | 0.00 | 987654 | |
| | 09:20 | -1120 | 0 | Play | Roulette | 0.00 | 987654 | |
| 123456 | 12:53 | 1000 | 1000 | CashIn | | 0.58 | 987655 | 123456 |
| 123456 | 12:53 | 1000 | 2000 | CashIn | | 0.08 | 987655 | 123456 |
| 123456 | 12:54 | 1000 | 3000 | CashIn | | 0.04 | 987655 | 123456 |
| | 13:01 | -3000 | 0 | CashOut | | 0.00 | 987655 | 123456 |
| | 13:05 | 200 | 200 | CashIn | Roulette | 0.38 | 987656 | |
| | 13:05 | 10 | 210 | CashIn | Roulette | 0.04 | 987656 | |
| | 13:05 | -210 | 0 | Play | Roulette | 0.00 | 987656 | |
| | 13:05 | 360 | 360 | Win | Roulette | 0.00 | 987656 | |
| | 13:06 | 20 | 380 | CashIn | Roulette | 0.02 | 987656 | |
| | 13:06 | -380 | 0 | Play | Roulette | 0.00 | 987656 | |
| | 13:06 | 500 | 500 | CashIn | Roulette | 0.10 | 987656 | |
| | 13:07 | -480 | 20 | Play | Roulette | 0.00 | 987656 | |
| | 13:07 | 200 | 220 | CashIn | Roulette | 0.16 | 987656 | |
| | 13:07 | -220 | 0 | Play | Roulette | 0.00 | 987656 | |
| | 13:29 | 500 | 500 | CashIn | Slots | 0.58 | 987657 | |
| | 13:29 | -20 | 480 | Play | Slots | 0.00 | 987657 | |

Table 12 - Example application of the Proxy Session Algorithm. The unit of the value and balance fields is pence.
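The way the derived columns follow from the Proxy Session Score can be sketched as below, using the scores and player IDs from Table 12. The functions `assign_sessions` and `propagate_player` are hypothetical names introduced for this illustration. The rule of copying a player ID to every event in the same session is an assumption based on the column description above; it is not the Featurespace implementation, and the calculation of the score itself is not covered.

```python
def assign_sessions(scores, threshold, first_id):
    """Start a new session ID whenever an event's score exceeds the threshold."""
    session_ids, current = [], first_id
    for index, score in enumerate(scores):
        if index > 0 and score > threshold:
            current += 1
        session_ids.append(current)
    return session_ids


def propagate_player(player_ids, session_ids):
    """Copy the first player ID seen in a session to every event in that session."""
    known = {}
    for player_id, session_id in zip(player_ids, session_ids):
        if player_id is not None:
            known.setdefault(session_id, player_id)
    return [known.get(session_id) for session_id in session_ids]


# Proxy Session Score and PlayerID columns from Table 12, in row order.
scores = ([0.00] * 8 + [0.58, 0.08, 0.04, 0.00]
          + [0.38, 0.04, 0.00, 0.00, 0.02, 0.00, 0.10, 0.00, 0.16, 0.00]
          + [0.58, 0.00])
player_ids = [None] * 8 + [123456] * 3 + [None] * 13

ids_035 = assign_sessions(scores, threshold=0.35, first_id=987654)
ids_040 = assign_sessions(scores, threshold=0.40, first_id=987654)

print(len(set(ids_035)), "sessions at 0.35;", len(set(ids_040)), "sessions at 0.40")
print(propagate_player(player_ids, ids_035).count(123456), "events mapped to the player at 0.35")
print(propagate_player(player_ids, ids_040).count(123456), "events mapped to the player at 0.40")
```

At a threshold of 0.40 the event scoring 0.38 no longer starts a new session, so the card-holder's session absorbs the roulette play that follows it. Note that this sketch renumbers later sessions after a merge, so its IDs at 0.40 do not match the labels printed in the table.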

It is important to note the impact that the definition of the proxy session has in the

indeed multiple visits on a given day. For example, if the proxy sessions are too short we will see overall reductions in total staking levels, reloading, and changes in games. Conversely, if the proxy sessions are too long, then we will be asserting that players are spending more money and time on the machines than what is actually occurring.

Appendix B - Measurement of Harm Markers

This appendix outlines how Featurespace has calculated each of the markers of plausible harm from the preliminary dataset (1-September-2013 to 30-November-2013). We have included histograms to show the distributions of each of the variables used to describe individual markers. Each of the markers has been converted into a number of variables for exploration. Combinations of variables form a harm marker, or metric, for the purpose of analysis.

These histograms have been scaled to show 95% of the complete range of values for - -tailed data

contains extreme values which occur more frequently than are expected from a more

For heavy-median value, rather than the mean value, for more accurate interpretation. The median value is calculated by sorting the data and selecting the value in the middle. The mean value is calculated by summing all of the values and dividing by the number of values present, and is therefore subject to distortion by a few values which are significantly different to the majority.
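The difference between the two summary statistics described above can be seen on a small, skewed sample. The numbers below are invented for illustration and are not taken from the research data.

```python
from statistics import mean, median

# Hypothetical session counts for ten players; one player is far more active than the rest.
sessions_per_player = [1, 2, 2, 3, 5, 5, 8, 15, 40, 400]

print("mean:  ", mean(sessions_per_player))
print("median:", median(sessions_per_player))

# Dropping the single extreme player moves the mean a long way but barely moves the median.
without_extreme = sessions_per_player[:-1]
print("mean without extreme value:  ", mean(without_extreme))
print("median without extreme value:", median(without_extreme))
```

A single extreme player dominates the mean, while the median stays at the level of a typical player, which is why the median is the more robust summary for data of this shape.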

If the reader is interested in understanding the general behaviours of machine players, they should refer to the Patterns of Play report included in this research programme.

Between Session Metrics

sessions associated with a registered player. These values, extracted from 3 months of data ranging from 1-Sep-2013 to 30-Nov-2013, have then been aggregated so that a value for each of the defined outputs is generated for each registered player.

Some of the common errors within this data set concern:

• A registered player may not always be using his/her loyalty card, or may share a loyalty card with an unregistered player.

• Players may visit different operators.

• Registered players may be duplicated if they are registered in more than one LBO.

• Two of the operators introduced their player cards a few months prior to the time period represented by this data, so it may not represent stable behaviour.

• Customers who have only used their card once during a session may skew the data to an unknown extent.

1) Frequency of Play

Aim

The aim of this metric is to understand how frequently a player uses a gaming machine.

Measurement

discussed in previous sections. A player may have multiple sessions on any given day.

Outputs

The following variables are calculated for this marker of harm:

1. Total number of sessions between 1-Sept and 30-Nov 2013
2. Maximum number of sessions on any one day
3. Average daily sessions
4. Maximum number of sessions in any one week
5. Average weekly sessions
6. Maximum number of sessions during a month
7. Average monthly sessions
8. Average number of days between sessions
9. Maximum length of successive playing days
10. Shortest gap between playing sessions
11. Longest gap between playing sessions
12. Average gap between playing sessions
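Several of the outputs listed above can be derived from nothing more than a player's session start times. The sketch below is one possible reading, not the calculation used in this research: `frequency_metrics` is a hypothetical function, gaps are measured from one session start to the next, and the daily average is taken over days with at least one session. Both of those definitions are assumptions, since the exact operational definitions are not given here.

```python
from collections import Counter
from datetime import datetime, timedelta


def frequency_metrics(session_starts):
    """Summarise how often a player starts sessions, from a list of datetimes."""
    if not session_starts:
        raise ValueError("at least one session is required")
    starts = sorted(session_starts)
    sessions_per_day = Counter(start.date() for start in starts)
    active_days = sorted(sessions_per_day)

    # Longest run of consecutive calendar days with at least one session.
    longest_run = current_run = 1
    for previous, current in zip(active_days, active_days[1:]):
        current_run = current_run + 1 if (current - previous).days == 1 else 1
        longest_run = max(longest_run, current_run)

    # Assumption: a gap runs from one session start to the next session start.
    gaps = [later - earlier for earlier, later in zip(starts, starts[1:])]
    return {
        "total_sessions": len(starts),
        "max_daily_sessions": max(sessions_per_day.values()),
        "avg_daily_sessions": len(starts) / len(active_days),  # active days only
        "max_consecutive_days": longest_run,
        "shortest_gap": min(gaps) if gaps else None,
        "longest_gap": max(gaps) if gaps else None,
        "average_gap": sum(gaps, timedelta()) / len(gaps) if gaps else None,
    }


# Hypothetical player with four sessions in September 2013.
example = [
    datetime(2013, 9, 2, 10, 0),
    datetime(2013, 9, 2, 15, 0),
    datetime(2013, 9, 3, 9, 0),
    datetime(2013, 9, 10, 9, 0),
]
metrics = frequency_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value}")
```

The weekly and monthly variants follow the same pattern, with the calendar week or month used as the grouping key instead of the date.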

Errors

The challenge with this measure is that the frequency may not be constant, and therefore aggregation transformations may hide specific increases in the rate of activity. As an example, when calculating the number of sessions over a four week period, the two players below would look the same:

| Player | Week 1 | Week 2 | Week 3 | Week 4 | Total |
|---|---|---|---|---|---|
| Player 1 | 1 | 1 | 1 | 1 | 4 |
| Player 2 | 1 | 3 | 0 | 0 | 4 |
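The two players in this example can be told apart once a second summary is kept alongside the total. The snippet below uses the weekly counts from the example; the choice of extra summaries (`max_week` and `active_weeks`) is an illustrative assumption rather than a variable set defined in this report.

```python
# Weekly session counts for the two players in the example above.
weekly_sessions = {
    "Player 1": [1, 1, 1, 1],
    "Player 2": [1, 3, 0, 0],
}


def summarise(counts):
    """Keep the total alongside summaries that are sensitive to bursts of activity."""
    return {
        "total": sum(counts),
        "max_week": max(counts),
        "active_weeks": sum(count > 0 for count in counts),
    }


summaries = {player: summarise(counts) for player, counts in weekly_sessions.items()}
for player, summary in summaries.items():
    print(player, summary)
```

The totals match, but the maximum weekly count and the number of active weeks differ, which is why maximum and average variants are calculated for each period.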

The accuracy of the proxy sessions will also have an impact on some of the variables, in particular the maximum number of daily sessions and the shortest gap between sessions. In these cases, a session may be mistakenly separated into two different sessions, resulting in the incorrect observation of more activities that are closer together.

Results

From the detailed results below, during the 3 month period a typical player will have 5 sessions and at the most 3 in a week, or 4 in a month. However, at the 90th percentile a player will have 40 sessions with up to 8 sessions in a day, 14 in a week and 25 in a month.

The histogram above shows the total number of sessions between 1-Sept and 30-Nov 2013. The median value is 5 sessions. The number of sessions at the 10th and 90th percentile is 1 and 40 sessions respectively.

The histogram above shows the maximum number of sessions from a player in any one day. The median value is 2 sessions. The number of sessions at the 10th and 90th percentile is 1 and 8 respectively.

The histogram above shows the average number of sessions per player per day. The median value is 1 session. The number of sessions at the 10th and 90th percentile is 1 and 3 respectively.

The histogram above shows the maximum number of sessions in a week for each player. The median value is 3. The number of weekly sessions at the 10th and 90th percentile is 1 and 14 respectively.

The histogram above shows the average number of sessions in a week for a player. The median weekly sessions is 2. The number of sessions at the 10th and 90th percentile is 1 and 7 respectively.

The histogram above shows the maximum number of monthly sessions for a player. The median value is 4. The number of sessions at the 10th and 90th percentile is 1 and 25 respectively.


The histogram above shows the average number of monthly sessions for a player. The median value is 3. The number of sessions at the 10th and 90th percentile is 1 and 18 respectively.

The histogram above shows the average number of days between sessions for each player. The median value is 5. The number of days at the 10th and 90th percentile is 2 and 19 respectively.

The histogram above shows the maximum number of consecutive days that a player has played the machines. The median value is 1. The values at the 10th and 90th percentile are 1 and 4 respectively.

The histogram above shows the shortest time gap between player sessions. The median value is 4:22:57 (hours:minutes:seconds). The values at the 10th and 90th percentile are 0:33:19 and 9 days 0:42:20 respectively.

The histogram above shows the longest gap between player sessions. The median value is 11 days. The values at the 10th and 90th percentile are 0:42:22 (hours:minutes:seconds) and 32 days respectively.

The histogram shows the average gap between player sessions. The median value is 4 days. The values at the 10th and 90th percentile are 19:33:43 and 16.15:14:19 respectively.


               1. Number of     2. Maximum       3. Average       4. Maximum       5. Average
               Sessions         Daily Sessions   Daily Sessions   Weekly Sessions  Weekly Sessions
Mean           16               4                2                6                3
Median         5                2                1                3                2
Percentile 5   1                1                1                1                1
Percentile 10  1                1                1                1                1
Percentile 25  2                1                1                1                1
Percentile 50  5                2                1                3                2
Percentile 75  15               4                2                6                4
Percentile 90  40               8                3                14               7
Percentile 95  69               11               4                21               10

               6. Maximum        7. Average        8. Average Days   9. Maximum Successive
               Monthly Sessions  Monthly Sessions  Between Sessions  Playing Days
Mean           10                8                 8                 2
Median         4                 3                 5                 1
Percentile 5   1                 1                 1                 1
Percentile 10  1                 1                 2                 1
Percentile 25  1                 1                 3                 1
Percentile 50  4                 3                 5                 1
Percentile 75  11                8                 10                2
Percentile 90  25                18                19                4
Percentile 95  40                29                27                6

Gaps are given as days.hours:minutes:seconds.

               10. Shortest Gap   11. Longest Gap    12. Average Gap
               Between Sessions   Between Sessions   Between Sessions
Mean           3.10:11:21         14.03:28:17        7.05:38:52
Median         0.04:22:57         10.23:51:19        4.06:28:47
Percentile 5   0.00:31:20         0.00:02:50         0.07:27:43
Percentile 10  0.00:33:19         0.00:59:04         0.19:33:43
Percentile 25  0.00:47:06         3.20:12:04         1.20:50:56
Percentile 50  0.04:22:57         10.23:51:19        4.06:28:47
Percentile 75  2.00:42:51         20.14:39:52        8.16:36:28
Percentile 90  9.00:42:20         32.02:48:56        16.15:14:19
Percentile 95  18.21:04:09        41.21:43:28        24.12:22:56


2) Duration of Play Aim To understand how long a player is at a machine for any particular session.

Measurement The length of a session is defined as the time difference between the timestamps at the start and end of the session. The unit of measurement will be seconds, and the start and end of the sessions will be determined by the proxy sessions.

Outputs The following variables are calculated for this marker of harm:

1. The longest session of play
2. The mean duration of play
3. The variance of session length
4. Total amount of play over the period 1-Sept to 30-Nov
5. Average daily duration of play
6. Maximum duration of play on a day
7. Average weekly duration of play
8. Maximum duration of play in a week
9. Average monthly duration of play
10. Maximum duration of play in a given month
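The duration variables can be sketched from a list of proxy-session start and end timestamps. This is an illustration only: the example timestamps are invented, each session is attributed to the day on which it starts, and the population variance is used; none of these choices is confirmed by the report.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, pvariance

# Hypothetical proxy sessions for one player: (start, end) timestamps.
sessions = [
    (datetime(2013, 9, 2, 12, 0, 0), datetime(2013, 9, 2, 12, 10, 0)),
    (datetime(2013, 9, 2, 18, 0, 0), datetime(2013, 9, 2, 18, 30, 0)),
    (datetime(2013, 9, 9, 9, 0, 0), datetime(2013, 9, 9, 9, 20, 0)),
]

def duration_metrics(sessions):
    """Summarise session durations in seconds for one player."""
    durations = [(end - start).total_seconds() for start, end in sessions]
    # Assumption: a session counts towards the day on which it started.
    per_day = defaultdict(float)
    for (start, _end), seconds in zip(sessions, durations):
        per_day[start.date()] += seconds
    daily_totals = list(per_day.values())
    return {
        "longest_session": max(durations),
        "mean_session": mean(durations),
        "session_variance": pvariance(durations),  # assumption: population variance
        "total_play": sum(durations),
        "average_daily_play": mean(daily_totals),
        "maximum_daily_play": max(daily_totals),
    }

metrics = duration_metrics(sessions)
print(metrics)
```

Weekly and monthly variables would follow the same pattern, grouping by calendar week or month instead of by date.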

Errors The errors will be similar to proxy sessions: a player may have used all available funds, then left the machine to re-load. In this circumstance, a new session would be identified for the same player.

The length of a session is likely to be determined by the amount of money a customer deposits and the size of any wins experienced by the customer.

Results From the detailed results below for a typical player, we can see that the average session length is 0:12:53 and the median of each player's longest session is 0:32:30. The typical playing times over a day, week and month are 0:23:20, 0:32:39 and 0:49:53 respectively. In each of the histograms below the unit of measurement of the horizontal axis is hours.

The histogram above shows the longest playing session for each player. The median value is 0:32:30. The values at the 10th and 90th percentile are 0:02:15 and 11:01:19 respectively.

The histogram shows the mean session duration for each player. The median value is 0:12:53. The values for the 10th and 90th percentile are 0:01:30 and 1:08:06 respectively.

This histogram shows the variance of a player's session length. The median value is 0:07:04. The values for the 10th and 90th percentiles are 0:00:00 and 1:32:04.

This histogram shows the total amount of time a player has played. The median value is 1:10:26. The values at the 10th and 90th percentiles are 0:02:48 and 19:14:25.

This histogram shows the average amount of daily play for each player. The median amount of time is 0:23:20. The values at the 10th and 90th percentiles are 0:01:40 and 2:11:34.

This histogram shows the maximum amount of time a player has played on any given day. The median amount of time is 0:42:38. The values at the 10th and 90th percentiles are 0:02:20 and 11:17:44.

This histogram shows the average amount of playing time in a weekly period. The median amount of time is 0:32:39. The values at the 10th and 90th percentiles are 0:01:49 and 4:05:21.

This histogram shows the maximum amount of time a player has played in any given week. The median amount of time is 0:50:15. The values at the 10th and 90th percentiles are 0:02:25 and 12:42:06.


This histogram shows the average amount of time a player has played over a month. The median amount of time is 0:49:53. The values at the 10th and 90th percentiles are 0:02:20 and 9:59:17.

The histogram shows the maximum amount of time a player has played over a month. The median amount of time is 1:00:39. The values at the 10th and 90th percentiles are 0:02:37 and 15:19:44.

               1. Longest         2. Average         3. Session Length  4. Total           5. Average Daily
               Sessions           Session Length     Variance           Playing Time       Playing Time
Mean           0.02:16:59         0.00:33:07         0.00:32:51         0.07:10:44         0.00:58:02
Median         0.00:32:30         0.00:12:53         0.00:07:04         0.01:10:26         0.00:23:20
Percentile 5   0.00:00:55         0.00:00:37         0.00:00:00         0.00:01:02         0.00:00:39
Percentile 10  0.00:02:15         0.00:01:30         0.00:00:00         0.00:02:48         0.00:01:40
Percentile 25  0.00:09:30         0.00:05:04         0.00:00:11         0.00:14:27         0.00:07:10
Percentile 50  0.00:32:30         0.00:12:53         0.00:07:04         0.01:10:26         0.00:23:20
Percentile 75  0.01:31:31         0.00:30:53         0.00:22:35         0.05:27:29         0.00:58:19
Percentile 90  0.11:01:19         0.01:08:06         0.01:32:04         0.19:14:45         0.02:11:34
Percentile 95  0.14:23:20         0.01:54:18         0.03:20:35         1.09:29:50         0.03:40:49

               6. Maximum Daily     7. Average Weekly    8. Maximum Weekly    9. Average Monthly   10. Maximum Monthly
               Playing Time         Playing Time         Playing Time         Playing Time         Playing Time
Mean           0.02:42:21           0.01:37:44           0.03:29:43           0.03:31:40           0.05:02:37
Median         0.00:42:38           0.00:32:39           0.00:50:15           0.00:49:53           0.01:00:39
Percentile 5   0.00:00:56           0.00:00:42           0.00:00:57           0.00:00:53           0.00:01:00
Percentile 10  0.00:02:20           0.00:01:49           0.00:02:25           0.00:02:20           0.00:02:37
Percentile 25  0.00:10:55           0.00:08:29           0.00:11:45           0.00:11:12           0.00:13:04
Percentile 50  0.00:42:38           0.00:32:39           0.00:50:15           0.00:49:53           0.01:00:39
Percentile 75  0.02:08:44           0.01:35:01           0.02:53:07           0.03:03:05           0.04:07:03
Percentile 90  0.11:17:44           0.04:05:21           0.12:42:06           0.09:59:17           0.15:19:44
Percentile 95  0.15:05:15           0.07:04:48           0.17:02:21           0.16:19:40           0.23:14:18


3) Net Expenditure Aim To understand how much a player is winning or losing on the machine.

Measure Net expenditure is calculated as the total stake amount minus the total win amount. This figure will exclude any bonuses credited to the player where such bonuses can be sufficiently identified.

Output The following variables are calculated for this marker of harm:

1. Total net expenditure over the three month time period
2. Maximum session net expenditure
3. Average session net expenditure
4. Daily average net expenditure
5. Daily maximum net expenditure
6. Weekly average net expenditure
7. Weekly maximum net expenditure
8. Average monthly net expenditure
9. Maximum monthly net expenditure

Note daily and weekly figures are only calculated on days and weeks when the player actually played.
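A minimal sketch of the net expenditure calculation follows, assuming bet records of the form (day, stake, win) with bonuses already excluded. The record layout and example values are hypothetical. Because only days with at least one bet appear in the grouping, the daily average is taken over days actually played, as the note above describes.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical bet records for one player: (day, stake, win), amounts in pounds.
bets = [
    (date(2013, 9, 2), 10.0, 0.0),
    (date(2013, 9, 2), 10.0, 25.0),
    (date(2013, 9, 5), 20.0, 0.0),
]

def net_expenditure_by_day(bets):
    """Net expenditure (stake minus win) per day on which the player bet."""
    daily = defaultdict(float)
    for day, stake, win in bets:
        daily[day] += stake - win
    return dict(daily)

daily = net_expenditure_by_day(bets)
total_net = sum(daily.values())                # total over the period
daily_average = mean(list(daily.values()))     # averaged over played days only
daily_maximum = max(daily.values())

print(daily, total_net, daily_average, daily_maximum)
```

In this example the player is ahead on the first day (a negative net expenditure) and behind on the second, which illustrates how a single win can mask losses in the aggregated total.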

Errors If a player has had a significant win then this could skew the results, masking potential big losses before or after the win.

Results From the detailed results below it can be seen that a typical player has lost £24.33 over the 3 months. The typical maximum loss for a player in any session is £50.10. Over a day, week and month, a typical player would lose £8.22, £11.00 and £17.50 respectively.

Note that if the net expenditure amount is negative, this indicates the player has won money.

This histogram shows the distribution of each player's total net expenditure over the three month period. The median value is £24.33. The values at the 10th and 90th percentiles are -£180.00 and £776.09 respectively.

This histogram shows the distribution of the maximum net expenditure of each player in any given session. The median value is £50.10. The values at the 10th and 90th percentiles are £0.00 and £500.00 respectively.

The histogram above shows the distribution of each player's average session net expenditure. The median value is £5.07. The values at the 10th and 90th percentiles are -£30.99 and £86.67 respectively.

The histogram above shows the distribution of each player's average daily net expenditure. The median value is £8.22. The values at the 10th and 90th percentile are -£54.75 and £157.00 respectively.

The histogram above shows the distribution of each player's maximum daily net expenditure. The median value is £50.00. The values at the 10th and 90th percentile are -£3.75 and £558.50 respectively.

The histogram above shows the distribution of each player's average weekly net expenditure. The median value is £11.00. The values at the 10th and 90th percentile are -£77.18 and £236.40 respectively.

The histogram above shows the distribution of each player's maximum weekly net expenditure. The median value is £47.60. The values at the 10th and 90th percentiles are -£13.40 and £607.00 respectively.

The histogram above shows the distribution of each player's average monthly net expenditure. The median value is £17.50. The values at the 10th and 90th percentiles are -£122.40 and £441.81 respectively.


The histogram above shows the distribution of each player's maximum monthly net expenditure. The median value is £39.90. The values at the 10th and 90th percentile are -£53.30 and £695.50 respectively.

               1. Total            2. Maximum Session  3. Average Session  4. Average Daily    5. Maximum Daily
               Expenditure         Expenditure         Expenditure         Expenditure         Expenditure
Mean           200.91              168.60              16.61               31.31               184.64
Median         24.33               50.10               5.07                8.22                50.00
Percentile 5   -451.44             -16.60              -95.36              -160.00             -60.00
Percentile 10  -180.00             0.00                -30.99              -54.75              -3.75
Percentile 25  -6.50               7.60                -1.67               -2.31               5.00
Percentile 50  24.33               50.10               5.07                8.22                50.00
Percentile 75  227.60              200.00              25.00               46.78               217.90
Percentile 90  776.09              500.00              86.67               157.00              558.50
Percentile 95  1380.75             792.90              175.00              294.90              900.00

               6. Average Weekly   7. Maximum Weekly   8. Average Monthly  9. Maximum Monthly
               Expenditure         Expenditure         Expenditure         Expenditure
Mean           48.73               196.04              101.27              209.62
Median         11.00               47.60               17.50               39.90
Percentile 5   -214.98             -109.00             -319.80             -222.00
Percentile 10  -77.18              -13.40              -122.40             -53.30
Percentile 25  -3.00               4.00                -4.80               1.20
Percentile 50  11.00               47.60               17.50               39.90
Percentile 75  74.75               230.00              139.20              237.80
Percentile 90  236.40              607.00              441.81              695.50
Percentile 95  417.30              994.20              758.00              1176.78


4) Levels of Play Engagement Aim The aim of this metric is to determine how involved a player is with their playing environment.

Measurement

been combined to measure how engaged each player is when their attributes are compared to the rest of the player base.

Output The output from this analysis is a ranking of all the players from the most engaged to the least engaged.

Errors Errors in this metric are due to the selection and reliability of the input variables used to

Results The tables below show snapshots of the most engaged players ranked from 10,000 to 10,019 and from 100,000 to 100,019 respectively. The total number of players used in this analysis was 244,450. Here we can see that the 10,000th most engaged player had 14 sessions, played 9 different games, had an average session length of 3.5 hours and lost £137.80 in total. The 100,000th most engaged player had 7 sessions, played only 1 game, had an average session length of 30 minutes and lost just under £400.

Rank    Sessions  Max Monthly  Average Session  Average Session  Number of     Player
                  Time (days)  Length           Stakes           Games Played  Loss
10000   14        2.1          3:34:28          206.86           9             137.8
10001   33        1.3          1:00:50          1624.80          15            1075
10002   137       0.9          0:12:49          98.76            31            -1010.15
10003   5         0.7          3:21:13          5697.27          4             180
10004   146       0.6          0:16:14          816.16           23            1223.55
10005   43        0.9          0:42:32          412.53           32            238.18
10006   110       1.1          0:16:53          99.98            31            59.7
10007   227       1.0          0:13:51          443.43           9             5092.7
10008   141       0.8          0:11:35          219.50           29            2203.09
10009   14        0.6          1:00:51          2959.51          28            -916.1
10010   60        0.6          0:38:15          1344.15          29            1215.55
10011   10        1.6          3:46:19          2991.12          1             810
10012   28        0.9          1:23:30          2972.85          11            2439.8
10013   155       0.7          0:15:05          132.60           31            1925.3
10014   97        0.6          0:20:55          242.85           36            3679.85
10015   36        0.9          1:06:30          538.77           28            -173.71
10016   23        0.7          1:09:17          3151.50          17            -2234
10017   92        1.7          0:47:24          196.21           12            663.99
10018   87        0.7          0:12:43          332.67           37            1114.25
10019   81        1.0          0:31:55          219.61           29            1952.4

Rank    Sessions  Max Monthly  Average Session  Average Session  Number of     Player
                  Time (days)  Length           Stakes           Games Played  Loss
100000  7         0.1          0:30:03          726.29           1             399.6
100001  12        0.1          0:10:05          118.31           12            -615.65
100002  2         0.0          0:17:11          3710.20          1             -100
100003  4         0.0          0:14:01          1658.45          6             2018.9
100004  6         0.0          0:19:36          1842.97          2             513.7
100005  18        0.1          0:09:21          177.73           9             367.6
100006  16        0.1          0:11:02          161.30           7             232.45
100007  39        0.2          0:07:37          22.60            4             161.55
100008  1         0.0          0:21:24          3556.00          2             -124
100009  4         0.0          0:22:16          307.53           12            149
100010  1         0.0          0:20:58          4167.00          1             -261
100011  8         0.1          0:23:08          375.53           5             -14.6
100012  11        0.1          0:17:38          74.08            10            42.45
100013  1         0.0          0:47:32          2342.60          1             22.4
100014  22        0.2          0:14:12          96.35            4             206.85
100015  1         0.0          0:46:48          763.00           6             -109.5
100016  11        0.1          0:11:16          159.08           12            166.25
100017  10        0.1          0:19:14          27.44            12            105.25
100018  6         0.1          0:35:09          108.80           8             31.5
100019  12        0.2          0:24:16          89.69            4             119.5


5) Number of Activities/Games Types Undertaken Aim The aim of this marker is to understand the different range of activities undertaken by the player. Here, we want to understand if this customer explores different styles of

We also want to understand if there is any increase in the number of activities as a player becomes more familiar with the machine.

Measurement The measurement will only be related to the different types of activities taking place within the context of the Gaming Machine.

Output The following variables will be calculated for this marker of harm:

1. The total number of different games played by this customer
2. The average number of games played by this customer per session
3. The percentage of bets placed by this player on the most popular game
4. The maximum number of games played in any one session
5. The percentage of bets on B2 games
6. The percentage of bets on B3 games
7. The number of different stake levels made by the player
8. The number of different stake values chosen by the player for their first bet
9. The increase in the number of B2 games played. This has been calculated as the average of the number of games played in the current week, divided by the average number of games played in the preceding 7 weeks.
10. The increase in the number of B2 bets played, calculated on a weekly basis. This has been calculated as the proportion of the number of bets placed in the last 7 weeks of the data set compared to the total number of bets in the entire 14 weeks.
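The two "increase" variables (9 and 10) can be sketched as follows, assuming weekly counts are already available as lists ordered from oldest to newest. The function names, the example counts and the handling of a zero baseline (returning 0.0) are assumptions made for illustration.

```python
from statistics import mean

def b2_games_increase(weekly_games):
    """Games in the latest week divided by the mean of the preceding (up to) 7 weeks.

    Expects at least one weekly count.
    """
    *history, current = weekly_games[-8:]
    baseline = mean(history) if history else 0
    # Assumption: report 0.0 when there is no earlier activity to compare against.
    return current / baseline if baseline else 0.0

def b2_bets_second_half_share(weekly_bets):
    """Share of all bets that fall in the last 7 weeks of the data set."""
    total = sum(weekly_bets)
    return sum(weekly_bets[-7:]) / total if total else 0.0

# Hypothetical player: steady play, then a doubling in the latest week.
games_ratio = b2_games_increase([2, 2, 2, 2, 2, 2, 2, 4])

# Hypothetical 14 weeks of bet counts: most activity in the first half.
bets_share = b2_bets_second_half_share([10] * 7 + [10] * 3 + [0] * 4)

print(games_ratio, bets_share)
```

Under this reading, a ratio above 1 for variable 9 and a share above 50% for variable 10 would both point to play increasing over time.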

Errors The errors associated with this marker are due to staking levels, which may change within a session due to player wins or losses.

Results From the detailed results below, a typical player engages with 3 different games, but usually only one game in a session. Around 70% of the bets placed will be on the player's favourite game. A majority of the bets (87%) are placed on B2 games.


This histogram shows the distribution of the number of different games played by each player. The median value is 3. The values at the 10th and 90th percentiles are 1 and 17 respectively.

The histogram shows the average number of games played in a session by each player. The median value is 1. The values at the 10th and 90th percentiles are 1 and 3 respectively.

This histogram shows the proportion of bets placed by each player on their personal favourite game. The median value is 70%. The values at the 10th and 90th percentiles are 30% and 100% respectively.

This histogram shows the maximum number of different games played by a player in any one session. The median value is 2 games. The values at the 10th and 90th percentiles are 1 and 7 respectively.

The histogram above shows the distribution of the proportion of bets that the player makes on B2 games. The median value is 87%. The values at the 10th and 90th percentiles are 0.4% and 100% respectively.

The histogram above shows the distribution of the proportion of bets that a player makes on B3 games. The median value is 0%. The values at the 10th and 90th percentiles are 0% and 96% respectively.


This histogram shows the distribution of the number of different stake values bet by each player. The median value is 15. The values at the 10th and 90th percentiles are 1 and 118 respectively. This value at the 90th percentile was higher than expected, but this is due to players being able to make multiple bets

This histogram shows the distribution of the number of different stake values the player has chosen for their first bet. The median value is 3. The values at the 10th and 90th percentile are 1 and 14 respectively.

This histogram shows the distribution over the rate at which players are increasing the amount of B2 games they play. The median rate is 0.13. The values at the 10th and 90th percentile are 0 and 2 respectively.

This histogram shows the distribution over the proportion of B2 bets placed in the 2nd half of the data provided. The median value is 37.2%. The values at the 10th and 90th percentiles are 0% and 100% respectively.


               1. Number of        2. Average Unique   3. Percentage of    4. Maximum Unique   5. Percentage of
               Unique Games        Session Games       Favourite Game      Session Games       B2 Game Bets
Mean           7                   2                   69%                 3                   63.8%
Median         3                   1                   70%                 2                   86.9%
Percentile 5   1                   1                   23%                 1                   0.0%
Percentile 10  1                   1                   30%                 1                   0.4%
Percentile 25  1                   1                   45%                 1                   19.7%
Percentile 50  3                   1                   70%                 2                   86.9%
Percentile 75  8                   2                   100%                4                   100.0%
Percentile 90  17                  3                   100%                7                   100.0%
Percentile 95  25                  4                   100%                10                  100.0%

               6. Percentage of       7. Unique Stake        8. Unique First Stake  9. Increase in         10. Increase in
               B3 Game Bets           Values per Session     Values per Session     B2 Games               B2 Bets
Mean           23.2%                  42                     6                      0.73                   45.0%
Median         0.0%                   15                     3                      0.13                   37.2%
Percentile 5   0.0%                   1                      1                      0.00                   0.0%
Percentile 10  0.0%                   2                      1                      0.00                   0.0%
Percentile 25  0.0%                   4                      1                      0.00                   0.0%
Percentile 50  0.0%                   15                     3                      0.13                   37.2%
Percentile 75  42.6%                  50                     7                      1.00                   100.0%
Percentile 90  96.0%                  118                    14                     2.00                   100.0%
Percentile 95  100.0%                 178                    20                     3.00                   100.0%


6) Chasing Aim The aim of this marker is to identify sessions where the intent of the player is to win back the money that was lost in the previous session. Within this marker, player reloading of the Gaming Machine within an ongoing session is not analysed.

Measurement As the emotional state of the player cannot be inferred from the data, this marker is more challenging to measure. From the data, we can only measure correlations between the outcome of one session, to the time until the subsequent session and any actions measured in that session.

Output The following variables will be calculated for this marker of harm:

1. The number of sessions where the player lost money
2. The percentage of sessions where the player lost money
3. Impact of the initial deposit value on winning and losing sessions. This output has been calculated as the median initial deposit in the sessions following a session where the player made a profit, minus the median initial deposit in the sessions following a session where the player made a loss. A value greater than 0 indicates that the player typically deposits more after a winning session.
4. Impact of the time between sessions after winning and losing sessions. This is the median time between a winning session and beginning the next session, minus the median time between a losing session and beginning the next session. A value greater than 0 indicates the player typically returns for a new session sooner if they lost money in their last session.
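Variables 3 and 4 can be sketched as below for a single player. The `Session` record, the example values and the decision to skip break-even sessions are assumptions for illustration; the report does not say how break-even sessions were treated.

```python
from datetime import datetime
from statistics import median
from typing import NamedTuple

class Session(NamedTuple):
    start: datetime
    end: datetime
    initial_deposit: float   # pounds
    net_expenditure: float   # pounds; positive = loss, negative = profit

def chasing_metrics(sessions):
    """Return (deposit difference, gap difference in seconds), or None if undefined."""
    win_deposits, loss_deposits, win_gaps, loss_gaps = [], [], [], []
    for previous, following in zip(sessions, sessions[1:]):
        gap = (following.start - previous.end).total_seconds()
        if previous.net_expenditure < 0:      # previous session made a profit
            win_deposits.append(following.initial_deposit)
            win_gaps.append(gap)
        elif previous.net_expenditure > 0:    # previous session made a loss
            loss_deposits.append(following.initial_deposit)
            loss_gaps.append(gap)
        # Assumption: break-even sessions are ignored.
    if not win_deposits or not loss_deposits:
        return None
    return (
        median(win_deposits) - median(loss_deposits),
        median(win_gaps) - median(loss_gaps),
    )

# Hypothetical sessions for one player, in time order.
sessions = [
    Session(datetime(2013, 9, 1, 10, 0), datetime(2013, 9, 1, 10, 30), 10.0, 20.0),
    Session(datetime(2013, 9, 1, 11, 30), datetime(2013, 9, 1, 12, 0), 20.0, -15.0),
    Session(datetime(2013, 9, 2, 12, 0), datetime(2013, 9, 2, 12, 30), 10.0, 10.0),
    Session(datetime(2013, 9, 2, 13, 0), datetime(2013, 9, 2, 13, 10), 30.0, 5.0),
]

metrics = chasing_metrics(sessions)
print(metrics)
```

For this invented player the deposit difference is negative (larger deposits after losses) and the gap difference is positive (a quicker return after losses), the pattern the marker is intended to surface.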

Errors There are no specific errors associated with the variables that have been calculated for this marker. However, we believe further analysis may provide more insight into this particular marker.

Results From the detailed results below, a typical player has had 3 losing sessions during the 3 month period, which corresponds to 67% of their sessions. Using the data to uncover signs of chasing between sessions has proven difficult. Metric 3 indicates that whether a session was won or lost typically makes no difference to the initial cash-in of the next session (the median difference is £0.00). Metric 4 shows that after a losing session, a typical player starts the next session about 2 minutes later than after a winning session. Neither of these results is strong enough to show a particular correlation between winning and losing sessions and the behaviours observed in the next sessions.

The figure shown at the end of this section shows the relationship between the amount lost by a player in one session (on the horizontal axis), and the amount deposited by the player on the next session (on the vertical axis). Interestingly, here we can see that a


This histogram shows the distribution of the number of losing sessions experienced by a player. The median value is 3. The values at the 10th and 90th percentiles are 0 and 26 respectively.

This histogram shows the distribution of the proportion of losing sessions that a player experiences. The median value is 67%. The values at the 25th and 75th percentiles are 50% and 100% respectively.

This histogram shows the distribution of the difference in initial deposit following a winning or losing session. The median value is £0.00. The values at the 10th and 90th percentile are -£10.00 and £10.00 respectively.

This histogram shows the distribution in the change in time between sessions after a winning or losing session. The median value is -0:01:53. The values at the 10th and 90th percentile are -3.19:01:20 and 4.04:24:31 respectively.

               1. Number of        2. Percentage of    3. Initial Deposit  4. Session Gap
               Losing Sessions     Losing Sessions     Difference Between  Difference Between
                                                       Winning and Losing  Winning and Losing
Mean           10.4                64%                 1.5                 0.03:56:42
Median         3.0                 67%                 0.0                 -0.00:01:53
Percentile 5   0.0                 0%                  -16.0               -7.10:12:24
Percentile 10  0.0                 0%                  -10.0               -3.19:01:20
Percentile 25  1.0                 50%                 -2.8                -0.18:56:30
Percentile 50  3.0                 67%                 0.0                 -0.00:01:53
Percentile 75  10.0                100%                3.9                 0.18:12:39
Percentile 90  26.0                100%                10.0                4.04:24:31
Percentile 95  45.0                100%                20.0                8.22:21:48


Within Session Metrics 1) Debit Card Payment Reloading and Switching Aim The aim of this marker is to understand if a player uses a debit card as a secondary payment method within their gaming session.

Measurement The measurement is only applied to sessions where the first cash-in is a non-debit card, and at least one subsequent sequence of cash-ins is associated with a debit card.

Output The following variables are calculated for this marker of harm:

1. Whether the session was initiated by a debit card transaction
2. The number of debit card transactions in the session, where at least one debit card transaction took place
3. Total value of the debit card cash-ins in the session
4. Total value of other cash-ins in the session

The following metrics are only applied to sessions where the first cash-in sequence does not include a debit card and at least one later sequence of cash-ins uses a debit card. A sequence is defined as one or more cash-ins with no bet events between them.

5. Whether the session was initiated by a non-debit card transaction but a subsequent debit card transaction occurred
6. If the player started the session with cash (or a voucher), the number of subsequent cash-in events which are associated with a debit card
7. The value of the initial cash-in sequence
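The switching variables (5 to 7) might be derived along the following lines. The event layout is hypothetical, and a "sequence" is read here as a run of consecutive cash-ins with no bet in between, which is an interpretation rather than a confirmed definition.

```python
from itertools import groupby

# Hypothetical session events in time order: (kind, payment_method, amount in pounds).
events = [
    ("cash_in", "cash", 10.0),
    ("cash_in", "cash", 10.0),
    ("bet", None, 2.0),
    ("bet", None, 2.0),
    ("cash_in", "debit", 50.0),
    ("bet", None, 5.0),
]

def cash_in_sequences(events):
    """Group consecutive cash-ins (no bet in between) into sequences."""
    return [
        list(group)
        for is_cash_in, group in groupby(events, key=lambda event: event[0] == "cash_in")
        if is_cash_in
    ]

def debit_switch_metrics(events):
    """Summarise whether a session started without a debit card and later used one."""
    sequences = cash_in_sequences(events)
    if not sequences:
        return None
    first, later = sequences[0], sequences[1:]
    first_uses_debit = any(method == "debit" for _kind, method, _amount in first)
    later_debit_cash_ins = sum(
        1 for sequence in later for _kind, method, _amount in sequence if method == "debit"
    )
    return {
        "switched_to_debit": (not first_uses_debit) and later_debit_cash_ins > 0,
        "later_debit_cash_ins": later_debit_cash_ins,
        "initial_sequence_value": sum(amount for _kind, _method, amount in first),
    }

result = debit_switch_metrics(events)
print(result)
```

As the Errors paragraph for this marker explains, cash withdrawn from a debit card before reaching the machine would appear as a cash payment in data of this kind, so a sketch like this can only undercount debit card use.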

Errors It is difficult to map all debit card payments to amounts transferred to Gaming Machines. For example, a player may withdraw £100 from their debit card, place a sports bet for £20 and transfer £80 to a Gaming Machine. Alternatively, the player may have £50 cash and transfer this with the £100 withdrawn from the debit card on to the Gaming Machine. In this instance, the data only indicates that there is a £150 transfer.

Results Due to the time constraints required to generate this report, has been used to generate the first 6 variables. Due to the confidential nature of this non-aggregated data, these results have been removed from the report. The 7th variable has been calculated across all of the operators. From the results, we could identify that 2% of sessions involved the use of a debit card.


This histogram shows the distribution of the initial amount cashed into a session. The median value is £8.00. The values at the 10th and 90th percentiles are £1.00 and £20.00 respectively.

               7. Initial Cash
               In Amount
Mean           13.58
Median         8
Percentile 5   1
Percentile 10  1
Percentile 25  2.4
Percentile 50  8
Percentile 75  20
Percentile 90  20
Percentile 95  40


2) Debit Card Payment Decline Aim The aim of this marker is to identify if a player has exhausted his/her bank balance.

Measurement This marker could be calculated from the debit card transaction result returned from the payment terminals.

Output Unfortunately, due to the time restrictions of this project and data availability, it was not possible to explore this marker of harm.


3) Variability In Staking Behaviour Aim The aim of this marker is to understand how a player's staking changes within a session; in particular, are changes in the stake value occurring due to a win (because more money is available to play with), or do changes begin to increase after a period of no wins?

Measurement

this changes under different scenarios which take place during the session.

Output The following variables will be calculated for this marker of harm:

1. The player's stake variance
2. The stake value gradient (… level before increasing?)
3. The number of wins in the session
4. The total amount won in the session
5. The number of bets in the session
6. The total amount bet in the session
7. The average ratio of the stake size after and before a win (e.g. for each win, calculate the post-stake value divided by the pre-stake value and average the amounts over the session)
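Several of these staking variables can be sketched for one session as follows. The bet layout is hypothetical. The "pre-stake" is taken to be the stake of the winning bet and the "post-stake" the stake of the bet that follows it; this is an assumed reading of variable 7, and a win on the final bet is skipped because no later stake exists.

```python
from statistics import mean, pvariance

# Hypothetical bets in one session, in order: (stake, win), amounts in pounds.
bets = [(1.0, 0.0), (1.0, 5.0), (2.0, 0.0), (2.0, 4.0), (2.0, 0.0)]

def staking_metrics(bets):
    """Summarise staking behaviour within a single (non-empty) session."""
    stakes = [stake for stake, _win in bets]
    # Ratio of the next stake to the winning bet's stake, for each win.
    ratios = [
        next_stake / stake
        for (stake, win), (next_stake, _next_win) in zip(bets, bets[1:])
        if win > 0 and stake > 0
    ]
    return {
        "stake_variance": pvariance(stakes),  # assumption: population variance
        "number_of_wins": sum(1 for _stake, win in bets if win > 0),
        "total_won": sum(win for _stake, win in bets),
        "number_of_bets": len(bets),
        "total_bet": sum(stakes),
        "mean_stake_ratio_after_win": mean(ratios) if ratios else None,
    }

session_metrics = staking_metrics(bets)
print(session_metrics)
```

A mean ratio above 1 indicates that, on average, the player raised the stake after a win; a value of 1 indicates no change.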

Errors The impact of a player winning may influence the level at which they are comfortable staking.

Results The results below show that, for a typical player, the median total value from the 8 bets was £29. Typically 3 of the bets will result in a win, but there has been little evidence to suggest that a win results in an increase of stake value. The typical standard deviation is £0.81, which is a low amount of variation.


This histogram shows the distribution in the variance of stakes over all of the sessions. The median value is 0.67 pounds squared. The values at the 10th and the 90th percentiles are 0.00 and 8.28 pounds squared respectively.

This histogram shows the distribution over how a player's stake changes over their sessions. Values above 1 indicate that the staking level increased. The median value is 1. The values at the 10th and 90th percentiles are 0.611 and 2 respectively.

This histogram shows the distribution of the number of winning bets that a player has over a session. The median value is 3. The values at the 10th and 90th percentiles are 0 and 26 respectively.

This histogram shows the distribution of total winnings across the sessions. The median value is £20. The values at the 10th and 90th percentiles are £0.00 and £391.50 respectively.

The histogram above shows the distribution of the number of bets played across the sessions. The median value is 8. The values at the 10th and 90th percentiles are 1 and 86 respectively.

The histogram above shows the distribution of the total amount bet in the sessions. Note that this variable will be different to the total amount cashed in by the player, as it will include amounts re-staked from previous wins in the session. The median value is £29.00. The values at the 10th and 90th percentile are £1.55 and £400.00 respectively.

The histogram above shows the distribution of the ratio in stake before and after a win. If the ratio is greater than 1 then the player has increased their stake size. The median value for this variable is 1. The values at the 10th and 90th percentile are 0 and 1.35 respectively.

               1. Player's Stake   2. Stake Value      3. Number of        4. Amount won in    5. Number of Bets
               Variance            Gradient            Winning Sessions    Winning Sessions    Placed in Sessions
Mean           2.89                2.05                10.53               186.81              37.26
Median         0.67                1                   3                   20                  8
Percentile 5   0.00                0                   0                   0                   1
Percentile 10  0.00                0                   0                   0                   1
Percentile 25  0.00                0                   0                   0                   2
Percentile 50  0.67                1                   3                   20                  8
Percentile 75  3.25                2                   10                  111.6               28
Percentile 90  8.28                5                   26                  391.5               86
Percentile 95  13.56               9                   44                  783.7265            164

6. Amount Bet in Sessions

7. Stake Value Ratio Before/After Win

Mean 193.74 1.86 Median 29 1 Percentile 5 1 0.11 10 1.55 0.611 25 6.2 1 50 29 1 75 120.4 1.11 90 400 2 95 792.8 3.43


4) Use of Autoplay Aim

Gaming Machine.

Measurement As there is no autoplay feature on the Gaming Machines, we instead examined the gap between bets to identify if the customer is playing as fast as the machine will allow.

any choices about the bet about to be made. Data was not available in the supplied data set.

Output The following variables will be calculated for this marker of harm:

1. The number of gaps between B2 bets which are at the 20 second legal limit
2. The percentage of gaps between B2 bets which are at the 20 second legal limit
3. The average gap between bets on B2 games
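As an illustration only, the three variables listed above could be computed from the bet timestamps of a single session along the following lines. The function name, the input format (timestamps in seconds) and the 0.5 second tolerance used to decide whether a gap is "at" the limit are assumptions made for this sketch; the report does not state how closeness to the limit was determined.

```python
from statistics import mean

LEGAL_MIN_GAP_SECONDS = 20.0  # minimum gap between B2 bets cited in the text
TOLERANCE_SECONDS = 0.5       # assumed tolerance; not specified in the report


def gap_metrics(bet_times, limit=LEGAL_MIN_GAP_SECONDS, tol=TOLERANCE_SECONDS):
    """Return (count at limit, proportion at limit, average gap) for one session.

    bet_times: timestamps of the B2 bets in seconds, in any order.
    """
    times = sorted(bet_times)
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    if not gaps:
        return 0, None, None  # a single-bet session has no gaps to measure
    at_limit = sum(1 for gap in gaps if gap <= limit + tol)
    return at_limit, at_limit / len(gaps), mean(gaps)


# Hypothetical session: two gaps close to 20 seconds, two longer gaps.
print(gap_metrics([0.0, 20.0, 40.2, 75.0, 110.0]))
```

A session with a single bet yields no gaps, which is why the proportion and the average are returned as undefined in that case.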

Errors Note that the values calculated below only include games from machines supplied by Scientific Games. Inspired Gaming machines data was not used, due to the way in which it had been transformed for the preliminary analysis.

Results The results below show that, in a majority of sessions on B2 games, the gap between bets is longer than the legal minimum of 20 seconds. The typical gap between bets is 31.7 seconds. Approximately 10% of all sessions contain bets which are only staked at the minimum legal gap. However, we observe from the first variable that a significant majority of these sessions have a small number of bets.

This histogram shows the number of bets at the legal 20 second B2 limit. The median value is 0. The values at the 10th and 90th percentiles are 0 and 15 bets.

This histogram shows the percentage of bets within a session which are at the legal limit. The median value is 40%. The values at the 10th and 90th percentiles are 10% and 100% respectively.


This histogram shows the distribution of the average gap between B2 games within a session. The median value is 31.7 seconds. The values at the 10th and 90th percentiles are 22.7 seconds and 53.8 seconds.

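| Statistic | 1. Number of B2 Bets at 20 Second Limit | 2. Percentage of B2 Bets at 20 Second Limit | 3. Average Gap Between B2 Bets (Seconds) |
|---|---|---|---|
| Mean | 8.64 | 0.48 | 35.4 |
| Median | 0 | 0.40 | 31.7 |
| 5th percentile | 0 | 0.07 | 21.5 |
| 10th percentile | 0 | 0.10 | 22.7 |
| 25th percentile | 0 | 0.20 | 25.8 |
| 50th percentile | 0 | 0.40 | 31.7 |
| 75th percentile | 2 | 0.78 | 41.1 |
| 90th percentile | 15 | 1.00 | 53.8 |
| 95th percentile | 39 | 1.00 | 63.2 |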


5) Play of Multiple Machines Simultaneously Aim The aim of this marker is to understand if a player is using multiple machines at the same time.

Measurement It is very difficult to measure this marker from the data, as it is challenging to accurately identify a player from their data. It may be possible to examine machines which are located in proximity to each other that have sessions that start at roughly the same time, where the same game is played with the same staking levels and there is a correlation in the play between the two machines (e.g., wherein the play on one machine is slightly delayed when compared to another in its proximity). It may also be possible to look at debit card transactions which have been split and then transferred to two machines.

Output No outputs are calculated for this metric due to the difficulties associated with measurement.


6) Stake Size Aim The aim is to identify the value of the stakes in a session.

Measurement The measurement is based on the stake value.

Output The following variables will be calculated for this marker of harm:

1. Minimum stake size of that session
2. Number of bets at the minimum stake size
3. Maximum stake size of that session
4. Number of bets at the maximum stake size
5. Proportion of stakes at each betting level
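A minimal sketch of how these five variables might be derived from the list of stakes in one session is given below. The function name and the dictionary keys are hypothetical, and the sketch follows the wording of the list by taking the maximum within the session; it is not the code used for the analysis.

```python
from collections import Counter


def stake_metrics(stakes):
    """Summarise the stake sizes (in pounds) placed in one session."""
    if not stakes:
        raise ValueError("a session needs at least one bet")
    counts = Counter(stakes)            # number of bets at each stake level
    lowest, highest = min(counts), max(counts)
    total = len(stakes)
    return {
        "min_stake": lowest,
        "bets_at_min": counts[lowest],
        "max_stake": highest,
        "bets_at_max": counts[highest],
        # Share of the session's bets placed at each distinct stake level.
        "proportion_by_stake": {s: n / total for s, n in sorted(counts.items())},
    }


# Hypothetical session of four bets.
print(stake_metrics([1.0, 1.0, 2.0, 5.0]))
```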

Errors There are no errors associated with the measurement of this metric.

Results The results below show that a typical player will bet at two different stake values in a session. The median of the average stake per session is £3.53. The median minimum amount is £1.80 and the median maximum amount is £5.40. A player will typically place 16 bets at the lowest value in that session.

The histogram above shows the distribution of the minimum stake value in each session. The median value is £1.80. The values at the 10th and 90th percentile are 20p and £10.00 respectively.

The histogram above shows the distribution of the number of bets at the minimum stake amount for that session. The median value is 16. The values at the 10th and 90th percentile are 3 and 120 respectively.


The histogram above shows the distribution of the maximum stake amount across the sessions. The median value is £5.40. The values at the 10th and 90th percentile are 50p and £37.60 respectively.

The histogram above shows the distribution of the number of bets at the maximum B2 stake level (£100). The median value is 0.

The histogram above shows the distribution of the average stake value across the sessions. The median value is £3.53. The values at the 10th and 90th percentiles are £0.50 and £21.18 respectively.

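| Statistic | 1. Minimum Bet Amount in Session | 2. Number of Bets at Minimum Session Amount | 3. Maximum Bet Amount In Session | 4. Number of Bets at Maximum Session Amount | 5. Average Bet Value in Session |
|---|---|---|---|---|---|
| Mean | 4.45 | 52.83 | 13.89 | 3.11 | 8.57 |
| Median | 1.8 | 16 | 5.4 | 0 | 3.53 |
| 5th percentile | 0.2 | 2 | 0.25 | 0 | 0.24 |
| 10th percentile | 0.2 | 3 | 0.5 | 0 | 0.50 |
| 25th percentile | 1 | 6 | 1.8 | 0 | 1.05 |
| 50th percentile | 1.8 | 16 | 5.4 | 0 | 3.53 |
| 75th percentile | 5 | 46 | 18 | 0 | 10.00 |
| 90th percentile | 10 | 120 | 37.6 | 0 | 21.18 |
| 95th percentile | 20 | 215 | 60 | 1 | 34.00 |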


7) Game Volatility Aim

characteristics. A low volatile game is defined as one with frequent small winnings whilst a high volatile game has infrequent large winnings.

Measurement The measurement examines the proportion of switching between games with different levels of volatility.

Output The following variables will be calculated for this marker of harm:

1. Number of bets on low volatile games
2. Proportion of bets on low volatile games
3. Number of bets on high volatile games
4. Proportion of bets on high volatile games
5. Number of changes from a low volatile game to a high volatile game
6. Number of changes from a low volatile game to another low volatile game
7. Number of changes from a high volatile game to a low volatile game
8. Number of changes from a high volatile game to another high volatile game
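The counting behind these eight variables can be sketched as follows. The input format (one `(game, volatility)` pair per bet, in play order) and the rule that only a change of game counts as a switch are assumptions made for illustration; the classification of each game as low or high volatility is taken as given.

```python
from collections import Counter


def volatility_metrics(bets):
    """bets: list of (game_name, volatility) pairs, one per bet in play
    order, where volatility is "low" or "high"."""
    n = len(bets)
    bet_counts = Counter(vol for _, vol in bets)
    switches = Counter()
    for (prev_game, prev_vol), (game, vol) in zip(bets, bets[1:]):
        if game != prev_game:  # assumed: only a change of game is a switch
            switches[(prev_vol, vol)] += 1
    return {
        "bets_low": bet_counts["low"],
        "prop_low": bet_counts["low"] / n if n else None,
        "bets_high": bet_counts["high"],
        "prop_high": bet_counts["high"] / n if n else None,
        "low_to_high": switches[("low", "high")],
        "low_to_low": switches[("low", "low")],
        "high_to_low": switches[("high", "low")],
        "high_to_high": switches[("high", "high")],
    }


# Hypothetical session covering each kind of switch once.
session = [("A", "low"), ("A", "low"), ("B", "low"), ("C", "high"),
           ("C", "high"), ("D", "high"), ("A", "low")]
print(volatility_metrics(session))
```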

Errors The behaviours associated with variables calculated for this metric may not be due to a conscious decision of the player. For example, changes observed may be due to

game has either a high or low volatility.

Results From the results below, we can see that a majority of players place most of their bets on low volatility games (frequent small wins). Approximately 67% of sessions contain bets on only low volatility games. The table below shows the average number of times a player changes games to the same or different volatilities. From this table, we can see that players are more likely to keep playing games with the same level of volatility.

| | To Low Volatile Game | To High Volatile Game |
|---|---|---|
| From Low Volatile Game | 0.11 | 0.07 |
| From High Volatile Game | 0.08 | 0.22 |


This histogram shows the distribution across sessions of the number of stakes on low volatile games. The median value is 3. The values at the 10th and 90th percentile are 0 and 28 respectively.

This histogram shows the distribution across the sessions of the proportion of bets on low volatile games. Approximately 67% of sessions are played on only low volatile games.

This histogram shows the distribution across sessions of the number of bets on high volatile games. The median value is 0. The values at the 10th and 90th percentile are 0 and 41 respectively.

This histogram shows the distribution across the sessions of the proportion of bets on high volatile games. Approximately 7% of sessions are played on only high volatile games.

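| Statistic | 1. Number of Bets on Low Volatile Games | 2. Proportion of Bets on Low Volatile Games | 3. Number of Bets on High Volatile Games | 4. Proportion of Bets on High Volatile Games |
|---|---|---|---|---|
| Mean | 11.58 | 0.69 | 19.15 | 0.20 |
| Median | 3 | 1 | 0 | 0 |
| 5th percentile | 0 | 0 | 0 | 0 |
| 10th percentile | 0 | 0 | 0 | 0 |
| 25th percentile | 1 | 0.01 | 0 | 0 |
| 50th percentile | 3 | 1 | 0 | 0 |
| 75th percentile | 11 | 1 | 0 | 0 |
| 90th percentile | 28 | 1 | 41 | 1 |
| 95th percentile | 47 | 1 | 104 | 1 |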


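| Statistic | 6. Switch Between Low Volatile Games | 7. Switch from a High to a Low Volatile Game | 8. Switch Between High Volatile Games |
|---|---|---|---|
| Mean | 0.11 | 0.08 | 0.22 |
| Median | 0 | 0 | 0 |
| 5th percentile | 0 | 0 | 0 |
| 10th percentile | 0 | 0 | 0 |
| 25th percentile | 0 | 0 | 0 |
| 50th percentile | 0 | 0 | 0 |
| 75th percentile | 0 | 0 | 0 |
| 90th percentile | 0 | 0 | 0 |
| 95th percentile | 1 | 1 | 1 |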


8) Way Game Played (e.g. number of bets per stake) Aim The aim is to examine the different ways a session can be characterised.

Measurement For this marker, we have examined some of the key components of how a session can be described, such as its length, number of bets and games played, money spent and net position of the player.

Output The following variables will be calculated for this marker of harm:

1. The session length
2. Number of times stake value increases
3. Number of times stake value decreases
4. Number of different games played
5. Number of different game types played (e.g., collating all of the roulette game types together)
6. Total amount cashed in during the session
7. Net position of the session (e.g., total stake minus total win)
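For illustration, the seven variables above could be derived from an ordered list of bets as sketched below. The record layout (`time`, `game`, `game_type`, `stake`, `win`) and the function name are hypothetical; the sign convention for the net position follows the text, so a positive value is a loss for the player.

```python
def session_metrics(bets, cash_in_total):
    """bets: list of dicts with keys time (seconds), game, game_type,
    stake and win (pounds), in play order."""
    stakes = [bet["stake"] for bet in bets]
    consecutive = list(zip(stakes, stakes[1:]))  # (previous, next) stake pairs
    return {
        "length_seconds": bets[-1]["time"] - bets[0]["time"],
        "stake_increases": sum(1 for prev, nxt in consecutive if nxt > prev),
        "stake_decreases": sum(1 for prev, nxt in consecutive if nxt < prev),
        "games_played": len({bet["game"] for bet in bets}),
        "game_types_played": len({bet["game_type"] for bet in bets}),
        "cash_in": cash_in_total,
        # Total stake minus total win: positive means the player lost money.
        "net_position": sum(stakes) - sum(bet["win"] for bet in bets),
    }


# Hypothetical three-bet session.
example = [
    {"time": 0, "game": "Roulette A", "game_type": "roulette", "stake": 2, "win": 0},
    {"time": 30, "game": "Roulette A", "game_type": "roulette", "stake": 4, "win": 10},
    {"time": 90, "game": "Slots B", "game_type": "slots", "stake": 1, "win": 0},
]
print(session_metrics(example, cash_in_total=20))
```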

Errors Any errors associated with the variables calculated for this marker will be associated with the accuracy of the proxy session calculation.

Results From the detailed results below, the typical session is 0:03:58 long, with the player cashing in £12.30 and losing £3.50. A player is more likely to decrease than increase their stake. The player is most likely to play on a single game.

This histogram shows the distribution across the session lengths. The median session is 0:03:58 long. The values at the 10th and 90th percentile are 0:00:26 and 0:25:54.

This histogram shows the distribution of the number of times a player increases the amount staked across the session. The median value is 0. The values at the 10th and 90th percentile are 0 and 66 respectively.


This histogram shows the distribution of the number of times a player decreases the amount staked across the session. The median value is 1. The values at the 10th and 90th percentile are 0 and 10 respectively.

The histogram shows the distribution of the number of different games played across the sessions. The median value is 1. The values at the 10th and 90th percentiles are 1 and 2 respectively.

The histogram shows the distribution of the number of types of games played in a session. The median value is 1. Only in the 95th percentile does the value increase to 2.

This histogram shows the distribution of the amount cashed into each session. The median value is £12.30. The values at the 10th and 90th percentile are £1.50 and £100.00 respectively.

This histogram shows the distribution of the net position across the sessions. A negative value indicates a win for the player. The median value is £3.50. The values at the 10th and 90th percentiles are -£60.00 and £51.50 respectively.


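| Statistic | 1. Session Length (Minutes) | 2. Number of Stake Value Increases | 3. Number of Stake Value Decreases | 4. Unique Games Played in Session | 5. Unique Game Types Played in Session |
|---|---|---|---|---|---|
| Mean | 0:17:50 | 27.08 | 3.63 | 1.34 | 1.07 |
| Median | 0:03:58 | 0 | 1 | 1 | 1 |
| 5th percentile | 0:00:12 | 0 | 0 | 1 | 1 |
| 10th percentile | 0:00:26 | 0 | 0 | 1 | 1 |
| 25th percentile | 0:01:16 | 0 | 0 | 1 | 1 |
| 50th percentile | 0:03:58 | 0 | 1 | 1 | 1 |
| 75th percentile | 0:10:52 | 8 | 4 | 1 | 1 |
| 90th percentile | 0:25:54 | 66 | 10 | 2 | 1 |
| 95th percentile | 0:45:53 | 145 | 16 | 3 | 2 |

| Statistic | 6. Total Session Cash In Amount | 7. Net Position of Session |
|---|---|---|
| Mean | 43.45 | -9.75 |
| Median | 12.30 | 3.50 |
| 5th percentile | 1.00 | -161.40 |
| 10th percentile | 1.50 | -60.00 |
| 25th percentile | 5.00 | -5.25 |
| 50th percentile | 12.30 | 3.50 |
| 75th percentile | 40.00 | 20.00 |
| 90th percentile | 100.00 | 51.50 |
| 95th percentile | 173.60 | 100.00 |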


9) Cash-Out Aim The aim is to understand whether players will stop playing when they win, or play to a zero balance.

Measurement To calculate this marker, we identified the sessions in which the player chose to cash out and then identified some of the key characteristics of this decision.

Output The following variables will be calculated for this marker of harm:

1. If the player cashed out, the return generated on their initial winnings
2. If the player cashed out, the total amount cashed out
3. The number of cash-out sequences
4. When a player cashed out, the proportion of the balance that was cashed out (e.g., this will be less than 100% when players continue to play after winning)
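The sketch below shows one possible reading of these four variables. It assumes that the return is the total cashed out divided by the total cashed in (consistent with the results being described as a percentage of the original cash-in), and that the proportion of the balance is taken at the final cash-out of the session; both choices, along with the function name and input format, are assumptions rather than the definitions used in the analysis.

```python
def cash_out_metrics(cash_in_total, cash_outs):
    """cash_outs: list of (amount_cashed_out, balance_before_cash_out)
    tuples in pounds, one per cash-out sequence in the session."""
    if not cash_outs:
        # Return, value and proportion are only defined when a cash-out occurs.
        return {"return": None, "total_cashed_out": None,
                "cash_out_sequences": 0, "proportion_of_balance": None}
    total_out = sum(amount for amount, _ in cash_outs)
    last_amount, last_balance = cash_outs[-1]
    return {
        "return": total_out / cash_in_total,        # 1.78 would mean 178%
        "total_cashed_out": total_out,
        "cash_out_sequences": len(cash_outs),
        "proportion_of_balance": last_amount / last_balance,
    }


# Hypothetical session: £20 cashed in, a partial cash-out, then a full one.
print(cash_out_metrics(20, [(10, 40), (30, 30)]))
```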

Errors The player may cash out on one particular machine, and then cash the same money back into another machine.

Results From the detailed results below, we can see that approximately 25% of sessions result in a player cashing out. When a player cashes out, they are typically taking out £40.00, or a return of 178% of their original cash-in. Also, in over 90% of cash-out sessions the customer is withdrawing their entire balance.

This histogram shows the distribution of the return that players have achieved based on the percentage of the total amount cashed in. This metric is only calculated for sessions where the player cashed out. The median value is 178%. The values at the 10th and 90th percentile are 37% and 875% respectively.

This histogram shows the distribution of the total amount cashed out by players in sessions where at least one cash-out was recorded. The median value is £40.00. The values at the 10th and 90th percentile are £4.80 and £316.80 respectively.


This histogram shows the distribution of the number of cash-out sequences made by the player. If the value is greater than 1, it indicates the player withdrew some winnings, played for a bit longer, and then withdrew again. The median value is 0. At the 75th and 95th percentile the values are 1 and 2 respectively.

This histogram examines the proportion of the available balance that a player decides to cash-out. The median value is 100%. At the 5th and 10th percentile the values are 93% and 100% respectively.

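| Statistic | 1. Player Percentage Return | 2. Value Cashed Out | 3. Number of Cash Out Sequences | 4. Proportion of Available Balance Cashed Out |
|---|---|---|---|---|
| Mean | 9.87 | 125.65 | 0.53 | 0.99 |
| Median | 1.78 | 40.00 | 0 | 1.00 |
| 5th percentile | 0.09 | 0.80 | 0 | 0.93 |
| 10th percentile | 0.37 | 4.80 | 0 | 1.00 |
| 25th percentile | 1.00 | 12.00 | 0 | 1.00 |
| 50th percentile | 1.78 | 40.00 | 0 | 1.00 |
| 75th percentile | 3.72 | 108.00 | 1 | 1.00 |
| 90th percentile | 8.75 | 316.80 | 1 | 1.00 |
| 95th percentile | 17.63 | 550.00 | 2 | 1.00 |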


Appendix C Representativeness of Loyalty Card Data As this research programme has focused on the analysis of loyalty card players, it is useful to understand the representativeness of this dataset. To examine the representativeness we calculated 8 within session metrics for the registered players and for the entire data set. The histograms for each of these calculations are shown in the Figures below. The red columns represent metrics from registered players and the blue columns represent metrics from the entire data set.

Visually, in these figures it can be observed that the registered players are over-

a metric based on the Kolmogorov-Smirnov test. This test essentially provides a metric calculating the degree of difference between two statistical distributions. The results of applying this test are shown in Table 13. A higher value in this table indicates a poor correspondence between the two distributions.

In general, there is a reasonable fit between the registered sessions and the entire population, the main exception being the length of session and amount cashed in. The registered sessions are biased to longer sessions with higher amounts of money cashed in. This is likely due to registered players being more engaged players, and also to the fact that players may only insert their card when they plan to have a longer session on the Gaming Machine.

| Variable | Modified Kolmogorov-Smirnov Value |
|---|---|
| Number of winning sessions | 0.113 |
| Amount won | 0.118 |
| Number of bets placed | 0.118 |
| Total amount bet in session | 0.071 |
| Average bet value in session | 0.102 |
| Session length | 0.281 |
| Total session cash in | 0.231 |
| Net position of session | 0.102 |

Table 13 - Measurement of the representativeness of registered sessions
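The values in Table 13 are described as a modified Kolmogorov-Smirnov metric, and the modification is not specified in this report. As a point of reference only, the standard two-sample Kolmogorov-Smirnov statistic, which is the largest absolute difference between the two empirical cumulative distribution functions, can be sketched as follows. The function name and the sample values are illustrative.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Standard two-sample Kolmogorov-Smirnov statistic: the largest
    absolute gap between the two empirical cumulative distributions."""
    a, b = sorted(sample_a), sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    largest_gap = 0.0
    # Both empirical CDFs are step functions that only change at observed
    # values, so it is enough to compare them at each observed value.
    for x in set(a) | set(b):
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


# Hypothetical session lengths (minutes): registered players vs all players.
print(ks_statistic([4, 9, 15, 30], [1, 2, 4, 9]))
```

The statistic is 0 when the two samples have identical distributions and 1 when they do not overlap at all, which matches the reading of Table 13 that higher values indicate poorer correspondence.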


The histogram above shows the distribution of the number of winning bets in a session. All sessions are represented by blue and registered sessions by red. The modified Kolmogorov-Smirnov result is 0.113.

The histogram above shows the distribution of the total winnings across the sessions. All sessions are represented by blue and registered sessions by red. The modified Kolmogorov-Smirnov result is 0.072.


The histogram above shows the distribution of the number of bets placed across the sessions. All sessions are represented by blue and registered sessions by red. The modified Kolmogorov-Smirnov result is 0.118.

The histogram above shows the distribution of the total amount bet across the sessions. All sessions are represented by blue and registered sessions by red. The modified Kolmogorov-Smirnov result is 0.071.


The histogram above shows the distribution of the average stake across the sessions. All sessions are represented by blue and registered sessions by red. The modified Kolmogorov-Smirnov result is 0.102.

The histogram above shows the distribution of session lengths. All sessions are represented by blue and registered sessions by red. The modified Kolmogorov-Smirnov result is 0.281.


The histogram above shows the distribution of amounts cashed into each session. All sessions are represented by blue and registered sessions by red. The modified Kolmogorov-Smirnov result is 0.213.

The histogram above shows the distribution of the net position across each session. All sessions are represented by blue and registered sessions by red. The modified Kolmogorov-Smirnov result is 0.102.


Appendix D Candidate Predictive Modelling Approaches Explored by RTI

algorithms utilised by RTI when exploring the ability to distinguish between problem and non-problem gamblers.

Main candidate predictive model. Our baseline model is a logistic step-wise regression model which can capture both main effects and heterogeneity based on the interactions. Higher order interactions were identified by using classification trees [Hastie and Tibshirani, 2009]. These interactions were added to the regression model to provide possible improvement. Thus, our basic model has the form:

$$Y \sim \beta_0 + \sum_{i} \beta_{1i} X_i + \sum_{i,j} \gamma_{ij} X_j X_i,$$

where $Y$ indicates the outcome, $X_i$ corresponds to a set of influential predictor variables, and the coefficients $\beta$ and $\gamma$ correspond to the main effects and the interactions respectively. In addition to this simpler model, we used advanced methods that include classification tree ensembles, random forests, artificial neural networks (ANNs), and support vector machines (SVMs). Below is a brief description of the methodology that we have used.
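To make the form of the baseline model concrete, the sketch below evaluates the linear predictor with main effects and pairwise interactions and passes it through the logistic function to obtain a predicted probability. The function name and all coefficient values are hypothetical; they are not estimates from this study.

```python
import math


def predicted_probability(x, beta0, beta, gamma):
    """x: predictor values X_i; beta: main-effect coefficients, one per
    predictor; gamma: dict mapping index pairs (i, j) to interaction
    coefficients."""
    eta = beta0 + sum(b * xi for b, xi in zip(beta, x))          # main effects
    eta += sum(g * x[i] * x[j] for (i, j), g in gamma.items())   # interactions
    return 1.0 / (1.0 + math.exp(-eta))                          # logistic link


# Hypothetical coefficients: two predictors and one interaction term.
p = predicted_probability(x=[1.0, 2.0], beta0=-1.0, beta=[0.5, 0.0],
                          gamma={(0, 1): 0.25})
print(p)
```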

Classification Tree Ensembles and Random Forests: Classification trees recursively partition subjects into groups in such a manner that groups are as internally homogeneous as possible, while cross-group heterogeneity is maximized. Splits are made in order of decreasing statistical significance, i.e., beginning with the most significant split. The predicted value for an individual is calculated as the proportion with the outcome for the subjects within its terminal node. In order to assess the stability, the procedure is repeated many times on bootstrapped samples, thus forming

outcomes involves the aggregation (e.g., using means) of predicted values over the ensemble. Random forests extend tree ensembles to incorporate randomization in the node splitting process, when only a random subset of weak predictors is allowed to enter the model. This additional use of randomization allows the model to incorporate useful, but weaker, predictors that otherwise would be masked by stronger predictors. The size of the ensembles and forests will be in the range of 400-1,000 trees.

Random forests also allow one to rank the variables according to their impact on prediction. Each variable was randomly reshuffled one at a time and the variables were ranked according to the loss of prediction power. The more loss in prediction a scrambled variable incurred the more influential it was.
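The reshuffling procedure described above can be sketched in a model-agnostic way. In the example below, `predict` stands in for any fitted classifier (a trivial rule is used here rather than a random forest), and accuracy is used as the measure of prediction power; both are assumptions made for illustration.

```python
import random


def accuracy(predict, rows, labels):
    """Proportion of rows for which the prediction matches the label."""
    return sum(predict(row) == y for row, y in zip(rows, labels)) / len(rows)


def permutation_importance(predict, rows, labels, seed=0):
    """Return, for each variable, the loss in accuracy observed when that
    variable alone is randomly reshuffled across the rows."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    losses = []
    for col in range(len(rows[0])):
        shuffled = [row[col] for row in rows]
        rng.shuffle(shuffled)
        scrambled = [row[:col] + [value] + row[col + 1:]
                     for row, value in zip(rows, shuffled)]
        losses.append(baseline - accuracy(predict, scrambled, labels))
    return losses


# Hypothetical data: the label depends only on the first variable.
rows = [[i / 10, (i * 7) % 10] for i in range(10)]
labels = [int(row[0] > 0.45) for row in rows]
print(permutation_importance(lambda row: int(row[0] > 0.45), rows, labels))
```

A variable that the model does not use incurs no loss when scrambled, so it ranks at the bottom; the larger the loss, the more influential the variable.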

Artificial Neural Networks (ANNs): We will consider an application of non-parametric methods such as artificial neural networks. ANN algorithms resemble a network of interconnecting functions (hidden components) where an output of one or several components becomes an input into another component. Although the structure of each component is clearly defined with functional form usually resembling a sigmoid, the entire neural network can become very complex and thus ANNs are often referred to as

other methods, interpretability is rarely possible.
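As a minimal illustration of components feeding into one another, the sketch below computes the output of a network with one hidden layer of sigmoid components. The layer sizes, weights and function names are hypothetical, and no training step is shown.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def layer(inputs, weights, biases):
    """One layer of components: each takes a weighted sum of the inputs,
    adds its bias, and passes the result through a sigmoid."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def forward(x, hidden_w, hidden_b, out_w, out_b):
    hidden = layer(x, hidden_w, hidden_b)      # outputs of the hidden components
    return layer(hidden, [out_w], [out_b])[0]  # which feed the output component


# Hypothetical network: two inputs, two hidden components, one output.
print(forward([1.0, 0.5],
              hidden_w=[[0.4, -0.2], [0.1, 0.3]], hidden_b=[0.0, -0.1],
              out_w=[1.5, -0.7], out_b=0.2))
```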

Support Vector Machines: Support vector machine analysis for a dichotomous outcome, which endeavors to correctly separate subject data into proper groups based on covariate information, is similar in concept to linear discrimination. Use of support vector machine methodology to build predictive models is described in Hastie et al. Similar to classification tree ensembles and random forests, all potential predictors are considered simultaneously in the model building process.


Appendix E Example of transformed between session variables. variables and the range of values that are exhibited by both problem and non-problem gamblers after transformation. Each of these figures shows that there is a high degree of overlap between the problem and non-problem gamblers, illustrating the challenge of distinguishing these players.

To be able to distinguish these players, we are relying on the predictive algorithms to identify combinations of these variables that, when considered together, provide a high degree of predictive power.

Figure 30 - The range of values taken by the log transformed maximum amount of total session time on a given day. Shown for problem and non-problem gamblers.

[Histogram. X-axis: Log of the maximum amount of total session time on a given day. Series: Non-Problem Gambler, Problem Gambler.]


Figure 31 - The range of values taken by the log transformed maximum amount lost on a given day. Shown for problem and non-problem gamblers.

[Histogram. X-axis: Log of the maximum amount lost on a given day. Series: Non-Problem Gambler, Problem Gambler.]

Figure 32 - The range of values taken by the log transformed maximum amount cashed in on a given day. Shown for problem and non-problem gamblers.

[Histogram. X-axis: Log of the maximum amount cashed in on a given day. Series: Non-Problem Gambler, Problem Gambler.]