Predicting problem gambling among online sports and race bettors: Assessing the value of Machine Learning using behavioural and self-reported data

Abstract

Background and aims: Online gambling operators collect detailed behavioural data that can be used to identify customers at risk of harmful gambling. This study examined the value of machine learning in this context. We compared models trained on 30 days versus six months of behavioural data and explored whether incorporating self-reported survey responses enhanced performance.

Methods: Customers from two Australian sports and race betting sites (N=1,470) completed a survey that included the Problem Gambling Severity Index (PGSI) and measures of employment, income, gambling satisfaction, number of gambling accounts, and estimated past-30-day expenditure. Survey responses were matched to participants' account data provided by the operators. We built a series of machine learning models to classify participants into risk groups (PGSI 1–7 [no-to-moderate-risk] vs. PGSI ≥8 [high-risk]), comparing performance across data windows (30 days vs. six months) and with or without survey variables.

Results: Models using only behavioural data achieved adequate discrimination (area under the receiver operating characteristic curve [AUROC] = 0.74–0.75), with similar results for the 30-day and six-month data windows. The most predictive account-based variables were age, deposit intensity (deposits per active day), average stake size, and days since last bet. Combining behavioural data with self-reported variables substantially enhanced performance (AUROC = 0.76–0.85). Two self-reported variables, the number of gambling accounts held and gambling satisfaction, were primarily responsible for the improvement.

Conclusions: Machine learning models can effectively detect at-risk customers on online sports and race betting sites using only 30 days of behavioural data. Performance can be improved by adding minimal, non-intrusive self-report measures.
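
The abstract names a binary target (PGSI ≥8 vs. lower scores), two data windows, derived account features such as deposit intensity and days since last bet, and AUROC as the evaluation metric. The following is a minimal plain-Python sketch of those pieces under stated assumptions, not the authors' pipeline: bets and deposits are reduced to their dates, an "active day" is assumed to be any day with at least one bet, "six months" is taken as 182 days, and every function name is hypothetical. No model is trained here; the sketch only shows how the features, label, and metric could be computed.

```python
from datetime import date, timedelta

# Hypothetical input: each bet and each deposit is represented only by its date.
# The operators' real data schema is not described in the abstract.

def in_window(days, end, length_days):
    """Keep dates inside the length_days-day window ending on `end` (inclusive)."""
    start = end - timedelta(days=length_days)
    return [d for d in days if start < d <= end]

def deposit_intensity(deposit_days, bet_days):
    """Deposits per active day; an 'active day' is ASSUMED to be any day with a bet."""
    active = set(bet_days)
    return len(deposit_days) / len(active) if active else 0.0

def days_since_last_bet(bet_days, end):
    """Days between the most recent bet in the window and the window's end date."""
    return (end - max(bet_days)).days

def high_risk_label(pgsi):
    """Binary target: 1 for PGSI >= 8 (high-risk group), 0 otherwise."""
    return 1 if pgsi >= 8 else 0

def auroc(labels, scores):
    """Rank-based AUROC: share of (positive, negative) pairs in which the positive
    case scores higher, with ties counted as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one example of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

if __name__ == "__main__":
    end = date(2024, 1, 31)
    bets = [date(2023, 10, 1), date(2024, 1, 10), date(2024, 1, 10),
            date(2024, 1, 15), date(2024, 1, 25)]
    deposits = [date(2024, 1, 10), date(2024, 1, 10), date(2024, 1, 15),
                date(2024, 1, 15), date(2024, 1, 25), date(2024, 1, 25)]
    for length in (30, 182):  # ~30 days vs. ~six months (182 days assumed)
        b, d = in_window(bets, end, length), in_window(deposits, end, length)
        print(length, deposit_intensity(d, b), days_since_last_bet(b, end))
    print(auroc([1, 1, 0, 0, 0], [0.9, 0.5, 0.5, 0.2, 0.1]))
```

Run as a script with this toy data, it prints `30 2.0 6`, `182 1.5 6`, and about `0.9167`: the October bet only enters the longer window, which lowers deposit intensity there. The pairwise AUROC loop is quadratic and is meant only to show what the metric measures; in practice a library routine such as scikit-learn's `sklearn.metrics.roc_auc_score` computes the same quantity.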
