Identifying Key Predictors of Smoking Cessation Success: Text-Based Feature Selection Using a Large Language Model

Abstract

Background

The most effective way to reduce mortality and morbidity among current smokers is to quit smoking. In 2022, about half of smokers attempted to quit, but only about one-tenth succeeded.

Objective

To identify key predictors of smoking cessation success to inform cessation interventions and increase quitting rates.

Methods

We analyzed data from waves 5 and 6 of the Population Assessment of Tobacco and Health (PATH) study (December 2018 to November 2021). Using OpenAI’s GPT-4.1 and only the textual descriptions of the survey variables, we identified the 45 wave 5 variables most predictive of 12-month smoking abstinence in wave 6. We then validated the predictive power of the GPT-4.1-selected variables by comparing the performance of eXtreme Gradient Boosting (XGBoost) models trained on different sets of variables. Finally, we ranked the variables by their SHapley Additive exPlanations (SHAP) values and derived insights into the top 10.
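The description-only selection step can be illustrated with a minimal sketch. It is not the authors' pipeline: the variable IDs and descriptions, the prompt wording, the 0-100 scoring scale, and the JSON reply format are all hypothetical, and no OpenAI API call is made. A hard-coded reply stands in for GPT-4.1's output so the parsing and top-k selection logic can run on its own.

```python
import json

# Hypothetical variable IDs and descriptions; real PATH codebook entries differ.
VARIABLES = {
    "V_CIG_DAYS_30": "Number of days smoked cigarettes in the past 30 days",
    "V_TTFC": "Minutes from waking up to smoking the first cigarette",
    "V_PEER_USE": "How many of your close friends use tobacco",
    "V_HEALTH_WORRY": "How worried are you that tobacco will harm your health",
}


def build_prompt(variables: dict[str, str]) -> str:
    """Build a prompt that exposes only variable descriptions, never survey responses."""
    lines = [
        "Rate how predictive each survey variable is of 12-month smoking "
        "abstinence one wave later, on a 0-100 scale.",
        'Reply with a JSON object mapping variable ID to score, e.g. {"ID": 80}.',
        "",
    ]
    lines += [f"{var_id}: {desc}" for var_id, desc in variables.items()]
    return "\n".join(lines)


def parse_scores(reply: str, known_ids: set[str]) -> dict[str, float]:
    """Parse the model's JSON reply, dropping any IDs not present in the codebook."""
    raw = json.loads(reply)
    return {k: float(v) for k, v in raw.items() if k in known_ids}


def top_k(scores: dict[str, float], k: int) -> list[str]:
    """Return the k highest-scoring IDs; ties are broken alphabetically for reproducibility."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [var_id for var_id, _ in ranked[:k]]


if __name__ == "__main__":
    prompt = build_prompt(VARIABLES)
    # Made-up reply standing in for the model's output (hypothetical scores).
    reply = '{"V_CIG_DAYS_30": 90, "V_TTFC": 85, "V_PEER_USE": 60, "V_HEALTH_WORRY": 55}'
    scores = parse_scores(reply, set(VARIABLES))
    print(top_k(scores, k=2))  # the study keeps the top 45 variables
```

Filtering the reply against the known IDs guards against the model returning variable names that do not exist in the codebook, and the deterministic tie-break keeps the selected subset stable across runs.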

Results

XGBoost trained on all available wave 5 variables and XGBoost trained on the 45 selected variables performed almost identically (AUC 0.749 vs 0.752, respectively). The top 10 variables included past 30-day smoking frequency, minutes from waking up to smoking the first cigarette, important people’s views on tobacco use, prevalence of tobacco use among close associates, daily electronic nicotine product use, emotional dependence, and concerns about health harms.
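The model comparison rests on the area under the ROC curve (AUC), and the top-10 list rests on a SHAP-based ranking. The following is a minimal standard-library sketch of both. It assumes binary abstinence labels, predicted probabilities from each model, and a precomputed SHAP matrix (rows are respondents, columns are variables). Averaging absolute SHAP values per variable is a common way to obtain global importance, but it is assumed here rather than confirmed by the abstract. In practice, `sklearn.metrics.roc_auc_score` and the `shap` package would replace these hand-written helpers.

```python
def auc(labels: list[int], scores: list[float]) -> float:
    """ROC AUC as the probability that a random positive outranks a random negative.

    Ties count as one half (the Mann-Whitney U formulation). The pairwise loop is
    O(n_pos * n_neg), which is fine for illustration but slow on a full survey sample.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def rank_by_mean_abs_shap(shap_rows: list[list[float]], names: list[str], k: int) -> list[str]:
    """Rank variables by mean absolute SHAP value across respondents; return the top k names."""
    if not shap_rows:
        raise ValueError("need at least one row of SHAP values")
    n = len(shap_rows)
    importance = {
        name: sum(abs(row[j]) for row in shap_rows) / n
        for j, name in enumerate(names)
    }
    return sorted(importance, key=lambda name: -importance[name])[:k]


if __name__ == "__main__":
    # Made-up toy data: 1 = abstinent at follow-up, 0 = not abstinent.
    labels = [1, 0, 1, 0, 0]
    preds_all_vars = [0.8, 0.3, 0.6, 0.7, 0.2]  # model trained on all variables
    preds_selected = [0.7, 0.4, 0.6, 0.5, 0.2]  # model trained on the selected subset
    print(f"AUC all: {auc(labels, preds_all_vars):.3f}")
    print(f"AUC selected: {auc(labels, preds_selected):.3f}")

    # Hypothetical SHAP values for three hypothetical variables.
    shap_rows = [[0.5, -0.1, 0.0], [-0.3, 0.2, 0.05]]
    names = ["cig_days_30", "mins_to_first_cig", "health_worry"]
    print(rank_by_mean_abs_shap(shap_rows, names, k=2))
```

Taking absolute values before averaging matters here. Otherwise, a variable that pushes some predictions up and others down could average out to near zero despite being influential.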

Conclusion

This study demonstrates that OpenAI’s GPT-4.1 can identify the top 45 PATH wave 5 variables associated with 12-month smoking abstinence using only the variables’ descriptions. This approach could help researchers design more effective survey questionnaires and improve the efficiency of data collection.

What is already known on this topic

Generative artificial intelligence models have recently been explored for a range of tobacco-related tasks, such as detecting tobacco products in social media videos and promoting vaping cessation. However, their use in identifying the most important predictors of tobacco use behavior from survey data remains unexplored.

What this study adds

GPT-4.1 successfully assigned high-quality importance scores to survey variables for predicting 12-month smoking abstinence among current established smokers over a two-year period. It did so using only the textual descriptions of the survey variables, without access to the actual survey data. Based on these importance scores, GPT-4.1 can help identify the most important variables for predicting smoking cessation success.

How this study might affect research, practice or policy

This study demonstrates that GPT-4.1 can perform feature selection, paving the way for applying this approach to other tobacco-related questions.
