Analysis of mental and physical disorders associated with COVID-19 in online health forums: a natural language processing study

Abstract

Online health forums provide rich and untapped real-time data on population health. Through novel data extraction and natural language processing (NLP) techniques, we characterise the evolution of mental and physical health concerns relating to the COVID-19 pandemic among online health forum users.

Setting and design

We obtained data from three leading online health forums: HealthBoards, Inspire and HealthUnlocked, from the period 1 January 2020 to 31 May 2020. Using NLP, we analysed the content of posts related to COVID-19.

Primary outcome measures

(1) Proportion of forum posts containing COVID-19 keywords; (2) proportion of forum users making their very first post about COVID-19; (3) proportion of COVID-19-related posts containing content related to physical and mental health comorbidities.
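
As a rough illustration of outcome measures (1) and (2), the sketch below counts posts that match a COVID-19 keyword pattern and the share of users whose earliest post matches it. The keyword pattern, the `Post` record and the function names are assumptions made for this example; they are not the study's actual keyword list or pipeline.

```python
import re
from dataclasses import dataclass
from datetime import date

# Hypothetical keyword pattern for illustration only; the study's own
# keyword list is not reproduced in this abstract.
COVID_PATTERN = re.compile(
    r"\b(covid(-?19)?|coronavirus|sars-cov-2)\b", re.IGNORECASE
)


@dataclass
class Post:
    user: str      # forum user identifier
    posted: date   # date the post was created
    text: str      # post body or thread title


def mentions_covid(text: str) -> bool:
    """Return True if the text contains one of the assumed keywords."""
    return COVID_PATTERN.search(text) is not None


def keyword_post_proportion(posts: list[Post]) -> float:
    """Outcome (1): share of all posts containing a COVID-19 keyword."""
    if not posts:
        return 0.0
    return sum(mentions_covid(p.text) for p in posts) / len(posts)


def first_post_covid_proportion(posts: list[Post]) -> float:
    """Outcome (2), read here as: share of users whose earliest post
    contains a COVID-19 keyword."""
    first_by_user: dict[str, Post] = {}
    # sorted() is stable, so same-day posts keep their input order.
    for p in sorted(posts, key=lambda p: p.posted):
        first_by_user.setdefault(p.user, p)
    if not first_by_user:
        return 0.0
    hits = sum(mentions_covid(p.text) for p in first_by_user.values())
    return hits / len(first_by_user)
```

Outcome (3) could be approximated the same way by applying a second, comorbidity-specific pattern to the posts that already match the COVID-19 pattern.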

Results

Data from 739 434 posts created by 53 134 unique users were analysed. A total of 35 581 posts (4.8%) contained a COVID-19 keyword. Posts discussing COVID-19 and related comorbid disorders spiked in early to mid-March, around the time lockdowns were implemented globally, which prompted a large number of users to post on online health forums for the first time. Over a quarter of COVID-19-related thread titles mentioned a physical or mental health comorbidity.

Conclusions

We demonstrate that it is feasible to characterise the content of online health forum user posts regarding COVID-19 and to measure changes over time. The pandemic and the corresponding public response have had a significant impact on posters’ queries regarding mental health. Social media data sources such as online health forums can be harnessed to strengthen population-level mental health surveillance.

Article activity feed

  1. SciScore for 10.1101/2020.12.14.20248155:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms
    Sentence: This approach is consistent with similar studies examining healthcare related data from Twitter.[13,14] QMUL is registered as a data controller with the Information Commissioner’s Office (ICO; registration number: Z5507327), which covers all research activities undertaken at the university.
    Resource: Twitter (suggested: None)

    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    Strengths and weaknesses: Online health forums are an important source of real-world, real-time, population-level data on people living through the COVID-19 pandemic. Online health forums also afford users anonymity to discuss aspects of their experience they might otherwise have been embarrassed or fearful to disclose in identifiable forms of social media. We have demonstrated that it is possible to automate information extraction from these posts using natural language processing, providing access to a rich reservoir of previously untapped real-world data from health-specific online resources. Our approach was able to automatically extract data from a large sample of over 53,000 unique users at a fraction of the cost of previous approaches that have relied on social media individual participant recruitment and manual review of posts generating sample sizes in the low hundreds.[7] Some studies screened users on Twitter via depression symptom questionnaires and used their tweets to train depression onset classifiers.[6,22] Analogous approaches have been used with Facebook data.[8] Our study has some limitations. At present it is difficult to establish whether concerned posters have pre-existing mental or physical health issues, have experienced confirmed COVID-19 illness themselves, are recovered, or have become unwell for the first time. Online health forums are help-seeking communities; this introduces self-selection bias in which individuals from disadvantaged backgrounds ...

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a protocol registration statement.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding.