Mining long-COVID symptoms from Reddit: characterizing post-COVID syndrome from patient reports

This article has been Reviewed by the following groups

Read the full article See related articles

Abstract

Our objective was to mine Reddit to discover long-COVID symptoms self-reported by users, compare symptom distributions across studies, and create a symptom lexicon. We retrieved posts from the /r/covidlonghaulers subreddit and extracted symptoms via approximate matching using an expanded meta-lexicon. We mapped the extracted symptoms to standard concept IDs, compared their distributions with those reported in recent literature and analyzed their distributions over time. From 42 995 posts by 4249 users, we identified 1744 users who expressed at least 1 symptom. The most frequently reported long-COVID symptoms were mental health-related symptoms (55.2%), fatigue (51.2%), general ache/pain (48.4%), brain fog/confusion (32.8%), and dyspnea (28.9%) among users reporting at least 1 symptom. Comparison with recent literature revealed a large variance in reported symptoms across studies. Temporal analysis showed several persistent symptoms up to 15 months after infection. The spectrum of symptoms identified from Reddit may provide early insights about long-COVID.

Article activity feed

  1. SciScore for 10.1101/2021.06.15.21259004: (What is this?)

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    No key resources detected.


    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    Our study has several notable limitations: the detection of symptoms is not 100% accurate, users may not report all the symptoms they experience (leading to underestimation of symptom prevalence), Reddit users tend to be younger compared to the general population, and rare symptoms may have been missed during our lexicon preparation. In our future work, we will attempt to analyze, from social media, how long-COVID typically progresses, the distribution of time spans at which users report the resolution of symptoms, and treatment regimens that appear to help. Social media data, including from Reddit and Twitter, may help the medical community to better understand long-COVID from the perspective of patients and thus enable them to improve the care they provide. Considering the large volume of data that is generated in social media about this topic, it is necessary to develop automated methods involving NLP for long-term surveillance. To aid future research, we have made the lexicon developed for this study publicly available.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    Results from scite Reference Check: We found no unreliable references.


    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.