Investigating phenotypes of pulmonary COVID-19 recovery: A longitudinal observational prospective multicenter trial

Curation statements for this article:
  • Curated by eLife


    Evaluation Summary:

    This manuscript links early markers of inflammation with residual abnormalities on chest CT following SARS-CoV-2 infection. Surprisingly, early surveyed symptoms do not predict long-term radiologic outcomes (6 months after infection), whereas inflammatory markers have stronger predictive value. Residual symptoms are common at the 6-month time point.

    (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #1 and Reviewer #2 agreed to share their name with the authors.)


Abstract

The optimal procedures to prevent, identify, monitor, and treat long-term pulmonary sequelae of COVID-19 are elusive. Here, we characterized the kinetics of respiratory and symptom recovery following COVID-19.

Methods:

We conducted a longitudinal, multicenter observational study in ambulatory and hospitalized COVID-19 patients recruited in early 2020 (n = 145). Pulmonary computed tomography (CT) and lung function (LF) readouts, symptom prevalence, and clinical and laboratory parameters were collected during acute COVID-19 and at 60, 100, and 180 days follow-up visits. Recovery kinetics and risk factors were investigated by logistic regression. Classification of clinical features and participants was accomplished by unsupervised and semi-supervised multiparameter clustering and machine learning.

Results:

At the 6-month follow-up, 49% of participants reported persistent symptoms. The frequency of structural lung CT abnormalities ranged from 18% in the mild outpatient cases to 76% in the intensive care unit (ICU) convalescents. Prevalence of impaired LF ranged from 14% in the mild outpatient cases to 50% in the ICU survivors. Incomplete radiological lung recovery was associated with increased anti-S1/S2 antibody titer, IL-6, and CRP levels at the early follow-up. We demonstrated that the risk of perturbed pulmonary recovery could be robustly estimated at early follow-up by clustering and machine learning classifiers employing solely non-CT and non-LF parameters.

Conclusions:

The severity of acute COVID-19 and protracted systemic inflammation are strongly linked to persistent structural and functional lung abnormalities. Automated screening of multiparameter health record data may assist in the prediction of incomplete pulmonary recovery and optimize COVID-19 follow-up management.

Funding:

The State of Tyrol (GZ 71934), Boehringer Ingelheim/Investigator initiated study (IIS 1199-0424).

Clinical trial number:

ClinicalTrials.gov: NCT04416100

Article activity feed

  1. Author Response:

    Reviewer #1 (Public Review):

    The investigators' goals were to describe the epidemiology and kinetics of post-acute COVID lung sequelae and to determine the risk factors predictive of persistent lung impairment. A major strength of the study is the longitudinal observation through 6 months with protocolized clinical assessments that included patient-reported outcomes, lung function tests, inflammatory marker testing, and computed tomography of the chest, in a reasonably sized cohort that reflects the spectrum of disease severity in the pre-vaccination era. We learn a great deal about the different patterns of recovery in this group of COVID-19 survivors. The primary epidemiologic finding is that 52% of survivors continued to have symptoms at 6 months, while up to 72% of those with severe COVID requiring ICU-level care continued to have lung abnormalities by chest imaging. This confirms general observations of "long COVID", which also encompasses non-lung effects. While lung disease is less common in those with milder disease, the proportion of patients who were never hospitalized but experienced persistent symptoms is striking (50%), with lung function impairment in 17% at 6 months. As expected, the patients who had the most severe disease (those who needed the ICU) had the highest degree of chest imaging abnormalities. The kinetics of recovery are a significant observation: Figure 3 shows that most of the post-acute recovery in structural lung abnormalities occurs in the first 3 months and slows down thereafter, particularly for the hospitalized non-ICU patients. The investigators then embarked on a sophisticated analysis to determine how to predict persistent lung abnormalities (as detected by chest CT) at 6 months. When analyzed individually, among 50 clinical characteristics or lab values, the strongest unfavorable risk factors were elevated IL-6 (an inflammatory cytokine that is the target of tocilizumab) and CRP (C-reactive protein). Other variables that were strongly associated with CT abnormalities included immunosuppressive therapy, ICU stay, and pre-existing conditions. When machine learning techniques were applied, risk factors that correlated with each other could be grouped together, and the patients could be categorized as low, intermediate, and high risk for delayed pulmonary recovery. As expected, known risk factors for severe COVID-19 (age, male sex, medical comorbidities) and disease severity (need for oxygen therapy, ICU care and antibiotics) were more frequent in the intermediate- and high-risk groups. These predictive factors at acute COVID-19 and the day-60 follow-up mostly held up when tested against the part of the cohort that was not used for analysis. Interestingly, lung function impairment as measured by pulmonary function tests was only weakly correlated with persistent and severe chest imaging abnormalities.

    The novelty of this study lies in taking the epidemiology a step further with a machine learning analysis to determine which clinical characteristics and chest imaging features at the onset of acute COVID-19 are predictive of later persistent disease. One limitation of this study, however, is that it was conducted on patients in the early part of the pandemic, prior to the widespread use of remdesivir and corticosteroid/anti-cytokine therapies, which are now considered standard of care. Based on these findings, we can now hypothesize that current treatments are likely to reduce the impact of long COVID.

    We would like to thank the reviewer for the careful study of the manuscript and appreciation of our work. We agree that our longitudinal cohort, and its hospitalized, severe COVID-19 subset in particular, encompasses patients for whom the therapeutic armamentarium was limited and far from the therapeutic options available now. Whether novel antiviral and anti-inflammatory medication and, in vaccinated patients, immunization status may accelerate recovery or reduce pulmonary damage is also a subject of ongoing research at our center. We address this issue in the Discussion section to support a clear interpretation of the data by the interested reader.

    Machine learning (artificial intelligence, AI) is now being increasingly used to answer clinical questions on limited cohorts; the application of machine learning in this study contributes to our conceptual understanding of how clinical characteristics and biological factors cluster together to contribute to long-term COVID outcomes. Namely, the profound inflammation that characterizes severe acute COVID-19 pneumonia and poor early outcomes also contributes to chronic lung damage in survivors. In addition, a robust antiviral immune response (as seen with elevated antiviral antibodies) without elevated systemic inflammatory markers was associated with less severe chest imaging patterns, also supporting the notion that an individual's immune response to the virus is responsible for the trajectory of disease. As noted, a significant proportion of non-hospitalized patients also suffered from chronic lung impairments. Taken together, the impact of prolonged convalescence on the workforce, healthcare, and individual lives should not be underestimated. These results underscore the paramount need for continued public health measures and vaccinations to prevent COVID-19, particularly for the most vulnerable individuals (older, immunocompromised, and with preexisting health problems). These observations provide additional biologic justification for the use of agents directed at reducing lung inflammation early in the course of disease, and potentially at an early post-recovery time point (i.e., 2 months). Machine learning algorithms may one day help clinicians decide which patients should be targeted for additional therapies after the acute phase. With further study, implementation of AI in real-world medicine may be on the horizon.

    We agree with the Reviewer that machine learning algorithms can overcome limitations of ‘canonical’ ordinal and generalized regression methods in the multidimensional setting, i.e., when the number of available clinical parameters approaches or exceeds the number of observations (patients). Consequently, machine learning or AI allows for serial screening of medical record data at low cost and supports diagnostic and therapeutic decisions. We discuss these two aspects in the revised manuscript in the context of acute COVID-19 course prediction and long COVID prediction and phenotyping in light of the recent literature [1–4,6].
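    One concrete reason why unpenalized regression struggles once the number of parameters approaches or exceeds the number of patients can be shown in a few lines of Python. This is an illustrative sketch only, not the authors' pipeline; the toy matrix `X` and the penalty value `lam` are hypothetical. With n = 2 observations and p = 3 features, the Gram matrix X^T X is singular, so the least-squares normal equations have no unique solution; adding an L2 (ridge) penalty lam * I, one of the two penalty terms combined in the elastic net fitted by glmnet [24], makes the matrix positive definite and the problem well posed.

```python
def gram(X):
    """Return X^T X for a matrix X given as a list of rows (n x p)."""
    p = len(X[0])
    return [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]


def det3(M):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


# Hypothetical toy data: two patients (n = 2) described by three features (p = 3)
X = [[1.0, 2.0, 0.5],
     [0.0, 1.0, 3.0]]
G = gram(X)

lam = 0.1  # hypothetical ridge (L2) penalty strength
G_ridge = [[G[i][j] + (lam if i == j else 0.0) for j in range(3)] for i in range(3)]

print(det3(G))        # zero: rank(X^T X) <= n = 2 < p, so no unique least-squares fit
print(det3(G_ridge))  # positive: the penalized normal equations are solvable
```

The same rank argument applies to unpenalized logistic regression, which is why penalized or otherwise regularized learners are the usual choice in this setting.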

    Reviewer #2 (Public Review):

    This is a potentially valuable manuscript which links early markers of inflammation with residual abnormalities on chest CT following SARS-CoV-2 infection. Surprisingly, early surveyed symptoms do not predict long-term radiologic outcomes (6 months after infection), while inflammatory markers have stronger predictive value. The cohort is well designed and the selected tools for analysis are appropriate.

    We thank the Reviewer for the careful study, critique and appreciation of our work.

    While this finding is potentially of high importance for clinical practice, the endpoints are inconsistently defined, and certain components of the machine learning and clustering analyses are difficult to interpret as presented. It is therefore challenging to understand whether the conclusions are justified by the analysis.

    We apologize for this lack of clarity. In the revised manuscript, we precisely define the analysis endpoints (any radiological lung findings at the 6-month follow-up, radiological lung abnormalities with CT score > 5, lung function impairment, and persistent symptoms at the 6-month follow-up); see Introduction and Methods/Study design. We also indicate the numbers of participants reaching these endpoints in Table 3.

    Several components of the analysis are confusing and would benefit from further elucidation:

    1. The authors do not clearly define "delayed pulmonary recovery". My sense is that they are using several radiology-based definitions rather than their functional definition of lung function (defined by FEV1, FEV1:FVC and DLCO), but this is never explicitly stated. Are the functional outcomes and symptomatic recovery considered in any of the analyses other than correlations with radiologic findings in S1?

    As described in our previous response, the prime focus and primary endpoint of the analysis was the presence of radiological lung abnormalities at the 6-month follow-up. We chose radiological endpoints to capture the potential development of persistent structural lung abnormalities, fibrosis and interstitial lung disease (ILD) following COVID-19, as observed in SARS-CoV-1 patients [7,8]. Of note, lung function parameters were only weak correlates of radiological impairment, as shown in Figure 3 – figure supplement 1 – 3 and our previous work [27]. This finding is in line with numerous studies in ILD patients that demonstrate a low sensitivity of lung function testing (especially FEV1 and FVC assessment) in early ILD [10,11]. In addition, we could not exclude a pre-existing, COVID-19-independent impairment of lung function in a subset of study participants suffering from pulmonary diseases, obesity and/or cardiovascular diseases (Table 1). Thus, lung function parameters only partially reflect COVID-19-mediated lung injury and convalescence.

    Nevertheless, we agree that clinical and functional endpoints are of great interest to the scientific and clinical community. For this reason, we present additional results of univariable risk modeling for long-term (6-month follow-up) symptom persistence and lung function impairment (Figure 5, Appendix 1 – table 2) and the results of machine learning modeling for those outcomes (Figure 9, Appendix 1 – table 5), and we discuss the findings. We also present the prevalence of such long-term manifestations and of lung function impairment in the Low-, Intermediate- and High-Risk clusters of study participants defined by non-CT and non-lung function clinical features (Figure 8).

    1. To this end, I was surprised that the functional definition and symptomatic recovery were not used as the primary endpoints. The functional definition and resolution of symptoms seem most important for the recovering patient and therefore seem like the more important outcomes. However, in Figures 5-7, it is often not clear whether the functional outcome is being considered at all.

    As mentioned above, the focus of the study was the assessment of structural lung impairment following COVID-19, and both lung function parameters and symptom burden correlate only moderately with structural lung damage (Figure 3 – figure supplement 1 – 3), a phenomenon observed previously in SARS-CoV-1 [7,8]. Although the symptom burden and its resolution during follow-up are of major importance for the individual patient during post-acute recovery, these parameters are not good markers of the potential long-term pulmonary outcome. For example, younger patients with moderate to severe lung damage may demonstrate only mild pulmonary symptoms during post-acute recovery, but the structural damage may be associated with severe impairment at long-term follow-up due to progression of lung fibrosis or the age-related decrease of functional pulmonary capacity [11]. Still, we agree with the reviewer that the follow-up of symptoms and lung function is of interest to the reader and have additionally included those outcomes in the univariable and multiparameter risk modeling. In addition, we present the frequencies of symptom persistence and lung function impairment in the low-, intermediate- and high-risk participant clusters defined solely by non-CT and non-lung function clinical parameters. See the previous issue for more details.

    1. For the clustering in figure 5, I am uncertain how CT severity score >5 & CT abnormalities cluster separately, when these 2 outcomes appear to logically overlap. Specifically, does the CT abnormalities outcome include patients with the high severity score outcome? In other words, are patients in the "high severity" group a subset of patients with "CT abnormality"? If not a subset, then the CT abnormality should be labeled "non-severe CT abnormality". This could all be clarified by listing the number of patients in each group and showing with a Venn diagram whether there is any overlap.

    We apologize for the lack of clarity in this matter. As pointed out by the reviewer, the patients with CT severity scores > 5 points were a subset of the participants with any CT abnormalities. The same was true for the GGO-positive subgroup. We agree that the overlap between the radiological outcomes obscures the message of the clustering and modeling results. To overcome this, we removed the GGO outcome variable from the analyses in the revised manuscript. In the revised manuscript, we clearly differentiate between mild (CT severity score ≤ 5) and moderate-to-severe radiological abnormalities (CT severity score > 5) in feature (Figure 6) and participant clustering (Figure 8). Frequencies of mild and moderate-to-severe CT abnormalities in the study collective, stratified by the severity of acute COVID-19, are presented in Figure 3 – figure supplement 3B. Numbers of study participants with any, mild or moderate-to-severe CT abnormalities at the subsequent follow-up visits are listed in Table 3.
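    The nesting of the radiological outcomes can be written as a small mapping function. This is a minimal sketch: the cut-off of 5 points is taken from the text, whereas treating a score of 0 as "no CT abnormality" (so that "any abnormality" means a score above 0) is an assumption made here for illustration. Because both "mild" and "moderate-to-severe" count as "any abnormality", the moderate-to-severe group is by construction a subset of the any-abnormality group, which is the overlap the reviewer pointed out.

```python
def ct_category(score: int) -> str:
    """Map a CT severity score to the outcome categories used in the revision.

    Assumption (not stated in the text): score 0 means no CT abnormality.
    The cut-off of 5 points separating mild from moderate-to-severe is from the text.
    """
    if score < 0:
        raise ValueError("CT severity score must be non-negative")
    if score == 0:
        return "none"
    return "mild" if score <= 5 else "moderate-to-severe"


def has_any_abnormality(score: int) -> bool:
    """'Any CT abnormality' = mild OR moderate-to-severe, so it contains both groups."""
    return ct_category(score) != "none"


# Hypothetical scores, not study data
for s in [0, 3, 5, 6, 12]:
    print(s, ct_category(s), has_any_abnormality(s))
```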

    1. For the same reason, figure 4 is hard to interpret. Are CT severity >5 being compared to those with normal CTs only or those with normal or mild / moderate CTs? Please provide more specific definitions of normal, "CT abnormality" and "severe CT abnormality" and provide the number of people in each category and specify the comparator groups in all analyses.

    We are sorry for the confusion. In Figure 4 of the initial manuscript, any CT abnormalities, GGO positivity and abnormalities with CT severity score > 5 were analyzed as separate outcome variables. The baseline was specific for the given explanatory variable, e.g., the mild COVID-19 group for ICU stay, or normal serum IL-6 levels for elevated IL-6. In the revised manuscript, we present the modeling results in abbreviated form for the 5 strongest covariates of each outcome: any CT abnormalities, moderate-to-severe CT abnormalities (CT severity score > 5), persistent symptoms and lung function impairment (Figures 4 – 5). We indicate the baseline and the number of observations (n) in the plots. The complete summary of univariable risk modeling with the requested information is provided in Appendix 1 – table 2.
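    For a single binary explanatory variable compared with its reference level (for example, elevated versus normal IL-6), the odds ratio estimated by univariable logistic regression equals the cross-product ratio of the corresponding 2 × 2 table, and its Wald confidence interval uses the standard error sqrt(1/a + 1/b + 1/c + 1/d) on the log scale. The sketch below assumes all four cells are non-zero; the counts are hypothetical and do not come from the study. Multi-level or continuous covariates would require fitting the regression model itself.

```python
import math


def odds_ratio_2x2(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio with a Wald confidence interval for a 2x2 table.

    a: exposed with outcome      b: exposed without outcome
    c: reference with outcome    d: reference without outcome
    For one binary predictor this equals exp(beta) of a univariable
    logistic regression; all cells must be positive.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive")
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


# Hypothetical counts, NOT study data:
# 20 of 30 exposed vs 10 of 40 reference participants had the outcome
or_, lo, hi = odds_ratio_2x2(20, 10, 10, 30)
print(f"OR = {or_:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```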

    1. Similarly, how can GGO @V3 be used as a potential explanatory variable for the outcome CT abnormalities @V3 when these 2 variables are clearly non-independent? Inclusion of highly related and likely correlated variables may throw off the overall conclusions of the clustering analysis.

    We agree with the editor and the reviewer that this representation was confusing. For this reason, and for the reasons described in Response 4, we removed the GGO variable from the revised analysis pipeline and differentiate between mild (CT severity score ≤ 5) and moderate-to-severe (CT severity score > 5) radiological lung abnormalities in modeling and machine learning classification. In addition, we define symptom and participant clusters solely with the non-CT parameters (Figure 6 – 7). To investigate the association of mild and moderate-to-severe CT abnormalities with other non-CT variables (Figure 6, Supplementary Figure S5), the CT features are assigned to the non-CT clusters by a k-NN-based label propagation algorithm, i.e., a semi-supervised procedure [12,13,26] that we also employed in our recent paper [6].
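    The label-propagation step can be sketched as a k-nearest-neighbour majority vote: items that were already clustered act as labeled reference points, and each unassigned item inherits the most frequent cluster label among its k closest labeled items. This minimal sketch assumes Euclidean distance, k = 3 and tie-breaking by the single nearest neighbour; the distance measure, k and data used in the study may differ, and the toy points are hypothetical. For feature clustering, each "point" would be one variable represented by its values across participants.

```python
import math
from collections import Counter


def knn_propagate(labeled, unlabeled, k=3):
    """Assign each unlabeled point the majority label of its k nearest labeled points.

    labeled:   list of (vector, label) pairs, e.g. items already clustered
    unlabeled: list of vectors to assign, e.g. CT features
    Ties are broken in favour of the label of the closest tied neighbour.
    """
    assigned = []
    for x in unlabeled:
        # k nearest labeled points, sorted from closest to farthest
        neighbours = sorted(labeled, key=lambda item: math.dist(x, item[0]))[:k]
        counts = Counter(label for _, label in neighbours)
        top = max(counts.values())
        winners = {lab for lab, n in counts.items() if n == top}
        assigned.append(next(lab for _, lab in neighbours if lab in winners))
    return assigned


# Hypothetical toy example with two existing clusters, "A" and "B"
labeled = [((0, 0), "A"), ((0, 1), "A"), ((1, 0), "A"),
           ((5, 5), "B"), ((5, 6), "B"), ((6, 5), "B")]
unlabeled = [(0.5, 0.5), (5.5, 5.5)]
print(knn_propagate(labeled, unlabeled, k=3))
```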

    1. In Figure 6, the criteria for the low, medium, and high-risk subsets are unclear. Is this high risk for persistent functional abnormality, radiologic abnormality, or both? Why were 3 sub populations selected? Was this done subjectively based on the clustering algorithm?

    This is an important issue. The study participant clusters were named according to the increasing frequency of any radiological lung abnormalities in the respective cluster (Figure 8A). We stress this more clearly in the revised manuscript. In addition, as suggested by the reviewer above, we show the frequency of functional lung impairment and persistent symptoms in the study participant clusters. There are multiple criteria for the choice of the optimal clustering algorithm and the optimal number of clusters. In our cohort, two criteria for the choice of the optimal clustering algorithm were applied:

    1. High fraction of the data set variance ‘explained’ by the cluster assignment (ratio of between-cluster sum-of-squares to the total sum-of-squares, Figure 6 – figure supplement 1A and Figure 7 – figure supplement 1A)
    2. The relatively highest cluster stability, i.e., reproducibility of the clustering structure in 20-fold cross-validation (Figure 6 – figure supplement 1B and Figure 7 – figure supplement 1B) [15].

    The optimal number of clusters of the study participants based on non-CT study variables was determined for the selected algorithm (SOM + hierarchical clustering, see Reviewer 2, Issue 4) [17,18], as is usual in the unsupervised or semi-supervised setting. The prime criterion for the optimal cluster number was the bend of the curve of within-cluster sum-of-squares versus cluster number, as presented in Figure 7 – figure supplement 1D. In addition, this decision was supported by a visual analysis of the SOM node dendrogram (Figure 7 – figure supplement 1E) and the curve of the cross-validated stability statistic (classification error) versus cluster number (Figure 7 – figure supplement 1F) [15].
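    The two quantities behind these decisions, the share of variance "explained" by a cluster assignment (between-cluster over total sum of squares) and the bend of the within-cluster sum-of-squares (WSS) curve, can be computed as follows. This is a minimal sketch, not the authors' implementation: it relies on the decomposition total SS = WSS + between-cluster SS, locates the bend with a simple second-difference heuristic rather than visual inspection, and uses hypothetical data and WSS values.

```python
def column_means(rows):
    """Column-wise mean of a list of equal-length rows."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def sum_sq(rows, centre):
    """Sum of squared Euclidean distances of rows to a centre."""
    return sum(sum((x - c) ** 2 for x, c in zip(r, centre)) for r in rows)


def within_ss(data, labels):
    """Within-cluster sum of squares (WSS) for a given cluster assignment."""
    total = 0.0
    for lab in set(labels):
        members = [r for r, l in zip(data, labels) if l == lab]
        total += sum_sq(members, column_means(members))
    return total


def explained_variance(data, labels):
    """Between-cluster SS / total SS, using total SS = WSS + between-cluster SS."""
    total = sum_sq(data, column_means(data))
    return (total - within_ss(data, labels)) / total


def elbow(wss):
    """Cluster number k (wss[0] is k = 1) with the largest second difference of WSS."""
    best_k, best_bend = None, float("-inf")
    for i in range(1, len(wss) - 1):
        bend = wss[i - 1] - 2 * wss[i] + wss[i + 1]
        if bend > best_bend:
            best_k, best_bend = i + 1, bend
    return best_k


# Hypothetical toy data with two obvious groups
data = [(0, 0), (0, 1), (10, 10), (10, 11)]
print(explained_variance(data, [0, 0, 1, 1]))  # close to 1: the groups explain most variance
# Hypothetical WSS values for k = 1..5 clusters
print(elbow([100, 70, 30, 26, 23]))
```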
    1. The accuracy and sensitivity of the machine learning approaches shown in S5 & S6 are somewhat limited. Please comment on why such highly granular data can only provide limited prediction about degree of lung damage post infection. Are there missing data types that might make the algorithm more predictive?

    This is an important issue that deserves more discussion in the revised manuscript. Each of the machine learning classifiers presented in the previous and the revised version of the manuscript was extremely sensitive and specific at predicting the outcomes in the training data encompassing the entire cohort (Supplementary Figure S11), as expected. However, their performance was considerably worse in repeated holdout (previous version) or 20-fold cross-validation (revision, Figure 9), which we used as surrogate tools to check sensitivity and specificity on ‘unseen’ test data. We believe that there are two prime sources of this suboptimal performance: the size of the training set and the choice of the classifier. To address the first limitation, the following alterations to the analysis pipeline were introduced:

    1. We do not restrict the analysis to the subset of the CovILD study with the complete set of all variables. Instead, the non-missingness criterion is applied to each outcome variable separately (any CT abnormalities: n = 109, moderate-to-severe abnormalities: n = 109, lung function impairment: n = 111, persistent symptoms: n = 133).
    2. We altered the internal validation strategy. Instead of the repeated holdout approach applied to the machine learning classification, which strongly limits the size of the training data set, we switched to 20-fold cross-validation both for the cluster algorithms (Figure 6 – figure supplement 1BD and Figure 7 – figure supplement 1BF) [15] and the machine learning models (Figure 9, Appendix 1 – table 5) [19].

    To address the second issue, the following changes were introduced:
    3. We compare the performance of a broader set of classifiers representing different classes of machine learning algorithms provided by the R package caret [19] (tree model: C5.0 [20], bagged tree model: Random Forests [21], support vector machines with radial kernel [22], shallow neural network: nnet [23], and elastic net regression: glmnet [24]) (Figure 9, Appendix 1 – table 4).
    4. Finally, we developed a model ensemble, a linear combination of the classifiers presented above, with the elastic net regression algorithm (Figure 9, Figure 9 – figure supplement 2) and tools provided by the caretEnsemble package [25]. This ensemble displayed better performance at predicting any CT abnormalities and persistent symptoms than the single classifiers (Figure 9, Appendix 1 – table 5).

    We also agree with the Reviewer that the input variable set, despite its size, was still not complete. We believe that inclusion of other inflammatory markers recorded during acute COVID-19 and at the 60-day follow-up may further improve the prediction of radiological abnormalities at the 6-month follow-up visit. Of note, our data set lacked important readouts of cellular immunity, such as neutrophil counts or the neutrophil-to-lymphocyte ratio (NLR), as well as blood parameters for the mild COVID-19 subset. We discuss this issue in more detail in the revised Discussion section.
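    The gap between training-set and cross-validated performance arises because cross-validation evaluates every participant only with a model that never saw that participant. A minimal k-fold sketch is shown below; it is a generic illustration, not caret's resampling code, it does not stratify folds by outcome, and `fit_threshold` is a hypothetical one-marker stand-in for the classifiers listed above (C5.0, Random Forests, SVM, nnet, glmnet). The toy data are hypothetical.

```python
import random


def kfold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k folds of (almost) equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def sensitivity_specificity(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    return tp / (tp + fn), tn / (tn + fp)


def cross_validate(X, y, fit, k=20, seed=0):
    """Out-of-fold predictions: each case is predicted by a model trained without it."""
    oof = [None] * len(y)
    for test in kfold_indices(len(y), k, seed):
        held_out = set(test)
        train = [i for i in range(len(y)) if i not in held_out]
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i in test:
            oof[i] = model(X[i])
    return sensitivity_specificity(y, oof)


def fit_threshold(X, y):
    """Hypothetical base learner: cut one marker at the midpoint of the two class means."""
    pos = [x for x, t in zip(X, y) if t]
    neg = [x for x, t in zip(X, y) if not t]
    cut = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: x > cut


# Hypothetical, clearly separable toy data: 40 cases, 20 folds of 2 cases each
X = [float(v) for v in range(20)] + [float(v) for v in range(100, 120)]
y = [False] * 20 + [True] * 20
print(cross_validate(X, y, fit_threshold, k=20))
```

With real, noisy data the out-of-fold estimates are typically lower than those obtained by refitting and scoring on the full cohort, which is the pattern described above.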
    1. The authors state that "the sole application of a lung function measurement at screening for subjects at risk of delayed lung recovery may bear insufficient sensitivity". I am not sure that I agree with this assessment. From the perspective of a patient, full recovery of lung function with limited or no residual symptoms, even in the presence of residual chest CT abnormalities, seems like a favorable outcome. I would suggest either changing this statement or providing citations that associate residual chest CT abnormalities (in the absence of residual functional lung dysfunction) with adverse long-term outcomes. Do the authors hypothesize that persistent radiologic abnormalities may predate organizing pneumonia which will ultimately become symptomatic?

    We thank the reviewer for this interesting point of discussion. We agree with the reviewer that the functional status and symptom burden are of major importance for the individual patient in the post-acute phase of COVID-19. Still, prioritizing lung function over mild structural lung abnormalities may pose two major problems. First, as previously discussed, lung function testing has a rather low sensitivity to detect early ILD [10,11], is not a good prognostic marker for long-term clinical outcomes and may not correlate well with patients' symptom burden. For instance, a patient with normal lung function may still be highly symptomatic (e.g., due to reduced respiratory muscle function) [7] and/or demonstrate structural lung abnormalities (e.g., it has been shown for various ILDs that lung function tests such as FVC and FEV1 may be normal even in pronounced disease and that lung function testing is not sufficient to rule out ILD [10]). Second, to date, it is not known whether persistent structural lung abnormalities following COVID-19 (even when mild) are at risk of progression at long-term follow-up. In particular, sub-clinical structural changes may behave like incidentally detected interstitial lung abnormalities (ILAs) and develop into symptomatic, progressive fibrotic interstitial lung disease, including IPF [11]. For this reason, we think that further pulmonary follow-up is necessary for patients with structural lung abnormalities due to COVID-19 and that a sole focus on lung function is not sufficient to assess pulmonary COVID-19 outcomes [9].

    1. The authors note selection bias against ordering CT and perhaps inflammatory markers early during infection as a limitation. I would suggest a sensitivity analysis to understand whether this misclassification will impact the model's predictions.

    We now address this issue in more detail. As shown in Figure 1, there was indeed a significant dropout of participants during the study due to missed longitudinal visits and missingness of the longitudinal variable set. This phenomenon was most evident for the mild COVID-19 patients, who lost interest in participation, most likely because of subjectively perceived complete convalescence. This issue is now discussed as a limitation in the revised manuscript. In the revised manuscript, we also investigated highly influential factors for clustering and machine learning classifiers. To determine which variables played the most important role in the clustering of the study individuals, we applied the explanatory variable ‘noising’ procedure initially described by Breiman for the random forest algorithm [21] and compared the ‘explained’ variance (ratio of between-cluster sum-of-squares to the total sum-of-squares) of the initial clustering structure with the clustering structures generated in the datasets with noised variables. Although this algorithm is not free from shortcomings, such as blindness to tight correlations, it may provide a coarse measure of a variable’s impact on cluster formation (Figure 7 – figure supplement 2). For three of the machine learning algorithms tested, importance statistics were extracted from the models: (1) for the C5.0 algorithm, the percentage of variable usage in the decision tree, (2) for the Random Forests algorithm, the delta of the Gini index obtained by variable noising [21] and (3) for the elastic net/glmNet procedure, the absolute values of the regression coefficients β [24] (Figure 9 – figure supplement 4 – 7). The technical details are provided in Methods; the cluster and model importance data are discussed in the manuscript text.
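    The "noising" idea for clustering can be sketched as follows: permute one variable at a time, recompute the between-to-total sum-of-squares ratio, and record how much it drops relative to the unperturbed data. As a simplification assumed here, the cluster labels are kept fixed instead of re-running the clustering on each noised data set, and the number of permutations (`n_rep`) and the toy data are hypothetical. A larger average drop marks a variable that contributes more to cluster separation; tightly correlated variables can still mask each other, as noted above.

```python
import random


def _centre(rows):
    """Column means of a list of equal-length rows."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def _ss(rows, centre):
    """Sum of squared Euclidean distances of rows to a centre."""
    return sum(sum((x - c) ** 2 for x, c in zip(r, centre)) for r in rows)


def explained_ratio(data, labels):
    """Between-cluster SS / total SS for fixed cluster labels."""
    total = _ss(data, _centre(data))
    within = 0.0
    for lab in set(labels):
        members = [r for r, l in zip(data, labels) if l == lab]
        within += _ss(members, _centre(members))
    return (total - within) / total


def noising_importance(data, labels, n_rep=20, seed=0):
    """Mean drop in explained_ratio when each variable is permuted (labels kept fixed)."""
    rng = random.Random(seed)
    base = explained_ratio(data, labels)
    importance = []
    for j in range(len(data[0])):
        drops = []
        for _ in range(n_rep):
            column = [row[j] for row in data]
            rng.shuffle(column)  # break the link between variable j and the clusters
            noised = [list(row) for row in data]
            for row, value in zip(noised, column):
                row[j] = value
            drops.append(base - explained_ratio(noised, labels))
        importance.append(sum(drops) / n_rep)
    return importance


# Hypothetical data: variable 0 separates the two clusters, variable 1 does not
data = [(0, 1), (0, 2), (0, 3), (10, 1), (10, 2), (10, 3)]
labels = [0, 0, 0, 1, 1, 1]
print(noising_importance(data, labels))
```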

    References

    1. Gutmann C, Takov K, Burnap SA, et al. SARS-CoV-2 RNAemia and proteomic trajectories inform prognostication in COVID-19 patients admitted to intensive care. Nat Commun 2021;12. doi:10.1038/S41467-021-23494-1
    2. Benito-León J, Castillo MD Del, Estirado A, et al. Using Unsupervised Machine Learning to Identify Age- and Sex-Independent Severity Subgroups Among Patients with COVID-19: Observational Longitudinal Study. J Med Internet Res 2021;23. doi:10.2196/25988
    3. Demichev V, Tober-Lau P, Lemke O, et al. A time-resolved proteomic and prognostic map of COVID-19. Cell Syst 2021;12:780. doi:10.1016/J.CELS.2021.05.005
    4. Estiri H, Strasser ZH, Brat GA, et al. Evolving phenotypes of non-hospitalized patients that indicate long COVID. BMC Med 2021;19. doi:10.1186/S12916-021-02115-0
    5. Sudre CH, Murray B, Varsavsky T, et al. Attributes and predictors of long COVID. Nat Med 2021;27. doi:10.1038/s41591-021-01292-y
    6. Sahanic S, Tymoszuk P, Ausserhofer D, et al. Phenotyping of acute and persistent COVID-19 features in the outpatient setting: exploratory analysis of an international cross-sectional online survey. Clin Infect Dis Published Online First: 26 November 2021. doi:10.1093/CID/CIAB978
    7. Hui DS, Wong KT, Ko FW, et al. The 1-Year Impact of Severe Acute Respiratory Syndrome on Pulmonary Function, Exercise Capacity, and Quality of Life in a Cohort of Survivors. Chest 2005;128:2247–61. doi:10.1378/CHEST.128.4.2247
    8. Ng CK, Chan JWM, Kwan TL, et al. Six month radiological and physiological outcomes in severe acute respiratory syndrome (SARS) survivors. Thorax 2004;59:889–91. doi:10.1136/THX.2004.023762
    9. Raghu G, Wilson KC. COVID-19 interstitial pneumonia: monitoring the clinical course in survivors. Lancet Respir Med 2020;8:839–42. doi:10.1016/S2213-2600(20)30349-0
    10. Suliman YA, Dobrota R, Huscher D, et al. Pulmonary function tests: High rate of false-negative results in the early detection and screening of scleroderma-related interstitial lung disease. Arthritis Rheumatol 2015;67:3256–61. doi:10.1002/ART.39405
    11. Hatabu H, Hunninghake GM, Richeldi L, et al. Interstitial lung abnormalities detected incidentally on CT: a Position Paper from the Fleischner Society. Lancet Respir Med 2020;8:726. doi:10.1016/S2213-2600(20)30168-5
    12. Leng M, Wang J, Cheng J, et al. Adaptive semi-supervised clustering algorithm with label propagation. J Softw Eng 2014;8:14–22. doi:10.3923/JSE.2014.14.22
    13. Lelis L, Sander J. Semi-supervised density-based clustering. Proc - IEEE Int Conf Data Mining, ICDM 2009;:842–7. doi:10.1109/ICDM.2009.143
    14. Huang C, Huang L, Wang Y, et al. 6-month consequences of COVID-19 in patients discharged from hospital: a cohort study. Lancet 2021;397:220–32. doi:10.1016/S0140-6736(20)32656-8
    15. Lange T, Roth V, Braun ML, et al. Stability-Based Validation of Clustering Solutions. Neural Comput 2004;16:1299–323. doi:10.1162/089976604773717621
    16. Hartigan JA, Wong MA. Algorithm AS 136: A K-Means Clustering Algorithm. Appl Stat 1979;28:100. doi:10.2307/2346830
    17. Kohonen T. Self-Organizing Maps. Berlin, Heidelberg: Springer Berlin Heidelberg 1995. doi:10.1007/978-3-642-97610-0
    18. Vesanto J, Alhoniemi E. Clustering of the self-organizing map. IEEE Trans Neural Networks 2000;11:586–600. doi:10.1109/72.846731
    19. Kuhn M. Building predictive models in R using the caret package. J Stat Softw 2008;28:1–26. doi:10.18637/jss.v028.i05
    20. Quinlan JR. C4.5: Programs for Machine Learning. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc. 1993. doi:10.5555/152181
    21. Breiman L. Random forests. Mach Learn 2001;45:5–32. doi:10.1023/A:1010933404324
    22. Weston J, Watkins C. Multi-Class Support Vector Machines. 1998.
    23. Ripley BD. Pattern recognition and neural networks. Cambridge University Press 2014. doi:10.1017/CBO9780511812651
    24. Friedman J, Hastie T, Tibshirani R. Regularization paths for generalized linear models via coordinate descent. J Stat Softw 2010;33:1–22. doi:10.18637/jss.v033.i01
    25. Deane-Mayer ZA, Knowles JE. Ensembles of Caret Models [R package caretEnsemble version 2.0.1]. 2019. https://cran.r-project.org/package=caretEnsemble (accessed 13 Dec 2021).
    26. Glennan T, Leckie C, Erfani SM. Improved Classification of Known and Unknown Network Traffic Flows Using Semi-supervised Machine Learning. Lect Notes Comput Sci (including Subser Lect Notes Artif Intell Lect Notes Bioinformatics) 2016;9723:493–501. doi:10.1007/978-3-319-40367-0_33
    27. Sonnweber T, Sahanic S, Pizzini A, et al. Cardiopulmonary recovery after COVID-19 - an observational prospective multi-center trial. Eur Respir J Published Online First: 10 December 2020. doi:10.1183/13993003.03481-2020
  2. Evaluation Summary:

    This manuscript links early markers of inflammation with residual abnormalities on chest CT following SARS-CoV-2 infection. Surprisingly, early surveyed symptoms do not predict long-term radiologic outcomes (6 months after infection), whereas inflammatory markers have stronger predictive value. Residual symptoms are common at the 6-month time point.

    (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #1 and Reviewer #2 agreed to share their name with the authors.)

  3. Reviewer #1 (Public Review):

    The investigators' goals were to describe the epidemiology and kinetics of post-acute COVID-19 lung sequelae and to determine the risk factors predictive of persistent lung impairment. A major strength of the study is the longitudinal observation through 6 months with protocolized clinical assessments that included patient-reported outcomes, lung function tests, inflammatory marker testing, and computed tomography of the chest, in a reasonably sized cohort that reflects the spectrum of disease severity in the pre-vaccination era. We learn a great deal about the different patterns of recovery in this group of COVID-19 survivors. The primary epidemiologic finding is that 52% of survivors continued to have symptoms at 6 months, while up to 72% of those with severe COVID-19 requiring ICU-level care continued to have lung abnormalities on chest imaging. This confirms general observations of "long COVID", which also encompasses non-lung effects. While lung disease is less common in those with milder disease, the proportion of patients who were never hospitalized but experienced persistent symptoms is striking (50%), with lung function impairment in 17% at 6 months. As expected, the patients with the most severe disease (those who needed the ICU) had the highest degree of chest imaging abnormalities. The kinetics of recovery is a significant observation: Figure 3 shows that most of the post-acute recovery in structural lung abnormalities occurs in the first 3 months and slows down thereafter, particularly for the hospitalized non-ICU patients. The investigators then embarked on a sophisticated analysis to determine how to predict persistent lung abnormalities (as detected by chest CT) at 6 months. When analyzed individually, among 50 clinical characteristics or lab values, the strongest unfavorable risk factors were elevated IL-6 (an inflammatory cytokine that is the target of tocilizumab) and CRP (C-reactive protein). Other variables that were strongly associated with CT abnormalities included immunosuppressive therapy, ICU stay, and pre-existing conditions. When machine learning techniques were applied, risk factors that correlated with each other could be grouped together, and the patients could be categorized as low, intermediate, and high risk for delayed pulmonary recovery. As expected, known risk factors for COVID-19 (age, male sex, medical comorbidities) and markers of disease severity (need for oxygen therapy, ICU care, and antibiotics) were more frequent in the intermediate- and high-risk groups. These predictive factors at acute COVID-19 and at the day-60 follow-up mostly held up when tested against the part of the cohort that was not used to build the models. Interestingly, lung function impairment as measured by pulmonary function tests was only weakly correlated with persistent and severe chest imaging abnormalities.

    The novelty of this study lies in taking the epidemiology a step further with a machine learning analysis to determine which clinical characteristics and chest imaging features at the onset of acute COVID-19 are predictive of later persistent disease. One limitation of this study, however, is that it was conducted on patients in the early part of the pandemic, prior to the widespread use of remdesivir and corticosteroid/anti-cytokine therapies, which are now considered standard of care. Based on these findings, we can now hypothesize that current treatments are likely to reduce the impact of long COVID.

    Machine learning (artificial intelligence, AI) is increasingly being used to answer clinical questions in limited cohorts; the application of machine learning in this study contributes to our conceptual understanding of how clinical characteristics and biological factors cluster together to shape long-term COVID-19 outcomes. Namely, the profound inflammation that characterizes severe acute COVID-19 pneumonia and poor early outcomes also contributes to chronic lung damage in survivors. In addition, a robust antiviral immune response (as seen with elevated antiviral antibodies) without elevated systemic inflammatory markers was associated with less severe chest imaging patterns, also supporting the notion that an individual's immune response to the virus determines the trajectory of disease. As noted, a significant proportion of non-hospitalized patients also suffered from chronic lung impairments. Taken together, the impact of prolonged convalescence on the workforce, healthcare, and individual lives should not be underestimated. These results underscore the paramount need for continued public health measures and vaccination to prevent COVID-19, particularly for the most vulnerable individuals (older, immunocompromised, and with pre-existing health problems). These observations provide additional biologic justification for the use of agents directed at reducing lung inflammation early in the course of disease, and potentially at an early post-recovery time point (i.e., 2 months). Machine learning algorithms may one day help clinicians decide which patients should be targeted for additional therapies after the acute phase. With further study, the implementation of AI in real-world medicine may be on the horizon.

  4. Reviewer #2 (Public Review):

    This is a potentially valuable manuscript that links early markers of inflammation with residual abnormalities on chest CT following SARS-CoV-2 infection. Surprisingly, early surveyed symptoms do not predict long-term radiologic outcomes (6 months after infection), whereas inflammatory markers have stronger predictive value. The cohort is well designed and the selected tools for analysis are appropriate.

    While this finding is potentially of high importance for clinical practice, the endpoints are inconsistently defined, and certain components of the machine learning and clustering analyses are difficult to interpret as presented. It is therefore challenging to understand whether the conclusions are justified by the analysis.

    Several components of the analysis are confusing and would benefit from further elucidation:

    1. The authors do not clearly define "delayed pulmonary recovery". My sense is that they are using several radiology-based definitions rather than their functional definition of lung function (defined by FEV1, FEV1/FVC, and DLCO), but this is never explicitly stated. Are the functional outcomes and symptomatic recovery considered in any of the analyses other than the correlations with radiologic findings in S1?

    2. To this end, I was surprised that the functional definition and symptomatic recovery were not used as the primary endpoints. Lung function and resolution of symptoms seem most important for the recovering patient and therefore seem like the more important outcomes. However, in Figures 5-7, it is often not clear whether the functional outcome is being considered at all.

    3. For the clustering in figure 5, I am uncertain how CT severity score >5 & CT abnormalities cluster separately, when these 2 outcomes appear to logically overlap. Specifically, does the CT abnormalities outcome include patients with the high severity score outcome? In other words, are patients in the "high severity" group a subset of patients with "CT abnormality"? If not a subset, then the CT abnormality should be labeled "non-severe CT abnormality". This could all be clarified by listing the number of patients in each group and showing with a Venn diagram whether there is any overlap.

    4. For the same reason, Figure 4 is hard to interpret. Are patients with a CT severity score >5 being compared to those with normal CTs only, or to those with normal or mild/moderate CTs? Please provide more specific definitions of "normal", "CT abnormality", and "severe CT abnormality", provide the number of people in each category, and specify the comparator groups in all analyses.

    5. Similarly, how can GGO @V3 be used as a potential explanatory variable for the outcome CT abnormalities @V3 when these 2 variables are clearly non-independent? Inclusion of highly related and likely correlated variables may throw off the overall conclusions of the clustering analysis.

    6. In Figure 6, the criteria for the low-, medium-, and high-risk subsets are unclear. Is this high risk for persistent functional abnormality, radiologic abnormality, or both? Why were 3 subpopulations selected? Was this done subjectively based on the clustering algorithm?

    7. The accuracy and sensitivity of the machine learning approaches shown in S5 & S6 are somewhat limited. Please comment on why such highly granular data can provide only limited prediction of the degree of lung damage after infection. Are there missing data types that might make the algorithm more predictive?

    8. The authors state that "the sole application of a lung function measurement at screening for subjects at risk of delayed lung recovery may bear insufficient sensitivity". I am not sure that I agree with this assessment. From the perspective of a patient, full recovery of lung function with limited or no residual symptoms, even in the presence of residual chest CT abnormalities, seems like a favorable outcome. I would suggest either changing this statement or providing citations that associate residual chest CT abnormalities (in the absence of residual functional lung impairment) with adverse long-term outcomes. Do the authors hypothesize that persistent radiologic abnormalities may precede organizing pneumonia, which will ultimately become symptomatic?

    9. The authors note selection bias against ordering CT and perhaps inflammatory markers early during infection as a limitation. I would suggest a sensitivity analysis to understand whether this misclassification will impact the model's predictions.
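    The overlap question raised in points 3 and 4 (is the "CT severity score >5" group a subset of the "CT abnormality" group, and how many patients fall in each?) can be checked directly once each outcome is stored as a set of patient identifiers. A minimal Python sketch under that assumption; the function names and the patient IDs are hypothetical illustrations, not study data:

```python
def venn_counts(group_a, group_b):
    """Sizes of the three regions of a two-set Venn diagram."""
    a, b = set(group_a), set(group_b)
    return {
        "only_a": len(a - b),  # in A but not in B
        "only_b": len(b - a),  # in B but not in A
        "both": len(a & b),    # in both outcome groups
    }


def is_subset(group_a, group_b):
    """True if every patient in group_a is also in group_b."""
    return set(group_a) <= set(group_b)


# Hypothetical patient IDs, for illustration only (not study data).
severe_ct = {"P03", "P07", "P11"}                  # e.g. CT severity score > 5
ct_abnormal = {"P01", "P03", "P07", "P11", "P15"}  # any CT abnormality

print(venn_counts(severe_ct, ct_abnormal))  # {'only_a': 0, 'only_b': 2, 'both': 3}
print(is_subset(severe_ct, ct_abnormal))    # True
```

    If `only_a` is zero, the severe group is nested within the abnormality group; otherwise, as the reviewer suggests, the second label would be better described as "non-severe CT abnormality".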

  5. SciScore for 10.1101/2021.06.22.21259316:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    Ethics
      Consent: N = 18 subjects denied to give an informed consent, N = 27 were declared difficulties to appear at the study follow-ups.
      IRB: The study was approved by the institutional review board at the Medical University of Innsbruck (approval number: 1103/2020), and registered at ClinicalTrials.gov (NCT04416100).
    Sex as a biological variable: not detected.
    Randomization: Prediction of lung lesions by distance weighted kNN [17] and naive Bayes [18] algorithms was tested in 200 random training/test subset splits of the cohort data (training n = 80, test n = 28).
    Blinding: not detected.
    Power Analysis: not detected.
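    The randomization entry describes evaluating classifiers over 200 random training/test splits of the cohort (training n = 80, test n = 28). A minimal Python sketch of that resampling scheme, covering only the distance-weighted kNN half; it assumes numeric feature vectors, weights each of the k nearest neighbours by 1/distance, and picks k = 5 arbitrarily. The function names, k, and the synthetic data generator are hypothetical and do not reproduce the authors' implementation or data:

```python
import math
import random


def weighted_knn_predict(train, x, k=5):
    """Distance-weighted kNN: each of the k nearest neighbours votes with weight 1/distance."""
    neighbours = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    votes = {}
    for features, label in neighbours:
        d = math.dist(features, x)
        if d == 0:
            return label  # exact match: use its label directly
        votes[label] = votes.get(label, 0.0) + 1.0 / d
    return max(votes, key=votes.get)


def repeated_split_accuracy(data, n_train=80, n_test=28, n_splits=200, k=5, seed=1):
    """Mean test accuracy over repeated random training/test splits."""
    rng = random.Random(seed)
    accuracies = []
    for _ in range(n_splits):
        # Draw a fresh random subset and split it into training and test parts.
        shuffled = rng.sample(data, n_train + n_test)
        train, test = shuffled[:n_train], shuffled[n_train:]
        correct = sum(weighted_knn_predict(train, x, k) == y for x, y in test)
        accuracies.append(correct / n_test)
    return sum(accuracies) / len(accuracies)


def make_synthetic(n=108, seed=0):
    """Hypothetical two-class toy data (not study data): two Gaussian blobs."""
    rng = random.Random(seed)
    data = []
    for i in range(n):
        label = i % 2
        centre = 0.0 if label == 0 else 3.0
        data.append(((rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)), label))
    return data


print(round(repeated_split_accuracy(make_synthetic()), 3))
```

    Averaging over many random splits, rather than reporting a single split, reduces the dependence of the accuracy estimate on which 28 participants happen to land in the test set, which matters in a cohort of this size.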

    Table 2: Resources

    No key resources detected.


    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    Our study bears limitations primarily concerning the low sample size and the cross-sectional character of the trial. Furthermore, data incompleteness and selection bias linked to disease severity (e.g. mild cases were not subjected to CT scans during acute COVID-19) resulted in a considerable dropout rate and potentially confounded the clustering and risk prediction analyses. Additionally, the candidate risk factors and the risk-assessment algorithms of perturbed pulmonary recovery presented here call for verification in a larger, independent multi-center collective of COVID-19 convalescents. In summary, we herein present a comprehensive description of the resolution of symptoms and structural pulmonary abnormalities in the first 6 months of COVID-19 convalescence. We report a high frequency of lung abnormalities and symptoms present in almost half of the studied population and a flattened recovery kinetics after three-months post-COVID-19. Systematic risk modeling and clustering analysis revealed a set of clinical variables linked to protracted recovery apart from the severity of acute infection such as inflammatory markers, anti-S1/S2 IgG, multi-morbidity, and male sex. Of practical importance, we demonstrate that automated classification algorithms may help to identify individuals at risk of persistent lung lesions and relocate resources to prevent long-term disability.

    Results from TrialIdentifier: We found the following clinical trial numbers in your paper:

    Identifier | Status | Title
    NCT04416100 | Recruiting | Development of Interstitial Lung Disease (ILD) in Patients W…


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a protocol registration statement.

    Results from scite Reference Check: We found no unreliable references.


    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.