Insights from a Pan India Sero-Epidemiological survey (Phenome-India Cohort) for SARS-CoV-2

Curation statements for this article:
  • Curated by eLife


    Evaluation Summary:

    The paper presents results of a serological survey of 10,000+ employees and workers associated with CSIR labs in India, conducted during August-September 2020. This is one of the few surveys with a pan-India footprint, making it a valuable addition to our understanding of how the Covid-19 pandemic evolved in the country. The reviewers felt that the writing could have been tighter and more targeted.

    (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #2 and Reviewer #3 agreed to share their names with the authors.)

This article has been Reviewed by the following groups


Abstract

To understand the spread of SARS-CoV-2, in August and September 2020 the Council of Scientific and Industrial Research (India) conducted a sero-survey at its constituent laboratories and centers throughout India. Of 10,427 volunteers, 1058 (10.14%) tested positive for SARS-CoV-2 anti-nucleocapsid (anti-NC) antibodies, 95% of whom showed surrogate neutralization activity. Three-fourths recalled no symptoms. Repeat serology tests at 3 (n=346) and 6 (n=35) months confirmed the stability of the antibody response and neutralization potential. Local sero-positivity was higher in densely populated cities and was inversely correlated with the 30-day change in regional test positivity rates (TPR). Regional seropositivity above 10% was associated with declining TPR. Personal factors associated with higher odds of sero-positivity were high-exposure work (odds ratio, 95% CI, p value: 2·23, 1·92–2·59, 6·5E-26), use of public transport (1·79, 1·43–2·24, 2·8E-06), not smoking (1·52, 1·16–1·99, 0·02), non-vegetarian diet (1·67, 1·41–1·99, 3·0E-08), and B blood group (1·36, 1·15–1·61, 0·001).
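The headline figure is a crude proportion, 1058 of 10,427. As a minimal sketch, and not the authors' analysis, the Python below computes that proportion together with a Wilson score 95% confidence interval. It assumes simple random sampling and makes no adjustment for assay sensitivity or specificity, clustering by lab, or weighting, any of which could widen or shift the interval.

```python
from math import sqrt


def wilson_interval(k: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion k/n (z=1.96 gives ~95%)."""
    p = k / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


positives, tested = 1058, 10427  # counts reported in the abstract
prevalence = positives / tested
low, high = wilson_interval(positives, tested)
print(f"crude seroprevalence {prevalence:.1%} (95% CI {low:.1%} to {high:.1%})")
```

With these counts the crude interval runs from roughly 9.6% to 10.7%. The Wilson form is used here because it behaves better than the simple Wald interval for small proportions, although at this sample size the two are nearly identical.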

Impact Statement

Widespread asymptomatic and undetected SARS-CoV-2 infection affected more than 100 million Indians by September 2020. The subsequent decline in new cases may be due to persisting humoral immunity amongst sub-communities with high exposure.

Funding

Council of Scientific and Industrial Research, India (CSIR)

Article activity feed

  1. Author Response to Public Reviews

    Reviewer #1 (Public Review):

    [...] The deficiencies of this study are:

    1. This is a very specific cohort, largely urban, with (presumably) relatively higher levels of education. It is hard to see how this might translate into a general statement about the population.

    We agree with the reviewer that this is a very specific cohort, largely urban, and with higher levels of education than average. We further agree that the utility of this cohort lies not in making general statements about the population but in deriving specific insights for which it is best suited. Several of these, presented in this manuscript, are listed below:

    a) It is important to understand the relative degree of spread between Indian cities, where a combination of dense population and indoor living has led to the greatest spread of disease. Since pandemics are typically self-limiting, regions with greater spread are further along the epidemic course and can expect earlier declines, which is a useful insight for public health strategy. While our cohort does not necessarily represent the average population, it is comparable across cities, which is not true of any other survey. The ICMR national serosurvey is based on a random selection of districts and is heavily rural-biased [1]. While that is important, rural areas are less likely to see fast-growing outbreaks, given largely outdoor lives and lower population density. Other city-wise serosurveys vary in target population as well as methodology and cannot easily be compared [2–5]. Thus, ours are the first data that permit comparison between many important urban regions of India, showing which regions were further along the course and where future outbreaks were still likely. Some of the regions identified by this survey as high risk, such as Kerala and interior Maharashtra, are indeed where outbreaks continued until much later.

    b) Because the CSIR cohort has the added advantage of more extensive baseline data and repeated access, we are able to determine antibody stability, as shown, and its possible correlates.

    c) The cohort is well suited to understanding clinical associations of SARS-CoV-2 infection, such as symptom rate and severity amongst its participants, as well as risk factors for infection (using seropositivity as an imperfect surrogate).

    2. The presentation of Figure 1 was quite confusing, especially the colour coding.

    Figure 1 colour-codes the cities with CSIR labs where the sero-survey was carried out, to give a quick overview of prevalence. Cities with sero-prevalence greater than 10 percent were coded green, and cities with sero-prevalence between 5 and 10 percent were coded yellow. Cities with less than 5 percent sero-positivity were shown in red because, if sero-positivity is taken as an indirect surrogate of infection, these cities may later turn into hotspots or see larger rises in cases. Although these cities may not truly represent their state populations, each state was shaded in a blue gradient, with darker shades reflecting higher sero-positivity in its cities. We realize that the colour coding of states may have caused confusion and have removed it in the revised manuscript.
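The city colour rule described above can be written as a small function, which makes the boundary handling explicit. This is a sketch. The function name is hypothetical, and because the response does not say which category exactly 5% or exactly 10% falls into, treating both values as yellow is an assumption.

```python
def figure1_colour(seroprevalence_pct: float) -> str:
    """Return the Figure 1 colour category for a city's sero-prevalence (percent).

    The text gives >10% green, 5-10% yellow, <5% red; treating exactly 10%
    and exactly 5% as yellow is an assumption, not stated in the source.
    """
    if seroprevalence_pct > 10:
        return "green"  # higher sero-prevalence, likely further along the epidemic
    if seroprevalence_pct >= 5:
        return "yellow"  # intermediate sero-prevalence
    return "red"  # low sero-prevalence, possible future hotspot


for pct in (12.0, 10.0, 7.5, 5.0, 3.2):
    print(pct, figure1_colour(pct))
```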

    3. It is surprising that the state of Maharashtra shows only intermediate to low levels of seropositivity, given that the impact of the pandemic was largest there and especially in the city of Pune. There have been alternative serosurveys for Pune which found much higher levels of seropositivity from about the same period.

    The Pune sero-survey pointed out by the reviewer covered Pune’s five most affected sub-wards, not the Pune population in general [6]. Despite the limitations we acknowledged in our previous response, our overall crude positivity rate of 10% is very similar to that of the ICMR national serosurvey, and in general the patterns we see are consistent with what is known about the severity of outbreaks. Thus, there is no clear evidence that the trends we observed are inaccurate, and we respectfully note that surprising findings may be the most valuable ones. In fact, given the current rise in cases in Maharashtra, including in Pune, relative to other cities, our survey values may have been closer to the truth.

    4. The statement "Seropositivity of 10% or more was associated with reductions in TPR which may mean declining transmission": For a disease with R of about 2, this would actually be somewhat early in the epidemic, so you wouldn't expect to see this in an indicator such as TPR. TPR is also strongly correlated with the amount of testing, which isn't accounted for.

    We agree that for an R of about 2, one would not expect a decline at a sero-positivity of about 10%. However, general seropositivity during the declining phase of the outbreak has been in this range not just in India but also in major western European cities and New York, amongst others. There are three possible explanations. First, the most highly exposed sub-population, which contains the high-contact spreaders, is infected early and reaches higher seropositivity, effectively shortening or blocking transmission chains. We also note a much higher seropositivity amongst public transport users, who may better represent this sub-population. Second, an R0 of 2–3 reflects the potential of this virus; the effective R after measures are put in place may be much lower. Better compliance with masking in India may have been important. Last, the fraction of the population immune at baseline is unknown but has been variably estimated at 20–30% from T cell reactivity studies as well as from outbreaks in closed settings such as ships. This is a speculative area but may help explain the results.
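The second and third explanations can be made concrete with the standard homogeneous-mixing approximation, in which the effective reproduction number equals R multiplied by the susceptible fraction. The Python below is illustrative only. The R values and baseline-immunity fractions are assumptions chosen for illustration, not estimates from this study, and homogeneous mixing ignores the heterogeneity in exposure on which the first explanation depends.

```python
def infection_threshold(r: float, baseline_immune: float = 0.0) -> float:
    """Fraction of the population that must already have been infected for the
    effective reproduction number to fall below 1, under homogeneous mixing.

    R_eff = r * (1 - baseline_immune - s); solving R_eff < 1 for s gives
    s > 1 - baseline_immune - 1/r (clipped at 0).
    """
    return max(0.0, 1.0 - baseline_immune - 1.0 / r)


# R values and baseline immunity fractions below are illustrative assumptions.
scenarios = [(2.0, 0.0), (2.0, 0.25), (1.3, 0.0), (1.3, 0.2)]
for r, b in scenarios:
    needed = infection_threshold(r, b)
    print(f"R={r}, baseline immune={b:.0%}: infected fraction needed > {needed:.1%}")
```

With R = 2 and no prior immunity, about half the population must be infected before R_eff drops below 1. With a lower effective R and some pre-existing immunity, the required fraction can fall to around or below the 10% observed here. Heterogeneous contact patterns, in which high-contact individuals are infected first, are generally expected to lower the threshold further, and this simple model does not capture that effect.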

    We agree that we do not directly account for testing rate, which is difficult to adjust for and can affect TPR, even though TPR is itself one way of adjusting for different levels of testing. Because our data describe a trend across different geographies, over the 30 days bracketing sample collection, differences in testing rate alone would not explain the very strong inverse association of seropositivity with TPR. Given that high-seropositivity areas are likely further along the course of the pandemic, we favour that as the explanation, subject to the caveats noted above about overall seropositivity as a surrogate of population immunity.
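As a sketch of the quantities involved, the Python below computes TPR (positives divided by tests) at the start and end of a 30-day window, takes the change, and correlates that change with seropositivity across regions. The function names, region labels and counts are hypothetical and are not study data. Dividing positives by tests is how TPR partly normalizes for testing volume, although shifts in who gets tested, or in the type of test used, can still move it.

```python
from math import sqrt


def tpr(positives: int, tests: int) -> float:
    """Test positivity rate: fraction of tests that returned positive."""
    return positives / tests


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical regions: seropositivity, then (positives, tests) at the start
# and at the end of a 30-day window bracketing sample collection.
regions = {
    "region_A": (0.15, (900, 6000), (600, 6500)),
    "region_B": (0.10, (700, 7000), (630, 7000)),
    "region_C": (0.04, (300, 5000), (420, 5200)),
}

sero, tpr_change = [], []
for name, (s, start, end) in regions.items():
    change = tpr(*end) - tpr(*start)  # negative means positivity is falling
    sero.append(s)
    tpr_change.append(change)
    print(f"{name}: seropositivity {s:.0%}, 30-day TPR change {change:+.3f}")

r = pearson_r(sero, tpr_change)
print(f"Pearson r = {r:.2f}, R^2 = {r * r:.2f}")
```

For a simple linear fit of TPR change on seropositivity, R² equals the squared Pearson correlation. Three points are far too few for inference and are used here only to show the arithmetic.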

    5. The correlation with vegetarianism is unusual: you might have argued that this could potentially protect against disease, but that it might protect against infection is hard to credit. Much of South Asia is not particularly vegetarian but has seen significantly less impact.

    We agree that much of South Asia is not particularly vegetarian. When we began analyzing our data, we observed that our cohort had a 70:30 ratio of non-vegetarians to vegetarians, which agrees with what nationwide surveys have found in the past; hence our cohort was not biased in its sampling for this variable. In this work we use seropositivity as an indirect surrogate of infection, and the data were not analyzed by zonal distribution but for the entire cohort, where we obtained this observation. At this stage we cannot determine what role a vegetarian diet may play in lower sero-positivity amongst vegetarians, though it could possibly relate to anti-inflammatory effects and to a high-fibre diet protecting the gut mucosa against viral invasion. Existing studies have only speculated on the role diet could play, and there are no confirmatory biochemical studies providing further evidence for this cause-effect relationship. We also performed a multicollinearity analysis to test whether diet was related to any other variable studied, but found no such association.
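A multicollinearity check of this kind is usually done with the variance inflation factor (VIF), where VIF_j = 1 / (1 - R_j²) and R_j² comes from regressing predictor j on all the other predictors. The Python below is a minimal sketch for the two-predictor case only, in which R_j² reduces to the squared Pearson correlation. The indicator data are hypothetical, and this is not necessarily how the authors ran their assessment.

```python
from math import sqrt


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def vif_two_predictors(x1: list[float], x2: list[float]) -> float:
    """VIF of x1 when x2 is the only other predictor: 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)


# Hypothetical 0/1 indicators for ten participants (not study data).
non_vegetarian = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]
outsourced = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
vif = vif_two_predictors(non_vegetarian, outsourced)
print(f"VIF = {vif:.3f}")
```

A VIF near 1 indicates little shared variance between predictors. Commonly cited rules of thumb flag values above 5 or 10. VIF depends only on the predictors, so it applies equally when the outcome model is a logistic regression. With several covariates, each R_j² would come from a multiple regression, for example via statsmodels' `variance_inflation_factor`.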

    6. On the same point above, it is possible that social stratification associated with diet (direct employees being more likely to be vegetarian than contract workers) might be a confounder here, since outsourced staff seem to be at higher risk.

    When we analyzed the data, we also considered this possible bias, since a person’s occupation, as an indirect reflection of socio-economic status, can influence diet preferences, but the data did not support it. Outsourced staff in our cohort were indeed more often non-vegetarian than regular staff: against a 70:30 ratio of non-vegetarians to vegetarians for the entire cohort, 83 percent of outsourced staff and 66 percent of regular staff were non-vegetarian. However, within both groups non-vegetarians had higher sero-positivity (17.25 and 8.77 percent, respectively) than vegetarians (11.89 and 6.05 percent). A logistic regression with a collinearity assessment using variance inflation factor (VIF) scores likewise showed no such association, so occupation does not appear to be acting as a confounder. Among females we did not find the diet association; only transport and occupation were significant. Hence, to a certain extent, crowded environments and occupational exposure stand out as the major exposure variables when both genders are taken into consideration.
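The stratified comparison can also be expressed as odds ratios. The Python sketch below converts the sero-positivity percentages quoted above into within-stratum odds ratios for non-vegetarian versus vegetarian diet. It is an illustration, not the authors' regression. Because group sizes are not given here, it cannot produce confidence intervals or a pooled Mantel-Haenszel estimate.

```python
def odds(p: float) -> float:
    """Convert a proportion to odds."""
    return p / (1.0 - p)


def odds_ratio(p_exposed: float, p_unexposed: float) -> float:
    """Odds ratio comparing two proportions."""
    return odds(p_exposed) / odds(p_unexposed)


# Sero-positivity (as proportions) quoted above: (non-vegetarian, vegetarian).
strata = {
    "outsourced staff": (0.1725, 0.1189),
    "regular staff": (0.0877, 0.0605),
}
for group, (p_non_veg, p_veg) in strata.items():
    print(f"{group}: OR non-vegetarian vs vegetarian = {odds_ratio(p_non_veg, p_veg):.2f}")
```

Both within-stratum ratios are roughly 1.5 (about 1.54 and 1.49), in the same direction as the cohort-wide estimate for non-vegetarian diet in the abstract. This is consistent with the authors' conclusion that occupation alone does not explain the diet association.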

    7. There may be correlations to places of residence that again act as confounders. If direct employees are provided official accommodation, they may simply have had less exposure, being more protected.

    We agree with this point, and have stated that outsourced personnel and staff who have to travel, specifically by public transport, are exposed to a higher risk. We regret if this was not clear. The subgroup of people who use public transport reflects a more generalizable sub-population within the cohort, with all the associated risks and confounders. While we attempt a regression to separate some of these, that is not the primary focus of this work.

    8. The correlations with blood group don't seem to match what is known from elsewhere.

    We are unsure what the reviewer is comparing our data with, but have tried to explain why we consider our results broadly concordant. As advised elsewhere, this has not been detailed in the revised manuscript. Data for 7496 individuals were available for blood group type and serological status. The blood group (BG) distribution among all samples collected was similar to the national reference from a recent systematic review, so our cohort resembled the national population in this respect. The available literature reports a lower risk of infection for BG O, which we also observed: in our study, BG O was associated with lower sero-positivity, with an OR of 0∙76 (95% CI 0∙64–0∙91, p=0∙018) vs non-O blood groups and an overall sero-positivity of 7.09 percent, lower than the cohort-wide sero-positivity. The literature also reports higher odds of testing sero-positive for BG AB and B, which our findings corroborate. The literature further reports a higher risk of infection for BG A; this is contrary to our finding of a favourable OR for BG A, although it was not statistically significant.
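For readers less familiar with how such odds ratios and intervals are formed, the Python sketch below computes an unadjusted odds ratio with a Wald 95% CI from a 2×2 table of blood group (O vs non-O) by serostatus. The counts are hypothetical placeholders, not the study's data. An adjusted OR from logistic regression, which the manuscript may have used, would generally differ from this crude calculation.

```python
from math import exp, log, sqrt


def odds_ratio_wald(a: int, b: int, c: int, d: int,
                    z: float = 1.96) -> tuple[float, float, float]:
    """Odds ratio and Wald CI for a 2x2 table.

    a: exposed & seropositive,   b: exposed & seronegative
    c: unexposed & seropositive, d: unexposed & seronegative
    """
    or_ = (a * d) / (b * c)
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    return or_, exp(log(or_) - z * se_log_or), exp(log(or_) + z * se_log_or)


# Hypothetical counts (blood group O vs non-O), not taken from the study.
estimate, low, high = odds_ratio_wald(a=150, b=1850, c=350, d=3150)
print(f"OR {estimate:.2f} (95% CI {low:.2f} to {high:.2f})")
```

The interval is symmetric on the log scale, not on the OR scale, which is why reported intervals such as 0∙64–0∙91 around 0∙76 look lopsided.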

    9. The statement that "declining cases may reflect persisting humoral immunity among sub-communities with higher exposure" is unsupported. What sub-communities?

    We regret the lack of clarity. The wording has now been corrected; it refers simply to subgroups of the population with high levels of exposure.

    Reviewer #2 (Public Review):

    1. The extrapolation of the study results to the country may not be completely acceptable with the basic difference from the country's urban rural divide and a largely agricultural economy. The female gender is underrepresented in the study cohort, and no children have been included.

    We agree with the reviewer that this is a specific, largely urban cohort. We also agree that a cohort of people using public transport would be more representative, and we are following these individuals, as the cohort allows, to gain further insights. We further agree that the utility of this cohort lies not in making general statements about the population but in deriving specific insights for which it is best suited. Several of these are presented in this manuscript (see our response to Reviewer #1).

    2. The observations regarding correlates of sero-positivity such as diet, smoking, etc. would need specifically designed, adequately powered studies to confirm them. The sample size for the three- and six-month follow-up is too small to conclude stability of the humoral immunity and requires further follow-up of the cohort. The role of labour migration in spreading the pandemic simultaneously to all parts of the country, though attractive, may not explain lower rates in states like UP and Bihar, where most migrants moved.

    We agree that the observations regarding diet and smoking are only hypothesis-generating and need specifically designed studies to confirm them. We have also stated in the manuscript that the associations found between seropositivity and some of the parameters should be confirmed with studies designed for this purpose. We are following up more individuals at three and six months to ascertain the stability of the antibodies. Most migrants in the early phase moved to UP and Bihar, and seeding would indeed be expected to be higher there. While known cases were low in these states, the seropositivity data suggest that seeding did occur but may have gone undetected. The ICMR August-September serosurvey data, for example, show seropositivity in districts of these states to be higher than in Gujarat or Rajasthan.

    3. A large part of the seropositivity data set, representing the big cities of Delhi and Bengaluru, has been removed when correlating with the Test Positivity Rate, citing sampling duration as the reason. However, these cities also had different testing strategies and health infrastructure and hence are important.

    We agree that some data were removed. This was because sample collection at some labs extended over a longer interval, making point estimates meaningless; this applied especially to CSIR-IGIB, which conducted many mini-surveys. For Delhi, only CSIR-IGIB has been removed, and the other labs are still kept in the analysis. The directionality and trends of the graph remain the same when the excluded data are included. Keeping the Bengaluru data does not change R² even at the second decimal place. When the CSIR-IGIB data are added back, R² is 0.32, with the same directionality and trend.

    4. Test positivity rate depends on testing strategy and the type of test used, whether RT-PCR or Rapid Antigen Test, and the ratio of the two tests differed between parts of the country.

    This is a well-taken point, but the TPR data were obtained as a surrogate from a third-party website, and a breakdown of positivity rate by test type was not available. Ideally this analysis would use a single test type, but that was not possible. Our broader point is that for a given part of the country, TPR went down when seropositivity went up. This is relevant even if different parts of the country used different ratios of the two tests.

    Reviewer #3 (Public Review):

    [...] Weaknesses: While it is a pan-India survey, the population is not quite representative of general population of the country. CSIR labs are mostly in cities, and most of the employees use private transport. So the results cannot be generalized to the country as a whole. Restricting to people using public transport would be a better representation, although it still would not be fully representative.

    We agree with the reviewer that this is a specific, largely urban cohort. We also agree that a cohort of people using public transport would be more representative, and we are following these individuals, as the cohort allows, to gain further insights. We further agree that the utility of this cohort lies not in making general statements about the population but in deriving specific insights for which it is best suited. Several of these are presented in this manuscript (see our response to Reviewer #1).

  2. Author Response:

    Reviewer 1:

    The deficiencies of this study are:

    1. This is a very specific cohort, largely urban, with (presumably) relatively higher levels of education. It is hard to see how this might translate into a general statement about the population.

    We agree with the reviewer that this is a very specific cohort, largely urban, and with higher levels of education than average. We further agree that the utility of this cohort lies not in making general statements about the population but in deriving specific insights for which it is best suited. Several of these, presented in this manuscript, are listed below.

    a) It is important to understand the relative degree of spread between Indian cities, where a combination of dense population and indoor living has led to the greatest spread of disease. Since pandemics are typically self-limiting, regions with greater spread are further along the epidemic course and can expect earlier declines, which is a useful insight for public health strategy. While our cohort does not necessarily represent the average population, it is comparable across cities, which is not true of any other survey. The ICMR national serosurvey is based on a random selection of districts and is heavily rural-biased [1]. While that is important, rural areas are less likely to see fast-growing outbreaks, given largely outdoor lives and lower population density. Other city-wise serosurveys vary in target population as well as methodology and cannot easily be compared [2–5]. Thus, ours are the first data that permit comparison between many important urban regions of India, showing which regions were further along the course and where future outbreaks were still likely. Some of the regions identified by this survey as high risk, such as Kerala and interior Maharashtra, are indeed where outbreaks continued until much later.

    b) Because the CSIR cohort has the added advantage of more extensive baseline data and repeated access, we are able to determine antibody stability, as shown, and its possible correlates.

    c) The cohort is well suited to understanding clinical associations of SARS-CoV-2 infection, such as symptom rate and severity amongst its participants, as well as risk factors for infection (using seropositivity as an imperfect surrogate).

    2. The presentation of Figure 1 was quite confusing, especially the colour coding.

    Figure 1 colour-codes the cities with CSIR labs where the sero-survey was carried out, to give a quick overview of prevalence. Cities with sero-prevalence greater than 10 percent were coded green, and cities with sero-prevalence between 5 and 10 percent were coded yellow. Cities with less than 5 percent sero-positivity were shown in red because, if sero-positivity is taken as an indirect surrogate of infection, these cities may later turn into hotspots or see larger rises in cases. Although these cities may not truly represent their state populations, each state was shaded in a blue gradient, with darker shades reflecting higher sero-positivity in its cities.

    3. It is surprising that the state of Maharashtra shows only intermediate to low levels of seropositivity, given that the impact of the pandemic was largest there and especially in the city of Pune. There have been alternative serosurveys for Pune which found much higher levels of seropositivity from about the same period.

    The Pune sero-survey pointed out by the reviewer covered Pune’s five most affected sub-wards, not the Pune population in general [6]. Despite the limitations we acknowledged in our previous response, our overall crude positivity rate of 10% is very similar to that of the ICMR national serosurvey, and in general the patterns we see are consistent with what is known about the severity of outbreaks. Thus, there is no clear evidence that the trends we observed are inaccurate, and we respectfully note that surprising findings may be the most valuable ones. In fact, given the current rise in cases in Maharashtra, including in Pune, relative to other cities, our survey values may have been closer to the truth.

    4. The statement "Seropositivity of 10% or more was associated with reductions in TPR which may mean declining transmission": For a disease with R of about 2, this would actually be somewhat early in the epidemic, so you wouldn't expect to see this in an indicator such as TPR. TPR is also strongly correlated with the amount of testing, which isn't accounted for.

    We agree that for an R of about 2, one would not expect a decline at a sero-positivity of about 10%. However, general seropositivity during the declining phase of the outbreak has been in this range not just in India but also in major western European cities and New York, amongst others [7]. There are three possible explanations. First, the most highly exposed sub-population, which contains the high-contact spreaders, is infected early and reaches higher seropositivity, effectively shortening or blocking transmission chains. We also note a much higher seropositivity amongst public transport users, who may better represent this sub-population. Second, an R0 of 2–3 reflects the potential of this virus; the effective R after measures are put in place may be much lower [8, 9]. Better compliance with masking in India may have been important. Last, the fraction of the population immune at baseline is unknown but has been variably estimated at 20–30% from T cell reactivity studies as well as from outbreaks in closed settings such as ships. This is a speculative area but may help explain the results. We agree that we do not directly account for testing rate, which is difficult to adjust for and can affect TPR, even though TPR is itself one way of adjusting for different levels of testing. Because our data describe a trend across different geographies, over the 30 days bracketing sample collection, differences in testing rate alone would not explain the very strong inverse association of seropositivity with TPR. Given that high-seropositivity areas are likely further along the course of the pandemic, we favour that as the explanation, subject to the caveats noted above about overall seropositivity as a surrogate of population immunity.

    5. The correlation with vegetarianism is unusual: you might have argued that this could potentially protect against disease, but that it might protect against infection is hard to credit. Much of South Asia is not particularly vegetarian but has seen significantly less impact.

    We agree that much of South Asia is not particularly vegetarian. When we began analyzing our data, we observed that our cohort had a 70:30 ratio of non-vegetarians to vegetarians, which agrees with what nationwide surveys have found in the past; hence our cohort was not biased in its sampling for this variable [10]. In this work we use sero-positivity as an indirect surrogate of infection, and the data were not analyzed by zonal distribution but for the entire cohort, where we obtained this observation. At this stage we cannot determine what role a vegetarian diet may play in lower sero-positivity amongst vegetarians, though it could possibly relate to anti-inflammatory effects and to a high-fibre diet protecting the gut mucosa against viral invasion. Existing studies have only speculated on the role diet could play, and there are no confirmatory biochemical studies providing further evidence for this cause-effect relationship [11, 12]. We also performed a multicollinearity analysis to test whether diet was related to any other variable studied, but found no such association.

    6. On the same point above, it is possible that social stratification associated with diet (direct employees being more likely to be vegetarian than contract workers) might be a confounder here, since outsourced staff seem to be at higher risk.

    When we analyzed the data, we also considered this possible bias, since a person’s occupation, as an indirect reflection of socio-economic status, can influence diet preferences, but the data did not support it. Outsourced staff in our cohort were indeed more often non-vegetarian than regular staff: against a 70:30 ratio of non-vegetarians to vegetarians for the entire cohort, 83 percent of outsourced staff and 66 percent of regular staff were non-vegetarian. However, within both groups non-vegetarians had higher sero-positivity (17.25 and 8.77 percent, respectively) than vegetarians (11.89 and 6.05 percent). A logistic regression with a collinearity assessment using variance inflation factor (VIF) scores likewise showed no such association, so occupation does not appear to be acting as a confounder. Among females we did not find the diet association; only transport and occupation were significant. Hence, to a certain extent, crowded environments and occupational exposure stand out as the major exposure variables when both genders are taken into consideration.

    7. There may be correlations to places of residence that again act as confounders. If direct employees are provided official accommodation, they may simply have had less exposure, being more protected.

    This was a standing hypothesis for this work, as most CSIR labs provide on-campus accommodation. However, we could not study it, because no variable recorded whether a person lived in lab-provided accommodation or elsewhere in the city. As we did not study this directly, it remains speculative, but it is consistent with the hypothesis that outsourced personnel and staff who have to travel, particularly by public transport, are exposed to a higher risk.

    8. The correlations with blood group don't seem to match what is known from elsewhere.

    Data for 7496 individuals were available for blood group type and serological status. The blood group (BG) distribution among all samples collected was similar to the national reference from a recent systematic review [13], so our cohort resembled the national population in this respect. The available literature reports a lower risk of infection for BG O, which we also observed [14–19]: in our study, BG O was associated with lower sero-positivity, with an OR of 0∙76 (95% CI 0∙64–0∙91, p=0∙018) vs non-O blood groups and an overall sero-positivity of 7.09 percent, lower than the cohort-wide sero-positivity. The literature also reports higher odds of testing sero-positive for BG AB and B, which our findings corroborate [17]. The literature further reports a higher risk of infection for BG A; this is contrary to our finding of a favourable OR for BG A, although it was not statistically significant [14–16, 18].

    9. The statement that "declining cases may reflect persisting humoral immunity among sub-communities with higher exposure" is unsupported. What sub-communities?

    The wording has been corrected; it refers simply to sub-groups of the population with high levels of exposure.

    Reviewer 2:

    Weaknesses:

    1. The extrapolation of the study results to the country may not be completely acceptable with the basic difference from the country's urban rural divide and a largely agricultural economy. The female gender is underrepresented in the study cohort, and no children have been included.

    We agree with the reviewer that this is a specific cohort, and that women are underrepresented in it; hence all variable-based associations were analyzed separately for males and females. Because of the low number of female participants, associations with smoking and similar variables could not be assessed, while diet was not significant on model testing. As the ethical approval did not permit us to collect data on children, we could not provide it, but this is complemented by the ICMR survey, which provided data for younger individuals. We further agree that the utility of this cohort lies not in making general statements about the population but in deriving specific insights for which it is best suited. Several of these, presented in this manuscript, are listed below.

    a) It is important to understand the relative degree of spread between Indian cities, where a combination of dense population and indoor living has led to the greatest spread of disease. Since pandemics are typically self-limiting, regions with greater spread are further along the epidemic course and can expect earlier declines, which is a useful insight for public health strategy. While our cohort does not necessarily represent the average population, it is comparable across cities, which is not true of any other survey. The ICMR national sero-survey is based on a random selection of districts and is heavily rural-biased [1]. While that is important, rural areas are less likely to see fast-growing outbreaks, given largely outdoor lives and lower population density. Other city-wise serosurveys vary in target population as well as methodology and cannot easily be compared [2–5]. Thus, ours are the first data that permit comparison between many important urban regions of India, showing which regions were further along the course and where future outbreaks were still likely. Some of the regions identified by this survey as high risk, such as Kerala and interior Maharashtra, are indeed where outbreaks continued until much later.

    b) The CSIR cohort has the added advantage of more extensive baseline data and repeated access, which allow us to determine antibody stability, as shown, and its possible correlates.

    c) The cohort is well suited to understanding clinical associations of SARS-CoV-2 infection, such as symptom rate and severity amongst its participants, as well as associations with infection risk (using seropositivity as an imperfect surrogate).
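    Point (a) turns on whether seropositivity genuinely differs between cities. One standard way to compare two sites is a two-proportion z-test. The sketch below is illustrative only: it assumes independent sampling within each site, uses hypothetical counts rather than survey data, and is not necessarily the method used in the manuscript.

```python
import math


def two_proportion_z_test(pos_a: int, n_a: int, pos_b: int, n_b: int) -> tuple[float, float]:
    """Return (z, two-sided p-value) for H0: seropositivity is equal at sites A and B."""
    p_a, p_b = pos_a / n_a, pos_b / n_b
    # Pooled proportion under the null hypothesis of equal seropositivity
    pooled = (pos_a + pos_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts for two sites, for illustration only (not survey data)
z, p = two_proportion_z_test(pos_a=90, n_a=600, pos_b=40, n_b=500)
print(f"z = {z:.2f}, p = {p:.4f}")
```

    With many cities, pairwise tests multiply quickly, so a multiple-comparison correction or a single model across sites would usually be preferred.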

    2. The observations regarding correlates of sero-positivity, such as diet and smoking, would need specifically designed, adequately powered studies to confirm them. The sample size of the three- and six-month follow-up is small for concluding stability of humoral immunity, and further follow-up of the cohort is required. The role of labour migration in spreading the pandemic simultaneously to all parts of the country, though attractive, may not explain the lower rates in states like UP and Bihar, to which most migrants moved.

    We agree that the observations regarding diet and smoking are only hypothesis-generating and need specifically designed studies to confirm the findings. We have also mentioned in the manuscript that associations found between seropositivity and some of the parameters should be confirmed with studies specifically designed for this purpose. We are following up more individuals at three and six months to ascertain the stability of the antibodies. In the early phase, most migrants moved to UP and Bihar, and seeding would indeed be expected to be higher there. While known cases were low in these states, the seropositivity data support that seeding did occur but may have gone undetected. The ICMR August-September serosurvey data, for example, show seropositivity in districts of these states to be higher than in Gujarat or Rajasthan.

    3. A large part of the seropositivity dataset, representing the big cities of Delhi and Bengaluru, was removed when correlating with Test Positivity Rate, with collection duration cited as the reason. However, these cities also had different testing strategies and health infrastructure, and hence are important.

    We confirm that data from these cities were removed because sample collection in these laboratories extended over a longer period; for Delhi, however, only CSIR-IGIB was removed, and data from all other Delhi laboratories remain in the analysis. The directionality and trends of the graph remain the same when the excluded data are included. Retaining the Bengaluru data does not change R² to the second decimal place, while adding the CSIR-IGIB data gives an R² of 0.32, with the same directionality and trend. For IGIB, collection spanned a considerable period, from when seropositivity in Delhi was low to when it had reached the reported levels.
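    The sensitivity check described above (refitting the seropositivity versus 30-day TPR change regression with and without the excluded sites, then comparing slope direction and R²) can be sketched as follows. This assumes an ordinary least-squares fit of one outcome on one predictor; the function names and site values are hypothetical placeholders, not the survey's data or code.

```python
def ols_slope_and_r2(x: list[float], y: list[float]) -> tuple[float, float]:
    """Slope and R^2 of an ordinary least-squares line y = a + b*x."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need paired x and y values with at least 3 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must both vary")
    slope = sxy / sxx
    # With a single predictor, R^2 equals the squared Pearson correlation
    r2 = sxy * sxy / (sxx * syy)
    return slope, r2


def compare_with_exclusions(sites: dict[str, tuple[float, float]], excluded: set[str]) -> None:
    """Print slope and R^2 with all sites, then with the excluded sites dropped."""
    subsets = {
        "all sites": sites,
        "excluded sites dropped": {k: v for k, v in sites.items() if k not in excluded},
    }
    for label, subset in subsets.items():
        xs = [seropos for seropos, _ in subset.values()]
        ys = [d_tpr for _, d_tpr in subset.values()]
        slope, r2 = ols_slope_and_r2(xs, ys)
        print(f"{label}: slope = {slope:+.3f}, R^2 = {r2:.2f}")


# Hypothetical (seropositivity %, 30-day change in TPR) per site, illustration only
example_sites = {
    "site_A": (4.0, 2.1),
    "site_B": (7.5, 0.8),
    "site_C": (12.0, -1.5),
    "site_D": (15.0, -0.9),
    "site_E": (20.0, -3.0),
}
compare_with_exclusions(example_sites, excluded={"site_E"})
```

    A stable slope sign across both fits corresponds to the "same directionality" argument; R² alone does not say whether the relationship is negative or positive.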

    4. Test positivity rate depends on the testing strategy and type of test used (RT-PCR or Rapid Antigen Test), and the ratio of the two tests differed across parts of the country.

    This is a well-taken point. The TPR data were taken from a third-party website as a surrogate, for calculation purposes only, and the results obtained were as logically expected. Ideally, TPR should be based on a single test type, and differences in test mix could influence the outcome and interpretation. However, when we compared the data with the observed real trends, the directionality of the graph was in agreement; hence TPR can serve as a surrogate in these scenarios, particularly in the context of a large, heterogeneous population.
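    The reviewer's concern can be made concrete with simple arithmetic: if two regions had identical per-test positivity for each test type but used different proportions of RT-PCR and rapid antigen tests, their pooled TPR would still differ. The counts below are hypothetical, chosen only to show the effect of the test mix; they are not estimates of real RT-PCR or RAT performance.

```python
def pooled_tpr(tests_by_type: dict[str, tuple[int, int]]) -> float:
    """Pooled test positivity rate: total positives / total tests across test types.

    tests_by_type maps a test type to (tests performed, positive results).
    """
    total_tests = sum(tests for tests, _ in tests_by_type.values())
    total_positive = sum(pos for _, pos in tests_by_type.values())
    return total_positive / total_tests


# Hypothetical: in both regions RT-PCR is 10% positive and RAT is 4% positive,
# but region 1 relies mostly on RT-PCR and region 2 mostly on RAT.
region_1 = {"RT-PCR": (800, 80), "RAT": (200, 8)}
region_2 = {"RT-PCR": (300, 30), "RAT": (700, 28)}
print(f"region 1 TPR: {pooled_tpr(region_1):.1%}")  # 8.8%
print(f"region 2 TPR: {pooled_tpr(region_2):.1%}")  # 5.8%
```

    The same mechanism applies over time within one region: a shift toward antigen testing can lower TPR without any change in transmission.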

    Reviewer 3:

    Weaknesses: While it is a pan-India survey, the population is not quite representative of general population of the country. CSIR labs are mostly in cities, and most of the employees use private transport. So the results cannot be generalized to the country as a whole. Restricting to people using public transport would be a better representation, although it still would not be fully representative.

    We agree with the reviewer that this is a specific, largely urban cohort. We also agree that a cohort of people using public transport would be more representative, and we are continuing to follow these individuals, which the cohort design permits, to obtain further insights. We further agree that the utility of this cohort is not in making general statements about the population, but rather in deriving specific insights for which the cohort is best suited. We enumerate some of them that are present in this manuscript.

    a) It is equally important to understand the relative degree of spread between Indian cities, where a combination of denser population and indoor lives has led to the greatest spread of disease. Since pandemics are typically self-limiting, regions with greater spread are further along the course and can expect declines sooner. This provides useful insight for public health strategy. While our cohort does not necessarily represent the average population, it is similar between cities, which is not true of any other survey. The ICMR national sero-survey is based on a random selection of districts and is heavily rural-biased.1 While that is important, rural areas are not where fast-growing outbreaks are likely, given a largely outdoor life and lower population density. Other city-wise serosurveys vary in target population as well as methodology and cannot be easily compared.2-5 Thus our data are the first to permit comparison between many important urban regions of India, showing which regions were further along the course and where future outbreaks were still likely. We note that some of the regions identified by this survey as high risk, such as Kerala and interior Maharashtra, amongst others, are where outbreaks continued until much later.

    b) The CSIR cohort has the added advantage of more extensive baseline data and repeated access, which allow us to determine antibody stability, as shown, and its possible correlates.

    c) The cohort is well suited to understanding clinical associations of SARS-CoV-2 infection, such as symptom rate and severity amongst its participants, as well as associations with infection risk (using seropositivity as an imperfect surrogate).

  3. Reviewer #3 (Public Review):

    The paper presents results of a serological survey done on 10,000+ employees and workers associated with CSIR labs in India during August-September 2020. The survey finds 10.14% seropositivity. In addition, correlations are drawn between seropositivity and biological and lifestyle factors. A follow-up study is also done on a subset of employees found seropositive, and antibodies are found to persist even after six months in most.

    Strengths: This is one of the two surveys with a pan-India footprint, making it a valuable addition to understanding of Covid-19 pandemic evolution in the country. It also finds a good inverse correlation between seropositivity and (i) blood group O, (ii) vegetarian diet, (iii) smoking, and (iv) use of private transport. While (iv) is obvious, (ii) and (iii) are a little surprising, suggesting that a deeper study is required to understand the reasons behind them.

    Weaknesses: While it is a pan-India survey, the population is not quite representative of general population of the country. CSIR labs are mostly in cities, and most of the employees use private transport. So the results cannot be generalized to the country as a whole. Restricting to people using public transport would be a better representation, although it still would not be fully representative.

    The data collection and analysis are done meticulously, and provide some new insights into the differential impact of the Covid-19 virus on people.

  4. Reviewer #2 (Public Review):

    The authors have been able to carry out a well-planned countrywide sero-survey in a cohort of 10,427 employees of their organization, across 23 laboratories spread over 17 states and 2 union territories. The reported sero-positivity of 10.14% among persons mainly from cities and towns helps in understanding the spread of the pandemic across the country and correlates well with the point prevalence of active infections in the various states of India during the same period. It also helps in understanding the role of asymptomatic cases in increasing sero-positivity, as 2/3 of the personnel could not remember any symptoms or illness.

    Strengths:

    1. The strength of this study is a large pan-India cohort with all demographic details captured, which can be easily followed up. The sero-positivity datasets correlate well with the national Covid case data for the states of India reported in the public domain during the same time frame. The mass migration of labourers from cities to rural India before the August-September period was possibly responsible for a quick spread of the infection, and this study is able to capture this effectively.

    2. The study has also correlated the antibodies to nucleocapsid antigen with the neutralizing antibody levels, and the correlation is good. However, this needs to be followed up to interpret humoral stability, especially given the interesting observation that antibodies to nucleocapsid antigen declined at six months while levels of neutralizing antibodies were stable after an initial drop at three months.

    3. The study demonstrates an inverse correlation between the changes in test positivity rate and sero-positivity, suggesting reduced transmission with increasing sero-positivity. The sero-positivity was higher in densely populated areas, suggesting faster transmission.
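    The inverse relationship in point 3, and Reviewer #1's caveat (deficiency 4) that 10% seropositivity is still early for a disease with R of about 2, can both be read through the textbook homogeneous-mixing approximation R_eff = R0 * (1 - s), where s is the immune fraction. This is a simplifying assumption (real contact patterns are heterogeneous, and seropositivity can undercount past infection), so the sketch is illustrative only and not a model used in the manuscript.

```python
def effective_r(r0: float, immune_fraction: float) -> float:
    """Effective reproduction number under homogeneous mixing: R0 * (1 - s)."""
    if not 0.0 <= immune_fraction <= 1.0:
        raise ValueError("immune_fraction must be between 0 and 1")
    return r0 * (1.0 - immune_fraction)


def herd_immunity_threshold(r0: float) -> float:
    """Immune fraction at which R_eff falls to 1 under the same assumption."""
    if r0 <= 1.0:
        return 0.0
    return 1.0 - 1.0 / r0


for s in (0.0, 0.10, 0.25, 0.50):
    print(f"R0 = 2, seropositive {s:.0%}: R_eff = {effective_r(2.0, s):.2f}")
print(f"threshold for R0 = 2: {herd_immunity_threshold(2.0):.0%}")
```

    Under these assumptions, 10% seropositivity lowers R_eff only from 2.0 to 1.8, still well above 1, which is consistent with Reviewer #1's point that immunity alone would not yet be expected to bend an indicator such as TPR.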

    Weakness:

    1. Extrapolating the study results to the whole country may not be fully justified, given the country's urban-rural divide and largely agricultural economy. Women are underrepresented in the study cohort, and no children have been included.

    2. The observations regarding correlates of sero-positivity, such as diet and smoking, would need specifically designed, adequately powered studies to confirm them. The sample size of the three- and six-month follow-up is small for concluding stability of humoral immunity, and further follow-up of the cohort is required. The role of labour migration in spreading the pandemic simultaneously to all parts of the country, though attractive, may not explain the lower rates in states like UP and Bihar, to which most migrants moved.

    3. A large part of the seropositivity dataset, representing the big cities of Delhi and Bengaluru, was removed when correlating with Test Positivity Rate, with collection duration cited as the reason. However, these cities also had different testing strategies and health infrastructure, and hence are important.

    4. Test positivity rate depends on the testing strategy and type of test used (RT-PCR or Rapid Antigen Test), and the ratio of the two tests differed across parts of the country.

    Overall, a good study in which the authors have been able to show effectively a higher sero-positivity than reported infections would suggest, possibly due to asymptomatic cases. It will be able to provide insight into immune memory in COVID-19 as the authors continue follow-up quantitative sero-assays for the cohort.
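    The gap between seropositivity and reported infections is often summarized as an infections-per-reported-case ratio: seroprevalence times population, divided by cumulative reported cases. A minimal sketch with hypothetical inputs follows; the function name is illustrative, and the calculation ignores assay sensitivity and specificity, antibody waning, and the lag between infection and seroconversion, all of which would shift the estimate.

```python
def infections_per_reported_case(seroprevalence: float, population: int, reported_cases: int) -> float:
    """Estimated infections (seroprevalence * population) per cumulative reported case."""
    if reported_cases <= 0:
        raise ValueError("reported_cases must be positive")
    estimated_infections = seroprevalence * population
    return estimated_infections / reported_cases


# Hypothetical city: 10% seropositive, 1,000,000 residents, 5,000 reported cases
ratio = infections_per_reported_case(0.10, 1_000_000, 5_000)
print(f"about {ratio:.0f} infections per reported case")
```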

  5. Reviewer #1 (Public Review):

    The paper "Insights from a Pan India Sero-Epidemiological survey (Phenome-India cohort) for SARS-CoV-2" reports a longitudinal survey of about 10,000 subjects from laboratories of the CSIR (India) who consented to be tested for antibodies to SARS-CoV-2 across August and September 2020. The methodology is a standard one, using the Roche kit to test for antibodies to the nucleocapsid antigen, with a follow-up to detect neutralising antibodies using the GenScript kit. A questionnaire for all participants asked about age, gender, pre-existing conditions and blood group, among other questions.
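    For readers unfamiliar with the two assays, their readouts reduce to simple decision rules. The sketch below assumes the Elecsys Anti-SARS-CoV-2 qualitative cutoff index (COI of 1.0 or more read as reactive) and the percent-inhibition formula used by GenScript's surrogate virus neutralization test; the 30% inhibition threshold is an assumption that depends on the kit insert version, and the function names are hypothetical.

```python
def elecsys_reactive(cutoff_index: float) -> bool:
    """Qualitative call for an Elecsys Anti-SARS-CoV-2 result (assumed: COI >= 1.0 is reactive)."""
    return cutoff_index >= 1.0


def percent_inhibition(od_sample: float, od_negative_control: float) -> float:
    """Surrogate neutralization: % of RBD-ACE2 binding blocked relative to the negative control."""
    if od_negative_control <= 0:
        raise ValueError("negative-control OD must be positive")
    return (1.0 - od_sample / od_negative_control) * 100.0


def has_neutralizing_activity(od_sample: float, od_negative_control: float,
                              threshold_pct: float = 30.0) -> bool:
    """Assumed positivity rule: inhibition at or above threshold_pct (check the kit insert)."""
    return percent_inhibition(od_sample, od_negative_control) >= threshold_pct


# Hypothetical readings for one participant
print(elecsys_reactive(3.2))                # True
print(percent_inhibition(0.5, 2.0))         # 75.0
print(has_neutralizing_activity(0.5, 2.0))  # True
```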

    The principal results of the study were:

    1. An overall seropositivity of 10.14% (95% CI: 9.6-10.7%), but with large variation across locations

    2. Virtually all seropositive participants exhibited neutralising activity

    3. Seropositivity correlated with population density in different locations

    4. A weak correlation was seen with changes in test positivity across locations

    5. A large asymptomatic fraction (~75%) who did not recall symptoms

    6. Of those symptomatic, most reported mild flu-like symptoms with fever

    7. A correlation with blood group, with seropositivity highest for AB, followed by B, O and A

    8. A vegetarian diet correlated with reduced seropositivity

    9. Antibody levels remained constant for 3 months across a sub-sample, while neutralising activity was lost in ~30% of this subsample. Over a longer period, in a still smaller subsample of those tested at 3 months, anti-nucleocapsid antibody levels declined while neutralising antibody levels remained roughly constant

    10. There is reasonable agreement with the results of the second Indian serosurvey, which obtained a seroprevalence of about 7% India-wide, although excluding urban hotspots.
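    The interval in item 1 can be reproduced from the counts in the abstract (1,058 positives among 10,427 volunteers) with the normal-approximation (Wald) binomial interval. This is a sketch under that assumption: the exact interval method used is not stated, and the calculation ignores clustering by laboratory, which would widen the interval.

```python
import math


def wald_interval(positives: int, n: int, z: float = 1.96) -> tuple[float, float, float]:
    """Point estimate and normal-approximation 95% CI for a binomial proportion."""
    if n <= 0 or not 0 <= positives <= n:
        raise ValueError("need n > 0 and 0 <= positives <= n")
    p = positives / n
    half_width = z * math.sqrt(p * (1.0 - p) / n)
    return p, p - half_width, p + half_width


p, lo, hi = wald_interval(1058, 10427)
print(f"{p:.3%} (95% CI {lo:.1%} to {hi:.1%})")  # 10.147% (95% CI 9.6% to 10.7%)
```

    The unrounded ratio is 10.147%, which the paper reports as 10.14%. At this sample size a Wilson interval would give practically the same bounds.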

    The deficiencies of this study are:

    1. This is a very specific cohort, largely urban, with (presumably) relatively higher levels of education. It is hard to see how this might translate into a general statement about the population.

    2. The presentation of Figure 1 was quite confusing, especially the colour coding.

    3. It is surprising that the state of Maharashtra shows only intermediate to low levels of seropositivity, given that the impact of the pandemic was largest there and especially in the city of Pune. There have been alternative serosurveys for Pune which found much higher levels of seropositivity from about the same period.

    4. The statement "Seropositivity of 10% or more was associated with reductions in TPR which may mean declining transmission": for a disease with R of about 2, this would actually be somewhat early in the epidemic, so you wouldn't expect to see this in an indicator such as TPR. TPR is also strongly correlated with the amount of testing, which isn't accounted for.

    5. The correlation with vegetarianism is unusual: you might have argued that a vegetarian diet could potentially protect against disease, but that it might protect against infection is hard to credit. Much of South Asia is not particularly vegetarian but has seen significantly less impact.

    6. On the same point, it is possible that social stratification associated with diet (direct employees being more likely to be vegetarian than contract workers) might be a confounder here, since outsourced staff seem to be at higher risk.

    7. There may be correlations to places of residence that again act as confounders. If direct employees are provided official accommodation, they may simply have had less exposure, being more protected.

    8. The correlations with blood group don't seem to match what is known from elsewhere.

    9. The statement that "declining cases may reflect persisting humoral immunity among sub-communities with higher exposure" is unsupported. What sub-communities?
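    Deficiencies 6 and 7 describe classic confounding. One standard check is to stratify by the suspected confounder (here, direct employee versus contract worker) and compare the crude odds ratio with the Mantel-Haenszel pooled odds ratio. The counts below are entirely hypothetical and constructed so that diet has no effect within either stratum; they show how an apparent diet association can arise purely from the stratum mix, not what the survey data contain.

```python
from typing import NamedTuple


class Stratum(NamedTuple):
    """2x2 counts in one stratum: exposure = non-vegetarian diet, outcome = seropositive."""
    exposed_pos: int
    exposed_neg: int
    unexposed_pos: int
    unexposed_neg: int


def odds_ratio(s: Stratum) -> float:
    """Cross-product odds ratio (a*d)/(b*c) for a single 2x2 table."""
    return (s.exposed_pos * s.unexposed_neg) / (s.exposed_neg * s.unexposed_pos)


def crude_odds_ratio(strata: list[Stratum]) -> float:
    """Odds ratio after collapsing all strata into one table (ignores the confounder)."""
    collapsed = Stratum(*(sum(column) for column in zip(*strata)))
    return odds_ratio(collapsed)


def mantel_haenszel_odds_ratio(strata: list[Stratum]) -> float:
    """Mantel-Haenszel pooled odds ratio, adjusting for the stratifying variable."""
    numerator = sum(s.exposed_pos * s.unexposed_neg / sum(s) for s in strata)
    denominator = sum(s.exposed_neg * s.unexposed_pos / sum(s) for s in strata)
    return numerator / denominator


# Hypothetical strata: direct employees (mostly vegetarian, 5% seropositive) and
# contract workers (mostly non-vegetarian, 20% seropositive); diet has no effect within either.
direct_employees = Stratum(exposed_pos=10, exposed_neg=190, unexposed_pos=40, unexposed_neg=760)
contract_workers = Stratum(exposed_pos=160, exposed_neg=640, unexposed_pos=40, unexposed_neg=160)
strata = [direct_employees, contract_workers]

print([round(odds_ratio(s), 2) for s in strata])     # [1.0, 1.0]
print(round(crude_odds_ratio(strata), 2))            # 2.36
print(round(mantel_haenszel_odds_ratio(strata), 2))  # 1.0
```

    A crude odds ratio well above 1 that collapses toward 1 after stratification is the pattern the reviewer's confounding hypothesis would predict; the same check applies to place of residence in deficiency 7.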

  6. SciScore for 10.1101/2021.01.12.21249713:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    Institutional Review Board Statement
      IACUC: Study Design, Sampling and Data Collection: The longitudinal cohort study was approved by Institutional ethics committee of CSIR-IGIB.
      Consent: Informed consent was obtained from all the participants and the samples were collected maintaining all recommended precautions.
    Randomization: not detected.
    Blinding: not detected.
    Power Analysis: not detected.
    Sex as a biological variable: not detected.

    Table 2: Resources

    Antibodies
    Sentences | Resources
    Blood samples (6 ml) were collected in EDTA vials from each participant and antibodies to SARS-CoV-2 nucleocapsid antigen were measured using an Electro-chemiluminescence Immunoassay (ECLIA)-Elecsys Anti-SARS-CoV-2 kit (Roche Diagnostics) as per the manufacturer’s protocol.
    Anti-SARS-CoV-2
    suggested: None
    Data and Statistical Analysis: Region wise and total sero-positivity was calculated from the fraction of samples positive for antibodies to SARS-CoV-2 nucleocapsid antigen.
    SARS-CoV-2 nucleocapsid antigen.
    suggested: (Proteintech Cat# 28769-1-AP, RRID:AB_2881212)

    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.