Trends in Self-citation Rates in High-impact Neurology, Neuroscience, and Psychiatry Journals

Curation statements for this article:
  • Curated by eLife

    eLife assessment

    This study examines how self-citations in selected neurology, neuroscience, and psychiatry journals differ according to geography, gender, seniority, and subfield. The evidence supporting the claims is mostly convincing, but certain aspects of the analysis would benefit from further work. Overall, the article is a valuable addition to the literature on self-citations.

Abstract

Citation metrics influence academic reputation and career trajectories. Recent works have highlighted flaws in citation practices in the Neurosciences, such as the under-citation of women. However, self-citation rates—or how much authors cite themselves—have not yet been comprehensively investigated in the Neurosciences. This work characterizes self-citation rates in basic, translational, and clinical Neuroscience literature by collating 100,347 articles published in 63 journals between 2000 and 2020. In analyzing over five million citations, we demonstrate four key findings: 1) increasing self-citation rates of Last Authors relative to First Authors, 2) lower self-citation rates in low- and middle-income countries, 3) gender differences in self-citation stemming from differences in the number of previously published papers, and 4) variations in self-citation rates by field. Our characterization of self-citation provides insight into citation practices that shape the perceived influence of authors in the Neurosciences, which in turn may impact what type of scientific research is done and who gets the opportunity to do it.

Article activity feed

  1. eLife assessment

    This study examines how self-citations in selected neurology, neuroscience, and psychiatry journals differ according to geography, gender, seniority, and subfield. The evidence supporting the claims is mostly convincing, but certain aspects of the analysis would benefit from further work. Overall, the article is a valuable addition to the literature on self-citations.

  2. Reviewer #1 (Public review):

    In this manuscript, the authors use a large dataset of neuroscience publications to elucidate the nature of self-citation within the neuroscience literature. The authors initially present descriptive measures of self-citation across time and author characteristics; they then produce an inclusive model to tease apart the potential role of various article and author features in shaping self-citation behavior. This is a valuable area of study, and the authors approach it with a rich dataset and solid methodology.

    The revisions made by the authors in this version have greatly improved the validity and clarity of the statistical techniques, and as a result the paper's findings are more convincing.

    This paper's primary strengths are: 1) its comprehensive dataset that allows for a snapshot of the dynamics of several related fields; 2) its thorough exploration of how self-citation behavior relates to characteristics of research and researchers.

    Its primary weakness is that the study stops short of digging into potential mechanisms in areas where it is potentially feasible to do so - for example, studying international dynamics by identifying and studying researchers who move between countries, or quantifying more or less 'appropriate' self-citations via measures of abstract text similarity.

    Yet while these types of questions were not determined to be in scope for this paper, the study is quite effective at laying the important groundwork for further study of mechanisms and motivations, and will be a highly valuable resource for both scientists within the field and those studying it.

  3. Reviewer #2 (Public review):

    The study presents valuable findings on self-citation rates in the field of Neuroscience, shedding light on potential strategic manipulation of citation metrics by first authors, regional variations in citation practices across continents, gender differences in early-career self-citation rates, and the influence of research specialization on self-citation rates in different subfields of Neuroscience. While some of the evidence supporting the claims of the authors is solid, some of the analysis seems incomplete and would benefit from more rigorous approaches.

  4. Reviewer #3 (Public review):

    This paper analyses self-citation rates in the field of Neuroscience, comprising in this case, Neurology, Neuroscience and Psychiatry. Based on data from Scopus, the authors identify self-citations, that is, whether references from a paper by some authors cite work that is written by one of the same authors. They separately analyse this in terms of first-author self-citations and last-author self-citations. The analysis is well-executed and the analysis and results are written down clearly. The interpretation of some of the results might prove more challenging. That is, it is not always clear what is being estimated.

    This issue of interpretability was already raised in my review of the previous revision, where I argued that the authors should take a more explicit causal framework. The authors have now revised some of the language in this revision, in order to downplay causal language. Although this is perfectly fine, this misses the broader point, namely that it is not clear what is being estimated. Perhaps it is best to refer to Lundberg et al. (2021) and ask the authors to clarify "What is your Estimand?" In my view, the theoretical estimands the authors are interested in are causal in nature. Perhaps the authors would argue that their estimands are descriptive. In either case, it would be good if the authors could clarify that theoretical estimand.

    Finally, in my previous review, I raised the issue of when self-citations become "problematic". The authors have addressed this issue satisfactorily, I believe, and now formulate their conclusions more carefully.

    Lundberg, I., Johnson, R., & Stewart, B. M. (2021). What Is Your Estimand? Defining the Target Quantity Connects Statistical Evidence to Theory. American Sociological Review, 86(3), 532-565. https://doi.org/10.1177/00031224211004187

  5. Author response:

    The following is the authors’ response to the original reviews.

    Public Reviews:

    Reviewer #1 (Public Review):

    In this manuscript, the authors use a large dataset of neuroscience publications to elucidate the nature of self-citation within the neuroscience literature. The authors initially present descriptive measures of self-citation across time and author characteristics; they then produce an inclusive model to tease apart the potential role of various article and author features in shaping self-citation behavior. This is a valuable area of study, and the authors approach it with an appropriate and well-structured dataset.

    The study's descriptive analyses and figures are useful and will be of interest to the neuroscience community. However, with regard to the statistical comparisons and regression models, I believe that there are methodological flaws that may limit the validity of the presented results. These issues primarily affect the uncertainty of estimates and the statistical inference made on comparisons and model estimates - the fundamental direction and magnitude of the results are unlikely to change in most cases. I have included detailed statistical comments below for reference.

    Conceptually, I think this study will be very effective at providing context and empirical evidence for a broader conversation around self-citation. And while I believe that there is room for a deeper quantitative dive into some finer-grained questions, this paper will be a valuable catalyst for new areas of inquiry around citation behavior - e.g., do authors change self-citation behavior when they move to more or less prestigious institutions? do self-citations in neuroscience benefit downstream citation accumulation? do journals' reference list policies increase or decrease self-citation? - that I hope that the authors (or others) consider exploring in future work.

    Thank you for your suggestions and your generally positive view of our work. As described below, we have made the statistical improvements that you suggested.

    Statistical comments:

    (1) Throughout the paper, the nested nature of the data does not seem to be appropriately handled in the bootstrapping, permutation inference, and regression models. This is likely to lead to inappropriately narrow confidence bands and overly generous statistical inference.

    We apologize for this error. We have now included nested bootstrapping and permutation tests. We defined an “exchangeability block” as a co-authorship group of authors. In this dataset, that meant any authors who published together (among the articles in this dataset) as a First Author / Last Author pairing were assigned to the same exchangeability block. It is not realistic to check for overlapping middle authors in all papers because of the collaborative nature of the field. In addition, we believe that self-citations are primarily controlled by first and last authors, so we can assume that middle authors do not control self-citation habits. We then performed bootstrapping and permutation tests within the constraints of the exchangeability blocks.
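
    To make the block construction concrete, one way to form such exchangeability blocks is to treat each article's First Author / Last Author pairing as an edge in a co-authorship graph and take its connected components. The sketch below is purely illustrative (it is not our analysis code, and the column names article_id, first_author_id, and last_author_id are hypothetical):

      # Illustrative sketch: exchangeability blocks as connected components of
      # the First Author / Last Author co-authorship graph (hypothetical data).
      library(igraph)

      articles <- data.frame(
        article_id      = c("a1", "a2", "a3", "a4"),
        first_author_id = c("A",  "A",  "C",  "D"),
        last_author_id  = c("B",  "E",  "B",  "F")
      )

      # Each article contributes one undirected edge between its first and last author
      g <- graph_from_data_frame(
        articles[, c("first_author_id", "last_author_id")],
        directed = FALSE
      )

      # Connected components define the exchangeability blocks
      comp <- components(g)
      author_block <- data.frame(
        author_id = names(comp$membership),
        block     = as.integer(comp$membership)
      )

      # Assign each article to its first author's block
      # (the last author falls in the same block by construction)
      articles$block <- author_block$block[
        match(articles$first_author_id, author_block$author_id)
      ]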

    We first describe this in the results (page 3, line 110):

    “Importantly, we accounted for the nested structure of the data in bootstrapping and permutation tests by forming co-authorship exchangeability blocks.”

    We also describe this in 4.8 Confidence Intervals (page 21, line 725):

    “Confidence intervals were computed with 1000 iterations of bootstrap resampling at the article level. For example, of the 100,347 articles in the dataset, we resampled articles with replacement and recomputed all results. The 95% confidence interval was reported as the 2.5 and 97.5 percentiles of the bootstrapped values.

    We grouped data into exchangeability blocks to avoid overly narrow confidence intervals or overly optimistic statistical inference. Each exchangeability block comprised any authors who published together as a First Author / Last Author pairing in our dataset. We only considered shared First/Last Author publications because we believe that these authors primarily control self-citations, and otherwise exchangeability blocks would grow too large due to the highly collaborative nature of the field. Furthermore, the exchangeability blocks do not account for co-authorship in other journals or prior to 2000. A distribution of the sizes of exchangeability blocks is presented in Figure S15.”

    In describing permutation tests, we also write (page 21, line 739):

    “4.9 P values

    P values were computed with permutation testing using 10,000 permutations, with the exception of regression P values and P values from model coefficients. For comparing different fields (e.g., Neuroscience and Psychiatry) and comparing self-citation rates of men and women, the labels were randomly permuted by exchangeability block to obtain null distributions. For comparing self-citation rates between First and Last Authors, the first and last authorship was swapped in 50% of exchangeability blocks.”
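
    To illustrate the resampling scheme, a block bootstrap and a block permutation test along these lines could look as follows. This is a simplified sketch with made-up data, not our analysis code, and the column names are hypothetical:

      # Illustrative sketch of block-level resampling (hypothetical data):
      # one row per article, with an exchangeability block label and
      # First/Last Author self-citation rates.
      set.seed(1)
      dat <- data.frame(
        block   = rep(1:200, each = 2),
        fa_rate = rbeta(400, 2, 40),
        la_rate = rbeta(400, 3, 40)
      )
      blocks <- unique(dat$block)

      # Block bootstrap: resample whole blocks with replacement, recompute the statistic
      boot_stat <- replicate(1000, {
        sampled   <- sample(blocks, length(blocks), replace = TRUE)
        resampled <- do.call(rbind, lapply(sampled, function(b) dat[dat$block == b, ]))
        mean(resampled$la_rate) - mean(resampled$fa_rate)
      })
      ci <- quantile(boot_stat, c(0.025, 0.975))  # 95% percentile interval

      # Block permutation test: swap First/Last roles in a random 50% of blocks
      obs <- mean(dat$la_rate) - mean(dat$fa_rate)
      null_stat <- replicate(10000, {
        flipped <- sample(blocks, length(blocks) / 2)
        perm    <- dat
        swap    <- perm$block %in% flipped
        perm[swap, c("fa_rate", "la_rate")] <- perm[swap, c("la_rate", "fa_rate")]
        mean(perm$la_rate) - mean(perm$fa_rate)
      })
      p_value <- mean(abs(null_stat) >= abs(obs))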

    For modeling, we considered doing a mixed effects model but found difficulties due to computational power. For example, with our previous model, there were hundreds of thousands of levels for the paper random effect, and tens of thousands of levels for the author random effect. Even when subsampling or using packages designed for large datasets (e.g., mgcv’s bam function: https://www.rdocumentation.org/packages/mgcv/versions/1.9-1/topics/bam), we found computational difficulties.

    As a result, we switched to modeling results at the paper level (e.g., self-citation count or rate). We found that results could be unstable when including author-level random effects because in many cases there was only one author per group. Instead, to avoid inappropriately narrow confidence bands, we resampled the dataset such that each author was only represented once. For example, if Author A had five papers in this dataset, then one of their five papers was randomly selected. We updated our description of our models in the Methods section (page 21, line 754):

    “4.10 Exploring effects of covariates with generalized additive models

    For these analyses, we used the full dataset size separately for First and Last Authors (Table S2). This included 115,205 articles and 5,794,926 citations for First Authors, and 114,622 articles and 5,801,367 citations for Last Authors. We modeled self-citation counts, self-citation rates, and number of previous papers for First Authors and Last Authors separately, resulting in six total models.

    We found that models could be computationally intensive and unstable when including author-level random effects because in many cases there was only one author per group. Instead, to avoid inappropriately narrow confidence bands, we resampled the dataset such that each author was only represented once. For example, if Author A had five papers in this dataset, then one of their five papers was randomly selected. The random resampling was repeated 100 times as a sensitivity analysis (Figure S12).

    For our models, we used generalized additive models from mgcv’s “gam” function in R 49. The smooth terms included all the continuous variables: number of previous papers, academic age, year, time lag, number of authors, number of references, and journal impact factor. The linear terms included all the categorical variables: field, gender, affiliation country LMIC status, and document type. We empirically selected a Tweedie distribution 50 with a log link function and p=1.2. The p parameter indicates that the variance is proportional to the mean to the p power 49. The p parameter ranges from 1 to 2, with p=1 equivalent to the Poisson distribution and p=2 equivalent to the gamma distribution. For all fitted models, we simulated the residuals with the DHARMa package, as standard residual plots may not be appropriate for GAMs 51. DHARMa scales the residuals between 0 and 1 with a simulation-based approach 51. We also tested for deviation from uniformity, dispersion, outliers, and zero inflation with DHARMa. Non-uniformity, dispersion, outliers, and zero inflation were significant due to the large sample size, but small in effect size in most cases. The simulated quantile-quantile plots from DHARMa suggested that the observed and simulated distributions were generally aligned, with the exception of slight misalignment in the models for the number of previous papers. These analyses are presented in Figure S11 and Table S7.

    In addition, we tested for inadequate basis functions using mgcv’s “gam.check()” function 49. Across all smooth predictors and models, we ultimately selected between 10-20 basis functions depending on the variable and outcome measure (counts, rates, papers). We further checked the concurvity of the models and ensured that the worst-case concurvity for all smooth predictors was about 0.8 or less.”
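
    As a rough illustration of this modeling setup (not the exact analysis code; the data frame and column names below are hypothetical, and synthetic data stand in for the real dataset), a First Author self-citation count model could be fit as follows:

      # Illustrative sketch: subsample one paper per author, then fit a GAM with
      # a Tweedie (p = 1.2) family and a log link, as described above.
      library(mgcv)
      library(dplyr)

      set.seed(1)
      n <- 2000
      papers <- data.frame(
        first_author_id   = sample(paste0("au", 1:800), n, replace = TRUE),
        n_prev_papers     = rpois(n, 20),
        academic_age      = runif(n, 0, 38),
        year              = sample(2000:2020, n, replace = TRUE),
        time_lag          = runif(n, 0, 17),
        n_authors         = sample(2:30, n, replace = TRUE),
        n_references      = sample(10:250, n, replace = TRUE),
        impact_factor     = runif(n, 2, 20),
        field             = factor(sample(c("Neurology", "Neuroscience", "Psychiatry"), n, replace = TRUE)),
        gender            = factor(sample(c("man", "woman"), n, replace = TRUE)),
        lmic              = factor(sample(c("LMIC", "non-LMIC"), n, replace = TRUE)),
        doc_type          = factor(sample(c("Article", "Review"), n, replace = TRUE)),
        fa_self_citations = rpois(n, 2)
      )

      # One randomly selected paper per First Author, so each author appears only once
      one_per_author <- papers %>%
        group_by(first_author_id) %>%
        slice_sample(n = 1) %>%
        ungroup()

      # Smooth terms for continuous covariates, linear terms for categorical ones
      fit <- gam(
        fa_self_citations ~
          s(n_prev_papers, k = 15) + s(academic_age, k = 15) + s(year, k = 10) +
          s(time_lag, k = 10) + s(n_authors, k = 10) + s(n_references, k = 15) +
          s(impact_factor, k = 10) +
          field + gender + lmic + doc_type,
        family = Tweedie(p = 1.2, link = "log"),
        data   = one_per_author,
        method = "REML"
      )
      summary(fit)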

    The direction of our results primarily stayed the same, with the exception of gender results. Men tended to self-cite slightly less (or at equal rates) after accounting for numerous covariates. As such, we also modeled the number of previous papers to explain the discrepancy between our raw data and the modeled gender results. Please find the updated results text below (page 11, line 316):

    “2.9 Exploring effects of covariates with generalized additive models

    Investigating the raw trends and group differences in self-citation rates is important, but several confounding factors may explain some of the differences reported in previous sections. For instance, gender differences in self-citation were previously attributed to men having a greater number of prior papers available to self-cite 7,20,21. As such, covarying for various author- and article-level characteristics can improve the interpretability of self-citation rate trends. To allow for inclusion of author-level characteristics, we only consider First Author and Last Author self-citation in these models.

    We used generalized additive models (GAMs) to model the number and rate of self-citations for First Authors and Last Authors separately. The data were randomly subsampled so that each author only appeared in one paper. The terms of the model included several article characteristics (article year, average time lag between article and all cited articles, document type, number of references, field, journal impact factor, and number of authors), as well as author characteristics (academic age, number of previous papers, gender, and whether their affiliated institution is in a low- and middle-income country). Model performance (adjusted R2) and coefficients for parametric predictors are shown in Table 2. Plots of smooth predictors are presented in Figure 6.

    First, we considered several career and temporal variables. Consistent with prior works 20,21, self-citation rates and counts were higher for authors with a greater number of previous papers. Self-citation counts and rates increased rapidly over the first 25 published papers and then increased more gradually. Early in the career, increasing academic age was related to greater self-citation. There was a small peak at about five years, followed by a small decrease and a plateau. We found an inverted U-shaped trend for average time lag and self-citations, with self-citations peaking approximately three years after initial publication. In addition, self-citations have generally been decreasing since 2000. The smooth predictors showed larger decreases in the First Author model relative to the Last Author model (Figure 6).

    Then, we considered whether authors were affiliated with an institution in a low- and middle-income country (LMIC). LMIC status was determined by the Organisation for Economic Co-operation and Development. We opted to use LMIC instead of affiliation country or continent to reduce the number of model terms. We found that papers from LMIC institutions had significantly lower self-citation counts (-0.138 for First Authors, -0.184 for Last Authors) and rates (-12.7% for First Authors, -23.7% for Last Authors) compared to non-LMIC institutions. Additional results with affiliation continent are presented in Table S5. Relative to the reference level of Asia, higher self-citations were associated with Africa (only three of four models), the Americas, Europe, and Oceania.

    Among paper characteristics, a greater number of references was associated with higher self-citation counts and lower self-citation rates (Figure 6). Interestingly, self-citations were greater for papers with a small number of authors, though the effect diminished after about five authors. Review articles were associated with lower self-citation counts and rates. No clear trend emerged between self-citations and journal impact factor. In an analysis by field, despite the raw results suggesting that self-citation rates were lower in Neuroscience, GAM-derived self-citations were greater in Neuroscience than in Psychiatry or Neurology.

    Finally, our results aligned with previous findings of nearly equivalent self-citation rates for men and women after including covariates, even showing slightly higher self-citation rates in women. Since raw data showed evidence of a gender difference in self-citation that emerges early in the career but dissipates with seniority, we incorporated two interaction terms: one between gender and academic age and a second between gender and the number of previous papers. Results remained largely unchanged with the interaction terms (Table S6).

    2.10 Reconciling differences between raw data and models

    The raw and GAM-derived data exhibited some conflicting results, such as for gender and field of research. To further study covariates associated with this discrepancy, we modeled the publication history for each author (at the time of publication) in our dataset (Table 2). The model terms included academic age, article year, journal impact factor, field, LMIC status, gender, and document type. Notably, Neuroscience was associated with the fewest number of papers per author. This explains how authors in Neuroscience could have the lowest raw self-citation rates but the highest self-citation rates after including covariates in a model. In addition, being a man was associated with about 0.25 more papers. Thus, gender differences in self-citation likely emerged from differences in the number of papers, not in any self-citation practices.”

    (2) The discussion of the data structure used in the regression models is somewhat opaque, both in the main text and the supplement. From what I gather, these models likely have each citation included in the model at least once (perhaps twice, once for first-author status and one for last-author status), with citations nested within citing papers, cited papers, and authors. Without inclusion of random effects, the interpretation and inference of the estimates may be misleading.

    Please see our response to point (1) to address random effects. We have also switched to GAMs (see point #3 below) and provided more detail in the methods. Notably, we decided against using author-level effects due to poor model stability, as there can be as few as one author per group. Instead, we subsampled the dataset such that only one paper appeared from each author.

    (3) I am concerned that the use of the inverse hyperbolic sine transform is a bit too prescriptive, and may be producing poor fits to the true predictor-outcome relationships. For example, in a figure like Fig S8, it is hard to know to what extent the sharp drop and sign reversal are true reflections of the data, and to what extent they are artifacts of the transformed fit.

    Thank you for raising this point. We have now switched to using generalized additive models (GAMs). GAMs provide a flexible approach to modeling that does not require transformations. We described this in detail in point (1) above and in Methods 4.10 Exploring effects of covariates with generalized additive models (page 21, line 754).

    “4.10 Exploring effects of covariates with generalized additive models

    For these analyses, we used the full dataset size separately for First and Last Authors (Table S2). This included 115,205 articles and 5,794,926 citations for First Authors, and 114,622 articles and 5,801,367 citations for Last Authors. We modeled self-citation counts, self-citation rates, and number of previous papers for First Authors and Last Authors separately, resulting in six total models.

    We found that models could be computationally intensive and unstable when including author-level random effects because in many cases there was only one author per group. Instead, to avoid inappropriately narrow confidence bands, we resampled the dataset such that each author was only represented once. For example, if Author A had five papers in this dataset, then one of their five papers was randomly selected. The random resampling was repeated 100 times as a sensitivity analysis (Figure S12).

    For our models, we used generalized additive models from mgcv’s “gam” function in R 48. The smooth terms included all the continuous variables: number of previous papers, academic age, year, time lag, number of authors, number of references, and journal impact factor. The linear terms included all the categorical variables: field, gender, affiliation country LMIC status, and document type. We empirically selected a Tweedie distribution 49 with a log link function and p=1.2. The p parameter indicates that the variance is proportional to the mean to the p power 48. The p parameter ranges from 1 to 2, with p=1 equivalent to the Poisson distribution and p=2 equivalent to the gamma distribution. For all fitted models, we simulated the residuals with the DHARMa package, as standard residual plots may not be appropriate for GAMs 50. DHARMa scales the residuals between 0 and 1 with a simulation-based approach 50. We also tested for deviation from uniformity, dispersion, outliers, and zero inflation with DHARMa. Non-uniformity, dispersion, outliers, and zero inflation were significant due to the large sample size, but small in effect size in most cases. The simulated quantile-quantile plots from DHARMa suggested that the observed and simulated distributions were generally aligned, with the exception of slight misalignment in the models for the number of previous papers. These analyses are presented in Figure S11 and Table S7.

    In addition, we tested for inadequate basis functions using mgcv’s “gam.check()” function 48. Across all smooth predictors and models, we ultimately selected between 10-20 basis functions depending on the variable and outcome measure (counts, rates, papers). We further checked the concurvity of the models and ensured that the worst-case concurvity for all smooth predictors was about 0.8 or less.”
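
    As a small, self-contained illustration of these diagnostic steps (a toy model on simulated data, not our fitted models; shown with a Poisson family for brevity, whereas our actual models used the Tweedie family described above), the DHARMa and mgcv checks could be run as follows:

      # Illustrative sketch: simulation-based residual checks (DHARMa), basis
      # dimension checks, and concurvity, on a small toy Poisson GAM.
      library(mgcv)
      library(DHARMa)

      set.seed(1)
      d <- data.frame(x = runif(500),
                      g = factor(sample(c("a", "b"), 500, replace = TRUE)))
      d$y <- rpois(500, exp(0.5 + sin(2 * pi * d$x)))

      fit <- gam(y ~ s(x, k = 10) + g, family = poisson(link = "log"),
                 data = d, method = "REML")

      # Scaled, simulation-based residuals; standard residual plots can mislead for GAMs
      res <- simulateResiduals(fit, n = 250)
      plot(res)
      testUniformity(res)
      testDispersion(res)
      testOutliers(res)
      testZeroInflation(res)

      # Adequacy of the basis dimensions and worst-case concurvity of smooth terms
      gam.check(fit)
      concurvity(fit, full = TRUE)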

    (4) It seems there are several points in the analysis where papers may have been dropped for missing data (e.g., missing author IDs and/or initials, missing affiliations, low-confidence gender assessment). It would be beneficial for the reader to know what % of the data was dropped for each analysis, and for comparisons across countries it would be important for the authors to make sure that there is not differential missing data that could affect the interpretation of the results (e.g., differences in self-citation being due to differences in Scopus ID coverage).

    Thank you for raising this important point. In the methods section, we describe how the data are missing (page 18, line 623):

    “4.3 Data exclusions and missingness

    Data were excluded across several criteria: missing covariates, missing citation data, out-of-range values at the citation pair level, and out-of-range values at the article level (Table 3). After downloading the data, our dataset included 157,287 articles and 8,438,733 citations. We excluded any articles with missing covariates (document type, field, year, number of authors, number of references, academic age, number of previous papers, affiliation country, gender, and journal). Of the remaining articles, we dropped any for missing citation data (e.g., cannot identify whether a self-citation is present due to lack of data). Then, we removed citations with unrealistic or extreme values. These included an academic age of less than zero or above 38/44 for First/Last Authors (99th percentile); greater than 266/522 papers for First/Last Authors (99th percentile); and a cited year before 1500 or after 2023. Subsequently, we dropped articles with extreme values that could contribute to poor model stability. These included greater than 30 authors; fewer than 10 references or greater than 250 references; and a time lag of greater than 17 years. These values were selected to ensure that GAMs were stable and not influenced by a small number of extreme values.

    In addition, we evaluated whether the data were not missing at random (Table S8). Data were more likely to be missing for reviews relative to articles, for Neurology relative to Neuroscience or Psychiatry, in works from Africa relative to the other continents, and for men relative to women. Scopus ID coverage contributed in part to differential missingness. However, our exclusion criteria also contribute. For example, Last Authors with more than 522 papers were excluded to help stabilize our GAMs. More men fit this exclusion criterion than women.”
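
    As a purely illustrative sketch of how these article-level exclusions could be applied (hypothetical column names and a tiny made-up table, using the First Author thresholds quoted above):

      # Illustrative sketch of the article-level exclusions for First Authors
      # (hypothetical column names; thresholds as described above).
      library(dplyr)

      articles <- tibble(
        fa_academic_age  = c(5, -1, 12, 40),
        fa_n_prev_papers = c(10, 300, 25, 50),
        n_authors        = c(4, 6, 35, 8),
        n_references     = c(45, 60, 80, 300),
        time_lag         = c(3.2, 5.1, 2.4, 20),
        min_cited_year   = c(1995, 1980, 1400, 2001)
      )

      kept <- articles %>%
        na.omit() %>%                                   # drop rows with missing covariates
        filter(fa_academic_age >= 0,
               fa_academic_age <= 38,                   # 99th percentile cap (First Authors)
               fa_n_prev_papers <= 266,                 # 99th percentile cap (First Authors)
               min_cited_year >= 1500,                  # plausible cited years
               n_authors <= 30,                         # article-level extremes
               n_references >= 10, n_references <= 250,
               time_lag <= 17)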

    Due to differential missingness, we wrote in the limitations (page 16, line 529):

    “Ninth, data were differentially missing (Table S8) due to Scopus coverage and gender estimation. Differential missingness could bias certain results in the paper, but we hope that the dataset is large enough to reduce any potential biases.”

    Reviewer #2 (Public Review):

    The authors provide a comprehensive investigation of self-citation rates in the field of Neuroscience, filling a significant gap in existing research. They analyze a large dataset of over 150,000 articles and eight million citations from 63 journals published between 2000 and 2020. The study reveals several findings. First, they state that there is an increasing trend of self-citation rates among first authors compared to last authors, indicating potential strategic manipulation of citation metrics. Second, they find that the Americas show higher odds of self-citation rates compared to other continents, suggesting regional variations in citation practices. Third, they show that there are gender differences in early-career self-citation rates, with men exhibiting higher rates than women. Lastly, they find that self-citation rates vary across different subfields of Neuroscience, highlighting the influence of research specialization. They believe that these findings have implications for the perception of author influence, research focus, and career trajectories in Neuroscience.

    Overall, this paper is well written, and the breadth of analysis conducted by the authors, with various interactions between variables (e.g., gender vs. seniority), shows that the authors have spent a lot of time thinking about different angles. The discussion section is also quite thorough. The authors should also be commended for their efforts in the provision of code for the public to evaluate their own self-citations. That said, here are some concerns and comments that, if addressed, could potentially enhance the paper:

    Thank you for your review and your generally positive view of our work.

    (1) There are concerns regarding the data used in this study, specifically its bias towards top journals in Neuroscience, which limits the generalizability of the findings to the broader field. More specifically, the top 63 journals in neuroscience are based on impact factor (IF), which raises a potential issue of selection bias. While the paper acknowledges this as a limitation, it lacks a clear justification for why the authors made this choice. It is also unclear how the "top" journals were identified: was it based on the top 5% in terms of impact factor? Or 10%? Or some other metric? The authors also do not provide the (computed) impact factors of the journals in the supplementary material.

    We apologize for the lack of clarity about our selection of journals. We agree that there are limitations to selecting higher impact journals. However, we needed to apply some form of selection in order to make the analysis manageable. For instance, even these 63 journals include over five million citations. We better describe our rationale behind the approach as follows (page 17, line 578):

    “We collected data from the 25 journals with the highest impact factors, based on Web of Science impact factors, in each of Neurology, Neuroscience, and Psychiatry. Some journals appeared in the top 25 list of multiple fields (e.g., both Neurology and Neuroscience), so 63 journals were ultimately included in our analysis. We recognize that limiting the journals to the top 25 in each field also limits the generalizability of the results. However, there are tradeoffs between breadth of journals and depth of information. For example, by limiting the journals to these 63, we were able to look at 21 years of data (2000-2020). In addition, the definition of fields is somewhat arbitrary. By restricting the journals to a set of 63 well-known journals, we ensured that the journals belonged to Neurology, Neuroscience, or Psychiatry research. It is also important to note that the impact factor of these journals has not necessarily always been high. For example, Acta Neuropathologica had an impact factor of 17.09 in 2020 but 2.45 in 2000. To further recognize the effects of impact factor, we decided to include an impact factor term in our models.”

    In addition, we have now provided the 2020 impact factors in Table S1.

    By exclusively focusing on high impact journals, your analysis may not be representative of the broader landscape of self-citation patterns across the neuroscience literature, which is what the title of the article claims to do.

    We agree that this article is not indicative of all neuroscience literature, but rather the top journals. Thus, we have changed the title to: “Trends in Self-citation Rates in High-impact Neurology, Neuroscience, and Psychiatry Journals”. We would also like to note that compared to previous bibliometrics works in neuroscience (Bertolero et al. 2020; Dworkin et al. 2020; Fulvio et al. 2021), this article includes a wider range of data.

    (2) One other concern pertains to the possibility that a significant number of authors involved in the paper may not be neuroscientists. It is plausible that the paper is a product of interdisciplinary collaboration involving scientists from diverse disciplines. Neuroscientists amongst the authors should be identified.

    In our opinion, neuroscience is a broad, interdisciplinary field. Individuals performing neuroscience research may have a neuroscience background. Yet, they may come from many backgrounds, such as physics, mathematics, biology, chemistry, or engineering. As such, we do not believe that it is feasible to characterize whether each author considers themselves a neuroscientist or not. We have added the following to the limitations section (page 16, line 528):

    “Eighth, authors included in this work may not be neurologists, neuroscientists, or psychiatrists. However, they still publish in journals from these fields.”

    (3) When calculating self-citation rate, it is important to consider the number of papers the authors have published to date. One plausible explanation for the lower self-citation rates among first authors could be attributed to their relatively junior status and short publication record. As such, it would also be beneficial to assess self-citation rate as a percentage relative to the author's publication history. This number would be more accurate if we look at it as a percentage of their publication history. My suspicion is that first authors (who are more junior) might be more likely to self-cite than their senior counterparts. My suspicion was further raised by looking at Figures 2a and 3. Considering the nature of the self-citation metric employed in the study, it is expected that authors with a higher level of seniority would have a greater number of publications. Consequently, these senior authors' papers are more likely to be included in the pool of references cited within the paper, hence the higher rate.

    While the authors acknowledge the importance of the number of past publications in their gender analysis, it is just as important to include the interplay of seniority in (1) their first and last author self-citation rates and (2) their geographic analysis.

    Thank you for this thoughtful comment. We agree that seniority and prior publication history play an important role in self-citation rates.

    For comparing First/Last Author self-citation rates, we have now included a plot similar to Figure 2a, where self-citation as a percentage of prior publication history is plotted.

    (page 4, line 161): “Analyzing self-citations as a fraction of publication history exhibited a similar trend (Figure S3). Notably, First Authors were more likely than Last Authors to self-cite when normalized by prior publication history.”

    For the geographic analysis, we made two new maps: 1) that of the number of previous papers, and 2) that of the journal impact factor (see response to point #4 below).

    (page 5, line 185): “We also investigated the distribution of the number of previous papers and journal impact factor across countries (Figure S4). Self-citation maps by country were highly correlated with maps of the number of previous papers (Spearman’s r=0.576, P=4.1e-4; 0.654, P=1.8e-5 for First and Last Authors). They were significantly correlated with maps of average impact factor for Last Authors (0.428, P=0.014) but not First Authors (Spearman’s r=0.157, P=0.424). Thus, further investigation is necessary with these covariates in a comprehensive model.”

    Finally, we included a model term for the number of previous papers (Table 2). We analyzed this both for self-citation counts and self-citation rates and found a strong relationship between publication history and self-citations. We also included the following section where we modeled the number of previous papers for each author (page 13, line 384):

    “2.10 Reconciling differences between raw data and models

    The raw and GAM-derived data exhibited some conflicting results, such as for gender and field of research. To further study covariates associated with this discrepancy, we modeled the publication history for each author (at the time of publication) in our dataset (Table 2). The model terms included academic age, article year, journal impact factor, field, LMIC status, gender, and document type. Notably, Neuroscience was associated with the fewest number of papers per author. This explains how authors in Neuroscience could have the lowest raw self-citation rates but the highest self-citation rates after including covariates in a model. In addition, being a man was associated with about 0.25 more papers. Thus, gender differences in self-citation likely emerged from differences in the number of papers, not in any self-citation practices.”

    (4) Because your analysis is limited to high impact journals, it would be beneficial to see the distribution of the impact factors across the different countries. Otherwise, your analysis on geographic differences in self-citation rates is hard to interpret. Are these differences really differences in self-citation rates, or differences in journal impact factor? It would be useful to look at the representation of authors from different countries for different impact factors.

    We made a map of this in Figure S4 (see our response to point #3 above).

    (page 5, line 185): “We also investigated the distribution of the number of previous papers and journal impact factor across countries (Figure S4). Self-citation maps by country were highly correlated with maps of the number of previous papers (Spearman’s r=0.576, P=4.1e-4; 0.654, P=1.8e-5 for First and Last Authors). They were significantly correlated with maps of average impact factor for Last Authors (0.428, P=0.014) but not First Authors (Spearman’s r=0.157, P=0.424). Thus, further investigation is necessary with these covariates in a comprehensive model.”
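
    For illustration, country-level averages and their rank correlations could be computed along these lines (synthetic data and hypothetical column names; not our analysis code):

      # Illustrative sketch: country-level means and Spearman correlations
      # between self-citation rates, publication history, and impact factor.
      library(dplyr)

      set.seed(1)
      papers <- tibble(
        country        = sample(paste0("country_", 1:30), 3000, replace = TRUE),
        fa_self_rate   = rbeta(3000, 2, 40),
        fa_prev_papers = rpois(3000, 20),
        impact_factor  = runif(3000, 2, 20)
      )

      by_country <- papers %>%
        group_by(country) %>%
        summarise(mean_self_rate   = mean(fa_self_rate),
                  mean_prev_papers = mean(fa_prev_papers),
                  mean_if          = mean(impact_factor),
                  .groups = "drop")

      cor.test(by_country$mean_self_rate, by_country$mean_prev_papers, method = "spearman")
      cor.test(by_country$mean_self_rate, by_country$mean_if, method = "spearman")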

    We also included impact factor as a term in our model. The results suggest that there are still geographic differences (Table 2, Table S5).

    (5) The presence of self-citations is not inherently problematic, and I appreciate the fact that the authors omit any explicit judgment on this matter. That said, without appropriate context, self-citations are also not the best scholarly practice. In the analysis on gender differences in self-citations, it appears that the authors imply an expectation that women's self-citation rates should align with those of men. While this is not explicitly stated, use of the word "disparity", and the presentation of self-citation as an example of self-promotion in the discussion, suggest such a perspective. Without knowing the context in which the self-citation was made, it is hard to ascertain whether women are less inclined to self-promote or men are more inclined to engage in strategic self-citation practices.

    We agree that on the level of an individual self-citation, our study is not useful for determining how related the papers are. Yet, understanding overall trends in self-citation may help to identify differences. Context is important, but large datasets allow us to investigate broad trends. We added the following text to the limitations section (page 16, line 524):

    “In addition, these models do not account for whether a specific citation is appropriate, as some situations may necessitate higher self-citation rates.”

    Reviewer #3 (Public Review):

    This paper analyses self-citation rates in the field of Neuroscience, comprising in this case, Neurology, Neuroscience and Psychiatry. Based on data from Scopus, the authors identify self-citations, that is, whether references from a paper by some authors cite work that is written by one of the same authors. They separately analyse this in terms of first-author self-citations and last-author self-citations. The analysis is well-executed and the analysis and results are written down clearly. There are some minor methodological clarifications needed, but more importantly, the interpretation of some of the results might prove more challenging. That is, it is not always clear what is being estimated, and more importantly, the extent to which self-citations are "problematic" remains unclear.

    Thank you for your review. We attempted to improve the interpretation of results, as described in the following responses.

    When are self-citations problematic? As the authors themselves also clarify, "self-citations may often be appropriate". Researchers cite their own previous work for perfectly good reasons, similar to the reasons why they would cite work by others. The "problem", in a sense, is that researchers cite their own work just to increase the citation count, or to promote their own work and make it more visible. This self-promotional behaviour might be incentivised by certain research evaluation procedures (e.g. hiring, promoting) that overly emphasise citation performance. However, the true problem then might not be (self-)citation practices, but instead the flawed research evaluation procedures that emphasise citation performance too much. So instead of problematising self-citation behaviour, and trying to address it, we might do better to address flawed research evaluation procedures. Of course, we should expect references to be relevant, and we should avoid self-promotional references, but addressing self-citations may just have minimal effects, and would not solve the more fundamental issue.

    We agree that this dataset is not designed to investigate the downstream effects of self-citations. However, self-citation practices are more likely to be problematic when they differ across specific groups. This work can potentially spark more interest in future longitudinal designs to investigate whether differences in self-citation practices lead to differences in career outcomes, for example. We added the following text to clarify (page 17, line 565):

    “Yet, self-citation practices become problematic when they are different across groups or are used to “game the system.” Future work should investigate the downstream effects of self-citation differences to see whether they impact the career trajectories of certain groups. We hope that this work will help to raise awareness about factors influencing self-citation practices to better inform authors, editors, funding agencies, and institutions in Neurology, Neuroscience, and Psychiatry.”

    Some other challenges arise when taking a statistical perspective. For any given paper, we could browse through the references, and determine whether a particular reference would be warranted or not. For instance, we could note that there might be a reference included that is not at all relevant to the paper. Taking a broader perspective, the irrelevant reference might point to work by others, included just for reasons of prestige, so-called perfunctory citations. But it could of course also include self-citations. When we simply start counting all self-citations, we do not see what fraction of those self-citations would be warranted as references. The question then emerges, what level of self-citations should be counted as "high"? How should we determine that? If we observe differences in self-citation rates, what does it tell us?

    Our focus is on when self-citation practices differ across groups. We agree that, on a case-by-case basis, there is no exact number for a self-citation rate that is “high.” With a dataset of the current size, evaluating whether each individual self-citation is appropriate is not feasible. If we observe differences in self-citation rate, this may tell us about broad (not individual-level) trends and differences in self-citing practice. If one group self-cites at a much higher rate than another group, even after covarying relevant variables such as prior publication history, then the self-citation differences can likely be attributed to differences in self-citation practices and behaviors.

    For example, the authors find that the (any author) self-citation rate in Neuroscience is 10.7% versus 15.9% in Psychiatry. What does this difference mean? Are psychiatrists citing themselves more often than neuroscientists? First author men showed a self-citation rate of 5.12% versus a self-citation rate of 3.34% for women first authors. Do men engage in more problematic citation behaviour? Junior researchers (10-year career) show a self-citation rate of about 5% compared to a self-citation rate of about 10% for senior researchers (30-year career). Are senior researchers therefore engaging in more problematic citation behaviour? The answer is (most likely) "no", because senior authors have simply published more, and will therefore have more opportunities to refer to their own work. To be clear: the authors are aware of this, and also take this into account. In fact, these various "raw" self-citation rates may, as the authors themselves say, "give the illusion" of differences in self-citation rates, while the real explanation is somehow "hidden" by, for instance, career seniority.

    We included numerous covariates in our model. In addition, to address the difference between “raw” and “modeled” self-citation rates, we added the following section (page 13, line 384):

    “2.10 Reconciling differences between raw data and models

    The raw and GAM-derived data exhibited some conflicting results, such as for gender and field of research. To further study covariates associated with this discrepancy, we modeled the publication history for each author (at the time of publication) in our dataset (Table 2). The model terms included academic age, article year, journal impact factor, field, LMIC status, gender, and document type. Notably, Neuroscience was associated with the fewest number of papers per author. This explains how authors in Neuroscience could have the lowest raw self-citation rates but the highest self-citation rates after including covariates in a model. In addition, being a man was associated with about 0.25 more papers. Thus, gender differences in self-citation likely emerged from differences in the number of papers, not in any self-citation practices.”

    Again, the authors do consider this, and "control" for career length and number of publications, et cetera, in their regression model. Some of the previous observations then change in the regression model. Neuroscience doesn't seem to be self-citing more; there just seem to be more junior researchers in that field compared to Psychiatry. Similarly, men and women don't seem to show an overall different self-citation behaviour (although the authors find an early-career difference); the men included in the study simply have longer careers and more publications.

    But here's the key issue: what does it then mean to "control" for some variables? This doesn't make any sense, except in the light of causality. That is, we should control for some variable, such as seniority, because we are interested in some causal effect. The field may not "cause" the observed differences in self-citation behaviour, this is mediated by seniority. Or is it confounded by seniority? Are the overall gender differences also mediated by seniority? How would the selection of high-impact journals "bias" estimates of causal effects on self-citation? Can we interpret the coefficients as causal effects of that variable on self-citations? If so, would we try to interpret this as total causal effects, or direct causal effects? If they do not represent causal effects, how should they be interpreted then? In particular, how should it "inform author, editors, funding agencies and institutions", as the authors say? What should they be informed about?

    We apologize for our misuse of language. We will be more clear, as in most previous self-citation papers, that our analysis is NOT causal. Causal datasets do have some benefits in citation research, but a limitation is that they may not cover as wide of a range of authors. Furthermore, non-causal correlational studies can still be useful in informing authors, editors, funding agencies, and institutions. Association studies are widely used across various fields to draw non-causal conclusions. We made numerous changes to reduce our causal language.

    Before: “We then developed a probability model of self-citation that controls for numerous covariates, which allowed us to obtain significance estimates for each variable of interest.”

    After (page 3, line 113): “We then developed a probability model of self-citation that includes numerous covariates, which allowed us to obtain significance estimates for each variable of interest.”

    Before: “As such, controlling for various author- and article-level characteristics can improve the interpretability of self-citation rate trends.”

    After (page 11, line 321): “As such, covarying various author- and article-level characteristics can improve the interpretability of self-citation rate trends.”

    Before: “Initially, it appeared that self-citation rates in Neuroscience are lower than Neurology and Psychiatry, but after controlling for various confounds, the self-citation rates are higher in Neuroscience.”

    After (page 15, line 468): “Initially, it appeared that self-citation rates in Neuroscience are lower than Neurology and Psychiatry, but after considering several covariates, the self-citation rates are higher in Neuroscience.”

    We also added the following text to the limitations section (page 16, line 526):

    “Seventh, the analysis presented in this work is not causal. Association studies are advantageous for increasing sample size, but future work could investigate causality in curated datasets.”

    The authors also "encourage authors to explore their trends in self-citation rates". It is laudable to be self-critical and review one's own practices. But how should authors interpret their self-citation rate? How useful is it to know whether it is 5%, 10% or 15%? What would be the "reasonable" self-citation rate? How should we go about constructing such a benchmark rate? Again, this would necessitate some causal answer. Instead of looking at the self-citation rate, it would presumably be much more informative to simply ask authors to check whether references are appropriate and relevant to the topic at hand.

    We believe that our tool is valuable for authors to contextualize their own self-citation rates. For instance, if an author has published hundreds of articles, it is not practical to count the number of self-citations in each. We have added two portions of text to the limitations section:

    (page 16, line 524): “In addition, these models do not account for whether a specific citation is appropriate, though some situations may necessitate higher self-citation rates.”

    (page 16, line 535): “Despite these limitations, we found significant differences in self-citation rates for various groups, and thus we encourage authors to explore their trends in self-citation rates. Self-citation rates that are higher than average are not necessarily wrong, but suggest that authors should further reflect on their current self-citation practices.”
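
    As a purely illustrative sketch of the underlying computation (this is not the released tool; the author ID and reference lists are made up), an author-level self-citation rate can be obtained from the author ID lists of each cited reference:

      # Illustrative sketch: fraction of an author's outgoing references that
      # include their own (hypothetical) author ID.
      author_id <- "12345"

      # Each element holds the author IDs of one cited reference,
      # pooled across the author's papers.
      cited_refs <- list(
        c("12345", "67890"),           # a self-citation: the author is on the cited work
        c("22222", "33333"),
        c("12345"),                    # another self-citation
        c("44444", "55555", "66666")
      )

      is_self <- vapply(cited_refs, function(ids) author_id %in% ids, logical(1))
      self_citation_rate <- mean(is_self)   # 0.5 in this toy example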

    In conclusion, the study shows some interesting and relevant differences in self-citation rates. As such, it is a welcome contribution to ongoing discussions of (self) citations. However, without a clear causal framework, it is challenging to interpret the observed differences.

    We agree that causal studies provide many benefits. Yet, association studies also provide many benefits. For example, an association study allowed us to analyze a wider range of articles than a causal study would have.

    Recommendations for the authors:

    Reviewer #1 (Recommendations For The Authors):

    Statistical suggestions:

    (1) To improve statistical inference, nesting should be accounted for in all of the analyses. For example, the logistic regression model using citing/cited pairs should include random effects for article, author, and perhaps subfield, in order for independence of observations to be plausible. Similarly, bootstrapping and permutation would ideally occur at the author level rather than (or in addition to) the paper level.

    Detailed updates addressing these points are in the public review. In short, we found computational challenges with many levels of the random effects (>100,000) and millions of observations at the citation-pair level. As such, we decided to model citation rates and counts by paper. In this case, we found that results could be unstable when including author-level random effects because in many cases there was only one author per group. Instead, to avoid inappropriately narrow confidence bands, we resampled the dataset such that each author was only represented once. For example, if Author A had five papers in this dataset, then one of their five papers was randomly selected. We repeated the random resampling 100 times (Figure S12). We updated our description of our models in the Methods section (page 21, line 754).

    For permutation tests and bootstrapping, we now define an “exchangeability block” as a co-authorship group of authors. In this dataset, that meant any authors who published together (among the articles in this dataset) as a First Author / Last Author pairing were assigned to the same exchangeability block. It is not realistic to check for overlapping middle authors in all papers because of the collaborative nature of the field. In addition, we believe that self-citations are primarily controlled by first and last authors, so we can assume that middle authors do not control self-citation habits. We then performed bootstrapping and permutation tests within the constraints of the exchangeability blocks.

    (2) In general, I am having trouble understanding the structure of the regression models. My current belief is that rows are composed of individual citations from papers' reference lists, with the outcome representing their status as a self-citation or not, and with various citing article and citing author characteristics as predictors. However, the fact that author type is included in the model as a predictor (rather than having a model for FA self-citations and another for LA self-citations) suggests to me that each citation is entered as two separate rows - once noting whether it was a FA self-citation and once noting whether it was an LA self-citation - and then it is run as a single model.

    (2a) If I am correct, the model is unlikely to be producing valid inference. I would recommend breaking this analysis up into two separate models, and including article-, author-, and subfield-level random effects. You could theoretically include a citation-level random effect and keep it as one model, but each 'group' would only have two observations and the model would be fairly unstable as a result.

    (2b) If I am misunderstanding (and even if not), I would encourage you to provide a more detailed description of the dataset structure and the model - perhaps with a table or diagram.

    We split the data into two models and decided to model on the level of a paper (self-citation rate and self-citation count). In addition, we subsampled the dataset such that each author only appears once to avoid misestimation of confidence intervals (see point (1) above). As described in the public review, we included much more detail in our methods section now to improve the clarity of our models.

    (3) I would suggest removing the inverse hyperbolic sine transform and replacing it with a more flexible approach to estimating the relationships' shape, like generalized additive models or other spline-based methods to ensure that the chosen method is appropriate - or at the very least checking that it is producing a realistic fit that reflects the underlying shape of the relationships.

    More details are available in the public review, but we now use GAMs throughout the manuscript.
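Purely for illustration, a minimal GAM fit might look like the sketch below (simulated data and the pygam package; the manuscript's actual implementation and covariates differ):

```python
import numpy as np
from pygam import LinearGAM, s

rng = np.random.default_rng(0)

# Simulated data: a smooth, nonlinear relationship between academic age
# and self-citation rate, plus noise.
age = rng.uniform(1, 40, size=500)
rate = 0.02 + 0.002 * np.sqrt(age) + rng.normal(0, 0.01, size=500)

# A spline term s(0) lets the data determine the shape of the relationship,
# instead of imposing a fixed transform such as the inverse hyperbolic sine.
gam = LinearGAM(s(0)).fit(age.reshape(-1, 1), rate)
smooth_fit = gam.predict(np.linspace(1, 40, 100).reshape(-1, 1))
```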

(4) For the "highly self-citing" analysis, it is unclear why papers in the 15-25% range were dropped rather than including them as their own category in an ordinal model. I might suggest doing the latter, or explaining the decision more fully.

We previously included this analysis as a separate paper-level model because our main model was at the level of citation pairs. We have now removed this analysis because our main models already operate at the paper level (self-citation rates and counts).

    (5) It would be beneficial for the reader to know what % of the data was dropped for each analysis, and for your team to make sure that there is not differential missing data that could affect the interpretation of the results (e.g., differences in self-citation being due to differences in Scopus ID coverage).

Thank you for this suggestion. We added more detailed missingness data to Section 4.3 (Data exclusions and missingness). We did find differential missingness and added it to the limitations section. However, certain aspects of this cannot be corrected because the data are simply not available (e.g., Scopus coverage issues). Further details are available in the public review.

    Conceptual thoughts:

    (1) I agree with your decision to focus on the second definition of self-citation (self-cites relative to my citations to others' work) rather than the first (self-cites relative to others' citations to my work). But it does seem that the first definition is relevant in the context of gaming citation metrics. For example, someone who writes one paper per year with a reference list of 30% self-citations will have much less of an impact on their H-index than someone who writes 10 papers per year with 10% self-citations. It could be interesting to see how these definitions interact, and whether people who are high on one measure tend to be high on the other.

    We agree this would be interesting to investigate in the future. Unfortunately, our dataset is organized at the level of the paper and thus does not contain information regarding how many times the authors cite a particular work. We hope that we can explore this interaction in the future.

    (2) This is entirely speculative, but I wonder whether the increasing rate of LA self-citation relative to FA self-citation is partly due to PIs over-citing their own lab to build up their trainees' citation records and help them succeed in an increasingly competitive job market. This sounds more innocuous than doing it to benefit their own reputation, but it would provide another mechanism through which students from large and well-funded labs get a leg-up in the job market. Might be interesting to explore, though I'm not exactly sure how :)

    This is a very interesting point. We do not have any means to investigate this with the current dataset, but we added it to the discussion (page 14, line 421):

    “A third, more optimistic explanation is that principal investigators (typically Last Authors) are increasingly self-citing their lab’s papers to build up their trainee’s citation records for an increasingly competitive job market.”

    Reviewer #2 (Recommendations For The Authors):

    (1) In regards to point 1 in the public review: In the spirit of transparency, the authors would benefit from providing a rationale for their choice of top journals, and the methodology used to identify them. It would also be valuable to include the impact factor of each journal in the S1 table alongside their names.

    Given the availability and executability of code, it would be useful to see how and if the self-citation trends vary amongst the "low impact" journals (as measured by the IF). This could go in any of the three directions:

    a. If it is found that self-citations are not as prevalent in low impact journals, this could be a great starting point for a conversation around the evaluation of journals based on impact factor, and the role of self-citations in it.

    b. If it is found that self-citations are as prevalent in low impact journals as high impact journals, that just strengthens your results further.

    c. If it is found that self-citations are more prevalent in low impact journals, this would mean your current statistics are a lower bound to the actual problem. This is also intuitive in the sense that high impact journals get more external citations (and more exposure) than low impact journals, as such authors (and journals) may be less likely to self-cite.

    Expanding the dataset to include many more journals was not feasible. Instead, we included an impact factor term in our models, as detailed in the public review. We found no strong trends in the association between impact factor and self-citation rate/count. Another important note is that these journals were considered “high impact” in 2020, but many had lower impact factors in earlier years. Thus, our modeling allows us to estimate how impact factor is related to self-citations across a wide range of impact factors.

    It is crucial to consider utilizing such a comprehensive database as Scopus, which provides a more thorough list of all journals in Neuroscience, to obtain a more representative sample. Alternatively, other datasets like Microsoft Academic Graph, and OpenAlex offer information on the field of science associated with each paper, enabling a more comprehensive analysis.

We agree that certain datasets may offer a wider view of the entire field. However, we included a large number of papers and journals relative to previous studies. In addition, Scopus provides detailed and valuable author-level information. Because we had to limit our calls to the Scopus API, we restricted journals by 2020 impact factor.

(2) In regards to point 2 in the public review: To enhance the accuracy and specificity of the analysis, it would be beneficial to distinguish neuroscientists among the co-authors. This could be accomplished by examining their publication history leading up to the time of publication of the paper, and identifying each author's level of engagement and specialization within the field of neuroscience.

    Since the field of neuroscience is largely based on collaborations, we find that it might be impossible to determine who is a neuroscientist. For example, a researcher with a publication history in physics may now be focusing on computational neuroscience research. As such, we feel that our current work, which ensures that the papers belong to neuroscience, is representative of what one may expect in terms of neuroscience research and collaboration.

    (3) In regards to point 3 in the public review: I highly recommend plotting self-citation rate as the number of papers in the reference list over the number of total publications to date of paper publication.

    As described in the public review, we have now done this (Figure S3).

    (4) In regards to point 5 in the public review: It would be useful to consider the "quality" of citations to further the discussion on self-citations. For instance, differentiating between self-citations that are perfunctory and superficial from those that are essential for showing developmental work, would be a valuable contribution.

    Other databases may have access to this information, but ours unfortunately does not. We agree that this is an interesting area of work.

    (5) The authors are to be commended for their logistic regression models, as they control for many confounders that were lacking in their earlier descriptive statistics. However, it would be beneficial to rerun the same analysis but on a linear model whereby the outcome variable would be the number of self-citations per author. This would possibly resolve many of the comments mentioned above.

Thank you for your suggestion. As detailed in the public review, we now model the number of self-citations. This is modeled at the paper level, not the author level, because our dataset was downloaded by paper, not by author.

    Minor suggestions:

    (1) Abstract says one of your findings is: "increasing self-citation rates of First Authors relative to Last Authors". Your results actually show the opposite (see Figure 1b).

    Thank you for catching this error. We corrected it to match the results and discussion in the paper:

    “…increasing self-citation rates of Last Authors relative to First Authors.”

    (2) It might be interesting to compute an average academic age for each paper, and look at self-citation vs average academic age plot.

    We agree that this would be an interesting analysis. However, to limit calls to the API, we collected academic age data only on First and Last Authors.

    (3) It may be interesting to look at the distribution of women in different subfields within neuroscience, and the interaction of those in the context of self-citations.

    Thank you for this interesting suggestion. We added the following analysis (page 9, line 305):

    “Furthermore, we explored topic-by-gender interactions (Figure S10). In short, men and women were relatively equally represented as First Authors, but more men were Last Authors across all topics. Self-citation rates were higher for men across all topics.”

    Reviewer #3 (Recommendations For The Authors):

    - In the abstract, "flaws in citation practices" seems worded rather strongly.

    We respectfully disagree, as previous works have shown significant bias in citation practices. For example, Dworkin et al. (Dworkin et al. 2020) found that neuroscience reference lists tended to under-cite women, even after including various covariates.

- Links in the references point to (non-accessible) Paperpile references; you would probably want to update this.

    We apologize for the inconvenience and have now removed these links.

- p 2, l 24: The explanation of ref. (5) seems to be a bit strangely formulated. The point of that article is that works that reinforce a particular belief are more likely to be cited, which *creates* unfounded authority. The unfounded authority itself is hence not part of the citation practices.

    Thank you for catching our misinterpretation. We have now removed this part of the sentence.

    - p 3, l 16: "h indices" or "citations" instead of "h-index".

    We now say “h-indices”.

    - p 5, l 5: how was the manual scoring done?

    We added the following to the caption of Figure S1.

    “Figure S1. Comparison between manual scoring of self-citation rates and self-citation rates estimated from Python scripts in 5 Psychiatry journals: American Journal of Psychiatry, Biological Psychiatry, JAMA Psychiatry, Lancet Psychiatry, and Molecular Psychiatry. 906 articles in total were manually evaluated (10 articles per journal per year from 2000-2020, four articles excluded for very large author list lengths and thus high difficulty of manual scoring). For manual scoring, we downloaded information about all references for a given article and searched for matching author names.”

- p 5, l 23: Why this specific p-value upper bound of 4e-3? From later in the article, I understand that this stems from the 10,000 bootstrap samples combined with a Bonferroni correction? Perhaps it would be good to clarify this briefly somewhere.

    Thank you for this suggestion. We now perform Benjamini/Hochberg false discovery rate (FDR) correction, but we added a description of the minimum P value from permutations (page 21, line 748):

“All P values described in the main text were corrected with the Benjamini/Hochberg (16) false discovery rate (FDR) correction. With 10,000 permutations, the lowest P value after applying FDR correction is P=2.9e-4, which indicates that the true point would be the most extreme in the simulated null distribution.”
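For illustration, permutation P values with this floor and the FDR correction can be computed roughly as follows (toy null distributions; not our analysis code):

```python
import numpy as np
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(0)

def permutation_p(observed: float, null_draws: np.ndarray) -> float:
    """Two-sided permutation P value with the usual +1 correction, so the smallest
    attainable raw value with 10,000 permutations is 1 / 10,001 (about 1e-4)."""
    more_extreme = np.sum(np.abs(null_draws) >= abs(observed))
    return (more_extreme + 1) / (len(null_draws) + 1)

# Toy example: four tests, each compared against its own simulated null distribution.
raw_p = [permutation_p(obs, rng.normal(size=10_000)) for obs in (4.5, 0.3, 2.1, 5.0)]

# Benjamini/Hochberg false discovery rate correction across all tests.
reject, p_fdr, _, _ = multipletests(raw_p, method="fdr_bh")
```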

    - Fig. 1, caption: The (a) and (b) labelling here is a bit confusing, because the first sentence suggests both figures portray the same, but do so for different time periods. Perhaps rewrite, so that (a) and (b) are both described in a single sentence, instead of having two different references to (a) and (b).

    Thank you for pointing this out. We fixed the labeling of this caption:

    “Figure 1. Visualizing recent self-citation rates and temporal trends. a) Kernel density estimate of the distribution of First Author, Last Author, and Any Author self-citation rates in the last five years. b) Average self-citation rates over every year since 2000, with 95% confidence intervals calculated by bootstrap resampling.”

- p7, l 9: Regarding "academic age", note that there might be a difference between "age" effects and "cohort" effects. That is, there might be a difference between people with a certain career age who started in 1990 and people with the same career age but who started in 2000, which would be a "cohort" effect.

    We agree that this is a possible effect and have added it to the limitations (page 16, line 532):

    “Tenth, while we considered academic age, we did not consider cohort effects. Cohort effects would depend on the year in which the individual started their career.”

    - p 7, l 15: "jumps" suggests some sort of sudden or discontinuous transition, I would just say "increases".

    We now say “increases.”

    - Fig. 2: Perhaps it should be made more explicit that this includes only academics with at least 50 papers. Could the authors please clarify whether the same limitation of at least 50 papers also features in other parts of the analysis where academic age is used? This selection could affect the outcomes of the analysis, so its consequences should be carefully considered. One possibility for instance is that it selects people with a short career length who have been exceptionally productive, namely those that have had 50 papers, but only started publishing in 2015 or so. Such exceptionally productive people will feature more highly in the early career part, because they need to be so productive in order to make the cut. For people with a longer career, the 50 papers would be less of a hurdle, and so would select more and less productive people more equally.

We apologize for the lack of clarity. We did not apply this requirement in the analyses that used academic age. We mainly applied it when aggregating by country, as we did not want to calculate a country's self-citation rate based on only a few papers. We have clarified the various data exclusions in our new Section 4.3 (Data exclusions and missingness).

    - p 8, l 11: The affiliated institution of an author is not static, but rather changes throughout time. Did the authors consider this? If not, please clarify that this refers to only the most recent affiliation (presumably). Authors also often have multiple affiliations. How did the authors deal with this?

    The institution information is at the time of publication for each paper. We added more detail to our description of this on page 19, line 656:

    “For both First and Last Authors, we found the country of their institutional affiliation listed on the publication. In the case of multiple affiliations, the first one listed in Scopus was used.”

    - p 10, l 6: How were these self-citation rates calculated? This is averaged per author (i.e. only considering papers assigned to a particular topic) and then averaged across authors? (Note that in this way, the average of an author with many papers will weigh equally with the average of an author with few papers, which might skew some of the results).

We calculate it across the entire topic (i.e., we do not first calculate rates by author). We updated the description as follows (page 7, line 211):

    “We then computed self-citation rates for each of these topics (Figure 4) as the total number of self-citations in each topic divided by the total number of references in each topic…”
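In pseudocode-like form, this pooled computation is simply (hypothetical column names; toy counts):

```python
import pandas as pd

# Toy paper-level counts (hypothetical columns).
papers = pd.DataFrame({
    "topic":          ["fMRI", "fMRI", "dementia", "dementia"],
    "self_citations": [3, 1, 5, 2],
    "references":     [40, 35, 60, 50],
})

# Topic-level rate = total self-citations in the topic / total references in the
# topic, i.e., pooled over papers rather than averaged per author or per paper.
topic_rates = (
    papers.groupby("topic")[["self_citations", "references"]].sum()
          .assign(self_citation_rate=lambda d: d["self_citations"] / d["references"])
)
```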

    - p 13, l 18: Is the academic age analysis here again limited to authors having at least 50 papers?

This analysis is not limited to authors with at least 50 papers. To clarify, the previous analysis was also not limited to authors with 50 papers; it was instead limited to academic ages in our dataset that had at least 50 data points. For example, if an academic age of 70 had only 20 data points in our dataset, it would have been excluded.

- Fig. 5: Here, comparing Fig. 5(d) and 5(f) suggests that the self-citation rate differences between men and women might partly be the result of differences in the number of papers. That is, the somewhat higher self-citation rate at a given academic age might be the result of the higher number of papers at that academic age. It seems that this is not directly described in this part of the analysis (although this seems to be the case from the later regression analysis).

    We agree with this idea and have added a new section as follows (page 13, line 384):

    “2.10 Reconciling differences between raw data and models

The raw and GAM-derived data exhibited some conflicting results, such as for gender and field of research. To further study covariates associated with this discrepancy, we modeled the publication history of each author (at the time of publication) in our dataset (Table 2). The model terms included academic age, article year, journal impact factor, field, LMIC status, gender, and document type. Notably, Neuroscience was associated with the fewest papers per author. This explains how authors in Neuroscience could have the lowest raw self-citation rates but the highest self-citation rates after including covariates in a model. In addition, being a man was associated with about 0.25 more papers. Thus, gender differences in self-citation likely emerged from differences in the number of papers rather than from differences in self-citation practices.”

    - Section 2.10. Perhaps the authors could clarify that this analysis takes individual articles as the unit of analysis, not citations.

    We updated all our models to take individual articles and have clarified this with more detailed tables.

- p 18, l 10: "Articles with between 15-25% self-citation rates were discarded" Why?

We agree that these should not be discarded. However, we previously included this analysis as a separate paper-level model because our main model was at the level of citation pairs. We have now removed this analysis because our main models already operate at the paper level (self-citation rates and counts).

- p 20, l 5: "Thus, early-career researchers may be less incentivized to self-promote (e.g., self-cite) for academic gains compared to 20 years ago." How about the possibility that there was less collaboration, so that first authors would be more likely to cite their own paper, whereas with more collaboration, they will more often not feature as first author?

    This is an interesting point. We feel that more collaboration would generally lead to even more self-citations, if anything. If an author collaborates more, they are more likely to be on some of the references as a middle author (which by our definition counts toward self-citation rates).

    - p 20, l 15: Here the authors call authors to avoid excessive self-citations. Of course, there's nothing wrong with calling for that, but earlier the authors were more careful to not label something directly as excessive self-citations. Here, by stating it like this, the authors suggest that they have looked at excessive self-citations.

    We rephrased this as follows:

    Before: “For example, an author with 30 years of experience cites themselves approximately twice as much as one with 10 years of experience on average. Both authors have plenty of works that they can cite, and likely only a few are necessary. As such, we encourage authors to be cognizant of their citations and to avoid excessive self-citations.”

    After: “For example, an author with 30 years of experience cites themselves approximately twice as much as one with 10 years of experience on average. Both authors have plenty of works that they can cite, and likely only a few are necessary. As such, we encourage authors to be cognizant of their citations and to avoid unnecessary self-citations.”

    - p 22, l 11: Here again, the same critique as p 20, l15 applies.

    We switched “excessively” to “unnecessarily.”

- p 23, l 12: The authors here critique ref. (21) for ascertainment bias, namely that they are "including only highly-achieving researchers in the life sciences". But do the authors not do exactly the same thing? That is, they also only focus on the top high-impact journals.

    We included 63 high-impact journals with tens of thousands of authors. In addition, some of these journals were not high-impact at the time of publication. For example, Acta Neuropathologica had an impact factor of 17.09 in 2020 but 2.45 in 2000. This still is a limitation of our work, but we do cover a much broader range of works than the listed reference (though their analysis also has many benefits since it included more detailed information).

    - p 26, l 22-26: It seems that the matching is done quite broadly (matching last names + initials at worst) for self-citations, while later (in section 4.9, p 31, l 9), the authors switch to only matching exact Scopus Author IDs. Why not use the same approach throughout? Or compare the two definitions (narrow / broad).

    Thank you for catching this mistake. We now use the approach of matching Scopus Author IDs throughout.
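As a hedged illustration of this narrower matching rule, a reference is flagged as a self-citation when the citing and cited papers share at least one Scopus Author ID (the ID values below are made up; First and Last Author self-citations compare only those specific IDs):

```python
def is_self_citation(citing_author_ids: set, cited_author_ids: set) -> bool:
    """Any Author self-citation: at least one Scopus Author ID appears on both papers."""
    return bool(citing_author_ids & cited_author_ids)

print(is_self_citation({"7004212771", "57190001234"}, {"7004212771"}))  # True
print(is_self_citation({"7004212771"}, {"36079005678"}))                # False
```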

    - S8: it might be nice to explore open alternatives, such as OpenAlex or OpenAIRE, instead of the closed Scopus database, which requires paid access (which not all institutions have, perhaps that could also be corrected in the description in GitHub).

    Thank you for this suggestion. Unfortunately, switching databases would require starting our analysis from the beginning. On our GitHub page, we state: “Please email matthew.rosenblatt@yale.edu if you have trouble running this or do not have institutional access. We can help you run the code and/or run it for you and share your self-citation trends.” We feel that this will allow us to help researchers who may not have institutional access. In addition, we released our aggregated, de-identified (title and paper information removed) data on GitHub for other researchers to use.

  6. eLife assessment

    This study examines how self-citations in the neuroscience literature differ according to geography, gender, seniority, and subfield within neuroscience. The evidence supporting the claims is mostly solid, but aspects of the analysis - notably concerning estimates of uncertainty, and the exact interpretation of the results - would benefit from further work. Overall, the article is a valuable addition to the literature on self-citations

  7. Reviewer #1 (Public Review):

    In this manuscript, the authors use a large dataset of neuroscience publications to elucidate the nature of self-citation within the neuroscience literature. The authors initially present descriptive measures of self-citation across time and author characteristics; they then produce an inclusive model to tease apart the potential role of various article and author features in shaping self-citation behavior. This is a valuable area of study, and the authors approach it with an appropriate and well-structured dataset.

    The study's descriptive analyses and figures are useful and will be of interest to the neuroscience community. However, with regard to the statistical comparisons and regression models, I believe that there are methodological flaws that may limit the validity of the presented results. These issues primarily affect the uncertainty of estimates and the statistical inference made on comparisons and model estimates - the fundamental direction and magnitude of the results are unlikely to change in most cases. I have included detailed statistical comments below for reference.

    Conceptually, I think this study will be very effective at providing context and empirical evidence for a broader conversation around self-citation. And while I believe that there is room for a deeper quantitative dive into some finer-grained questions, this paper will be a valuable catalyst for new areas of inquiry around citation behavior - e.g., do authors change self-citation behavior when they move to more or less prestigious institutions? do self-citations in neuroscience benefit downstream citation accumulation? do journals' reference list policies increase or decrease self-citation? - that I hope that the authors (or others) consider exploring in future work.

    Statistical comments:

    (1) Throughout the paper, the nested nature of the data does not seem to be appropriately handled in the bootstrapping, permutation inference, and regression models. This is likely to lead to inappropriately narrow confidence bands and overly generous statistical inference.

    (2) The discussion of the data structure used in the regression models is somewhat opaque, both in the main text and the supplement. From what I gather, these models likely have each citation included in the model at least once (perhaps twice, once for first-author status and one for last-author status), with citations nested within citing papers, cited papers, and authors. Without inclusion of random effects, the interpretation and inference of the estimates may be misleading.

    (3) I am concerned that the use of the inverse hyperbolic sine transform is a bit too prescriptive, and may be producing poor fits to the true predictor-outcome relationships. For example, in a figure like Fig S8, it is hard to know to what extent the sharp drop and sign reversal are true reflections of the data, and to what extent they are artifacts of the transformed fit.

    (4) It seems there are several points in the analysis where papers may have been dropped for missing data (e.g., missing author IDs and/or initials, missing affiliations, low-confidence gender assessment). It would be beneficial for the reader to know what % of the data was dropped for each analysis, and for comparisons across countries it would be important for the authors to make sure that there is not differential missing data that could affect the interpretation of the results (e.g., differences in self-citation being due to differences in Scopus ID coverage).

  8. Reviewer #2 (Public Review):

    The authors provide a comprehensive investigation of self-citation rates in the field of Neuroscience, filling a significant gap in existing research. They analyze a large dataset of over 150,000 articles and eight million citations from 63 journals published between 2000 and 2020. The study reveals several findings. First, they state that there is an increasing trend of self-citation rates among first authors compared to last authors, indicating potential strategic manipulation of citation metrics. Second, they find that the Americas show higher odds of self-citation rates compared to other continents, suggesting regional variations in citation practices. Third, they show that there are gender differences in early-career self-citation rates, with men exhibiting higher rates than women. Lastly, they find that self-citation rates vary across different subfields of Neuroscience, highlighting the influence of research specialization. They believe that these findings have implications for the perception of author influence, research focus, and career trajectories in Neuroscience.

Overall, this paper is well written, and the breadth of analysis conducted by authors, with various interactions between variables (e.g., gender vs. seniority), shows that the authors have spent a lot of time thinking about different angles. The discussion section is also quite thorough. The authors should also be commended for their efforts in the provision of code for the public to evaluate their own self-citations. That said, here are some concerns and comments that, if addressed, could potentially enhance the paper:

1. There are concerns regarding the data used in this study, specifically its bias towards top journals in Neuroscience, which limits the generalizability of the findings to the broader field. More specifically, the top 63 journals in neuroscience are based on impact factor (IF), which raises a potential issue of selection bias. While the paper acknowledges this as a limitation, it lacks a clear justification for why the authors made this choice. It is also unclear how the "top" journals were identified: was it based on the top 5% in terms of impact factor? Or 10%? Or some other metric? The authors also do not provide the (computed) impact factors of the journals in the supplementary material.

    By exclusively focusing on high impact journals, your analysis may not be representative of the broader landscape of self-citation patterns across the neuroscience literature, which is what the title of the article claims to do.

    2. One other concern pertains to the possibility that a significant number of authors involved in the paper may not be neuroscientists. It is plausible that the paper is a product of interdisciplinary collaboration involving scientists from diverse disciplines. Neuroscientists amongst the authors should be identified.

    3. When calculating self-citation rate, it is important to consider the number of papers the authors have published to date. One plausible explanation for the lower self-citation rates among first authors could be attributed to their relatively junior status and short publication record. As such, it would also be beneficial to assess self-citation rate as a percentage relative to the author's publication history. This number would be more accurate if we look at it as a percentage of their publication history. My suspicion is that first authors (who are more junior) might be more likely to self-cite than their senior counterparts. My suspicion was further raised by looking at Figures 2a and 3. Considering the nature of the self-citation metric employed in the study, it is expected that authors with a higher level of seniority would have a greater number of publications. Consequently, these senior authors' papers are more likely to be included in the pool of references cited within the paper, hence the higher rate.

    While the authors acknowledge the importance of the number of past publications in their gender analysis, it is just as important to include the interplay of seniority in (1) their first and last author self-citation rates and (2) their geographic analysis.

    4. Because your analysis is limited to high impact journals, it would be beneficial to see the distribution of the impact factors across the different countries. Otherwise, your analysis on geographic differences in self-citation rates is hard to interpret. Are these differences really differences in self-citation rates, or differences in journal impact factor? It would be useful to look at the representation of authors from different countries for different impact factors.

    5. The presence of self-citations is not inherently problematic, and I appreciate the fact that authors omit any explicit judgment on this matter. That said, without appropriate context, self-citations are also not the best scholarly practice. In the analysis on gender differences in self-citations, it appears that authors imply an expectation of women's self-citation rates to align with those of men. While this is not explicitly stated, use of the word "disparity", and also presentation of self-citation as an example of self-promotion in discussion suggest such a perspective. Without knowing the context in which the self-citation was made, it is hard to ascertain whether women are less inclined to self-promote or that men are more inclined to engage in strategic self-citation practices.

  9. Reviewer #3 (Public Review):

    This paper analyses self-citation rates in the field of Neuroscience, comprising in this case, Neurology, Neuroscience and Psychiatry. Based on data from Scopus, the authors identify self-citations, that is, whether references from a paper by some authors cite work that is written by one of the same authors. They separately analyse this in terms of first-author self-citations and last-author self-citations. The analysis is well-executed and the analysis and results are written down clearly. There are some minor methodological clarifications needed, but more importantly, the interpretation of some of the results might prove more challenging. That is, it is not always clear what is being estimated, and more importantly, the extent to which self-citations are "problematic" remains unclear.

When are self-citations problematic? As the authors themselves also clarify, "self-citations may often be appropriate". Researchers cite their own previous work for perfectly good reasons, similar to the reasons why they would cite work by others. The "problem", in a sense, is that researchers cite their own work just to increase the citation count, or to promote their own work and make it more visible. This self-promotional behaviour might be incentivised by certain research evaluation procedures (e.g. hiring, promoting) that overly emphasise citation performance. However, the true problem then might not be (self-)citation practices, but instead the flawed research evaluation procedures that emphasise citation performance too much. So instead of problematising self-citation behaviour, and trying to address it, we might do better to address flawed research evaluation procedures. Of course, we should expect references to be relevant, and we should avoid self-promotional references, but addressing self-citations may just have minimal effects, and would not solve the more fundamental issue.

    Some other challenges arise when taking a statistical perspective. For any given paper, we could browse through the references, and determine whether a particular reference would be warranted or not. For instance, we could note that there might be a reference included that is not at all relevant to the paper. Taking a broader perspective, the irrelevant reference might point to work by others, included just for reasons of prestige, so-called perfunctory citations. But it could of course also include self-citations. When we simply start counting all self-citations, we do not see what fraction of those self-citations would be warranted as references. The question then emerges, what level of self-citations should be counted as "high"? How should we determine that? If we observe differences in self-citation rates, what does it tell us?

For example, the authors find that the (any author) self-citation rate in Neuroscience is 10.7% versus 15.9% in Psychiatry. What does this difference mean? Are psychiatrists citing themselves more often than neuroscientists? First author men showed a self-citation rate of 5.12% versus a self-citation rate of 3.34% for women first authors. Do men engage in more problematic citation behaviour? Junior researchers (10-year career) show a self-citation rate of about 5% compared to a self-citation rate of about 10% for senior researchers (30-year career). Are senior researchers therefore engaging in more problematic citation behaviour? The answer is (most likely) "no", because senior authors have simply published more, and will therefore have more opportunities to refer to their own work. To be clear: the authors are aware of this, and also take this into account. In fact, these various "raw" self-citation rates may, as the authors themselves say, "give the illusion" of self-citation rates, but these are somehow "hidden" by, for instance, career seniority.

Again, the authors do consider this, and "control" for career length and number of publications, et cetera, in their regression model. Some of the previous observations then change in the regression model. Neuroscience doesn't seem to be self-citing more; there just seem to be more junior researchers in that field compared to Psychiatry. Similarly, men and women don't seem to show an overall different self-citation behaviour (although the authors find an early-career difference); the men included in the study simply have longer careers and more publications.

    But here's the key issue: what does it then mean to "control" for some variables? This doesn't make any sense, except in the light of causality. That is, we should control for some variable, such as seniority, because we are interested in some causal effect. The field may not "cause" the observed differences in self-citation behaviour, this is mediated by seniority. Or is it confounded by seniority? Are the overall gender differences also mediated by seniority? How would the selection of high-impact journals "bias" estimates of causal effects on self-citation? Can we interpret the coefficients as causal effects of that variable on self-citations? If so, would we try to interpret this as total causal effects, or direct causal effects? If they do not represent causal effects, how should they be interpreted then? In particular, how should it "inform author, editors, funding agencies and institutions", as the authors say? What should they be informed about?

The authors also "encourage authors to explore their trends in self-citation rates". It is laudable to be self-critical and review one's own practices. But how should authors interpret their self-citation rate? How useful is it to know whether it is 5%, 10% or 15%? What would be the "reasonable" self-citation rate? How should we go about constructing such a benchmark rate? Again, this would necessitate some causal answer. Instead of looking at the self-citation rate, it would presumably be much more informative to simply ask authors to check whether references are appropriate and relevant to the topic at hand.

    In conclusion, the study shows some interesting and relevant differences in self-citation rates. As such, it is a welcome contribution to ongoing discussions of (self) citations. However, without a clear causal framework, it is challenging to interpret the observed differences.