Comparing the outputs of intramural and extramural grants funded by National Institutes of Health
Curation statements for this article:
Curated by eLife
eLife Assessment
This important study used five metrics to compare the cost-effectiveness of intramural and extramural research funded by the National Institutes of Health in the United States between 2009 and 2019. They found that each type of research had its own set of strengths: extramural research was more cost-effective in terms of publications, whereas intramural research was more cost-effective in terms of influencing clinical work. The evidence supporting these findings is mostly solid, but there are a number of questions about the methods and data - notably about indirect cost recovery and other non-NIH sources of funding - that need to be answered.
This article has been reviewed by the following groups
Listed in
- Evaluated articles (PREreview)
- Evaluated articles (eLife)
Abstract
Funding agencies use a variety of mechanisms to fund research. The National Institutes of Health in the United States, for example, employs scientists to perform research at its own laboratories (intramural research), and it also awards grants to pay for research at external institutions such as universities (extramural research). Here, using data from 1594 intramural grants and 97054 extramural grants funded between 2009 and 2019, we compare the scholarly outputs from these two funding mechanisms in terms of number of publications, relative citation ratio and clinical metrics. We find that extramural awards more cost-effectively fund outputs commonly used for academic review such as number of publications and citations per dollar, while intramural awards are more cost-effective at generating work that influences future clinical work, more closely in line with agency health goals. These findings provide evidence that institutional incentives associated with different funding mechanisms drive their comparative strengths.
Article activity feed
-
This Zenodo record is a permanently preserved version of a PREreview. You can view the complete PREreview at https://prereview.org/reviews/17353888.
This study fills a significant gap in the literature to date by conducting a comprehensive, side-by-side comparison of products arising from intramurally and extramurally funded NIH research projects. The authors constructed a vast dataset consisting of 98,648 projects (97,054 extramural and 1,594 intramural) from 2009 through 2019, correlating with more than 621,000 publications. Through the application of modern bibliometric metrics (i.e., publication number, Relative Citation Ratio (RCR), and clinical translation indices such as APT and clinical citations), the study sought to determine which funding modality has a greater influence and cost-effectiveness in pursuing the NIH mission. One notable finding was that while extramural research showed greater cost-effectiveness in producing academic publications and citations, intramural research showed greater efficiency in producing clinically relevant outputs. This conclusion is significant given that intramural projects were not overwhelmingly focused on human research; they still showed a stronger association with clinical outcomes. One of the main virtues of the study is the enormous scale of the dataset (98,648 projects linked to over 621,000 publications), which makes the findings robust and influential. The use of updated bibliometric measures also adds depth and credibility to the analysis. The study's implementation of robust, contemporary metrics presents a valuable framework with potential applications for future research impact evaluations.
However, the 10-year window (2009-2019), limited by intramural data availability, might not be sufficient for truly measuring long-term impact. The reliance on a simplistic classification scheme based only on activity codes may introduce errors, while excluding jointly funded publications could underestimate the impact of collaborative work. Additionally, the wide confidence intervals in the data leave room for uncertainty. Nevertheless, by showing how different types of NIH funding shape not just scientific output but clinical relevance, the study adds valuable nuance to a debate that has surprisingly received little attention in recent decades.
Major concerns and feedback:
-
The description of the propensity score matching approach does not include enough detail to aid reproducibility. Crucial information is missing, such as the actual matching algorithm used (e.g., nearest neighbor or optimal matching), whether matching was performed with or without replacement, the caliper width applied, and the plan for handling multiple-match or sparse-match cases.
-
The authors are advised to name the matching algorithm used, to define the caliper width (e.g., 0.2 of the standard deviation of the logit of the propensity score), and to state the matching ratio (e.g., 1:1 or 1:k). They should also describe how ties were handled and report whether any projects were dropped because of an insufficiency of appropriate matches.
-
The handling of propensity score matching is insufficiently described. For example, the authors have not clarified how multiple matches were dealt with, how data were managed when matches were unavailable, or how criteria were prioritized (tier 1 mandatory vs. tier 2 flexible). Similarly, the statistical power calculation for a 10-year dataset should be better justified. Regression methods, while mentioned, remain unclear and require further explanation.
-
It is advisable to include the matching ratio used (e.g., 1:1 or 1:k) and explain whether matching was done with or without replacement, as this detail is critical for reproducibility.
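For concreteness, the kind of procedural detail being requested can be sketched in a few lines. The routine below is a hypothetical illustration of greedy 1:1 nearest-neighbor matching without replacement and with a fixed caliper; it is not the authors' actual procedure, and the scores and caliper shown are invented for the example.

```python
def match_nearest_neighbor(treated_scores, control_scores, caliper):
    """Greedy 1:1 nearest-neighbor propensity-score matching without replacement.

    Returns (treated_index, control_index) pairs whose absolute score
    distance falls within the caliper; treated units with no control
    inside the caliper are dropped (and should be reported as such).
    """
    available = list(range(len(control_scores)))
    pairs = []
    for ti, ts in enumerate(treated_scores):
        if not available:
            break
        # nearest remaining control by absolute score distance
        ci = min(available, key=lambda c: abs(control_scores[c] - ts))
        if abs(control_scores[ci] - ts) <= caliper:
            pairs.append((ti, ci))
            available.remove(ci)  # matching without replacement
    return pairs

# Hypothetical propensity scores (e.g., on the logit scale)
treated = [0.20, 0.50, 0.90]
controls = [0.15, 0.48, 0.95, 2.00]
print(match_nearest_neighbor(treated, controls, caliper=0.10))
# [(0, 0), (1, 1), (2, 2)] -- the distant control (2.00) is never matched
```

Reporting each of these choices (algorithm, replacement, caliper, ratio, handling of unmatched units) is what would make the matching step reproducible.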
-
The choice of a 10-year window (2009-2019) is presented as a data availability constraint rather than a methodological choice. This short timeframe is a significant limitation for assessing "long-term impact," particularly for clinical translation, which can take decades. The conclusions about impact and cost-effectiveness may be premature.
-
The authors are advised to more strongly frame this as a key limitation in the discussion. They could also conduct a sensitivity analysis on a subset of projects from the earliest years (e.g., 2009-2011) to see if impact metrics differ for older projects, providing some insight into time-based trends.
-
It is advisable to emphasize in the Discussion section that the restricted 10-year window represents a major limitation for assessing long-term impact, and that this constraint may influence interpretation of the findings.
-
The analysis spans only a 10-year window (2009–2019), primarily because intramural data was only available from 2008. This may be too limited to truly capture long-term impact, especially given the long gestation period of clinical research outcomes. The authors should justify why this timeframe is sufficient or discuss how it constrains conclusions.
-
The claim that "intramural work is more aligned with NIH's mission" is an interpretive leap not fully supported by the data presented. The study shows intramural research is more clinically oriented, but the NIH mission is broad, encompassing fundamental discovery, training, and public health. This overstates the findings.
-
The suggestion is that the language should be moderated to precisely reflect the results. For example: "Our findings suggest that the intramural program demonstrates a particular strength in producing clinically relevant outputs, which is one critical component of the NIH's broader mission."
-
It is advisable to include the moderated rephrasing example here directly, e.g., "Our findings suggest that the intramural program demonstrates a particular strength in producing clinically relevant outputs, which is one critical component of the NIH's broader mission." This helps make the feedback more actionable.
-
The claim that intramural research is more aligned with NIH's mission feels overstated. Given the smaller number of intramural projects and their wide confidence intervals, the evidence does not convincingly support such a broad assertion. A more cautious interpretation would strengthen the paper's credibility.
-
The exclusion of collaboratively funded publications is a major methodological decision that may systematically bias the results. It could disproportionately undervalue the contribution of one funding mechanism (likely intramural, which may rely more on collaborations) and misrepresent the collaborative nature of modern science.
-
The suggestion is that while re-inclusion may be complex, the authors must discuss this limitation more thoroughly, explicitly stating how this exclusion might have skewed the comparisons of output and impact between the two funding types.
-
Excluding jointly funded outputs might disproportionately affect intramural projects and should be discussed as a potential source of bias in the limitations section. Consider adding a brief sensitivity analysis, including these publications if data permit.
Minor concerns and feedback:
-
The research's specific aims are poorly phrased in the last section of the introduction. We suggest the authors provide a sentence or short paragraph systematically enumerating the main and secondary aims of the research (e.g., "The aims of the study are: 1) to compare bibliometric output; 2) to assess cost-effectiveness.").
-
The regression approaches, as stated, are not discussed with enough clarity for a general audience. The authors should include in the methods section a brief, accessible description of the objective of each regression model applied, alongside the corresponding technical statistical definitions.
-
There are certain inaccuracies and room for greater clarity, such as a mismatch in color representation in a figure legend and unclear axis labels. The authors should: 1) Modify the Figure 2 legend so that it exactly matches the colors used; the caption incorrectly refers to green and red when the figure uses green and blue. 2) Revise the Y-axis title of Figure 1 to something more descriptive, such as "Intramural:Extramural Proportion Ratio," with an explanation that a value >1 indicates an intramural focus and <1 an extramural focus.
-
The manuscript is brief and could be enhanced through greater clarification of the summary of results, interpretation in the framework of what has been written before, and limitations and strengths. The authors should redesign the text with clear subheadings (e.g., "Main Results," "Comparison with Previous Studies," "Advantages and Disadvantages," "Implications") to increase readability and ensure each section attains adequate prominence.
-
While ethically low-risk, the manuscript lacks an explicit statement declaring that ethical approval was not required. Although the study uses only retrospective, publicly available data, the authors should include a short sentence in the methods section, such as: "This study drew only on publicly available, aggregated data concerning research grants and publications, and conducted no research involving human subjects; accordingly, ethical approval was not necessary."
-
The much smaller number of intramural projects means that their results are inherently less precise. This limitation should be emphasized more directly within the Discussion, along with the risk of overinterpreting the relative performance of intramural projects.
-
The exclusion of jointly funded publications could underestimate collaboration and impact, particularly for intramural research. This limitation is acknowledged but deserves more explicit discussion in terms of how it biases the results.
-
The limitations are listed but not fully unpacked. The authors could strengthen this section by explicitly addressing how each limitation (e.g., exclusion of collaborations, smaller intramural sample) impacts robustness and by clarifying what steps were taken to reduce potential bias.
-
The Discussion is concise but too compressed. Clearly separating the strengths from the limitations, adding short summaries after each results section, and slightly expanding the implications would make the manuscript easier to read and interpret.
Competing interests
Teena Bajaj and Rosario Rogel-Salazar were facilitators and organizers of this call.
Use of Artificial Intelligence (AI)
The authors declare that they did not use generative AI to come up with new ideas for their review.
-
Reviewer #1 (Public review):
Summary:
This article carefully compares intramural vs. extramural National Institutes of Health funded research during 2009-2019, according to a variety of bibliometric indices. They find that extramural awards more cost-effectively fund outputs commonly used for academic review, such as number of publications and citations per dollar, while intramural awards are more cost-effective at generating work that influences future clinical work, more closely in line with agency health goals.
Strengths:
Great care was taken in selecting and cleaning the data, and in making sure that intramural vs. extramural projects were compared appropriately. The data has statistical validation. The trends are clear and convincing.
Weaknesses:
The Discussion is too short and descriptive, and needs more perspective: why are the findings important and what do they mean? Without recommending policy, it should at least discuss possible implications for policy. The biggest problem I have with this submission is Figure 3, which shows a big decrease in clinical-related parameters between 2014 and 2019 in both intramural and extramural research (panels C, D and E). There is no obvious explanation for this and I did not see any discussion of this trend, but it cries out for investigation. This might, for example, reflect global changes in funding policies, which might also influence the observed closing gaps between intramural and extramural research.
-
Reviewer #2 (Public review):
Summary:
This article reports a cost-effectiveness comparison of intramural and extramural research that NIH funded between 2009 and 2019. Using data obtained from NIH RePORTER, they linked total project costs to publication output, using robust validated metrics including Relative Citation Ratio (RCR), Approximate Potential to Translate (APT), and clinical citations. They find that after adjusting for confounders in regression and propensity-score analyses, extramural projects were generally more cost-effective, though intramural projects were more cost-effective for generating clinical citations. They also describe differences in the topics of intramural- and extramural-funded publications, with intramural projects more likely to generate papers on viral infections and immunity or cancer metastases and survival, but less likely to generate papers on pregnancy and maternal health, brain connectivity and tasks, and adolescent experiences and depression. The authors aptly describe the different natures of the intramural and extramural funding models, including that extramural researchers spend much time writing grant applications and that the work described in extramural publications often receives funding from sources other than NIH grants.
Strengths:
The authors leveraged publicly available data (including RePORTER and the iCite repository) and used robust validated metrics (RCR, APT, clinical citations). They carefully considered a large number of confounders, including those related to the PI, and performed several well-described regression analyses.
Weaknesses:
Figure 3A shows intramural projects producing about 2.75 papers per year in 2009, whereas extramural projects are producing just over 1 paper per year. Extramural projects appear to catch up over the next five years. While the authors attempt to explain the difference in their figure legend, another explanation is that the intramural projects started well before 2009 but, as the authors state, intramural data only became available in 2009. As the authors note, funding information is often complex and difficult to characterize for an analysis like this. How did the authors handle: i) publications linked to multiple extramural grants; ii) publications linked to intramural and extramural grants; iii) publications linked to NIH grants and non-NIH grants? I would think it necessary to somehow apportion credit, as otherwise it would appear that extramural projects are more productive than they truly are. Also, it is not clear if the authors took account of the indirect costs paid by the NIH to universities that have received extramural grants.
-
Reviewer #3 (Public review):
Summary:
The manuscript "Comparing the outputs of intramural and extramural grants funded by National Institutes of Health" presents a comparative study of two funding mechanisms adopted by the National Institutes of Health (NIH). The authors adopted a quantitative approach and introduced five metrics to compare the output of intramural and extramural grants. These findings reveal the impacts of intramural and extramural grants on the scientific community, providing funders with insights to inform future decisions about funding mechanisms.
Strengths:
The authors clearly presented their methods for processing the NIH project data and classifying projects into either intramural or extramural categories. The limitations of the study are also well-addressed.
Weaknesses:
The article would benefit from a more thorough discussion of the literature, a clearer presentation of the results (especially in the figure captions), and the inclusion of evidence to support some of the claims.