The Accessibility of Published Research in AERA Journals, 2010-2022
Abstract

Using a novel public Internet data mining approach, we found that 65.0% of all articles published between 2010 and 2022 in AERA journals are publicly accessible. Analyses of the versions of articles that were accessible revealed that most of the accessible files were the published version (estimated 51.9% - 64.0%); many were pre-prints (16.1% - 26.1%); and few had an open access license (3.5% - 9.4%). We discuss the tensions surrounding these findings and the present publishing system, and join calls to further expand open access.

While many educational researchers at academic institutions have access to American Educational Research Association (AERA) journal articles, others (such as teachers and school board members) typically do not. However, many articles are accessible through search engines such as Google Scholar, regardless of institutional access. How and why do individuals without institutional subscriptions access research articles?

Sharing educational research in open, public, and accessible forms is widely considered to be of professional value. Several parts of the AERA Policy & Advocacy resources are dedicated to open access, and the AERA Open Access Portal states: “AERA is committed to expanding the open dissemination of scholarly work.” Open access matters not only for increasing the visibility of scholarship but also for (1) advancing societal use of research, (2) broadening impact, and (3) honoring public investment in research. Many educators, practitioners, and policymakers, particularly those without institutional affiliations or in under-resourced institutions and organizations, lack access to paywalled journals, which limits their ability to engage with and apply scholarly findings. These structural and economic barriers to access limit the potential impact of educational research.
Furthermore, much educational research is publicly funded. Restricting access to those who can pay undermines its potential to serve the public good. In this context, open access is not just a publishing mode: it is a reflection of the field’s commitment to inclusivity, utility, and social responsibility.

Though sharing scholarship in an open access way is valued by many, recent research suggests that the scholarly enterprise writ large has some way to go in making open access a reality. For the scholarly literature as a whole, Piwowar et al. (2018) found that only around 28% of articles are open access. Additionally, education scholars have critiqued the less-than-open ways that educational researchers share their work (Roehrig et al., 2018; van der Zee & Reich, 2018). In a recent study, Makel et al. (2021) estimated that about 50% of surveyed education researchers posted copies of their research articles online so that they are not behind a paywall.

This paper provides a descriptive portrait of how widespread open-access publishing is in educational research, using AERA journals as the sample given AERA’s role as a nationally and globally important organization that publishes several of the highest citation impact journals in educational research.

Our analysis is guided by the following research questions:

RQ #1: What percentage of AERA journal articles are accessible?
RQ #2: Which versions of AERA journal articles are accessible?
RQ #3: Which platforms are used to share published articles?

Method

Methodology and Data Sources

We used a public Internet data mining approach (Kimmons & Veletsianos, 2018) to provide a broad and descriptive portrait of the population of AERA articles published in the years 2010 through 2022. Specifically, we manually created a list of all articles (N = 2,516) published in those years in the six non-open access AERA journals by visiting the webpage for each journal and saving information on the journal, title, and year for each article.
The six journals are American Educational Research Journal, Educational Researcher, Journal of Educational and Behavioral Statistics, Educational Evaluation and Policy Analysis, Review of Educational Research, and Review of Research in Education. AERA Open is entirely open access and was thus excluded, though we discuss this as a notable feature of AERA journals in our findings.

Sample: Articles in AERA Journals

To determine our sample, we used SerpAPI, a third-party, web-based service that provides structured access (i.e., access amenable to a data mining approach to research) to the results of searches on Google Scholar and other services. We used it to identify whether a version of each article is available through Google Scholar and which specific files for the article are available. This search using SerpAPI yielded 5,402 files as our analytic sample.

Before proceeding, we needed to establish that the data returned through SerpAPI mirrored what is accessible when one directly uses Google Scholar. Thus, to validate the use of SerpAPI, we randomly selected 100 AERA Open articles. One research team member manually compared the results through Google Scholar and SerpAPI. The rest of the research team reviewed and discussed the results. This validation showed that the results obtained through SerpAPI were valid and comparable. Specifically, the output for all 100 articles exactly mirrored what we found through a manual approach, which gave us confidence to proceed to the data analyses. We note, for transparency and replicability, that we report in this manuscript the findings from all analyses, which were determined a priori; no ad hoc decisions were made to alter the analytic plan to yield or report particular findings.
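The shape of the resulting data, in which each article has zero or more accessible files, can be illustrated with a small sketch. The record layout, field names, and example URLs below are hypothetical and chosen only for illustration; they are not the schema returned by SerpAPI or the data used in this study.

```python
from statistics import mean, stdev

# Hypothetical records: one entry per article, with the URLs of any
# accessible files found for it. Field names are illustrative only.
articles = [
    {"title": "Article A", "files": ["https://example.org/a.pdf"]},
    {"title": "Article B", "files": []},
    {"title": "Article C", "files": ["https://example.org/c1.pdf",
                                     "https://example.com/c2.pdf",
                                     "https://example.net/c3.pdf"]},
    {"title": "Article D", "files": ["https://example.org/d.pdf",
                                     "https://example.com/d.pdf"]},
]


def summarize_accessibility(records):
    """Summarize how many articles have at least one accessible file."""
    if not records:
        raise ValueError("no records to summarize")
    # File counts for the articles that have at least one accessible file.
    counts = [len(r["files"]) for r in records if r["files"]]
    n_accessible = len(counts)
    return {
        "n_articles": len(records),
        "n_accessible": n_accessible,
        "pct_accessible": 100 * n_accessible / len(records),
        # Share of accessible articles with more than one file.
        "pct_multiple_files": (
            100 * sum(c > 1 for c in counts) / n_accessible
            if n_accessible else 0.0
        ),
        "mean_files": mean(counts) if counts else 0.0,
        # The sample standard deviation needs at least two values.
        "sd_files": stdev(counts) if len(counts) > 1 else 0.0,
    }


summary = summarize_accessibility(articles)
print(summary)
```

Keeping one record per article, rather than one per file, makes the article-level share and the file-level counts come from the same structure.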
Data Analyses

To answer RQ #1, we determined which of the 2,516 articles were freely and publicly accessible (even if such articles did not have an open access license, such as those that are shared after being published on third-party, commercial platforms, like ResearchGate), and calculated the percentage of articles with such versions available.

To answer RQ #2, we randomly sampled 250 files (to balance between coding efficiency and statistical power) from the 5,402 files (associated with the 2,516 articles in the population) initially accessed to answer RQ #1 and qualitatively coded them. Drawing on the codes in Piwowar et al.’s (2018) study, we developed a coding frame with six codes corresponding to the different versions of open-access formats. Briefly, a published piece might appear in three ways: behind a paywall with strict reuse limits, freely viewable online but lacking an explicit open-use license, or formally released under an open-access license that invites broad sharing and adaptation. A pre-print is the author’s own manuscript circulated before the publisher’s final editing and typesetting. Some works cannot be found in any form, and still other accessible files are ancillary materials (like slide decks or data files) that are related to the study but are not the article itself. These different versions are represented in Table 1, with six codes for the different versions and their definitions.

To establish the reliability of the coding frame, the first author coded a sample of files and shared these codes and the reasoning for them with a second coder. Then, the second coder independently coded 250 files, after which the first author reviewed these codes, discussed any questions, and discussed the final code for all files. Then, to make inferences from our sample of 250 files to the population of 5,402 files, we used logistic regression models to estimate and present confidence intervals for these estimates.
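As a rough illustration of this kind of inference, the sketch below computes an interval for a single proportion on the logit scale. This matches what an intercept-only logistic regression estimates, since the fitted intercept is then the logit of the sample proportion. It assumes a Wald interval at the 95% level, which is our assumption for the example rather than the article's documented procedure, so its output may differ slightly from the intervals reported in Table 1. The counts are the coded-sample counts from Table 1; the function name and dictionary keys are illustrative.

```python
import math


def logit_wald_ci(successes, n, z=1.96):
    """Proportion with a Wald confidence interval built on the logit scale.

    Equivalent to an intercept-only logistic regression: the intercept is
    log(p / (1 - p)) and its standard error is sqrt(1/k + 1/(n - k)).
    """
    if not 0 < successes < n:
        raise ValueError("successes must be strictly between 0 and n")
    p = successes / n
    logit = math.log(p / (1 - p))
    se = math.sqrt(1 / successes + 1 / (n - successes))

    def inv_logit(x):
        return 1 / (1 + math.exp(-x))

    # Build the interval on the logit scale, then map back to proportions.
    return p, inv_logit(logit - z * se), inv_logit(logit + z * se)


# Number of files per code in the coded sample of 250 (from Table 1).
coded_counts = {
    "Published Article, Restricted Access": 145,
    "Pre-Print": 52,
    "Published Article, Free Access": 25,
    "Published Article, Open Access": 15,
    "Not Accessible": 7,
    "Other": 6,
}
n_coded = sum(coded_counts.values())

for version, k in coded_counts.items():
    est, lower, upper = logit_wald_ci(k, n_coded)
    print(f"{version}: {est:.1%} [{lower:.1%}, {upper:.1%}]")
```

Working on the logit scale keeps the interval bounds inside 0 and 1, which matters for the rarer codes where a symmetric interval on the proportion scale could dip below zero.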
The Supplementary Material includes additional detail on these logistic regression models.

Finally, to answer RQ #3, we used the urltools R package (Keyes et al., 2019) to identify the domains for the files that were accessed. We calculated the frequency and percentage of the most common domains.

Findings

RQ #1: What percentage of AERA journal articles are accessible?

Overall, 65.0% (n = 1,635) of articles published in non-open access AERA journals included a link to an accessible version of the article. This includes articles accessible in any format, i.e., as pre-prints or published articles. There was variability between journals, ranging from 59.1% to 74.7% (see Figure 1).

Figure 1. The percentage and number of freely accessible articles published in non-open access AERA journals between 2010 and 2022

Of these accessible articles, 66.4% had multiple files available, with an average of 2.53 (SD = 1.62) files per article. This points to the findings for the next RQ on the versions of accessed files.

RQ #2: Which versions of articles are accessible?

Most of the 5,402 accessible files were the published version of the article with restricted access: an estimated 58.0% (see Table 1). A smaller proportion, 20.8%, were pre-prints. Fewer still were an initially unexpected version: the published version with “free” access, comprising 10.0% of files. Finally, only 6.0% were the published version with an open access license.

Table 1.
The version of accessed files.

| Version of File | Description | Number of Files in Coded Sample | Estimated Percentage [CI] for All Accessible Files |
| --- | --- | --- | --- |
| Published Article, Restricted Access | The published article as a PDF; license is restricted to those who subscribe to the journal or purchase access to the article. | 145 | 58.0% [51.9% – 64.0%] |
| Pre-Print | A version of the article other than the published article; typically a PDF or Word document | 52 | 20.8% [16.1% – 26.1%] |
| Published Article, Free Access | The published article as a PDF; freely available but not open-access licensed (typically | 25 | 10.0% [6.7% – 14.1%] |
| Published Article, Open Access | The published article as a PDF; open-access licensed | 15 | 6.0% [3.5% – 9.4%] |
| Not Accessible | Not available in any form | 7 | 2.8% [1.2% – 5.2%] |
| Other | Some other file or resource (e.g., a PDF of presentation slides and other files that are not articles); not the published article or a pre-print/post-print | 6 | 2.4% [0.9% – 4.8%] |

RQ #3: Which platforms are used to share published articles?

The most common platform associated with accessed articles was ResearchGate (15.62% of all accessed articles), a commercial platform for academics to share their research. It was followed by the official Sage journals website (14.81%), which typically hosts the free-access and open-access versions of articles. Academia.edu (9.87%) was the third most common, followed by various other non-profit and commercial research sharing platforms and government and non-profit organization websites. The ten most common domains for accessed articles are presented in Table 2.

Table 2.
The ten most common domains for accessed articles.

| Domain | Description of the Domain | % (n articles) |
| --- | --- | --- |
| www.researchgate.net | Researcher networking site for sharing scientific papers. | 15.6% (891) |
| journals.sagepub.com | SAGE Publishing’s online journal platform. | 14.8% (845) |
| www.academia.edu | Academic social network for posting and discovering papers. | 9.9% (563) |
| citeseerx.ist.psu.edu | Public digital library of computer and information-science papers. | 8.6% (492) |
| scholar.google.com | Google’s academic literature search portal. | 5.3% (302) |
| scholar.archive.org | Search interface to Internet Archive’s collection of scholarly works. | 5.1% (292) |
| files.eric.ed.gov | Full-text documents hosted by the ERIC education database. | 2.7% (151) |
| core.ac.uk | Aggregator offering free full-text access to research outputs from repositories worldwide. | 1.9% (110) |
| www.ncbi.nlm.nih.gov | NIH portal to biomedical literature and databases (e.g., PubMed). | 1.5% (84) |
| edworkingpapers.org | Open repository of preliminary research papers in education. | 1.2% (66) |

Discussion

Around two-thirds (65.0%) of the articles published in non-open AERA journals were accessible through Google Scholar in some form. Based on a coded sample of accessible files, and on logistic regression models used to make inferences from this sample to the population from which it was drawn, we estimate that most (58.0%) of these files are published versions of articles that have a restricted access license. We found that many such articles are posted on third-party, commercial platforms, such as ResearchGate and Academia.edu. A smaller proportion of articles were posted to ERIC or to institutional repositories, and many articles (around one-third) were shared via domains not among the top ten. This suggests that there is not a high degree of centralization in terms of where articles are shared, and that there are different legal, ethical, and regulatory considerations at play when authors share. Fewer accessible articles were pre-prints (20.8%), and fewer still were published with an open access license (6.0%).
We note that all AERA Open journal articles (which were not included in this analysis) were and are open, and that this has a bearing on how open AERA articles are as a whole. We also note that an estimated 2.8% of the files returned were not, in fact, accessible; essentially, Google Scholar returned a result that was invalid or not accessible in a small proportion of cases.

These results are encouraging in that educational research published in AERA journals appears more accessible than the scholarly literature as a whole. Piwowar et al. (2018) reported that around one-third of articles across fields from 2009 to 2015 were publicly accessible, with an accelerating trend in the openness of these articles. However, we may have further to go. Around one-third of articles are not accessible, and there may be biases at play in terms of which published articles are shared. For example, some authors may be more comfortable than others sharing their articles with restricted access licenses due to a higher risk tolerance, and others may share due to a better understanding of how one may allowably share one’s work through an institutional agreement (e.g., COAPI). We note here that the legal and regulatory landscape is complex, and not all authors may fully understand when, how, and where they might share their work. Furthermore, some authors may have institutional financial support to cover open-access fees or other expenses for open science that others do not (Kessler et al., 2021).

One limitation of this work is our reliance solely on Google Scholar, which is likely fairly comprehensive but may not return every version of every article. Another limitation is our exclusive focus on AERA journals; analyses of other educational research journals may show that there is substantial variability between journals and fields.
In light of these limitations, we present these findings in the spirit of questioning how we share our scholarly work, while recognizing the financial and other challenges of doing so. There are tensions between the current partially open status quo and a potential approach that treats open access as the default, one in which any student, teacher, school board member, or researcher in any country around the world could access the work published in AERA journals. Individuals can share their work through platforms such as ERIC, the Open Science Framework, or their institutions’ repositories, and academic librarians can support this work. But this is not a task for individuals alone. We urge institutions to advocate for greater open access (e.g., by joining the Coalition of Open Access Policy Institutions) and urge leaders and individuals in AERA and other organizations to continue to advocate for expanding open access further.