CD2K: An Accessible Framework for Leveraging Public Omics Data in Biomedical Research Training
Abstract
The exponential growth of publicly available omics data presents both an opportunity and a challenge for biomedical researchers, particularly those in low- and middle-income countries (LMICs). The Collective Omics Data to Knowledge (CD2K) initiative aims to address this challenge by providing an accessible framework for biomedical research training. This review describes the development and implementation of the CD2K program, which comprises three core modules: COD1 (reductionist interpretation of collective omics data), COD2 (creation of curated dataset collections), and COD3 (re-analysis of omics data on a global scale). The CD2K approach emphasizes the reuse and reinterpretation of public data, integrating literature mining and emerging technologies such as large language models (LLMs). A key feature of the program is its focus on accessibility: it is designed to make the exploitation of large-scale datasets feasible for researchers without extensive data science skills. The curriculum aims to equip trainees with a range of skills, from basic data interpretation to more advanced bioinformatics analysis, and emphasizes tangible outputs such as peer-reviewed publications, which directly address career development needs. The CD2K initiative has involved researchers from multiple institutions across several countries, resulting in several publications and publicly available dataset collections. While still in its early stages, the program shows promise in providing a structured framework for leveraging public omics data in biomedical research. This review also discusses the current limitations of the CD2K approach and ongoing efforts to expand its reach. By offering an accessible model for building research capacity, the CD2K initiative represents a step towards fostering data-driven discovery in global biomedical research, particularly in resource-limited settings.