Eduomics: a Nextflow pipeline to simulate -omics data for education

Abstract

Moving beyond learning just algorithms and code is a key challenge of bioinformatics education: the ideal goal is for students to acquire higher-order skills, such as the ability to solve biological problems with the appropriate tools and, more importantly, to interpret the results in the broader context where bioinformatics is needed. Data simulations play a key role in designing such a teaching and learning experience, but there is a massive barrier to adoption. Different data types are produced by different tools, requiring educators to learn each of them and adapt their workflow to the necessary dependencies, requirements and input files. Additionally, most existing data simulation solutions are meant for benchmarking and methods development rather than education, and cannot provide the context needed to teach students the critical interpretation skills they need to move beyond problem-based learning to what we call storyline-based learning. Significant effort is also required when many datasets with the same characteristics are needed, such as in tutoring or assessment in higher education. Here, we present eduomics: a Nextflow pipeline that automates the simulation workflow for both genomic and transcriptomic next-generation sequencing data and produces realistic clinical scenarios, giving students the clues and biomedical story necessary to interpret their results. Eduomics removes barriers to adoption by requiring the user to decide only which chromosome the datasets should be simulated on and which type of data to simulate. There is no need to learn specific tools or resolve their dependencies. The use of the Gemini API provides an innovative approach to generating plausible clinical scenarios consistent with the genes in which either a pathological mutation or differential expression has been simulated.
With eduomics, we offer an accessible and scalable solution to design comprehensive learning experiences and innovate bioinformatics education.
