OmixLitMiner 2: Guided Literature Mining Tools for Automated Categorization of Marker Candidates in Omics Studies
Abstract
Omics analyses are crucial for understanding molecular mechanisms in biological research, but the vast number of detected biomolecules makes it challenging to identify potential biomarkers. Traditional methods rely heavily on labor-intensive literature mining to extract meaningful insights from long lists of regulated candidates. To address this, we developed OmixLitMiner 2, which improves the efficiency of omics data interpretation, speeds up the validation of results, and accelerates the selection of marker candidates for subsequent experiments. The updated tool uses UniProt to retrieve synonyms and protein names, and mines abstracts and full texts of the available biomedical literature via the PubMed database and PubTator 3.0. It supports advanced keyword-based searches and classifies proteins or genes according to how well they are known in relation to a given scientific question. OmixLitMiner 2 comes with a user-friendly Google Colab interface and, compared with the previous version, improves the retrieval and classification of relevant publications. The tool significantly reduces the time required for manual searches, as demonstrated in a case study involving proteomic data from spatially resolved mouse brain cortex layers.
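To illustrate the idea of a keyword-based search over protein or gene synonyms, the following minimal Python sketch builds a PubMed-style Boolean query string. It is not the tool's actual implementation: the function name `build_pubmed_query`, the choice to join synonyms and keywords with OR inside each group, and the example gene and keywords are assumptions made for illustration. Only the `[Title/Abstract]` field tag is standard PubMed search syntax.

```python
def build_pubmed_query(synonyms, keywords, field="Title/Abstract"):
    """Build a Boolean query: (any synonym) AND (any keyword).

    Hypothetical helper for illustration; the OR/AND combination
    is an assumption, not the documented behavior of OmixLitMiner 2.
    """
    if not synonyms:
        raise ValueError("at least one protein or gene name is required")

    def clause(terms):
        # Quote each term and restrict it to the chosen PubMed field.
        return "(" + " OR ".join(f'"{t}"[{field}]' for t in terms) + ")"

    query = clause(synonyms)
    if keywords:
        query += " AND " + clause(keywords)
    return query


# Illustrative input only: a gene symbol with one synonym and two keywords.
example_query = build_pubmed_query(["Reln", "Reelin"], ["cortex", "cortical layer"])
print(example_query)
```

In a real workflow the synonym list would come from a UniProt lookup, and the resulting string would be submitted to a literature search service; both steps are omitted here.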
Statement of Significance
We developed OmixLitMiner 2 to determine, for a given set of marker candidates, the extent to which they have been described in the literature. The tool is easy to use and quickly generates a categorized list of references from automated literature searches that combine protein and gene names with keywords related to a specific scientific question. The categorization ranks how well studied the genes or proteins are: candidates of category 1 are well known with regard to the scientific question; candidates of category 2 have been mentioned in a publication associated with the scientific question; candidates of category 3 have never been described in association with the scientific question; and candidates of category 4 are not yet known proteins and are not associated with a gene name. This classification can help in deciding which protein or gene candidates to choose for follow-up experiments.
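The four categories can be sketched as a simple decision rule. The Python example below is only an illustration under stated assumptions: the function name `categorize`, its parameters, and especially the threshold `well_known_min` that separates "well known" from "mentioned" are hypothetical, since the exact criterion used by the tool is not given here.

```python
from typing import Optional


def categorize(gene_name: Optional[str], n_publications: int,
               well_known_min: int = 5) -> int:
    """Assign a candidate to category 1-4.

    well_known_min is a hypothetical cutoff chosen for illustration;
    it is not the criterion used by OmixLitMiner 2.
    """
    if n_publications < 0:
        raise ValueError("n_publications must be non-negative")
    if not gene_name:
        return 4  # not yet a known protein, no associated gene name
    if n_publications == 0:
        return 3  # never described in association with the question
    if n_publications >= well_known_min:
        return 1  # well known with regard to the question
    return 2      # mentioned in at least one associated publication


# Illustrative, made-up hit counts for four placeholder candidates.
hits = {"GENE_A": 12, "GENE_B": 1, "GENE_C": 0}
categories = {name: categorize(name, n) for name, n in hits.items()}
categories["unmapped_entry"] = categorize(None, 0)
print(categories)
```

Sorting candidates by the resulting category gives the kind of ranking described above, with category 3 entries being the least explored in the context of the chosen question.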