How to prepare the input data and run MCScanX efficiently?

Xi Zhang
David Roy Smith


Abstract

The protocol by Wang et al. is particularly useful because it outlines the steps for efficiently identifying colinear blocks in intra- and inter-species BLASTP outputs using MCScanX. We recently found that the protocol lacks a pre-processing step that checks for multiple isoforms derived from alternative splicing. Isoforms of the same gene can share conserved sequences and similar functional domains, so they need to be filtered to avoid mis-prediction of gene duplicates, especially for genome data from NCBI or other online resources. Without this step, the number of duplicate genes will be overestimated. In addition, we share practical experience for preparing the input data faster and running MCScanX more easily, including alternative options for preparing the ‘.gff’ input file and for iterated all-against-all BLASTP processing. Lastly, we want to raise awareness of the challenges of preparing the input files and highlight potential issues when using the protocol.
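As a rough illustration of the isoform check described above, the sketch below keeps a single representative protein per gene before any BLASTP search is run. It assumes that the longest isoform is an acceptable representative, which is a common convention but is not stated in the abstract, and that a gene-to-protein mapping with protein lengths is already available from the annotation. The `Isoform` type, the function name, and the example records are all hypothetical.

```python
from typing import Dict, Iterable, NamedTuple


class Isoform(NamedTuple):
    gene_id: str      # parent gene locus (assumed to come from the annotation)
    protein_id: str   # identifier of one protein isoform of that gene
    length: int       # protein length in amino acids


def longest_isoform_per_gene(isoforms: Iterable[Isoform]) -> Dict[str, Isoform]:
    """Keep one representative (the longest) isoform for each gene."""
    best: Dict[str, Isoform] = {}
    for iso in isoforms:
        current = best.get(iso.gene_id)
        # Ties keep the first isoform seen, so the result is deterministic.
        if current is None or iso.length > current.length:
            best[iso.gene_id] = iso
    return best


# Hypothetical records, for illustration only.
records = [
    Isoform("geneA", "protA.1", 310),
    Isoform("geneA", "protA.2", 455),
    Isoform("geneB", "protB.1", 120),
]
representatives = longest_isoform_per_gene(records)
kept_ids = sorted(iso.protein_id for iso in representatives.values())
print(kept_ids)
```

Only the retained protein identifiers would then be carried into the BLASTP input and the ‘.gff’ file, so that alternative isoforms of one gene are not counted as duplicates.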

Version published to 10.1101/2025.07.29.666888 on bioRxiv
Aug 1, 2025
