How to prepare the input data and run MCScanX efficiently?
Abstract
The protocol by Wang et al. is particularly useful because it outlines the steps for efficiently identifying colinear blocks in intra- and inter-species BLASTP outputs using MCScanX. We recently found that the protocol lacks a pre-processing step that checks for multiple isoforms derived from alternative splicing. Isoforms of the same gene share conserved sequences and can have similar functional domains, so they should be filtered out to avoid mis-predicting gene duplicates, especially for genome data from NCBI or other online resources. Without this step, the number of duplicate genes will be overestimated. In addition, we share practical experience for preparing the input data faster and running MCScanX more easily, including alternative options for preparing the ‘.gff’ input file and iterated all-against-all BLASTP processing. Lastly, we raise awareness of the potential challenges in preparing the input files and highlight potential issues when using the protocol.
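As an illustration of the isoform check described above, the following is a minimal sketch in Python, not the authors' procedure. It assumes that keeping the longest protein per gene is an acceptable way to collapse isoforms (a common convention, but the abstract does not say which isoform to retain), and that a protein-to-gene mapping has already been extracted from the annotation. The function name `longest_isoform_per_gene` and all identifiers in the example are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def longest_isoform_per_gene(
    proteins: Iterable[Tuple[str, str]],
    protein_to_gene: Dict[str, str],
) -> Dict[str, Tuple[str, str]]:
    """Return {gene_id: (protein_id, sequence)}, keeping the longest isoform.

    `proteins` yields (protein_id, sequence) pairs; `protein_to_gene` is an
    assumed, pre-built mapping from protein ID to gene ID.
    """
    best: Dict[str, Tuple[str, str]] = {}
    for protein_id, sequence in proteins:
        # A protein with no known gene is treated as its own gene.
        gene_id = protein_to_gene.get(protein_id, protein_id)
        current = best.get(gene_id)
        # Keep the first isoform seen unless a strictly longer one appears.
        if current is None or len(sequence) > len(current[1]):
            best[gene_id] = (protein_id, sequence)
    return best


if __name__ == "__main__":
    # Hypothetical records: two isoforms of geneA and one protein of geneB.
    records = [("XP_1.1", "MKT"), ("XP_1.2", "MKTAYL"), ("XP_2.1", "MSS")]
    mapping = {"XP_1.1": "geneA", "XP_1.2": "geneA", "XP_2.1": "geneB"}
    for gene_id, (protein_id, sequence) in longest_isoform_per_gene(
        records, mapping
    ).items():
        print(gene_id, protein_id, len(sequence))
```

Only the retained proteins would then be passed to the all-against-all BLASTP step and listed in the ‘.gff’ input, so that isoforms of one gene are not counted as duplicates of each other.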