TEclass2: Classification of transposable elements using Transformers

Read the full article See related articles

Discuss this preprint

Start a discussion What are Sciety discussions?

Listed in

This article is not in any list yet, why not save it to one of your lists.
Log in to save this article

Abstract

Motivation

Transposable elements (TEs) are interspersed repetitive sequences that are major constituents of most eukaryotic genomes and are crucial for genome evolution. Despite the existence of multiple tools for their classification and annotation, none of them can achieve completely reliable results making it a challenge for genomic studies. In this work, we introduce TEclass2, a new software that uses a deep learning approach based upon a linear Transformer architecture with a k-mer to-kenizer and further adaptations to handle DNA sequences. This software has an easy configuration that allows training models on new datasets and the classification of TE models providing multiple metrics for a reliable evaluation of the results.

Results

This work shows a successful adaptation of deep learning with Transformers for the classification of TE models from consensus sequences, and these results lay a foundation for novel methodologies in bioinformatics. We provide a tool for the training of models and the classification of consensus sequences from TE models on custom data and a web page interface with a pre-trained dataset based on curated and non-curated TE libraries allowing a fast and simple classification of TEs.

Availability

https://bioinformatics.uni-muenster.de/tools/teclass2/index.pl

Contact

wojmak@uni-muenster.de

Supplementary information

Supplementary data are available at Bioinformatics online.

Article activity feed