Self-attention based deep learning model for predicting the coronavirus sequences from high-throughput sequencing data

Abstract

Transformer models have achieved excellent results across a wide range of tasks, primarily due to the self-attention mechanism. We explore the use of self-attention for detecting coronavirus sequences in high-throughput sequencing data, offering a novel approach for accurately identifying emerging and highly variable coronavirus strains. Coronavirus and human genome data were obtained from the Genomic Data Commons (GDC) and the National Genomics Data Center (NGDC) databases. After preprocessing, a simulated high-throughput sequencing dataset of coronavirus-infected samples was constructed and divided into training, validation, and test sets. The self-attention-based model was trained on the training set and evaluated on the validation and test sets; SARS-CoV-2 genome data were collected as an independent test set. The model outperformed traditional bioinformatics methods on both the test set and the independent test set, with a significant improvement in computation speed. The self-attention-based model can sensitively and rapidly detect coronavirus sequences in high-throughput sequencing data while exhibiting excellent generalization ability. It can accurately detect emerging and highly variable coronavirus strains, providing a new approach for identifying such viruses.
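The abstract does not include implementation details, but the approach it describes (a self-attention model that classifies individual sequencing reads as coronavirus or human) can be sketched concretely. The following is a minimal, hypothetical PyTorch example, not the authors' published code: it assumes per-nucleotide tokenization, fixed-length padded reads, and illustrative hyperparameters (READ_LEN, embed_dim, number of heads and layers are all assumptions).

```python
# Minimal sketch of a self-attention read classifier (illustrative only;
# architecture details are assumptions, not taken from the paper).
import torch
import torch.nn as nn

READ_LEN = 150  # typical short-read length (assumption)
VOCAB = {"A": 1, "C": 2, "G": 3, "T": 4, "N": 5}  # 0 reserved for padding

class ReadClassifier(nn.Module):
    """Binary classifier: coronavirus read vs. human read."""
    def __init__(self, embed_dim=64, num_heads=4, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(len(VOCAB) + 1, embed_dim, padding_idx=0)
        self.pos = nn.Parameter(torch.zeros(1, READ_LEN, embed_dim))
        layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads,
            dim_feedforward=4 * embed_dim, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.head = nn.Linear(embed_dim, 2)  # logits: [human, coronavirus]

    def forward(self, tokens):  # tokens: (batch, READ_LEN) int64
        x = self.embed(tokens) + self.pos
        pad = tokens.eq(0)  # True where padding; excluded from attention
        x = self.encoder(x, src_key_padding_mask=pad)
        # Mean-pool over real (non-padding) positions only.
        x = x.masked_fill(pad.unsqueeze(-1), 0.0).sum(1)
        x = x / (~pad).sum(1, keepdim=True).clamp(min=1)
        return self.head(x)

def encode(read: str) -> torch.Tensor:
    """Map a nucleotide string to a fixed-length token tensor."""
    ids = [VOCAB.get(b, VOCAB["N"]) for b in read.upper()[:READ_LEN]]
    ids += [0] * (READ_LEN - len(ids))  # right-pad short reads
    return torch.tensor(ids, dtype=torch.long)

if __name__ == "__main__":
    model = ReadClassifier()
    batch = torch.stack([encode("ACGTN" * 20), encode("GATTACA")])
    print(model(batch).shape)  # torch.Size([2, 2]) -> per-read class logits
```

In a pipeline like the one the abstract describes, such a model would be trained on labeled reads simulated from the GDC human and NGDC coronavirus genomes, then applied read-by-read to held-out and independent (e.g., SARS-CoV-2) sequencing data.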
