TSTA: thread and SIMD-based trapezoidal pairwise/multiple sequence-alignment method

Peiyu Zong
Wenpeng Deng
Jian Liu
Jue Ruan

Curated by GigaByte

Editors Assessment:

The article presents strategies for accelerating sequence alignment using multithreading and SIMD (Single Instruction, Multiple Data) techniques, and introduces a new algorithm called TSTA (Thread and SIMD-Based Trapezoidal Pairwise/Multiple Sequence-Alignment). The Technical Release write-up presenting a detailed description of TSTA's performance in pairwise sequence alignment (PSA) and multiple sequence alignment (MSA), and compares it with various existing alignment algorithms. Demonstrating the performance gains achieved by vectorized SIMD technology and the application of threading. Testing and debugging a few errors, and adding some more background detail, demonstrating it can achieve faster comparison speed. Demonstrating TSTA's efficacy in pairwise sequence alignment and multiple sequence alignment, particularly with long reads, and showcasing considerable speed enhancements compared to existing tools.

This evaluation refers to version 1 of the preprint

This article has been Reviewed by the following groups

Read the full article

Listed in

Evaluated articles (GigaByte)
Endorsed by GigaByte (scotted400)

Abstract

The rapid advancements in sequencing length necessitate the adoption of increasingly efficient sequence alignment algorithms. The Needleman–Wunsch method introduces the foundational dynamic-programming matrix calculation for global alignment, which evaluates the overall alignment of sequences. However, this method is known to be highly time-consuming. The proposed TSTA algorithm leverages both vector-level and thread-level parallelism to accelerate pairwise and multiple sequence alignments. Availability and implementation Source codes are available at https://github.com/bxskdh/TSTA.

GigaByte
Nov 21, 2024

Editors Assessment:

The article presents strategies for accelerating sequence alignment using multithreading and SIMD (Single Instruction, Multiple Data) techniques, and introduces a new algorithm called TSTA (Thread and SIMD-Based Trapezoidal Pairwise/Multiple Sequence-Alignment). The Technical Release write-up presenting a detailed description of TSTA's performance in pairwise sequence alignment (PSA) and multiple sequence alignment (MSA), and compares it with various existing alignment algorithms. Demonstrating the performance gains achieved by vectorized SIMD technology and the application of threading. Testing and debugging a few errors, and adding some more background detail, demonstrating it can achieve faster comparison speed. Demonstrating TSTA's efficacy in pairwise sequence alignment and multiple sequence alignment, …

Editors Assessment:

The article presents strategies for accelerating sequence alignment using multithreading and SIMD (Single Instruction, Multiple Data) techniques, and introduces a new algorithm called TSTA (Thread and SIMD-Based Trapezoidal Pairwise/Multiple Sequence-Alignment). The Technical Release write-up presenting a detailed description of TSTA's performance in pairwise sequence alignment (PSA) and multiple sequence alignment (MSA), and compares it with various existing alignment algorithms. Demonstrating the performance gains achieved by vectorized SIMD technology and the application of threading. Testing and debugging a few errors, and adding some more background detail, demonstrating it can achieve faster comparison speed. Demonstrating TSTA's efficacy in pairwise sequence alignment and multiple sequence alignment, particularly with long reads, and showcasing considerable speed enhancements compared to existing tools.

This evaluation refers to version 1 of the preprint

Read the original source
GigaByte
Nov 21, 2024
AbstractsThe rapid advancements in sequencing length necessitate the adoption of increasingly efficient sequence alignment algorithms. The Needleman-Wunsch method introduces the foundational dynamic programming (DP) matrix calculation for global alignment, which evaluates the overall alignment of sequences. However, this method is known to be highly time-consuming. The proposed TSTA algorithm leverages both vector-level and thread-level parallelism to accelerate pairwise and multiple sequence alignments.

This work has been published in GigaByte Journal under a CC-BY 4.0 license (https://doi.org/10.46471/gigabyte.141). These reviews are as follows.

Reviewer 1. Xingbao Song and Baoxing Song

Zong et al. implemented a TSTA package that integrated the difference method, the stripe method, SIMD, and multiple threading approaches to perform …
AbstractsThe rapid advancements in sequencing length necessitate the adoption of increasingly efficient sequence alignment algorithms. The Needleman-Wunsch method introduces the foundational dynamic programming (DP) matrix calculation for global alignment, which evaluates the overall alignment of sequences. However, this method is known to be highly time-consuming. The proposed TSTA algorithm leverages both vector-level and thread-level parallelism to accelerate pairwise and multiple sequence alignments.

This work has been published in GigaByte Journal under a CC-BY 4.0 license (https://doi.org/10.46471/gigabyte.141). These reviews are as follows.

Reviewer 1. Xingbao Song and Baoxing Song

Zong et al. implemented a TSTA package that integrated the difference method, the stripe method, SIMD, and multiple threading approaches to perform efficient sequence alignments. The TSTA toolkit could conduct pairwise and multiple sequence alignments. The memory cost of TSTA is comparable with the most efficient one. Overall, TSTA is a good package, and the manuscript is well-written. While I have a few suggestions:

The minimap2 should be mentioned in the section on "difference recurrence relation." It has a much broader range of users and implemented an algorithm that is slightly different from the one by Suzuki, etc.

The striped SIMD is also implemented in reads mappers, such as BWA.

Page 14, line 215 "1k bps", line 227 "1000 kbps", line 230 and table1 "100k". They should be consistent.

In Table 4, I am not sure I understood the second and third lines correctly. Please clarify.

I tried to compile TSTA from the source code. To compile the package, I had to copy 'seqio.h' into the 'msa' and 'psa' folders. Please fix it.

Reviewer 2. Yuansheng Liu

The article explores strategies for accelerating sequence alignment using multithreading and SIMD (Single Instruction, Multiple Data) techniques, and introduces a new algorithm called TSTA. The paper provides a detailed description of TSTA's performance in pairwise sequence alignment (PSA) and multiple sequence alignment (MSA), and compares it with various existing alignment algorithms. Experimental results indicate that TSTA demonstrates significant speed advantages, particularly when handling long sequences and in the no-backtracking mode. However, the experiments on MSA are limited by the experimental environment, which does not fully address the needs of current sequencing technologies concerning long reads and depth. Specifically, the low number of sequences in MSA does not meet the requirements for downstream genomic analysis applications. While the algorithm is highly innovative, its performance on short sequences and during the backtracking phase still requires optimization.

In line 7, the TSTA algorithm utilizes vector-level and thread-level parallelism to accelerate pairwise and multiple sequence alignment. Why are there no experiments designed specifically to evaluate the global alignment performance of TSTA with vector-level parallelism? Or are there any other experimental designs that demonstrate the improved performance of TSTA when vector-level parallelism is employed?

In line 149, is the Active-F method used by the TSTA algorithm contributing to the excessive memory usage and access time overhead observed during the iterative process of PSA? Are there better optimization strategies from this perspective? If not, why does TSTA incur higher time costs in traceback as shown in Table 1? Why does bsalign result in lower time consumption?

Can you provide the time breakdown for each part of the parallel computation in TSTA for PSA (including at least CPU computation overhead, communication overhead, and I/O overhead) to clarify if there will be significant communication overhead issues with larger datasets and more threads?

Table 2 shows that both real and simulated datasets have issues with insufficient depth and short reads. In real MSA processes, it is common to encounter comparisons with depth over 60X and lengths exceeding 100 kbps for long reads. The results under the current experimental conditions seem to perform poorly for such data scenarios. Can you address this?

Gene data often includes repetitive regions that affect the accuracy of alignment algorithms. Can you design experiments to verify how TSTA performs in aligning long repetitive regions? Specifically, how accurately does TSTA align sequences in such regions compared to other methods?

Besides repetitive regions, sequencing errors produced by ONT R10 chips can also impact alignment accuracy. Alignment algorithms used in genome correction often struggle to detect such errors. How does TSTA handle such issues during MSA? Can the algorithm be designed to address these sequencing errors more effectively? Re-review: After thoroughly reviewing the revised manuscript and testing the TSTA tool, I cannot endorse the manuscript for publication in its current form. I encourage the authors to address the following issues thoroughly and consider re-submitting after significant improvements. Efficiency Concerns: In the context of multiple sequence alignment (MSA), I find that TSTA does not demonstrate a significant advantage in terms of efficiency. I conducted a test with approximately 2G of homologous diploid reads (not too large data), and the tool has been running for around 29 hours. Despite this extensive runtime, the process remains incomplete. This is far from the efficiency one would expect from a tool designed for large-scale sequence alignment. Functionality Issues: There are still unresolved issues with the tool's functionality. The -f parameter does not appear to work as intended, and there are also problems with the -o parameter. Such issues need to be addressed to ensure the tool's reliability and usability.
Read the original source
Version published to 10.46471/gigabyte.141
Nov 5, 2024
Version published to 10.1101/2024.09.18.613655v1 on bioRxiv
Sep 22, 2024

Ish: SIMD and GPU Accelerated Local and Semi-Global Alignment as a CLI Filtering Tool

This article has 1 author:
1. Seth Stadick
This article has no evaluationsLatest version Jun 9, 2025
Tomtom-lite: Accelerating Tomtom enables large-scale and real-time motif similarity scoring

This article has 1 author:
1. Jacob Schreiber
This article has no evaluationsLatest version May 31, 2025
FastGA: Fast Genome Alignment

This article has 3 authors:
1. Gene Myers
2. Richard Durbin
3. Chenxi Zhou
This article has no evaluationsLatest version Jun 19, 2025

This article has been Reviewed by the following groups

Listed in

Abstract

Article activity feed

Related articles

Ish: SIMD and GPU Accelerated Local and Semi-Global Alignment as a CLI Filtering Tool

Tomtom-lite: Accelerating Tomtom enables large-scale and real-time motif similarity scoring

FastGA: Fast Genome Alignment