An Evolutionary Statistics Toolkit for Simplified Sequence Analysis on Web with Client-Side Processing

Abstract

We present the “Evolutionary Statistics Toolkit”, a user-friendly web-based platform for specialized analysis of genetic sequences that integrates multiple evolutionary statistics. The toolkit offers a selection of specialized tools: a Tajima’s D calculator with Site Frequency Spectrum (SFS), Shannon’s Entropy (H), alignment re-formatting, HGVS to FASTA conversion, pair-wise frequency analysis, FASTA to SEQRES conversion, RNA 2D structure alignment, and a kurtosis coefficient calculator. Tajima’s D is calculated using the reference formula $D = (\pi - \theta_W) / \sqrt{V_D}$, where $\pi$ is the average number of pairwise differences, $\theta_W$ is Watterson’s estimator of $\theta$, and $V_D$ is the variance of $\pi - \theta_W$. Shannon’s Entropy is defined as $H = -\sum_i p_i \log_2(p_i)$, where $p_i$ is the probability of occurrence of each unique character (nucleotide or amino acid) in the sequence. The toolkit facilitates streamlined workflows for early-career researchers in evolutionary biology, genomics, and related fields. Based on a comparison with existing codes, we propose that it can also serve as an interactive educational website for beginners in evolutionary statistics. The source code for each tool in the toolkit is available through GitHub links provided on the website. This open-source approach allows users to inspect the code, suggest improvements, or adapt the tools to their specific usage and research needs. This article describes the functionality and validation of each tool within the platform, along with a comparison with accessible existing statistical utilities. The toolkit is freely accessible at: https://www.alperkaragol.com/toolkit
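
As an illustration of the two formulas quoted in the abstract, the following is a minimal Python sketch and not the toolkit's own code. The function names `shannon_entropy` and `tajimas_d` are hypothetical. The variance $V_D$ is computed from the standard constants of Tajima (1989), which is an assumption about what "reference formula" refers to. The sketch further assumes equal-length aligned sequences, treats every character (including gaps) as an ordinary symbol, and requires at least four sequences.

```python
import math
from collections import Counter
from itertools import combinations


def shannon_entropy(sequence):
    """H = -sum(p_i * log2(p_i)) over the unique characters of one sequence."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    total = len(sequence)
    counts = Counter(sequence)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


def tajimas_d(sequences):
    """Tajima's D for aligned sequences; returns None when no site segregates."""
    n = len(sequences)
    if n < 4:
        # Assumption of this sketch: the variance term is zero for n = 2 or 3.
        raise ValueError("at least 4 sequences are required")
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("sequences must be aligned to the same length")

    # S: number of segregating (polymorphic) alignment columns.
    s_sites = sum(1 for column in zip(*sequences) if len(set(column)) > 1)
    if s_sites == 0:
        return None  # D is undefined without polymorphism

    # pi: average number of pairwise differences.
    pairs = list(combinations(sequences, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)

    # Watterson's estimator theta_W = S / a1.
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    theta_w = s_sites / a1

    # Constants for the variance V_D of (pi - theta_W).
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    v_d = e1 * s_sites + e2 * s_sites * (s_sites - 1)

    return (pi - theta_w) / math.sqrt(v_d)


if __name__ == "__main__":
    print(shannon_entropy("ACGT"))
    print(tajimas_d(["AAAA", "AAAT", "AATA", "ATAA"]))
```

In the small alignment used here every polymorphism is a singleton, so $\pi$ falls below $\theta_W$ and D comes out negative, which is the direction expected for an excess of rare variants.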
