LDscore: a scalable, Python 3-powered web platform for LD score regression analysis

Abstract

Linkage disequilibrium score regression (LDSC) is an important analytical tool for quantifying heritability and estimating genetic correlations between complex traits. However, the original LDSC implementation relies on an outdated Python 2 framework, and deploying the standard command-line tools requires significant setup, data access, and computational expertise, creating a barrier for many researchers. To overcome these limitations, we developed LDscore, a version of LDSC with significant technical and accessibility upgrades that allows for rapid analysis of GWAS data. The core advancement is the recoding of the LDSC framework in Python 3, enabling computational optimization and ensuring long-term sustainability. Built on this improved foundation, LDscore is implemented as a free, publicly available web application integrated within the popular NCI LDlink framework. LDscore can accelerate scientific research by providing an intuitive graphical interface for heritability estimation, genetic correlation, and LD score calculation, including access to an expanded range of reference populations for online analysis. Notably, our results show that selecting the most appropriate reference population LD panel, even at the subcontinental ancestry group level, is essential for minimizing population stratification bias in heritability estimation. By leveraging cloud computing for scalability and eliminating the need for local installation, LDscore adheres to FAIR principles, improving access, traceability, and reproducibility across an expanded set of reference populations, and it widens access for researchers worldwide by supporting in-depth genetic analyses.
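
As a rough illustration of the regression behind the heritability estimates mentioned above, the sketch below fits the standard LDSC model, in which the expected chi-square statistic of SNP j is N·h²·ℓ_j/M plus an intercept (N is the sample size, M the number of SNPs, ℓ_j the LD score). It is a simplified sketch and not the LDscore or LDSC code: it uses unweighted least squares on simulated data, whereas the real method uses weighted regression with block-jackknife standard errors. The function names, the simulated LD scores, and all parameter values are assumptions made for illustration.

```python
import numpy as np


def simulate_chi2(ld_scores, n_samples, n_snps, h2, rng):
    """Draw chi-square statistics whose mean follows the LDSC model (intercept 1)."""
    expected = n_samples * h2 * ld_scores / n_snps + 1.0
    # If z_j ~ Normal(0, expected_j), then z_j**2 = expected_j * chi-square(1 df).
    return expected * rng.chisquare(1, size=ld_scores.shape[0])


def ldsc_h2(chi2, ld_scores, n_samples, n_snps):
    """Unweighted LD score regression (illustrative only).

    Regresses chi2_j on the LD score l_j with an intercept and converts
    the slope (N * h2 / M) back into a heritability estimate.
    Returns (h2_estimate, intercept).
    """
    chi2 = np.asarray(chi2, dtype=float)
    ld_scores = np.asarray(ld_scores, dtype=float)
    design = np.column_stack([ld_scores, np.ones_like(ld_scores)])
    (slope, intercept), *_ = np.linalg.lstsq(design, chi2, rcond=None)
    return slope * n_snps / n_samples, intercept


# Hypothetical inputs: 50,000 SNPs, 100,000 individuals, true h2 = 0.4.
rng = np.random.default_rng(0)
n_snps, n_samples = 50_000, 100_000
ld = rng.uniform(1.0, 200.0, size=n_snps)
chi2 = simulate_chi2(ld, n_samples, n_snps, h2=0.4, rng=rng)
h2_hat, intercept_hat = ldsc_h2(chi2, ld, n_samples, n_snps)
print(f"h2 estimate: {h2_hat:.3f}, intercept: {intercept_hat:.2f}")
```

An intercept well above 1 in such a fit is conventionally read as a sign of confounding, such as population stratification, rather than polygenic signal.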

Brief summary

Linkage disequilibrium score regression (LDSC), a widely used method for quantifying heritability and genetic correlation, is limited by an outdated Python 2 framework and complex command-line deployment. We developed LDscore, a significant technical upgrade built on Python 3 for sustainability and computational optimization. LDscore is a free, cloud-based web application integrated into NCI LDlink. It eliminates installation barriers and offers an intuitive interface for computing heritability estimates, LD scores, and genetic correlations. Crucially, LDscore expands the range of reference populations available in LDSC, which can reduce bias caused by population stratification. By leveraging cloud computing, LDscore speeds up LDSC-based genetic computation and makes it accessible to more researchers worldwide.
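
The LD score calculation referred to above can be sketched from a plain individuals-by-SNPs genotype matrix. The LD score of a SNP is the sum of its squared correlations with nearby SNPs, and the LDSC method applies a small-sample correction to each r². The sketch below is an assumption-laden illustration rather than the LDscore implementation: the function name is hypothetical, and the window is a fixed number of neighbouring SNPs, whereas the real tools define windows by genetic or physical distance.

```python
import numpy as np


def ld_scores(genotypes, window=100):
    """Illustrative LD scores from an (n_individuals, n_snps) matrix.

    genotypes: allele counts (0, 1, 2), with SNP columns in genomic order.
    window:    number of neighbouring SNPs on each side to include
               (a simplification of distance-based windows).
    """
    g = np.asarray(genotypes, dtype=float)
    n, m = g.shape
    g = g - g.mean(axis=0)
    sd = g.std(axis=0)
    if np.any(sd == 0):
        raise ValueError("monomorphic SNPs must be removed before computing LD")
    g = g / sd  # each column now has mean 0 and variance 1

    scores = np.empty(m)
    for j in range(m):
        lo, hi = max(0, j - window), min(m, j + window + 1)
        r = g[:, lo:hi].T @ g[:, j] / n      # correlations with SNP j
        r2 = r ** 2
        r2_adj = r2 - (1.0 - r2) / (n - 2)   # small-sample bias correction
        scores[j] = r2_adj.sum()             # includes SNP j itself (r2 = 1)
    return scores


# Two identical SNP columns are in perfect LD, so each gets a score of about 2.
rng = np.random.default_rng(1)
col = rng.integers(0, 3, size=50)
print(ld_scores(np.column_stack([col, col]), window=1))
```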

Availability

LDscore is freely available within LDlink at https://ldlink.nih.gov/ldscore. Source code for the updated LDSC Python 3 framework is available at https://github.com/CBIIT/ldsc under the GNU General Public License v3.0, and the web tool code is available at https://github.com/CBIIT/nci-webtools-dceg-linkage under the MIT license.

Article activity feed

  1. including summaries by minor allele frequency for a given dataset in PLINK [10] format (e.g., *.bed, *.bim, and *.fam files).

    In the future it might be nice to expand this to take non-PLINK formats (i.e. just a table of individuals x SNPs). People are trying to move away from the limitations of the PLINK format and it may not be so ubiquitous in the future.

  2. Once the dataset is uploaded, a data checking step is performed to ensure the uploaded summary statistics adhere to necessary formatting standards for the analysis, minimizing common user errors.

    After trying the online tool, I think this step could be more verbose. I wasn't quite sure in what ways certain datasets were failing.

  3. Rather than requiring users to find and download a LD reference panel or generate their own LD reference data, a computationally intensive and error-prone process, LDscore provides immediately accessible LD reference panels.

    This is also a lot of work crammed into a short paper, and again this will expand the possibilities for what people can do with LD scores.

  4. Our tool, LDscore, addresses these limitations by first providing a massive technical overhaul of the method, migrating the core LDSC framework to Python 3.

    This is a huge contribution to the field. LDSC is a great tool and the overhaul will make it more accessible.
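
To make the suggestion about a more verbose data-checking step concrete, here is a minimal sketch of a summary-statistics check that reports exactly which line and column failed. The checks that LDscore actually performs are not described here, so everything in this example is an assumption: the required columns follow the LDSC `.sumstats` convention (SNP, A1, A2, N, Z), the input is assumed to be tab-delimited, and the function name and messages are hypothetical.

```python
import csv
import io
import math

REQUIRED_COLUMNS = ("SNP", "A1", "A2", "N", "Z")  # assumed .sumstats layout
VALID_ALLELES = {"A", "C", "G", "T"}


def _as_float(text):
    """Return float(text), or None if the text is not a number."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return None


def check_sumstats(handle, max_messages=20):
    """Return a list of human-readable problems found in a tab-delimited table."""
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in header]
    if missing:
        found = ", ".join(header) or "(none)"
        return [f"header is missing column(s): {', '.join(missing)}; found: {found}"]

    problems, seen = [], set()
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        if len(problems) >= max_messages:
            problems.append("further problems suppressed")
            break
        snp = (row["SNP"] or "").strip()
        if not snp:
            problems.append(f"line {line_no}: SNP identifier is empty")
        elif snp in seen:
            problems.append(f"line {line_no}: duplicate SNP '{snp}'")
        seen.add(snp)
        for col in ("A1", "A2"):
            allele = (row[col] or "").strip().upper()
            if allele not in VALID_ALLELES:
                problems.append(f"line {line_no}: {col}='{row[col]}' is not A, C, G, or T")
        n = _as_float(row["N"])
        if n is None or not math.isfinite(n) or n <= 0:
            problems.append(f"line {line_no}: N='{row['N']}' is not a positive number")
        z = _as_float(row["Z"])
        if z is None or not math.isfinite(z):
            problems.append(f"line {line_no}: Z='{row['Z']}' is not a finite number")
    return problems


example = "SNP\tA1\tA2\tN\tZ\nrs1\tA\tG\t1000\t1.5\nrs1\tA\tX\t0\tabc\n"
for message in check_sumstats(io.StringIO(example)):
    print(message)
```

Returning every problem with its line number, rather than a single pass or fail flag, is one way a user could see why a given dataset was rejected.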