Formal Statistical Replication Analysis in Lung Cancer Genome-Wide Association Studies

Abstract

Dozens of genome-wide association studies (GWAS) have identified thousands of single nucleotide polymorphisms (SNPs) associated with lung cancer risk. However, it remains challenging to translate these findings to clinical insights. One well-known obstacle is the large amount of type I error attached to GWAS; attempted solutions such as setting a p-value threshold across multiple cohorts or looking for small meta-analysis p-values have only somewhat reduced false positive findings. In contrast, here we advocate for a statistical model-based replication analysis. We first demonstrate that a formal statistical test for the replication composite null hypothesis (i.e., that the regression coefficient of a SNP falls in the same direction in multiple cohorts simultaneously) can curate a smaller, higher-quality list of significant SNPs than common alternatives. In two-way simulations, the false discovery rate (FDR) of model-based replication analysis is 6.4 times lower than that of meta-analysis with a p < 10⁻⁸ threshold. In three-way replication analysis, 9.8% of the International Lung Cancer Consortium GWAS significant SNPs are replicated for squamous cell lung cancer while 33.8% are replicated for lung adenocarcinoma. Finally, we construct polygenic risk scores (PRSs) and find the replication-based PRS achieves virtually identical performance to a GWAS-significant PRS while using 87.3% fewer variants. Thus, formal model-based replication analysis can greatly reduce spurious findings while still identifying important variants, allowing for more robust and more efficient translation of GWAS results.
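
To illustrate how a same-direction replication test differs from a meta-analysis threshold, the sketch below applies a simple max-p rule to per-cohort z-scores. This is not the model-based test the abstract advocates; it is a conservative stand-in chosen for brevity. The function names and the z-scores are hypothetical, and the code assumes each cohort's z-score is approximately standard normal when that cohort has no effect.

```python
import math


def normal_sf(z):
    """Upper-tail probability P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def max_p_replication(z_scores):
    """Conservative p-value for same-direction replication across cohorts.

    Assumption (not from the source): a max-p rule, where the SNP counts as
    replicated only if every cohort shows evidence in the same direction.
    """
    if len(z_scores) < 2:
        raise ValueError("need at least two cohorts")
    # Largest one-sided p-value when testing "all effects are positive".
    p_pos = max(normal_sf(z) for z in z_scores)
    # Largest one-sided p-value when testing "all effects are negative".
    p_neg = max(normal_sf(-z) for z in z_scores)
    # Bonferroni correction over the two directions, capped at 1.
    return min(1.0, 2.0 * min(p_pos, p_neg))


# Hypothetical z-scores for one SNP in three cohorts.
consistent = max_p_replication([5.0, 4.2, 4.8])   # same sign in all cohorts
discordant = max_p_replication([6.0, -0.5, 5.5])  # one cohort disagrees
```

In this sketch the discordant SNP receives a large replication p-value even though two of its cohorts carry very strong signals, which is the kind of finding a pooled meta-analysis p-value could still flag as significant.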