Formal Statistical Replication Analysis in Lung Cancer Genome-Wide Association Studies
Abstract
Dozens of genome-wide association studies (GWAS) have identified thousands of single nucleotide polymorphisms (SNPs) associated with lung cancer risk. However, it remains challenging to translate these findings into clinical insights. One well-known obstacle is the high type I error rate of GWAS; attempted solutions, such as applying a p-value threshold across multiple cohorts or looking for small meta-analysis p-values, have only partly reduced false positive findings. In contrast, here we advocate for a statistical model-based replication analysis. We first demonstrate that a formal statistical test for the replication composite null hypothesis (i.e., that the regression coefficient of a SNP falls in the same direction in multiple cohorts simultaneously) can curate a smaller, higher-quality list of significant SNPs than common alternatives. In two-way simulations, the false discovery rate (FDR) of model-based replication analysis is 6.4 times lower than that of meta-analysis with a p < 10⁻⁸ threshold. In three-way replication analysis, 9.8% of the International Lung Cancer Consortium GWAS-significant SNPs are replicated for squamous cell lung cancer, while 33.8% are replicated for lung adenocarcinoma. Finally, we construct polygenic risk scores (PRSs) and find that the replication-based PRS achieves virtually identical performance to a GWAS-significant PRS while using 87.3% fewer variants. Thus, formal model-based replication analysis can greatly reduce spurious findings while still identifying important variants, allowing for more robust and more efficient translation of GWAS results.
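
To make the idea of testing a replication composite null concrete, the sketch below implements the classical max-P test with a same-direction requirement. This is an illustrative baseline only and is not the model-based procedure advocated in the abstract; the function names and the use of per-cohort z-scores as input are assumptions made for the example.

```python
import math


def one_sided_p(z, sign):
    """One-sided normal p-value: P(Z >= z) if sign is +1, P(Z <= z) if sign is -1."""
    return 0.5 * math.erfc(sign * z / math.sqrt(2.0))


def max_p_replication(z_scores):
    """Max-P p-value for 'effect present in the same direction in every cohort'.

    Hypothetical helper, not the authors' method. For each direction, the
    largest one-sided p-value across cohorts is taken, so a single
    unconvincing cohort is enough to block replication. The two directions
    are then combined with a Bonferroni factor of 2.
    """
    p_pos = max(one_sided_p(z, +1) for z in z_scores)
    p_neg = max(one_sided_p(z, -1) for z in z_scores)
    return min(1.0, 2.0 * min(p_pos, p_neg))


# Assumed per-cohort z-scores for three hypothetical SNPs.
concordant = max_p_replication([3.0, 3.0])    # strong, same sign in both cohorts
discordant = max_p_replication([3.0, -3.0])   # strong, opposite signs
one_cohort = max_p_replication([5.0, 0.1])    # strong in one cohort only
```

Under these assumptions, only the concordant SNP receives a small p-value. The SNP driven by a single cohort, which a meta-analysis could still flag as significant, does not replicate.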