Iterative hard thresholding in genome-wide association studies: Generalized linear models, prior weights, and double sparsity
This article has been reviewed by the following groups
Listed in
- Evaluated articles (GigaScience)
Abstract
Background
Consecutive testing of single nucleotide polymorphisms (SNPs) is usually employed to identify genetic variants associated with complex traits. Ideally one should model all covariates in unison, but most existing analysis methods for genome-wide association studies (GWAS) perform only univariate regression.
Results
We extend and efficiently implement iterative hard thresholding (IHT) for multiple regression, treating all SNPs simultaneously. Our extensions accommodate generalized linear models, prior information on genetic variants, and grouping of variants. In our simulations, IHT recovers up to 30% more true predictors than SNP-by-SNP association testing and exhibits a 2–3 orders of magnitude decrease in false-positive rates compared with lasso regression. We also test IHT on the UK Biobank hypertension phenotypes and the Northern Finland Birth Cohort of 1966 cardiovascular phenotypes. We find that IHT scales to the large datasets of contemporary human genetics and recovers the plausible genetic variants identified by previous studies.
Conclusions
Our real data analysis and simulation studies suggest that IHT can (i) recover highly correlated predictors, (ii) avoid over-fitting, (iii) deliver better true-positive and false-positive rates than either marginal testing or lasso regression, (iv) recover unbiased regression coefficients, (v) exploit prior information and group-sparsity, and (vi) be used with biobank-sized datasets. Although these advances are studied for GWAS inference, our extensions are pertinent to other regression problems with large numbers of predictors.
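For readers unfamiliar with the core idea, the following is a minimal sketch of plain iterative hard thresholding for ordinary least squares. It is not the authors' implementation and omits the generalized linear model, prior-weight, and group-sparsity extensions described in the abstract. The function names (`hard_threshold`, `iht_least_squares`), the fixed step size, and the synthetic data are illustrative assumptions.

```python
import numpy as np


def hard_threshold(v, k):
    """Keep the k largest-magnitude entries of v and set the rest to zero."""
    out = np.zeros_like(v)
    if k > 0:
        keep = np.argpartition(np.abs(v), -k)[-k:]
        out[keep] = v[keep]
    return out


def iht_least_squares(X, y, k, n_iter=200):
    """Sketch of IHT for 0.5 * ||y - X beta||^2 with at most k nonzeros.

    Assumption: a fixed step 1 / ||X||_2^2 (inverse of the largest
    eigenvalue of X^T X), which keeps the objective from increasing.
    """
    n, p = X.shape
    step = 1.0 / np.linalg.norm(X, 2) ** 2
    beta = np.zeros(p)
    for _ in range(n_iter):
        neg_grad = X.T @ (y - X @ beta)  # negative gradient of the loss
        beta = hard_threshold(beta + step * neg_grad, k)  # project onto k-sparse set
    return beta


# Synthetic example (hypothetical data, not from the paper).
rng = np.random.default_rng(0)
n, p, k = 200, 50, 3
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:k] = [3.0, -2.0, 1.5]
y = X @ beta_true
beta_hat = iht_least_squares(X, y, k)
```

All predictors enter the model together, and the projection step enforces the sparsity level `k` directly, in contrast to the shrinkage penalty used by lasso regression.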
Article activity feed
Now published in GigaScience doi: 10.1093/gigascience/giaa044
Authors: Benjamin B. Chu (1), Kevin L. Keys (2), Christopher A. German (3), Hua Zhou (3), Jin J. Zhou (4), Eric Sobel (1, 5), Janet S. Sinsheimer (1, 5), Kenneth Lange (1, 5)
Affiliations:
1 Department of Computational Medicine, UCLA, Los Angeles, USA
2 Department of Medicine, University of California, San Francisco, USA
3 Department of Biostatistics, Fielding School of Public Health at UCLA, USA
4 Division of Epidemiology and Biostatistics, University of Arizona, Tucson, AZ 85724, USA
5 Department of Human Genetics, David Geffen School of Medicine at UCLA, Los Angeles, USA
For correspondence: jsinshei@ucla.edu (Janet S. Sinsheimer), klange@ucla.edu (Kenneth Lange)
A version of this preprint has been published in the Open Access journal GigaScience (see paper https://doi.org/10.1093/gigascience/giaa044), where the paper and peer reviews are published openly under a CC-BY 4.0 license.
These peer reviews were as follows:
Reviewer 1: http://dx.doi.org/10.5524/REVIEW.102262
Reviewer 2: http://dx.doi.org/10.5524/REVIEW.102263