Assessing the validity of minimal phenotyping: a UK Biobank study.

Abstract

Background: Minimal phenotyping is the practice of using a small number of questionnaire items to measure a phenotype instead of the entire scale. It is often used as a method of data harmonization when combining different samples in the same analysis, as well as a method for reducing questionnaire length in a single sample. This approach assumes that larger samples are superior to smaller, better-measured samples, but there is little empirical research quantifying these trade-offs. Here we explore the impact of minimal phenotyping on the accuracy of Genome-Wide Association Studies (GWAS), both for individual samples and for consortia that combine results across samples.

Method: We used data for three multi-item instruments from the UK Biobank to simulate the effect of using fewer items to measure a phenotype in a GWAS compared to using the full scale. We report the outputs of the GWAS, including genome-wide significant hits and polygenic risk scores, and compare the results using reduced measures to the results of the full-scale score in a single sample. We benchmark these results against the effect of reducing sample size. We then simulated a consortium by splitting UK Biobank into smaller sub-samples, each with a reduced but overlapping measure of the phenotype. We compare results based on the meta-analysis of a single common item with those based on standardising different multi-item measures of the same phenotype.

Results: Reducing the quality of phenotyping affects a GWAS’s accuracy in a similar way to reducing the sample size. For example, removing four items from the full neuroticism scale of 12 items has a similar effect to removing 50,000 participants from the sample of 390,822, in terms of the number of genome-wide significant hits, the variance explained and the correlation of the polygenic risk scores. Likewise, in our simulation, a neuroticism GWAS consortium in which each sample uses different but overlapping multi-item measures performs better than one that selects a single-item measure common to all samples, reproducing 20 out of 44 genome-wide significant hits and explaining 0.11% of the phenotypic variance out of 0.16%, compared to 4 hits and 0.08% for the single-item measure.

Discussion: The results of this study challenge the assumption that achieving larger sample sizes through minimal phenotyping is necessarily better than improving measurement quality. We found a comparable drop-off in the accuracy of results when reducing measurement quality compared to reducing sample size. Researchers must balance the costs of accurate phenotyping versus large samples when designing measures, and consider the possible advantage of using the best available multi-item measures rather than harmonising to a consistent minimal phenotype when designing consortium analysis plans.
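
The trade-off between item count and sample size described above can be illustrated with a back-of-the-envelope calculation from classical test theory. The sketch below is not the authors' method and does not use UK Biobank data. It assumes parallel items, applies the Spearman-Brown formula to estimate the reliability of a shortened scale, and treats GWAS power as roughly proportional to sample size times reliability. The function names and the mean inter-item correlation of 0.3 are hypothetical values chosen only for illustration.

```python
def spearman_brown(n_items: int, mean_inter_item_r: float) -> float:
    """Reliability of a sum score of n parallel items (Spearman-Brown formula)."""
    if n_items < 1:
        raise ValueError("n_items must be at least 1")
    r = mean_inter_item_r
    return n_items * r / (1.0 + (n_items - 1) * r)


def effective_sample_size(n: int, items_kept: int, items_full: int,
                          mean_inter_item_r: float) -> float:
    """Sample size on the full scale with roughly the same power as
    n participants measured with only items_kept items.

    Assumption: measurement error attenuates the variance a SNP explains
    in proportion to reliability, so power scales with n * reliability.
    """
    rel_kept = spearman_brown(items_kept, mean_inter_item_r)
    rel_full = spearman_brown(items_full, mean_inter_item_r)
    return n * rel_kept / rel_full


# Hypothetical inputs: the sample size and scale length come from the
# abstract; the inter-item correlation is an illustrative assumption.
N = 390_822
FULL_ITEMS = 12
ASSUMED_R = 0.3

for kept in (12, 8, 4, 1):
    n_eff = effective_sample_size(N, kept, FULL_ITEMS, ASSUMED_R)
    print(f"{kept:2d} items: reliability={spearman_brown(kept, ASSUMED_R):.3f}, "
          f"effective N ~ {n_eff:,.0f}")
```

The numbers this prints depend entirely on the assumed inter-item correlation, so they should not be read as a reproduction of the reported results. The point is only that, under these assumptions, dropping items and dropping participants act on the same quantity.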
