Imputing not available values in single-cell DNA methylation data using the median is straightforward and effective
Abstract
Recent advances in single-cell DNA methylation profiling have provided unprecedented opportunities to explore epigenetic differences between cells at maximal resolution. Because the number of methylation sites exceeds the computational limits of current analytical methods, a common workflow for single-cell DNA methylation analysis is to bin the genome into regions and compute the average methylation level within each region. In this process, imputing not available (NA) values, which arise from the limited number of captured methylation sites, is a necessary preprocessing step for downstream analyses. Existing studies have employed several simple imputation methods, such as zero imputation or mean imputation; however, theoretical studies and benchmark tests evaluating these approaches are lacking. Through both experiments and theoretical analysis, we found that imputing missing data with the median is a simple and effective way to reflect the methylation state of the NA values, providing an accurate foundation for downstream analyses.
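
The workflow described in the abstract (binning, per-region averaging, then median imputation) can be sketched in a few lines of Python with NumPy. This is an illustrative sketch, not the authors' implementation. The function names `bin_methylation` and `impute_median`, the fixed-width bins, and the choice to take the median of each region across cells are all assumptions made here for the example; the abstract does not specify how bins are defined or over which axis the median is computed.

```python
import numpy as np


def bin_methylation(positions, levels, bin_size, genome_length):
    """Average methylation per fixed-width bin for one cell.

    Hypothetical helper: bins with no captured site are returned as NaN,
    which plays the role of the NA values discussed in the abstract.
    """
    positions = np.asarray(positions, dtype=np.int64)
    levels = np.asarray(levels, dtype=float)
    n_bins = -(-genome_length // bin_size)  # ceiling division
    idx = positions // bin_size
    sums = np.bincount(idx, weights=levels, minlength=n_bins)
    counts = np.bincount(idx, minlength=n_bins)
    binned = np.full(n_bins, np.nan)
    covered = counts > 0
    binned[covered] = sums[covered] / counts[covered]
    return binned


def impute_median(matrix):
    """Fill NaN entries of a cells x regions matrix with per-region medians.

    Assumption: the median is taken over the observed cells of each region.
    A region that is NaN in every cell stays NaN.
    """
    matrix = np.asarray(matrix, dtype=float)
    medians = np.nanmedian(matrix, axis=0)
    filled = matrix.copy()
    rows, cols = np.where(np.isnan(filled))
    filled[rows, cols] = medians[cols]
    return filled


# Toy example: one cell with three captured sites on a 300 bp "genome".
one_cell = bin_methylation([5, 15, 120], [1.0, 0.0, 1.0],
                           bin_size=100, genome_length=300)

# Toy cells x regions matrix with one missing value per cell.
toy = np.array([
    [0.5, 1.0, np.nan],
    [np.nan, 0.0, 0.2],
    [0.1, np.nan, 0.8],
])
imputed = impute_median(toy)
```

Zero or mean imputation, the alternatives named in the abstract, would differ only in the fill value: a constant `0.0`, or `np.nanmean(matrix, axis=0)` in place of `np.nanmedian`.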