Mutational and Expression Profile of ZNF217, ZNF750, ZNF703 Zinc Finger Genes in Kenyan Women Diagnosed with Breast Cancer

Curation statements for this article:
  • Curated by eLife

    eLife Assessment

    This study presents a valuable finding on the mutational and expression profiles of the ZNF217, ZNF750, and ZNF703 zinc finger genes in Kenyan women with breast cancer. The evidence supporting the authors' claims is solid. The work will be of interest to scientists and clinicians working on breast cancer diagnosis and detection.

Abstract

Objective

To characterize the mutational landscape and expression profiles of ZNF217, ZNF703, and ZNF750, and assess their clinical relevance in breast cancer patients from Kenya.

Methods

Whole-exome sequencing (WES) and RNA sequencing (RNA-Seq) data from 23 paired tumor–normal samples were analyzed in a Linux-based environment. Somatic mutations were identified using MuTect2 following alignment to the hg38 reference genome and annotation with VEP. Variants were classified by type, coding consequence, and protein position, and mapped to functional domains. Recurrent mutations were identified, and comparisons were made with The Cancer Genome Atlas (TCGA). Gene expression was quantified using STAR and featureCounts, normalized with DESeq2, and analyzed using paired statistical tests with multiple testing correction. Principal component analysis (PCA) and regression analyses were performed to assess expression patterns and clinical associations.

Results

ZNF217 and ZNF750 exhibited high mutational burdens, whereas ZNF703 showed a lower mutation frequency. Mutations were predominantly single nucleotide variants, with missense and synonymous variants as the major classes. Variants were distributed across protein sequences, with limited domain enrichment and no clear hotspot clustering. Recurrent mutations were gene-specific and infrequent. Comparison with TCGA data showed concordant mutation prevalence for ZNF217, low frequency for ZNF703, and absence of ZNF750 mutations. All three genes were significantly upregulated in tumors compared to matched normal tissues (ZNF217: p = 0.00068; ZNF703: p = 0.00475; ZNF750: p = 0.00366). Tumor expression exceeded normal expression in 74% of cases for ZNF217, 64% for ZNF703, and 83% for ZNF750. PCA demonstrated partial separation between tumor and normal samples. ZNF703 expression was positively associated with body mass index (β = 0.194, p = 0.025), and ZNF750 expression was higher in estrogen receptor–positive tumors (β = 1.050, p = 0.005).

Conclusion

ZNF217, ZNF703, and ZNF750 display distinct mutation and expression profiles in breast cancer, with evidence of cohort-specific variation. These findings highlight gene-specific mechanisms of dysregulation and emphasize the value of integrating genomic and transcriptomic analyses.

Article activity feed

  1. eLife Assessment

    This study presents a valuable finding on the mutational and expression profiles of the ZNF217, ZNF750, and ZNF703 zinc finger genes in Kenyan women with breast cancer. The evidence supporting the authors' claims is solid. The work will be of interest to scientists and clinicians working on breast cancer diagnosis and detection.

  2. Reviewer #2 (Public review):

    Summary:

    The authors sought to characterize the somatic mutation landscape and gene expression profiles of Kenyan breast cancer patients. By comparing Whole Exome Sequencing (WES) and RNA-seq data from 23 paired tumor-normal samples against The Cancer Genome Atlas (TCGA) cohorts, the study specifically aimed to highlight the role of the ZNF gene family.

    Strengths:

    The study addresses a critical gap in genomic research by focusing on an underrepresented African population, which is essential for achieving global health equity in oncology.

    Weaknesses:

    The cohort is relatively small for definitive landscape characterization. The study fails to explore the mechanistic link between identified somatic mutations and observed aberrant gene expression.

    Impact and Utility:

    The impact of this work is currently limited. While the data adds to the growing repository of African genomic samples, the lack of novelty and mechanistic insight reduces its utility for the broader scientific community. To be clinically valuable, the study would need to offer more robust, unbiased profiling that could eventually inform population-specific diagnostics or therapies.

    Additional Context:

    Breast cancer in African populations often presents with different clinical trajectories compared to Western cohorts. While any data from these regions is vital, "landscape" studies require high statistical power and unbiased analysis to differentiate true population-specific drivers from noise or small-sample variance. Without a clear regulatory mechanism linking mutations to phenotypes, the findings remain preliminary observations.

  3. Reviewer #3 (Public review):

    Summary:

    This revised study analyzes the somatic mutational profiles and transcriptomic expression of three zinc-finger genes (ZNF217, ZNF703, ZNF750) in 23 Kenyan women with breast cancer, using whole-exome sequencing and RNA-sequencing of paired tumor-normal tissues. A total of 358 somatic mutations were detected, and all three genes were significantly upregulated in tumors compared to normal tissues (ZNF217 showing the most prominent difference). The findings provide preliminary evidence for the identification of diagnostic/prognostic biomarkers or therapeutic targets in sub-Saharan African populations.

    Strengths:

    The study's key strengths lie in its focus on an underrepresented Kenyan cohort, addressing a critical gap in sub-Saharan African breast cancer genomic research. It integrates DNA-level mutation analysis with RNA-level expression data, leveraging standardized bioinformatics pipelines and rigorous quality control to deliver detailed insights into mutation types, functional impacts, and amino acid changes.

    Comments on revised version:

    After careful revision by the authors, the manuscript has become more rigorous. The limitations, including the small sample size and lack of functional validation, are properly acknowledged, and conclusions are prudently presented as hypothesis-generating rather than causal claims. In addition, strengthened multi-omics analyses, TCGA validation, logical reorganization of the results, and improved figure presentation further enhance the reliability of this work.

  4. Author response:

    The following is the authors’ response to the previous reviews

    Public Reviews:

    Reviewer #1 (Public review):

    Weaknesses:

    (1) Research scope

    The results primarily focus on mutations in ZNF217, ZNF703, and ZNF750, with limited correlation analyses between mutations and gene expression. The rationale for focusing only on these genes is unclear. Given the availability of large breast cancer cohorts such as TCGA and METABRIC, the authors should compare their mutation profiles with these datasets. Beyond European and U.S. cohorts, sequencing data from multiple countries, including a recent Nigerian breast cancer study (doi: 10.1038/s41467-021-27079-w), should also be considered. Since whole-exome sequencing was performed, it is unclear why only four genes were highlighted, and why comparisons to previous literature were not included.

    We have significantly strengthened the biological and clinical rationale for focusing on these three genes in the Introduction. Specifically, we now clearly justify their selection based on distinct functional roles: ZNF217 (oncogene, 20q13 amplification); ZNF703 (luminal subtype oncogenic driver); ZNF750 (tumor suppressor involved in differentiation). We have also explicitly defined the knowledge gap: the lack of mutation and expression data for these genes in African populations, particularly Kenyan cohorts.

    Importantly, we have now incorporated a comparative analysis with TCGA data in the Results. This includes a new section, “Recurrent mutations and comparison with TCGA”; a new table, “Table 6”; and a curated dataset, “Supplementary Table S4”.

    (2) Language and Style Issues

    There are many typos and clear errors in the main text (e.g. (ref)).

    Additionally, several statements read unnaturally. For example:

    "Investigators uncovered 170 mutations ..." should instead be phrased as "We identified 170 mutations ...."

    "The research team ..." should be rephrased as "Our team ...."

    The manuscript has undergone comprehensive language editing throughout the revised draft.

    (3) Methods and Data Analysis Details

    The methods section is vague, with general descriptions rather than specific details of data processing and analysis. The authors should provide:

    (a) Parameters used for trimming, mapping, and variant calling (rather than referencing another paper such as Tang et al. 2023).

    (b) Statistical methods for somatic mutation/SNP detection.

    (c) Details of RNA purification and RNA-seq library preparation.

    Without these details, the reproducibility of the study is limited.

    We have fully revised and substantially expanded the Methods section to improve clarity, transparency, and reproducibility. In the revised manuscript, we now provide explicit details of all key analytical steps. These include quality control procedures using FastQC and MultiQC, as well as read trimming parameters implemented in Trimmomatic (leading and trailing quality <3, sliding window 4:15, and minimum read length of 36 bp). We also clearly describe alignment of reads to the hg38 reference genome using BWA-MEM, followed by somatic variant calling using MuTect2 in paired tumor–normal mode with incorporation of a Panel of Normals (PON). Variant filtering criteria are now explicitly stated, including minimum read depth (≥10), base quality (≥20), and variant allele fraction (≥0.05), and functional annotation was performed using VEP (v108).
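
    As a rough illustration of the variant filtering criteria listed above (read depth ≥10, base quality ≥20, variant allele fraction ≥0.05), the following minimal Python sketch applies the three thresholds to a handful of made-up calls. The `Variant` record, its field names, and the example values are hypothetical and are not taken from the authors' pipeline, which relied on MuTect2 and VEP; the sketch only shows how the stated cut-offs combine.

```python
from dataclasses import dataclass

# Thresholds as quoted in the author response.
MIN_DEPTH = 10
MIN_BASE_QUALITY = 20
MIN_VAF = 0.05


@dataclass
class Variant:
    """Hypothetical, simplified call; real records would be parsed from a VCF."""
    gene: str
    depth: int           # total reads covering the site in the tumor sample
    alt_reads: int       # reads supporting the alternate allele
    base_quality: float  # representative base quality at the site

    @property
    def vaf(self) -> float:
        # Variant allele fraction; guard against zero coverage.
        return self.alt_reads / self.depth if self.depth else 0.0


def passes_filters(v: Variant) -> bool:
    """Return True only if the call meets all three thresholds."""
    return (
        v.depth >= MIN_DEPTH
        and v.base_quality >= MIN_BASE_QUALITY
        and v.vaf >= MIN_VAF
    )


# Made-up example calls, for illustration only.
calls = [
    Variant("ZNF217", depth=40, alt_reads=8, base_quality=32.0),   # passes
    Variant("ZNF703", depth=8, alt_reads=4, base_quality=35.0),    # depth too low
    Variant("ZNF750", depth=100, alt_reads=3, base_quality=30.0),  # VAF 0.03 too low
    Variant("ZNF750", depth=25, alt_reads=5, base_quality=18.0),   # base quality too low
]

kept = [v for v in calls if passes_filters(v)]
for v in kept:
    print(f"{v.gene}: depth={v.depth}, VAF={v.vaf:.2f}")
```

    A call must clear every threshold to be retained, so a high-coverage site with very few supporting reads is dropped just as a low-coverage site is.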

    In addition, we have included details on variant validation through visualization in the Integrative Genomics Viewer (IGV), as well as RNA-seq processing steps using STAR for alignment, featureCounts for quantification, and DESeq2 for normalization and differential expression analysis. Statistical analyses are now clearly described, including the use of paired tests and Benjamini–Hochberg correction for multiple testing. Collectively, these additions directly address the reviewer’s concerns by ensuring that all analytical procedures are transparently reported and fully reproducible.
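
    The Benjamini–Hochberg correction mentioned above can be sketched in a few lines of Python. This is a generic implementation of the standard step-up procedure, not the authors' code, and the input p-values are hypothetical placeholders rather than values from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    # so that an adjusted value never exceeds the one ranked above it.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for three tests (placeholders only).
raw = [0.01, 0.04, 0.03]
for p, q in zip(raw, benjamini_hochberg(raw)):
    print(f"raw p = {p:.3f}, adjusted p = {q:.3f}")
```

    With only three genes tested, the adjustment is mild; it matters far more in transcriptome-wide comparisons where thousands of genes are tested at once.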

    (4) Data Reporting

    This study has the potential to provide a valuable resource for the field. However, data-sharing plans are unclear. The authors should:

    (a) Deposit sequencing data in a public repository.

    (b) Provide supplementary tables listing all detected mutations and all differentially expressed genes (DEGs).

    (c) Clarify whether raw or adjusted p-values were used for DEG analysis.

    (d) Perform DEG analyses stratified by breast cancer subtypes, since differential expression was observed by HER2 status, and some zinc finger proteins are known to be enriched in luminal subtypes.

    We have improved data transparency and reporting in the revised manuscript. All sequencing data are now publicly available, with whole-exome sequencing (WES) data deposited in the Sequence Read Archive (SRA; PRJNA913947) and RNA-seq data available in the Gene Expression Omnibus (GEO; GSE225846). In addition, we have provided comprehensive Supplementary Materials to support reproducibility and facilitate further analysis, including detailed mutation summaries (Table S1), mutation positions (Table S2), amino acid changes (Table S3), the curated TCGA comparison dataset (Table S4), protein domain annotations (Table S5), and the combined gene expression and clinical dataset (Table S6).

    We have also clarified key aspects of the statistical analysis, including the use of Benjamini–Hochberg adjusted p-values and the thresholds applied for significance. Furthermore, in response to reviewer comments regarding subtype-specific analyses, we have explicitly addressed in the Discussion why subtype-stratified differential expression analysis was not performed, noting that the limited sample size would reduce statistical power and increase the risk of overinterpretation. Together, these revisions enhance the transparency, accessibility, and interpretability of the study.

    (5) Mutation Analysis

    Visualizations of mutation distribution across protein domains would greatly strengthen interpretation. Comparing mutation distribution and frequency with published datasets would also contextualize the findings.

    We have substantially enhanced the mutation analysis by incorporating several new figures and complementary analyses that provide deeper biological interpretation. Specifically, we added Figure 1 to summarize mutation burden, coding consequences, and prevalence; Figure 2 to illustrate the nucleotide substitution spectrum; Figure 3 to map mutations across protein domains; Figure 4 to assess functional enrichment and mutation composition; and Figure 5 to highlight recurrent mutations.

    Reviewer #2 (Public review):

    Weaknesses:

    The current cohort is relatively small for reaching significant findings, and the targeted exploration of the ZNF family, without an explanation of the rationale or clinical significance, limits the overall significance of the work.

    We acknowledge the limitation posed by the relatively small cohort size and have addressed this concern in several ways in the revised manuscript. First, we have explicitly stated this limitation in the Discussion section. We have also reframed the study as a pilot and population-specific exploratory analysis to better reflect its scope. To strengthen the overall significance, we integrated both mutation and gene expression data, incorporated comparisons with TCGA datasets, and emphasized the importance of African-specific genomic insights. Importantly, we highlight that this study provides novel data from an underrepresented population, which represents a key contribution to the field.

    Reviewer #3 (Public review):

    Weaknesses:

    The authors have enhanced the descriptive depth of the study by adding details on mutations, expression subgroup analyses, and functional annotations, but have not addressed the core weaknesses of small cohort size and lack of functional validation. While the revised version is more comprehensive in cataloging molecular alterations, it remains confined to descriptive analysis, with no substantial improvement in the reliability or generalizability of its conclusions.

    We have addressed this concern by clearly acknowledging the key limitations of the study, including the absence of functional validation, the relatively small sample size, and the limited generalizability of the findings. In response, we have refined our interpretation to avoid causal claims and instead present the results as hypothesis-generating. We have also expanded the Discussion to include future research directions, recommending functional validation studies, multi-omics approaches, and validation in larger, more diverse cohorts.

    In addition, we have strengthened the robustness of the study by incorporating comparisons with TCGA data, providing more detailed mutation classification, and integrating genomic and transcriptomic analyses. Beyond addressing reviewer comments, we have further improved the manuscript by reorganizing the Results section to follow a clear and logical flow—from mutation burden and spectrum to protein-level distribution, functional enrichment, recurrent mutations, and TCGA comparison. We have also improved figure quality and labeling to meet journal standards, added clear and consistent figure captions, and ensured alignment between the text, figures, and tables throughout the manuscript.

    We sincerely thank the reviewers for their valuable feedback, which has significantly improved the quality and rigor of this work.

  5. eLife Assessment

    This study presents a valuable finding on the mutational landscape and expression profile of ZNF molecules in 23 Kenyan women with breast cancer. The evidence supporting the claims of the authors is solid, although inclusion of a larger number of patient samples, more statistical details and sufficient comparison with existing large-scale datasets would have strengthened the study. The work will be of interest to medical biologists working in the field of breast cancer.

  6. Reviewer #1 (Public review):

    Summary:

    This manuscript investigates mutations and expression patterns of zinc finger proteins in Kenyan breast cancer patients. Whole-exome sequencing and RNA-seq were performed on 23 breast cancer samples alongside matched normal tissues.

    Strengths:

    Whole-exome sequencing and RNA-seq were performed on 23 breast cancer samples alongside matched normal tissues in Kenyan breast cancer patients. The authors identified mutations in ZNF217, ZNF703, and ZNF750.

    Weaknesses:

    (1) Research scope:

    The results primarily focus on mutations in ZNF217, ZNF703, and ZNF750, with limited correlation analyses between mutations and gene expression. The rationale for focusing only on these genes is unclear. Given the availability of large breast cancer cohorts such as TCGA and METABRIC, the authors should compare their mutation profiles with these datasets. Beyond European and U.S. cohorts, sequencing data from multiple countries, including a recent Nigerian breast cancer study (doi: 10.1038/s41467-021-27079-w), should also be considered. Since whole-exome sequencing was performed, it is unclear why only four genes were highlighted, and why comparisons to previous literature were not included.

    (2) Language and Style Issues

    There are many typos and clear errors in the main text (e.g. (ref)).

    Additionally, several statements read unnaturally. For example:

    "Investigators uncovered 170 mutations ..." should instead be phrased as "We identified 170 mutations ...."

    "The research team ..." should be rephrased as "Our team ...."

    (3) Methods and Data Analysis Details

    The methods section is vague, with general descriptions rather than specific details of data processing and analysis. The authors should provide:

    (a) Parameters used for trimming, mapping, and variant calling (rather than referencing another paper such as Tang et al. 2023).

    (b) Statistical methods for somatic mutation/SNP detection.

    (c) Details of RNA purification and RNA-seq library preparation.

    Without these details, the reproducibility of the study is limited.

    (4) Data Reporting

    This study has the potential to provide a valuable resource for the field. However, data-sharing plans are unclear. The authors should:

    (a) Deposit sequencing data in a public repository.

    (b) Provide supplementary tables listing all detected mutations and all differentially expressed genes (DEGs).

    (c) Clarify whether raw or adjusted p-values were used for DEG analysis.

    (d) Perform DEG analyses stratified by breast cancer subtypes, since differential expression was observed by HER2 status, and some zinc finger proteins are known to be enriched in luminal subtypes.

    (5) Mutation Analysis

    Visualizations of mutation distribution across protein domains would greatly strengthen interpretation. Comparing mutation distribution and frequency with published datasets would also contextualize the findings.

    Comments on revisions:

    The revised manuscript hasn't addressed any of these concerns. Careful proofreading is recommended, even if the authors do not intend to make further modifications to the manuscript.

  7. Reviewer #2 (Public review):

    Summary:

    This work integrated the mutational landscape and expression profile of ZNF molecules in 23 Kenyan women with breast cancer.

    Strengths:

    The mutation landscape of ZNF217, ZNF703, and ZNF750 was comprehensively studied and correlated with tumor stage and HER2 status to highlight the clinical significance.

    Weaknesses:

    The current cohort is relatively small for reaching significant findings, and the targeted exploration of the ZNF family, without an explanation of the rationale or clinical significance, limits the overall significance of the work.

  8. Reviewer #3 (Public review):

    Summary:

    This revised study analyzes the somatic mutational profiles and transcriptomic expression of three zinc-finger genes (ZNF217, ZNF703, ZNF750) in 23 Kenyan women with breast cancer, using whole-exome sequencing and RNA-sequencing of paired tumor-normal tissues. A total of 358 somatic mutations were detected, and all three genes were significantly upregulated in tumors compared to normal tissues (ZNF217 showing the most prominent difference). Higher expression was observed in HER2-positive tumors, though mutation burden for each gene did not correlate significantly with HER2 status or cancer stage. The findings provide preliminary evidence for the identification of diagnostic/prognostic biomarkers or therapeutic targets in sub-Saharan African populations.

    Strengths:

    The study's key strengths lie in its focus on an underrepresented Kenyan cohort, addressing a critical gap in sub-Saharan African breast cancer genomic research. It integrates DNA-level mutation analysis with RNA-level expression data, leveraging standardized bioinformatics pipelines (e.g., Mutect2 for variant calling, DESeq2 for differential expression) and rigorous quality control to deliver detailed insights into mutation types, functional impacts, and amino acid changes. Additionally, it explores gene expression patterns across different cancer stages and HER2 status subgroups, generating targeted hypotheses for future validation and enhancing the reliability of its findings.

    Weaknesses:

    The authors have enhanced the descriptive depth of the study by adding details on mutations, expression subgroup analyses, and functional annotations, but have not addressed the core weaknesses of small cohort size and lack of functional validation. While the revised version is more comprehensive in cataloging molecular alterations, it remains confined to descriptive analysis, with no substantial improvement in the reliability or generalizability of its conclusions.

  9. eLife Assessment

    This study presents a valuable finding on mutations in ZNF217, ZNF703, and ZNF750 through 23 breast cancer samples alongside matched normal tissues in Kenyan breast cancer patients. The evidence supporting the claims of the authors is solid, yet the analysis of the manuscript lacks methodological transparency, statistical detail, and sufficient comparison with existing large-scale datasets. The work will be of interest to medical biologists and scientists working in the field of breast cancer.

  10. Reviewer #1 (Public review):

    Summary:

    This manuscript investigates mutations and expression patterns of zinc finger proteins in Kenyan breast cancer patients.

    Strengths:

    Whole-exome sequencing and RNA-seq were performed on 23 breast cancer samples alongside matched normal tissues in Kenyan breast cancer patients. The authors identified mutations in ZNF217, ZNF703, and ZNF750.

    Weaknesses:

    (1) Research scope:

    The results primarily focus on mutations in ZNF217, ZNF703, and ZNF750, with limited correlation analyses between mutations and gene expression. The rationale for focusing only on these genes is unclear. Given the availability of large breast cancer cohorts such as TCGA and METABRIC, the authors should compare their mutation profiles with these datasets. Beyond European and U.S. cohorts, sequencing data from multiple countries, including a recent Nigerian breast cancer study (doi: 10.1038/s41467-021-27079-w), should also be considered. Since whole-exome sequencing was performed, it is unclear why only four genes were highlighted and why comparisons to previous literature were not included.

    (2) Language and Style Issues:

    Several statements read somewhat 'unnaturally', and I strongly recommend proofreading.

    (3) Methods and Data Analysis Details:

    The methods section is vague, with general descriptions rather than specific details of data processing and analysis. The authors should provide:

    (a) Parameters used for trimming, mapping, and variant calling (rather than referencing another paper such as Tang et al. 2023).

    (b) Statistical methods for somatic mutation/SNP detection.

    (c) Details of RNA purification and RNA-seq library preparation.

    Without these details, the reproducibility of the study is limited.

    (4) Data Reporting:

    This study has the potential to provide a valuable resource for the field. However, data-sharing plans are unclear. The authors should:

    (a) deposit sequencing data in a public repository.

    (b) provide supplementary tables listing all detected mutations and all differentially expressed genes (DEGs).

    (c) clarify whether raw or adjusted p-values were used for DEG analysis.

    (d) perform DEG analyses stratified by breast cancer subtypes, since differential expression was observed by HER2 status, and some zinc finger proteins are known to be enriched in luminal subtypes.

    (5) Mutation Analysis:

    Visualizations of mutation distribution across protein domains would greatly strengthen interpretation. Comparing mutation distribution and frequency with published datasets would also contextualize the findings.

  11. Reviewer #2 (Public review):

    Summary:

    This work integrated the mutational landscape and expression profile of ZNF molecules in 23 Kenyan women with breast cancer.

    Strengths:

    The mutation landscape of ZNF217, ZNF703, and ZNF750 was comprehensively studied and correlated with tumor stage and HER2 status to highlight the clinical significance.

    Weaknesses:

    The current study design is relatively simple, and the cohort is too small to support significant findings. A larger sample size, along with more analytic work, is needed.

    Targeted exploration of the ZNF family without emphasizing the reason or clinical significance hinders the overall significance of the entire work.

  12. Reviewer #3 (Public review):

    Summary:

    The authors aimed to define the somatic mutational landscape and transcriptomic expression of the ZNF217, ZNF703, and ZNF750 genes in breast cancers from Kenyan women and to investigate associations with clinicopathological features like HER2 status and cancer stage. They employed whole-exome and RNA-sequencing on 23 paired tumor-normal samples to achieve this.

    Strengths:

    (1) A major strength is the focus on a Kenyan cohort, addressing a critical gap in genomic studies of breast cancer, which are predominantly based on European or Asian populations.

    (2) The integration of DNA- and RNA-level data from the same patients provides a comprehensive view, linking genetic alterations to expression changes.

    Weaknesses:

    (1) The small cohort size (n=23) significantly limits the statistical power to detect associations between genetic features and clinical subgroups (e.g., HER2 status, stage), rendering the negative findings inconclusive.

    (2) The study is primarily descriptive. While it effectively catalogs mutations and expression changes, it does not include functional experiments to validate the biological impact of the identified alterations.
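
    To give a feel for the sample-size concern in point (1), the sketch below uses a two-sided exact sign test as a deliberately simple stand-in. This is an assumption for illustration only: it is not a test the authors report using, and the subgroup sizes are hypothetical. It asks how many pairs out of n must show a change in the same direction before the test reaches p < 0.05.

```python
from math import comb


def sign_test_p(n_pairs, n_positive):
    """Two-sided exact sign test p-value under H0: P(positive) = 0.5."""
    # Use the more extreme side so the test is symmetric.
    k = max(n_positive, n_pairs - n_positive)
    tail = sum(comb(n_pairs, i) for i in range(k, n_pairs + 1)) / 2 ** n_pairs
    return min(1.0, 2 * tail)


def min_positive_for_significance(n_pairs, alpha=0.05):
    """Smallest count of same-direction pairs giving p < alpha, or None."""
    for k in range((n_pairs + 1) // 2, n_pairs + 1):
        if sign_test_p(n_pairs, k) < alpha:
            return k
    return None


# 23 is the cohort size; 6 and 5 are hypothetical subgroup sizes.
for n in (23, 6, 5):
    print(n, min_positive_for_significance(n))
```

    Under this simplified test, the required proportion of concordant pairs rises sharply as n shrinks, and for very small subgroups no outcome can reach the threshold, which is one way to see why negative subgroup findings in a cohort of this size are inconclusive.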