Development and evaluation of a live birth prediction model for evaluating human blastocysts from a retrospective study

Curation statements for this article:
  • Curated by eLife

    eLife assessment

    This manuscript provides important findings that have practical implications for reproductive medicine and would be of interest to IVF specialists. Supported by compelling evidence, the authors present significant results on improving the predictive value of a live birth model based on blastocyst evaluation and clinical features. However, some methodological information should be added to improve the reproducibility of the study results.

Abstract

Background:

In infertility treatment, blastocyst morphological grading is commonly used in clinical practice to evaluate and select blastocysts, but it has limited power to predict their live birth outcomes. To improve live birth prediction, a number of artificial intelligence (AI) models have been developed. Most existing AI models for blastocyst evaluation use only blastocyst images to predict live birth, and the area under the receiver operating characteristic (ROC) curve (AUC) achieved by these models has plateaued at ~0.65.
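AUC has a concrete interpretation that helps in reading the ~0.65 plateau and the 0.77 reported below: it is the probability that a randomly chosen blastocyst that led to a live birth receives a higher model score than a randomly chosen one that did not (0.5 is chance, 1.0 is perfect ranking). A minimal Python sketch of this rank-based (Mann-Whitney) computation follows, assuming binary labels coded 1 = live birth and 0 = no live birth; the function name `roc_auc` is ours, not the study's.

```python
from typing import List, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via the Mann-Whitney rank statistic; labels must be 0/1, tied scores share an average rank."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative example")

    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks: List[float] = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Find the block of tied scores starting at sorted position i.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank for the tie block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    # Subtract the smallest possible positive rank sum, then normalise by the number of pairs.
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


if __name__ == "__main__":
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

For example, `roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` returns 0.75: three of the four positive-negative pairs are ranked correctly.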

Methods:

This study proposed a multimodal blastocyst evaluation method that uses both blastocyst images and the patient couple's clinical features (e.g., maternal age, hormone profiles, endometrium thickness, and semen quality) to predict the live birth outcomes of human blastocysts. To exploit these multimodal data, we developed a new AI model consisting of a convolutional neural network (CNN) that processes blastocyst images and a multilayer perceptron (MLP) that processes the couple's clinical features. The data set consists of 17,580 blastocysts, each with a known live birth outcome, blastocyst images, and the couple's clinical features.

Results:

This study achieved an AUC of 0.77 for live birth prediction, significantly outperforming related work in the literature. Sixteen of 103 clinical features were identified as predictors of live birth outcomes and helped improve live birth prediction. Among these, maternal age, the day of blastocyst transfer, antral follicle count, number of retrieved oocytes, and endometrium thickness measured before transfer were the five features contributing most to live birth prediction. Heatmaps showed that the CNN in the AI model focuses mainly on the inner cell mass and trophectoderm (TE) regions of the image, and that TE-related features contributed more in the CNN trained together with the couple's clinical features than in the CNN trained on blastocyst images alone.
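The abstract does not say how the feature contributions were ranked. One common, model-agnostic option is permutation importance: shuffle one feature column, re-score the model, and measure how much predictive performance degrades. The sketch below illustrates that idea on synthetic data with a hypothetical logistic model; the feature names, coefficients, and the choice of the Brier score as the metric are all assumptions for illustration, not the authors' method.

```python
import math
import random
from typing import Callable, List, Sequence

Row = List[float]


def brier(labels: Sequence[int], probs: Sequence[float]) -> float:
    """Mean squared error between predicted probabilities and 0/1 outcomes (lower is better)."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)


def permutation_importance(predict: Callable[[Row], float], X: List[Row], y: Sequence[int],
                           n_repeats: int = 5, seed: int = 0) -> List[float]:
    """Mean increase in Brier score when each feature column is shuffled; larger means more important."""
    rng = random.Random(seed)
    baseline = brier(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        total = 0.0
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # breaks the link between feature j and the outcome
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            total += brier(y, [predict(row) for row in X_perm]) - baseline
        importances.append(total / n_repeats)
    return importances


# Hypothetical toy setup: standardized features and made-up logistic coefficients.
FEATURES = ["maternal_age_z", "endometrium_thickness_z", "noise"]
WEIGHTS = [-1.5, 0.5, 0.0]
BIAS = -0.3


def toy_model(row: Row) -> float:
    z = BIAS + sum(w * x for w, x in zip(WEIGHTS, row))
    return 1.0 / (1.0 + math.exp(-z))


rng = random.Random(42)
X = [[rng.gauss(0.0, 1.0) for _ in FEATURES] for _ in range(2000)]
y = [1 if rng.random() < toy_model(row) else 0 for row in X]  # synthetic outcomes
scores = dict(zip(FEATURES, permutation_importance(toy_model, X, y)))
for name, value in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {value:.4f}")
```

Because the toy model ignores the `noise` column, shuffling it leaves every prediction unchanged and its importance is exactly zero; the strongly weighted feature ranks first.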

Conclusions:

The results suggest that including the patient couple's clinical features alongside blastocyst images improves the accuracy of live birth prediction.

Funding:

Natural Sciences and Engineering Research Council of Canada and the Canada Research Chairs Program.

Article activity feed

  1. eLife assessment (reproduced above)

  2. Reviewer #1 (Public Review):

    This work provides a new multimodal blastocyst evaluation method utilising both blastocyst images and patient couple's clinical features (e.g., maternal age, hormone profiles, endometrium thickness, and semen quality) to predict live birth outcomes.
    The manuscript was reviewed using the checklist from "Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): The TRIPOD statement" (https://www.equator-network.org/reporting-guidelines/tripod-statement/). Generally, the authors have achieved their aims, and the results support their conclusions.

    The major study strengths are as follows:

    The study dataset is large: 17,580 blastocysts with known live birth outcomes, together with blastocyst images and the couples' clinical features.
    The authors developed a new artificial intelligence model consisting of a convolutional neural network to process blastocyst images and a multilayer perceptron to process the couple's clinical features. This model achieved an AUC of 0.77 for live birth prediction, significantly higher than that of previously developed models. The conclusions of this paper are mostly well supported by the data.

    Nevertheless, there are some weaknesses:

    Regarding testosterone, the assay method is essential information: the statistical significance of testosterone as a predictor could change if calculated free testosterone or bioavailable testosterone were used.
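    For context on this point, free and bioavailable testosterone are often calculated rather than measured directly, from total testosterone, sex hormone-binding globulin (SHBG), and albumin. A sketch of the widely used Vermeulen et al. (1999) mass-action calculation follows, assuming association constants of 1.0e9 L/mol (SHBG) and 3.6e4 L/mol (albumin), an albumin molar mass of 69,000 g/mol, concentrations in nmol/L (albumin in g/L), and a default albumin of 43 g/L when albumin is not supplied (an assumption). Which testosterone measure the study actually used is not stated here.

```python
import math

K_SHBG = 1.0e9         # L/mol, testosterone-SHBG association constant (Vermeulen et al., 1999)
K_ALB = 3.6e4          # L/mol, testosterone-albumin association constant
ALBUMIN_MW = 69_000.0  # g/mol, albumin molar mass used in that calculation


def free_testosterone(total_t_nmol: float, shbg_nmol: float, albumin_g_l: float = 43.0) -> float:
    """Calculated free testosterone (nmol/L) from total T and SHBG (nmol/L) and albumin (g/L)."""
    tt = total_t_nmol * 1e-9  # convert nmol/L to mol/L
    shbg = shbg_nmol * 1e-9
    n = 1.0 + K_ALB * albumin_g_l / ALBUMIN_MW  # N = 1 + K_ALB * [albumin]
    a = K_SHBG * n
    b = n + K_SHBG * (shbg - tt)
    # Positive root of a*FT^2 + b*FT - TT = 0
    ft = (-b + math.sqrt(b * b + 4.0 * a * tt)) / (2.0 * a)
    return ft * 1e9  # back to nmol/L


def bioavailable_testosterone(total_t_nmol: float, shbg_nmol: float, albumin_g_l: float = 43.0) -> float:
    """Free plus albumin-bound testosterone (nmol/L): FT * (1 + K_ALB * [albumin])."""
    n = 1.0 + K_ALB * albumin_g_l / ALBUMIN_MW
    return free_testosterone(total_t_nmol, shbg_nmol, albumin_g_l) * n


if __name__ == "__main__":
    print(free_testosterone(15.0, 40.0), bioavailable_testosterone(15.0, 40.0))
```

    Free testosterone is the positive root of K_SHBG·N·FT² + (N + K_SHBG·(SHBG − TT))·FT − TT = 0 with N = 1 + K_ALB·[albumin]; for total testosterone 15 nmol/L and SHBG 40 nmol/L this gives roughly 0.27 nmol/L, about 1.8% of the total.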

    Supplementary Table 1 shows more than 15 statistically significant predictors of live birth, yet predictive importance values are presented for only 15 of them (Fig. 3).

  3. Reviewer #2 (Public Review):

    In this article, a multimodal strategy for live birth prediction is proposed using blastocyst images and clinical features. A CNN architecture processes the imaging data, an MLP processes the clinical features, and the final model is built by concatenating the CNN and MLP features. A total of 17,580 samples were used to train and test the model. The proposed model performed significantly better than previous models, with an AUC of 0.77.
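    The concatenation-based fusion described above can be sketched as a forward pass in plain Python. This is a minimal illustration under stated assumptions, not the authors' architecture: the CNN backbone is omitted and represented by a precomputed image feature vector, the clinical input has 16 features (matching the number of predictors reported in the abstract, which may differ from what the model actually consumed), and the layer sizes, initialization, and the class name `LateFusionModel` are hypothetical. Weights are random and untrained; in practice both branches would be trained jointly in a deep learning framework.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Layer = Tuple[List[List[float]], List[float]]  # (weight matrix [out][in], bias [out])


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    """Uniform fan-in initialization; a hypothetical choice for illustration."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x: Vector, layer: Layer) -> Vector:
    """Fully connected layer: y = W x + b."""
    weights, bias = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(v: Vector) -> Vector:
    return [max(0.0, t) for t in v]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class LateFusionModel:
    """Image embedding (from a CNN, not shown) concatenated with an MLP embedding of clinical features."""

    def __init__(self, image_dim: int, clinical_dim: int, hidden: int = 8, seed: int = 0):
        rng = random.Random(seed)
        self.image_dim = image_dim
        self.clinical_dim = clinical_dim
        self.mlp = [init_layer(clinical_dim, hidden, rng), init_layer(hidden, hidden, rng)]
        self.head = init_layer(image_dim + hidden, 1, rng)  # acts on the concatenated vector

    def forward(self, image_features: Vector, clinical_features: Vector) -> float:
        """Return a predicted probability of live birth for one blastocyst."""
        if len(image_features) != self.image_dim or len(clinical_features) != self.clinical_dim:
            raise ValueError("input dimensions do not match the model")
        h = clinical_features
        for layer in self.mlp:
            h = relu(dense(h, layer))
        fused = list(image_features) + h  # [image embedding | clinical embedding]
        return sigmoid(dense(fused, self.head)[0])


if __name__ == "__main__":
    model = LateFusionModel(image_dim=16, clinical_dim=16)
    print(model.forward([0.1] * 16, [0.5] * 16))
```

    Concatenating the two embeddings before a shared output layer lets the final layer weigh image and clinical evidence jointly, which is the property the activation-map comparison in the next paragraph probes.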

    By generating activation maps for two scenarios, (I) using imaging and clinical features and (II) using imaging data only, the authors highlight the image regions that are crucial for prediction. Their results confirm the benefits of using multimodal datasets.
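    The review does not name the technique used to produce the activation maps. Grad-CAM is one widely used option: it weights each channel of the last convolutional layer by the spatially averaged gradient of the output score and keeps the positive part of the weighted sum. The plain-Python sketch below assumes the activations and gradients have already been extracted (in practice via a framework's automatic differentiation) and shows only the map computation; comparing such maps for the image-only and image-plus-clinical models is one way to check whether attention shifts toward the trophectoderm.

```python
from typing import List

Grid = List[List[float]]  # one H x W feature map


def grad_cam(activations: List[Grid], gradients: List[Grid]) -> Grid:
    """Grad-CAM-style heatmap from per-channel activations A_k and gradients dy/dA_k (same shapes).

    alpha_k = spatial mean of the gradient of channel k; map = ReLU(sum_k alpha_k * A_k),
    then scaled so the maximum is 1 (left at zero if no position is positive).
    """
    if not activations or len(activations) != len(gradients):
        raise ValueError("need one gradient map per activation channel")
    h, w = len(activations[0]), len(activations[0][0])
    cam = [[0.0] * w for _ in range(h)]
    for a_k, g_k in zip(activations, gradients):
        alpha = sum(sum(row) for row in g_k) / (h * w)  # global-average-pooled gradient
        for i in range(h):
            for j in range(w):
                cam[i][j] += alpha * a_k[i][j]
    cam = [[max(0.0, v) for v in row] for row in cam]  # keep only positive evidence
    peak = max(max(row) for row in cam)
    if peak > 0.0:
        cam = [[v / peak for v in row] for row in cam]
    return cam


if __name__ == "__main__":
    acts = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 1.0]]]
    grads = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
    print(grad_cam(acts, grads))
```

    In this toy case the first channel has a positive average gradient and the second a negative one, so only the first channel's active position survives in the map. A real heatmap would be upsampled to the image size and overlaid on the blastocyst image.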

    However, the manuscript currently lacks crucial methodological information needed to judge the validity of various claims.
    Furthermore, it does not discuss potential applications of the proposed model in clinical settings.