Development and evaluation of a live birth prediction model for evaluating human blastocysts from a retrospective study

Hang Liu
Zhuoran Zhang
Yifan Gu
Changsheng Dai
Guanqiao Shan
Haocong Song
Daniel Li
Wenyuan Chen
Ge Lin
Yu Sun

Curated by eLife

eLife assessment

This manuscript provides important findings that have practical implications for reproductive medicine and would be of interest to IVF specialists. Based on the compelling strength of evidence, the authors present significant results on improving the predictive value of the live birth model based on blastocyst evaluation and clinical features. However, some methodological information should be added to improve the reproducibility of the study results.

Abstract

Background:

In infertility treatment, blastocyst morphological grading is commonly used in clinical practice for blastocyst evaluation and selection, but it has shown limited power to predict the live birth outcomes of blastocysts. To improve live birth prediction, a number of artificial intelligence (AI) models have been established. Most existing AI models for blastocyst evaluation used only images for live birth prediction, and the area under the receiver operating characteristic (ROC) curve (AUC) achieved by these models has plateaued at ~0.65.

Methods:

This study proposed a multimodal blastocyst evaluation method using both blastocyst images and patient couple’s clinical features (e.g., maternal age, hormone profiles, endometrium thickness, and semen quality) to predict live birth outcomes of human blastocysts. To utilize the multimodal data, we developed a new AI model consisting of a convolutional neural network (CNN) to process blastocyst images and a multilayer perceptron to process patient couple’s clinical features. The data set used in this study consists of 17,580 blastocysts with known live birth outcomes, blastocyst images, and patient couple’s clinical features.
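The two-branch design described above can be illustrated with a minimal sketch of late fusion. This is not the authors' implementation: the image branch is assumed to have already produced an embedding (standing in for the CNN output), the layer sizes are hypothetical, the weights are random rather than trained, and the names `predict_live_birth`, `init_layer`, and `dense` are invented for illustration.

```python
import math
import random


def relu(values):
    """Element-wise rectified linear unit."""
    return [max(0.0, x) for x in values]


def dense(values, weights, bias):
    """Fully connected layer; `weights` holds one row per output unit."""
    return [sum(w * x for w, x in zip(row, values)) + b
            for row, b in zip(weights, bias)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (untrained, illustration only)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def predict_live_birth(image_embedding, clinical_features, params):
    # Clinical branch: a one-hidden-layer MLP over the tabular features.
    clinical_hidden = relu(dense(clinical_features, *params["mlp"]))
    # Fusion: concatenate the image embedding with the clinical embedding.
    fused = list(image_embedding) + clinical_hidden
    # Head: a single logistic unit that outputs a probability.
    logit = dense(fused, *params["head"])[0]
    return sigmoid(logit)


# Hypothetical sizes; a real CNN embedding would be much larger.
IMAGE_DIM, CLINICAL_DIM, HIDDEN_DIM = 8, 16, 4

rng = random.Random(0)
params = {
    "mlp": init_layer(CLINICAL_DIM, HIDDEN_DIM, rng),
    "head": init_layer(IMAGE_DIM + HIDDEN_DIM, 1, rng),
}
image_embedding = [rng.random() for _ in range(IMAGE_DIM)]      # stand-in for CNN output
clinical_features = [rng.random() for _ in range(CLINICAL_DIM)]  # stand-in for scaled features
probability = predict_live_birth(image_embedding, clinical_features, params)
```

Concatenating the two embeddings before a shared output layer lets the prediction depend on both modalities jointly, which is the general idea behind combining image and clinical inputs in one model.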

Results:

This study achieved an AUC of 0.77 for live birth prediction, which significantly outperforms related works in the literature. Sixteen out of 103 clinical features were identified to be predictors of live birth outcomes and helped improve live birth prediction. Among these features, maternal age, the day of blastocyst transfer, antral follicle count, retrieved oocyte number, and endometrium thickness measured before transfer are the top five features contributing to live birth prediction. Heatmaps showed that the CNN in the AI model mainly focuses on image regions of inner cell mass and trophectoderm (TE) for live birth prediction, and the contribution of TE-related features was greater in the CNN trained with the inclusion of patient couple's clinical features compared with the CNN trained with blastocyst images alone.
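For readers less familiar with the metric, the AUC can be read as the probability that a randomly chosen blastocyst that led to a live birth receives a higher score than a randomly chosen one that did not. The sketch below computes it by direct pairwise comparison; the function name `roc_auc` and the toy labels and scores are illustrative assumptions and are not data from the study.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example: 1 = live birth, 0 = no live birth (made-up values).
labels = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.3, 0.6, 0.7, 0.8, 0.2]
auc = roc_auc(labels, scores)
```

In the toy data, eight of the nine positive-negative pairs are ordered correctly, so the AUC is 8/9. A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation.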

Conclusions:

The results suggest that the inclusion of patient couple’s clinical features along with blastocyst images increases live birth prediction accuracy.

Funding:

Natural Sciences and Engineering Research Council of Canada and the Canada Research Chairs Program.

Version published to 10.7554/elife.83662 on eLife
Feb 22, 2023
eLife
Jan 12, 2023

Reviewer #1 (Public Review):

This work provides a new multimodal blastocyst evaluation method utilising both blastocyst images and patient couple's clinical features (e.g., maternal age, hormone profiles, endometrium thickness, and semen quality) to predict live birth outcomes.
The manuscript was reviewed using the checklist from the "Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): The TRIPOD statement" (https://www.equator-network.org/reporting-guidelines/tripod-statement/). Generally, the authors have achieved their aims, and the results support their conclusions.

The major study strengths are as follows:

The study dataset is large: it comprises 17,580 blastocysts with known live birth outcomes, together with blastocyst images and the couples' clinical features.
The authors developed a new artificial intelligence model consisting of a convolutional neural network to process blastocyst images and a multilayer perceptron to process patient couple's clinical features. This model achieved an AUC of 0.77 for live birth prediction, which is significantly higher than that achieved by previously developed models. The conclusions of this paper are largely well supported by the data.

Nevertheless, there are some weaknesses:

Regarding testosterone, the assessment method is essential and should be reported. The statistical significance of testosterone as a predictor could change if calculated free testosterone or bioavailable testosterone were used.

According to the data presented in Supplementary Table 1, there are more than 15 statistically significant predictors of live birth. However, predictive significance values are presented for only 15 of them (Fig. 3).

eLife
Jan 12, 2023

Reviewer #2 (Public Review):

In this article, a multi-modal strategy for live birth prediction is proposed using blastocyst images and clinical features. A CNN is used for the imaging data, an MLP is built for the clinical features, and the final model is developed by concatenating the CNN and MLP features. A total of 17,580 samples are used for training and testing the model. The proposed model performed significantly better than previous ones, with an AUC of 0.77.

By creating activation maps in two scenarios, (I) when imaging and clinical features were used and (II) when only imaging data were used, the authors highlight the parts of the images that are crucial for predictions. Their results confirm the benefits of utilizing multi-modal datasets.

However, the manuscript is currently lacking crucial methodological information that is necessary to judge the validity of various claims.
Furthermore, it lacks discussion of the potential applications of the proposed model in clinical settings.

Version published to 10.1101/2022.10.20.22281296 on medRxiv
Oct 21, 2022
