Multimodal Conflict-Aware and Generative-Enhanced AI for Early Startup Survival and Risk Prediction
Abstract
Early-stage startups are central to innovation-driven economies, yet their failure rates remain persistently high: more than half of new ventures do not survive their first three to five years. Accurately assessing the risk of young ventures is challenging because the relevant signals are dispersed across heterogeneous sources, including entrepreneurial narratives, early financial indicators, and founders' positions in entrepreneurial networks. Most existing models either focus on a single modality or treat multimodal features as independent inputs, overlooking the informative role of cross-modal inconsistencies and struggling with small, imbalanced datasets. This paper proposes a conflict-aware, generative-enhanced multimodal framework for early-stage startup risk assessment. The model jointly encodes business-plan and interview text, structured financial features, and founder-centric social-network information via transformer, multilayer-perceptron, and heterogeneous graph neural network encoders. Cross-modal contrastive learning aligns the three modalities in a shared representation space, while a modal conflict attention module explicitly quantifies inconsistencies between textual claims, financial realities, and network signals, turning them into additional risk features. To mitigate data scarcity and class imbalance, we further introduce a conditional GAN operating in the fused latent space that generates label- and conflict-conditioned synthetic representations, exposing the classifier to both typical and strategically conflictual patterns. Experiments on a real-world dataset of 4,500 early-stage startups show that the proposed framework outperforms strong structured, text-based, graph-based, and multimodal baselines. Compared to the best multimodal fusion baseline, our model improves AUROC from 0.82 to 0.87 and AUPRC from 0.53 to 0.62, with consistent gains in macro F1 and balanced accuracy.
Ablation studies confirm the incremental contributions of contrastive alignment, modal conflict attention, and generative augmentation, while analyses of conflict features and case studies illustrate how the framework offers interpretable, conflict-aware decision support for investors and policymakers.
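To make the conflict-feature idea concrete, the following is a minimal sketch (not the paper's implementation): given the three modality embeddings already projected into the shared space by the contrastive alignment (which is omitted here), pairwise cosine dissimilarities act as explicit conflict scores that are appended to the fused representation. All function names and dimensions are illustrative assumptions.

```python
import numpy as np

def conflict_features(text_z, fin_z, net_z):
    """Pairwise cross-modal conflict scores as cosine dissimilarities.

    Each argument is a (d,) embedding assumed to live in the shared
    space produced by contrastive alignment. A score near 0 means the
    two modalities agree; larger values flag inconsistency, e.g. an
    optimistic narrative contradicted by weak financials.
    """
    def cos_dissim(a, b):
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        return 1.0 - float(a @ b) / denom

    return np.array([
        cos_dissim(text_z, fin_z),   # narrative vs. financial reality
        cos_dissim(text_z, net_z),   # narrative vs. network signals
        cos_dissim(fin_z, net_z),    # financials vs. network signals
    ])

def fuse_with_conflict(text_z, fin_z, net_z):
    """Concatenate the modality embeddings with explicit conflict features,
    so the downstream risk classifier can condition on inconsistency itself."""
    c = conflict_features(text_z, fin_z, net_z)
    return np.concatenate([text_z, fin_z, net_z, c])
```

In this sketch the conflict scores are simple fixed dissimilarities; the paper's modal conflict attention module instead learns how much weight each inconsistency should receive, but the interface, modality embeddings in and conflict-augmented fusion out, is the same.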
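The latent-space augmentation can likewise be sketched as a conditional generator that maps noise plus a label-and-conflict condition to a synthetic fused representation. Only an untrained forward pass is shown; in the framework the weights would be trained adversarially against a discriminator over real fused representations, and every name and dimension below is a hypothetical placeholder.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_generator(noise_dim, cond_dim, hidden_dim, latent_dim):
    """Tiny two-layer MLP generator: (noise, condition) -> latent vector.

    The condition concatenates a class label with conflict scores, so the
    generator can be asked for, e.g., 'failed startup with high
    narrative/financial conflict' samples to rebalance the training set.
    """
    W1 = rng.standard_normal((noise_dim + cond_dim, hidden_dim)) * 0.1
    W2 = rng.standard_normal((hidden_dim, latent_dim)) * 0.1

    def generate(noise, label, conflict):
        cond = np.concatenate([label, conflict])       # label- and conflict-conditioning
        h = np.tanh(np.concatenate([noise, cond]) @ W1)
        return h @ W2                                  # synthetic fused representation

    return generate

# Request a synthetic "fail class, high conflict" latent sample
# (illustrative dimensions only).
gen = make_generator(noise_dim=8, cond_dim=4, hidden_dim=16, latent_dim=12)
sample = gen(rng.standard_normal(8),
             label=np.array([0.0, 1.0]),      # one-hot: failure class
             conflict=np.array([0.9, 0.8]))   # hypothetical conflict scores
```

Generating in the fused latent space rather than in raw input space sidesteps the hard problem of synthesizing realistic text, financial tables, and graphs jointly; only low-dimensional representations need to be modeled.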