What Makes Neural Networks Trainable? Invexity as a Structural Design Principle in AI

Abstract

Despite their non-convex loss landscapes and vast parameter spaces, deep neural networks consistently achieve high performance across domains, from medical diagnostics to natural language processing and computer vision. However, the theoretical basis for their trainability remains unclear. Classical frameworks, such as convex optimization or probabilistic models (e.g., Bayesian optimization), offer only partial explanations and rely on restrictive assumptions that limit architectural expressiveness, such as shallow architectures, non-negative weights, or convex activations. This gap underscores a fundamental question: what makes neural networks trainable? Here, we introduce a general framework based on invexity, a property that guarantees that every critical point is a global minimum. We make four key contributions: (i) we demonstrate, for the first time, that the vast majority of commonly used activation functions (over 90% of the fifty analyzed) are inherently invex, revealing that modern architectures are already aligned with this property, even if unintentionally; (ii) we show that deep multilayer perceptron models can be systematically constructed as invex structures, challenging the adequacy of existing convex and probabilistic optimization paradigms; (iii) we prove that widely adopted architectures, such as ResNet, UNet, and the Vision Transformer, satisfy invexity, providing a theoretical explanation for their empirical trainability even at extreme depths (as with ResNet) and guaranteeing that standard gradient-based schemes can reach global optima in these well-known networks; (iv) by reframing trainability as a structural property rather than an empirical coincidence, our results provide a new foundation for understanding and designing neural networks.
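For readers unfamiliar with the term, a sketch of the standard definition of invexity from the optimization literature (this definition is not stated in the abstract itself, and the paper's precise formulation may differ):

```latex
% Invexity (Hanson, 1981): a differentiable function
% f : \mathbb{R}^n \to \mathbb{R} is invex if there exists a kernel
% \eta : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}^n such that
f(x) - f(u) \;\ge\; \eta(x, u)^{\top} \nabla f(u)
\qquad \forall\, x, u \in \mathbb{R}^n .
% Convexity is the special case \eta(x, u) = x - u.
% Equivalently (Ben-Israel & Mond, 1986): f is invex if and only if
% every stationary point (\nabla f(u) = 0) is a global minimizer,
% which is the characterization the abstract appeals to.
```

This equivalence is why invexity is relevant to trainability: if a loss landscape is invex, any point reached by gradient descent with vanishing gradient is globally optimal, without the landscape needing to be convex.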