What Makes Neural Networks Trainable? Invexity as a Structural Design Principle in AI

Abstract

Despite their non-convex loss landscapes and vast parameter spaces, deep neural networks consistently achieve high performance across domains, from medical diagnostics to natural language processing and computer vision. However, the theoretical basis for their trainability remains unclear. Classical frameworks, such as convex optimization or probabilistic models (e.g., Bayesian optimization), offer only partial explanations and rely on restrictive assumptions that limit architectural expressiveness, such as shallow architectures, non-negative weights, or convex activations. This gap underscores a fundamental question: what makes neural networks trainable? Here, we introduce a general framework based on invexity, a property that guarantees that every critical point is a global minimum. We make four key contributions: (i) we demonstrate, for the first time, that the vast majority of commonly used activation functions (over 90% of the fifty analyzed) are inherently invex, revealing that modern architectures are already aligned with this property, even if unintentionally; (ii) we show that deep multilayer perceptron models can be systematically constructed as invex structures, challenging the adequacy of existing convex and probabilistic optimization paradigms; (iii) we prove that widely adopted architectures, such as ResNet, UNet, and the Vision Transformer, satisfy invexity, providing a theoretical explanation for their empirical trainability even at extreme depths (as with ResNet) and guaranteeing that standard gradient-based schemes can reach global optima in these well-known networks; (iv) by reframing trainability as a structural property rather than an empirical coincidence, our results provide a new foundation for understanding and designing neural networks.
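For readers unfamiliar with the term, a sketch of the standard definition of invexity from the optimization literature (this definition is not stated in the abstract itself, and the paper's precise formulation may differ):

```latex
% Invexity (Hanson, 1981): a differentiable function
% f : \mathbb{R}^n \to \mathbb{R} is invex if there exists a kernel
% \eta : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}^n such that
f(x) - f(u) \;\ge\; \eta(x, u)^{\top} \nabla f(u)
\qquad \forall\, x, u \in \mathbb{R}^n .
% Convexity is the special case \eta(x, u) = x - u.
% Equivalently (Ben-Israel & Mond, 1986): f is invex if and only if
% every stationary point (\nabla f(u) = 0) is a global minimizer,
% which is the characterization the abstract appeals to.
```

This equivalence is why invexity is relevant to trainability: if a loss landscape is invex, any point reached by gradient descent with vanishing gradient is globally optimal, without the landscape needing to be convex.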