A Globally Optimal Alternative to MLP



Abstract

In deep learning, reaching the global minimum is a significant challenge, even for relatively simple architectures such as Multi-Layer Perceptrons (MLPs). To address this challenge, we visualized model states at both local and global optima, identifying the factors that prevent models trained with conventional methods from moving from local to global minima. Based on these insights, we propose the Lagrange Regressor (LReg), a framework that is mathematically equivalent to MLPs. Rather than updating parameters via iterative optimization, LReg employs a discrete mesh-refinement-coarsening process that drives the model’s loss to the global minimum. LReg converges faster and overcomes the inherent difficulty neural networks have in fitting multi-frequency functions. Experiments on large-scale benchmarks including ImageNet-1K (image classification), GLUE (natural language understanding), and WikiText (language modeling) show that LReg consistently enhances the performance of pre-trained models, significantly lowers test loss, and scales effectively to big-data scenarios. These results underscore LReg’s potential as a scalable, optimization-free alternative for deep learning on large and complex datasets, aligning with the goals of big data analytics.
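The abstract does not spell out the mechanics of the mesh-refinement-coarsening loop, so the following is only a hypothetical one-dimensional illustration of the general idea, not the authors' actual LReg: fit the nodal values of a piecewise-linear (degree-1 Lagrange) basis by least squares, bisect cells with large residuals, and delete interior nodes whose removal leaves the fit unchanged. All function names, tolerances, and the toy multi-frequency target below are our own assumptions.

```python
import numpy as np

def hat_design(mesh, x):
    # Degree-1 Lagrange ("hat") basis functions on a sorted 1-D mesh:
    # each sample x falls in one cell and gets weights (1-t, t) on its endpoints.
    idx = np.clip(np.searchsorted(mesh, x) - 1, 0, len(mesh) - 2)
    t = (x - mesh[idx]) / (mesh[idx + 1] - mesh[idx])
    A = np.zeros((len(x), len(mesh)))
    rows = np.arange(len(x))
    A[rows, idx] = 1.0 - t
    A[rows, idx + 1] = t
    return A, idx

def fit(mesh, x, y):
    # On a *fixed* mesh the nodal values solve a convex linear least-squares
    # problem, so this subproblem is globally optimal -- no gradient descent.
    A, idx = hat_design(mesh, x)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef, A, idx

def lreg_sketch(x, y, n_rounds=10, refine_tol=1e-2, coarsen_tol=1e-4):
    mesh = np.array([x.min(), x.max()])
    for _ in range(n_rounds):
        coef, A, idx = fit(mesh, x, y)
        resid = np.abs(A @ coef - y)
        # Refine: bisect every cell whose worst residual exceeds the tolerance.
        cell_err = np.zeros(len(mesh) - 1)
        np.maximum.at(cell_err, idx, resid)
        mids = 0.5 * (mesh[:-1] + mesh[1:])
        new_nodes = mids[cell_err > refine_tol]
        if new_nodes.size == 0:
            break
        mesh = np.sort(np.concatenate([mesh, new_nodes]))
        # Coarsen: drop interior nodes whose value is already the linear
        # interpolant of their neighbours, so removing them changes little.
        coef, _, _ = fit(mesh, x, y)
        w = (mesh[1:-1] - mesh[:-2]) / (mesh[2:] - mesh[:-2])
        lin = (1.0 - w) * coef[:-2] + w * coef[2:]
        keep = np.abs(coef[1:-1] - lin) > coarsen_tol
        mesh = np.concatenate([mesh[:1], mesh[1:-1][keep], mesh[-1:]])
    coef, _, _ = fit(mesh, x, y)
    return mesh, coef

# Toy multi-frequency target, the kind gradient-trained MLPs fit slowly
# because of spectral bias.
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 1.0, 2000))
y = np.sin(2 * np.pi * x) + 0.3 * np.sin(40 * np.pi * x)
mesh, coef = lreg_sketch(x, y)
err = np.abs(np.interp(x, mesh, coef) - y).max()
print(f"{len(mesh)} mesh nodes, max |error| = {err:.3f}")
```

Note the division of labour in this sketch: the continuous part (nodal values) is solved exactly at every step, while all adaptation happens through the discrete mesh edits, which is consistent with the abstract's description of a discrete process replacing iterative optimization.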
