Optimised Tensor Contractions and Vectorised Execution for Efficient Finite Element Solvers in Computational Solid Mechanics: A Benchmark Study
Abstract
Tensor operations recur throughout computational solid mechanics, yet the optimisation strategies that have transformed similar computations in deep learning have not been examined systematically at the level of representative mechanics kernels. This work addresses that gap through a controlled benchmark study of three critical operations: anisotropic fourth-order elasticity tensor rotation, algorithmic tangent evaluation for J2 plasticity, and high-order matrix-free operator application on hexahedral elements. Running on CPU with open-source Python tools (NumPy, opt_einsum, and JAX), the study shows that runtime can be reduced by more than two orders of magnitude without altering the underlying mechanics, with speedups of 53.1× for anisotropic tensor rotation, 209.5× for nonlinear constitutive tangent evaluation, 19.6× for fused dense high-order operator application, and 8.4× for sum-factorisation. The results further show that different bottlenecks require different remedies. In particular, contraction-path optimisation is effective only when combined with batched compiled execution, branchless vectorisation is especially powerful for constitutive updates with pointwise logic, and tensor-product reformulation becomes important for high-order operators. Beyond the individual benchmarks, the work provides a transferable kernel-level blueprint for building faster finite element workflows, showing how substantial gains can be achieved through computational restructuring alone, before invoking specialised hardware.
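As an illustration of the first kernel named in the abstract, the rotation of a fourth-order elasticity tensor, C'_ijkl = R_ia R_jb R_kc R_ld C_abcd, can be written as a single batched contraction. The following is a minimal sketch and not the authors' benchmark code: it uses only NumPy's built-in `optimize=True` path search in place of opt_einsum or JAX compilation, and the helper names (`isotropic_stiffness`, `rotations_about_z`, `rotate_stiffness`) and material constants are hypothetical choices made for this example.

```python
import numpy as np


def isotropic_stiffness(lam, mu):
    """Isotropic C_ijkl = lam*d_ij*d_kl + mu*(d_ik*d_jl + d_il*d_jk).

    Hypothetical helper; lam and mu are illustrative Lame constants.
    """
    I = np.eye(3)
    return (lam * np.einsum("ij,kl->ijkl", I, I)
            + mu * (np.einsum("ik,jl->ijkl", I, I)
                    + np.einsum("il,jk->ijkl", I, I)))


def rotations_about_z(theta):
    """Batch of rotation matrices about the z-axis, shape (n, 3, 3)."""
    theta = np.asarray(theta, dtype=float)
    c, s = np.cos(theta), np.sin(theta)
    R = np.zeros((theta.size, 3, 3))
    R[:, 0, 0] = c
    R[:, 0, 1] = -s
    R[:, 1, 0] = s
    R[:, 1, 1] = c
    R[:, 2, 2] = 1.0
    return R


def rotate_stiffness(R, C):
    """Batched rotation C'_nijkl = R_nia R_njb R_nkc R_nld C_abcd.

    R has shape (n, 3, 3) and C has shape (3, 3, 3, 3). With
    optimize=True, NumPy searches for a cheaper pairwise contraction
    order instead of evaluating the five-operand sum in one nested loop.
    """
    return np.einsum("nia,njb,nkc,nld,abcd->nijkl", R, R, R, R, C,
                     optimize=True)


if __name__ == "__main__":
    R = rotations_about_z(np.linspace(0.0, np.pi, 5))
    C = isotropic_stiffness(lam=1.0, mu=0.5)
    C_rot = rotate_stiffness(R, C)
    print(C_rot.shape)
```

An isotropic tensor is unchanged by any rotation, which gives a cheap correctness check for the contraction before it is applied to anisotropic material data.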