Efficient Document Image Dewarping via Hybrid Deep Learning and Cubic Polynomial Geometry Restoration
Discuss this preprint
Start a discussion What are Sciety discussions?Listed in
This article is not in any list yet, why not save it to one of your lists.Abstract
Camera-captured document images often suffer from geometric distortions caused by paper deformation, perspective distortion, and lens aberrations, significantly reducing OCR accuracy. This study develops an efficient automated method for document image dewarping that balances accuracy with computational efficiency.We propose a hybrid approach combining deep learning for document detection with classical computer vision for geometry restoration. YOLOv8 performs initial document segmentation and mask generation. Subsequently, classical CV techniques construct a topological 2D grid through cubic polynomial interpolation of document boundaries, followed by image remapping to correct nonlinear distortions. A new annotated dataset and open-source framework are provided to facilitate reproducibility and further research.Experimental evaluation against state-of-the-art methods (RectiNet, DocGeoNet, DocTr++) and mobile applications (DocScan, CamScanner, TapScanner) demonstrates superior performance. Our method achieves the lowest median Character Error Rate (CER=0.0235), Levenshtein Distance (LD=27.8), and highest Jaro--Winkler similarity (JW=0.902), approaching the quality of scanned originals. The approach requires significantly fewer computational resources and memory compared to pure deep learning solutions while delivering better OCR readability and geometry restoration quality.The proposed hybrid methodology effectively restores document geometry with computational efficiency superior to existing deep learning approaches, making it suitable for resource-constrained applications while maintaining high-quality document digitization.Project page: https://github.com/HorizonParadox/DRCCBI