Real-World Adaptation for Enhanced Photo-Realistic and Semantic Style Transfer in Indoor Panoramas
Abstract
We present a novel geometry-aware, shading-independent, photo-realistic, and semantic style transfer method for indoor panoramic scenes, designed for practical, real-world use. Unlike previous methods that require separate inputs, our approach employs a multitask dense prediction architecture to infer multiple pixel-wise signals from a single panoramic image. This comprehensive approach automatically derives the essential signals (depth, semantic layers, shading, and reflectance) from a single 360-degree panoramic indoor photo, significantly enhancing usability in real-world scenarios. Our method extends the capabilities of semantic-aware generative adversarial architectures by introducing two strategies that address the geometric characteristics of indoor scenes and improve overall performance. First, we incorporate robust geometry losses that exploit layout and depth inference during training to ensure shape consistency between the generated scenes and the ground truth. Second, we employ a hybrid end-to-end edge-driven scheme based on convolutional neural networks to perform Intrinsic Image Decomposition (IID), extracting the albedo and a normalized shading signal, in the form of obscurance and highlights, from the original scenes. We perform the style transfer on the albedo rather than on the full RGB image, effectively preventing shading-related bleeding issues. Additionally, we apply super-resolution to the resulting scenes to enhance image quality and capture fine details. We tested this extended model on both real-world and synthetic data. Experimental results demonstrate that our enhanced architecture outperforms state-of-the-art style transfer models in terms of perceptual and accuracy metrics, achieving an 18.91% lower ArtFID (Art Fréchet Inception Distance), a 13.99% higher PSNR (Peak Signal-to-Noise Ratio), and an 8.99% higher SSIM (Structural Similarity). The visual results show that our method is effective in producing realistic and visually pleasing indoor scenes for a variety of applications in the Architecture, Engineering and Construction field.
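As a rough illustration of the albedo-space stylization and geometry consistency described above, the sketch below shows how the decomposed signals could be combined. It is not the paper's actual pipeline: the handles iid_net, style_net, depth_net, and sr_net are hypothetical placeholders standing in for the IID, style transfer, depth inference, and super-resolution components.

```python
import torch
import torch.nn.functional as F

def stylize_albedo(panorama, iid_net, style_net, sr_net=None):
    """Stylize a 360-degree indoor panorama in albedo space (illustrative sketch).

    panorama : (1, 3, H, W) equirectangular RGB tensor in [0, 1].
    iid_net  : placeholder IID network returning (albedo, shading)
               such that panorama is approximately albedo * shading.
    style_net: placeholder network mapping an albedo map to its stylized version.
    sr_net   : optional placeholder super-resolution network.
    """
    # Separate reflectance from shading so lighting (obscurance/highlights)
    # is left untouched by the style.
    albedo, shading = iid_net(panorama)

    # Apply the style only to the albedo layer, which is what avoids
    # shading-related bleeding in the recomposed image.
    stylized_albedo = style_net(albedo)

    # Recompose with the original shading, then optionally super-resolve.
    result = (stylized_albedo * shading).clamp(0.0, 1.0)
    return sr_net(result) if sr_net is not None else result

def geometry_consistency_loss(depth_net, generated, target):
    # Simplified stand-in for the geometry losses mentioned in the abstract:
    # penalize depth discrepancies between the generated scene and the
    # ground truth so the room shape is preserved during training.
    return F.l1_loss(depth_net(generated), depth_net(target))
```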