Abstract:
Due to the complex spatial environment and uneven artificial lighting underground, images captured by underground visual equipment often suffer from insufficient global or local illumination and poor visibility of scene content. Existing enhancement methods for low-light underground images often produce results with poor contrast and locally overexposed or underexposed regions. In this article, we propose a self-supervised low-light image enhancement method for underground conditions based on structure and texture perception, aiming to alleviate the dependence on paired low-light/normal-light images during training. First, to generate piecewise-smooth illumination maps, we design a self-supervised structure- and texture-aware illumination estimation network, which preserves scene edge structures and smooths texture details under self-supervised training losses. To further exploit the local texture features and global structural features of low-light images and thereby improve illumination estimation, we introduce a local-global perception module into this network. The module combines the ability of small-receptive-field convolutions to capture local features with the self-attention mechanism of vision transformers, which enables global information interaction, so that both local and global features are extracted from low-light images. Second, to facilitate self-supervised learning, we adopt a structure-aware smoothness loss that reflects the piecewise-smoothness property of illumination maps. Finally, to refine the illumination maps produced by the illumination estimation network toward reasonable brightness and contrast, we introduce a pseudo-label image generator that synthesizes pseudo-label images with good contrast and brightness. A reconstruction loss enforces consistency between the brightness and contrast of the enhanced images and those of the pseudo-label images, thereby indirectly constraining the illumination maps. Experimental results on multiple public benchmark datasets and on a dataset of low-light images from real underground scenes demonstrate that our method effectively enhances low-light images and generalizes well to underground scenarios.
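To make the local-global perception module concrete, below is a minimal PyTorch sketch of the kind of block the abstract describes: a small-receptive-field convolutional branch for local texture and a self-attention branch for global interaction. The channel width, head count, normalization, and fusion-by-concatenation here are our assumptions for illustration, not details taken from the paper.

    import torch
    import torch.nn as nn

    class LocalGlobalPerception(nn.Module):
        """Illustrative local-global block (layer sizes and fusion are assumed):
        a small-receptive-field conv branch captures local texture, while a
        self-attention branch over flattened spatial tokens models global
        structure; the two are fused by a 1x1 conv with a residual connection."""
        def __init__(self, channels=32, heads=4):
            super().__init__()
            self.local = nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels, channels, 3, padding=1),
            )
            self.norm = nn.LayerNorm(channels)
            self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
            self.fuse = nn.Conv2d(2 * channels, channels, 1)

        def forward(self, x):                                  # x: (B, C, H, W)
            local = self.local(x)                              # local texture features
            b, c, h, w = x.shape
            tokens = self.norm(x.flatten(2).transpose(1, 2))   # (B, H*W, C)
            glob, _ = self.attn(tokens, tokens, tokens)        # global interaction
            glob = glob.transpose(1, 2).reshape(b, c, h, w)
            return self.fuse(torch.cat([local, glob], dim=1)) + x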
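The structure-aware smoothness loss can likewise be illustrated with a common edge-aware total-variation formulation (reusing the imports above): illumination gradients are penalized, but the penalty is down-weighted where the input image itself has strong gradients, so scene edges survive while texture is smoothed away. The exponential weighting and the coefficient lam below are assumptions; the paper's exact weighting may differ.

    def structure_aware_smoothness(illum, img, lam=10.0):
        """Edge-aware TV loss (a common formulation; exact weights assumed)."""
        gray = img.mean(dim=1, keepdim=True)            # gray input for edge weights
        loss = 0.0
        for d in (2, 3):                                # H and W axes of (B, C, H, W)
            w = torch.exp(-lam * gray.diff(dim=d).abs())      # small weight at edges
            loss = loss + (w * illum.diff(dim=d).abs()).mean()
        return loss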
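Finally, the Retinex-style reconstruction constraint can be sketched as follows: the enhanced image is the input divided by the estimated illumination, and an L1 term ties its brightness and contrast to a pseudo-label. The gamma-correction pseudo-label generator here is a hypothetical placeholder; the paper's generator synthesizes labels with good contrast and brightness by more elaborate means.

    import torch.nn.functional as F

    def make_pseudo_label(low, gamma=0.4):
        """Placeholder generator (hypothetical): simple gamma brightening."""
        return low.clamp(0.0, 1.0) ** gamma

    def reconstruction_loss(low, illum, eps=1e-3):
        """Retinex: I = R * L, so the enhanced image is R = I / L; consistency
        with the pseudo-label indirectly constrains the illumination map."""
        enhanced = low / illum.clamp(min=eps)
        return enhanced, F.l1_loss(enhanced, make_pseudo_label(low))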