Abstract:
Efficient environmental perception is crucial for enabling autonomous driving in open-pit mines. Simultaneously acquiring multiple environmental features, such as the locations and distances of obstacles and the drivable area, has become an urgent requirement for autonomous mining trucks operating in the complex and harsh environments of open-pit mines. Previous single-task detection or segmentation methods have made significant progress, yet they cannot be organically integrated: executing multiple single-task models sequentially is limited by available computing power and fails to meet the perception demands of autonomous driving. Therefore, a visual multi-task perception method for autonomous vehicles in open-pit mines that integrates the wavelet transform is proposed, encompassing obstacle instance segmentation, drivable-area recognition, and depth prediction. The method perceives the environment independently and provides efficient, highly robust input to the environmental perception system of autonomous vehicles. Firstly, to meet the different feature-extraction requirements of the tasks, the RepNCSPELAN4 and ADown modules are combined to achieve efficient gradient-path planning and preserve fine detail within the model, improving feature-extraction accuracy while keeping the model lightweight. Secondly, a CWT module integrating the wavelet transform is designed; it uses the wavelet transform to enlarge the receptive field and strengthen the low-frequency response of features, thereby improving the accuracy of the segmentation and depth prediction tasks. Finally, to address the difficulty of multi-task model convergence, the GradNorm method is used to adaptively balance the losses among the tasks based on their gradient norms.
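The GradNorm balancing step mentioned above can be illustrated with a minimal NumPy sketch of one task-weight update. This is an assumption-laden simplification, not the paper's implementation: the function name, hyperparameter defaults, and the convention that the per-task gradient norms are supplied by the caller (computed elsewhere with respect to the shared backbone parameters) are all illustrative.

```python
import numpy as np

def gradnorm_update(weights, grad_norms, losses, init_losses,
                    alpha=1.5, lr=0.025):
    """One simplified GradNorm weight-update step (illustrative sketch).

    weights      -- current task weights w_i
    grad_norms   -- L2 norms G_i of each weighted task gradient w.r.t. shared params
    losses       -- current per-task losses L_i(t)
    init_losses  -- initial per-task losses L_i(0)
    alpha        -- restoring-force strength (hypothetical default)
    lr           -- step size for the weight update (hypothetical default)
    """
    weights = np.asarray(weights, dtype=float)
    G = np.asarray(grad_norms, dtype=float)
    # Inverse training rate: tasks that have improved less get larger targets.
    loss_ratios = np.asarray(losses, dtype=float) / np.asarray(init_losses, dtype=float)
    inv_rate = loss_ratios / loss_ratios.mean()
    # Common target gradient norm for every task.
    target = G.mean() * inv_rate ** alpha
    # Gradient of L_grad = sum_i |G_i - target_i| w.r.t. w_i, treating the
    # target as constant and using G_i = w_i * ||grad L_i||, so dG_i/dw_i = G_i / w_i.
    grad_w = np.sign(G - target) * (G / weights)
    weights = weights - lr * grad_w
    # Renormalize so the weights sum to the number of tasks, as in GradNorm.
    weights = np.clip(weights, 1e-6, None)
    return weights * len(weights) / weights.sum()
```

In this toy update, a task whose loss has dropped fastest (small loss ratio) receives a lower target norm, so its weight is pushed down and slower tasks are emphasized, which is the balancing behavior the abstract attributes to GradNorm.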
Experimental results demonstrate that the proposed model performs well across the tasks, achieving an obstacle detection accuracy of 0.872, a mean Intersection over Union (mIoU) of 0.891 for drivable-area segmentation, and an A1 accuracy of 0.844 for depth prediction. Real-vehicle tests show that, compared with executing the single-task models sequentially, the proposed model reduces inference time by 47.8% and memory usage by 39.7% while maintaining comparable accuracy, providing an efficient solution for visual environmental perception in complex and harsh open-pit mine environments.