Visual multi-task perception method for unmanned vehicles in open-pit mines incorporating wavelet transforms

     

Abstract: Efficient environmental perception is essential to autonomous driving in open-pit mines. Autonomous mining trucks urgently need to obtain several kinds of information at once, including obstacle positions, obstacle distances, and the drivable area, in order to perceive the complex and harsh mine environment. Single-task detection and segmentation methods have advanced considerably, but they have not been integrated into a unified whole, and running several single-task models in sequence is constrained by available computing power and cannot meet the perception requirements of autonomous driving. This paper therefore proposes a visual multi-task perception method for unmanned vehicles in open-pit mines that incorporates the wavelet transform. The method integrates obstacle instance segmentation, drivable-area recognition, and depth prediction in one model, can perceive the environment independently, and provides efficient, highly robust support for the vehicle's perception system. First, to satisfy the different feature-extraction needs of the tasks, the RepNCSPELAN4 and ADown modules are combined to provide efficient gradient-path planning inside the model and to preserve fine detail, which improves feature-extraction accuracy while keeping the model lightweight. Second, a CWT module that incorporates the wavelet transform is designed; it uses the wavelet transform to enlarge the receptive field and strengthen the low-frequency response of the features, improving the accuracy of the segmentation and depth-prediction tasks. Finally, to address the difficulty of getting a multi-task model to converge, the GradNorm method, which is based on a gradient loss, is used to balance the losses of the tasks adaptively.
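The abstract does not describe the internals of the CWT module, so the following is only a minimal sketch of the general idea behind it, assuming a single-level 2D Haar wavelet (the choice of Haar and the function name `haar_dwt2` are illustrative assumptions, not the authors' design). Each low-frequency coefficient summarizes a 2×2 patch of the input, so a 3×3 convolution applied to the low-frequency band would cover a 6×6 region of the original image, which is one common way wavelet transforms are used to enlarge the receptive field.

```python
def haar_dwt2(image):
    """Single-level 2D orthonormal Haar transform (illustrative sketch).

    `image` is a list of equal-length rows with even height and width.
    Returns four half-resolution bands:
      low         - low-frequency approximation of each 2x2 patch
      detail_cols - difference between left and right columns
      detail_rows - difference between top and bottom rows
      detail_diag - diagonal difference
    """
    rows, cols = len(image), len(image[0])
    if rows % 2 or cols % 2:
        raise ValueError("height and width must be even")

    low, detail_cols, detail_rows, detail_diag = [], [], [], []
    for r in range(0, rows, 2):
        low_row, dc_row, dr_row, dd_row = [], [], [], []
        for c in range(0, cols, 2):
            # The four pixels of one 2x2 patch.
            a, b = image[r][c], image[r][c + 1]
            d, e = image[r + 1][c], image[r + 1][c + 1]
            # The 1/2 factor keeps the transform orthonormal (energy preserving).
            low_row.append((a + b + d + e) / 2.0)
            dc_row.append((a - b + d - e) / 2.0)
            dr_row.append((a + b - d - e) / 2.0)
            dd_row.append((a - b - d + e) / 2.0)
        low.append(low_row)
        detail_cols.append(dc_row)
        detail_rows.append(dr_row)
        detail_diag.append(dd_row)
    return low, detail_cols, detail_rows, detail_diag


if __name__ == "__main__":
    patch = [[1, 2], [3, 4]]
    low, dc, dr, dd = haar_dwt2(patch)
    print(low, dc, dr, dd)
```

In a network, the low-frequency band would typically be processed by a small convolution and the result mapped back with the inverse transform; how the paper combines the bands is not stated in the abstract.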
Experimental results show that the proposed model performs well on every task: obstacle detection accuracy reaches 0.872, the mean Intersection over Union (mIoU) for drivable-area segmentation reaches 0.891, and the A1 accuracy for depth prediction reaches 0.844. Tests on a real vehicle show that, compared with executing the tasks sequentially, the proposed model reduces inference time by 47.8% and memory usage by 39.7% at similar accuracy, offering an efficient solution for visual environmental perception in complex and harsh open-pit mine environments.
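For readers unfamiliar with the two reported metrics, here is a minimal sketch of how they are commonly computed. It assumes that "A1" denotes the usual depth threshold accuracy, the fraction of pixels with max(pred/true, true/pred) < 1.25; the abstract does not define it, so this reading is an assumption. The function names and the flat-list data layout are hypothetical and chosen only for illustration.

```python
def mean_iou(pred, target, num_classes):
    """Mean Intersection over Union over flat lists of class labels.

    Classes that appear in neither prediction nor target are skipped
    so they do not distort the average.
    """
    ious = []
    for cls in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == cls and t == cls)
        union = sum(1 for p, t in zip(pred, target) if p == cls or t == cls)
        if union:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


def threshold_accuracy(pred_depth, true_depth, threshold=1.25):
    """Fraction of valid pixels whose depth ratio is below `threshold`.

    Assumed definition of the "A1" metric; pixels with non-positive
    depth are treated as invalid and ignored.
    """
    valid = [(p, t) for p, t in zip(pred_depth, true_depth) if p > 0 and t > 0]
    if not valid:
        raise ValueError("no valid depth pixels")
    hits = sum(1 for p, t in valid if max(p / t, t / p) < threshold)
    return hits / len(valid)


if __name__ == "__main__":
    print(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))
    print(threshold_accuracy([10.0, 12.0, 20.0, 5.0], [10.0, 10.0, 10.0, 5.0]))
```

Under these definitions, the reported mIoU of 0.891 would mean the predicted and ground-truth regions overlap by about 89% on average per class, and an A1 of 0.844 would mean about 84% of pixels have a predicted depth within a factor of 1.25 of the true depth.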

     
