Unsupervised mine personnel tracking based on an attention mechanism
Abstract: Object tracking is a challenging computer vision task that plays an important role in intelligent transportation, human-computer interaction, and video surveillance. Many high-performing tracking algorithms now exist, but good tracking results remain difficult to achieve in coal mine scenes. Severe occlusion, heavy background interference, and crowded underground personnel degrade tracking performance, and the available mine video datasets are small, of low image quality, and lack unified annotation. To address these data limitations, this paper designs an unsupervised method for training the target tracking model. The method combines discriminative correlation filtering with a Siamese network to exploit the strengths of both, and builds a lightweight end-to-end target tracking network. Unsupervised training relies on forward tracking of the target followed by multi-frame backward verification of the target position. The backbone of the model is a lightweight AlexNet network, chosen to fit the limited storage and computing resources of mobile platforms in the coal mine environment. To cope with severe occlusion, background interference, and densely packed targets in the mine environment, this paper uses attention mechanisms to extract the important target information in video images. Channel attention and spatial attention mechanisms are added to the backbone network so that the model separates the target of interest from the surrounding background information and tracks it by processing the most important features. The improved attention-based unsupervised mine personnel tracking model is compared with the ECO, Staple, DSST, SiamFC, and SiamRPN models in terms of average overlap rate and average center location error. The results show that the proposed tracking model is suitable for personnel tracking in the complex environment of coal mines and achieves good tracking performance.
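The two evaluation measures named in the comparison, average overlap rate and average center location error, can be illustrated with a minimal sketch. It assumes their common definitions in the tracking literature (intersection over union of the predicted and annotated boxes, and the Euclidean distance between box centers, each averaged over frames). The `(x, y, w, h)` box format with a top-left origin and the function names `overlap`, `center_error`, and `evaluate` are hypothetical choices for illustration, not taken from the paper.

```python
import math


def overlap(box_a, box_b):
    """Intersection over union of two boxes given as (x, y, w, h).

    Assumption: (x, y) is the top-left corner; w and h are positive.
    """
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Width and height of the intersection rectangle (zero if disjoint).
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def center_error(box_a, box_b):
    """Euclidean distance in pixels between the centers of two boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    return math.hypot((ax + aw / 2) - (bx + bw / 2),
                      (ay + ah / 2) - (by + bh / 2))


def evaluate(predicted, ground_truth):
    """Return (average overlap rate, average center location error).

    Both arguments are equal-length, non-empty sequences of per-frame boxes.
    """
    if not predicted or len(predicted) != len(ground_truth):
        raise ValueError("need equal-length, non-empty box sequences")
    n = len(predicted)
    pairs = list(zip(predicted, ground_truth))
    avg_overlap = sum(overlap(p, g) for p, g in pairs) / n
    avg_error = sum(center_error(p, g) for p, g in pairs) / n
    return avg_overlap, avg_error
```

Calling `evaluate(tracker_boxes, annotated_boxes)` on one sequence gives a single pair of numbers per tracker. A higher average overlap rate and a lower average center location error indicate better tracking.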