Target behavior recognition of fully mechanized mining face based on multi-information self-attention

杨艺; 杨艳磊; 王田; 王科平

doi:10.13225/j.cnki.jccs.2024.0056

Abstract: Recognizing the behavior of key equipment and personnel at fully mechanized mining faces is the foundation and core of intelligent perception of mining-environment information. However, lighting at these faces is generally poor, and coal dust and water mist readily blur the video image, making the key features needed to identify target behavior difficult to extract. As a result, recognition accuracy for equipment and personnel behavior falls short of what practical engineering applications require. To address this problem, a multi-information self-attention model and feature fusion mechanism covering space, time, and channel is built on the ResT network architecture. The model extends the information source for feature extraction from purely spatial information to spatial, temporal, and channel information, which improves its ability to represent target behavior. Spatial information provides an in-depth spatial analysis of the target behavior and exposes deep features such as the texture, position, and shape of the target. Temporal information consists of temporal features extracted from consecutive video frames and reflects the order in which a behavior occurs and how it evolves. Channel information extends and deepens the spatial and temporal levels: it mines spatial and temporal information from multiple perspectives and represents the raw data on feature channels, providing global features of the target behavior. The algorithm was validated and compared with other methods on a behavior recognition dataset from fully mechanized mining faces. The results show that, in a real fully mechanized mining face environment, behavior recognition accuracy reaches 96.90%. Compared with mainstream behavior recognition algorithms such as Swin-Transformer and Timesformer, recognition accuracy improves by 11.06% and 10.62%, respectively. After conversion to an ONNX model and acceleration with TensorRT, the algorithm runs inference on a GPU, which gives it engineering application value. On this basis, a behavior recognition system for fully mechanized mining faces was developed, and the algorithm model was embedded as a plug-in in the system's pipeline, enabling real-time inference and accurate recognition of the behavior of key equipment and personnel under the DeepStream framework.
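To make the idea of attending over space, time, and channel more concrete, the following is a minimal sketch in plain Python, not the authors' model. It assumes a tiny feature tensor `x[t][n][c]` (frames, spatial positions, channels), uses identity query/key/value projections in place of learned ones, and fuses the three branches by simple averaging; the function names `self_attention` and `multi_info_attention`, the tensor layout, and the averaging fusion are all illustrative assumptions, and the ResT backbone itself is not reproduced.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention with identity Q/K/V projections.

    tokens: list of n vectors, each of length d. Returns n vectors of length d.
    (Assumption: a trained model would apply learned projections first.)
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out


def multi_info_attention(x):
    """Attend along space, time, and channel, then average the three results.

    x[t][n][c]: T frames, N spatial positions, C channels (hypothetical layout).
    """
    T, N, C = len(x), len(x[0]), len(x[0][0])

    # Spatial branch: within each frame, positions attend to one another.
    spatial = [self_attention(x[t]) for t in range(T)]

    # Temporal branch: at each position, frames attend to one another.
    temporal = [[None] * N for _ in range(T)]
    for n in range(N):
        seq = self_attention([x[t][n] for t in range(T)])
        for t in range(T):
            temporal[t][n] = seq[t]

    # Channel branch: each channel is one token described by all T*N values.
    chan_tokens = [[x[t][n][c] for t in range(T) for n in range(N)] for c in range(C)]
    chan_out = self_attention(chan_tokens)
    channel = [[[chan_out[c][t * N + n] for c in range(C)] for n in range(N)]
               for t in range(T)]

    # Fusion: plain average of the three branches (an assumption, not the paper's rule).
    return [[[(spatial[t][n][c] + temporal[t][n][c] + channel[t][n][c]) / 3.0
              for c in range(C)] for n in range(N)] for t in range(T)]
```

Calling `multi_info_attention` on a nested list of shape 2 x 4 x 3 returns a nested list of the same shape. Because each branch forms convex combinations of its input values, every output value stays within the range of the input.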


