Target behavior recognition of fully mechanized mining face based on multi-information self-attention
-
Graphical Abstract
-
Abstract
Recognizing the behavior of key equipment and personnel in fully mechanized mining faces is the foundation and core of intelligent sensing of mining environment information. However, lighting conditions in fully mechanized mining faces are generally poor, and coal dust and water mist readily blur the video image, making it difficult to extract the key features needed to identify target behaviors. This reduces the accuracy of equipment and personnel behavior recognition, which then fails to meet practical engineering requirements. To address this problem, a multi-information self-attention model and a feature fusion mechanism are developed based on the ResT network architecture. The model expands the information source for feature extraction from purely spatial information to three types of information, namely space, time, and channel, which improves its ability to recognize target behavior. Of these three types, spatial information is a detailed spatial description of the target behavior that captures deep features such as the texture, location, and shape of the target. Temporal information consists of temporal features of the target behavior extracted from consecutive video frames, reflecting the order in which behaviors occur and how they evolve. Channel information extends the spatial and temporal levels in breadth and depth by extracting spatial and temporal information from multiple perspectives; it characterizes the raw data on the feature channels, which provide global features of the target behavior. The effectiveness of the proposed algorithm is validated through comparative experiments on the dataset for behavior recognition in fully mechanized mining faces. The experimental results show that behavior recognition accuracy reaches 96.90% in the fully mechanized mining face environment.
Compared with the mainstream behavior recognition algorithms Swin Transformer and TimeSformer, recognition accuracy improves by 11.06% and 10.62%, respectively. To increase its value for engineering applications, the model is converted to the ONNX format and accelerated with TensorRT for GPU inference. On this basis, a behavior recognition system for fully mechanized mining faces was developed, with the algorithm model embedded into the system pipeline as a plug-in unit. This integration enables real-time analysis and accurate recognition of the behaviors of key equipment and personnel in fully mechanized mining faces within the DeepStream framework.
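The idea of attending over space, time, and channel and then fusing the results can be illustrated with a minimal NumPy sketch. It is not the ResT-based model described above: the abstract does not specify the projections or the fusion rule, so the sketch assumes identity query/key/value projections and a simple average with a residual connection. The function names `self_attention` and `multi_info_attention`, and the `(T, H, W, C)` feature layout, are hypothetical choices made only for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the maximum for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(tokens):
    """Scaled dot-product self-attention over the second-to-last axis.

    tokens has shape (..., n, d): n tokens of dimension d.
    Assumption: Q, K and V are the tokens themselves (no learned weights).
    """
    d = tokens.shape[-1]
    scores = tokens @ np.swapaxes(tokens, -1, -2) / np.sqrt(d)
    return softmax(scores, axis=-1) @ tokens


def multi_info_attention(x):
    """Attend over space, time and channel, then fuse (assumed: mean + residual).

    x has shape (T, H, W, C): T frames, an H x W feature map, C channels.
    """
    T, H, W, C = x.shape

    # Spatial: within each frame, the H*W positions attend to each other.
    spatial = self_attention(x.reshape(T, H * W, C)).reshape(T, H, W, C)

    # Temporal: at each position, the T frames attend to each other.
    xt = x.reshape(T, H * W, C).transpose(1, 0, 2)        # (H*W, T, C)
    temporal = self_attention(xt).transpose(1, 0, 2).reshape(T, H, W, C)

    # Channel: each channel is a token described by its full T*H*W response.
    xc = x.reshape(T * H * W, C).T                         # (C, T*H*W)
    channel = self_attention(xc).T.reshape(T, H, W, C)

    # Fusion: average the three branches and add the input back.
    return x + (spatial + temporal + channel) / 3.0


rng = np.random.default_rng(0)
clip = rng.standard_normal((4, 3, 3, 8))   # toy clip: 4 frames, 3x3 map, 8 channels
fused = multi_info_attention(clip)
print(fused.shape)
```

The three branches differ only in which axis supplies the tokens, which is why a single `self_attention` helper is reused after reshaping. A trained model would replace the identity projections with learned weights and the averaging step with the paper's own feature fusion mechanism.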
-
-