EFFICIENT MULTI-PERSON ACTION RECOGNITION USING YOLOV7-POSE AND DEEP LEARNING MODELS
Abstract
Multi-person action recognition is essential for systems that must identify the actions of several people in a single scene simultaneously. Widely used pose-estimation models such as OpenPose and PoseNet achieve good results but have slower inference speeds, which limits their usefulness in applications that require real-time processing. We address this problem by combining the fast pose estimation of YOLOv7-Pose with deep learning models for action classification: Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), and Spatial Temporal Graph Convolutional Network (STGCN). Our experimental results show that YOLOv7-Pose combined with STGCN achieves the highest accuracy at 91%, while YOLOv7-Pose combined with LSTM yields the fastest inference time at 1.2 milliseconds. These results indicate that the proposed method balances accuracy and efficiency, making it suitable for real-time multi-person action recognition in a range of applications.
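As a rough illustration of the data flow described above (the function names, window length, and shapes here are assumptions for the sketch, not the authors' code), each tracked person contributes a sliding window of 2-D keypoints from YOLOv7-Pose, which is flattened into a per-frame feature sequence before being passed to a recurrent classifier such as an LSTM or GRU:

```python
import numpy as np

NUM_KEYPOINTS = 17   # YOLOv7-Pose follows the COCO format: 17 keypoints per person
WINDOW = 30          # assumed sequence length in frames for the action classifier

def keypoints_to_features(frames):
    """Stack per-frame (17, 2) keypoint arrays into a (WINDOW, 34) sequence.

    frames: list of (NUM_KEYPOINTS, 2) arrays of (x, y) coordinates,
            one array per video frame for a single tracked person.
    """
    assert len(frames) == WINDOW, "need exactly one window of frames"
    seq = np.stack(frames)                          # (WINDOW, 17, 2)
    return seq.reshape(WINDOW, NUM_KEYPOINTS * 2)   # flatten x/y pairs per frame

# Hypothetical usage: one person tracked over 30 frames of random keypoints.
frames = [np.random.rand(NUM_KEYPOINTS, 2) for _ in range(WINDOW)]
features = keypoints_to_features(frames)
print(features.shape)  # (30, 34) -- the per-person input sequence for LSTM/GRU
```

For the STGCN variant, the keypoints would instead be kept as a graph of joints per frame rather than flattened, since the graph convolution operates on the skeleton's joint connectivity.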