EFFICIENT MULTI-PERSON ACTION RECOGNITION USING YOLOV7-POSE AND DEEP LEARNING MODELS
Abstract
Multi-person action recognition is essential for systems that must identify the actions of several people in a single scene simultaneously. Widely used pose-estimation models such as OpenPose and PoseNet achieve good results but have slower inference speeds, which limits their usefulness in applications that require real-time processing. We address this problem by combining the fast pose estimation of YOLOv7-Pose with deep learning models for action classification: Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), and Spatial Temporal Graph Convolutional Network (STGCN). Our experimental results show that YOLOv7-Pose combined with STGCN achieves the highest accuracy at 91%, while YOLOv7-Pose combined with LSTM yields the fastest inference time at 1.2 milliseconds. These results indicate that the proposed method balances accuracy and efficiency, making it suitable for real-time multi-person action recognition in a range of applications.
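As a rough illustration of the data flow described above (the function names, window length, and shapes here are assumptions for the sketch, not the authors' code), each tracked person contributes a sliding window of 2-D keypoints from YOLOv7-Pose, which is flattened into a per-frame feature sequence before being passed to a recurrent classifier such as an LSTM or GRU:

```python
import numpy as np

NUM_KEYPOINTS = 17   # YOLOv7-Pose follows the COCO format: 17 keypoints per person
WINDOW = 30          # assumed sequence length in frames for the action classifier

def keypoints_to_features(frames):
    """Stack per-frame (17, 2) keypoint arrays into a (WINDOW, 34) sequence.

    frames: list of (NUM_KEYPOINTS, 2) arrays of (x, y) coordinates,
            one array per video frame for a single tracked person.
    """
    assert len(frames) == WINDOW, "need exactly one window of frames"
    seq = np.stack(frames)                          # (WINDOW, 17, 2)
    return seq.reshape(WINDOW, NUM_KEYPOINTS * 2)   # flatten x/y pairs per frame

# Hypothetical usage: one person tracked over 30 frames of random keypoints.
frames = [np.random.rand(NUM_KEYPOINTS, 2) for _ in range(WINDOW)]
features = keypoints_to_features(frames)
print(features.shape)  # (30, 34) -- the per-person input sequence for LSTM/GRU
```

For the STGCN variant, the keypoints would instead be kept as a graph of joints per frame rather than flattened, since the graph convolution operates on the skeleton's joint connectivity.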