OBJECT DETECTION USING THE SEGMENT ANYTHING APPROACH

Đặng Thị Dung, Nguyễn Trung Kiên, Trần Văn Phúc, Huỳnh Phúc Thịnh

Keywords: YOLOv8; YOLOv9; SAM; Object Detection; Machine Learning; Deep Learning

Abstract

Object detection and segmentation are fundamental computer vision tasks with wide applications in healthcare, agriculture, transportation, and smart surveillance. Recent work has highlighted the strong detection capability of YOLO models and the flexible segmentation capability of the Segment Anything Model (SAM). However, their combination has not been thoroughly investigated for multi-class animal recognition. This study evaluates YOLOv8 and YOLOv9 combined with SAM to refine bounding boxes and improve segmentation quality. A dataset of more than 2,000 annotated animal images spanning 20 classes was collected and split into training, validation, and test sets. Experiments were run on Google Colab with NVIDIA Tesla V100 GPUs. The results show that YOLOv8n is the most efficient model for resource-constrained systems, achieving 81.4% accuracy with Precision = 0.9625, Recall = 0.8415, and F1 = 0.8979, while YOLOv9s provides the best overall balance, with 83.92% accuracy, Precision = 0.8003, Recall = 0.7437, and F1 = 0.7708. These findings suggest that lightweight YOLOv8 models are suited to real-time embedded deployment, whereas YOLOv9s is recommended when accuracy is the priority. Overall, integrating YOLO with SAM improves detection robustness, and the results offer practical guidance on balancing speed, accuracy, and hardware requirements.
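
The F1 values quoted in the abstract can be related to the reported precision and recall through the standard harmonic-mean definition, F1 = 2PR / (P + R). The following is a minimal sketch, not the authors' evaluation code: the function name `f1_score` and the `reported` dictionary are illustrative assumptions, and only the numeric values come from the abstract.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) as stated in the abstract.
# The dictionary layout is a hypothetical convenience, not from the paper.
reported = {
    "YOLOv8n": (0.9625, 0.8415, 0.8979),
    "YOLOv9s": (0.8003, 0.7437, 0.7708),
}

for name, (p, r, f1_reported) in reported.items():
    f1_computed = f1_score(p, r)
    print(f"{name}: computed F1 = {f1_computed:.4f}, reported F1 = {f1_reported:.4f}")
```

Under this definition the YOLOv8n figures reproduce the reported F1 to four decimals, while the YOLOv9s figures give roughly 0.771 against the reported 0.7708. The abstract does not say how F1 was aggregated, so the small gap may simply reflect rounding or per-class averaging.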

BỘ KHOA HỌC VÀ CÔNG NGHỆ - MINISTRY OF SCIENCE AND TECHNOLOGY OF VIETNAM

CỤC THÔNG TIN, THỐNG KÊ - NATIONAL AGENCY FOR SCIENCE AND TECHNOLOGY INFORMATION AND STATISTICS