Evaluation of DeepLabV3+ with ResNet backbone for building segmentation using UAV images
Abstract
Building segmentation from remote sensing, aerial, and UAV images using deep learning has gained significant attention. Buildings are key inputs for urban development, management, and population estimation, so the automatic extraction of buildings from UAV images is essential for both research and practical applications. This paper presents a building dataset comprising 6,500 image samples, each measuring 512 × 512 pixels, derived from high-resolution UAV images that capture diverse building characteristics in various regions of Vietnam. The study evaluates the effectiveness of building extraction from UAV images using the DeepLabV3+ model with a ResNet backbone on our dataset. The results indicate that building prediction reaches an Intersection over Union (IoU) of 0.774 when employing the ResNet101 backbone. However, this accuracy is strongly influenced by the architectural characteristics and spatial distribution of the buildings. In newly developed urban and suburban areas, the IoU for predicted buildings can reach 0.874 and 0.857, respectively. In contrast, accuracy declines in industrial zones and older urban areas, with IoU values of 0.762 and 0.673, respectively. This study has practical applications for urban management, development, and the construction of smart cities in our country.
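For readers unfamiliar with the metric, the IoU reported above is the ratio of the number of pixels labelled as building in both the prediction and the ground truth to the number of pixels labelled as building in either. The following is a minimal sketch of that definition in plain Python. It is not the evaluation code used in the study; the function name `binary_iou` and the nested-list mask format are assumptions made for illustration.

```python
def binary_iou(pred, truth):
    """Intersection over Union of two binary masks.

    Both masks are nested lists of 0/1 values with identical shape
    (an illustrative format; real pipelines usually use arrays).
    """
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of rows")

    intersection = 0  # pixels marked as building in both masks
    union = 0         # pixels marked as building in at least one mask
    for pred_row, truth_row in zip(pred, truth):
        if len(pred_row) != len(truth_row):
            raise ValueError("mask rows must have the same length")
        for p, t in zip(pred_row, truth_row):
            if p and t:
                intersection += 1
            if p or t:
                union += 1

    # Assumed convention: two empty masks agree perfectly.
    return intersection / union if union else 1.0


# Toy 2 x 3 masks: 2 overlapping building pixels, 4 in the union.
pred = [[1, 1, 0],
        [0, 1, 0]]
truth = [[1, 0, 0],
         [0, 1, 1]]
iou = binary_iou(pred, truth)  # 2 / 4 = 0.5
```

On a 512 × 512 sample the same counting applies pixel by pixel. How per-image scores are aggregated into the dataset-level values quoted in the abstract is not specified here.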