AUTOMATIC IMAGE CAPTIONING SYSTEM FOR VISUALLY IMPAIRED PEOPLE
Abstract
Visual impairment poses significant challenges in recognising and interacting with the surrounding environment. To address this issue, this study proposes a cross-platform automatic image captioning system. The model follows an encoder–decoder architecture, in which DenseNet extracts visual features and an LSTM network combined with an attention mechanism generates natural language descriptions. The proposed method is trained and evaluated on two benchmark datasets, MS COCO and Flickr30K, using widely adopted metrics such as BLEU and METEOR. Experimental results show that the system achieves higher accuracy than several recently published approaches. Furthermore, a practical application has been developed for both desktop and mobile platforms, producing audio descriptions of images and thereby improving access to visual information for visually impaired users.
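To make the described encoder–decoder pipeline concrete, the following is a minimal sketch, assuming PyTorch and torchvision; the DenseNet-201 backbone, layer sizes, and the additive (Bahdanau-style) attention are illustrative assumptions, not the exact configuration reported in the paper.

```python
# Sketch of a DenseNet encoder feeding an attention-equipped LSTM decoder.
# Hyperparameters (embed_dim, hidden_dim) are placeholders for illustration.
import torch
import torch.nn as nn
import torchvision.models as models


class DenseNetEncoder(nn.Module):
    """Extracts a grid of region features from an image with DenseNet."""
    def __init__(self):
        super().__init__()
        densenet = models.densenet201(weights=None)   # pretrained weights optional
        self.features = densenet.features             # convolutional feature extractor only

    def forward(self, images):                        # images: (B, 3, H, W)
        fmap = self.features(images)                  # (B, C, h, w), C = 1920 for DenseNet-201
        B, C, h, w = fmap.shape
        return fmap.view(B, C, h * w).permute(0, 2, 1)  # (B, h*w, C) spatial "regions"


class AttentionLSTMDecoder(nn.Module):
    """LSTM decoder that attends over encoder regions at every time step."""
    def __init__(self, vocab_size, feat_dim=1920, embed_dim=256, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.att_feat = nn.Linear(feat_dim, hidden_dim)
        self.att_hid = nn.Linear(hidden_dim, hidden_dim)
        self.att_score = nn.Linear(hidden_dim, 1)
        self.lstm = nn.LSTMCell(embed_dim + feat_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, feats, captions):               # feats: (B, R, feat_dim); captions: (B, T)
        B, T = captions.shape
        h = feats.new_zeros(B, self.lstm.hidden_size)
        c = feats.new_zeros(B, self.lstm.hidden_size)
        logits = []
        for t in range(T):
            # Additive attention: score each region against the current hidden state.
            scores = self.att_score(
                torch.tanh(self.att_feat(feats) + self.att_hid(h).unsqueeze(1))
            )                                          # (B, R, 1)
            alpha = torch.softmax(scores, dim=1)       # attention weights over regions
            context = (alpha * feats).sum(dim=1)       # (B, feat_dim) weighted visual context
            x = torch.cat([self.embed(captions[:, t]), context], dim=1)
            h, c = self.lstm(x, (h, c))
            logits.append(self.out(h))
        return torch.stack(logits, dim=1)              # (B, T, vocab_size) word scores
```

In this sketch the decoder is trained with teacher forcing on ground-truth captions; at inference time the caption would instead be generated token by token (e.g. greedily or with beam search) and then passed to a text-to-speech component to produce the audio description mentioned in the abstract.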