TOWARDS THE DEVELOPMENT OF A SUBTITLE DATASET FOR EDUCATIONAL VIDEOS IN INFORMATION TECHNOLOGY

  • Trần Thị Thu Phương
  • Nguyễn Quốc Tuấn
  • Lê Thị Hằng
Keywords: educational subtitles; natural language processing; artificial intelligence in education; digital learning resources in IT.

Abstract

This study aims to construct a domain-specific dataset of video subtitles in the field of Information Technology (IT), both to improve access to educational resources and to support the development of natural language processing (NLP) applications in education. A systematic methodology is proposed for data collection and processing, covering source selection, subtitle extraction, data cleaning, normalization, and quality assurance. The resulting dataset is of academic value and is intended to serve as a foundational resource for further research and practical applications in IT education.
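
The extraction, cleaning, normalization, and quality-assurance stages named in the abstract can be illustrated with a minimal sketch. It assumes the subtitles are available as SRT text; the function names (`parse_srt`, `normalize`, `build_records`), the tag-stripping rules, and the duplicate filter are hypothetical choices made for illustration and are not taken from the study itself.

```python
import re
import unicodedata

# Hypothetical helpers: names and cleaning rules are illustrative assumptions,
# not the pipeline used by the authors.
TIMING = re.compile(r"(\d{2}:\d{2}:\d{2}[,.]\d{3}) --> (\d{2}:\d{2}:\d{2}[,.]\d{3})")
TAG = re.compile(r"<[^>]+>|\{\\[^}]*\}")  # HTML-like tags and ASS-style overrides


def parse_srt(text):
    """Extraction: split SRT text into (start, end, raw_text) cues."""
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                # Everything after the timing line is the cue text.
                cues.append((match.group(1), match.group(2), " ".join(lines[i + 1:])))
                break
    return cues


def normalize(raw):
    """Cleaning and normalization: drop tags, unify Unicode, collapse whitespace."""
    no_tags = TAG.sub("", raw)
    nfc = unicodedata.normalize("NFC", no_tags)
    return re.sub(r"\s+", " ", nfc).strip()


def build_records(text):
    """Basic quality check: skip empty cues and consecutive duplicates."""
    records, previous = [], None
    for start, end, raw in parse_srt(text):
        clean = normalize(raw)
        if clean and clean != previous:
            records.append({"start": start, "end": end, "text": clean})
            previous = clean
    return records
```

Calling `build_records` on the contents of one SRT file returns a list of dictionaries with `start`, `end`, and `text` keys. A real pipeline would likely add further checks, such as language identification or terminology validation, which are outside the scope of this sketch.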

Published
2025-05-28