Evaluation of the characteristics of datasets used in multimodal sentiment analysis
Abstract
Multimodal Sentiment Analysis (MSA) has attracted growing research attention. However, existing datasets still suffer from limitations in comprehensiveness, diversity, and consistency. This study provides a systematic review of widely used MSA datasets by analyzing key characteristics such as data modalities, annotation methods, and reliability. A four-stage evaluation framework is proposed: (1) a survey of dataset collection and management, (2) statistical and technical feature analysis, (3) performance evaluation of models on these datasets, and (4) development of guidelines for constructing high-quality MSA datasets. The results offer in-depth insights into the strengths and weaknesses of each dataset, helping researchers select suitable data sources and guiding the development of more robust multimodal datasets in the future.