EVALUATING THE PERFORMANCE OF MACHINE LEARNING MODELS IN CARDIOVASCULAR RISK PREDICTION
Abstract
Cardiovascular diseases (CVD) are a leading cause of illness and death worldwide, so accurate prediction of cardiovascular risk is crucial for effective prevention and treatment. This study evaluates the performance of several supervised machine learning algorithms in predicting cardiovascular risk from a dataset of clinical and demographic features. Six commonly used models are compared: Random Forest, XGBoost, Logistic Regression, Support Vector Classifier (SVC), K-Nearest Neighbors (KNN), and Decision Tree. They are assessed on accuracy, precision, recall, F1-score, and area under the receiver operating characteristic curve (AUC). The data are preprocessed with normalization and transformation techniques, such as Quantile Transformation and Standard Scaling, to improve model performance. The results provide a detailed comparison of the models, showing their strengths and weaknesses in predicting cardiovascular risk. The findings highlight the best-performing models for identifying high-risk individuals, which could help healthcare professionals prioritize early interventions. The study also discusses the broader role of machine learning in healthcare, especially in disease prediction and prevention.
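As a rough illustration of the evaluation workflow summarized above, the following Python sketch compares the listed classifiers on the five metrics using scikit-learn. It is not the study's actual code. The synthetic dataset from `make_classification`, the 75/25 split, the hyperparameters, and the order of the preprocessing steps (quantile transformation followed by standard scaling) are all assumptions made for illustration, and the helper name `evaluate_models` is hypothetical. XGBoost is included only if the `xgboost` package is installed.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import QuantileTransformer, StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Assumption: synthetic stand-in for the clinical/demographic dataset,
# with a binary label (1 = high cardiovascular risk).
X, y = make_classification(n_samples=600, n_features=12, n_informative=6,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Default-ish hyperparameters; the study's settings are not given here.
models = {
    "Random Forest": RandomForestClassifier(random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVC": SVC(probability=True, random_state=0),  # probabilities needed for AUC
    "KNN": KNeighborsClassifier(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
}
try:
    from xgboost import XGBClassifier  # optional dependency
    models["XGBoost"] = XGBClassifier(random_state=0)
except ImportError:
    pass  # skip XGBoost when the package is unavailable


def evaluate_models(models, X_train, X_test, y_train, y_test):
    """Fit each model behind the same preprocessing and collect test metrics."""
    results = {}
    for name, model in models.items():
        # Assumed order: quantile transform to a normal shape, then standardize.
        pipe = make_pipeline(
            QuantileTransformer(n_quantiles=100, output_distribution="normal",
                                random_state=0),
            StandardScaler(),
            model,
        )
        pipe.fit(X_train, y_train)
        y_pred = pipe.predict(X_test)
        y_score = pipe.predict_proba(X_test)[:, 1]  # score for the positive class
        results[name] = {
            "accuracy": accuracy_score(y_test, y_pred),
            "precision": precision_score(y_test, y_pred, zero_division=0),
            "recall": recall_score(y_test, y_pred, zero_division=0),
            "f1": f1_score(y_test, y_pred, zero_division=0),
            "auc": roc_auc_score(y_test, y_score),
        }
    return results


results = evaluate_models(models, X_train, X_test, y_train, y_test)
for name, metrics in results.items():
    print(name, {k: round(v, 3) for k, v in metrics.items()})
```

Wrapping the preprocessing and the classifier in one pipeline means the transformers are fitted on the training split only, which avoids leaking test-set statistics into the comparison.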