Machine learning prediction of severe knee osteoarthritis from routine data: performance, interpretability, and decision curve analysis
Abstract
Background: Early identification of severe knee osteoarthritis (KOA) usually relies on X-ray imaging, which is not always available in resource-limited healthcare facilities. Estimating KOA severity from routine clinical and biochemical data may help clinicians decide when to request imaging and when to refer patients.
Methods: A retrospective study was conducted on 189 patients with KOA whose severity was classified according to the Kellgren–Lawrence scale. Four machine learning models (logistic regression, Random Forest, XGBoost, and support vector machine [SVM]) were developed using 15 routine variables. SHAP analysis was used to interpret the models and to select a reduced set of 10 variables. Performance was evaluated using stratified five-fold cross-validation and an independent test set comprising 20% of the data. Decision curve analysis (DCA) was used to evaluate clinical benefit.
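The evaluation design described above can be illustrated with a minimal sketch in plain Python. It assumes that the 20% test set was held out first and that the five stratified folds were then drawn from the remaining patients; the abstract does not state this order. The function names, the random seeds, and the class counts in the usage note are hypothetical and are not taken from the study.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_folds=5, seed=0):
    """Split sample indices into n_folds folds with similar class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of each class round-robin across folds.
        for position, idx in enumerate(indices):
            folds[position % n_folds].append(idx)
    return folds


def holdout_then_cv(labels, n_folds=5, seed=0):
    """Reserve about 20% as a test set, then build CV folds on the rest.

    Assumption: the hold-out split precedes cross-validation.
    """
    # One of five stratified parts (about 20%) becomes the test set.
    outer = stratified_folds(labels, n_folds=5, seed=seed)
    test_idx = sorted(outer[0])
    dev_idx = sorted(i for fold in outer[1:] for i in fold)
    dev_labels = [labels[i] for i in dev_idx]
    # Stratified folds over the development set, mapped back to original indices.
    inner = stratified_folds(dev_labels, n_folds=n_folds, seed=seed + 1)
    cv_folds = [sorted(dev_idx[j] for j in fold) for fold in inner]
    return test_idx, cv_folds
```

For example, calling `holdout_then_cv([1] * 60 + [0] * 129)` on 189 hypothetical labels returns a test set and five validation folds that do not overlap. Each model would then be fitted on four folds and scored on the fifth, with the test set used only once at the end.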
Results: The models showed moderate discriminative ability in cross-validation (AUPRC 0.44 to 0.52; ROC AUC 0.54 to 0.62). The SHAP-optimised logistic regression model achieved the best performance on the test set (AUPRC 0.70; ROC AUC 0.81). Important variables included serum uric acid, BMI, age, and renal function. DCA showed a positive net benefit across clinically relevant risk thresholds.
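For readers unfamiliar with decision curve analysis, the net benefit at a risk threshold p_t is conventionally defined as TP/N − (FP/N) × p_t / (1 − p_t), where TP and FP are the numbers of true and false positives among N patients when everyone with predicted risk at or above p_t is referred. The sketch below implements this standard definition in plain Python. It is illustrative only and is not the authors' code; the function names and the toy data in the usage note are hypothetical.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of referring patients whose predicted risk is >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    # Count true and false positives among patients flagged at this threshold.
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: refer every patient regardless of predicted risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)
```

As a usage example, with toy outcomes `[1, 1, 0, 0, 1, 0, 0, 0, 1, 0]` and predicted risks `[0.9, 0.8, 0.7, 0.2, 0.6, 0.1, 0.3, 0.4, 0.2, 0.5]`, a threshold of 0.5 gives 3 true positives and 2 false positives, so the model's net benefit is 0.3 − 0.2 × 1 = 0.1, whereas referring everyone gives 0.4 − 0.6 × 1 = −0.2. A model is useful at a given threshold when its net benefit exceeds both the refer-all value and zero, the value of referring no one.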
Conclusion: A simple and interpretable logistic regression model based on routine data may help predict severe KOA and prioritise patients for X-ray imaging. Further validation in larger, multicenter cohorts is needed.
DOI: 10.59715/pntjmp.5.2.11