Application of data science approach to predicting the cultivation ages of ginseng and analyzing affecting variables

  • Ngô Thị Thu Tình
  • Đỗ Quang Hưng
  • Nguyễn Phương Linh
Keywords: Cultivation age of ginseng (CAG), Machine learning (ML), Extreme Gradient Boosting (XGB), Data science.

Abstract

The cultivation age of ginseng is an important factor influencing its quality and price. Recent advances in data science have benefited many practical applications, and machine learning plays a vital role in discovering insights from data. This study develops three machine learning models, Extreme Gradient Boosting (XGB), Light Gradient Boosting (LGB), and Gradient Boosting (GB), and assesses their performance in predicting the cultivation age of ginseng (CAG). The models are developed from 106 data samples with nine input parameters and one output parameter. K-fold cross-validation is used to improve the models' generalizability and predictive performance. The hyperparameters of the XGB model are then optimized, and the predictive performance of the optimal XGB model is compared with that of the LGB and GB models. The results show that XGB is the best model, with very high predictive performance (R² = 0.964, RMSE = 0.148 years, MAE = 0.107 years). A sensitivity analysis based on feature importance is performed to evaluate the influence of the input variables on the predicted CAG.
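
The evaluation scheme described in the abstract can be illustrated with a minimal sketch in plain Python. It shows how the three reported metrics (R², RMSE, MAE) are commonly defined and how K-fold cross-validation partitions a data set of 106 samples. The function names `regression_metrics` and `k_fold_indices`, the choice of K = 5, the random seed, and the toy age values are assumptions made for illustration only; the abstract does not state the value of K or how the folds were drawn, and this is not the authors' implementation.

```python
import math
import random


def regression_metrics(y_true, y_pred):
    """Return (R2, RMSE, MAE) for two equal-length sequences of numbers."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)          # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    return r2, rmse, mae


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs; each sample is in the test fold exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal-sized folds
    for i, test_idx in enumerate(folds):
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx


# Toy ages in years (made-up values, not data from the study).
r2, rmse, mae = regression_metrics([4.0, 5.0, 6.0], [4.0, 5.5, 6.0])
print(f"R2={r2:.3f}, RMSE={rmse:.3f}, MAE={mae:.3f}")

# Partition 106 samples (the sample count from the abstract) into 5 assumed folds.
for fold_no, (train_idx, test_idx) in enumerate(k_fold_indices(106, k=5), start=1):
    print(f"fold {fold_no}: {len(train_idx)} train / {len(test_idx)} test")
```

In practice, a model such as XGB, LGB, or GB would be fitted on each training fold and scored on the matching test fold with `regression_metrics`, and the per-fold scores would then be averaged.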

Published
2022-08-24
Section
Research paper