Applying machine learning models to forecast the surface water quality index in the Ca Mau peninsula
Abstract
Surface water in the Ca Mau peninsula is being polluted by substandard discharges into water sources. Pollution in the affected areas is mainly organic and microbiological, as reflected in parameters such as DO, BOD5, COD, NH4+ and total Coliform. In the study area, localities often use the water quality index (WQI) to assess surface water quality and the suitability of water sources for different purposes. However, calculating the WQI from water quality monitoring data has limitations: it requires many monitoring parameters and is relatively complicated. Finding an efficient WQI calculation method is therefore important for assessing surface water quality in the study area. This study applies machine learning models to calculate the WQI from a minimal set of input data (water quality parameters) in order to reduce the cost of surface water quality monitoring. Bayesian model averaging (BMA) was used to select the optimal water quality parameters (pH, BOD5, PO4 and Coliform) for calculating the WQI. The results show that the machine learning models predicted the WQI from this minimal parameter set with high accuracy. The Gradient Boosting model gave the most accurate predictions, with the highest coefficient of determination (R2 = 0.973) and the lowest MSE and RMSE (MAE, MSE and RMSE of 3.24, 22.54 and 4.75, respectively). The XGBoost model followed with an R2 of 0.966 and corresponding error values of 3.15, 28.95 and 5.38. The Decision Tree model had an R2 of 0.944 and error values of 4.46, 49.67 and 7.04, and the LightGBM model had an R2 of 0.928 and error values of 5.95, 63.30 and 7.95.
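The four evaluation metrics reported above (R2, MAE, MSE and RMSE) can be illustrated with a minimal sketch. The function name `regression_metrics` and the sample WQI values are hypothetical and chosen only for illustration; they are not the study's code or data. The sketch assumes the standard definitions of these metrics, under which RMSE is the square root of MSE.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R2, MAE, MSE and RMSE for paired observed/predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    mae = sum(abs(e) for e in errors) / n   # mean absolute error
    mse = sum(e * e for e in errors) / n    # mean squared error
    rmse = math.sqrt(mse)                   # root mean squared error

    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all observed values are equal")
    r2 = 1.0 - ss_res / ss_tot              # coefficient of determination

    return {"R2": r2, "MAE": mae, "MSE": mse, "RMSE": rmse}


# Hypothetical WQI values, for illustration only (not data from the study).
observed = [80.0, 60.0, 40.0, 20.0]
predicted = [78.0, 63.0, 41.0, 18.0]
metrics = regression_metrics(observed, predicted)
```

A higher R2 and lower MAE, MSE and RMSE indicate a closer match between predicted and observed WQI, which is the basis on which the models are ranked here.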