Efficient interpretable prediction of protein-ligand interactions using gradient boosting models and explainable AI

  • Thi Cam Mai Truong
Keywords: Drug discovery, machine learning, explainable artificial intelligence.

Abstract

Predicting the binding affinity of small molecules to protein targets is a critical step in modern drug discovery, with the potential to accelerate the identification of effective therapeutics while reducing experimental costs. In this study, we use the BELKA dataset, built from a large-scale DNA-encoded chemical library (DEL), to train machine learning models for binding affinity prediction. Using XGBoost, a tree-based gradient boosting algorithm, together with extensive preprocessing and feature engineering, we develop predictive models for three protein targets (BRD4, HSA, and sEH) that classify whether a given small molecule binds to each target. The models demonstrate strong predictive capabilities, and SHAP analysis provides interpretability by identifying the molecular features that drive binding predictions. Evaluation on the BELKA test dataset reveals challenges in generalization, providing valuable insights into the complexities of predictive modelling in drug discovery. This work highlights the promise of machine learning in advancing computational drug discovery by enabling efficient exploration of the chemical space for potential therapeutics.
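
To make the modelling idea concrete, the following is a minimal, standard-library-only sketch of second-order gradient boosting for binder/non-binder classification on binary molecular fingerprint bits. It is not the authors' pipeline and does not use XGBoost, SHAP, or BELKA data: the 4-bit fingerprints, the labelling rule (a molecule "binds" only if bits 0 and 1 are both set), and all function names are hypothetical. It borrows only the leaf-weight and split-gain formulas used by XGBoost-style boosting (the constant factor of 1/2 in the gain is dropped because it does not change which split is chosen), and it uses split counts as a crude stand-in for feature attribution.

```python
import math
from itertools import product


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, grad, hess, lam=1.0):
    """Choose the fingerprint bit whose split gives the largest second-order gain."""
    G, H = sum(grad), sum(hess)
    best = None
    for j in range(len(X[0])):
        g1 = sum(g for x, g in zip(X, grad) if x[j])
        h1 = sum(h for x, h in zip(X, hess) if x[j])
        g0, h0 = G - g1, H - h1
        gain = g0 * g0 / (h0 + lam) + g1 * g1 / (h1 + lam) - G * G / (H + lam)
        if best is None or gain > best[0]:
            # Leaf weights are the regularized Newton steps -G_leaf / (H_leaf + lam).
            best = (gain, j, -g0 / (h0 + lam), -g1 / (h1 + lam))
    return best[1], best[2], best[3]


def train(X, y, rounds=30, lr=0.3):
    """Boost depth-1 trees (stumps) on the logistic loss."""
    stumps, scores = [], [0.0] * len(X)
    for _ in range(rounds):
        p = [sigmoid(s) for s in scores]
        grad = [pi - yi for pi, yi in zip(p, y)]   # first derivative of log loss
        hess = [pi * (1.0 - pi) for pi in p]       # second derivative of log loss
        j, w0, w1 = fit_stump(X, grad, hess)
        stumps.append((j, lr * w0, lr * w1))
        scores = [s + (lr * w1 if x[j] else lr * w0) for s, x in zip(scores, X)]
    return stumps


def predict_proba(stumps, x):
    return sigmoid(sum(w1 if x[j] else w0 for j, w0, w1 in stumps))


# Hypothetical data: every 4-bit fingerprint; bits 2 and 3 are uninformative.
X = [list(bits) for bits in product([0, 1], repeat=4)]
y = [x[0] & x[1] for x in X]

stumps = train(X, y)

# Crude attribution: how often each bit was selected for a split.
split_counts = {}
for j, _, _ in stumps:
    split_counts[j] = split_counts.get(j, 0) + 1

print("P(bind | bits 0 and 1 set):", predict_proba(stumps, [1, 1, 0, 0]))
print("P(bind | only bit 0 set):  ", predict_proba(stumps, [1, 0, 0, 0]))
print("splits per bit:", split_counts)
```

In this toy setting the split counts concentrate on the two informative bits, which is the kind of signal that an attribution method such as SHAP is meant to surface in a more principled, per-prediction way on real fingerprints.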

Published
2025-02-28