Building a new hybrid machine learning model for improvement insurance cross-sell prediction

  • Gia Bao Ngoc Doan
  • Minh Quan Luu
  • Thi Thanh Ha Truong
  • Duc Minh Tan Nguyen
  • Thi Minh Huyen Phan
  • Duy Thanh Tran
Keywords: Borderline-SMOTE, cross-sell prediction, decision tree, hybrid model, logistic regression, random forest, ROC-AUC, XGBoost

Abstract

Amid rising competition in the insurance sector, optimizing cross-selling strategies is crucial for sustainable growth and requires a deep understanding of customer behavior. This study proposes a machine learning-driven framework for cross-sell prediction to enhance personalization, increase conversion rates, and maximize return on investment. The dataset comprises 381,109 customer records from an insurance company. Preprocessing includes outlier treatment for Annual Premium, encoding of categorical variables such as Gender and Vehicle Age, and standardization of numerical features such as Age, Annual Premium, and Vintage. To address class imbalance in the Response variable, where only 12.26 percent of customers responded positively, the Borderline-Synthetic Minority Over-sampling Technique (Borderline-SMOTE) is applied to generate synthetic samples and improve prediction accuracy. Four machine learning models (Logistic Regression, Decision Tree, Random Forest, and XGBoost) are trained and evaluated using Accuracy, Receiver Operating Characteristic - Area Under the Curve (ROC-AUC), Mean Absolute Error, Mean Squared Error, and Root Mean Squared Error. Among these, XGBoost with Borderline-SMOTE achieves the best performance, with an accuracy of 0.84 and a ROC-AUC of 0.8436, a substantial improvement over the baseline XGBoost model's ROC-AUC of 0.7768. Logistic Regression also improves, with its ROC-AUC increasing from 0.8250 to 0.8451. Visual analysis reveals behavioral patterns, such as a 25 percent purchase rate among customers with vehicles older than two years and a 20 percent rate among male customers with prior vehicle damage. The study delivers a high-performing predictive model to support targeted marketing efforts, potentially increasing cross-sell conversion rates by 5 to 10 percent. Future work will explore deep learning techniques and larger datasets to further enhance prediction capabilities.
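The oversampling step named in the abstract can be illustrated with a minimal sketch. The abstract does not say which implementation the authors used, so the code below is only a simplified, pure-Python version of Borderline-SMOTE (variant 1) on hypothetical toy data. The function names, the parameters `m`, `k`, `n_new`, and `seed`, and the data points are all assumptions made for illustration. In practice a library routine such as `BorderlineSMOTE` from imbalanced-learn would normally be preferred.

```python
import math
import random


def borderline_smote(X, y, minority=1, m=5, k=3, n_new=None, seed=0):
    """Simplified Borderline-SMOTE (variant 1); an illustrative sketch only.

    X: list of numeric feature lists, y: list of class labels.
    m: neighbours (any class) used to decide if a minority point is borderline.
    k: minority neighbours used when interpolating new points.
    n_new: number of synthetic points; defaults to balancing the two classes.
    """
    rng = random.Random(seed)
    minority_pts = [x for x, label in zip(X, y) if label == minority]
    if n_new is None:
        n_new = (len(X) - len(minority_pts)) - len(minority_pts)

    # Step 1: a minority point is "in danger" when at least half, but not all,
    # of its m nearest neighbours belong to the majority class.
    danger = []
    for i, (p, label) in enumerate(zip(X, y)):
        if label != minority:
            continue
        others = [j for j in range(len(X)) if j != i]
        nearest = sorted(others, key=lambda j: math.dist(p, X[j]))[:m]
        n_majority = sum(1 for j in nearest if y[j] != minority)
        if len(nearest) / 2 <= n_majority < len(nearest):
            danger.append(p)

    # Nothing to oversample from: return copies of the input unchanged.
    if not danger or len(minority_pts) < 2 or n_new <= 0:
        return list(X), list(y)

    # Step 2: interpolate between a danger point and one of its k nearest
    # minority neighbours to create each synthetic sample.
    synthetic = []
    for _ in range(n_new):
        p = rng.choice(danger)
        neighbours = sorted(
            (q for q in minority_pts if q is not p),
            key=lambda q: math.dist(p, q),
        )[:k]
        q = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from p to q
        synthetic.append([a + gap * (b - a) for a, b in zip(p, q)])

    return list(X) + synthetic, list(y) + [minority] * len(synthetic)


# Hypothetical toy data: 8 majority points (label 0) and 4 minority points (label 1).
X = [[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5], [1, 0.5], [0.5, 1], [0, 0.5],
     [1.2, 1.2], [1.4, 1.4], [3, 3], [3.2, 3.1]]
y = [0] * 8 + [1] * 4

X_res, y_res = borderline_smote(X, y, m=3, k=2, seed=0)
print("class counts after resampling:", y_res.count(0), y_res.count(1))
```

Only the minority points that sit close to the majority cluster are used as seeds, which is what distinguishes the borderline variant from plain SMOTE. Oversampling of this kind is usually applied to the training split only, so that evaluation metrics such as ROC-AUC are computed on data with the original class distribution.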

Authors

Gia Bao Ngoc Doan
University of Economics and Law, Ho Chi Minh City; National University Ho Chi Minh City, Ho Chi Minh City
Minh Quan Luu
University of Economics and Law, Ho Chi Minh City; National University Ho Chi Minh City, Ho Chi Minh City
Thi Thanh Ha Truong
University of Economics and Law, Ho Chi Minh City; National University Ho Chi Minh City, Ho Chi Minh City
Duc Minh Tan Nguyen
University of Economics and Law, Ho Chi Minh City; National University Ho Chi Minh City, Ho Chi Minh City
Thi Minh Huyen Phan
University of Economics and Law, Ho Chi Minh City; National University Ho Chi Minh City, Ho Chi Minh City
Duy Thanh Tran
University of Economics and Law, Ho Chi Minh City; National University Ho Chi Minh City, Ho Chi Minh City
Published
2025-09-07
Section
Articles