Car Insurance Claim Prediction Using Machine Learning: A Comparative Study of Ensemble and Classical Classifiers
Keywords:
Car Insurance Claim Prediction, Gradient Boosting, Random Forest, SVM, Logistic Regression, Flask, scikit-learn, SQLite, Chart.js, Bootstrap 5, Docker, Class Imbalance, Feature Engineering
Abstract
This paper presents a comprehensive web-based machine learning platform for predicting car insurance claim
outcomes using four classification algorithms: Gradient Boosting, Random Forest, Support Vector Machine (SVM),
and Logistic Regression. Insurers face the dual challenge of class imbalance (≈74% no-claim, ≈26%
claim) and non-linear interactions between policyholder features that defeat traditional actuarial generalized linear models (GLMs). The
system is trained on a synthetic dataset of 10,000 policy records with 17 features (including driving experience, credit
score, annual mileage, speeding violations, DUIs, and past accidents). A two-stage preprocessing
pipeline applies label encoding to the 9 categorical variables and StandardScaler standardization to the 8 numeric features,
followed by a stratified 80/20 train-test split. Gradient Boosting achieved the highest performance (Accuracy =
91.95%, Precision = 89.45%, Recall = 78.27%, F1 = 83.49%), surpassing SVM (91.10%), Logistic Regression
(90.25%), and Random Forest (90.10%). The full-stack Flask application integrates scikit-learn inference, SQLite-backed
authentication (Werkzeug PBKDF2-SHA256 hashing), prediction history, Chart.js analytics dashboards, and
Docker containerization, all within a Bootstrap 5 dark-themed UI. This article details the mathematical foundations
of all four algorithms, the end-to-end system architecture, the ML pipeline, algorithmic pseudocode, and a
results analysis with comparative tables and performance graphs.
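The reported Gradient Boosting figures can be sanity-checked with a few lines of arithmetic. The sketch below is not taken from the paper's code; it only recomputes F1 as the harmonic mean of the stated precision and recall, and compares the stated accuracy with the accuracy of a trivial classifier that always predicts "no claim". The helper name f1_from_precision_recall and the use of 0.74 as the no-claim share (the abstract gives only an approximate value) are assumptions made for illustration.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (fractions in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Gradient Boosting figures as stated in the abstract.
PRECISION = 0.8945
RECALL = 0.7827
ACCURACY = 0.9195

# Assumption: the abstract says "about 74% no-claim"; 0.74 is used as a round value.
NO_CLAIM_SHARE = 0.74

f1 = f1_from_precision_recall(PRECISION, RECALL)

# A classifier that always predicts the majority class ("no claim")
# is correct exactly as often as that class occurs.
baseline_accuracy = NO_CLAIM_SHARE
accuracy_gain = ACCURACY - baseline_accuracy

print(f"F1 recomputed from precision and recall: {f1:.4f}")
print(f"Majority-class baseline accuracy:        {baseline_accuracy:.2f}")
print(f"Accuracy gain over that baseline:        {accuracy_gain:.4f}")
```

The recomputed F1 agrees with the reported 83.49% to two decimal places. Because roughly three quarters of the policies carry no claim, accuracy alone overstates how much the model has learned, which is why precision, recall, and F1 on the claim class are reported alongside it.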
License
Copyright (c) 2026 Authors

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.










