Studying the impact of data preprocessing, hyperparameter tuning and machine learning algorithms in crash prediction explainability

Date
2026-07
Publisher
Elsevier B.V.
Abstract
Road traffic crashes remain a major global concern, causing more than 1.3 million fatalities each year and underscoring the need for improved tools to understand and predict crash occurrence. This study presents an integrated retrospective crash-risk screening framework that merges four heterogeneous data sources (crash records, road infrastructure, connected vehicle data, and travel demand) to model road-segment crash risk in Madrid. Ten preprocessing configurations are created using oversampling (generating instances of the minority class), undersampling (removing instances of the majority class), dataset expansion (generating new data), and SMOTE, each tested with and without normalization. Seven machine-learning algorithms (tree ensembles and SVMs) are evaluated under regression, multiclass classification, and binary classification formulations, for a total of 210 experiments (10 configurations × 7 algorithms × 3 formulations). Binary classification delivered the best performance, with gradient boosting trained on normalized, undersampled data emerging as the strongest model. Subsequent Bayesian hyperparameter optimization further enhanced its predictive capability. Explainable AI analysis using SHAP values revealed that braking events are the most influential predictors of crash likelihood, followed by road length and traffic demand, emphasizing the relevance of driver-behavior indicators in safety modeling. Overall, the findings demonstrate the benefits of integrating traditional crash data with emerging connected vehicle and demand-based information. The study provides evidence that explainable machine learning approaches can effectively support data-driven decision-making for road-safety management and targeted intervention planning.
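The abstract does not say how the preprocessing was implemented. As a rough illustration of the best-performing configuration it names (undersampling followed by normalization), the sketch below uses only the Python standard library. The function names, the choice of min-max scaling, the toy feature columns, and the data values are all assumptions made for illustration, not the authors' code or data.

```python
import random
from collections import Counter


def random_undersample(rows, labels, seed=0):
    """Randomly drop rows so every class is as small as the smallest class."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = min(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        out_rows.extend(rng.sample(group, target))
        out_labels.extend([label] * target)
    return out_rows, out_labels


def min_max_normalize(rows):
    """Rescale each feature column to [0, 1]; constant columns map to 0.0."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


# Hypothetical toy data: [braking_events, segment_length_m] per road segment.
rows = [[0, 120.0], [1, 300.0], [0, 80.0], [2, 450.0], [7, 500.0], [9, 650.0]]
labels = [0, 0, 0, 0, 1, 1]  # 1 = segment with a crash (minority class)

balanced_rows, balanced_labels = random_undersample(rows, labels, seed=42)
scaled_rows = min_max_normalize(balanced_rows)
class_counts = Counter(balanced_labels)
```

In a real pipeline, the scaling parameters would normally be computed on the training split only and then applied to the test split, so that no information leaks from the evaluation data.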
Keywords
Crash prediction
Explainable AI
Imbalanced learning
Machine learning
Road safety
Citation
Díaz-Aparicio, J., Rodríguez-Esparza, E., Fajardo-Calderín, J., & Onieva, E. (2026). Studying the impact of data preprocessing, hyperparameter tuning and machine learning algorithms in crash prediction explainability. Array, 30. https://doi.org/10.1016/J.ARRAY.2026.100743