Abstract
Roadway crashes are a leading cause of death and injury worldwide, and reducing them requires targeted safety interventions. While models exist to analyze spatial variation in the effects of specific risk factors, comprehensive approaches to identifying and characterizing distinct types of crash-prone spatial environments across large regions remain less explored. We address this gap by introducing a data-driven typology of census tracts in Massachusetts, Connecticut, and Vermont, built from 2023 crash data, that provides a framework for categorizing areas by their distinct crash patterns. We apply dimensionality reduction using Uniform Manifold Approximation and Projection (UMAP) to 25 crash characteristics derived from over 222,000 crash records and aggregated across 2,660 census tracts, yielding six latent dimensions that capture the underlying variation in crash profiles. Gaussian Mixture Modeling (GMM) then identifies five distinct crash types, driven by differences in speed conditions, roadway design, and vehicle involvement, which influence crash severity and frequency. We validate the typology using a Light Gradient Boosting Machine (LightGBM) classifier, which achieves 96% accuracy across the five classes. Finally, we demonstrate how the typology enables targeted, evidence-based safety interventions. This integrated pipeline offers a novel tool for understanding regional safety patterns and guiding the development of context-specific, proactive safety interventions.