Time series analysis on LA traffic collision data


Traffic accidents are one of the leading causes of injuries and fatalities globally, and consequently they make up a substantial area of research. A number of factors increase the likelihood of crashes, including weather, road design, vehicle design, alcohol use, and aggressive driving. In addition to these factors, there can be underlying patterns in when collisions occur, and time series analysis can be used to uncover them. The purpose of this project is to predict the number of collisions using time series analysis.


DATA

The dataset is taken from the open data portal of the City of Los Angeles. This dataset, along with many others, is actively maintained by the city and is freely available to the public. It reflects traffic collision incidents in the City of Los Angeles dating back to 2010. The records are transcribed from the original paper traffic reports, so it is very likely that there are errors. The data begins in January 2010 and is updated weekly. This project uses data from January 2010 to January 2021, which comes to ~551K rows and 18 columns, with each row corresponding to a single collision.

Technical Aspect

This is a simple Flask app for monthly traffic collision prediction based on univariate time series analysis. The trained model (web/model.pkl) takes a date as input and predicts the number of collisions for that month.
Tools: Python (Flask, NumPy, Pandas), HTML, CSS, Power BI
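
The sketch below shows how such an app could be wired up, assuming the pickled model is the Prophet model described in the steps further down; the route names, template, and form field names are hypothetical and not the project's actual code.

```python
# app.py -- minimal sketch of the prediction endpoint (route names, template,
# and form field names are assumptions, not the project's actual code).
import pickle

import pandas as pd
from flask import Flask, render_template, request

app = Flask(__name__)

# Load the trained model saved as web/model.pkl (assumed to be a Prophet model)
with open("web/model.pkl", "rb") as f:
    model = pickle.load(f)


@app.route("/", methods=["GET"])
def home():
    return render_template("index.html")


@app.route("/predict", methods=["POST"])
def predict():
    # Expect a month like "2021-06" from the form (hypothetical field name)
    month = request.form["month"]
    future = pd.DataFrame({"ds": [pd.to_datetime(month)]})
    forecast = model.predict(future)
    collisions = int(round(forecast.loc[0, "yhat"]))
    return render_template("index.html", prediction=collisions)


if __name__ == "__main__":
    app.run(debug=True)
```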


    • A dataframe is created from the traffic collision data with the date as the index, the monthly collision count as the feature, and 132 monthly records (see the data-preparation sketch after this list).
    • From the descriptive statistics of the target column, the number of collisions in a month ranges from 2,429 to 5,285.
    • The number of collisions between 2010 and 2014 follows an almost linear trend. There is a significant increase in the number of collisions from 2014 to 2019. In 2020 there is a drop in the number of collisions due to the COVID-19 pandemic.
    • Since the trend of the target column varies over time, a multiplicative model is used for decomposition to examine the time series components separately (see the decomposition sketch below).
    • Plotted the autocorrelation (ACF) and partial autocorrelation (PACF) functions and checked the stationarity of the series using the Dickey-Fuller test (see the stationarity sketch below).
    • Used a log transformation to stabilize the variance of the series. Removed the linear trend using first differencing and the seasonality using seasonal differencing.
    • Applied ARIMA and FB Prophet models for prediction. The FB Prophet model achieved an RMSE below 1,000, whereas the ARIMA model had an RMSE of 1,006 (see the modelling sketch below).
    • Saved the FB Prophet model as a pickle file and built a web app using Flask to predict the monthly collision count (illustrated by the Flask sketch above).
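
The following is a minimal sketch of the data-preparation step, assuming the raw export has a parseable date column (the file name and the column name "Date Occurred" are assumptions about the portal's CSV schema).

```python
# Build the 132-row monthly series from the raw collision export.
# File name and "Date Occurred" column are assumptions about the CSV schema.
import pandas as pd

raw = pd.read_csv("traffic_collision_data.csv", parse_dates=["Date Occurred"])

monthly = (
    raw.set_index("Date Occurred")
       .resample("MS")            # month-start frequency
       .size()                    # one row per collision -> count per month
       .rename("collisions")
       .to_frame()
       .loc["2010-01":"2020-12"]  # 132 monthly records
)

print(monthly.describe())          # per-month collision count statistics
```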
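
The decomposition step can be sketched with statsmodels as below, reusing the `monthly` frame from the previous snippet; the library choice is an assumption, as the write-up does not name it.

```python
# Multiplicative decomposition of the monthly series into trend,
# seasonal, and residual components.
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose

decomposition = seasonal_decompose(
    monthly["collisions"], model="multiplicative", period=12
)
decomposition.plot()
plt.show()
```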
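
A sketch of the stationarity checks and transformations, again assuming statsmodels and the `monthly` frame from above:

```python
# ACF/PACF plots, Dickey-Fuller test, and variance/trend/seasonality removal.
import matplotlib.pyplot as plt
import numpy as np
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.stattools import adfuller

series = monthly["collisions"]

# ACF and PACF of the raw series
fig, axes = plt.subplots(2, 1, figsize=(8, 6))
plot_acf(series, ax=axes[0])
plot_pacf(series, ax=axes[1])
plt.show()

# Dickey-Fuller test: a large p-value suggests the series is non-stationary
adf_stat, p_value, *_ = adfuller(series)
print(f"ADF statistic = {adf_stat:.3f}, p-value = {p_value:.3f}")

# Log transform to stabilise the variance, then first differencing for the
# trend and seasonal (lag-12) differencing for the yearly seasonality
log_series = np.log(series)
stationary = log_series.diff(1).diff(12).dropna()
print(f"ADF p-value after differencing: {adfuller(stationary)[1]:.3f}")
```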
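
The model comparison and export could look like the sketch below. The hold-out split, the seasonal ARIMA orders, and the use of statsmodels' SARIMAX as the ARIMA implementation are all assumptions for illustration, so the RMSE values printed here will not necessarily match the project's reported numbers; older Prophet installs import from `fbprophet` instead of `prophet`.

```python
# Fit ARIMA-family and Prophet models, compare RMSE on a hold-out period,
# and pickle the Prophet model for the Flask app. Orders and the 12-month
# hold-out split are illustrative assumptions.
import pickle

import numpy as np
from prophet import Prophet                     # older installs: fbprophet
from statsmodels.tsa.statespace.sarimax import SARIMAX

train, test = monthly.iloc[:-12], monthly.iloc[-12:]

# Seasonal ARIMA via SARIMAX (orders chosen for illustration only)
sarima = SARIMAX(
    train["collisions"], order=(1, 1, 1), seasonal_order=(1, 1, 1, 12)
).fit(disp=False)
sarima_pred = sarima.forecast(steps=len(test))
sarima_rmse = np.sqrt(np.mean((test["collisions"].values - sarima_pred.values) ** 2))

# Prophet expects columns ds (date) and y (value)
prophet_train = train.reset_index().rename(
    columns={"Date Occurred": "ds", "collisions": "y"}
)
prophet_model = Prophet(yearly_seasonality=True)
prophet_model.fit(prophet_train)
future = prophet_model.make_future_dataframe(periods=len(test), freq="MS")
prophet_pred = prophet_model.predict(future)["yhat"].iloc[-len(test):]
prophet_rmse = np.sqrt(np.mean((test["collisions"].values - prophet_pred.values) ** 2))

print(f"ARIMA RMSE: {sarima_rmse:.0f}, Prophet RMSE: {prophet_rmse:.0f}")

# Persist the better-performing Prophet model for the Flask app
with open("web/model.pkl", "wb") as f:
    pickle.dump(prophet_model, f)
```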

Demo

