# **SLIDE SUPPLEMENT**
## 1. Intersection Crash
**slide page: 3**

Potential causes of intersection crashes are taken from the "*DRVRPC*" variable in the original dataset. However, a single intersection crash may have multiple causes, so we cannot simply group by this variable and count. To address this, we loop over every record and tally each cause in a dictionary.
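```python
from collections import defaultdict

reason = defaultdict(int)  # cause code -> number of crashes citing it
# Treat missing values as empty so the membership test below does not fail.
for codes in df['DRVRPC1'].fillna(''):
    if 'DC' in codes:
        reason['DC'] += 1
    if 'DIS' in codes:
        reason['DIS'] += 1
    # ...... (the remaining cause codes are counted the same way)
```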

  
**slide page: 4**

### Obtain intersection data

City street and highway names are given in the dataset, but detailed intersection information is not. Therefore, intersection crash data cannot be pulled directly from the dataset.

We can group by street names to identify the intersection of two streets, but because many street names are missing and real-world intersection layouts vary, the result of this method is poor. For example, the most dangerous highway intersection given by this method is E Washington Ave & Aberg Ave. The satellite view on Google Maps, however, shows that this is not an intersection but an elevated road (as shown in the following image).

![](https://i.imgur.com/qkMGvt8.jpg)

So we added a clustering model trained on latitude and longitude data, because the location data has no missing values. The model is DBSCAN, a density-based clustering algorithm that works well on spatial data. Its most important parameter is "*eps*", the maximum distance between two samples for them to be considered neighbors. The average highway and city street intersection lengths are about 35 m and 23 m, respectively (https://nacto.org/publication/urban-street-design-guide/street-design-elements/lane-width/). With "*eps*" set accordingly, the model gives a precise intersection cluster result. Details about this model can be found here: https://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html
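The clustering step might look like the following minimal sketch. It assumes scikit-learn's `DBSCAN` with the haversine metric; the sample coordinates, `min_samples=2`, and the 35 m value used for `eps` are illustrative assumptions, not necessarily the settings used in the project.

```python
import numpy as np
from sklearn.cluster import DBSCAN

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters

# Hypothetical crash coordinates (latitude, longitude) in degrees.
coords_deg = np.array([
    [43.0731, -89.4012],
    [43.0732, -89.4013],
    [43.0731, -89.4011],
    [43.1000, -89.3500],  # isolated crash far from the others
])

# The haversine metric expects radians, and eps is then an angle in radians,
# so a distance in meters is divided by the Earth's radius.
eps_m = 35  # assumed neighborhood radius in meters
model = DBSCAN(eps=eps_m / EARTH_RADIUS_M, min_samples=2,
               metric="haversine", algorithm="ball_tree")
labels = model.fit_predict(np.radians(coords_deg))
# labels holds one cluster id per crash; -1 marks points treated as noise.
```

Crashes that share a label can then be counted per cluster to rank candidate intersections.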

### Analyze intersection crashes

Because intersection conditions, such as traffic signs and street lights, may change from year to year, we used data from the most recent 3 years rather than all 10 years to limit the influence of these changes.

Thanks to Google Maps, we can look at the 3D view of each intersection. We also requested the original crash reports from the TOPS Laboratory. However, these reports are PDF files, so they have to be checked manually.

## 2. Car to Truck Crash
**slide page: 5** 
### Data Source
The dataset "Madison_crash_2009_2019.csv" comes from https://topslab.wisc.edu/
### Introduction
My investigation focuses on accidents involving less common modes of transportation, such as bikes, trucks, buses, and cycles. Although the number of accidents for these modes is relatively small, they deserve our attention for several reasons. Bicycle-involved accidents are often accompanied by serious injuries, and truck-related accidents are more likely to cause severe traffic jams.
### Technical Details
The dataset records the types of vehicles involved in each accident in two separate columns, using many fine-grained types. For instance, trucks are classified as TRK DB (double bottom), TRK NA (not attached), TRK SA (semi attached), and TRK ST (insert truck). To get a more general view of the accidents, I merged vehicle subtypes into their main groups (TRK DB, TRK NA, etc. become TRK). I then added a new column to the dataframe that describes both vehicle types involved in each accident (carTotruck, carTocar, etc.); the x-axis of my plot is based on this new column.
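The grouping described above could be sketched as follows. The column names `VEHTYPE1` and `VEHTYPE2`, the `CAR` and `BIKE` codes, the label mapping, and the choice to make the label independent of vehicle order are all assumptions for illustration; the real dataset and plot may differ.

```python
import pandas as pd

# Hypothetical column names and sample rows; the real dataset may differ.
df = pd.DataFrame({
    "VEHTYPE1": ["CAR", "TRK DB", "CAR", "BIKE"],
    "VEHTYPE2": ["TRK SA", "CAR", "CAR", "CAR"],
})

# Assumed mapping from main-group code to the label used in the plot.
LABELS = {"CAR": "car", "TRK": "truck", "BIKE": "bike"}

def main_group(vehicle_type: str) -> str:
    """Collapse a subtype such as 'TRK DB' into its main group 'TRK'."""
    return vehicle_type.split()[0]

def pair_label(row) -> str:
    """Build an order-independent label such as 'carTotruck'."""
    a = LABELS[main_group(row["VEHTYPE1"])]
    b = LABELS[main_group(row["VEHTYPE2"])]
    # Put 'car' first so car/truck and truck/car fall into one category.
    first, second = sorted([a, b], key=lambda g: (g != "car", g))
    return f"{first}To{second}"

df["crash_type"] = df.apply(pair_label, axis=1)
```

Counting `df["crash_type"]` with `value_counts()` then gives one bar per vehicle pairing.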


**slide page: 6**
### Data Source
The dataset comes from [Street Lights Map](https://data-cityofmadison.opendata.arcgis.com/datasets/utility-maintained-street-lights) provided by Open Data Madison.
### Methodology
Given that dark-time carTotruck accidents are so much more numerous than the other types, I plotted all of them to look for a pattern. Surprisingly, most of them happened on Interstate 39/90, as the three red boxes indicate. Since the light condition for those accidents was dark, I assumed that either there are no street lights or the street lights are broken. I found the Madison street light distribution on Open Data Madison and downloaded its shapefile to add to my dark-time carTotruck accident map. By observing the non-overlapping areas, I noticed that none of the three areas has street lights nearby.
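The visual comparison above could also be approximated numerically. The sketch below is not the method used in the project: it flags crashes with no street light within a given radius, using a plain haversine distance. The coordinates and the 100 m radius are hypothetical.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6_371_000 * asin(sqrt(a))

def has_light_nearby(crash, lights, radius_m=100):
    """True if any street light lies within radius_m of the crash location."""
    return any(haversine_m(*crash, *light) <= radius_m for light in lights)

# Hypothetical street light and crash locations (latitude, longitude).
lights = [(43.0750, -89.3900), (43.0760, -89.3890)]
crashes = [(43.0751, -89.3901), (43.1200, -89.2800)]

# Crashes with no street light inside the assumed radius.
unlit = [c for c in crashes if not has_light_nearby(c, lights)]
```

For a dataset of realistic size, a spatial join in geopandas would likely be a better fit than this pairwise loop.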

## 3. Bike Crash Analysis
### Traffic Loading Calculation
**slide page: 11**

### Data Source
The dataset comes from [Traffic Flow Map](https://data-cityofmadison.opendata.arcgis.com/datasets/traffic-flow-map) presented by Open Data Madison.

### Feature
This dataset records the average weekday traffic load of each road.

### Usage
Steps to extract the traffic load of a single road from this dataset:
1. Find the specific road by road name
    ```python 
        df['segment_na'].unique()
    ```
    This returns an array of the distinct road names, which we can search for the road we want.
2. Get the traffic load of the specific road
    ```python 
        road = df[df["segment_na"] == "road name"]
    ```
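The two steps can be tried end to end on a toy frame, as in the sketch below. Only `segment_na` comes from the dataset; the `traffic_load` column name and the sample rows are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical rows; 'segment_na' is the road-name column used above, and
# 'traffic_load' is an assumed name for the weekday-average load column.
df = pd.DataFrame({
    "segment_na": ["E Washington Ave", "University Ave", "E Washington Ave"],
    "traffic_load": [50000, 30000, 48000],
})

# Step 1: case-insensitive search among the distinct road names.
names = pd.Series(df["segment_na"].unique())
matches = names[names.str.contains("washington", case=False)].tolist()

# Step 2: select every segment of the matched road and average its load.
road = df[df["segment_na"] == matches[0]]
mean_load = road["traffic_load"].mean()
```

Averaging over segments is one possible way to get a single number per road; taking the maximum segment load would be a more conservative choice.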

### Simulation of peak hour traffic load
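We use a conservative formula to simulate the peak-time traffic load: it assumes that 70% of the daily load falls within the 3 peak hours and converts that share to a load per lane per hour.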

$$PeakLoad = \frac{0.7 \times DailyLoad}{LaneCount \times 3\ \text{hours}}$$
<br>
Lane count can be retrieved from Google Maps satellite imagery.
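As a worked example of the peak load formula, the sketch below evaluates it for a hypothetical road with a daily load of 30000 vehicles and 4 lanes. The function name and the input values are illustrative assumptions.

```python
def peak_load(daily_load: float, lane_count: int,
              peak_share: float = 0.7, peak_hours: float = 3) -> float:
    """Vehicles per lane per hour during the peak period."""
    if lane_count <= 0:
        raise ValueError("lane_count must be positive")
    return peak_share * daily_load / (lane_count * peak_hours)

# Hypothetical road: 30000 vehicles per weekday, 4 lanes.
example = peak_load(daily_load=30000, lane_count=4)
```

Here 70% of 30000 is 21000 vehicles in the peak period, spread over 4 lanes and 3 hours, which comes to about 1750 vehicles per lane per hour.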

### Score Calculation
To rank sections by benefit, we score each section as its total accident count divided by the total length of the segment that needs modification.

$$Score = \frac{AccidentCount}{Length}$$
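The ranking by score can be illustrated with the short sketch below. The segment names, accident counts, and lengths are made up, and the length unit (kilometers here) is an assumption, since the source does not state one.

```python
def benefit_score(accident_count: int, length_km: float) -> float:
    """Accidents per unit length of the segment that needs modification."""
    if length_km <= 0:
        raise ValueError("length_km must be positive")
    return accident_count / length_km

# Hypothetical segments: name -> (accident count, length in km).
segments = {
    "Segment A": (12, 0.8),
    "Segment B": (20, 2.5),
    "Segment C": (5, 0.2),
}

# Highest score first: the most accidents addressed per unit of modified road.
ranked = sorted(segments, key=lambda s: benefit_score(*segments[s]), reverse=True)
```

Note that a short segment with few accidents can outrank a long segment with many, which is the intended effect of normalizing by length.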



# **EXTRA PLOTS**


## Crash severity during day hours
### Question
How are accidents distributed over the day? When do most of the more serious accidents happen?

![](https://i.imgur.com/ePQqPPA.png)
### Conclusion
*   On regular weekdays, the average number of traffic accidents starts to rise at 6am, 11am and 2pm, and peaks at 8am, 12pm and 5pm, which are the rush hours of the day.

*   During weekends, traffic accidents tend to cluster between 11am and 6pm.

## Hourly injury/fatal accident rate from May to October

![](https://i.imgur.com/iAUx4Hj.png)

### Conclusion
- Fatal cases are roughly uniformly distributed over the day, which suggests that they are caused by a few factors that are independent of time.

<br>

- For injury cases, the curve follows the overall accident distribution over the day. This indicates that we can look for the same features in the overall distribution.


## Alcohol crash severity during day hours
![](https://i.imgur.com/kCGxt7G.png)
* Drunk driving accidents follow a different pattern compared to the general case. 

*   The number of accidents stays low during the morning hours and starts to increase at 2pm. 
*   The number of accidents (injury and fatal) peaks at 2am on both weekdays and weekends. 
*   Possible reason: people are more likely to drink in the evening and at night (6pm to 2am). Especially on weekends, people party late at night and may drive drunk more frequently.

## Ratio of age group in speeding crashes
### Question
Are any age groups associated with more severe accidents?
![](https://i.imgur.com/uYh4YW7.png)
*   Speeding leads to severe accidents!

*   Astonishing fact: 71% of fatal accidents among 18-25-year-old drivers are caused by speeding. 
*   A large share of injury accidents among young age groups is due to speeding.
*   In general, young people tend to have more traffic accidents than older people.

## Monthly injury/fatal count
### Question
Which period of the year has more accidents that result in injuries and fatalities?

![](https://i.imgur.com/tYPe0WI.png)

### Conclusion
May to October is the peak season for injury and fatal cases.

This conclusion is counter-intuitive because this period includes the summer break, when students should mostly be out of town, which brings our attention to the next question.


## Traffic loading map for heaviness calculation
Goal: plot the overall traffic load across the city, from which the traffic load of individual roads is extracted.

![](https://i.imgur.com/7QXNKid.png)

## The Severity of Accidents Involving Different Types of Vehicles
### Question
Is there any relation between the severity of accidents and the types of vehicles involved in the crashes?

![](https://i.imgur.com/ftUx5en.png)

### Conclusion
- carTobike and carTocycle accidents have more injury cases than property damage cases, a finding that we would miss if we only examined the severity of all accidents together. 
- In particular, carTobike accidents include nearly **200** injury accidents over ten years, which make up nearly **90%** of all carTobike accidents. Given this large number of injury accidents, bike-related accidents need further study.
<br>

## Ratio of Accident Severity on City Streets and State Highways
### Question
Did highway accidents cause more severe crashes than city street accidents?

![](https://i.imgur.com/7WNrYHj.png)

### Conclusion
- The answer is **YES**. Highway accidents tend to be more severe than city street accidents.

- Given the small number of fatal accidents that happen annually, I focused on comparing the percentage of injury accidents. The percentage of injury accidents on highways is consistently **3-4%** higher than that on city streets over the 10 years.

- I also noticed that none of the lines in the plot fluctuates much over the last ten years, which suggests that the trend may continue in the future.




# **Documentation**

## Data Source
1. Main data source: [Wisconsin Traffic Operation and Safety Laboratory](https://transportal.cee.wisc.edu/services/crash-data/)

2. [City Map](https://data-cityofmadison.opendata.arcgis.com/datasets/city-limit)

3. [Lake](https://data-cityofmadison.opendata.arcgis.com/datasets/lakes-and-rivers)

4. [Traffic Loading](https://data-cityofmadison.opendata.arcgis.com/datasets/traffic-flow-map)

5. [Roads](https://www.census.gov/cgi-bin/geo/shapefiles/index.php?year=2019&layergroup=Roads)

6. [Bike Paths](https://data-cityofmadison.opendata.arcgis.com/datasets/bike-paths)

## Important packages and visualization tools
1. Data processing: numpy, pandas

2. Visualization: matplotlib, seaborn, geopandas, [shapely](https://pypi.org/project/Shapely/), [plotly](https://plot.ly/python/) (a tool for interactive plots that provides a Python API)

3. Others (machine learning models): sklearn