2016 International Conference on Intelligent Transportation, Big Data & Smart City
Early Warning of Traffic Accident in Shanghai Based on Large Data set Mining
Yang Yanbin, Zhou Lijuan, Leng Mengjun, Sun Ling
Shanghai Maritime University, College of Transport & Communications, Shanghai, 201306, China
[email protected]
Data mining is the process of extracting knowledge from data held in specific forms. For a given data set and a given problem, one or more algorithms are chosen to find the hidden rules in the data, that is, implicit and meaningful knowledge, in order to provide scientific support for decision making. The basic process of data mining is as follows:
Abstract- Through classification and regression analysis of traffic accident statistics in Shanghai from July 2014 to April 2015, the paper puts forward a forecasting model of traffic accident incidence, which provides an index system of traffic accidents comprising month, week, weather and wind speed. The model is also used to calculate the range of the traffic accident rate. Finally, based on an analysis of safety levels, decisions and recommendations are made for controlling traffic accidents and for related rescue work, which has important guiding significance for traffic accident prevention and traffic safety management in China.
A. Data preparation
Select the data applicable to the data mining application, assess the quality of the research data in preparation for further analysis, and determine the analytical methods to be used. The main data source is the record of traffic accidents in Shanghai in recent years. To make the mining more effective, a number of related data are also included, such as time, temperature and weather information for Shanghai.
Keywords- data mining; traffic accident; regression analysis; incidence; safety levels
I. INTRODUCTION
According to global traffic and police department statistics, about 500 thousand people died in traffic accidents worldwide last year. About 104 thousand of these deaths occurred in China, accounting for 1/5 of the worldwide total and ranking first in the world. Many traffic accidents happen because the road itself is unreasonably laid out, so there is an urgent need to change the status quo and reduce the incidence of accidents.
At present, road traffic accident analysis and decision making are basically still at the manual processing stage, and manual processing is the main cause of the low efficiency and poor accuracy of decision analysis on large amounts of traffic accident data. Therefore, it is imperative to carry out scientific research on, and effectively improve, the analysis and decision making for road traffic accidents. Existing navigation systems only give voice prompts for speeding and for monitored road sections with a high incidence of accidents, which is a shortcoming. Warning drivers about accident-prone conditions on the road ahead would raise their vigilance and thus reduce the probability of road accidents.
This paper analyzes whether various factors influence traffic accidents in Shanghai. A large set of initial accident records is collated, the influencing factors are screened by significance analysis, and the retained factors make up a new accident record. The accident rate model is fitted with Lingo, and the influence of each factor on the traffic accident rate is derived.
II. ACCIDENT DATA MINING
B. Data reorganization and conversion
The data set is built on the open data of the Shanghai municipal government, using SODA data, public data and private data. Because the accident data are government statistics that were sorted manually and are mainly used for accident statistics, they are incomplete, redundant and ambiguous. They cannot be fed to a data mining algorithm directly and must first be processed and classified.
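The three defects named above (incomplete, redundant and ambiguous records) can be illustrated with a minimal cleaning sketch. The field names and the normalisation rule are assumptions made for illustration; the paper does not publish its record schema or its cleaning procedure.

```python
# Field names are hypothetical; the paper does not publish its record schema.
REQUIRED_FIELDS = ("date", "time", "location", "weather")


def normalise(value):
    """Lower-case a value and collapse whitespace; None becomes an empty string."""
    if value is None:
        return ""
    return " ".join(str(value).lower().split())


def clean_records(records):
    """Drop incomplete rows, unify spelling, and remove duplicate rows."""
    seen = set()
    cleaned = []
    for record in records:
        # Ambiguous: unify case and spacing so 'Light Rain ' equals 'light rain'.
        row = {field: normalise(record.get(field)) for field in REQUIRED_FIELDS}
        # Incomplete: at least one required field is missing or blank.
        if any(not value for value in row.values()):
            continue
        # Redundant: an identical row (after normalisation) was already kept.
        key = tuple(row[field] for field in REQUIRED_FIELDS)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(row)
    return cleaned
```

Treating two rows as duplicates when all four fields match is a simplification; a real data set would more likely deduplicate on a record identifier.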
C. Data mining
After cleaning and conversion, the original accident data become a data set suitable for mining. Data mining is carried out on this data set to extract knowledge and to find an appropriate knowledge model for decision analysis. For the specific data and the specific problem, one or more data mining algorithms are chosen to find the hidden rules and patterns and to provide a solution to the problem.
D. Result analysis
Interpret and evaluate the results of data mining, remove the meaningless part, analyze the meaningful rules or patterns again, and finally present them to decision makers in a form that is easy to understand and identify.
The goal of data mining is to discover hidden and meaningful knowledge from databases. There are many data mining algorithms, and they apply to broad functional areas, which include classification, estimation and prediction, clustering, association, sequence discovery and characterization. Regression analysis, time series analysis, cluster analysis and others are general methods.
ACCIDENT DATA MINING
978-1-5090-6061-0/17 $31.00 © 2017 IEEE
DOI 10.1109/ICITBS.2016.149
III. ROAD TRAFFIC ACCIDENTS DATA MINING IN SHANGHAI
This paper explores the correlation between traffic accidents in Shanghai and various influencing factors, then obtains the probability of road accidents under all circumstances and points out specific measures. The analysis therefore proceeds from the following aspects.
A. Classification
In order to establish a reasonable index system of traffic accidents, nine possible influencing factors are selected: month, week, time, temperature, weather, wind direction, wind speed, whether there is a camera and whether the road is smooth. All factors are first classified. Month is coded from 1 to 12 and week (day of the week) from 1 to 7. Time, temperature, weather, wind direction and wind speed are coded according to the categories in Tables 1 to 5 respectively.

Table 1 Time categories
Time          Reference value
0:00-1:59     1
2:00-3:59     2
4:00-5:59     3
6:00-7:59     4
8:00-9:59     5
10:00-11:59   6
12:00-13:59   7
14:00-15:59   8
16:00-17:59   9
18:00-19:59   10
20:00-21:59   11
22:00-23:59   12

Table 4 Wind direction categories
Wind direction   Reference value
east wind        1
west wind        2
north wind       3
northeaster      4
northwester      5
southwester      6
southeaster      7
south wind       8

Table 5 Wind speed categories
Wind speed   Reference value
grade 3      1
grade 3-4    2
grade 3-5    3
grade 4-5    4
grade 4-6    5
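The category coding can be written as plain lookup tables. The sketch below copies the reference values from Tables 1, 3, 4 and 5; the Python names and the record layout are illustrative assumptions, and temperature (Table 2) is left out because the paper does not say which category a boundary value such as 5℃ belongs to.

```python
# Reference values copied from Tables 3, 4 and 5; the Python names are illustrative.
WEATHER = {
    "heavy rain": 1, "thundershower": 2, "moderate rain": 3, "rainstorm": 4,
    "clear": 5, "shower": 6, "overcast": 7, "sleet": 8, "light rain": 9, "cloudy": 10,
}
WIND_DIRECTION = {
    "east wind": 1, "west wind": 2, "north wind": 3, "northeaster": 4,
    "northwester": 5, "southwester": 6, "southeaster": 7, "south wind": 8,
}
WIND_SPEED = {"grade 3": 1, "grade 3-4": 2, "grade 3-5": 3, "grade 4-5": 4, "grade 4-6": 5}


def time_category(hour):
    """Table 1: two-hour slots, 0:00-1:59 -> 1, ..., 22:00-23:59 -> 12."""
    if not 0 <= hour <= 23:
        raise ValueError("hour must be in 0..23")
    return hour // 2 + 1


def encode_record(hour, weather, wind_direction, wind_speed):
    """Turn one raw observation into the reference values used for fitting."""
    return {
        "time": time_category(hour),
        "weather": WEATHER[weather],
        "wind_direction": WIND_DIRECTION[wind_direction],
        "wind_speed": WIND_SPEED[wind_speed],
    }
```

For example, `encode_record(8, "light rain", "southeaster", "grade 4-6")` yields time 5, weather 9, wind direction 7 and wind speed 5.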
Significance tests on the correlation between accident frequency and each influencing factor are used to screen out the factors that have a strong influence on accidents. Based on the results of the correlation analysis, factors are retained or removed. Finally, seven influencing factors are ascertained: month, week, time, temperature, weather, wind direction and wind speed.
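The paper does not name the significance test it used. As one possible illustration, the sketch below computes the Pearson correlation between a coded factor and the accident frequency and compares the resulting t statistic with a critical value supplied by the caller. The choice of test, the function names and the sample data in the usage note are all assumptions.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long samples."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two samples of equal length, at least 3 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant sample has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t = r * sqrt((n - 2) / (1 - r^2)), with n - 2 degrees of freedom."""
    denom = 1.0 - r * r
    if denom <= 0.0:
        # Perfect correlation: the statistic is unbounded.
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / denom)


def is_significant(xs, ys, t_critical):
    """True if |t| exceeds the two-sided critical value chosen by the caller."""
    r = pearson_r(xs, ys)
    return abs(t_statistic(r, len(xs))) > t_critical
```

With ten observations there are 8 degrees of freedom, so a two-sided test at the 5% level would use `t_critical = 2.306`. A factor whose correlation with accident frequency fails the test would be a candidate for removal.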
B. Regression analysis
First, the traffic accident frequency for each month is computed from the data, and the relation between month and traffic accident frequency is then obtained by fitting (Figure 1).
Table 2 Temperature categories
Temperature   Reference value
-10-0℃        1
0-5℃          2
5-10℃         3
10-15℃        4
15-20℃        5
20-25℃        6
25-30℃        7
Table 3 Weather categories
Weather         Reference value
heavy rain      1
thundershower   2
moderate rain   3
rainstorm       4
clear           5
shower          6
overcast        7
sleet           8
light rain      9
cloudy          10

Figure 1. Relation fitting on month and traffic accidents frequency
As Figure 1 shows, the fewest accidents in Shanghai occurred in February, and more occurred in September, October and November. In February most people go home for the New Year, so traffic volume in Shanghai is at its lowest and the number of accidents is at its minimum. In September, October and November the number of vehicles on the road increases, on the one hand because the school term begins and on the other because of the National Day holiday, so the number of accidents also increases, which is in line with reality.
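A fit of this kind can be sketched as a polynomial least-squares regression of ln y on the month code, followed by the coefficient of determination R². The paper performs its fitting with Lingo; the pure-Python normal-equation solver below is a stand-in, and the monthly counts are made-up numbers used only to exercise the code.

```python
import math


def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients, lowest power first."""
    m = degree + 1
    # Augmented normal equations [A^T A | A^T y] for the Vandermonde matrix A.
    aug = [
        [sum(x ** (i + j) for x in xs) for j in range(m)]
        + [sum(y * x ** i for x, y in zip(xs, ys))]
        for i in range(m)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: need more distinct x values")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(m):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][m] / aug[i][i] for i in range(m)]


def polyval(coeffs, x):
    """Evaluate the polynomial with coefficients given lowest power first."""
    return sum(c * x ** k for k, c in enumerate(coeffs))


def r_squared(xs, ys, coeffs):
    """Coefficient of determination of the fitted polynomial."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - polyval(coeffs, x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined for constant y")
    return 1.0 - ss_res / ss_tot


# Hypothetical monthly accident counts for July..April (NOT the paper's data).
months = [7, 8, 9, 10, 11, 12, 1, 2, 3, 4]
counts = [210, 205, 260, 270, 255, 230, 200, 160, 190, 215]
log_counts = [math.log(c) for c in counts]
coeffs = polyfit(months, log_counts, 3)   # cubic in the month code
r2 = r_squared(months, log_counts, coeffs)
```

A cubic is used here because the month equation in Table 6 is cubic in the month code; the other factors would use the degree listed for them there.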
A similar analysis of the other influencing factors leads to the following conclusions:
1) Week
Most accidents occur on Monday, Thursday and Friday. On the first and the last two working days of the week people generally become less disciplined and are prone to traffic accidents.
2) Time
More traffic accidents occur in the morning and evening peak hours than at other times, that is, in 6:00-8:00 and 16:00-19:00.
3) Temperature
The number of traffic accidents is fairly evenly distributed across the temperature ranges, but it increases slowly as temperature rises.
4) Weather
Traffic accidents occur most frequently in light rain and cloudy weather, which makes people listless and inattentive. In heavy rain, rainstorm and similar weather people are more careful, so traffic accidents are less frequent.
5) Wind direction
Traffic accidents happen mostly when the wind is a southeaster, mainly because China lies in the east of the Eurasian continent on the Pacific and the southeast monsoon arrives in summer, which also verifies the influence of temperature on accident frequency.
6) Wind speed
Accidents happen more often at wind speed grade 3, and the number of accidents decreases slowly as wind speed increases.
According to the fitting described above, the relationship between the number of accidents and each influencing factor is found, and the regression models of the number of accidents are given in Table 6.

Table 6 Regression equation of the influencing factors and the number of accidents
(In Table 6 and Eq. (2-10), $\pm$ stands for a sign, + or -, that is not legible in this copy.)
Influence factor   Regression equation   R²   Formula
Month              $\ln y = \pm 0.0033x_1^3 \pm 0.0642x_1^2 \pm 0.3009x_1 \pm 5.4948$   0.8767   (2-1)
Week               $\ln y = \pm 0.0039x_2^4 \pm 0.0683x_2^3 \pm 0.4083x_2^2 \pm 0.9363x_2 \pm 2.6051$   0.8538   (2-2)
Time               $\ln y = \pm 0.0008x_3^5 \pm 0.0247x_3^4 \pm 0.2828x_3^3 \pm 1.3777x_3^2 \pm 2.4163x_3 \pm 5.527$   0.9461   (2-3)
Temperature        $\ln y = \pm 0.0285x_4^3 \pm 0.43x_4^2 \pm 2.0652x_4 \pm 1.1673$   0.8395   (2-4)
Weather            $\ln y = \pm 1.6416\ln(x_5) \pm 0.5452$   0.9809   (2-5)
Wind direction     $\ln y = \pm 0.0274x_6^2 \pm 0.0879x_6 \pm 1.5834$   0.8023   (2-6)
Wind speed         $\ln y = \pm 0.5349x_7^3 \pm 4.8008x_7^2 \pm 13.243x_7 \pm 16.668$   0.9026   (2-7)

From Table 6, the R² of every regression equation is greater than 0.8, indicating that the fit is very good, the regression equations are significant and the regression model holds.
C. Model of accident occurrence rate
According to the relationship between the number of accidents and the influencing factors, the relationship between the incidence rate Y and the influencing factors is first assumed to be as follows, where $x_1, \dots, x_7$ are the reference values of month, week, time, temperature, weather, wind direction and wind speed:
\[
\begin{aligned}
Y ={}& k_1 x_1^3 + k_2 x_1^2 + k_3 x_1 + k_4 x_2^4 + k_5 x_2^3 + k_6 x_2^2 + k_7 x_2 \\
&+ k_8 x_3^5 + k_9 x_3^4 + k_{10} x_3^3 + k_{11} x_3^2 + k_{12} x_3 + k_{13} x_4^3 + k_{14} x_4^2 + k_{15} x_4 \\
&+ k_{16}\ln(x_5) + k_{17} x_6^2 + k_{18} x_6 + k_{19} x_7^3 + k_{20} x_7^2 + k_{21} x_7
\end{aligned}
\tag{2-8}
\]
According to the values set and the corresponding Y values, the constraint condition is:
\[
k_1 + k_2 + \cdots + k_{21} = 1 \tag{2-9}
\]
After processing the data, the coefficients of the function are fitted with Lingo, and the results are as follows:
Figure 2. Lingo example solution
As a result, the relationship between the incidence of accidents in Shanghai and the influencing factors is:
\[
\begin{aligned}
Y ={}& \pm 0.4508\times10^{-4} x_1^3 \pm 0.1834\times10^{-2} x_1^2 \pm 0.02058 x_1 \\
&\pm 0.8176\times10^{-2} x_2^4 \pm 0.1345 x_2^3 \pm 0.7518 x_2^2 \pm 1.6172 x_2 \\
&\pm 0.01799\ln(x_5) \pm 0.8932\times10^{-2} x_7
\end{aligned}
\tag{2-10}
\]
As the function shows, the number of accidents per day, namely the accident rate, is most strongly linked to month, week, weather and wind speed. Therefore month, week, weather and wind speed are selected as the four factors of the accident rate index system (Figure 3).
(3) According to the range of the traffic accident rate, we put forward safety levels and provide, for each safety level, the corresponding measures and the concept of the volunteer aid station.
(4) To develop the traffic accident rate model further, the current traffic data need to be classified more reasonably. In addition to the current traffic accident data, other data that could influence traffic accidents, such as vehicle mileage, road information and number of lanes, need to be collected so that the model can be improved as soon as possible.
Figure 3. Index system of accident occurrence rate
In this way, the probability of traffic accidents can be calculated from the month, the week, the weather and the wind speed.
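The general form assumed in Eq. (2-8), together with the constraint (2-9) that the coefficients sum to 1, can be sketched as follows. The term order follows k1 to k21. The k13 term is taken to be the cube of the temperature code, which is an assumption based on the cubic temperature equation in Table 6 and on k13 appearing in the constraint. No fitted coefficients are hard-coded here, because the signs in Eq. (2-10) cannot be read reliably in this copy.

```python
import math


def basis_terms(x):
    """The 21 terms of Eq. (2-8) in the order k1..k21.

    x holds the seven reference values (month, week, time, temperature,
    weather, wind direction, wind speed).
    """
    x1, x2, x3, x4, x5, x6, x7 = x
    return [
        x1 ** 3, x1 ** 2, x1,                        # month: k1..k3
        x2 ** 4, x2 ** 3, x2 ** 2, x2,               # week: k4..k7
        x3 ** 5, x3 ** 4, x3 ** 3, x3 ** 2, x3,      # time: k8..k12
        x4 ** 3, x4 ** 2, x4,                        # temperature: k13 (assumed)..k15
        math.log(x5),                                # weather: k16
        x6 ** 2, x6,                                 # wind direction: k17..k18
        x7 ** 3, x7 ** 2, x7,                        # wind speed: k19..k21
    ]


def accident_rate(k, x, tol=1e-9):
    """Evaluate Y = sum(k_i * term_i) after checking the constraint sum(k) = 1."""
    if len(k) != 21:
        raise ValueError("expected 21 coefficients k1..k21")
    if abs(sum(k) - 1.0) > tol:
        raise ValueError("coefficients must sum to 1 (Eq. 2-9)")
    return sum(ki * term for ki, term in zip(k, basis_terms(x)))
```

Setting the coefficients of the time, temperature and wind direction terms to zero reduces this to the four-factor index system of month, week, weather and wind speed.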
IV. MODEL APPLICATION
According to the function obtained and the range of each variable, the maximum value of the traffic accident rate is 1.2397 and the minimum value is 0.9303.
That is, on a Tuesday in January with cloudy weather and wind speed of grade 4-6, the probability of a traffic accident reaches its maximum and close watch is needed. On a Monday in August with rainy weather and wind speed of grade 3, the probability of a traffic accident reaches its minimum. A possible reason is that people are more careful on a rainy day and less prone to traffic accidents, but they still need to be reminded to be careful.
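Because the four index factors take only a few coded values each (12 months, 7 weekdays, 10 weather categories, 5 wind speed categories), the extremes of a rate function can be found by enumerating all 4200 combinations. The sketch below shows that search with a placeholder `toy_rate` function; it is not Eq. (2-10), and its coefficients are invented for the demonstration.

```python
import math
from itertools import product

# Coded ranges of the four index factors (see the category tables).
RANGES = {
    "month": range(1, 13),
    "week": range(1, 8),
    "weather": range(1, 11),
    "wind_speed": range(1, 6),
}


def extremes(rate):
    """Return ((argmax, max), (argmin, min)) of rate over every combination."""
    combos = list(product(*RANGES.values()))
    highest = max(combos, key=lambda c: rate(*c))
    lowest = min(combos, key=lambda c: rate(*c))
    return (highest, rate(*highest)), (lowest, rate(*lowest))


def toy_rate(month, week, weather, wind_speed):
    """Placeholder rate function with invented coefficients, NOT Eq. (2-10)."""
    return (1.0 + 0.001 * month - 0.002 * abs(week - 3)
            + 0.01 * math.log(weather) + 0.003 * wind_speed)


(hi_combo, hi_value), (lo_combo, lo_value) = extremes(toy_rate)
```

Replacing `toy_rate` with the fitted function would give the combinations at which the accident rate is largest and smallest.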
According to the range of the traffic accident rate, safety levels are defined as shown in Table 7:

Table 7 Safety level classification
Safety level   Range of accident rate
7              0.9303-0.9746
6              0.9747-1.0189
5              1.0190-1.0632
4              1.0633-1.1075
3              1.1076-1.1518
2              1.1519-1.1961
1              1.1962-1.2400
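The classification in Table 7 amounts to a range lookup. A minimal sketch follows; rounding the rate to four decimals before the lookup is an assumption, made so that values falling between two printed bounds (for example between 0.9746 and 0.9747) are still assigned a level.

```python
# (lower bound, upper bound, safety level), copied from Table 7.
SAFETY_LEVELS = [
    (0.9303, 0.9746, 7),
    (0.9747, 1.0189, 6),
    (1.0190, 1.0632, 5),
    (1.0633, 1.1075, 4),
    (1.1076, 1.1518, 3),
    (1.1519, 1.1961, 2),
    (1.1962, 1.2400, 1),
]


def safety_level(rate):
    """Map an accident rate to its safety level; level 7 is the safest."""
    rounded = round(rate, 4)  # assumption: Table 7 bounds are 4-decimal values
    for low, high, level in SAFETY_LEVELS:
        if low <= rounded <= high:
            return level
    raise ValueError("accident rate outside the range covered by Table 7")
```

The maximum rate of 1.2397 reported in the paper falls in level 1 and the minimum of 0.9303 in level 7.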
According to the safety levels, appropriate measures can be taken to prevent avoidable traffic accidents. For example, at the higher safety levels an electronic warning system can be set up to remind people to be careful at sharp turns or in large crowds. At the lower safety levels, electronic warning must be supplemented by traffic police and other personnel who maintain traffic order, in order to avoid traffic accidents.
The incidence of traffic accidents can also be analyzed for a particular area. With regard to the "golden 5 minutes" of rescue time after a traffic accident, a volunteer aid station can then be set up at a suitable place for every hospital. This addresses the problem that the most effective rescue time is missed because the public seriously lacks common emergency knowledge.
V. CONCLUSIONS AND RECOMMENDATIONS
(1) Owing to the rapid growth in motor vehicles, drivers and road mileage and the rapid development of the economy, traffic accidents in Shanghai, and the casualties and economic losses they cause, have also increased rapidly.
(2) Through the analysis of traffic accidents and the data on the influencing factors from July 2014 to April 2015, we obtain a traffic accident rate index system with four parts: month, week, weather and wind speed. Applying the index system gives the traffic accident rate.