Search | Korea Science

Comparison of Association Rule Learning and Subgroup Discovery for Mining Traffic Accident Data (교통사고 데이터의 마이닝을 위한 연관규칙 학습기법과 서브그룹 발견기법의 비교)

Kim, Jeongmin;Ryu, Kwang Ryel
- Journal of Intelligence and Information Systems / v.21 no.4 / pp.1-16 / 2015
Traffic accidents have been one of the major causes of death worldwide for the last several decades. According to World Health Organization statistics, approximately 1.24 million deaths occurred on the world's roads in 2010. To reduce future traffic accidents, multipronged approaches have been adopted, including traffic regulations, injury-reducing technologies, and driver training programs. Records of traffic accidents are generated and maintained for this purpose. To make these records meaningful and effective, it is necessary to analyze the relationships between traffic accidents and related factors such as vehicle design, road design, weather, and driver behavior. Insights derived from such analyses can inform accident prevention. Traffic accident data mining is the activity of finding useful knowledge about these relationships that is not well known and that may interest the user. Many studies on mining accident data have been reported over the past two decades. Most of them focus on predicting accident risk from accident-related factors, using supervised learning methods such as decision trees, logistic regression, k-nearest neighbors, and neural networks. However, the prediction models derived by these algorithms are too complex for humans to understand, because the main purpose of the algorithms is prediction, not explanation of the data. Some studies use unsupervised clustering algorithms to divide the data into several groups, but the derived groups are still not easy for humans to understand, so additional analysis is needed. Rule-based learning methods are adequate when we want to derive a comprehensible form of knowledge about the target domain. They derive a set of if-then rules that represent the relationship between the target feature and the other features.
Rules are fairly easy for humans to understand, so they can provide insight and comprehensible results. Association rule learning and subgroup discovery are representative rule-based learning methods for descriptive tasks. These two methods have been used in a wide range of areas, including transaction analysis, accident data analysis, detection of statistically significant patient risk groups, and discovery of key persons in social communities. We use both the association rule learning method and the subgroup discovery method to discover useful patterns in a traffic accident dataset consisting of many features, including driver profile, accident location, accident type, vehicle information, and regulation violations. The association rule learning method, which is an unsupervised learning method, searches for frequent item sets in the data and translates them into rules. In contrast, the subgroup discovery method is a supervised learning method that discovers rules for user-specified concepts satisfying a certain degree of generality and unusualness. Depending on which aspect of the data we focus on, we may combine multiple relevant features of interest into a synthetic target feature and give it to the rule learning algorithms. After a set of rules is derived, postprocessing steps make the rule set more compact and easier to understand by removing uninteresting or redundant rules. We conducted a set of experiments mining our traffic accident data in both unsupervised and supervised modes to compare these rule-based learning algorithms. The experiments reveal that association rule learning, in its pure unsupervised mode, can discover some hidden relationships among the features.
Under a supervised learning setting with a combinatorial target feature, however, the subgroup discovery method finds good rules much more easily than the association rule learning method, which requires considerable effort to tune its parameters.
https://doi.org/10.13088/jiis.2015.21.4.001
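The abstract above contrasts two ways of scoring an if-then rule. As a minimal sketch, the snippet below computes support and confidence (the usual association rule measures) and weighted relative accuracy, WRAcc (a common subgroup discovery measure that rewards both generality and unusualness). The abstract does not state which quality measures the authors used, so WRAcc is an assumption here, and the records, feature names, and the `rule_measures` helper are all hypothetical.

```python
# Toy accident records; every feature name and value is hypothetical.
records = [
    {"weather": "rain", "road": "curve", "severe": True},
    {"weather": "rain", "road": "curve", "severe": True},
    {"weather": "rain", "road": "straight", "severe": False},
    {"weather": "clear", "road": "curve", "severe": False},
    {"weather": "clear", "road": "straight", "severe": False},
    {"weather": "clear", "road": "straight", "severe": True},
    {"weather": "rain", "road": "curve", "severe": False},
    {"weather": "clear", "road": "straight", "severe": False},
]


def covers(record, items):
    """True if the record matches every feature=value pair in items."""
    return all(record.get(key) == value for key, value in items.items())


def rule_measures(records, condition, target):
    """Score the rule 'IF condition THEN target' on the given records."""
    n = len(records)
    n_cond = sum(covers(r, condition) for r in records)
    n_target = sum(covers(r, target) for r in records)
    n_both = sum(covers(r, condition) and covers(r, target) for r in records)

    support = n_both / n                       # how often the whole rule holds
    confidence = n_both / n_cond if n_cond else 0.0
    # WRAcc = coverage * (target rate inside subgroup - overall target rate)
    wracc = (n_cond / n) * (confidence - n_target / n)
    return support, confidence, wracc


condition = {"weather": "rain", "road": "curve"}
target = {"severe": True}
support, confidence, wracc = rule_measures(records, condition, target)
print(f"support={support:.3f} confidence={confidence:.3f} WRAcc={wracc:.4f}")
```

Because `target` is a dictionary, it can hold several feature=value pairs at once, which is one simple way to represent a synthetic target built from multiple features.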

Analysis of National Stream Drying Phenomena using DrySAT-WFT Model: Focusing on Inflow of Dam and Weir Watersheds in 5 River Basins (DrySAT-WFT 모형을 활용한 전국 하천건천화 분석: 전국 5대강 댐·보 유역의 유입량을 중심으로)

LEE, Yong-Gwan;JUNG, Chung-Gil;KIM, Won-Jin;KIM, Seong-Joon
- Journal of the Korean Association of Geographic Information Studies / v.23 no.2 / pp.53-69 / 2020
The increase in impermeable area due to industrialization and urban development distorts the hydrological cycle and causes serious stream drying phenomena. To manage this, it is necessary to develop an impact assessment technology for stream drying that enables quantitative evaluation and prediction. In this study, the causes of streamflow reduction were assessed for dam and weir watersheds in the five major river basins of South Korea using the distributed hydrological model DrySAT-WFT (Drying Stream Assessment Tool and Water Flow Tracking) and GIS time series data. For the modeling, 5 influencing factors of stream drying (soil erosion, forest growth, road-river disconnection, groundwater use, urban development) were selected and prepared as GIS-based time series spatial data from 1976 to 2015. DrySAT-WFT was calibrated and validated from 2005 to 2015 at 8 multipurpose dam watersheds (Chungju, Soyang, Andong, Imha, Hapcheon, Seomjin river, Juam, and Yongdam) and 4 gauging stations (Osucheon, Mihocheon, Maruek, and Chogang), respectively. The calibration results showed an average coefficient of determination (R²) of 0.76 (0.66 to 0.84) and an average Nash-Sutcliffe model efficiency of 0.62 (0.52 to 0.72). Applying the 2010s (2006~2015) weather conditions to the whole period, the streamflow impact was estimated using the GIS data for each decade (1980s: 1976~1985, 1990s: 1986~1995, 2000s: 1996~2005, 2010s: 2006~2015). Compared with the 1980s streamflows, the 2010s averaged-wet streamflow (Q95) decreased by 4.1~6.3%, the averaged-normal streamflow (Q185) by 6.7~9.1%, and the averaged-drought streamflow (Q355) by 8.4~10.4% overall. During 1975~2015, the increase in groundwater use accounted for 40.5% of the contribution among the 5 influencing factors, followed by forest growth with 29.0%.
https://doi.org/10.11108/kagis.2020.23.2.053
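The two calibration statistics reported in the abstract above can be illustrated with a short sketch. It assumes R² is computed as the squared Pearson correlation between observed and simulated streamflow, which is a common convention in hydrology but is not confirmed by the abstract, and it uses the standard Nash-Sutcliffe efficiency definition. The flow values and function names are made up for illustration and are not data from the study.

```python
def nash_sutcliffe(observed, simulated):
    """NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)."""
    mean_obs = sum(observed) / len(observed)
    squared_error = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    obs_variation = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - squared_error / obs_variation


def r_squared(observed, simulated):
    """Squared Pearson correlation between observed and simulated values."""
    mean_obs = sum(observed) / len(observed)
    mean_sim = sum(simulated) / len(simulated)
    covariation = sum(
        (o - mean_obs) * (s - mean_sim) for o, s in zip(observed, simulated)
    )
    obs_variation = sum((o - mean_obs) ** 2 for o in observed)
    sim_variation = sum((s - mean_sim) ** 2 for s in simulated)
    return covariation ** 2 / (obs_variation * sim_variation)


# Hypothetical streamflow series (arbitrary units), not from the paper.
observed = [1.0, 2.0, 3.0, 4.0]
simulated = [1.5, 2.0, 2.5, 4.5]

nse = nash_sutcliffe(observed, simulated)
r2 = r_squared(observed, simulated)
print(f"NSE={nse:.3f} R2={r2:.3f}")
```

An NSE of 1 means a perfect match, while 0 means the model is no better than predicting the observed mean. Unlike R² as defined here, NSE penalizes systematic bias, which is why the two statistics are usually reported together.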
