• Title/Summary/Keyword: 공간데이터마이닝 (spatial data mining)

Relationship between Diurnal Patterns of Transit Ridership and Land Use in the Metropolitan Seoul Area (서울 대도시권 하루 시간대별 지하철 통행흐름 패턴과 토지이용과의 관계)

  • Lee, Keum-Sook;Song, Ye-Na;Park, Jong-Soo;Anderson, William P.
    • Journal of the Economic Geographical Society of Korea / v.15 no.1 / pp.26-41 / 2012
  • This study investigates the space-time characteristics of intra-urban passenger flows in the Metropolitan Seoul area. In particular, we analyze the relationship between transit ridership and land use using subway passenger flow data obtained from transit transaction databases. To this end, the strength of each subway station, i.e., the total number of in-coming and out-going passengers at the station, is calculated for the morning, afternoon, and evening and then visualized; the resulting patterns reflect urban land use. The subway stations are then classified into four groups via a hierarchical analysis of the in-coming and out-going passenger flows at 353 stations. Each group appears to have characteristic properties according to the type of region, e.g., residential areas or central business districts. This is confirmed by an analysis that explicitly probes the relationship between local socio-economic variables and the station groups. By disclosing the inter-relationship between the subway network and urban land use, this analysis may be useful at various stages of urban and transportation planning, and it provides analytical tools for a wide spectrum of applications ranging from impact evaluation to decision-making and planning support.
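
The station strength and hierarchical grouping described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the station names and passenger counts are hypothetical, and the choice of features (share of in-coming passengers per period) and of average linkage is an assumption, since the abstract does not specify either.

```python
from itertools import combinations
from math import dist

# Hypothetical counts: station -> {period: (in_coming, out_going)}
flows = {
    "A": {"morning": (900, 100), "afternoon": (300, 300), "evening": (100, 900)},
    "B": {"morning": (850, 150), "afternoon": (280, 320), "evening": (120, 880)},
    "C": {"morning": (100, 900), "afternoon": (350, 350), "evening": (900, 100)},
    "D": {"morning": (150, 950), "afternoon": (300, 300), "evening": (950, 150)},
}
PERIODS = ("morning", "afternoon", "evening")

def strength(station):
    """Total in-coming plus out-going passengers for each period."""
    return {p: sum(flows[station][p]) for p in PERIODS}

def profile(station):
    """Share of in-coming passengers per period (assumed feature vector)."""
    return [flows[station][p][0] / sum(flows[station][p]) for p in PERIODS]

def agglomerate(names, k):
    """Average-linkage agglomerative clustering down to k groups."""
    clusters = [[n] for n in names]

    def linkage(a, b):
        total = sum(dist(profile(x), profile(y)) for x in a for y in b)
        return total / (len(a) * len(b))

    while len(clusters) > k:
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
    return sorted(sorted(c) for c in clusters)

groups = agglomerate(list(flows), 2)
```

With these made-up counts, stations whose in-coming share peaks in the morning end up in one group and those whose share peaks in the evening in the other; the study itself reports four groups across 353 stations.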

Analysis of Traffic Accidents Injury Severity in Seoul using Decision Trees and Spatiotemporal Data Visualization (의사결정나무와 시공간 시각화를 통한 서울시 교통사고 심각도 요인 분석)

  • Kang, Youngok;Son, Serin;Cho, Nahye
    • Journal of Cadastre & Land InformatiX / v.47 no.2 / pp.233-254 / 2017
  • The purpose of this study is to analyze the main factors influencing the severity of traffic accidents and to visualize the spatiotemporal characteristics of traffic accidents in Seoul. To do this, we collected data on traffic accidents that occurred in Seoul over the four years from 2012 to 2015 and classified them by severity as slight, serious, or fatal. The spatiotemporal characteristics of the accidents were analyzed using kernel density analysis, hotspot analysis, space time cube analysis, and Emerging Hot Spot Analysis. The factors affecting accident severity were analyzed using a decision tree model. The results show that traffic accidents in Seoul are more frequent in suburbs than in central areas. In particular, accidents were concentrated in some commercial and entertainment areas in Seocho and Gangnam, and this concentration intensified over time. For fatal accidents, there were statistically significant hotspots in Yeongdeungpo-gu, Guro-gu, Jongno-gu, Jung-gu, and Seongbuk. However, the hotspots of fatal accidents showed different patterns depending on the time of day. In terms of accident severity, the type of accident is the most important factor, followed in order of importance by the type of road, the type of vehicle, the time of the accident, and the type of regulation violated. Regarding the decision rules that lead to serious accidents, for vans and trucks there is a high probability of a serious accident where the road is wide and vehicle speed is high. For bicycles, cars, motorcycles, and other vehicles, there is a high probability of a serious accident under the same circumstances at dawn.
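
As a rough illustration of the kernel density analysis mentioned in the abstract above, the sketch below estimates accident density at a location from a list of accident coordinates. The Gaussian kernel, the bandwidth, and the coordinates are all assumptions made for the example; the abstract does not state which kernel or software settings the authors used.

```python
from math import exp, pi

# Hypothetical accident locations in arbitrary planar units
accidents = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]

def kernel_density(x, y, points, bandwidth):
    """Gaussian kernel density estimate at (x, y) from event locations."""
    h2 = bandwidth ** 2
    total = sum(
        exp(-((x - px) ** 2 + (y - py) ** 2) / (2 * h2)) for px, py in points
    )
    # Normalize so the estimated density integrates to 1 over the plane
    return total / (2 * pi * h2 * len(points))

near_cluster = kernel_density(0.0, 0.0, accidents, bandwidth=0.5)
empty_area = kernel_density(2.5, 2.5, accidents, bandwidth=0.5)
```

Evaluating the function on a grid of locations gives the kind of density surface from which concentrations of accidents can be read off.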

Performance Comparison of Clustering using Discritization Algorithm (이산화 알고리즘을 이용한 계층적 클러스터링의 실험적 성능 평가)

  • Won, Jae Kang;Lee, Jeong Chan;Jung, Yong Gyu;Lee, Young Ho
    • Journal of Service Research and Studies / v.3 no.2 / pp.53-60 / 2013
  • Various data mining techniques have been developed for obtaining information from large data sets. In pattern recognition and machine learning, among the most actively studied areas in recent years, most existing learning algorithms build rules or decision models from categorical attributes. Real-world data, however, often consist of numeric attributes, or mix numeric attributes with ordinary categorical ones. In such cases, the data must be processed into values of an appropriate attribute type before they can be used for learning. In this paper, the domain of each numeric attribute is divided into several segments using discretization techniques. Clustering is described together with other data mining techniques. Clustering first splits a large set of database records into smaller groups of records with similar characteristics, treating the data as a finite set of patterns in a pattern space; patterns that lie close to each other together form a cluster. Patterns are extracted from the given data without specifying any particular category in advance. We describe clustering as a technique that classifies data by grouping similar items.
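
Dividing the domain of a numeric attribute into segments, as described in the abstract above, can be illustrated with equal-width binning. This is only one common discretization scheme, chosen here as an assumption; the abstract does not say which algorithm the paper evaluates, and the function name is hypothetical.

```python
def equal_width_bins(values, n_bins):
    """Map each numeric value to a bin index in 0..n_bins-1 of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        # All values are identical, so everything falls into the first bin
        return [0] * len(values)
    # The maximum value would land in bin n_bins, so clamp it to the last bin
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

bins = equal_width_bins([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], 3)
```

The resulting bin indices can then be treated as a categorical attribute by algorithms that expect one.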

A New Memory-based Learning using Dynamic Partition Averaging (동적 분할 평균을 이용한 새로운 메모리 기반 학습기법)

  • Yih, Hyeong-Il
    • Journal of the Korean Institute of Intelligent Systems / v.18 no.4 / pp.456-462 / 2008
  • Classification, which assigns a new data item to one of a set of given classes, is one of the most widely used data mining techniques. Memory-Based Reasoning (MBR) is a reasoning method for classification problems. MBR simply keeps many patterns in memory in their original feature-vector form, without rules for reasoning, and uses a distance function to classify a test pattern. As the number of training patterns grows, both the memory required and the amount of computation needed for reasoning increase. NGE, FPA, and RPA are well-known MBR algorithms that have been shown to perform satisfactorily, but they have serious problems with memory usage and lengthy computation. In this paper, we propose the DPA (Dynamic Partition Averaging) algorithm. It chooses partition points by calculating the Gini index over the entire pattern space and partitions the space dynamically. If all patterns in a partition belong to a single class, it generates a representative pattern from that partition; otherwise, it partitions that partition again by the same method. The proposed method shows performance comparable to k-NN with far fewer patterns, and better results than the EACH system, which implements the NGE theory, as well as FPA and RPA.
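
Choosing a partition point by the Gini index, as described in the abstract above, can be sketched for a single feature as follows. This is an illustrative sketch rather than the DPA algorithm itself: the candidate thresholds (midpoints between adjacent distinct values) and the function names are assumptions.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure partition)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(values, labels):
    """Return (threshold, weighted_gini) with the lowest weighted impurity."""
    pairs = sorted(zip(values, labels))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate identical values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for v, lab in pairs if v <= threshold]
        right = [lab for v, lab in pairs if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (threshold, score)
    return best

threshold, impurity = best_split([1, 2, 3, 10, 11, 12], list("aaabbb"))
```

When a side of the split has impurity 0.0, all of its patterns share one class, which is the condition under which the abstract says a representative pattern is generated; an impure side would be split again.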

Fuzzy discretization with spatial distribution of data and Its application to feature selection (데이터의 공간적 분포를 고려한 퍼지 이산화와 특징선택에의 응용)

  • Son, Chang-Sik;Shin, A-Mi;Lee, In-Hee;Park, Hee-Joon;Park, Hyoung-Seob;Kim, Yoon-Nyun
    • Journal of the Korean Institute of Intelligent Systems / v.20 no.2 / pp.165-172 / 2010
  • In clinical data mining, choosing the optimal subset of features is important, not only to reduce computational complexity but also to improve the usefulness of the model constructed from the given data. Moreover, the threshold values (i.e., cut-off points) of the selected features are used in experts' clinical decision criteria for the differential diagnosis of diseases. In this paper, we propose a fuzzy discretization approach, which is evaluated by measuring the degree of separation of redundant attribute values in the overlapping region, based on the spatial distribution of data with continuous attributes. The weighted average of the redundant attribute values is then used to determine the threshold value for each feature, and rough set theory is used to select a subset of relevant features from the full feature set. To verify the validity of the proposed method, we applied it to a classification problem involving 668 patients with a chief complaint of dyspnea and compared the results with those of three discretization methods (equal-width, equal-frequency, and entropy-based). The experimental results confirm that discretization methods with a fuzzy partition give better results than those with a hard partition on two evaluation measures, average classification accuracy and G-mean.
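
The G-mean measure named in the abstract above is not defined there. A minimal sketch, assuming the common binary definition (the geometric mean of sensitivity and specificity), is shown below; the labels are made up, and the function assumes both classes occur in `y_true`.

```python
from math import sqrt

def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn)  # fraction of positives detected
    specificity = tn / (tn + fp)  # fraction of negatives correctly rejected
    return sqrt(sensitivity * specificity)

score = g_mean([1, 1, 1, 1, 0, 0], [1, 1, 1, 0, 0, 1])
```

Unlike plain accuracy, this measure drops sharply when one class is classified poorly, which is why it is often reported alongside accuracy on imbalanced clinical data.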

Steel Plate Faults Diagnosis with S-MTS (S-MTS를 이용한 강판의 표면 결함 진단)

  • Kim, Joon-Young;Cha, Jae-Min;Shin, Junguk;Yeom, Choongsub
    • Journal of Intelligence and Information Systems / v.23 no.1 / pp.47-67 / 2017
  • Steel plate faults are one of the important factors affecting the quality and price of steel plates. So far, many steelmakers have relied on visual inspection based on an inspector's intuition or experience. Specifically, the inspector checks for faults by looking at the surface of the steel plates. However, the accuracy of this method is so low that it can cause judgment errors above 30%. An accurate steel plate fault diagnosis system has therefore been in continuous demand in the industry. To meet this need, this study proposes a new steel plate fault diagnosis system using Simultaneous MTS (S-MTS), an advanced Mahalanobis Taguchi System (MTS) algorithm, to classify various surface defects of steel plates. MTS has generally been used to solve binary classification problems in various fields, but it has not been used for multi-class classification because of its low accuracy. The reason is that only one Mahalanobis space is established in MTS. In contrast, S-MTS is suitable for multi-class classification because it establishes an individual Mahalanobis space for each class. 'Simultaneous' implies comparing Mahalanobis distances at the same time. The proposed diagnosis system was developed in four main stages. In the first stage, after the reference groups and related variables are defined, steel plate fault data are collected and used to establish an individual Mahalanobis space for each reference group and to construct the full measurement scale. In the second stage, the Mahalanobis distances of the test groups are calculated based on the established Mahalanobis spaces of the reference groups. The appropriateness of the spaces is then verified by examining the separability of the Mahalanobis distances. In the third stage, orthogonal arrays and the dynamic-type Signal-to-Noise (SN) ratio are applied for variable optimization.
In addition, the overall SN ratio gain is derived from the SN ratio and the SN ratio gain. A negative overall SN ratio gain means that the variable should be removed, whereas a variable with a positive gain may be considered worth keeping. Finally, in the fourth stage, the measurement scale is reconstructed from the selected useful variables. An experimental test is then carried out to verify the multi-class classification ability and to obtain the classification accuracy. If the accuracy is acceptable, the diagnosis system can be used for future applications. This study also compared the accuracy of the proposed system with that of other popular classification algorithms, including Decision Tree, Multi-Layer Perceptron Neural Network (MLPNN), Logistic Regression (LR), Support Vector Machine (SVM), Tree Bagger Random Forest, Grid Search (GS), Genetic Algorithm (GA), and Particle Swarm Optimization (PSO). The steel plate faults dataset used in the study is taken from the University of California at Irvine (UCI) machine learning repository. The proposed diagnosis system based on S-MTS achieves a classification accuracy of 90.79%, which is 6-27% higher than that of MLPNN, LR, GS, GA, and PSO. Given that the accuracy of commercial systems is only about 75-80%, the proposed system has sufficient classification performance to be applied in the industry. In addition, because of the variable optimization process, the proposed system can reduce the number of measurement sensors installed in the field. These results show that the proposed system not only diagnoses steel plate faults well but also reduces operation and maintenance costs.
In future work, we will apply the proposed system in the field to validate its actual effectiveness and plan to improve its accuracy based on the results.
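
The idea of one Mahalanobis space per class, with distances compared at the same time, can be sketched as below. This is a simplified illustration and not the S-MTS procedure from the paper: it uses only two features so the covariance matrix can be inverted by hand, works on raw rather than standardized variables, omits the orthogonal-array and SN-ratio variable selection, and uses hypothetical class names and data.

```python
from statistics import mean

def fit_space(samples):
    """Mean vector and inverse covariance matrix for 2-feature samples."""
    xs, ys = zip(*samples)
    mx, my = mean(xs), mean(ys)
    n = len(samples)
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in samples) / (n - 1)
    det = sxx * syy - sxy ** 2  # must be non-zero for the inverse to exist
    inverse = ((syy / det, -sxy / det), (-sxy / det, sxx / det))
    return (mx, my), inverse

def mahalanobis_sq(point, space):
    """Squared Mahalanobis distance from a point to one reference space."""
    (mx, my), inv = space
    dx, dy = point[0] - mx, point[1] - my
    return (dx * (inv[0][0] * dx + inv[0][1] * dy)
            + dy * (inv[1][0] * dx + inv[1][1] * dy))

def classify(point, spaces):
    """Assign the class whose reference space is nearest to the point."""
    return min(spaces, key=lambda label: mahalanobis_sq(point, spaces[label]))

# Hypothetical reference groups, one per fault class
spaces = {
    "scratch": fit_space([(1, 1), (2, 1.5), (1.5, 2.5), (2.5, 2)]),
    "stain": fit_space([(8, 8), (9, 8.5), (8.5, 9.5), (9.5, 9)]),
}
label = classify((2, 2), spaces)
```

A single-space MTS would only report how far a sample lies from one reference group; keeping a space per class is what lets the distances be compared directly to pick a class.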