• Title/Summary/Keyword: Pre-Clustering


A Hybrid Clustering Technique for Processing Large Data (대용량 데이터 처리를 위한 하이브리드형 클러스터링 기법)

  • Kim, Man-Sun;Lee, Sang-Yong
    • The KIPS Transactions:PartB
    • /
    • v.10B no.1
    • /
    • pp.33-40
    • /
    • 2003
  • Data mining plays an important role in the knowledge discovery process, and various data mining algorithms can be selected for a specific purpose. Most traditional hierarchical clustering methods are suited to small data sets, so they have difficulty handling large data sets because of limited resources and insufficient efficiency. In this study we propose a hybrid neural network clustering technique called PPC (Pre-Post Clustering) that can be applied to large data sets and can find unknown patterns. PPC combines an artificial intelligence method, the self-organizing map (SOM), with a statistical method, hierarchical clustering, and clusters data in two steps. In the pre-clustering step, PPC digests large data sets using SOM. In the post-clustering step, PPC measures similarity values from cohesive distances, which describe the inner features of a cluster, and adjacent distances, which describe the external distances between clusters. Finally, PPC clusters the large data set using these similarity values. Experiments with UCI repository data showed that PPC achieved better cohesion values than the other clustering techniques.
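
The post-clustering step described in this abstract can be illustrated with a minimal sketch. The abstract does not give the formulas, so everything here is an assumption: the cohesive distance is taken as the mean distance of members to their centroid, the adjacent distance as the distance between centroids, and the `similarity` score as their ratio. The hard-coded `pre_clusters` stand in for the output of the SOM pre-clustering step, which is not implemented.

```python
import math
from itertools import combinations


def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def cohesive_distance(points):
    """Assumed definition: mean distance from each member to the centroid."""
    c = centroid(points)
    return sum(math.dist(p, c) for p in points) / len(points)


def adjacent_distance(a, b):
    """Assumed definition: distance between the two cluster centroids."""
    return math.dist(centroid(a), centroid(b))


def similarity(a, b):
    """Hypothetical score: large when the gap is small relative to the spread."""
    gap = adjacent_distance(a, b)
    if gap == 0:
        return math.inf
    return (cohesive_distance(a) + cohesive_distance(b)) / gap


def post_cluster(pre_clusters, threshold):
    """Merge the most similar pair repeatedly until no pair reaches the threshold."""
    clusters = [list(c) for c in pre_clusters]
    while len(clusters) > 1:
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda ij: similarity(clusters[ij[0]], clusters[ij[1]]),
        )
        if similarity(clusters[i], clusters[j]) < threshold:
            break
        clusters[i] += clusters[j]
        del clusters[j]
    return clusters


# Stand-in for SOM output: two nearby pre-clusters and one distant pre-cluster.
pre_clusters = [
    [(0, 0), (0, 2)],
    [(2, 0), (2, 2)],
    [(10, 0), (10, 2)],
]
merged = post_cluster(pre_clusters, threshold=0.5)
print(len(merged))
```

With these toy values the two nearby pre-clusters are merged and the distant one is left alone. The threshold of 0.5 is arbitrary and chosen only for the illustration.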

Pre-Adjustment of Incomplete Group Variable via K-Means Clustering

  • Hwang, S.Y.;Hahn, H.E.
    • Journal of the Korean Data and Information Science Society
    • /
    • v.15 no.3
    • /
    • pp.555-563
    • /
    • 2004
  • In classification and discrimination, we often face an incomplete group variable, typically arising from many missing values and/or unreliable cases. This paper suggests using K-means clustering to pre-adjust the incompleteness, after which classification based on generalized statistical distance is performed. To illustrate the proposed procedure, a simulation study compares it with CART from data mining and with traditional techniques that ignore the incompleteness of the group variable. The simulation study shows that the proposed methodology outperforms these alternatives.
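
As a rough illustration of pre-adjusting an incomplete group variable with K-means, the sketch below clusters one-dimensional values and fills each missing label with the majority label of its cluster. The filling rule, the function names, and the toy data are assumptions made for this example. The abstract does not describe the authors' exact procedure, and the later classification by generalized statistical distance is not shown.

```python
from collections import Counter


def kmeans_1d(values, centers, iters=20):
    """Plain Lloyd iterations on 1-D data with the given starting centers."""
    for _ in range(iters):
        assign = [
            min(range(len(centers)), key=lambda k: abs(v - centers[k]))
            for v in values
        ]
        new_centers = []
        for k in range(len(centers)):
            members = [v for v, a in zip(values, assign) if a == k]
            new_centers.append(sum(members) / len(members) if members else centers[k])
        if new_centers == centers:
            break
        centers = new_centers
    return assign, centers


def pre_adjust(values, labels):
    """Fill None labels with the majority known label of the same cluster."""
    assign, _ = kmeans_1d(values, [min(values), max(values)])
    filled = list(labels)
    for k in set(assign):
        known = [lab for lab, a in zip(labels, assign) if a == k and lab is not None]
        if not known:
            continue  # no information in this cluster; leave labels missing
        majority = Counter(known).most_common(1)[0][0]
        for i, a in enumerate(assign):
            if a == k and filled[i] is None:
                filled[i] = majority
    return filled


# Toy data: two well-separated groups with two missing group labels.
values = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]
labels = ["A", None, "A", "B", "B", None]
adjusted = pre_adjust(values, labels)
print(adjusted)
```

Starting the two centers at the minimum and maximum keeps the example deterministic. A practical version would use multivariate data and a more careful initialization.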


A Simple Tandem Method for Clustering of Multimodal Dataset

  • Cho C.;Lee J.W.;Lee J.W.
    • Proceedings of the Korean Operations and Management Science Society Conference
    • /
    • 2003.05a
    • /
    • pp.729-733
    • /
    • 2003
  • The presence of local features within clusters, caused by the multi-modal nature of the data, prevents many conventional clustering techniques from working properly. In particular, clustering data sets with non-Gaussian distributions within a cluster can be problematic when the technique implicitly assumes a Gaussian distribution. The current study proposes a simple tandem clustering method, composed of a k-means type algorithm and a hierarchical method, to solve such problems. The multi-modal data set is first divided into many small pre-clusters by the k-means or fuzzy k-means algorithm. The pre-clusters found in the first step are then clustered again using an agglomerative hierarchical clustering method with Kullback-Leibler divergence as the measure of dissimilarity. This method is not only effective at extracting multi-modal clusters but also fast and simple in terms of computational complexity, and relatively robust in the presence of outliers. The performance of the proposed method was evaluated on three generated data sets and six publicly known real-world data sets.
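
The second stage of such a tandem method might look like the sketch below, which merges pre-clusters agglomeratively using a Kullback-Leibler based dissimilarity. Several choices are assumptions, since the abstract does not specify them: each pre-cluster is summarized by a fitted one-dimensional Gaussian, the divergence is symmetrized by summing both directions, and the pre-clusters are hard-coded instead of being produced by k-means.

```python
import math
from statistics import mean, pvariance


def gaussian_kl(mu_p, var_p, mu_q, var_q):
    """KL(p || q) for two 1-D Gaussians given by mean and variance."""
    return 0.5 * (math.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0)


def sym_kl(a, b):
    """Symmetrized KL between Gaussians fitted to two samples (an assumption)."""
    ma, va = mean(a), pvariance(a)
    mb, vb = mean(b), pvariance(b)
    return gaussian_kl(ma, va, mb, vb) + gaussian_kl(mb, vb, ma, va)


def merge_pre_clusters(pre_clusters, n_final):
    """Repeatedly merge the pair with the smallest symmetrized KL divergence."""
    clusters = [list(c) for c in pre_clusters]
    while len(clusters) > n_final:
        pairs = [
            (sym_kl(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        ]
        _, i, j = min(pairs)
        clusters[i] += clusters[j]
        del clusters[j]
    return clusters


# Hypothetical pre-clusters; each needs at least two distinct values
# so that the fitted variance is non-zero.
pre_clusters = [
    [0.0, 1.0, 2.0],
    [2.5, 3.5, 4.5],
    [20.0, 21.0, 22.0],
    [23.0, 24.0, 25.0],
]
final = merge_pre_clusters(pre_clusters, n_final=2)
print(final)
```

Fitting a single Gaussian per pre-cluster is reasonable only because the pre-clusters are small and local. The merged clusters themselves may be multi-modal, which is the point of the tandem design.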


Classification of basin characteristics related to inundation using clustering (군집분석을 이용한 침수관련 유역특성 분류)

  • Lee, Han Seung;Cho, Jae Woong;Kang, Ho seon;Hwang, Jeong Geun;Moon, Hae Jin
    • Proceedings of the Korea Water Resources Association Conference
    • /
    • 2020.06a
    • /
    • pp.96-96
    • /
    • 2020
  • To establish risk criteria for inundation caused by typhoons or heavy rainfall, research is underway to predict the limit rainfall using basin characteristics, limit rainfall data, and artificial intelligence algorithms. To improve model performance in estimating the limit rainfall, the learning data are used after pre-processing. When 50.0% of the entire data set was removed as outliers during pre-processing, the accuracy was confirmed to be over 90%. However, the use rate of the learning data is then very low, which limits the range of characteristics that can be considered. Accordingly, to predict a limit rainfall that reflects various watershed characteristics by increasing the use rate of the learning data, watersheds with similar characteristics were clustered. The algorithms used for clustering are K-Means, Agglomerative, DBSCAN, and Spectral Clustering. The K-Means, DBSCAN, and Agglomerative algorithms cluster the watersheds by impervious area ratio, while the Spectral Clustering algorithm produces clusters of various forms depending on its parameters. If the results of the clustering algorithms are applied to the limit rainfall prediction algorithm, various watershed characteristics will be considered and, at the same time, the performance of limit rainfall prediction will improve.


Semantic Correspondence of Database Schema from Heterogeneous Databases using Self-Organizing Map

  • Dumlao, Menchita F.;Oh, Byung-Joo
    • Journal of IKEEE
    • /
    • v.12 no.4
    • /
    • pp.217-224
    • /
    • 2008
  • This paper provides a framework for semantic correspondence of heterogeneous databases using a self-organizing map. It addresses the problem of overlap between different databases caused by their different schemas. A clustering technique using self-organizing maps (SOM) is tested and evaluated to assess its performance on different kinds of data. Before clustering, the database is preprocessed using an edit distance algorithm, principal component analysis (PCA), and a normalization function to identify the features necessary for clustering.
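
The edit distance used in this kind of preprocessing can be sketched with the standard Levenshtein recurrence. The column names and the length-based normalization below are illustrative assumptions. The abstract does not say which edit distance variant or normalization function the authors used, and the PCA and SOM steps are not shown.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(
                min(
                    prev[j] + 1,               # delete ca
                    cur[j - 1] + 1,            # insert cb
                    prev[j - 1] + (ca != cb),  # substitute, or match at no cost
                )
            )
        prev = cur
    return prev[-1]


def normalized_distance(a, b):
    """Scale the distance to [0, 1] by the longer string (an assumed choice)."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


# Hypothetical attribute names from two schemas.
d = edit_distance("cust_name", "customer_name")
print(d, normalized_distance("cust_name", "customer_name"))
```

A normalized score like this could serve as one input feature for clustering schema attributes, alongside features derived from data types or value distributions.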


Range-Doppler Clustering of Radar Data for Detecting Moving Objects (이동물체 탐지를 위한 레이다 데이터의 거리-도플러 클러스터링 기법)

  • Kim, Seongjoon;Yang, Dongwon;Jung, Younghun;Kim, Sujin;Yoon, Joohong
    • Journal of the Korea Institute of Military Science and Technology
    • /
    • v.17 no.6
    • /
    • pp.810-820
    • /
    • 2014
  • Many recent studies report on radar systems mounted on ground vehicles for autonomous driving, SLAM (simultaneous localization and mapping), and collision avoidance. In the near field, signal processing of radar data generates several hits per object. Clustering is therefore an essential technique for estimating object shapes and positions precisely. This paper proposes a method that groups hits in the range-Doppler domain into clusters, each representing one object, according to pre-defined rules. The rules are based on perceptual cues for separating hits by object: the morphological connectedness between hits and the characteristics of the SNR distribution of hits. In various simulations for performance assessment, the proposed method performed more effectively than other techniques.
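
The connectedness cue alone can be sketched as connected-component grouping of hits on a range-Doppler grid. This is a simplified illustration, not the authors' rule set: 8-connectivity between neighboring bins is an assumption, the SNR distribution cue is omitted, and the bin indices are made up.

```python
from collections import deque


def cluster_hits(hits):
    """Group (range_bin, doppler_bin) hits into 8-connected components."""
    remaining = set(hits)
    clusters = []
    while remaining:
        seed = min(remaining)  # deterministic choice of starting hit
        remaining.remove(seed)
        queue = deque([seed])
        cluster = [seed]
        while queue:
            r, d = queue.popleft()
            for dr in (-1, 0, 1):
                for dd in (-1, 0, 1):
                    neighbor = (r + dr, d + dd)
                    if neighbor in remaining:
                        remaining.remove(neighbor)
                        queue.append(neighbor)
                        cluster.append(neighbor)
        clusters.append(sorted(cluster))
    return clusters


# Hypothetical hits: two extended objects and one isolated hit.
hits = {(10, 3), (10, 4), (11, 4), (30, -2), (31, -2), (50, 7)}
clusters = cluster_hits(hits)
print(clusters)
```

Each returned cluster lists the bins attributed to one object. A fuller version would also split or merge components based on how SNR varies across the hits.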

Development of Sasang Type Diagnostic Test with Neural Network (신경망을 사용한 사상체질 진단검사 개발 연구)

  • Chae, Han;Hwang, Sang-Moon;Eom, Il-Kyu;Kim, Byoung-Chul;Kim, Young-In;Kim, Byung-Joo;Kwon, Young-Kyu
    • Journal of Physiology & Pathology in Korean Medicine
    • /
    • v.23 no.4
    • /
    • pp.765-771
    • /
    • 2009
  • Medical informatics for clustering Sasang types from collected clinical data is important for personalized medicine, but it has not yet been thoroughly studied. The purpose of this study was to examine the usefulness of a neural network data mining algorithm for traditional Korean medicine. We used a Kohonen neural network, the Self-Organizing Map (SOM), to analyze biomedical information after data pre-processing, and calculated validity indices: the percentage correctly predicted and the type-specific sensitivity. Data pre-processing with correlation analysis and latent functional relationship analysis extracted 12 data fields from 30. The profile of the Myers-Briggs Type Indicator and Bio-Impedance Analysis data clustered with SOM was similar to that of the original measurements. The percentage correctly predicted was 56%, and the sensitivities for the So-Yang, Tae-Eum, and So-Eum types were 56%, 48%, and 61%, respectively. This study showed that a neural network algorithm for clustering Sasang types based on clinical data is useful for the Sasang type diagnostic test itself. We discuss the importance of data pre-processing and the clustering algorithm for the validity of medical devices in traditional Korean medicine.

Real-time Reflection Light Detection Algorithm using Pixel Clustering Data (Pixel 군집화 Data를 이용한 실시간 반사광 검출 알고리즘)

  • Hwang, Dokyung;An, Jongwoo;Kang, Hosun;Lee, Jangmyung
    • The Journal of Korea Robotics Society
    • /
    • v.14 no.4
    • /
    • pp.301-310
    • /
    • 2019
  • A new algorithm is proposed to detect reflected light regions, which act as disturbances in a real-time vision system. There have been several previous attempts to detect reflected light regions. The conventional mathematical approach requires many complex processes, so it is not suitable for a real-time vision system. On the other hand, when a simple detection process is applied, the reflected light region cannot be detected accurately. Therefore, detecting reflected light regions in a real-time vision system requires a new algorithm that is as simple and accurate as possible. To extract the reflected light, the proposed algorithm adopts several filter equations and clustering processes in the HSI (Hue, Saturation, Intensity) color space. The proposed algorithm also uses pre-defined reflected light data generated through the clustering processes to keep the algorithm simple. To demonstrate the effectiveness of the proposed algorithm, several images containing reflected regions were used, and the reflected regions were detected successfully.
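
To make the HSI color space concrete, the sketch below converts an RGB pixel to HSI with the standard geometric formulas and applies a simple bright-and-unsaturated test as a stand-in for a reflection filter. The thresholds and the `is_reflection` rule are assumptions for illustration only. The abstract does not state the authors' filter equations or their clustering-derived reference data.

```python
import math


def rgb_to_hsi(r, g, b):
    """Convert r, g, b in [0, 1] to (hue in degrees, saturation, intensity)."""
    i = (r + g + b) / 3.0
    if i == 0:
        return 0.0, 0.0, 0.0
    s = 1.0 - min(r, g, b) / i
    num = 0.5 * ((r - g) + (r - b))
    den = math.sqrt((r - g) ** 2 + (r - b) * (g - b))
    if den == 0:
        h = 0.0  # hue is undefined for gray pixels; 0 is used by convention
    else:
        theta = math.degrees(math.acos(max(-1.0, min(1.0, num / den))))
        h = theta if b <= g else 360.0 - theta
    return h, s, i


def is_reflection(pixel, min_intensity=0.85, max_saturation=0.15):
    """Assumed rule: specular highlights are very bright and nearly colorless."""
    _, s, i = rgb_to_hsi(*pixel)
    return i >= min_intensity and s <= max_saturation


print(rgb_to_hsi(1.0, 0.0, 0.0))
print(is_reflection((0.95, 0.95, 0.90)), is_reflection((0.8, 0.1, 0.1)))
```

Separating intensity from hue and saturation is what makes a threshold of this kind cheap to evaluate per pixel, which suits a real-time setting.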

Online Clustering Algorithms for Semantic-Rich Network Trajectories

  • Roh, Gook-Pil;Hwang, Seung-Won
    • Journal of Computing Science and Engineering
    • /
    • v.5 no.4
    • /
    • pp.346-353
    • /
    • 2011
  • With the advent of ubiquitous computing, a massive amount of trajectory data has been published and shared on many websites. This type of computing also motivates online mining of trajectory data to fit user-specific preferences or context (e.g., time of day). While many trajectory clustering algorithms have been proposed, they have typically focused on offline mining and do not consider the restrictions of the underlying road network or selection conditions representing user contexts. In clear contrast, we study an efficient clustering algorithm for Boolean + Clustering queries using a pre-materialized and summarized data structure. Our experimental results demonstrate the efficiency and effectiveness of our proposed method using real-life trajectory data.

IAM Clustering Architecture for Inter-Cloud Environment (Inter-Cloud 환경을 위한 IAM 클러스터링 아키텍처)

  • Kim, Jinouk;Park, Jung Soo;Park, Minho;Jung, Souhwan
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.40 no.5
    • /
    • pp.860-862
    • /
    • 2015
  • In this paper, we propose a new type of IAM clustering architecture for efficient user authentication and authorization in the Inter-Cloud environment. The clustering architecture allows users to easily use unregistered services with their registered authentication and access permissions through a pre-Access Agreement. We explain our authentication protocol and the components of the IAM clustering architecture.