• Title/Summary/Keyword: Data Clustering


A Study on Performance Evaluation of Clustering Algorithms using Neural and Statistical Method (클러스터링 성능평가: 신경망 및 통계적 방법)

  • 윤석환;신용백
    • Journal of the Korean Professional Engineers Association / v.29 no.2 / pp.71-79 / 1996
  • This paper evaluates the clustering performance of a neural network method and a statistical method. The algorithms used are GLVQ (Generalized Learning Vector Quantization) as the neural method and the k-means algorithm as the statistical clustering method. To compare the two methods, we calculate Rand's c statistic. The mean c value obtained with GLVQ is higher than that obtained with the k-means algorithm, while the standard deviation of the c value is lower. The experimental data sets were Fisher's IRIS data and patterns extracted from handwritten numerals.
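As a point of reference for the comparison described above, the following is a minimal sketch of the plain (unadjusted) Rand statistic: the fraction of point pairs that two partitions treat the same way. The function name `rand_index` and the toy labels are illustrative assumptions, not taken from the paper, and the paper may use a different variant of the statistic.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Return the fraction of point pairs on which two partitions agree.

    A pair "agrees" when both partitions put the two points in the same
    cluster, or both put them in different clusters.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("label sequences must have the same length")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("need at least two points")
    agree = 0
    for i, j in pairs:
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a == same_b:
            agree += 1
    return agree / len(pairs)


# Toy labels (made up for illustration): cluster names need not match,
# only the grouping of points matters.
reference = [0, 0, 1, 1]
clustering = [1, 1, 0, 2]
print(rand_index(reference, clustering))
```

A value of 1.0 means the two partitions group the points identically; lower values mean more pairs are treated differently.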


Clustering Observations for Detecting Multiple Outliers in Regression Models

  • Seo, Han-Son;Yoon, Min
    • The Korean Journal of Applied Statistics / v.25 no.3 / pp.503-512 / 2012
  • Outlier detection in a linear regression model eventually fails when similar observations are classified differently in a sequential process. In such circumstances, identifying clusters and applying the detection methods to the clustered data can prevent this failure and is computationally efficient because the data are reduced. In this paper, we suggest implementing a clustering procedure for this purpose and provide examples that illustrate the suggested procedure applied to the Hadi-Simonoff (1993) method, the reverse Hadi-Simonoff method, and the Gentleman-Wilk (1975) method.

Design of Improved Multi-FNN for Nonlinear Process Modeling

  • Park, Hosung;Sungkwun Oh
    • 제어로봇시스템학회:학술대회논문집 / 2002.10a / pp.102.2-102 / 2002
  • In this paper, the improved Multi-FNN (Fuzzy-Neural Networks) model is identified and optimized using the HCM (Hard C-Means) clustering method and optimization algorithms. The proposed Multi-FNN is based on FNN and uses simplified and linear inference as the fuzzy inference method and the error back-propagation algorithm as the learning rule. We use HCM clustering and genetic algorithms (GAs) to identify both the structure and the parameters of a Multi-FNN model. Here, the HCM clustering method, which is carried out to preprocess the process data for system modeling, is used to determine the structure of the Multi-FNN according to the divisions of the input-output space using I/O process data. Also, the parame...


Fuzzy Modeling based on FCM Clustering Algorithm (FCM 클러스터링 알고리즘에 기초한 퍼지 모델링)

  • 윤기찬;오성권
    • 제어로봇시스템학회:학술대회논문집 / 2000.10a / pp.373-373 / 2000
  • In this paper, we propose a fuzzy modeling algorithm that divides the input space more efficiently than conventional methods by taking into account correlations between components of the sample data. The proposed fuzzy modeling algorithm consists of two steps: coarse tuning, which determines the consequent parameters approximately using the FCRM clustering method, and fine tuning, which adjusts the premise and consequent parameters more precisely with a gradient descent algorithm. To evaluate the performance of the proposed fuzzy model, we use numerical data from a nonlinear function.
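For orientation only, here is a minimal sketch of one iteration of standard fuzzy c-means (FCM), the method named in the title. It is not the FCRM variant the abstract mentions and not the authors' implementation. The function names, the fuzzifier `m = 2.0`, and the one-dimensional toy data are all assumptions made for illustration.

```python
def fcm_memberships(points, centers, m=2.0):
    """Return u[i][j], the membership of point i in cluster j (rows sum to 1)."""
    power = 1.0 / (m - 1.0)
    memberships = []
    for p in points:
        # Squared Euclidean distance from this point to every center.
        d2 = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
        if 0.0 in d2:
            # The point coincides with one or more centers:
            # split full membership among those centers.
            hits = d2.count(0.0)
            row = [1.0 / hits if d == 0.0 else 0.0 for d in d2]
        else:
            inv = [(1.0 / d) ** power for d in d2]
            total = sum(inv)
            row = [v / total for v in inv]
        memberships.append(row)
    return memberships


def fcm_centers(points, memberships, m=2.0):
    """Return new centers as means of the points weighted by u[i][j] ** m."""
    dim = len(points[0])
    centers = []
    for j in range(len(memberships[0])):
        weights = [row[j] ** m for row in memberships]
        total = sum(weights)  # assumed non-zero in this sketch
        centers.append(tuple(
            sum(w * p[d] for w, p in zip(weights, points)) / total
            for d in range(dim)
        ))
    return centers


# Toy 1-D data (made up): two well-separated groups.
points = [(0.0,), (1.0,), (9.0,), (10.0,)]
centers = [(0.5,), (9.5,)]
u = fcm_memberships(points, centers)
print(fcm_centers(points, u))
```

In a full run, the two functions would alternate until the centers stop moving; unlike hard c-means, every point contributes to every center, weighted by its membership.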


Results of Discriminant Analysis with Respect to Cluster Analyses Under Dimensional Reduction

  • Chae, Seong-San
    • Communications for Statistical Applications and Methods / v.9 no.2 / pp.543-553 / 2002
  • Principal component analysis is applied to reduce p dimensions to q dimensions ($q \leq p$). Each partition of a collection of data points with p and q variables, generated by applying six hierarchical clustering methods, is reclassified by discriminant analysis. Applying discriminant analysis to the result of each hierarchical clustering method yields correct classification ratios. The results illustrate which method is more reasonable in exploratory data analysis.

Forecasting High-Level Ozone Concentration with Fuzzy Clustering (퍼지 클러스터링을 이용한 고농도오존예측)

  • 김재용;김성신;왕보현
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2001.05a / pp.191-194 / 2001
  • Ozone forecasting systems face many problems because the mechanism governing ozone concentration is highly complex, nonlinear, and nonstationary. Prediction performance has also been poor so far, especially at high ozone concentrations. This paper describes a modeling method for an ozone prediction system using neuro-fuzzy approaches and fuzzy clustering. The dynamic polynomial neural network (DPNN), based on a typical GMDH (group method of data handling) algorithm, is a useful method for data analysis, identification of nonlinear complex systems, and prediction of dynamical systems.


A k-means++ Algorithm for Internet Shopping Search Engine

  • Jian-Ji Ren;Jae-kee Lee
    • Proceedings of the Korea Information Processing Society Conference / 2008.11a / pp.75-77 / 2008
  • Nowadays, as the indices of the major search engines grow to tremendous proportions, vertical search services can help customers find what they need. The search engine is one of the reasons for the success of Internet shopping in today's world. An important part of a search engine is clustering data. The objective of this paper is to explore a k-means++ algorithm for clustering data in the Internet shopping environment. The experimental results show that the k-means++ algorithm is a faster way to achieve a good clustering.
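The abstract does not describe the implementation, so the following is only a minimal sketch of the standard k-means++ seeding step: each new center is drawn with probability proportional to its squared distance from the nearest center already chosen. The function names and the toy points are illustrative assumptions, not the authors' code.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_pp_init(points, k, rng=None):
    """Pick k initial centers from points using k-means++ seeding."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = rng or random.Random()
    # First center: chosen uniformly at random.
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Squared distance from each point to its nearest chosen center.
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        if sum(d2) == 0:
            # Every point coincides with a center; fall back to uniform choice.
            centers.append(rng.choice(points))
            continue
        # Far-away points are proportionally more likely to be picked.
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers


# Toy 2-D points (made up): two tight groups far apart.
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
print(kmeans_pp_init(points, 2, random.Random(0)))
```

The seeds would then be handed to ordinary k-means iterations. Spreading the initial centers apart is what usually lets k-means++ converge in fewer iterations than uniformly random seeding.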

Clustering of Time-Course Microarray Data Using Pharmacokinetic Parameter (약동학적 파라미터를 이용한 시간경로 마이크로어레이 자료의 군집분석)

  • Lee, Hyo-Jung;Kim, Peol-A;Park, Mi-Ra
    • The Korean Journal of Applied Statistics / v.24 no.4 / pp.623-631 / 2011
  • A major goal of time-course microarray data analysis is the detection of groups of genes that manifest similar expression patterns over time. Numerous clustering algorithms have been developed for time-course microarray data. In this study, we propose a clustering method based on the primary pharmacokinetic parameters used in pharmacokinetic studies to assess pharmaceutical equivalence between two drug products. Real data and simulated data were used to demonstrate the usefulness of the proposed method.

The Selective Transmission of Sensor Data for a Water Quality Monitoring System (수질 모니터링 시스템을 위한 센서 데이터의 선택적 전송방법)

  • Kwon, Dae-Hyeon;Oh, Ryeom-Duk;Cho, Soo-Sun
    • Journal of Internet Computing and Services / v.11 no.4 / pp.51-58 / 2010
  • In this paper, we introduce various attempts to transmit sensor data efficiently in the design of a water quality monitoring system in a USN environment. The representative methods are sensor management on a sensor node and clustering on a sink node. Sensor management includes control of sensing intervals, data accumulation, and data transmission. Clustering is an efficient data compression method that uses data mining technology. The experimental results confirm that the proposed transmission method, which uses sensor management and clustering, outperforms the common transmission method.

Unsupervised Outpatients Clustering: A Case Study in Avissawella Base Hospital, Sri Lanka

  • Hoang, Huu-Trung;Pham, Quoc-Viet;Kim, Jung Eon;Kim, Hoon;Park, Junseok;Hwang, Won-Joo
    • Journal of Korea Multimedia Society / v.22 no.4 / pp.480-490 / 2019
  • Nowadays, Electronic Medical Records (EMR) have been implemented at only a few hospitals for the Outpatient Department (OPD). OPD data are diverse, covering patient demographics and diseases, so they need to be clustered to uncover hidden rules and relationships among the types of patient information. In this paper, we propose a novel approach for unsupervised clustering of patient demographics and diseases in the OPD. First, we collect OPD data from a hospital. Then, we preprocess and transform the data using techniques such as standardization, label encoding, and categorical encoding. After obtaining the transformed data, we use experiments, techniques, and evaluations to select the best number of clusters and the best clustering algorithm. In addition, we use tests and measurements to analyze and evaluate cluster tendency, models, and algorithms. Finally, we analyze the results to discover new knowledge, meanings, and rules. The clusters found in this research provide knowledge to medical managers and doctors. From this information, they can improve patient management methods, patient arrangement methods, and doctors' abilities. In addition, the work is a reference for medical data scientists mining OPD datasets.
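The preprocessing step mentioned above (standardization and label encoding) can be sketched as follows. This is a minimal illustration under assumptions: the helper names, the choice of population standard deviation, and the toy age and sex values are hypothetical and do not come from the paper or its dataset.

```python
from statistics import mean, pstdev


def standardize(values):
    """Rescale numbers to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column carries no spread; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def label_encode(categories):
    """Map each distinct category to an integer code, in sorted order."""
    mapping = {c: i for i, c in enumerate(sorted(set(categories)))}
    return [mapping[c] for c in categories], mapping


# Hypothetical outpatient fields, made up for illustration only.
ages = [23, 35, 47]
sexes = ["f", "m", "f"]

print(standardize(ages))
print(label_encode(sexes))
```

Standardizing numeric fields keeps a large-valued column such as age from dominating distance-based clustering, while label encoding turns categorical fields into numbers the algorithm can consume.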