• Title/Summary/Keyword: UCI repository

Hybrid Self Organizing Map using Monte Carlo Computing

  • Jun Sung-Hae;Park Min-Jae;Oh Kyung-Whan
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2006.05a / pp.381-384 / 2006
  • The Self Organizing Map (SOM) is a powerful neural network model for unsupervised learning and has been widely used for clustering in exploratory data analysis. Its weakness is a poor theoretical basis, and many studies have been published to address this problem. This paper also proposes a method to overcome this drawback, but takes a different approach from previous work: a hybrid SOM that uses Monte Carlo computing to improve clustering performance. We verify the improved performance of the hybrid SOM through experiments on data from the UCI Machine Learning Repository. In addition, the hybrid SOM determines the number of clusters.

Association Analysis of Parkinson's Disease using Apriori Algorithm

  • Jung, Yong-Gyu;Kim, Oh-Jin;Won, Jae-Kang
    • International journal of advanced smart convergence / v.1 no.1 / pp.43-47 / 2012
  • Parkinson's disease is a representative degenerative disease of the nervous system, caused by a deficiency of dopamine neurons and leading to gradual degeneration of the body. In this paper, open UCI repository data on Parkinson's patients are used for the experiments. Classification based on correlation analysis is examined. In addition, relationships between groups of patients with Parkinson's disease are differentiated by cluster analysis, using the Apriori algorithm and correlation analysis to find the properties that distinguish the clusters. Although the disease has the same basic structure, the groups are compared by gender on their most distinctive characteristics.
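The Apriori step named in the abstract above can be illustrated with a minimal sketch. It is not the authors' code: the item names (`jitter=high`, `shimmer=low`, `status=PD`) and the four transactions are hypothetical, standing in for discretized attributes of patient records, and `min_support=0.5` is an arbitrary choice.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    n = len(transactions)
    tsets = [frozenset(t) for t in transactions]

    def support(itemset):
        # Fraction of transactions that contain every item of the itemset.
        return sum(itemset <= t for t in tsets) / n

    current = {frozenset([i]) for t in tsets for i in t}  # 1-item candidates
    frequent = {}
    k = 1
    while current:
        level = {}
        for candidate in current:
            s = support(candidate)
            if s >= min_support:
                level[candidate] = s
        frequent.update(level)
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        current = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent

# Hypothetical, hand-made transactions (not taken from the UCI data).
transactions = [
    {"jitter=high", "shimmer=high", "status=PD"},
    {"jitter=high", "shimmer=high", "status=PD"},
    {"jitter=high", "shimmer=low", "status=PD"},
    {"jitter=low", "shimmer=low", "status=healthy"},
]
frequent = apriori(transactions, min_support=0.5)

# Confidence of the rule {jitter=high} -> {status=PD}.
rule_support = frequent[frozenset({"jitter=high", "status=PD"})]
confidence = rule_support / frequent[frozenset({"jitter=high"})]
```

Association rules are then read off the frequent itemsets by dividing the support of the whole itemset by the support of the rule's left-hand side, as in the last two lines.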

Intelligent Data Mining Agent for Automatic Clustering (자동 군집화를 위한 지능화된 데이터 마이닝 에이전트)

  • 박정은;전성해;오경환
    • Proceedings of the Korea Intelligent Information System Society Conference / 2002.11a / pp.370-376 / 2002
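  • The vast amount of data generated in the Internet environment calls for automated analysis systems that can process it intelligently. Data analysis in such systems divides broadly into supervised learning and unsupervised learning. This paper proposes an intelligent data mining agent as an automated system for unsupervised clustering in particular. During clustering, it is difficult for the analyst to intervene in real time in the choice of clustering method and the interpretation of results, so having an intelligent agent take charge of automated clustering can be an effective clustering strategy. The proposed system is a multi-agent system consisting of a clustering agent and a clustering performance evaluation agent; the two agents exchange information with each other to carry out optimal clustering. The performance of the proposed system was evaluated through experiments using UCI Machine Learning Repository data.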

Discretization of continuous-valued attributes considering data distribution (데이터 분포를 고려한 연속 값 속성의 이산화)

  • 이상훈;박정은;오경환
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2003.05a / pp.217-220 / 2003
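  • This paper proposes a new method that converts continuous values into categorical form by considering the distribution of the target attribute (class) values with respect to each attribute, without requiring any input parameters. For each attribute, the distribution of the target attribute is mapped onto a one-dimensional space, and intervals are clustered according to criteria such as the density of each target attribute value and its degree of overlap with other target attribute values. The clusters generated this way are based on probabilistic values that can predict the target attribute, and they have discretization boundaries that minimize the loss of information provided by each attribute. The improved performance of the proposed discretization method is confirmed using the C4.5 algorithm and data from the UCI Machine Learning Data Repository.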

Cluster Merging Using Density based Fuzzy C-Means algorithm (밀도 기반의 퍼지 C-Means 알고리즘을 이용한 클러스터 합병)

  • 한진우;전성해;오경환
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2003.05a / pp.235-238 / 2003
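  • The Fuzzy C-Means (FCM) algorithm shows large differences in clustering performance depending on the number and positions of the initial cluster centers. In general, however, the number of cluster centers is decided subjectively and arbitrarily by the analyst, so clustering proceeds without regard to the structure of the original data and may fail to achieve an optimal result. To perform fuzzy clustering that is closer to the structure of the original data, this paper therefore proposes an FCM that uses grid-based data density, together with a technique for merging the clusters determined by this density-based FCM. The N-dimensional data space is divided into an N-dimensional grid, and the number and positions of the initial cluster centers are determined from the density of each grid cell. After initialization, clustering is performed with FCM inside each grid cell, and clusters from neighboring cells are then merged using a similarity measure between clusters, yielding a clustering close to the natural structure of the data. The improved performance of the proposed cluster merging technique was confirmed using UCI Machine Learning Repository data.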
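For readers unfamiliar with FCM, the sketch below shows the standard membership and center updates on one-dimensional data. It covers only plain FCM: the grid-density initialization and the cluster merging proposed in the abstract above are not implemented. The initial centers, the fuzzifier `m=2.0`, and the sample values are hand-picked assumptions.

```python
def fcm_1d(xs, centers, m=2.0, iters=50):
    """Plain Fuzzy C-Means on 1-D data; returns (centers, memberships)."""
    centers = list(centers)
    u = []
    for _ in range(iters):
        # Membership update: u_ij is proportional to d_ij^(-2/(m-1)).
        u = []
        for x in xs:
            d = [abs(x - c) for c in centers]
            if 0.0 in d:
                # The point sits exactly on one or more centers:
                # share full membership among those centers.
                hits = [1.0 if dj == 0.0 else 0.0 for dj in d]
                total = sum(hits)
            else:
                hits = [dj ** (-2.0 / (m - 1.0)) for dj in d]
                total = sum(hits)
            u.append([h / total for h in hits])
        # Center update: mean of the points weighted by u_ij^m.
        centers = [
            sum(u[i][j] ** m * xs[i] for i in range(len(xs)))
            / sum(u[i][j] ** m for i in range(len(xs)))
            for j in range(len(centers))
        ]
    # u holds the memberships computed in the last iteration.
    return centers, u

# Hypothetical data with two obvious groups, and poorly placed initial centers.
xs = [0.8, 1.0, 1.2, 4.8, 5.0, 5.2]
centers, memberships = fcm_1d(xs, centers=[0.0, 6.0])
```

Because the result depends on the initial `centers` argument, this is exactly the input that a density-based initialization would replace.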

A Co-Evolutionary Computing for Statistical Learning Theory

  • Jun Sung-Hae
    • International Journal of Fuzzy Logic and Intelligent Systems / v.5 no.4 / pp.281-285 / 2005
  • Learning and evolving are two fundamentals of data mining. In contrast to classical learning theory, which is based on an objective function that minimizes training error, recent evolutionary computing offers an efficient approach to constructing an optimal model without minimizing training error. The global search that evolutionary computing performs in the solution space can resolve the local optima problems of learning models. In this research, we combine a co-evolving algorithm with statistical learning theory and propose a co-evolutionary computing approach that overcomes the local optima problems of statistical learning theory. We apply the proposed model to classification and prediction problems. In the experimental results, we verify the improved performance of our model using data sets from the UCI Machine Learning Repository and KDD Cup 2000.

An Optimal Clustering using Hybrid Self Organizing Map

  • Jun, Sung-Hae
    • International Journal of Fuzzy Logic and Intelligent Systems / v.6 no.1 / pp.10-14 / 2006
  • Many clustering methods have been studied, and most of them require the number of clusters to be determined. However, few methods determine the number of population clusters objectively, and the cluster size is difficult to determine. In general, the number of clusters is decided subjectively from prior knowledge. Because clustering results depend on the number of clusters, it must be chosen carefully. In this paper, we propose an efficient method for determining the number of clusters using a hybrid self organizing map and a new criterion for evaluating the clustering result. In the experiments, we verify our model by comparing it with other clustering methods on data sets from the UCI Machine Learning Repository.

Empirical Comparisons of Clustering Algorithms using Silhouette Information

  • Jun, Sung-Hae;Lee, Seung-Joo
    • International Journal of Fuzzy Logic and Intelligent Systems / v.10 no.1 / pp.31-36 / 2010
  • Many clustering algorithms have been used in diverse fields. When a given data set needs to be grouped into clusters, many clustering algorithms based on similarity or distance measures are considered. Most clustering work has been based on hierarchical and non-hierarchical algorithms. Researchers have generally chosen among these algorithms case by case, and have had to select a suitable clustering method subjectively from their prior knowledge. In this paper, to address this subjectivity, we make empirical comparisons of popular hierarchical and non-hierarchical clustering algorithms using the silhouette measure. We use silhouette information to evaluate clustering results such as the number of clusters and cluster variance. We verify our comparison study with experimental results on data sets from the UCI Machine Learning Repository. As a result, clustering algorithms can be selected efficiently and objectively.
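The silhouette measure used in the abstract above can be computed directly from pairwise distances. The following is a minimal sketch assuming Euclidean distance and at least two clusters; the four points and their labels are hypothetical and not drawn from any UCI data set.

```python
from math import dist

def silhouette_scores(points, labels):
    """Return the silhouette value s(i) = (b - a) / max(a, b) for each point.

    Assumes at least two distinct cluster labels.
    """
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        # (the point's zero distance to itself does not change the sum).
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(dist(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        scores.append((b - a) / max(a, b))
    return scores

# Hypothetical toy data: two tight, well-separated clusters.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
labels = [0, 0, 1, 1]
scores = silhouette_scores(points, labels)
mean_silhouette = sum(scores) / len(scores)
```

Values near 1 indicate compact, well-separated clusters, so comparing `mean_silhouette` across algorithms or across candidate numbers of clusters gives one objective selection criterion.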

Support Vector Machine based on Stratified Sampling

  • Jun, Sung-Hae
    • International Journal of Fuzzy Logic and Intelligent Systems / v.9 no.2 / pp.141-146 / 2009
  • The support vector machine is a classification algorithm based on statistical learning theory. It has produced many results with good performance in data mining. However, the algorithm has some problems, one of which is its heavy computing cost, which has made it difficult to use the support vector machine in dynamic and online systems. To overcome this problem, we propose using stratified sampling from statistical sampling theory. Stratified sampling helps reduce the size of the training data. In our paper, accuracy is maintained even though the training data are small. We verify the improved performance with experimental results on data sets from the UCI Machine Learning Repository.
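The idea of shrinking the training set by stratified sampling can be sketched as follows. This assumes the strata are the class labels, which the abstract above does not state; the function name `stratified_sample`, the 10% fraction, and the toy data are likewise hypothetical. The reduced set would then be passed to whatever SVM trainer is in use.

```python
import random
from collections import defaultdict

def stratified_sample(rows, labels, fraction, seed=0):
    """Draw the same fraction from every class so class proportions are kept."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, lab in zip(rows, labels):
        by_class[lab].append(row)

    sample_rows, sample_labels = [], []
    for lab, members in by_class.items():
        # Keep at least one example per class so no stratum disappears.
        k = max(1, round(len(members) * fraction))
        for row in rng.sample(members, k):
            sample_rows.append(row)
            sample_labels.append(lab)
    return sample_rows, sample_labels

# Hypothetical imbalanced data set: 80 rows of class "a", 20 of class "b".
rows = [[i] for i in range(100)]
labels = ["a"] * 80 + ["b"] * 20
sub_rows, sub_labels = stratified_sample(rows, labels, fraction=0.1)
```

Sampling within each stratum, rather than from the pooled data, keeps minority classes represented in the smaller training set.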

Diagnosis of Parkinson's Disease by Voice Disorder Using Mahalanobis Taguchi System (Mahalanobis Taguchi System을 이용한 파킨슨병 환자의 음성분석을 통한 진단에 관한 연구)

  • Hong, Jung-Eui
    • Journal of Korean Society of Industrial and Systems Engineering / v.32 no.4 / pp.215-222 / 2009
  • The human voice reacts very sensitively to minute changes in a person's physical condition. For instance, voice disorders affect patients profoundly, especially in the case of Parkinson's disease. Acoustic tools such as MDVP can serve as equipment for measuring various voice characteristics in different subjects. Many different approaches have been applied to analyzing voice disorders for the diagnosis of Parkinson's disease. The voice data of suspected Parkinson's patients from the UCI Machine Learning Repository are reported to cover 23 people with Parkinson's disease and 8 healthy people. The Mahalanobis Taguchi System (MTS) is applied to the diagnosis of Parkinson's disease, and its diagnostic accuracy is compared with previous research results.
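The core quantity in MTS is a scaled Mahalanobis distance measured from a reference ("normal") group. The sketch below restricts itself to two features so the covariance matrix can be inverted in closed form. It is an illustration under assumptions, not the paper's procedure: the four reference measurements are made up, the sample covariance (divisor n - 1) is one of several conventions, and the feature-selection stage of MTS is omitted.

```python
def mean_and_cov(samples):
    """Mean vector and sample covariance (sxx, sxy, syy) of 2-D samples."""
    n = len(samples)
    mx = sum(x for x, _ in samples) / n
    my = sum(y for _, y in samples) / n
    sxx = sum((x - mx) ** 2 for x, _ in samples) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in samples) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in samples) / (n - 1)
    return (mx, my), (sxx, sxy, syy)

def scaled_mahalanobis(point, mean, cov):
    """Squared Mahalanobis distance divided by the number of features (2)."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    # d^T S^{-1} d, using the closed-form inverse of a 2x2 matrix.
    d2 = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return d2 / 2.0

# Hypothetical voice measurements for the healthy reference group.
healthy = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0)]
mean, cov = mean_and_cov(healthy)

md_inside = scaled_mahalanobis((3.0, 4.0), mean, cov)   # a reference sample
md_outside = scaled_mahalanobis((6.0, 0.0), mean, cov)  # far from the group
```

A new sample whose scaled distance is well above the values seen inside the reference group would be flagged as abnormal; where to place that threshold is a separate design decision.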