• Title/Abstract/Keyword: Classification of Clusters

Search results: 349 items (processing time 0.029 s)

Contribution to Improve Database Classification Algorithms for Multi-Database Mining

  • Miloudi, Salim;Rahal, Sid Ahmed;Khiat, Salim
    • Journal of Information Processing Systems
    • /
    • Vol.14 No.3
    • /
    • pp.709-726
    • /
    • 2018
  • Database classification is an important preprocessing step for multi-database mining (MDM). In fact, when a multi-branch company needs to explore its distributed data for decision making, it is imperative to classify these multiple databases into similar clusters before analyzing the data. To search for the best classification of a set of n databases, existing algorithms generate from 1 to ($n^2-n$)/2 candidate classifications. Although each candidate classification is included in the next one (i.e., clusters in the current classification are subsets of clusters in the next classification), existing algorithms generate each classification independently, that is, without reusing the clusters from the previous classification. Consequently, existing algorithms are time consuming, especially when the number of candidate classifications increases. To overcome this problem, we propose in this paper an efficient approach that represents the problem of classifying the multiple databases as the problem of identifying the connected components of an undirected weighted graph. Theoretical analysis and experiments on public databases confirm the efficiency of our algorithm compared with existing works and show that it overcomes the problem of increasing execution time.
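The graph view described in this abstract can be sketched in a few lines: treat each database as a node, connect two nodes when their similarity reaches a threshold, and read each connected component off as one cluster. The similarity matrix and threshold below are illustrative assumptions, not data from the paper.

```python
from collections import defaultdict, deque

def classify_databases(similarity, threshold):
    """Group databases into clusters: nodes are databases, an edge joins two
    databases whose pairwise similarity meets the threshold, and each
    connected component of the resulting undirected graph is one cluster."""
    n = len(similarity)
    adj = defaultdict(list)
    # Build the adjacency list from the weighted similarity matrix.
    for i in range(n):
        for j in range(i + 1, n):
            if similarity[i][j] >= threshold:
                adj[i].append(j)
                adj[j].append(i)
    # Breadth-first search collects the connected components.
    seen, clusters = set(), []
    for start in range(n):
        if start in seen:
            continue
        comp, queue = [], deque([start])
        seen.add(start)
        while queue:
            u = queue.popleft()
            comp.append(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        clusters.append(sorted(comp))
    return clusters

# Hypothetical similarities among four branch databases.
sim = [[1.0, 0.9, 0.1, 0.0],
       [0.9, 1.0, 0.2, 0.0],
       [0.1, 0.2, 1.0, 0.8],
       [0.0, 0.0, 0.8, 1.0]]
print(classify_databases(sim, 0.5))  # -> [[0, 1], [2, 3]]
```

Raising the threshold splits components apart, which mirrors how the sequence of candidate classifications becomes finer as the similarity requirement tightens.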

A New Distributed Parallel Algorithm for Pattern Classification using Neural Network Model

  • 김대수;백순철
    • ETRI Journal
    • /
    • Vol.13 No.2
    • /
    • pp.34-41
    • /
    • 1991
  • In this paper, a new distributed parallel algorithm for pattern classification based upon the Self-Organizing Neural Network (SONN) [10-12] is developed. This system works without any information about the number of clusters or cluster centers. The SONN model showed good performance in finding classification information, cluster centers, the number of salient clusters, and membership information. However, the sequential version takes a considerable amount of time when the input data set is very large, so the design of a parallel algorithm is desirable. A new distributed parallel algorithm is developed and experimental results are presented.


Clustering Algorithm Using Hashing in Classification of Multispectral Satellite Images

  • Park, Sung-Hee;Kim, Hwang-Soo;Kim, Young-Sup
    • 대한원격탐사학회지
    • /
    • Vol.16 No.2
    • /
    • pp.145-156
    • /
    • 2000
  • Clustering is the process of partitioning a data set into meaningful clusters. As the data to process increase, a faster algorithm than ever is required. In this paper, we propose a clustering algorithm that partitions a multispectral remotely sensed image data set into several clusters using a hash search algorithm. The processing time of our algorithm is compared with that of clustering algorithms using other speed-up concepts. The experimental results are compared with respect to the number of bands, the number of clusters, and the size of the data. It is also shown that the processing time of our algorithm is shorter than that of clustering algorithms using other speed-up concepts when the size of the data is relatively large.

Identification of a Gaussian Fuzzy Classifier

  • Heesoo Hwang
    • International Journal of Control, Automation, and Systems
    • /
    • Vol.2 No.1
    • /
    • pp.118-124
    • /
    • 2004
  • This paper proposes an approach to deriving a fuzzy classifier based on evolutionary supervised clustering, which identifies the optimal clusters necessary to classify classes. The clusters are formed by a multi-dimensional weighted Euclidean distance, which allows clusters of varying shapes and sizes. A cluster induces a Gaussian fuzzy antecedent set with a unique variance in each dimension, which reflects the tightness of the cluster. The fuzzy classifier is composed of as many classification rules as classes. The clusters identified for each class constitute fuzzy sets, which are joined by an "and" connective in the antecedent part of the corresponding rule. The approach is evaluated using six data sets, and comparative results with different classifiers are given.
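The rule structure in this abstract can be sketched as follows. Each cluster contributes a Gaussian membership with its own per-dimension variance, and the memberships in a rule's antecedent are "and"-connected; the product t-norm used here, and the toy centers and variances, are assumptions of this sketch rather than details from the paper.

```python
import math

def gaussian_membership(x, center, sigma):
    """Gaussian fuzzy antecedent set: one variance per dimension, so the
    set reflects the tightness of the cluster along each axis."""
    return math.exp(-sum((xi - ci) ** 2 / (2 * si ** 2)
                         for xi, ci, si in zip(x, center, sigma)))

def rule_firing(x, cluster_sets):
    """'and'-connect the memberships of all clusters in a rule's antecedent
    (product t-norm chosen here as one common option)."""
    degree = 1.0
    for center, sigma in cluster_sets:
        degree *= gaussian_membership(x, center, sigma)
    return degree

def classify(x, rules):
    """One rule per class; pick the class whose rule fires most strongly."""
    return max(rules, key=lambda label: rule_firing(x, rules[label]))

# Hypothetical two-class rule base, one cluster per class.
rules = {"A": [((0.0, 0.0), (1.0, 1.0))],
         "B": [((5.0, 5.0), (1.0, 1.0))]}
print(classify((0.2, 0.1), rules))  # -> A
```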

A Classification Algorithm Based on Data Clustering and Data Reduction for Intrusion Detection System over Big Data

  • Wang, Qiuhua;Ouyang, Xiaoqin;Zhan, Jiacheng
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • Vol.13 No.7
    • /
    • pp.3714-3732
    • /
    • 2019
  • With the rapid development of networks, Intrusion Detection Systems (IDS) play a more and more important role in network applications. Many data mining algorithms are used to build IDSs. However, with the advent of the big data era, massive data are generated, and when dealing with large-scale data sets, most data mining algorithms suffer from a high computational burden, which makes an IDS much less efficient. To build an efficient IDS over big data, we propose a classification algorithm based on data clustering and data reduction. In the training stage, the training data are divided into clusters of similar size by the Mini Batch K-Means algorithm, and the center of each cluster is used as its index. Then, we select representative instances for each cluster to perform data reduction and use the clusters consisting of representative instances to build a K-Nearest Neighbor (KNN) detection model. In the detection stage, we sort clusters according to the distances between the test sample and the cluster indexes, and obtain the k nearest clusters in which we find the k nearest neighbors. Experimental results show that searching for neighbors by cluster indexes reduces the computational complexity significantly, and that classification with the reduced data of representative instances not only improves efficiency but also maintains high accuracy.
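The detection stage described above can be sketched in a few lines: rank clusters by the distance from the test sample to each cluster index, search only the k nearest clusters, and vote among the k nearest instances found there. The toy centers, points, and labels are invented for illustration, and no data reduction is applied in this sketch.

```python
import numpy as np
from collections import Counter

def knn_via_cluster_index(x, centers, clusters, k):
    """Cluster-indexed KNN detection step: `centers` holds one index point
    per cluster, and `clusters` maps a center's row number to a list of
    (point, label) pairs belonging to that cluster."""
    # Rank clusters by the distance from x to each cluster index.
    order = np.argsort(np.linalg.norm(centers - x, axis=1))
    candidates = []
    for ci in order[:k]:  # restrict the neighbor search to k clusters
        candidates.extend(clusters[int(ci)])
    # Vote among the k nearest instances inside those clusters.
    candidates.sort(key=lambda pl: np.linalg.norm(pl[0] - x))
    votes = Counter(label for _, label in candidates[:k])
    return votes.most_common(1)[0][0]

centers = np.array([[0.0, 0.0], [10.0, 10.0]])
clusters = {
    0: [(np.array([0.1, 0.2]), "normal"), (np.array([-0.3, 0.1]), "normal")],
    1: [(np.array([9.8, 10.1]), "attack"), (np.array([10.2, 9.9]), "attack")],
}
print(knn_via_cluster_index(np.array([0.0, 0.3]), centers, clusters, 2))  # -> normal
```

The saving comes from the first step: with many clusters, distances are computed to the (few) indexes rather than to every training instance.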

Improvement of location positioning using KNN, Local Map Classification and Bayes Filter for indoor location recognition system

  • Oh, Seung-Hoon;Maeng, Ju-Hyun
    • 한국컴퓨터정보학회논문지
    • /
    • Vol.26 No.6
    • /
    • pp.29-35
    • /
    • 2021
  • In this paper, we propose a technique that fuses KNN (K-Nearest Neighbor), Local Map Classification, and a Bayes Filter as a way to improve the accuracy of location positioning. First, Local Map Classification divides the actual map into several clusters, and KNN then classifies those clusters. The Bayes Filter computes the posterior probability from the probability obtained for each cluster, and this posterior probability is used to retrieve the cluster in which the robot is located. For performance evaluation, we analyzed the positioning results obtained by applying KNN, Local Map Classification, and the Bayes Filter. The analysis confirmed that, even when the RSSI signal changes, the location estimate stays fixed to one cluster, so the accuracy of the positioning increases.
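The Bayes-filter step over map clusters can be sketched as a simple recursive posterior update. Treating the per-cluster KNN score as the observation likelihood is an assumption of this sketch, as are the cluster names and numbers below.

```python
def bayes_filter_update(prior, likelihood):
    """One Bayes-filter step: multiply the prior belief for each cluster by
    the likelihood of the current observation for that cluster, then
    normalize so the beliefs form a posterior distribution."""
    unnormalized = {c: prior[c] * likelihood[c] for c in prior}
    total = sum(unnormalized.values())
    return {c: v / total for c, v in unnormalized.items()}

def locate(prior, observations):
    """Fold successive observations into the belief and return the cluster
    with the highest posterior probability, plus the full belief."""
    belief = dict(prior)
    for obs in observations:
        belief = bayes_filter_update(belief, obs)
    return max(belief, key=belief.get), belief

# Uniform prior over three hypothetical map clusters; two RSSI-derived
# observations that both favor cluster "A".
prior = {"A": 1 / 3, "B": 1 / 3, "C": 1 / 3}
observations = [{"A": 0.6, "B": 0.3, "C": 0.1},
                {"A": 0.5, "B": 0.4, "C": 0.1}]
cluster, belief = locate(prior, observations)
print(cluster)  # -> A
```

Because the posterior accumulates evidence across observations, a single noisy RSSI reading is unlikely to flip the estimated cluster, which matches the stability the abstract reports.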

Hierarchical ART2 Classification Model Combined with the Adaptive Searching Strategy

  • 김도현;차의영
    • 한국정보과학회논문지:소프트웨어및응용
    • /
    • Vol.30 No.7_8
    • /
    • pp.649-658
    • /
    • 2003
  • This study proposes a hierarchical structure for improving the performance of the ART2 neural network, together with a fast and effective pattern classification model (HART2) that selects among the constructed clusters by fitness. The proposed network is a hierarchical neural network that roughly forms first-level clusters through unsupervised learning and then, for the patterns assigned to each first-level cluster, generates second-level clusters through supervised learning to classify patterns. Pattern classification with this network first compares an input pattern with the first-level clusters and selects a few similar first-level clusters according to fitness. Here, a fitness function based on the relative distance ratio between the input pattern and the clusters is introduced, and the clusters connected to each first-level cluster are pruned, which improves both the speed and the accuracy of the hierarchical network. Finally, the pattern is classified by comparing the input pattern with the second-level clusters connected to the selected first-level clusters. To verify the effectiveness of this approach, experiments on an extended set of test patterns, produced by deforming digit data from 22 Korean and English fonts in various ways, demonstrated the superior pattern classification capability of the proposed network.

Classification of Daily Precipitation Patterns in South Korea using Multivariate Statistical Methods

  • Mika, Janos;Kim, Baek-Jo;Park, Jong-Kil
    • 한국환경과학회지
    • /
    • Vol.15 No.12
    • /
    • pp.1125-1139
    • /
    • 2006
  • The cluster analysis of diurnal precipitation patterns is performed using the daily precipitation of 59 stations in South Korea from 1973 to 1996, in four seasons of each year. The four seasons are shifted forward by 15 days compared to the conventional ones. The numbers of clusters are 15 in winter, 16 in spring and autumn, and 26 in summer. One of the classes in each season is the totally dry day, on which precipitation is observed at no station; this class is treated separately in this study. The distribution of the days among the clusters is rather uneven, with rather low area-mean precipitation occurring most frequently. These 4 (seasons) $\times$ 2 (wet and dry days) classes represent more than half (59%) of all days of the year. On the other hand, even the smallest seasonal clusters contain at least $5\sim9$ members in the 24-year (1973-1996) period of classification. The cluster analysis is performed directly on the major $5\sim8$ non-correlated coefficients of the diurnal precipitation patterns obtained by factor analysis, in order to account for the spatial correlation. More specifically, hierarchical clustering based on Euclidean distance and Ward's method of agglomeration is applied. The relative variance explained by the clustering is 63% on average, with better performance in spring (66%) and winter (69%) and lower than average performance in autumn (60%) and summer (59%). By applying weighted relative variances, i.e., dividing the squared deviations by the cluster averages, we obtain even better values, 78% on average, compared with the same index without clustering. This means that the highest variance remains in the clusters with more precipitation. Besides all the statistics necessary for the validation of the final classification, four cluster centers are mapped for each season to illustrate the range of typical extremes, paired according to their area-mean precipitation or negative pattern correlation. Possible alternatives to the performed classification and the reasons for their rejection are also discussed, together with a wide spectrum of recommended applications.

Polynomial Fuzzy Radial Basis Function Neural Network Classifiers Realized with the Aid of Boundary Area Decision

  • Roh, Seok-Beom;Oh, Sung-Kwun
    • Journal of Electrical Engineering and Technology
    • /
    • Vol.9 No.6
    • /
    • pp.2098-2106
    • /
    • 2014
  • In the area of clustering, there are numerous approaches to constructing clusters in the input space. For regression problems, when forming clusters as part of the overall model, the relationships between the input space and the output space are essential and have to be taken into consideration. Conditional Fuzzy C-Means (c-FCM) clustering offers an opportunity to analyze the structure of the input space with a mechanism of supervision implied by the distribution of data in the output space. However, like other clustering methods, c-FCM focuses on the distribution of the data. In this paper, we introduce a new method which, by making use of an ambiguity index, focuses on the boundaries of the clusters, whose determination is essential to the quality of the ensuing classification procedures. The introduced design is illustrated with the aid of numeric examples that provide detailed insight into the performance of the fuzzy classifiers and quantify several essential design aspects.

An Improved Automated Spectral Clustering Algorithm

  • Xiaodan Lv
    • Journal of Information Processing Systems
    • /
    • Vol.20 No.2
    • /
    • pp.185-199
    • /
    • 2024
  • In this paper, an improved automated spectral clustering (IASC) algorithm is proposed to address the limitations of the traditional spectral clustering (TSC) algorithm, particularly its inability to automatically determine the number of clusters. Firstly, a cluster-number evaluation factor based on the optimal clustering principle is proposed: by iterating through different values of k, the value corresponding to the largest evaluation factor is selected as the number of clusters. Secondly, the IASC algorithm adopts a density-sensitive distance to measure the similarity between sample points, which yields high similarity for data distributed in the same high-density area. Thirdly, to improve clustering accuracy, the IASC algorithm uses the cosine-angle classification method instead of K-means to classify the eigenvectors. Six algorithms (K-means, fuzzy C-means, TSC, EIGENGAP, DBSCAN, and density peak) were compared with the proposed algorithm on six datasets. The results show that the IASC algorithm not only determines the number of clusters automatically but also obtains better clustering accuracy on both synthetic and UCI datasets.
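Automatic cluster-number selection in spectral clustering can be illustrated with the classic eigengap heuristic (one of the baselines the abstract names); note the paper defines its own evaluation factor, which this sketch does not reproduce. The heuristic inspects the spectrum of the normalized graph Laplacian and picks the k with the largest gap between consecutive eigenvalues.

```python
import numpy as np

def auto_num_clusters(similarity, k_max):
    """Eigengap heuristic: build the symmetric normalized Laplacian
    L = I - D^{-1/2} W D^{-1/2}, sort its eigenvalues, and return the k
    (1 <= k <= k_max) with the largest gap between consecutive eigenvalues."""
    W = np.asarray(similarity, dtype=float)
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L = np.eye(len(W)) - D_inv_sqrt @ W @ D_inv_sqrt
    eigvals = np.sort(np.linalg.eigvalsh(L))
    gaps = np.diff(eigvals[:k_max + 1])   # gap between lambda_k and lambda_{k+1}
    return int(np.argmax(gaps)) + 1

# Two clearly separated blocks in the similarity matrix -> k = 2 expected.
W = np.array([[1.0, 0.9, 0.0, 0.0],
              [0.9, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.9],
              [0.0, 0.0, 0.9, 1.0]])
print(auto_num_clusters(W, 3))  # -> 2
```

For a graph with c well-separated components, the Laplacian has exactly c eigenvalues near zero, so the first large gap in the spectrum marks the natural number of clusters.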