• Title/Abstract/Keywords: Objective clustering

Efficient Data Clustering using Fast Choice for Number of Clusters

  • 김성수;강범수
    • 산업경영시스템학회지 / Vol. 41, No. 2 / pp.1-8 / 2018
  • The K-means algorithm is one of the most popular and widely used clustering methods because it is easy to implement and very efficient. However, it can only be used with a fixed number of clusters, because it evaluates clustering solutions by intra-cluster distance alone. The silhouette is a useful and stable validity index for choosing a clustering solution together with its number of clusters on unlabeled data, since it considers both intra-cluster and inter-cluster distances. Its computational burden is high, however, because it computes a quality measure for every data object. This paper proposes a fast and simple speed-up method that overcomes this limitation so that the silhouette can be used for effective large-scale data clustering. In the first step, the proposed method calculates and stores the distances between data objects once. In the second step, this distance matrix is used to calculate the relative distance rate ($V_j$) of each data object j, and this rate is used to choose a suitable number of clusters without much computation time. In the third step, an efficient heuristic algorithm (group search optimization, GSO, in this paper) searches for the global optimum at reduced computational cost, starting from good initial solutions generated probabilistically from $V_j$. Experiments and analysis on the Ruspini, Iris, Wine, and Breast Cancer data sets from the UCI Machine Learning Repository show that the proposed method saves significant computation time compared with the original silhouette. The advantage over the previous method is especially large for larger data sets.
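
The first step of the abstract above (compute the distances once, then reuse them) can be illustrated with a small sketch. This is not the authors' method: the relative distance rate $V_j$ and the GSO search are not defined in the abstract, so the code only shows the standard mean silhouette width evaluated from a cached distance matrix. The function names `distance_matrix` and `mean_silhouette` are hypothetical.

```python
import math


def distance_matrix(points):
    """Compute all pairwise Euclidean distances once and cache them."""
    n = len(points)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(points[i], points[j])
            dist[i][j] = dist[j][i] = d
    return dist


def mean_silhouette(dist, labels):
    """Standard mean silhouette width, reading distances from the cache.

    Assumption: singleton clusters contribute 0, a common convention.
    """
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(dist[i][j] for j in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist[i][j] for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(labels)
```

Because `dist` is built once, scoring many candidate labelings (for example, one per candidate number of clusters) costs only table lookups, not repeated distance computations.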

Fuzzy Clustering of Fuzzy Data using a Dissimilarity Measure

  • 이건명
    • 한국정보과학회논문지:소프트웨어및응용 / Vol. 26, No. 9 / pp.1114-1124 / 1999
  • The objective of clustering is to group a set of data into a number of clusters so that similarity is high between data belonging to the same cluster and low between data belonging to different clusters. Data describing real-world objects often have qualitative non-numeric attributes as well as numeric ones, and their values are frequently vague rather than exact because of observation error, uncertainty, subjective judgment, and so on. This paper proposes a dissimilarity measure for data with numeric and non-numeric attributes whose vague values are expressed as fuzzy values, introduces a fuzzy clustering method for such data based on the proposed measure, and presents experimental results that show the applicability of the method and the measure.

Function Optimization and Event Clustering by Adaptive Differential Evolution

  • 황희수
    • 한국지능시스템학회논문지 / Vol. 12, No. 5 / pp.451-461 / 2002
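  • Differential evolution has proven to be a very efficient method for optimizing many kinds of objective functions. Its greatest advantages are conceptual simplicity and ease of use. Its drawback is that convergence is very sensitive to the control parameters. This paper proposes an adaptive differential evolution that combines a new method of generating vectors for crossover with an adaptation mechanism for the control parameters. This makes differential evolution more robust and easier to use without harming convergence. The proposed method was tested on 12 optimization problems. As an application of adaptive differential evolution, a supervised clustering method for event prediction is proposed. The method is called event clustering by evolution, and its performance was tested on four cases widely used for validating data modeling.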
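
For orientation, here is a minimal sketch of the classic, non-adaptive differential evolution scheme (often written DE/rand/1/bin) that work of this kind starts from. It is not the adaptive variant the paper proposes: the abstract does not specify the new vector generation or the adaptation rule. The scale factor `f` and crossover rate `cr` are the fixed control parameters whose sensitivity the abstract mentions; all names and default values are illustrative assumptions.

```python
import random


def differential_evolution(objective, bounds, pop_size=20, f=0.5, cr=0.9,
                           generations=200, seed=0):
    """Minimize `objective` over box `bounds` with classic DE/rand/1/bin."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    cost = [objective(x) for x in pop]

    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals, all different from i.
            a, b, c = rng.sample([k for k in range(pop_size) if k != i], 3)
            forced = rng.randrange(dim)  # at least one mutated coordinate
            trial = []
            for d in range(dim):
                if d == forced or rng.random() < cr:
                    v = pop[a][d] + f * (pop[b][d] - pop[c][d])
                    lo, hi = bounds[d]
                    v = min(max(v, lo), hi)  # clamp to the search box
                else:
                    v = pop[i][d]
                trial.append(v)
            # Greedy selection: keep the trial only if it is no worse.
            trial_cost = objective(trial)
            if trial_cost <= cost[i]:
                pop[i], cost[i] = trial, trial_cost

    best = min(range(pop_size), key=cost.__getitem__)
    return pop[best], cost[best]
```

An adaptive variant would replace the fixed `f` and `cr` with values updated during the run; how the paper does that is not stated in the abstract.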

An Improved Clustering Method with Cluster Density Independence

  • 유병현;김완우;허경용
    • 한국정보통신학회:학술대회논문집 / 2015 Fall Conference of 한국정보통신학회 / pp.248-249 / 2015
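  • Clustering is a representative unsupervised learning method used to group data with homogeneous characteristics into clusters. However, because clustering is fundamentally based on the distance from the cluster center to the data, cluster centers tend to be drawn toward denser clusters. This paper proposes a clustering method that obtains accurate results on data with large density differences between clusters by adding to the Fuzzy C-Means objective function a term that keeps the cluster centers as far apart as possible. Experimental results show that the proposed method converges to the actual cluster centers more often than FCM and also converges faster than FCM.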
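
As a point of reference, the baseline that the paper modifies can be sketched as follows. This is plain Fuzzy C-Means with the usual alternating membership and center updates. The center-separation term described in the abstract is not included, because its form is not given there. The function name, the explicit `init_centers` argument, and the default fuzzifier `m=2.0` are assumptions made for illustration.

```python
import math


def fuzzy_c_means(points, init_centers, m=2.0, iters=100, tol=1e-6):
    """Plain FCM: alternate membership and center updates until stable."""
    centers = [list(c) for c in init_centers]
    c = len(centers)
    dim = len(points[0])
    u = []
    for _ in range(iters):
        # Membership update: u[j][i] = 1 / sum_k (d_ji / d_jk)^(2/(m-1))
        u = []
        for p in points:
            d = [max(math.dist(p, ctr), 1e-12) for ctr in centers]
            u.append([
                1.0 / sum((d[i] / d[k]) ** (2.0 / (m - 1.0)) for k in range(c))
                for i in range(c)
            ])
        # Center update: mean of the points weighted by u^m
        new_centers = []
        for i in range(c):
            w = [row[i] ** m for row in u]
            total = sum(w)
            new_centers.append([
                sum(w[j] * points[j][t] for j in range(len(points))) / total
                for t in range(dim)
            ])
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, u
```

Because every point contributes to every center through its membership weight, a dense cluster can pull a neighboring center toward itself, which is the effect the abstract sets out to counter.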

Design of Granular-based Neurocomputing Networks for Modeling of Linear-Type Superconducting Power Supply

  • 박호성;정윤도;김현기;오성권
    • 전기학회논문지 / Vol. 59, No. 7 / pp.1320-1326 / 2010
  • In this paper, we develop a design methodology for granular-based neurocomputing networks realized with the aid of clustering techniques. The objective is to model the Linear-Type Superconducting Power Supply (LTSPS) and to evaluate the approximation and generalization capability of the model. In contrast with the plethora of existing approaches, we promote a development strategy in which the topology of the network is predominantly based on a collection of information granules formed from the available experimental data. The underlying design tool guiding the development of the granular-based neurocomputing networks revolves around the Fuzzy C-Means (FCM) clustering method and the Radial Basis Function (RBF) neural network. In contrast to "standard" RBF neural networks, the output neuron of the network exhibits a certain functional nature, as its connections are realized as local linear functions whose location is determined by the membership values of the input space obtained with FCM clustering. To model the LTSPS and evaluate the performance of the proposed network, we describe the characteristics of the proposed model in detail using well-known NASA software project data.

A Hybrid Recommendation System based on Fuzzy C-Means Clustering and Supervised Learning

  • Duan, Li;Wang, Weiping;Han, Baijing
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 15, No. 7 / pp.2399-2413 / 2021
  • A recommendation system is an information filtering tool that uses the ratings and reviews of users to generate personalized recommendations. However, the cold-start problem for users and items remains a major research topic in service recommendation. To address this challenge, this paper proposes a highly efficient hybrid recommendation system based on Fuzzy C-Means (FCM) clustering and supervised learning models. The proposed method has two aspects: first, the FCM clustering technique is applied to the item-based collaborative filtering framework to solve the cold-start problem; second, content information is integrated into the collaborative filtering. The algorithm constructs user and item membership degree feature vectors and adapts the data representation of the scoring matrix to the supervised learning algorithm. By linearly combining the subjective membership degree feature vector and the objective membership degree feature vector, prediction accuracy is significantly improved on public datasets with different sparsity. The efficiency of the proposed system is illustrated by several experiments on the MovieLens dataset.

Review on Energy Efficient Clustering based Routing Protocol

  • Kanu Patel;Hardik Modi
    • International Journal of Computer Science & Network Security / Vol. 23, No. 10 / pp.169-178 / 2023
  • Wireless sensor networks are widely used for IoT applications. A sensor node is considered a physical device in the IoT architecture. All sensor nodes are battery operated; power consumption is very high during data communication and low while sensing the environment. Without proper planning of data communication, the network may die very early, so the primary objective of a cluster-based routing protocol is to extend battery life and keep the application running longer. In this paper we present a comprehensive review of twenty research papers on clustering-based routing protocols. We compare them on basic information, network simulation parameters, and performance parameters. In particular, the comparison of all 20 protocols covers clustering manner, node deployment, scalability, data aggregation, power consumption, implementation cost, and many more points. Along with the basic information, we also consider network simulation parameters such as number of nodes, simulation time, simulator name, initial energy, and communication range, as well as performance parameters such as energy consumption, throughput, network lifetime, packet delivery ratio, jitter, and fault tolerance. Finally, we summarize the technical aspects and a few common parameters that must be fulfilled or considered when designing an energy-efficient cluster-based routing protocol.

Cluster Merging Using Enhanced Density based Fuzzy C-Means Clustering Algorithm

  • 한진우;전성해;오경환
    • 한국지능시스템학회논문지 / Vol. 14, No. 5 / pp.517-524 / 2004
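  • Since fuzzy theory was introduced in the 1960s, it has been widely used for clustering tasks in machine learning, including data mining. The fuzzy C-means algorithm is the most widely used fuzzy clustering algorithm. It allows a single data object to be assigned to each cluster with a different degree of membership. Like general clustering algorithms such as K-means, the fuzzy C-means algorithm produces final clustering results whose quality varies with the initial number of clusters and the positions of the cluster centers. These initial settings are subjective and may therefore lead to inappropriate results. To solve this problem, this paper proposes an enhanced density-based fuzzy C-means algorithm that determines the initial number of clusters and the cluster centers from the attributes of the given training data. The proposed method uses a grid to determine the positions of the initial cluster centers and the number of clusters. The performance of the proposed algorithm was compared using objective machine learning data sets that have been widely used in previous work.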

An Efficient Optimization Technique for Node Clustering in VANETs Using Gray Wolf Optimization

  • Khan, Muhammad Fahad;Aadil, Farhan;Maqsood, Muazzam;Khan, Salabat;Bukhari, Bilal Haider
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 12, No. 9 / pp.4228-4247 / 2018
  • Many methods have been developed for vehicles to form clusters in vehicular ad hoc networks (VANETs). Nodes in VANETs are usually vehicles, and they are dynamic in nature. Vehicles are grouped into clusters to enable communication between network nodes. A cluster head (CH) is selected in each cluster to manage the whole cluster. The CH maintains communication within its own cluster and with other clusters. A longer cluster lifetime improves the performance of the network. Meanwhile, fewer CHs in the network also lead to more efficient communication in VANETs. In this paper, a novel clustering algorithm for VANETs is proposed. It is based on the social behavior modeled in Gray Wolf Optimization (GWO) and is named Intelligent Clustering using Gray Wolf Optimization (ICGWO). This clustering-based algorithm provides an optimized solution for smooth and robust communication in VANETs. The key parameters of the proposed algorithm are grid size, load balance factor (LBF), node speed, direction, and transmission range. ICGWO is compared with two well-known meta-heuristics for clustering in VANETs, Multi-Objective Particle Swarm Optimization (MOPSO) and Comprehensive Learning Particle Swarm Optimization (CLPSO). Experiments are performed by varying the key parameters of ICGWO to measure the effectiveness of the proposed algorithm. These parameters include grid size, transmission range, and the number of nodes. Effectiveness is evaluated in terms of how well the number of clusters is optimized with respect to transmission range, grid size, and number of nodes. ICGWO selects 10% of the nodes as CHs, whereas CLPSO and MOPSO select 13% and 14%, respectively.

Curriculum Mining Analysis Using Clustering-Based Process Mining

  • 주우민;최진영
    • 산업경영시스템학회지 / Vol. 38, No. 4 / pp.45-55 / 2015
  • In this paper, we consider curriculum mining as an application of process mining in the domain of education. The basic objective of curriculum mining is to construct a registration pattern model from logs of registration data. However, the subject registration patterns of students are very unstructured and complicated, forming a so-called spaghetti model, because they contain many different cases and highly diverse behaviors. It is therefore typically difficult to develop and analyze registration patterns. In the literature, there was an effort to handle this issue by clustering on the features of students and their behaviors. However, such features are generally not easy to obtain because they are private and qualitative. Therefore, in this paper, we propose a new framework for curriculum mining that applies K-means clustering based on subject attributes to solve the problems caused by the unstructured process model. Specifically, we divide the subject attribute data into two parts: categorical and numerical. The categorical attributes are subject name, class classification, and research field, while the numerical attributes are ABEEK goal and semester information. For the categorical attributes, we suggest a method to quantify them by binarization. To choose the number of clusters for K-means clustering, we applied the elbow method using the R-squared value, which represents the proportion of variance explained by a given number of clusters. The performance of the suggested method was verified on a log of student registration data from 'A university' in terms of simplicity and fitness, which are the typical performance measures for a process model obtained in process mining.
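
The two technical steps named in this abstract, binarizing categorical attributes and choosing the number of clusters with an R-squared elbow, can be sketched roughly as below. The details are assumptions, not taken from the paper: `one_hot` is ordinary indicator encoding, the K-means routine uses a deterministic farthest-point seeding chosen for reproducibility, and the `min_gain` threshold that decides where the elbow lies is a hypothetical parameter.

```python
import math


def one_hot(values):
    """Binarize a categorical column into 0/1 indicator vectors."""
    categories = sorted(set(values))
    return [[1.0 if v == cat else 0.0 for cat in categories] for v in values]


def kmeans(points, k, iters=50):
    """Lloyd iterations with deterministic farthest-point seeding."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda i: math.dist(p, centers[i]))
                  for p in points]
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:  # an empty cluster keeps its previous center
                centers[i] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


def r_squared(points, labels, centers):
    """Share of total variance explained by the partition: 1 - WSS / TSS."""
    mean = [sum(col) / len(points) for col in zip(*points)]
    tss = sum(math.dist(p, mean) ** 2 for p in points)
    wss = sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))
    return 1.0 - wss / tss


def elbow_k(points, k_max, min_gain=0.05):
    """Smallest k after which one more cluster adds < min_gain to R-squared."""
    scores = {k: r_squared(points, *kmeans(points, k))
              for k in range(1, k_max + 1)}
    for k in range(1, k_max):
        if scores[k + 1] - scores[k] < min_gain:
            return k, scores
    return k_max, scores
```

In practice the elbow is often read off a plot of R-squared against k; the fixed threshold here is only a simple stand-in for that visual judgment.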