• 제목/요약/키워드: clustering techniques

검색결과 528건 처리시간 0.033초

Environmental Survey Data Modeling Using K-means Clustering Techniques

  • Park, Hee-Chang;Cho, Kwang-Hyun
    • Journal of the Korean Data and Information Science Society
    • /
    • 제16권3호
    • /
    • pp.557-566
    • /
    • 2005
  • Clustering is the process of grouping the data into clusters so that objects within a cluster have high similarity in comparison to one another. In this paper we used k-means clustering of several clustering techniques. The k-means Clustering Is classified as a partitional clustering method. We analyze 2002 Gyeongnam social indicator survey data using k-means clustering techniques for environmental information. We can use these outputs given by k-means clustering for environmental preservation and environmental improvement.

  • PDF

Environmental Survey Data Modeling using K-means Clustering Techniques

  • 박희창;조광현
    • 한국데이터정보과학회:학술대회논문집
    • /
    • 한국데이터정보과학회 2004년도 추계학술대회
    • /
    • pp.77-86
    • /
    • 2004
  • Clustering is the process of grouping the data into clusters so that objects within a cluster have high similarity in comparison to one another. In this paper we used k-means clustering of several clustering techniques. The k-means Clustering is classified as a partitional clustering method. We analyze 2002 Gyeongnam social indicator survey data using k-means clustering techniques for environmental information. We can use these outputs given by k-means clustering for environmental preservation and environmental improvement.

  • PDF

Performance evaluation of principal component analysis for clustering problems

  • Kim, Jae-Hwan;Yang, Tae-Min;Kim, Jung-Tae
    • Journal of Advanced Marine Engineering and Technology
    • /
    • 제40권8호
    • /
    • pp.726-732
    • /
    • 2016
  • Clustering analysis is widely used in data mining to classify data into categories on the basis of their similarity. Through the decades, many clustering techniques have been developed, including hierarchical and non-hierarchical algorithms. In gene profiling problems, because of the large number of genes and the complexity of biological networks, dimensionality reduction techniques are critical exploratory tools for clustering analysis of gene expression data. Recently, clustering analysis of applying dimensionality reduction techniques was also proposed. PCA (principal component analysis) is a popular methd of dimensionality reduction techniques for clustering problems. However, previous studies analyzed the performance of PCA for only full data sets. In this paper, to specifically and robustly evaluate the performance of PCA for clustering analysis, we exploit an improved FCBF (fast correlation-based filter) of feature selection methods for supervised clustering data sets, and employ two well-known clustering algorithms: k-means and k-medoids. Computational results from supervised data sets show that the performance of PCA is very poor for large-scale features.

K-means Clustering for Environmental Indicator Survey Data

  • Park, Hee-Chang;Cho, Kwang-Hyun
    • 한국데이터정보과학회:학술대회논문집
    • /
    • 한국데이터정보과학회 2005년도 춘계학술대회
    • /
    • pp.185-192
    • /
    • 2005
  • There are many data mining techniques such as association rule, decision tree, neural network analysis, clustering, genetic algorithm, bayesian network, memory-based reasoning, etc. We analyze 2003 Gyeongnam social indicator survey data using k-means clustering technique for environmental information. Clustering is the process of grouping the data into clusters so that objects within a cluster have high similarity in comparison to one another. In this paper, we used k-means clustering of several clustering techniques. The k-means clustering is classified as a partitional clustering method. We can apply k-means clustering outputs to environmental preservation and environmental improvement.

  • PDF

Clustering Approaches to Identifying Gene Expression Patterns from DNA Microarray Data

  • Do, Jin Hwan;Choi, Dong-Kug
    • Molecules and Cells
    • /
    • 제25권2호
    • /
    • pp.279-288
    • /
    • 2008
  • The analysis of microarray data is essential for large amounts of gene expression data. In this review we focus on clustering techniques. The biological rationale for this approach is the fact that many co-expressed genes are co-regulated, and identifying co-expressed genes could aid in functional annotation of novel genes, de novo identification of transcription factor binding sites and elucidation of complex biological pathways. Co-expressed genes are usually identified in microarray experiments by clustering techniques. There are many such methods, and the results obtained even for the same datasets may vary considerably depending on the algorithms and metrics for dissimilarity measures used, as well as on user-selectable parameters such as desired number of clusters and initial values. Therefore, biologists who want to interpret microarray data should be aware of the weakness and strengths of the clustering methods used. In this review, we survey the basic principles of clustering of DNA microarray data from crisp clustering algorithms such as hierarchical clustering, K-means and self-organizing maps, to complex clustering algorithms like fuzzy clustering.

Intelligent Clustering in Vehicular ad hoc Networks

  • Aadil, Farhan;Khan, Salabat;Bajwa, Khalid Bashir;Khan, Muhammad Fahad;Ali, Asad
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • 제10권8호
    • /
    • pp.3512-3528
    • /
    • 2016
  • A network with high mobility nodes or vehicles is vehicular ad hoc Network (VANET). For improvement in communication efficiency of VANET, many techniques have been proposed; one of these techniques is vehicular node clustering. Cluster nodes (CNs) and Cluster Heads (CHs) are elected or selected in the process of clustering. The longer the lifetime of clusters and the lesser the number of CHs attributes to efficient networking in VANETs. In this paper, a novel Clustering algorithm is proposed based on Ant Colony Optimization (ACO) for VANET named ACONET. This algorithm forms optimized clusters to offer robust communication for VANETs. For optimized clustering, parameters of transmission range, direction, speed of the nodes and load balance factor (LBF) are considered. The ACONET is compared empirically with state of the art methods, including Multi-Objective Particle Swarm Optimization (MOPSO) and Comprehensive Learning Particle Swarm Optimization (CLPSO) based clustering techniques. An extensive set of experiments is performed by varying the grid size of the network, the transmission range of nodes, and total number of nodes in network to evaluate the effectiveness of the algorithms in comparison. The results indicate that the ACONET has significantly outperformed the competitors.

Comprehensive review on Clustering Techniques and its application on High Dimensional Data

  • Alam, Afroj;Muqeem, Mohd;Ahmad, Sultan
    • International Journal of Computer Science & Network Security
    • /
    • 제21권6호
    • /
    • pp.237-244
    • /
    • 2021
  • Clustering is a most powerful un-supervised machine learning techniques for division of instances into homogenous group, which is called cluster. This Clustering is mainly used for generating a good quality of cluster through which we can discover hidden patterns and knowledge from the large datasets. It has huge application in different field like in medicine field, healthcare, gene-expression, image processing, agriculture, fraud detection, profitability analysis etc. The goal of this paper is to explore both hierarchical as well as partitioning clustering and understanding their problem with various approaches for their solution. Among different clustering K-means is better than other clustering due to its linear time complexity. Further this paper also focused on data mining that dealing with high-dimensional datasets with their problems and their existing approaches for their relevancy

Comparison of the Performance of Clustering Analysis using Data Reduction Techniques to Identify Energy Use Patterns

  • Song, Kwonsik;Park, Moonseo;Lee, Hyun-Soo;Ahn, Joseph
    • 국제학술발표논문집
    • /
    • The 6th International Conference on Construction Engineering and Project Management
    • /
    • pp.559-563
    • /
    • 2015
  • Identification of energy use patterns in buildings has a great opportunity for energy saving. To find what energy use patterns exist, clustering analysis has been commonly used such as K-means and hierarchical clustering method. In case of high dimensional data such as energy use time-series, data reduction should be considered to avoid the curse of dimensionality. Principle Component Analysis, Autocorrelation Function, Discrete Fourier Transform and Discrete Wavelet Transform have been widely used to map the original data into the lower dimensional spaces. However, there still remains an ongoing issue since the performance of clustering analysis is dependent on data type, purpose and application. Therefore, we need to understand which data reduction techniques are suitable for energy use management. This research aims find the best clustering method using energy use data obtained from Seoul National University campus. The results of this research show that most experiments with data reduction techniques have a better performance. Also, the results obtained helps facility managers optimally control energy systems such as HVAC to reduce energy use in buildings.

  • PDF

자취 군집화를 통한 프로세스 마이닝의 성능 개선 (Improving Process Mining with Trace Clustering)

  • 송민석;;;정재윤
    • 대한산업공학회지
    • /
    • 제34권4호
    • /
    • pp.460-469
    • /
    • 2008
  • Process mining aims at mining valuable information from process execution results (called "event logs"). Even though process mining techniques have proven to be a valuable tool, the mining results from real process logs are usually too complex to interpret. The main cause that leads to complex models is the diversity of process logs. To address this issue, this paper proposes a trace clustering approach that splits a process log into homogeneous subsets and applies existing process mining techniques to each subset. Based on log profiles from a process log, the approach uses existing clustering techniques to derive clusters. Our approach are implemented in ProM framework. To illustrate this, a real-life case study is also presented.

한글 위키피디아를 이용한 트위터 문서의 주제별 클러스터링 기법 (Topical Clustering Techniques of Twitter Documents Using Korean Wikipedia)

  • 장재영
    • 한국인터넷방송통신학회논문지
    • /
    • 제14권5호
    • /
    • pp.189-196
    • /
    • 2014
  • 최근 들어 트위터와 같은 SNS 환경에서 검색의 필요성이 증가하고 있다. 트위터 검색을 지원하기 위해서는 다량으로 검색된 문서를 주제별로 분류하는 클러스터링 기법이 필요하다. 하지만 트위터의 특성상 단순한 클러스터링 기술을 그대로 적용하기에는 많은 제약이 따른다. 본 논문에서는 이를 극복하기 위해 트위터 환경에 적합한 클러스터링 기법을 제안한다. 제안된 기법에서는 한글 위키피디아를 이용하여 각 트위터 문서에 대한 특징 벡터를 보강하고 각 특징들의 가중치를 재계산하는 방법을 이용하였다. 또한 한글 트위터 문서를 대상으로 실험을 실시하고 기존 기법과의 성능 비교를 통해서 제안된 기법의 유용성을 증명하였다.