• Title/Summary/Keyword: Clustering Evaluation


Corrosion image analysis on galvanized steel by using superpixel DBSCAN clustering algorithm (슈퍼픽셀 DBSCAN 군집 알고리즘을 이용한 용융아연도금 강판의 부식이미지 분석)

  • Kim, Beomsoo;Kim, Yeonwon;Lee, Kyunghwang;Yang, Jeonghyeon
    • Journal of the Korean institute of surface engineering
    • /
    • v.55 no.3
    • /
    • pp.164-172
    • /
    • 2022
  • Hot-dip galvanized (GI) steel is widely used throughout industry as a corrosion-resistant material. Corrosion of steel is a common phenomenon that causes gradual degradation under various environmental conditions, and corrosion monitoring tracks the progress of that degradation over a long period. Corrosion on a steel plate appears as discoloration and surface irregularities. This study developed a method for quantitatively evaluating the rust formed on GI steel plate, applying superpixel-based DBSCAN clustering and k-means clustering to the corroded area in a given image. The superpixel-based DBSCAN clustering method decreases computational cost and achieves automatic segmentation. The color of the rusty surface was analyzed quantitatively in the HSV (Hue, Saturation, Value) color space. In addition, the two segmentation methods are compared for a particular spatial region using their histograms.
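The HSV-based color analysis mentioned in this abstract can be illustrated with a small sketch. It is not the authors' pipeline: the function name `hue_histogram`, the bin count, the saturation cutoff, and the sample pixels are all assumptions chosen for illustration, and the superpixel and DBSCAN steps are left out.

```python
import colorsys
from collections import Counter


def hue_histogram(rgb_pixels, bins=12, min_saturation=0.1):
    """Count pixels per hue bin; rgb_pixels holds (r, g, b) tuples in 0..255.

    Pixels below min_saturation are skipped because the hue of a near-grey
    pixel carries little color information (an illustrative choice).
    """
    counts = Counter()
    for r, g, b in rgb_pixels:
        h, s, _v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        if s < min_saturation:
            continue
        counts[min(int(h * bins), bins - 1)] += 1
    return [counts[i] for i in range(bins)]


# Hypothetical sample: two rust-like orange-brown pixels and one grey pixel.
pixels = [(183, 65, 14), (160, 82, 45), (128, 128, 128)]
hist = hue_histogram(pixels)
```

Both rust-like pixels fall into the first (red-orange) hue bin, while the grey pixel is excluded by the saturation cutoff.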

Hierarchical Overlapping Clustering to Detect Complex Concepts (중복을 허용한 계층적 클러스터링에 의한 복합 개념 탐지 방법)

  • Hong, Su-Jeong;Choi, Joong-Min
    • Journal of Intelligence and Information Systems
    • /
    • v.17 no.1
    • /
    • pp.111-125
    • /
    • 2011
  • Clustering is a process of grouping similar or relevant documents into a cluster and assigning a meaningful concept to the cluster. In this way, clustering facilitates fast and correct search for relevant documents by narrowing the search range to the collection of documents belonging to related clusters. Effective clustering requires techniques for identifying similar documents and grouping them into a cluster, and for discovering the concept that is most relevant to the cluster. One problem that often appears in this context is the detection of a complex concept that overlaps with several simple concepts at the same hierarchical level. Previous clustering methods were unable to identify and represent a complex concept that belongs to several different clusters at the same level in the concept hierarchy, and could not validate the semantic hierarchical relationship between a complex concept and each of the simple concepts. To solve these problems, this paper proposes a new clustering method that identifies and represents complex concepts efficiently. We developed the Hierarchical Overlapping Clustering (HOC) algorithm, which modifies the traditional Agglomerative Hierarchical Clustering algorithm to allow overlapping clusters at the same level in the concept hierarchy. The HOC algorithm represents the clustering result not as a tree but as a lattice in order to detect complex concepts. We developed a system that employs the HOC algorithm for complex concept detection. This system operates in three phases: 1) the preprocessing of documents, 2) the clustering using the HOC algorithm, and 3) the validation of semantic hierarchical relationships among the concepts in the lattice obtained as a result of clustering. The preprocessing phase represents the documents as x-y coordinate values in a 2-dimensional space by considering the weights of the terms appearing in the documents. 
First, it applies stopword removal and stemming to extract index terms. Then, each index term is assigned a TF-IDF weight, and the x-y coordinate value for each document is determined by combining the TF-IDF values of the terms in it. The clustering phase uses the HOC algorithm, in which the similarity between documents is calculated using the Euclidean distance. Initially, a cluster is generated for each document by grouping those documents that are closest to it. Then, the distance between every pair of clusters is measured, and the closest clusters are grouped into a new cluster. This process is repeated until the root cluster is generated. In the validation phase, feature selection is applied to check whether the cluster concepts built by the HOC algorithm have meaningful hierarchical relationships. Feature selection is a method of extracting key features from a document by identifying important and representative terms in the document and assigning weights to them. To select key features correctly, a method is needed to determine how much each term contributes to the class of the document. Among several methods that achieve this goal, this paper adopts the $\chi^2$ statistic, which measures the degree of dependency of a term t on a class c and represents the relationship between t and c as a numerical value. To demonstrate the effectiveness of the HOC algorithm, a series of performance evaluations was carried out using the well-known Reuters-21578 news collection. The results showed that the HOC algorithm contributes greatly to detecting and producing complex concepts by generating the concept hierarchy as a lattice structure.
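The term-class dependency measure named in the validation phase can be sketched as follows. This uses the common 2x2 contingency-table form of the chi-square statistic for feature selection; whether the paper uses exactly this form is an assumption, and the function name, argument names, and document counts below are hypothetical.

```python
def chi_square(a, b, c, d):
    """Chi-square dependency between a term t and a class c.

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class that do not contain the term
    d: documents outside the class that do not contain the term
    """
    n = a + b + c + d
    denominator = (a + c) * (b + d) * (a + b) * (c + d)
    if denominator == 0:
        return 0.0  # degenerate table: the term or class never varies
    return n * (a * d - c * b) ** 2 / denominator


# Hypothetical counts: a term concentrated in one class vs. an evenly spread term.
dependent = chi_square(40, 10, 10, 40)
independent = chi_square(25, 25, 25, 25)
```

A term spread evenly across classes scores zero, while a term concentrated in one class scores higher, which is why the value can serve as a weight for judging how well a term represents a cluster concept.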

CLUSTERING DNA MICROARRAY DATA BY STOCHASTIC ALGORITHM

  • Shon, Ho-Sun;Kim, Sun-Shin;Wang, Ling;Ryu, Keun-Ho
    • Proceedings of the KSRS Conference
    • /
    • 2007.10a
    • /
    • pp.438-441
    • /
    • 2007
  • Recent advances in molecular biology and engineering technology have made it possible, through DNA microarrays, to observe thousands of genes and their state of variation in tissue samples from a living body. With DNA microarrays, one can construct groups of genes that have similar expression patterns and follow the progress and variation of genes. This paper performs cluster analysis, which aims to discover biological subgroups or classes from gene expression information. The purpose of this paper is thus to predict new, previously unknown classes; open leukaemia data are used for the experiment, and the MCL (Markov CLustering) algorithm is applied as the analysis method. The MCL algorithm is based on probability and graph flow theory. MCL simulates random walks on a graph, using Markov matrices to determine the transition probabilities among the nodes of the graph. In detail, the method first computes distances using the Euclidean distance and applies the MCL algorithm; next, the inflation and diagonal factors, which are the tuning parameters, are tuned; finally, a threshold based on the average of each column is obtained to distinguish one class from another. Our method improves accuracy by using this threshold, namely the average of each column. Our experimental results show an average accuracy of about 70% relative to the previously known classes. For comparison with other algorithms, the proposed method was also compared with and analyzed against the SOM (Self-Organizing Map) clustering algorithm, a neural network method, and hierarchical clustering. The method shows better results than hierarchical clustering. Further study should examine whether a similar result is obtained when the inflation parameter found in our experiment is applied to other gene expression data. We are also working on a systematic method for improving accuracy by regulating the factors mentioned above.
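The random-walk idea behind MCL can be sketched in a few lines. This is a minimal textbook-style version (self-loops, expansion by matrix squaring, inflation by element-wise powers), not the authors' variant: it omits their Euclidean-distance preprocessing, diagonal factor tuning, and column-average threshold, and the function names, default parameters, and toy graph are assumptions.

```python
def normalize_columns(m):
    """Scale every column to sum to 1 so the matrix is column-stochastic."""
    n = len(m)
    for j in range(n):
        total = sum(m[i][j] for i in range(n))
        for i in range(n):
            m[i][j] = m[i][j] / total if total else 0.0
    return m


def mcl(adjacency, inflation=2.0, iterations=50, eps=1e-6):
    """Cluster a graph given as a square adjacency matrix (list of lists)."""
    n = len(adjacency)
    # Add self-loops, then turn edge weights into transition probabilities.
    m = normalize_columns(
        [[float(adjacency[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    )
    for _ in range(iterations):
        # Expansion: take two random-walk steps (matrix squaring).
        m = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
        # Inflation: strengthen strong transitions, weaken weak ones.
        m = normalize_columns([[x ** inflation for x in row] for row in m])
    # Each row that retains mass lists the members of one cluster.
    clusters = {frozenset(j for j in range(n) if m[i][j] > eps) for i in range(n)}
    return sorted(sorted(c) for c in clusters if c)


# Hypothetical toy graph: two disconnected triangles.
graph = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
clusters = mcl(graph)
```

Raising the `inflation` value generally produces finer clusters, which is why the abstract treats it as a tuning parameter.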


Microarray data analysis using relative hierarchical clustering (상대적 계층적 군집 방법을 이용한 마이크로어레이 자료의 군집분석)

  • Woo, Sook Young;Lee, Jae Won;Jhun, Myoungshic
    • Journal of the Korean Data and Information Science Society
    • /
    • v.25 no.5
    • /
    • pp.999-1009
    • /
    • 2014
  • Hierarchical clustering analysis makes it easy to explore massive microarray data and to understand biological phenomena through a dendrogram. However, because hierarchical clustering algorithms consider only absolute similarity, they have difficulty capturing relative dissimilarity, which considers not only the distance between a pair of clusters but also how distant they are from the rest of the clusters. In this study, we introduce the relative hierarchical clustering method proposed by Mollineda and Vidal (2000) and compare the hierarchical clustering method with the relative hierarchical method using simulated data and real data in various situations. The quality of the two hierarchical methods was evaluated using the percentage of incorrectly grouped points (PIGP), homogeneity, and separation.

The Effectiveness of High-level Text Features in SOM-based Web Image Clustering (SOM 기반 웹 이미지 분류에서 고수준 텍스트 특징들의 효과)

  • Cho Soo-Sun
    • The KIPS Transactions:PartB
    • /
    • v.13B no.2 s.105
    • /
    • pp.121-126
    • /
    • 2006
  • In this paper, we propose an approach that increases the power of Web image clustering by using high-level semantic features from the text information relevant to Web images as well as low-level visual features of the images themselves. These high-level text features can be obtained from image URLs and file names, page titles, hyperlinks, and surrounding text. The self-organizing map (SOM) proposed by Kohonen is used as the clustering engine. In SOM-based clustering using high-level text features and low-level visual features, 200 images from 10 categories are effectively divided into suitable clusters. To evaluate clustering power, we propose simple but novel measures that indicate the degree to which images from the same category are scattered and the degree to which images from the same category accumulate. From the experimental results, we find that the high-level text features are more useful in SOM-based Web image clustering.

Clustering-based Cooperative Routing using OFDM for Supporting Transmission Efficiency in Mobile Wireless Sensor Networks (모바일 무선 센서네트워크에서 전송 효율 향상을 지원하기 위한 OFDM을 사용한 클러스터링 기반의 협력도움 라우팅)

  • Lee, Joo-Sang;An, Beong-Ku
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.10 no.6
    • /
    • pp.85-92
    • /
    • 2010
  • In this paper, we propose Clustering-based Cooperative Routing using OFDM (CCRO) to support transmission efficiency in mobile wireless sensor networks. The main features and contributions of the proposed method are as follows. First, a clustering method that uses the location information of nodes serves as the underlying infrastructure for supporting stable transmission services efficiently. Second, a cluster-based cooperative data transmission method is used to improve data transmission and reliability. Third, an OFDM-based data transmission method is used to improve the data transmission ratio and channel efficiency. Fourth, we take a realistic approach from the viewpoint of mobile ad hoc wireless sensor networks, whereas conventional methods consider only fixed sensor network environments. The performance of the proposed method is evaluated through simulation using OPNET and through theoretical analysis. The results show improved transmission efficiency.

Non-hierarchical Clustering based Hybrid Recommendation using Context Knowledge (상황 지식을 이용한 비계층적 군집 기반 하이브리드 추천)

  • Baek, Ji-Won;Kim, Min-Jeong;Park, Roy C.;Jung, Hoill;Chung, Kyungyong
    • Journal of the Institute of Convergence Signal Processing
    • /
    • v.20 no.3
    • /
    • pp.138-144
    • /
    • 2019
  • In modern society, people give serious thought to their travel destinations, depending on time and economic constraints. In this paper, we propose a non-hierarchical clustering based hybrid recommendation using context knowledge. The proposed method is a personalized way of recommending knowledge about preferred travel places according to the user's location, place, and weather. Based on 14 attributes from data collected through a survey, users with similar characteristics are grouped using the non-hierarchical clustering based hybrid recommendation. Weighting implicit and explicit data makes the recommendation more accurate. Users can be recommended a preferred travel destination without spending unnecessary time. Performance is evaluated using accuracy, recall, and F-measure. The evaluation showed 0.636 accuracy, 0.723 recall, and 0.676 F-measure.
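The reported figures can be related to one another with the usual harmonic-mean definition of the F-measure. The abstract does not define its metrics, so treating the reported "accuracy" as precision is an assumption made only for this check; the function name is hypothetical.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (the balanced F-measure)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Assumption: the abstract's 0.636 "accuracy" is used here as precision.
value = f_measure(0.636, 0.723)
```

Under that assumption the result is about 0.677, close to the reported F-measure of 0.676.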

Performance Evaluation of Clustering Algorithms for Fixed-Grid Spatial Index (고정 그리드 공간 색인을 위한 클러스터링 알고리즘의 성능 평가)

  • 유진영;김진덕;김동현;홍봉희;김장수
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 1998.10b
    • /
    • pp.32-134
    • /
    • 1998
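  • A grid file, one type of spatial index, partitions the spatial data region into grid-shaped cells; a grid in which all cells have the same fixed size is called a fixed grid. Because the cell size is fixed, objects frequently lie on cell boundary lines, and such objects are referenced redundantly by more than one cell. Redundantly referenced objects increase I/O time and are a major cause of performance degradation during query processing. A clustering algorithm that can handle these duplicate objects efficiently is therefore needed. This paper implements an object clustering algorithm for handling redundantly referenced objects and a cell clustering algorithm for clustering on a per-cell basis. It then evaluates the performance of each clustering technique when executing spatial queries.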
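The redundant-reference problem described here can be made concrete with a small sketch of a fixed-grid index. It only shows how an object that crosses a cell boundary ends up referenced by several cells; it does not implement the paper's object or cell clustering, and the function names, cell size, and sample bounding boxes are assumptions.

```python
def cells_overlapping(bbox, cell_size):
    """Return the (column, row) grid cells touched by a bounding box.

    bbox is (xmin, ymin, xmax, ymax) in the same units as cell_size.
    """
    xmin, ymin, xmax, ymax = bbox
    cols = range(int(xmin // cell_size), int(xmax // cell_size) + 1)
    rows = range(int(ymin // cell_size), int(ymax // cell_size) + 1)
    return [(c, r) for r in rows for c in cols]


def build_index(objects, cell_size):
    """Map each cell to the ids of the objects whose bounding box touches it."""
    index = {}
    for obj_id, bbox in objects.items():
        for cell in cells_overlapping(bbox, cell_size):
            index.setdefault(cell, []).append(obj_id)
    return index


# Hypothetical objects: "a" fits in one cell, "b" crosses the line x = 10.
objects = {"a": (1, 1, 3, 3), "b": (8, 2, 12, 4)}
index = build_index(objects, cell_size=10)
duplicates = sorted(
    o for o in objects if sum(o in ids for ids in index.values()) > 1
)
```

Object "b" appears in two cells, so a query covering both cells would encounter it twice unless the duplicate is handled.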

Parallel Algorithm For Level Clustering (집단화를 위한 병렬 알고리즘의 구현)

  • Bae, Yong-Geun
    • The Transactions of the Korea Information Processing Society
    • /
    • v.2 no.2
    • /
    • pp.148-155
    • /
    • 1995
  • When analyzing a large number of patterns, the patterns need to be clustered into several groups according to a certain evaluation function. When there are many input patterns, this process requires a considerable amount of computation, so a parallel algorithm is required. To solve this problem, this paper proposes a parallel clustering algorithm that parallelizes the k-means algorithm and implements it on a message-passing MIMD parallel computer. Experiments and performance analysis show that this parallel algorithm is appropriate when there are many input patterns.
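One common way to parallelize k-means is to split the input patterns across workers, let each worker compute partial sums for its share, and combine the partial results into new centroids. The sketch below follows that data-parallel pattern; it is not the paper's implementation, and a thread pool merely stands in for the message-passing MIMD nodes. All names and sample points are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sums(chunk, centroids):
    """Assign each point in the chunk to its nearest centroid.

    Returns per-cluster coordinate sums and per-cluster point counts.
    """
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p in chunk:
        nearest = min(
            range(k),
            key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
        )
        counts[nearest] += 1
        for d in range(dim):
            sums[nearest][d] += p[d]
    return sums, counts


def parallel_kmeans(points, centroids, workers=2, iterations=10):
    """Data-parallel k-means: workers process chunks, results are reduced."""
    chunks = [points[i::workers] for i in range(workers)]
    dim = len(centroids[0])
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(iterations):
            results = list(pool.map(partial_sums, chunks, [centroids] * workers))
            updated = []
            for c in range(len(centroids)):
                count = sum(counts[c] for _sums, counts in results)
                if count == 0:
                    updated.append(centroids[c])  # keep an empty cluster in place
                else:
                    updated.append(
                        [sum(sums[c][d] for sums, _counts in results) / count
                         for d in range(dim)]
                    )
            centroids = updated
    return centroids


# Hypothetical input: two well-separated groups of three points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids = parallel_kmeans(points, [[0.0, 0.0], [10.0, 10.0]])
```

Only the small per-cluster sums and counts need to be exchanged between workers in each iteration, which is what makes the scheme attractive when the number of input patterns is large.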


A Study on Performance Evaluation of Clustering Algorithms using Neural and Statistical Method (신경망 및 통계적 방법에 의한 클러스터링 성능평가)

  • 윤석환;민준영;신용백
    • Journal of Korean Society of Industrial and Systems Engineering
    • /
    • v.19 no.37
    • /
    • pp.41-51
    • /
    • 1996
  • This paper evaluates the clustering performance of a neural network method and a statistical method. The algorithms used are GLVQ (Generalized Learning Vector Quantization) as the neural method and the k-means algorithm as the statistical clustering method. To compare the two methods, we calculate Rand's c statistic. As a result, the mean c value obtained with GLVQ is higher than that obtained with the k-means algorithm, while the standard deviation of the c value is lower. The experimental data sets were Fisher's IRIS data and patterns extracted from handwritten numerals.
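Assuming "Rand's c statistic" refers to the Rand index, the comparison measure can be sketched as the fraction of point pairs on which two labelings agree (both place the pair together, or both place it apart). The function name and the sample labelings are hypothetical.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of point pairs treated the same way by both labelings."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agreements = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agreements / len(pairs)


# Hypothetical labelings of six points: known classes vs. a clustering result.
truth = [0, 0, 0, 1, 1, 1]
predicted = [1, 1, 0, 0, 0, 0]
score = rand_index(truth, predicted)
```

The value lies between 0 and 1 and does not depend on how cluster labels are numbered, so it can compare a clustering result against known classes such as the IRIS species.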
