• Title/Summary/Keyword: K-means algorithm

Determination of coagulant input rate in water purification plant using K-means algorithm and GBR algorithm (K-means 알고리즘과 GBR 알고리즘을 이용한 정수장 응집제 투입률 결정 기법)

  • Kim, Jinyoung;Kang, Bokseon;Jung, Hoekyung
    • Journal of the Korea Institute of Information and Communication Engineering / v.25 no.6 / pp.792-798 / 2021
  • This paper derives an algorithm for determining the coagulant input rate in the chemical-injection tank of a water purification plant, using big data analysis and AI-based prediction. Big data technologies, AI algorithm application methods, and existing academic and technical material were reviewed to examine application cases in similar fields. The goal was to develop an algorithm for determining the coagulant input rate and to present the optimal input rate through an autonomous operation simulator and pilot operation of the coagulant input process. In this study, the coagulant injection rate (the output variable) is determined from various input variables; the algorithm learns the relationship pattern between the input and output variables and applies the learned pattern to the decision-making of water plant operators.

A Differentially Private K-Means Clustering using Quadtree and Uniform Sampling (쿼드트리와 균등 샘플링를 이용한 효과적 차분 프라이버시 K-평균 클러스터링 알고리즘)

  • Hong, Daeyoung;Goo, Hanjun;Shim, Kyuseok
    • Proceedings of the Korea Contents Association Conference / 2018.05a / pp.25-26 / 2018
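  • Methods for protecting privacy when publishing data have recently been studied. Among them, differential privacy is an anonymization technique proven to be safe even against attacks such as the minimality attack. In this paper, we improve the performance of an existing differentially private K-means clustering algorithm and verify the improvement through experiments on real-life data.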

Design and Implementation of Distributed In-Memory DBMS-based Parallel K-Means as In-database Analytics Function (분산 인 메모리 DBMS 기반 병렬 K-Means의 In-database 분석 함수로의 설계와 구현)

  • Kou, Heymo;Nam, Changmin;Lee, Woohyun;Lee, Yongjae;Kim, HyoungJoo
    • KIISE Transactions on Computing Practices / v.24 no.3 / pp.105-112 / 2018
  • As data size increases, a single database is no longer enough to serve the current volume of tasks. Since data is partitioned and stored across multiple databases, analysis should also support parallelism to increase efficiency. However, traditional analysis requires data to be transferred out of the database to the nodes where the analytic service runs, and the user must know both the database and the analytic framework. In this paper, we propose an efficient way to perform the K-means clustering algorithm inside a distributed column-based database and a relational database. We also suggest an efficient way to optimize the K-means algorithm within a relational database.

A Performance Comparison of Cluster Validity Indices based on K-means Algorithm (K-means 알고리즘 기반 클러스터링 인덱스 비교 연구)

  • Shim, Yo-Sung;Chung, Ji-Won;Choi, In-Chan
    • Asia Pacific Journal of Information Systems / v.16 no.1 / pp.127-144 / 2006
  • The K-means algorithm is widely used at the initial stage of data analysis in the data mining process, partly because of its low time complexity and the simplicity of its practical implementation. Cluster validity indices are used along with the algorithm to determine the number of clusters and to evaluate the clustering results of datasets. In this paper, we present a performance comparison of sixteen indices, selected from forty indices in the literature on the basis of their applicability to nonhierarchical clustering algorithms. The datasets used in the experiment are generated from multivariate normal distributions. In particular, four error types are considered in the comparison: standardization, outlier generation, error perturbation, and noise dimension addition. The experiment also analyzes how varying the number of points, attributes, and clusters affects performance. The simulation results show that the Calinski and Harabasz index performs best across all datasets and that the Davies and Bouldin index becomes a strong competitor as the number of points in the dataset increases.
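
To make the role of a cluster validity index concrete, the following is a minimal pure-Python sketch of the standard Calinski and Harabasz index (between-cluster dispersion divided by within-cluster dispersion, each scaled by its degrees of freedom). The function names and the toy data are illustrative assumptions and are not taken from the paper.

```python
from math import fsum


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(fsum(column) / n for column in zip(*points))


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return fsum((x - y) ** 2 for x, y in zip(a, b))


def calinski_harabasz(points, labels):
    """Calinski-Harabasz index of a labeling; higher means better separation."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    n, k = len(points), len(clusters)
    if k < 2 or k >= n:
        raise ValueError("need 2 <= number of clusters < number of points")
    overall = centroid(points)
    between = 0.0  # dispersion of cluster centroids around the overall mean
    within = 0.0   # dispersion of points around their own cluster centroid
    for members in clusters.values():
        c = centroid(members)
        between += len(members) * sq_dist(c, overall)
        within += fsum(sq_dist(p, c) for p in members)
    if within == 0.0:
        return float("inf")  # every point coincides with its centroid
    return (between / (k - 1)) / (within / (n - k))
```

In practice the index would be computed for K-means results over a range of candidate cluster counts, and the count with the highest value would be preferred.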

KNN/PFCM Hybrid Algorithm for Indoor Location Determination in WLAN (WLAN 실내 측위 결정을 위한 KNN/PFCM Hybrid 알고리즘)

  • Lee, Jang-Jae;Jung, Min-A;Lee, Seong-Ro
    • Journal of the Institute of Electronics Engineers of Korea SP / v.47 no.6 / pp.146-153 / 2010
  • For indoor location, wireless fingerprinting is the most favorable approach because it is the most accurate among the techniques for wireless-network-based indoor location that do not require any special equipment dedicated to positioning. As a fingerprinting method, k-nearest neighbor (KNN) has been widely applied for indoor location in wireless local area networks (WLAN), but its performance is sensitive to the number of neighbors k and the positions of the reference points (RPs). The possibilistic fuzzy c-means (PFCM) clustering algorithm is therefore applied to improve KNN, giving the KNN/PFCM hybrid algorithm presented in this paper. In the proposed algorithm, k RPs are first chosen through KNN, based on signal-to-noise ratio (SNR), as the data samples for PFCM. Then, the k RPs are classified into different clusters through PFCM based on SNR. Experimental results indicate that the proposed KNN/PFCM hybrid algorithm generally outperforms the KNN and KNN/FCM algorithms when the location error is less than 2 m.
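
The KNN stage described above can be sketched as follows. This is only the plain KNN fingerprinting baseline, under the assumption that each reference point stores a position and a vector of SNR readings; the PFCM clustering stage of the paper is not reproduced, and the name `knn_locate` and the radio-map layout are hypothetical.

```python
def knn_locate(radio_map, observed, k):
    """Estimate a position as the mean of the k reference points whose
    stored SNR vectors are closest to the observed SNR vector.

    radio_map: list of ((x, y), snr_vector) pairs, one per reference point.
    observed:  SNR vector measured at the unknown location.
    """
    if k < 1 or k > len(radio_map):
        raise ValueError("k must be between 1 and the number of reference points")

    def signal_gap(entry):
        # Squared Euclidean distance in signal space, not in physical space.
        _, snr = entry
        return sum((a - b) ** 2 for a, b in zip(snr, observed))

    nearest = sorted(radio_map, key=signal_gap)[:k]
    x = sum(pos[0] for pos, _ in nearest) / k
    y = sum(pos[1] for pos, _ in nearest) / k
    return (x, y)
```

The sensitivity to k mentioned in the abstract is visible here: every selected reference point contributes equally to the estimate, so a poorly chosen k pulls the result toward distant points.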

Comparison of Initial Seeds Methods for K-Means Clustering (K-Means 클러스터링에서 초기 중심 선정 방법 비교)

  • Lee, Shinwon
    • Journal of Internet Computing and Services / v.13 no.6 / pp.1-8 / 2012
  • Clustering methods are divided into hierarchical clustering, partitioning clustering, and others. The K-Means algorithm is a partitioning clustering method and is well suited to clustering many documents rapidly and easily. Its disadvantage is that random initial centers lead to different results. A better choice is therefore to place the initial centers as far away from each other as possible. We propose a new method of selecting initial centers in K-Means clustering that uses triangle height to choose the initial cluster centers. The resulting centers are distributed evenly, and the result is more accurate than with randomly selected initial cluster centers. Selecting the centers is time-consuming, but it reduces the total clustering time by minimizing the number of allocation and recalculation steps. Compared with the standard algorithm, the average time consumed is reduced by 38.4%.
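
The idea of placing initial centers far apart can be illustrated with a generic farthest-first seeding sketch. This is not the triangle-height method proposed in the paper, whose details the abstract does not give; the function name and the `first` parameter are assumptions made for illustration.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_seeds(points, k, first=0):
    """Pick k seeds: start from points[first], then repeatedly add the point
    whose distance to its nearest already-chosen seed is largest."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    seeds = [points[first]]
    # nearest[i] holds the squared distance from points[i] to its closest seed.
    nearest = [sq_dist(p, seeds[0]) for p in points]
    while len(seeds) < k:
        idx = max(range(len(points)), key=nearest.__getitem__)
        seeds.append(points[idx])
        nearest = [min(d, sq_dist(p, points[idx])) for p, d in zip(points, nearest)]
    return seeds
```

The returned seeds would then be passed to an ordinary K-means loop in place of randomly drawn centers.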

Development of a Clustering Model for Automatic Knowledge Classification (지식 분류의 자동화를 위한 클러스터링 모형 연구)

  • 정영미;이재윤
    • Journal of the Korean Society for Information Management / v.18 no.2 / pp.203-230 / 2001
  • The purpose of this study is to develop a document clustering model for automatic classification of knowledge. Two test collections, one of newspaper article texts and one of journal article abstracts, are built for the clustering experiment. Various feature reduction criteria as well as term weighting methods are applied to the term sets of the test collections, and cosine and Jaccard coefficients are used as similarity measures. The performances of the complete linkage and K-means clustering algorithms are compared using different feature selection methods and various term weights. It was found that complete linkage clustering outperforms the K-means algorithm and that feature reduction to almost 10% of the total feature set does not lower the performance of document clustering to any significant extent.

Performance Improvement on MFCM for Nonlinear Blind Channel Equalization Using Gaussian Weights (가우시안 가중치를 이용한 비선형 블라인드 채널등화를 위한 MFCM의 성능개선)

  • Han, Soo-Whan;Park, Sung-Dae;Woo, Young-Woon
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2007.10a / pp.407-412 / 2007
  • In this paper, we propose a modified fuzzy clustering algorithm using Gaussian weights (Modified Fuzzy C-Means with Gaussian Weights: MFCM_GW) for implementing a nonlinear blind channel equalizer. Instead of the Euclidean distance used in the conventional FCM algorithm, the proposed algorithm uses a Bayesian likelihood fitness function and a membership (partition) matrix to which Gaussian weights are applied, and it directly estimates the optimal channel output states from the data received at the output of the nonlinear channel. The estimated channel output states are used to construct the desired channel state vectors of the nonlinear channel, which then serve as the centers of a Radial Basis Function (RBF) equalizer to recover the transmitted data symbols. In the experiments, random binary signals with added Gaussian noise were used to compare performance against the existing simplex genetic algorithm (GA), the hybrid GASA (GA merged with simulated annealing (SA)), and the previously published MFCM. The channel equalizer using MFCM_GW was shown to be relatively superior in both accuracy and speed.

A Codebook Design for Vector Quantization Using a Neural Network (신경망을 이용한 벡터 양자화의 코드북 설계)

  • 주상현;원치선;신재호
    • The Journal of Korean Institute of Communications and Information Sciences / v.19 no.2 / pp.276-283 / 1994
  • Using a neural network for vector quantization, we can expect a better codebook design algorithm thanks to its adaptive process. Also, the designed codebook places the codewords in order through its self-organizing characteristics, which makes it possible to search the codebook partially for real-time processing. To exploit these features of the neural network, in this paper we propose a new codebook design algorithm that modifies the KSFM (Kohonen's Self-organizing Feature Map) and then combines it with the K-means algorithm. Experimental results show the performance improvement and the ability to search the codebook partially for real-time processing.

Color Quantization of Natural Images for Content-Based Retrieval (내용기반 검색을 위한 자연 영상의 칼라양자화 방법)

  • 길연희;김성영;박창민;김민환
    • Proceedings of the Korea Multimedia Society Conference / 2000.11a / pp.266-270 / 2000
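  • To retrieve images on a per-object basis in a content-based image retrieval system, extracting meaningful objects from the image is essential, and this requires a preceding quantization step that makes region segmentation efficient. General color quantization techniques aim to reduce the number of colors while keeping the quantized image as visually similar to the original as possible, whereas color quantization for region segmentation aims to quantize so that meaningful objects can be extracted easily rather than to reproduce colors faithfully. In this paper, we propose a method that combines the strengths of the existing Octree quantization method and the K-means algorithm to obtain quantization results well suited to region segmentation. First, visually similar colors among the quantized colors produced by Octree quantization are merged, which resolves the forced-splitting problem that is a drawback of Octree quantization. Then, the K-means algorithm is run only on the merged quantized colors, yielding a quantized image suitable for region segmentation in less time. Experiments confirmed the effectiveness of the proposed method.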
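
Running K-means on a reduced palette rather than on every pixel can be sketched as a weighted Lloyd iteration, where each palette color carries the number of pixels it represents. The sketch below assumes the palette and pixel counts come from an earlier quantization and merging step, which is not shown; the function name, parameters, and stopping rule are illustrative assumptions rather than the paper's procedure.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two RGB triples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def weighted_kmeans(colors, counts, centers, max_iter=20):
    """Lloyd iterations over palette colors weighted by their pixel counts.

    colors:  list of (r, g, b) palette entries.
    counts:  number of pixels represented by each palette entry.
    centers: initial cluster centers as (r, g, b) triples.
    """
    centers = [tuple(float(v) for v in c) for c in centers]
    for _ in range(max_iter):
        sums = [[0.0, 0.0, 0.0] for _ in centers]
        totals = [0.0] * len(centers)
        # Assignment step: each palette color goes to its nearest center.
        for color, count in zip(colors, counts):
            j = min(range(len(centers)), key=lambda i: sq_dist(color, centers[i]))
            totals[j] += count
            for d in range(3):
                sums[j][d] += count * color[d]
        # Update step: weighted mean per cluster; empty clusters keep their center.
        updated = [
            tuple(s / totals[j] for s in sums[j]) if totals[j] else centers[j]
            for j in range(len(centers))
        ]
        if updated == centers:
            break  # converged: no center moved
        centers = updated
    return centers
```

Because the loop visits palette entries instead of pixels, its cost depends on the palette size, which is consistent with the speed-up the abstract attributes to clustering only the merged colors.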