• Title/Summary/Keyword: k-mean clustering


The Effect of Input Variables Clustering on the Characteristics of Ensemble Machine Learning Model for Water Quality Prediction (입력자료 군집화에 따른 앙상블 머신러닝 모형의 수질예측 특성 연구)

  • Park, Jungsu
    • Journal of Korean Society on Water Environment / v.37 no.5 / pp.335-343 / 2021
  • Water quality prediction is essential for the proper management of water supply systems. Increased suspended sediment concentration (SSC) affects water supply systems in various ways, such as raising treatment costs, and there have consequently been various efforts to develop models for predicting SSC. However, SSC is affected by both the natural and the anthropogenic environment, which makes it challenging to predict. Recently, advanced machine learning models have increasingly been used for water quality prediction. This study developed an ensemble machine learning model to predict SSC using the XGBoost (XGB) algorithm. The discharge (Q) and SSC observed at two field monitoring stations were used to develop the model. The input variables were clustered into two groups with low and high ranges of Q using the k-means clustering algorithm, and each group of data was then used separately to optimize XGB (Model 1). The model performance was compared with that of an XGB model trained on the entire data set (Model 2). The models were evaluated by the root mean squared error-observation standard deviation ratio (RSR) and the root mean squared error. The RSR values were 0.51 and 0.57 at the two monitoring stations for Model 2, while the model performance improved to RSR values of 0.46 and 0.55, respectively, for Model 1.
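
The grouping step and the RSR metric described in this abstract can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function names `kmeans_1d_two_groups` and `rsr` are hypothetical, the centroids are initialized at the minimum and maximum discharge (the abstract does not say how the authors initialized k-means), and RSR is taken in its usual form, RMSE divided by the standard deviation of the observations. The XGBoost fitting step itself is not shown.

```python
import math

def kmeans_1d_two_groups(values, iters=100):
    """Split 1-D values (e.g. discharge Q) into a low and a high group, k=2."""
    lo, hi = min(values), max(values)  # assumed initialization: the extremes
    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: 0 = nearest to the low centroid, 1 = high centroid.
        labels = [0 if abs(v - lo) <= abs(v - hi) else 1 for v in values]
        low = [v for v, g in zip(values, labels) if g == 0]
        high = [v for v, g in zip(values, labels) if g == 1]
        # Update step: each centroid becomes its group mean
        # (the old value is kept if a group is empty).
        new_lo = sum(low) / len(low) if low else lo
        new_hi = sum(high) / len(high) if high else hi
        if new_lo == lo and new_hi == hi:
            break  # centroids stopped moving
        lo, hi = new_lo, new_hi
    return labels, (lo, hi)

def rsr(observed, predicted):
    """RMSE divided by the standard deviation of the observations."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sso = sum((o - mean_obs) ** 2 for o in observed)
    return math.sqrt(sse) / math.sqrt(sso)
```

In a Model 1 style workflow, one regressor would then be trained per label value, and `rsr` would be computed on the held-out observations of each station.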

A Study on the Optimization of State Tying Acoustic Models using Mixture Gaussian Clustering (혼합 가우시안 군집화를 이용한 상태공유 음향모델 최적화)

  • Ann, Tae-Ock
    • Journal of the Institute of Electronics Engineers of Korea SP / v.42 no.6 / pp.167-176 / 2005
  • This paper describes how a decision-tree-based state tying model, one of the acoustic models used for speech recognition, can be optimized by reducing the number of mixture Gaussians in the output probability distribution. State tying modeling uses a finite set of questions, which can incorporate phonological knowledge, together with likelihood-based decision criteria. The recognition rate can be improved by increasing the number of mixture Gaussians in the output probability distribution. In this paper, we reduce the number of mixture Gaussians at the point of highest recognition rate by clustering the Gaussians. The Bhattacharyya and Euclidean distances are used as the distance measures for clustering. For the pair of Gaussians with the smallest distance, a mean and variance are calculated and a new Gaussian is created; the parameters of the new Gaussian are derived from the parameters of the Gaussians it replaces. Experiments were performed using the STOCKNAME (1,680) database. The test results show that the proposed method with the Bhattacharyya distance maintains the recognition rate at 97.2% and reduces the ratio of the number of mixture Gaussians by 1.0%, while the method with the Euclidean distance maintains the recognition rate at 96.9% and reduces the ratio of the number of mixture Gaussians by 1.0%. The methods can therefore optimize the state tying model.
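
The merge step can be illustrated with univariate Gaussians. The sketch below is not the paper's implementation: it assumes each component is a `(weight, mean, variance)` tuple, uses the closed-form Bhattacharyya distance between two univariate normal densities, and merges the closest pair by moment matching, which is one common way to derive the new mean and variance from the two parents. The abstract does not state the exact merge formula, so that part is an assumption, and all function names are hypothetical.

```python
import math
from itertools import combinations

def bhattacharyya(g1, g2):
    """Bhattacharyya distance between two univariate Gaussians."""
    (_, m1, v1), (_, m2, v2) = g1, g2
    mean_term = (m1 - m2) ** 2 / (4.0 * (v1 + v2))
    var_term = 0.5 * math.log((v1 + v2) / (2.0 * math.sqrt(v1 * v2)))
    return mean_term + var_term

def merge(g1, g2):
    """Moment-matched merge (assumed rule): keeps total weight, mean, variance."""
    (w1, m1, v1), (w2, m2, v2) = g1, g2
    w = w1 + w2
    m = (w1 * m1 + w2 * m2) / w
    second_moment = (w1 * (v1 + m1 ** 2) + w2 * (v2 + m2 ** 2)) / w
    return (w, m, second_moment - m ** 2)

def merge_closest(gaussians):
    """Replace the closest pair of components with their merged Gaussian."""
    i, j = min(
        combinations(range(len(gaussians)), 2),
        key=lambda p: bhattacharyya(gaussians[p[0]], gaussians[p[1]]),
    )
    rest = [g for k, g in enumerate(gaussians) if k not in (i, j)]
    return rest + [merge(gaussians[i], gaussians[j])]
```

Calling `merge_closest` repeatedly reduces the mixture by one component per call; swapping `bhattacharyya` for a Euclidean distance on the means would give the second variant mentioned in the abstract.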

Personalized insurance product based on similarity (유사도를 활용한 맞춤형 보험 추천 시스템)

  • Kim, Joon-Sung;Cho, A-Ra;Oh, Hayong
    • Journal of the Korea Institute of Information and Communication Engineering / v.26 no.11 / pp.1599-1607 / 2022
  • The data mainly used for the models are personal information, insurance product information, and similar records. With these data, we suggest three types of models: a content-based filtering model, a collaborative filtering model, and a classification-based model. The content-based filtering model computes the cosine of the angle between user and item vectors and recommends items based on this cosine similarity; before computing the similarity, however, the data are divided into several groups by their features. Segmentation is performed both by the K-means clustering algorithm and by a manually operated algorithm. The collaborative filtering model uses the interactions that users have with items. The classification-based model uses decision tree and random forest classifiers to recommend items. According to the results, the content-based filtering model provides the best result. Since this model recommends items based on demographic and user features, the result indicates that demographic and user features are key to offering more appropriate items.
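
The cosine-similarity ranking at the core of the content-based model can be sketched as follows. The feature vectors, the `recommend` helper, and the item names are hypothetical; the sketch assumes users and items are already encoded in the same feature space and that the candidate items come from the user's segment.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # A zero vector has no direction; treat its similarity as 0.
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def recommend(user_vec, items, top_n=2):
    """Return the names of the top_n items most similar to the user vector."""
    ranked = sorted(
        items.items(),
        key=lambda kv: cosine_similarity(user_vec, kv[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:top_n]]
```

For example, `recommend([1, 0, 1], {"a": [1, 0, 1], "b": [0, 1, 0], "c": [1, 1, 1]})` ranks item `a` first because it points in the same direction as the user vector.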

A Study on the Improvement of Quantitative Precipitation Forecast using a Clustering Method (군집기법을 이용한 연강수량 예보개선에 관한 연구)

  • Kim, Gwang-Seob;Jo, So-Hyun
    • Proceedings of the Korea Water Resources Association Conference / 2009.05a / pp.94-97 / 2009
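  • Accurate forecasts of annual and seasonal precipitation are very important for water resources management. Various studies have sought to improve forecast accuracy. Nevertheless, because of the very large uncertainty in precipitation data, improving forecast accuracy remains an ongoing challenge. To address this, this study presents results on improving annual and seasonal precipitation prediction using a clustering technique. For predicting annual, seasonal, and monthly precipitation, correlation analysis with various global climate factors is very important. If changes in a leading climate factor in some specific region of the world were highly correlated with precipitation in Korea, this would be very useful information for prediction; however, the linear correlation between domestic precipitation and climate indices is very low, and in the lagged correlations it is also difficult to find a factor that shows a very high correlation at a particular lag. To overcome this, this study used k-means clustering to classify the climate conditions around Korea and analyzed how precipitation changes under each climate condition. The conditions were classified into 8 clusters according to changes in mean sea surface temperature in the South China Sea region ($105^{\circ}$E to $135^{\circ}$E, $0^{\circ}$N to $35^{\circ}$N), the coastal waters of Korea ($110^{\circ}$E to $150^{\circ}$E, $20^{\circ}$N to $40^{\circ}$N), the Indian Ocean region ($75^{\circ}$E to $105^{\circ}$E, $0^{\circ}$N to $25^{\circ}$N), and the Arabian Sea region ($45^{\circ}$E to $75^{\circ}$E, $0^{\circ}$N to $30^{\circ}$N). According to the analysis, 2008 falls in group 5, a climate state in which the mean sea surface temperature of the nearby seas and the South China Sea is lower than normal and that of the Indian Ocean and Arabian Sea regions is close to normal. Under the climate conditions of group 5, the mean precipitation of the following year was below normal, and this characteristic appeared uniformly across all basins. The mean seasonal distribution of following-year precipitation for cluster 5 was also below normal overall. On this basis, this year's annual mean precipitation is expected to be below normal, with below-normal precipitation in all seasons. This is similar to the Korea Meteorological Administration's climate outlook for spring 2009.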

A Fine Dust Measurement Technique using K-means and Sobel-mask Edge Detection Method (K-means와 Sobel-mask 윤곽선 검출 기법을 이용한 미세먼지 측정 방법)

  • Lee, Won-Hyeung;Seo, Ju-Wan;Kim, Ki-Yeon;Lin, Chi-Ho
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.22 no.2 / pp.97-101 / 2022
  • In this paper, we propose a method of measuring fine dust in CCTV images using K-means and Sobel-mask-based edge detection. The proposed algorithm collects images with a CCTV camera and designates the image range through a region of interest. After clustering is completed by applying the K-means algorithm, outlines are detected with the Sobel mask, the edge strength is measured, and the concentration of fine dust is determined from the measured data. The proposed method extracts the contour of a mountain range using the characteristics of the Sobel mask, which has an advantage in diagonal measurement, and the experimental results show how detection differs according to the concentration of fine dust.
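
The edge-strength measurement can be illustrated with a plain-Python Sobel filter. This sketch covers only the Sobel step, not the K-means clustering or the mapping from edge strength to dust concentration, which the abstract does not specify. The image is assumed to be a grayscale region of interest given as a list of rows, and `mean_edge_strength` is a hypothetical summary statistic.

```python
import math

# Standard 3x3 Sobel masks for horizontal and vertical gradients.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def sobel_magnitude(img):
    """Gradient magnitude at interior pixels; border pixels are left at 0."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            gx = gy = 0
            for dr in range(3):
                for dc in range(3):
                    p = img[r + dr - 1][c + dc - 1]
                    gx += SOBEL_X[dr][dc] * p
                    gy += SOBEL_Y[dr][dc] * p
            out[r][c] = math.hypot(gx, gy)
    return out

def mean_edge_strength(img):
    """Average gradient magnitude over the interior pixels."""
    mag = sobel_magnitude(img)
    interior = [
        mag[r][c]
        for r in range(1, len(img) - 1)
        for c in range(1, len(img[0]) - 1)
    ]
    return sum(interior) / len(interior)
```

Under the assumption that haze lowers contrast along a ridge line, a low-contrast edge such as `[100, 100, 150, 150]` per row yields a smaller mean edge strength than a sharp edge such as `[0, 0, 255, 255]`.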

Scalable Collaborative Filtering Technique based on Adaptive Clustering (적응형 군집화 기반 확장 용이한 협업 필터링 기법)

  • Lee, O-Joun;Hong, Min-Sung;Lee, Won-Jin;Lee, Jae-Dong
    • Journal of Intelligence and Information Systems / v.20 no.2 / pp.73-92 / 2014
  • An Adaptive Clustering-based Collaborative Filtering Technique is proposed to solve the fundamental problems of collaborative filtering, such as the cold-start, scalability, and data sparsity problems. Previous collaborative filtering techniques make recommendations based on the predicted preference of a user for a particular item, using a similar-item subset and a similar-user subset built from users' preferences for items. For this reason, if the density of the user preference matrix is low, the reliability of the recommendation system decreases rapidly, and creating the similar-item and similar-user subsets becomes more difficult. In addition, as the scale of the service increases, the time needed to create these subsets increases geometrically, and the response time of the recommendation system increases with it. To solve these problems, this paper suggests a collaborative filtering technique that actively adapts conditions to the model and adopts the concepts of a context-based filtering technique. The technique consists of four major methodologies. First, items and users are clustered according to their feature vectors, and an inter-cluster preference between each item cluster and user cluster is then assumed. With this method, the run-time for creating a similar-item or similar-user subset can be reduced, the reliability of the recommendation system can be made higher than when only user preference information is used to create these subsets, and the cold-start problem can be partially solved. Second, recommendations are made using the previously composed item and user clusters and the inter-cluster preference between each item cluster and user cluster. In this phase, a list of items is made for a user by examining the item clusters in order of the inter-cluster preference of the user cluster to which the user belongs, and by selecting and ranking the items according to the predicted or recorded user preference information. With this method, the model-creation phase bears the highest load of the recommendation system, which minimizes the load at run-time. Therefore, the scalability problem can be addressed, and highly reliable collaborative filtering can be applied to large-scale recommendation systems. Third, missing user preference information is predicted using the item and user clusters, which mitigates the problem caused by the low density of the user preference matrix. Existing studies used either item-based or user-based prediction. This paper improves on Hao Ji's idea, which uses both item-based and user-based prediction. The reliability of the recommendation service can be improved by combining the predicted values of both techniques according to the condition of the recommendation model. By predicting user preference based on the item or user clusters, the time required for prediction can be reduced, and missing user preferences can be predicted at run-time. Fourth, the item and user feature vectors are made to learn from subsequent user feedback. This phase applies normalized user feedback to the item and user feature vectors. This method can mitigate the problems caused by adopting the concepts of context-based filtering, such as item and user feature vectors based on the user profile and item properties. The problems with using item and user feature vectors stem from the limitation of quantifying the qualitative features of items and users. Therefore, the elements of the user and item feature vectors are made to match one to one, and when user feedback on a particular item is obtained, it is applied to the opposite feature vector. The method was verified by comparing its performance with existing hybrid filtering techniques, using two measures: MAE (Mean Absolute Error) and response time. Using MAE, this technique was confirmed to improve the reliability of the recommendation system. Using the response time, this technique was found to be suitable for a large-scale recommendation system. This paper suggested an Adaptive Clustering-based Collaborative Filtering Technique with high reliability and low time complexity, but it has some limitations. The technique focused on reducing the time complexity; hence, an improvement in reliability was not expected. The next topic will be to improve this technique with rule-based filtering.
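
The idea of an inter-cluster preference, and its use for filling in missing ratings, can be sketched as below. This is a simplified illustration, not the paper's method: cluster assignments are assumed to be given (the paper derives them from feature vectors), the inter-cluster preference is taken to be the mean of the known ratings between a user cluster and an item cluster, and all names are hypothetical. The MAE function follows the standard definition.

```python
def inter_cluster_preference(ratings, user_cluster, item_cluster):
    """Mean known rating for each (user cluster, item cluster) pair."""
    sums, counts = {}, {}
    for (user, item), rating in ratings.items():
        key = (user_cluster[user], item_cluster[item])
        sums[key] = sums.get(key, 0.0) + rating
        counts[key] = counts.get(key, 0) + 1
    return {key: sums[key] / counts[key] for key in sums}

def predict(user, item, ratings, prefs, user_cluster, item_cluster, default=0.0):
    """Use the recorded rating if present, else the cluster-level preference."""
    if (user, item) in ratings:
        return ratings[(user, item)]
    return prefs.get((user_cluster[user], item_cluster[item]), default)

def mean_absolute_error(actual, predicted):
    """MAE: average absolute difference between actual and predicted ratings."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
```

Because the cluster-level table is built once, a lookup for a new user with no ratings only needs that user's cluster, which is how clustering can soften the cold-start and sparsity problems described above.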

Efficiently Color Compensation in Back-Light Image using Fuzzy c-means Clustering Algorithm (FCM을 이용한 역광 이미지의 효율적인 컬러 색상 보정)

  • Kim, Young-Tak;Yu, Jae-Hyoung;Hahn, Hern-Soo
    • Proceedings of the Korean Society of Computer Information Conference / 2011.01a / pp.37-38 / 2011
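  • This paper proposes a new technique for reducing the loss of color components in bright regions that occurs when the Retinex algorithm is used to correct back-lit images, which have relatively large contrast differences. In a back-lit image, the brightness difference between bright and dark regions is very large, so enhancing contrast with the Retinex algorithm causes the color components in bright regions to be lost. To compensate for this loss, the color components of the bright regions of the original image are added to the image corrected by the Retinex algorithm. The fuzzy c-means clustering algorithm is used to obtain fuzzy membership functions that express the degree to which every pixel belongs to the bright and dark regions of the original image. For the bright region, the color component is obtained by applying the bright-region fuzzy membership function to the original image values; for the dark region, the dark-region fuzzy membership function is applied to the Retinex-restored image values. To evaluate the proposed algorithm, it was applied to natural images with strong back-lighting and was shown to outperform the existing Retinex algorithm (MSRCR).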
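
The membership-weighted blend described in this abstract can be sketched for a single grayscale channel. The sketch makes several assumptions that the abstract does not confirm: two clusters (dark and bright) on pixel intensity, the standard fuzzy c-means membership formula with fuzzifier `m = 2`, centers initialized at the minimum and maximum intensity, and a blend of the form `u_bright * original + u_dark * retinex`. The Retinex-corrected values are taken as given input; all names are hypothetical.

```python
def memberships(x, centers, m=2.0):
    """Standard FCM membership of value x in each cluster center."""
    d = [abs(x - c) for c in centers]
    if 0.0 in d:
        # x coincides with one or more centers: share membership among them.
        hits = d.count(0.0)
        return [1.0 / hits if di == 0.0 else 0.0 for di in d]
    p = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dj) ** p for dj in d) for di in d]

def fuzzy_c_means_1d(values, m=2.0, iters=50):
    """Two-cluster FCM on intensities; returns [dark_center, bright_center]."""
    centers = [min(values), max(values)]  # assumed initialization
    for _ in range(iters):
        u = [memberships(v, centers, m) for v in values]
        centers = [
            sum((u[k][j] ** m) * values[k] for k in range(len(values)))
            / sum(u[k][j] ** m for k in range(len(values)))
            for j in range(2)
        ]
    return centers

def blend(original, retinex, centers, m=2.0):
    """Bright pixels keep the original value, dark pixels take the Retinex value."""
    out = []
    for o, r in zip(original, retinex):
        u_dark, u_bright = memberships(o, centers, m)
        out.append(u_bright * o + u_dark * r)
    return out
```

Since the two memberships of a pixel sum to 1, the blend is a convex combination, so each output value stays between the original and the Retinex-corrected value.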

Blind Channel Estimation through Clustering in Backscatter Communication Systems (후방산란 통신시스템에서 군집화를 통한 블라인드 채널 추정)

  • Kim, Soo-Hyun;Lee, Donggu;Sun, Young-Ghyu;Sim, Issac;Hwang, Yu-Min;Shin, Yoan;Kim, Dong-In;Kim, Jin-Young
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.20 no.2 / pp.81-86 / 2020
  • Ambient backscatter communication has the drawback that transmission power is limited because data are transmitted using ambient RF signals. To improve transmission efficiency between transceivers, a channel estimator capable of estimating the channel state at the receiver is needed. In this paper, we consider the K-means algorithm to improve the performance of a channel estimator based on the EM algorithm. The simulation uses MSE as the performance metric to verify the proposed channel estimator. Setting the initial values through K-means shows improved performance compared with the channel estimation method using the general EM algorithm.
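
The general pattern of seeding EM with K-means can be sketched on a toy problem. This is not the paper's channel model: it assumes real-valued received samples that form two clusters and fits a two-component univariate Gaussian mixture, with the initial means taken from a two-cluster K-means run. The function names and the variance floor are hypothetical choices made for the sketch.

```python
import math

def kmeans_init(x, iters=20):
    """Two-cluster 1-D k-means; returns [low_mean, high_mean]."""
    lo, hi = min(x), max(x)
    for _ in range(iters):
        a = [v for v in x if abs(v - lo) <= abs(v - hi)]
        b = [v for v in x if abs(v - lo) > abs(v - hi)]
        lo = sum(a) / len(a) if a else lo
        hi = sum(b) / len(b) if b else hi
    return [lo, hi]

def normal_pdf(v, mu, var):
    return math.exp(-((v - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def em_two_gaussians(x, init_means, iters=50, var_floor=1e-6):
    """EM for a 2-component 1-D Gaussian mixture, started from init_means."""
    n = len(x)
    mean_all = sum(x) / n
    var0 = max(sum((v - mean_all) ** 2 for v in x) / n, var_floor)
    means, variances, weights = list(init_means), [var0, var0], [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibility of each component for each sample.
        resp = []
        for v in x:
            p = [weights[k] * normal_pdf(v, means[k], variances[k]) for k in range(2)]
            s = sum(p)
            resp.append([pk / s for pk in p] if s > 0 else [0.5, 0.5])
        # M-step: re-estimate weights, means, and variances.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk == 0:
                continue  # leave an empty component unchanged
            means[k] = sum(r[k] * v for r, v in zip(resp, x)) / nk
            var = sum(r[k] * (v - means[k]) ** 2 for r, v in zip(resp, x)) / nk
            variances[k] = max(var, var_floor)
            weights[k] = nk / n
    return means, variances, weights
```

A typical call is `em_two_gaussians(samples, kmeans_init(samples))`; starting EM near the cluster centers is what the abstract credits for the improved estimate compared with a generic initialization.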

Extensions of X-means with Efficient Learning the Number of Clusters (X-means 확장을 통한 효율적인 집단 개수의 결정)

  • Heo, Gyeong-Yong;Woo, Young-Woon
    • Journal of the Korea Institute of Information and Communication Engineering / v.12 no.4 / pp.772-780 / 2008
  • K-means is one of the simplest unsupervised learning algorithms for solving the clustering problem. However, K-means suffers from a basic shortcoming: the number of clusters k has to be known in advance. In this paper, we propose extensions of X-means, which can estimate the number of clusters using the Bayesian information criterion (BIC). We introduce two versions of the algorithm, modified X-means (MX-means) and generalized X-means (GX-means), which employ one full covariance matrix per cluster and so can estimate the number of clusters efficiently without the severe over-fitting that X-means suffers from due to its spherical cluster assumption. The algorithms start with one cluster and iteratively try to split a cluster to maximize the BIC score. MX-means uses the K-means algorithm to find a set of optimal clusters for the current k, which makes it simple and fast; however, it generates wrongly estimated centers when the clusters overlap. GX-means uses the EM algorithm to estimate the parameters and generates more stable clusters even when the clusters overlap. Experiments with synthetic data show that the proposed methods provide a robust estimate of the number of clusters and the cluster parameters compared with other existing top-down algorithms.
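
For orientation, the BIC score that a split is meant to increase can be written in the form commonly used with X-means. The notation below is an assumption introduced here, not taken from the abstract: $\hat{\ell}_j(D)$ is the maximized log-likelihood of the data $D$ under model $M_j$, $p_j$ is the number of free parameters, $n$ is the number of data points, $k$ is the number of clusters, and $d$ is the data dimension. The parameter count shown is the standard one for a Gaussian mixture with one full covariance matrix per cluster; the paper may count parameters differently.

```latex
\mathrm{BIC}(M_j) = \hat{\ell}_j(D) - \frac{p_j}{2}\,\log n,
\qquad
p_j = (k - 1) + k\,d + k\,\frac{d(d+1)}{2}
```

Under this form, a cluster is split only when the gain in log-likelihood outweighs the penalty for the extra parameters, which is what keeps the estimated number of clusters from growing without bound.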

Community dynamics of Salix species during the sedimentation in Paksil-nup Wetland, Hapcheon (합천 박실늪의 퇴적에 따른 버들류 (Salix sp.)의 군집 동태)

  • Kim, Cheol-Soo;Lee, Pal-Hong;Son, Sung-Gon;Oh, Kyung-Hwan
    • Journal of Wetlands Research / v.2 no.1 / pp.19-29 / 2000
  • The physico-chemical characteristics of the core sediment and the community dynamics of Salix species during sedimentation were investigated from 1990 to 1997 to reveal the effects of terrestrialization on the environment and plant community in a natural wetland. The study site, Paksil-nup wetland, is a valley-blocked lake located near Hwang-River, Hapcheon-gun, Gyeongsangnam-do, Korea. Conductivity, organic matter, total nitrogen, exchangeable K, and exchangeable Ca were higher, and pH was lower, in the upper layer of the core sediment. Soil properties such as available phosphorus, exchangeable Ca, and exchangeable Na increased, while organic matter, total nitrogen, and exchangeable K decreased during sedimentation. Salix nipponica was the dominant species and Salix glandulosa was subdominant among the 10 Salix species. Salix species were presumed to be the pioneer plants in the shrub and tree layers during succession in Paksil-nup wetland. The age classes of Salix species communities from the epilittoral zone to the infralittoral zone were low, and the ages of Salix species ranged from 2 to 11 years. DBH, height, mean number of branches, number of herb species, and light intensity increased, whereas density decreased, from lower-age to higher-age communities. Salix nipponica was superior to Salix purpurea var. japonica and other Salix species in interspecific competition among Salix species.
