• Title/Summary/Keyword: Optimal clustering

A new cluster validity index based on connectivity in self-organizing map (자기조직화지도에서 연결강도에 기반한 새로운 군집타당성지수)

  • Kim, Sangmin;Kim, Jaejik
    • The Korean Journal of Applied Statistics / v.33 no.5 / pp.591-601 / 2020
  • The self-organizing map (SOM) is an unsupervised learning method that projects high-dimensional data onto low-dimensional nodes. It can visualize data in two- or three-dimensional space using the nodes, and the characteristics of the data can be explored through them. To understand the structure of the data, cluster analysis is often applied to the nodes obtained from SOM. In cluster analysis, determining the optimal number of clusters is an important issue. To help determine it, various cluster validity indexes have been developed, and they can be applied to clustering outcomes for SOM nodes. However, although SOM has the advantage of reflecting the topological properties of the original data in the low-dimensional space, these indexes do not take this into account. Thus, we propose a new cluster validity index for SOM, based on connectivity between nodes, that considers the topological properties of the data. The performance of the proposed index is evaluated through simulations and compared with various existing cluster validity indexes.

Development and Evaluation of Sediment Delivery Ratio Equation using Clustering Methods for Estimation of Sediment Discharge on Ungauged Basins in Korea (국내 미계측 유역의 유사유출량 예측을 위한 군집별 유사전달율 산정식 도출 및 평가)

  • Lee, Seoro;Park, Sang Deog;Shin, Seung Sook;Kim, Ki-sung;Kim, Jonggun;Lim, Kyoung Jae
    • Journal of Korean Society on Water Environment / v.34 no.5 / pp.537-547 / 2018
  • Sediment discharge caused by rainfall runoff affects river water quality, leading to problems such as turbidity and eutrophication. To solve the various problems caused by soil loss, it is important to establish a sediment management plan for watersheds and rivers in advance. However, sediment data available for estimating sediment discharge in ungauged basins are lacking. Thus, research that quantitatively evaluates and predicts sediment discharge is very important. In this study, cluster analysis was conducted to classify gauged watersheds into hydrologically homogeneous groups based on watershed characteristics. This study also suggests a method to efficiently predict sediment discharge for ungauged basins by developing and evaluating sediment delivery ratio (SDR) equations based on the PA-SDR module. As a result, SDR equations for the classified watersheds were derived that predict the sediment discharge of ungauged basins with errors of 0.24% to 10.89%. The optimal parameters for the gauged basins were found to reflect the characteristics of sediment movement well. The SDR equations proposed in this study can be used to estimate sediment discharge in ungauged basins and to establish appropriate sediment management plans for the integrated management of watersheds and rivers in Korea.

Function Approximation for accelerating learning speed in Reinforcement Learning (강화학습의 학습 가속을 위한 함수 근사 방법)

  • Lee, Young-Ah;Chung, Tae-Choong
    • Journal of the Korean Institute of Intelligent Systems / v.13 no.6 / pp.635-642 / 2003
  • Reinforcement learning has achieved successful results in many applications such as control and scheduling. Various function approximation methods have been studied to improve the learning speed and to address the storage shortage of the standard reinforcement learning algorithm, Q-Learning. Most function approximation methods sacrifice some of the distinctive properties of reinforcement learning and require prior knowledge and preprocessing. Fuzzy Q-Learning needs preprocessing to define fuzzy variables, and Locally Weighted Regression (LWR) uses training examples. In this paper, we propose a function approximation method, Fuzzy Q-Map, that is based on on-line fuzzy clustering. Fuzzy Q-Map classifies a query state and predicts a suitable action according to the membership degree. We applied Fuzzy Q-Map, CMAC, and LWR to the mountain car problem. Fuzzy Q-Map reached the optimal prediction rate faster than CMAC, and its prediction rate was lower than that of LWR, which uses training examples.

Nano Technology Trend Analysis Using Google Trend and Data Mining Method for Nano-Informatics (나노 인포매틱스 기반 구축을 위한 구글 트렌드와 데이터 마이닝 기법을 활용한 나노 기술 트렌드 분석)

  • Shin, Minsoo;Park, Min-Gyu;Bae, Seong-Hun
    • Journal of Korean Society of Industrial and Systems Engineering / v.40 no.4 / pp.237-245 / 2017
  • Our research aims to predict recent trends and leading future technologies and to provide optimal Nano technology trend information by analyzing Nano technology trends. In the recent global market, users' needs and the technologies that meet them are changing in real time. Nano technology therefore also needs measures to reduce cost and enhance efficiency so as not to fall behind. Research such as trend analysis, which uses search data to satisfy both aspects, is thus required. This research consists of four steps. We collect data and select keywords in step 1, detect trends based on frequency and create visualizations in step 2, and perform analysis using data mining in step 3. This research can be used to look for changes in trends from three perspectives. First, it analyzes changes in trends in terms of major classification, Nano technology of 30's, and the keywords that make up the relevant Nano technology. Second, it can provide real-time information: because search data contain real-time information, trend analysis using search data can provide information that follows the continuously changing market situation. Third, through comparative analysis, it is possible to establish useful corporate policy and strategy by understanding the trends of the United States, which has relatively advanced Nano technology. Therefore, trend analysis using search data, as in this research, can suggest a proper policy direction that responds to market change in real time, can be used as reference material, and can help reduce cost.

Extended Information Entropy via Correlation for Autonomous Attribute Reduction of BigData (빅 데이터의 자율 속성 감축을 위한 확장된 정보 엔트로피 기반 상관척도)

  • Park, In-Kyu
    • Journal of Korea Game Society / v.18 no.1 / pp.105-114 / 2018
  • The data analysis methods used for customer type analysis are very important for game companies that want to understand their customers' types and characteristics in order to plan customized content and provide more convenient services. In this paper, we propose a k-modes cluster analysis algorithm that uses information uncertainty by extending information entropy to reduce information loss. The similarity of attributes is measured in two aspects. One is the uncertainty between each attribute and the center of each partition, and the other is the uncertainty about the probability distribution of the uncertainty of each attribute. In particular, the uncertainty in attributes is taken into account on both non-probabilistic and probabilistic scales, because the entropy of an attribute is transformed into probabilistic information to measure the uncertainty. The accuracy of the algorithm can be observed in the cluster analysis results based on the optimal initial value, through extensive performance analysis and various indexes.
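
As a rough illustration of the quantity this abstract builds on, the sketch below computes the ordinary Shannon entropy of one categorical attribute. It is not the extended entropy measure proposed in the paper, whose definition is not given here. The function name `shannon_entropy` and the customer records are hypothetical and made up for illustration.

```python
import math
from collections import Counter

def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of a categorical attribute."""
    counts = Counter(values)
    total = len(values)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical customer records: (preferred genre, payment type).
records = [
    ("rpg", "free"),
    ("rpg", "paid"),
    ("puzzle", "free"),
    ("puzzle", "paid"),
]

# Entropy of the "genre" attribute: two equally likely categories.
genre_entropy = shannon_entropy([r[0] for r in records])

# An attribute with a single category carries no uncertainty.
constant_entropy = shannon_entropy(["free"] * 4)

print(genre_entropy, constant_entropy)
```

An attribute whose values are spread evenly over many categories has high entropy, while a nearly constant attribute has entropy close to zero, which is why entropy is a natural starting point for weighting or reducing attributes.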

Analysis and Detection Method for Line-shaped Echoes using Support Vector Machine (Support Vector Machine을 이용한 선에코 특성 분석 및 탐지 방법)

  • Lee, Hansoo;Kim, Eun Kyeong;Kim, Sungshin
    • Journal of the Korean Institute of Intelligent Systems / v.24 no.6 / pp.665-670 / 2014
  • An SVM is a kind of binary classifier that finds an optimal hyperplane separating training data into two groups. Owing to its remarkable performance, the SVM is applied in various fields such as inductive inference, binary classification, and prediction. It is also a representative black-box model, and there is much actively discussed research on analyzing trained SVM classifiers. This paper studies a method for automatically detecting line-shaped echoes, namely sun strobe echoes and radial interference echoes, using the SVM algorithm, because line-shaped echoes appear relatively often and disturb the weather forecasting process. Using a spatial clustering method and corrected reflectivity data from the weather radar, the training data are composed of mean reflectivity, size, appearance, centroid altitude, and so forth. The trained SVM classifier is verified with actual occurrences of line-shaped echoes, and its characteristics are analyzed using the decision tree method.

Construction Scheme of Training Data using Automated Exploring of Boundary Categories (경계범주 자동탐색에 의한 확장된 학습체계 구성방법)

  • Choi, Yun-Jeong;Jee, Jeong-Gyu;Park, Seung-Soo
    • The KIPS Transactions:PartB / v.16B no.6 / pp.479-488 / 2009
  • This paper presents a reinforced construction scheme of training data that improves text classification through automatic search of boundary categories. Documents lying in a boundary area are usually misclassified because they include multiple topics and features, which is the main factor we focus on. In this paper, we propose an automated methodology for exploring the optimal boundary category, based on previous research. We treat the boundary area among target categories as a new category that requires training, which is then added to the target categories semantically. In experiments, we applied our method to complex documents by intentionally introducing errors into the training process. The experimental results show that our system has high accuracy and reliability in noisy environments.

An Optimal Resource Distribution Scheme for P2P Streaming Service over Centralized DU Environment in LTE (LTE에서 집중화된 DU 환경에서 P2P 스트리밍 서비스를 위한 최적의 자원 배분 방안)

  • Kim, Yangjung;Chong, Ilyoung
    • KIPS Transactions on Computer and Communication Systems / v.3 no.3 / pp.81-86 / 2014
  • With the development of streaming services based on P2P and mobile network technologies, research on enhancing service quality in mobile environments has been proposed. However, streaming services that consider high-speed mobile environments and the characteristics of heterogeneous terminals have been unable to deliver the quality users require because of bandwidth congestion among the selfish peers of existing P2P systems. They are also prone to long delay and loss in proportion to the amount of repeated traffic, because there is no optimized solution for traffic localization. A structure that enhances peer contribution for service differentiation, together with peer selection using a clustering scheme based on terminal location information, can satisfy both users and service providers in terms of service quality and efficiency. In this paper, we propose an incentive mechanism and a resource distribution scheme that use user contribution and traffic cost information based on user location, which increase mobile users' satisfaction with service quality in LTE environments.

Self Organizing RBF Neural Network Equalizer (자력(自力) RBF 신경망 등화기)

  • Kim, Jeong-Su;Jeong, Jeong-Hwa
    • Journal of the Institute of Electronics Engineers of Korea CI / v.39 no.1 / pp.35-47 / 2002
  • This paper proposes a self-organizing RBF neural network equalizer for the equalization of digital communications. For an equalizer using an RBF neural network, it is most important to estimate the RBF centers, which are the desired channel states, correctly and quickly. However, previous RBF equalizers are not used in actual communication systems because of two drawbacks: the number of channel states has to be known in advance, and many centers are necessary. The self-organizing neural network equalizer proposed in this paper can perform equalization without prior information about the number of channel states, because it selects RBF centers from among the signals transmitted to the equalizer using new addition and removal criteria. Furthermore, the proposed equalizer has the merit of performing equalization with fewer centers than previous equalizers, through training that uses LMS and a clustering algorithm. On linear, nonlinear, and standard telephone channels, the proposed equalizer is compared with the optimal Bayesian equalizer in terms of BER performance, symbol decision boundary, and number of centers. The comparison confirms that the proposed equalizer performs almost as well as the Bayesian equalizer.

The Effect of the Number of Phoneme Clusters on Speech Recognition (음성 인식에서 음소 클러스터 수의 효과)

  • Lee, Chang-Young
    • The Journal of the Korea institute of electronic communication sciences / v.9 no.11 / pp.1221-1226 / 2014
  • In an effort to improve the efficiency of speech recognition, we investigate the effect of the number of phoneme clusters. For this purpose, codebooks with varied numbers of phoneme clusters are prepared by a modified k-means clustering algorithm. The subsequent processing uses fuzzy vector quantization (FVQ) and a hidden Markov model (HMM) for the speech recognition test. The results show two distinct regimes. For a large number of phoneme clusters, the recognition performance is roughly independent of that number. For a small number of phoneme clusters, however, the recognition error rate increases nonlinearly as the number decreases. Numerical calculation suggests that this nonlinear regime might be modeled by a power law function. The results also show that about 166 phoneme clusters would be the optimal number for recognition of 300 isolated words. This amounts to roughly 3 variations per phoneme.
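
The power-law modeling mentioned in this abstract can be illustrated with a minimal sketch, assuming the form error = a * k^b and an ordinary least-squares fit on log-log axes. The abstract does not state the fitting procedure or the fitted values, so the function `fit_power_law` and the data points below are hypothetical and synthetic, not results from the paper.

```python
import math

def fit_power_law(xs, ys):
    """Least-squares fit of y = a * x**b on log-log axes; returns (a, b)."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    # Slope and intercept of the straight line log(y) = log(a) + b * log(x).
    sxx = sum((x - mean_x) ** 2 for x in log_x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_x, log_y))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b

# Synthetic data generated from an exact power law (a = 2.0, b = -0.5),
# standing in for error rates measured at small numbers of clusters.
clusters = [10, 20, 40, 80, 160]
error_rate = [2.0 * k ** -0.5 for k in clusters]

a, b = fit_power_law(clusters, error_rate)
print(a, b)
```

On real measurements the fit would only be meaningful within the small-cluster regime, since the abstract reports that performance becomes roughly independent of the cluster count once it is large.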