• Title/Summary/Keyword: determining the number of clusters (군집 수 결정)

Sampling-Based Automated Parameter Estimation for Canopy Clustering (샘플링 기반 Canopy Clustering 파라미터 설정 기법)

  • Choi, Sung-Woon;Yu, Seung-Hak;Yoon, Sung-Roh
    • Proceedings of the Korean Information Science Society Conference / 2012.06b / pp.438-440 / 2012
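  • Canopy Clustering, developed to cluster large-scale data efficiently, forms canopies based on two parameters (T1, T2), so the clustering result can vary greatly depending on these parameters. It is therefore very important to choose parameter values that reflect the characteristics of the data well. Because no automated parameter-setting technique has been available, however, previous studies have generally set the Canopy Clustering parameters based on the user's experience. This paper proposes a method that uses statistical sampling to set the values of T1 and T2 effectively.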
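
To make the role of T1 and T2 concrete, the following is a minimal sketch of the basic canopy-forming loop, assuming T1 is the loose threshold and T2 the tight one (T1 > T2). It is not the sampling-based estimation proposed in the paper, which the abstract does not detail; the function name, the sample points, and the hand-picked thresholds are illustrative assumptions.

```python
import math
import random


def canopy_clustering(points, t1, t2, seed=0):
    """Group points into (possibly overlapping) canopies.

    t1 is the loose threshold and t2 the tight threshold, with t1 > t2.
    Returns a list of (center, members) pairs.
    """
    if not t1 > t2 > 0:
        raise ValueError("expected t1 > t2 > 0")
    rng = random.Random(seed)
    remaining = list(points)
    canopies = []
    while remaining:
        # Pick an arbitrary remaining point as the next canopy center.
        center = remaining.pop(rng.randrange(len(remaining)))
        members = [center]
        kept = []
        for p in remaining:
            d = math.dist(center, p)
            if d < t1:
                members.append(p)  # loosely close: joins this canopy
            if d >= t2:
                kept.append(p)     # not tightly close: stays a candidate
        remaining = kept
        canopies.append((center, members))
    return canopies


# Hypothetical data: two well-separated pairs of points.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
canopies = canopy_clustering(points, t1=1.0, t2=0.5)
```

With these hand-picked thresholds the four points fall into two canopies. Changing T1 or T2 changes how many canopies form and how much they overlap, which is why the choice of the two values matters.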

Finding the Number of Clusters and Various Experiments Based on ASA Clustering Method (ASA 군집화를 이용한 군집수 결정 및 다양한 실험)

  • Yoon Bok-Sik
    • Journal of the Korean Operations Research and Management Science Society / v.31 no.2 / pp.87-98 / 2006
  • In many cluster analyses, clustering must be performed without any prior knowledge of the number of clusters. However, some clustering methods, such as the k-means algorithm, require the number of clusters to be specified beforehand. In this study, we focus on the problem of determining the number of clusters in the given data. We follow the two-stage approach of the ASA clustering algorithm and mainly try to improve the performance of its first stage. We verify the usefulness of the method by applying it to various kinds of simulated data. We also apply the method to clustering two kinds of real-life qualitative data.

User Adaptive Recommendation Model Based on User Clustering using Proxies (대리자를 이용한 군집화 기반 사용자 적응적 추천 모델)

  • Ryu, Sanghyun;Song, Changhwan;Jang, Hyunsu;Eom, Young Ik
    • Proceedings of the Korea Information Processing Society Conference / 2009.04a / pp.39-42 / 2009
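  • The goal of a user-adaptive recommendation system is to improve user convenience by analyzing and classifying users' preferences and behavioral information and, on that basis, recommending services that each user is likely to need or prefer. However, existing recommendation systems require a long analysis time when a new user or a new service appears, and they suffer from problems such as oversimplified recommendations caused by overspecialization and sparsity. This paper presents a new model that reduces analysis time by classifying new users with a decision tree and that, when new items appear, performs user clustering and recommendation using proxies to reduce analysis time and provide diverse, user-centered recommendations. We also analyze the proposed model and explain how it resolves the problems above.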

Dynamic Crowd Simulation by Emotion-based Behavioral Control of Individuals (개체의 감정기반 행동제어를 통한 동적 군중 시뮬레이션)

  • Ahn, Eun-Young;Kim, Jae-Won;Han, Sang-Hoon;Moon, Chan-Il
    • The Journal of the Korea Contents Association / v.9 no.11 / pp.1-9 / 2009
  • In virtual environments such as computer games and animation, crowd simulation needs to look more natural. We therefore propose a method that generates dynamically moving crowd patterns by applying emotional factors to the individual characters of a crowd when determining their behavior. The proposed method mimics human behavior and lets each character in a group decide its own path according to its individual status. It can also generate various movement patterns by letting individuals move to another group depending on their conditions. In this paper, we define several temperament and feeling factors and propose determination rules for calculating the emotional status. Moreover, we use fuzzy theory to represent ambiguous expressions such as "feeling bad" and "feeling good". Our experiments show that the suggested method can simulate virtual crowds in more natural and diverse ways.
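
As a rough illustration of how fuzzy sets can express "feeling bad" and "feeling good", the sketch below maps a mood score in [0, 1] to two overlapping membership degrees. The score scale, the breakpoints (0.2, 0.6, 0.4, 0.8), and the function names are assumptions made for this example; the abstract does not give the paper's actual membership functions or rules.

```python
def clamp01(x):
    """Limit a value to the interval [0, 1]."""
    return max(0.0, min(1.0, x))


def feeling_bad(mood):
    """Membership in 'feeling bad': 1 up to 0.2, falling linearly to 0 at 0.6."""
    return clamp01((0.6 - mood) / 0.4)


def feeling_good(mood):
    """Membership in 'feeling good': 0 up to 0.4, rising linearly to 1 at 0.8."""
    return clamp01((mood - 0.4) / 0.4)


# A mood of 0.5 is partly "bad" and partly "good" at the same time,
# which is the kind of ambiguity a crisp threshold cannot express.
mood = 0.5
degrees = {"bad": feeling_bad(mood), "good": feeling_good(mood)}
```

A character's behavior rule could then weight its choices by these degrees instead of switching abruptly at a single cutoff.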

A study on image segmentation for depth map generation (깊이정보 생성을 위한 영상 분할에 관한 연구)

  • Lim, Jae Sung
    • Journal of the Korea Academia-Industrial cooperation Society / v.18 no.10 / pp.707-716 / 2017
  • Advances in image display devices call for display images suited to the user's purpose. Display devices should be able to provide object-based image information when a depth map is required. In this paper, we present an algorithm that uses a histogram-based image segmentation method for depth map generation. In the conventional K-means clustering algorithm, the number of centroids is a parameter, so existing K-means algorithms cannot adaptively determine the number of clusters. Furthermore, the K-means algorithm tends to fall into local minima, which causes over-segmentation. The proposed algorithm, in contrast, selects centroids adaptively and builds on a histogram-based approach with computational complexity in mind. It is designed to produce object-based results by preventing the existing algorithm from falling into a local minimum. Finally, we remove over-segmented components with a connected-component labeling algorithm. The proposed algorithm yields object-based results and improves on the benchmark method by 0.017 in Probabilistic Rand Index (PRI) and 0.051 in Segmentation Covering (SC).
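
One simple way to choose centroids from a histogram, rather than fixing their number in advance, is to take the peaks of a smoothed gray-level histogram. The sketch below illustrates that general idea only; it is not the paper's algorithm, and the smoothing window, the `min_share` cutoff, and the synthetic pixel data are assumptions.

```python
def histogram_peaks(pixels, bins=256, window=3, min_share=0.01):
    """Return gray levels that are local maxima of a smoothed histogram."""
    hist = [0] * bins
    for v in pixels:
        hist[v] += 1
    # Moving-average smoothing to suppress spurious single-bin peaks.
    half = window // 2
    smooth = []
    for i in range(bins):
        lo, hi = max(0, i - half), min(bins, i + half + 1)
        smooth.append(sum(hist[lo:hi]) / (hi - lo))
    floor = min_share * len(pixels)  # ignore peaks with very little mass
    return [
        i for i in range(1, bins - 1)
        if smooth[i] > smooth[i - 1]
        and smooth[i] >= smooth[i + 1]
        and smooth[i] >= floor
    ]


def assign_to_peaks(pixels, peaks):
    """Label each pixel with the index of its nearest peak (centroid)."""
    return [min(range(len(peaks)), key=lambda k: abs(v - peaks[k])) for v in pixels]


# Synthetic gray levels with two modes, around 50 and 200.
shape = [(-2, 10), (-1, 30), (0, 60), (1, 30), (2, 10)]
pixels = [c + off for c in (50, 200) for off, n in shape for _ in range(n)]
peaks = histogram_peaks(pixels)
labels = assign_to_peaks(pixels, peaks)
```

Here the number of clusters follows from the number of histogram peaks, so it adapts to the image instead of being supplied as a parameter.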

A study on 3-step complex data mining in society indicator survey (사회지표조사에서의 3단계 복합 데이터마이닝의 적용 방안)

  • Cho, Kwang-Hyun;Park, Hee-Chang
    • Journal of the Korean Data and Information Science Society / v.23 no.5 / pp.983-992 / 2012
  • A social indicator survey can identify the state of society as a whole. When policy is made, a social indicator survey can reflect the public opinion of the region, and it is an important measure of social change. Such surveys have been conducted in many municipalities (Seoul, Incheon, Busan, Ulsan, Gyeongsangnamdo, etc.). However, their results are mostly analyzed with basic statistics only. In this study, we propose a new data mining methodology for more effective analysis: a 3-step complex data mining approach for social indicator surveys that uses three data mining methods (intervening association rules, clustering, and decision trees).

A Study on the Optimization of State Tying Acoustic Models using Mixture Gaussian Clustering (혼합 가우시안 군집화를 이용한 상태공유 음향모델 최적화)

  • Ann, Tae-Ock
    • Journal of the Institute of Electronics Engineers of Korea SP / v.42 no.6 / pp.167-176 / 2005
  • This paper describes how a decision-tree-based state tying model, one of the acoustic models used for speech recognition, can be optimized by reducing the number of mixture Gaussians in the output probability distribution. State tying modeling uses a finite set of questions, which can incorporate phonological knowledge, together with likelihood-based decision criteria. The recognition rate can be improved by increasing the number of mixture Gaussians in the output probability distribution. In this paper, we reduce the number of mixture Gaussians at the point of highest recognition rate by clustering the Gaussians. The Bhattacharyya and Euclidean distances are used as the distance measures for clustering. New Gaussians are created by calculating the mean and variance of the pair with the lowest distance, so the parameters of each new Gaussian are derived from the parameters of the Gaussians it replaces. Experiments were performed using the STOCKNAME (1,680) databases. The test results show that the proposed method with the Bhattacharyya distance maintains the recognition rate at $97.2\%$ and reduces the ratio of the number of mixture Gaussians by $1.0\%$, while the method with the Euclidean distance maintains the recognition rate at $96.9\%$ and reduces the ratio of the number of mixture Gaussians by $1.0\%$. Both methods can thus optimize the state tying model.
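
The merge-the-closest-pair idea can be sketched for one-dimensional Gaussians as follows. The Bhattacharyya distance below is the standard closed form for two univariate Gaussians; the weighted moment-matching merge and the greedy loop are assumptions made for illustration, since the abstract does not specify how the new mean and variance are computed. Components are hypothetical `(weight, mean, variance)` tuples.

```python
import math


def bhattacharyya(m1, v1, m2, v2):
    """Bhattacharyya distance between N(m1, v1) and N(m2, v2) (v = variance)."""
    return (0.25 * (m1 - m2) ** 2 / (v1 + v2)
            + 0.5 * math.log((v1 + v2) / (2.0 * math.sqrt(v1 * v2))))


def merge(c1, c2):
    """Merge two weighted Gaussians by matching the mixture's mean and variance."""
    (w1, m1, v1), (w2, m2, v2) = c1, c2
    w = w1 + w2
    m = (w1 * m1 + w2 * m2) / w
    v = (w1 * (v1 + m1 * m1) + w2 * (v2 + m2 * m2)) / w - m * m
    return (w, m, v)


def reduce_mixture(components, target):
    """Greedily merge the closest pair until only `target` components remain."""
    if target < 1:
        raise ValueError("target must be at least 1")
    comps = list(components)
    while len(comps) > target:
        pairs = [(i, j) for i in range(len(comps)) for j in range(i + 1, len(comps))]
        i, j = min(pairs, key=lambda ij: bhattacharyya(
            comps[ij[0]][1], comps[ij[0]][2], comps[ij[1]][1], comps[ij[1]][2]))
        merged = merge(comps[i], comps[j])
        # Remove j first (j > i) so index i stays valid.
        del comps[j]
        del comps[i]
        comps.append(merged)
    return comps


# Hypothetical 3-component mixture; the first two components nearly coincide.
mixture = [(0.3, 0.0, 1.0), (0.3, 0.1, 1.0), (0.4, 5.0, 1.0)]
reduced = reduce_mixture(mixture, target=2)
```

Swapping `bhattacharyya` for a Euclidean distance between the means would give the second variant the abstract mentions.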

Plot Size for Investigating Forest Community Structure(I) -Adequate Number of Plots of Tree Stratum in a Mixed Deciduous Forest Community at Sobaeksan Area- (삼림군집구조 조사를 위한 조사구 크기에 관한 연구(I) -소백산지역 활엽수혼효림군집 교목층의 적정 조사구수-)

  • 박인협;이경재;조재창
    • Korean Journal of Environment and Ecology / v.6 no.2 / pp.162-167 / 1993
  • A mixed deciduous forest community on Mt. Sobaek was studied to determine the adequate number of tree-stratum plots for investigating forest community structure. Twenty 10 m $\times$ 10 m plots were set up in the studied forest community, and a species-area curve, a performance curve, and a statistical method were applied. According to the species-area curve, the minimal number of plots at which a given percentage increase in number of plots produced less than the same percentage increase in number of species was eight. The minimal number of plots at which a given percentage increase in number of plots produced less than half of that percentage increase in number of species was eleven. According to the performance curve of the importance values of the major species, the minimal number of plots at which the dominant species was distinguished from the subdominant species was five, and the minimal number at which the subdominant species were distinguished from each other was ten. Therefore, ten 10 m $\times$ 10 m plots seem to give an adequate sample for investigating the structure of the studied forest community. The similarity index between the ten plots and the total of twenty plots was above 90%, and the 95% confidence interval of species diversity of the ten plots was $\pm$ 0.073.
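
The species-area criterion can be sketched as a simple stopping rule. The code below takes one possible reading of it, with percentages measured relative to the totals: stop at the first plot count where adding one more plot (1/N of all plots) gains fewer than `factor` times 1/N of all species. This reading, the function name, and the cumulative species counts are assumptions for illustration and are not data from the paper.

```python
def minimal_plots(cum_species, factor=1.0):
    """Smallest plot count n at which the next plot adds too few new species.

    cum_species[k] is the cumulative number of species after k + 1 plots.
    """
    total_plots = len(cum_species)
    total_species = cum_species[-1]
    # Species gain that would match one plot's share of the total.
    threshold = factor * total_species / total_plots
    for n in range(1, total_plots):
        gain = cum_species[n] - cum_species[n - 1]  # going from n to n + 1 plots
        if gain < threshold:
            return n
    return total_plots


# Hypothetical cumulative species counts for 10 plots.
cum_species = [10, 16, 20, 23, 25, 26, 27, 28, 28, 29]
n_same = minimal_plots(cum_species)              # "same percentage" criterion
n_half = minimal_plots(cum_species, factor=0.5)  # stricter "half" criterion
```

The stricter `factor=0.5` rule always asks for at least as many plots as the default, which matches the ordering of the two thresholds reported in the abstract.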

Plot Size for Investigating Forest Community Structure(II) -Adequate Plot Area of Tree Stratum in a Mixed Forest Community at Tŏkyusan Area- (삼림군집구조 조사를 위한 조사구 크기에 관한 연구(II) -덕유산지역 혼효림군집 교목층의 적정 조사구 면적-)

  • Park, In-Hyeop;Ryu, Chang-Hee;Cho, Woo
    • Korean Journal of Environment and Ecology / v.7 no.2 / pp.187-191 / 1994
  • A mixed forest community in Tokyusan was studied to determine the adequate tree-stratum plot area for investigating forest community structure. Nineteen nested plots were set up in the studied forest community, and a species-area curve and performance curves were established. According to the species-area curve, the minimum plot area at which a given percentage increase in plot area produced less than the same percentage increase in number of species was 500 m$^2$. The minimum plot area at which a given percentage increase in plot area produced less than half of that percentage increase in number of species was 1,000 m$^2$. According to the performance curve of the importance values of the major species, the minimum plot area at which the importance values of the major species were distinguished from each other was 900 m$^2$, or 500 m$^2$ when one unexpectedly occurring large Pinus densiflora tree was excluded. According to the performance curve of species diversity, the minimum plot area was 400 m$^2$. Similarity indices between plot areas above 900 m$^2$ and the total plot area were more than 90%, and similarity indices between plot areas above 400 m$^2$ and the total plot area were more than 85%. In conclusion, the minimum plot area was generally about 500 m$^2$, and about 1,000 m$^2$ when greater accuracy is required.

A Study of Post-processing Methods of Clustering Algorithm and Classification of the Segmented Regions (클러스터링 알고리즘의 후처리 방안과 분할된 영역들의 분류에 대한 연구)

  • Oh, Jun-Taek;Kim, Bo-Ram;Kim, Wook-Hyun
    • The KIPS Transactions:PartB / v.16B no.1 / pp.7-16 / 2009
  • Some clustering algorithms over-segment an image because the spatial information between the segmented regions is not considered and the number of clusters is fixed in advance, which makes them difficult to apply in practice. This paper proposes new post-processing methods, a reclassification of inhomogeneous clusters and region merging using a Bayesian algorithm, that improve the segmentation results of clustering algorithms. In the reclassification step, an inhomogeneous cluster is first selected based on variance and between-class distance and is then reclassified into the other clusters. This reclassification is repeated until the optimal number of clusters, determined by the minimum average within-class distance, is reached. Similar regions are then merged using a Bayesian algorithm based on the Kullback-Leibler distance between adjacent regions. This effectively resolves the over-segmentation problem and makes the result usable in applications. Finally, we design a classification system for the segmented regions to validate the proposed method. The segmented regions are classified by an SVM (Support Vector Machine) using their principal colors and texture information. In experiments, the proposed method proved valid on various real images and was applied effectively to the designed classification system.