• Title/Summary/Keyword: K-means cluster analysis (K-평균 군집분석)

XML Document Clustering Technique by K-means algorithm through PCA (주성분 분석의 K 평균 알고리즘을 통한 XML 문서 군집화 기법)

  • Kim, Woo-Saeng
    • The KIPS Transactions:PartD / v.18D no.5 / pp.339-342 / 2011
  • Recently, research has focused on developing efficient techniques for accessing, querying, and storing XML documents, which are frequently used on the Internet. In this paper, we propose a new method to cluster XML documents efficiently. Each XML document is represented as a vector in a feature vector space built from the names and levels of the elements of its corresponding tree, and the vectors are then clustered using a K-means algorithm with Principal Component Analysis (PCA). The experiment shows that the proposed method produces good results.
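The vectorization step can be illustrated with a minimal sketch. It assumes each feature is an (element name, depth) pair counted over the document tree; the abstract does not give the paper's exact encoding, so `name_level_counts`, `vectorize`, and the two sample documents are hypothetical. PCA and K-means would then run on the resulting vectors.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def name_level_counts(xml_text):
    """Count (element name, depth) pairs in one XML document; the root has depth 0."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    stack = [(root, 0)]
    while stack:
        node, depth = stack.pop()
        counts[(node.tag, depth)] += 1
        # Iterating an Element yields its direct children.
        stack.extend((child, depth + 1) for child in node)
    return counts

def vectorize(docs):
    """Map every document onto one shared, sorted feature index."""
    all_counts = [name_level_counts(d) for d in docs]
    features = sorted(set().union(*all_counts))
    vectors = [[c[f] for f in features] for c in all_counts]
    return features, vectors

# Hypothetical sample documents, for illustration only.
docs = [
    "<book><title/><author/><author/></book>",
    "<book><title/><chapter><title/></chapter></book>",
]
features, vectors = vectorize(docs)
```

Note that a `title` element at depth 1 and one at depth 2 become separate features here, which is one way to let the level of an element influence the clustering.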

A Study on the Relationship between Skill and Competition Score Factors of KLPGA Players Using Canonical Correlation Biplot and Cluster Analysis (정준상관 행렬도와 군집분석을 응용한 KLPGA 선수의 기술과 경기성적요인에 대한 연관성 분석)

  • Choi, Tae-Hoon;Choi, Yong-Seok
    • The Korean Journal of Applied Statistics / v.21 no.3 / pp.429-439 / 2008
  • A canonical correlation biplot is a two-dimensional plot for graphically investigating the relationship between two sets of variables, and between observations and variables, in canonical correlation analysis. In general, a biplot is useful for giving a graphical description of the data. However, neither the general biplot nor the canonical correlation biplot gives a concise interpretation of the relationship between variables and observations when the number of observations is large. To overcome this problem, Choi and Kim (2008) suggested a method for interpreting the biplot by applying K-means cluster analysis. In this study, we apply their method to investigate the relationship between the skill and competition score factors of KLPGA players using a canonical correlation biplot and cluster analysis.

News Clustering and Multi-Document Summarization for Real-time Issue Analysis (실시간 이슈 분석을 위한 뉴스 군집화 및 다중 문서 요약)

  • Yu, Hongyeon;Lee, Seungwoo;Ko, Youngjoong
    • Annual Conference on Human and Language Technology / 2018.10a / pp.132-137 / 2018
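  • Real-time issue analysis based on news requires technology that takes as input a set of news articles generated in real time, clusters them incrementally, and automatically summarizes the information in each cluster. Clustering and summarization of static data have each been studied actively, but research on incremental clustering and summarization for large volumes of data arriving in real time is very scarce. This paper therefore proposes an incremental, hierarchical news clustering and multi-document summarization method for analyzing large sets of news articles arriving in real time. For evaluation, we used real data from the two months of October and November 2016, and professionally trained researchers carried out a qualitative evaluation based on Precision at k. As a result, over 12 automatically generated clusters, clustering performance averaged 66% (upper level $l_1$: 82%, lower level $l_2$: 43%) and summarization performance averaged 92%.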
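For reference, Precision at k in its standard form is the fraction of the top k ranked items that a judge marks as relevant. The sketch below shows only that standard definition; how the authors applied it to clusters and summaries is not stated in the abstract, and the function name and sample judgments are hypothetical.

```python
def precision_at_k(judgments, k):
    """Fraction of the top-k ranked items judged relevant.

    judgments: booleans in ranked order, True meaning "judged relevant".
    """
    if k <= 0:
        raise ValueError("k must be positive")
    # Standard P@k divides by k even if fewer than k items are available.
    return sum(judgments[:k]) / k

# Hypothetical judgments for one ranked list.
p_at_3 = precision_at_k([True, False, True, True], 3)
```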

Color Analysis of Clothing in Product Images for User's Color Preference-Based Recommendation System (사용자의 색상 선호 기반 추천 시스템을 위한 상품 이미지 속 의류 색상 분석)

  • Roh, Eunjin;Park, Sangwon
    • Proceedings of the Korea Information Processing Society Conference / 2022.11a / pp.643-645 / 2022
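  • Many online shopping malls offer color-based filtering services or recommendation systems, but manual classification takes a long time and is prone to error. In the experiments of this study, a k-means clustering algorithm, run with and without silhouette analysis, first derives the center of the most dominant color cluster in the clothing image to be analyzed; if there are two or more clusters, only the center of the largest cluster is considered. This center is then assigned to a color class by a pre-trained k-nearest neighbors algorithm. The results show that classification using k-means clustering without silhouette analysis performed very well in both accuracy and running time, but it misclassifies images in which a background color can affect the analysis of the clothing color.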
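The dominant-color step can be sketched as follows. This is a minimal pure-Python k-means under stated assumptions, not the authors' implementation: pixels are RGB tuples, the initial centers are simply the first k distinct pixels, and all function names and sample pixels are hypothetical.

```python
def dist2(p, q):
    """Squared Euclidean distance between two colors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def assign(points, centers):
    """Put each point into the cluster of its nearest center."""
    clusters = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda i: dist2(p, centers[i]))
        clusters[nearest].append(p)
    return clusters

def mean(points):
    n = len(points)
    return tuple(sum(axis) / n for axis in zip(*points))

def kmeans(points, k, max_iter=50):
    # Assumption: deterministic start from the first k distinct points.
    centers = list(dict.fromkeys(points))[:k]
    for _ in range(max_iter):
        clusters = assign(points, centers)
        updated = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
        if updated == centers:
            break
        centers = updated
    return centers, assign(points, centers)

def dominant_color(pixels, k=2):
    """Center of the largest cluster, as the abstract describes."""
    centers, clusters = kmeans(pixels, k)
    largest = max(range(len(centers)), key=lambda i: len(clusters[i]))
    return centers[largest]

# Hypothetical image: four reddish garment pixels, two near-white background pixels.
pixels = [(200, 10, 10), (210, 20, 15), (190, 5, 5), (205, 15, 10),
          (250, 250, 250), (245, 245, 245)]
dominant = dominant_color(pixels)
```

If the background pixels outnumbered the garment pixels, the same code would return the background color, which matches the failure case the abstract reports.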

Gene Screening and Clustering of Yeast Microarray Gene Expression Data (효모 마이크로어레이 유전자 발현 데이터에 대한 유전자 선별 및 군집분석)

  • Lee, Kyung-A;Kim, Tae-Houn;Kim, Jae-Hee
    • The Korean Journal of Applied Statistics / v.24 no.6 / pp.1077-1094 / 2011
  • We perform cluster analyses of yeast cell cycle microarray expression data. To reflect the characteristics of time-course data, we screen the genes using test statistics based on Fourier coefficients, applying an FDR procedure. We compare the results of model-based clustering, K-means, PAM, SOM, the hierarchical Ward method, and the fuzzy method on the yeast data. As validity measures for the clustering results, connectivity, the Dunn index, and silhouette values are computed and compared. A biological interpretation using GO analysis is also included.
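Two of the validity measures named above can be sketched in a few lines. The code assumes Euclidean distance and the common definitions (Dunn index as smallest between-cluster distance over largest within-cluster diameter, silhouette as (b - a) / max(a, b)); the abstract does not say which variants the authors used, and the sample clusters are hypothetical.

```python
from itertools import combinations
from math import dist

def dunn_index(clusters):
    """Smallest between-cluster distance divided by largest cluster diameter.

    Needs at least two clusters and at least one cluster with two points.
    """
    min_between = min(
        dist(p, q)
        for a, b in combinations(clusters, 2)
        for p in a
        for q in b
    )
    max_diameter = max(
        dist(p, q)
        for c in clusters
        for p, q in combinations(c, 2)
    )
    return min_between / max_diameter

def mean_silhouette(clusters):
    """Average silhouette value over all points."""
    scores = []
    for i, own in enumerate(clusters):
        for p in own:
            if len(own) == 1:
                scores.append(0.0)  # usual convention for singleton clusters
                continue
            # a: mean distance to the other members of the point's own cluster
            a = sum(dist(p, q) for q in own) / (len(own) - 1)
            # b: mean distance to the nearest other cluster
            b = min(
                sum(dist(p, q) for q in other) / len(other)
                for j, other in enumerate(clusters)
                if j != i
            )
            scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Hypothetical, well-separated clusters.
clusters = [[(0, 0), (0, 1)], [(4, 0), (4, 1)]]
dunn = dunn_index(clusters)
silhouette = mean_silhouette(clusters)
```

Higher values of both measures indicate more compact, better separated clusters, which is why they are used to compare methods on the same data.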

A comparison analysis of factors to affect pedestrian volumes by land-use type using Seoul Pedestrian Survey data (토지이용유형별 보행량 영향 요인 비교·분석 - 서울시 유동인구 조사자료를 바탕으로)

  • Jang, Jin-Young;Choi, Sung-Taek;Lee, Hyang-Sook;Kim, Su-Jae;Choo, Sang-Ho
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.14 no.2 / pp.39-53 / 2015
  • This paper analyzes factors affecting pedestrian volumes by land-use type using the 2012 Seoul Pedestrian Survey. First, survey points were classified into five groups based on the land-use types around them, such as residential, commercial, industrial, and green uses, using k-means cluster analysis. Then, differences in average pedestrian volumes among the groups were compared by day and by time of day. In addition, multiple regression analysis was employed to identify factors affecting pedestrian volumes, with physical features, land-use types, public transportation accessibility, and socio-economic indices as independent variables by spatial hierarchy. Model results show that walkway width positively influenced pedestrian volumes in all groups, whereas the effects of the other variables differed by group. Our results can be used as basic data for establishing policies on pedestrian road design and improvement, as well as for estimating pedestrian demand by land-use type.

Analysis of Relative Settlement Behavior of Retaining Wall Backside Ground Using Clustering (군집분류를 이용한 흙막이 벽체 배면 지반의 상대적 침하거동 분석)

  • Young-Jun Kwack;Heui-Soo Han
    • The Journal of Engineering Geology / v.33 no.1 / pp.189-200 / 2023
  • As urbanization and industrialization increase development in downtown areas, damage due to ground settlement continues to occur. Building collapse in urban areas carries a high risk of large-scale loss of life and property. However, methods for analyzing measurement data when uneven loads are applied to the excavated ground and there is no prior knowledge of the ground have rarely been studied. Accordingly, this study analyzes relative settlement behavior and correlation by processing time-series surface settlement data from urban construction sites. The average index of difference in settlement and the average of relative difference in settlement are defined and calculated, then plotted in a coordinate system to analyze the relative settlement behavior over time. In addition, since there was no prior knowledge of the ground, a standard for classifying clusters was needed, and the observation points were classified using k-means clustering and the Dunn index. The analysis confirmed that all the clusters moved to the stable region as the settlement converged. The clusters were segmented. Based on the analysis results, it was possible to distinguish independent displacement areas from same-behavior areas by analyzing the correlation between measurement points. If the relative settlement behavior between stations can be analyzed and the behavior areas classified, this can help in settlement and stability management, such as uplift of the surrounding area, prediction of ground failure areas, and prevention of activity failure.

Impacts of Automated Vehicle Platoons on Car-following Behavior of Manually-Driven Vehicles (군집주행 환경이 비자율차량의 차량 추종에 미치는 영향분석)

  • Suh, Sanghyuk;Lee, Seolyoung;Oh, Cheol;Choi, Saerona
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.16 no.4 / pp.107-121 / 2017
  • This study conducted a 3-stage survey and simulation experiment to identify the impact of vehicle platoons on the car-following behavior of manually-driven vehicles (MVs). Vehicle maneuvering data obtained from driving simulations were statistically analyzed using three measures: average speed, acceleration noise, and offset, which represents the deviation of lateral movements. Results indicate that MV drivers tended to experience a psychological burden while driving in automated vehicle platooning environments, which resulted in different vehicle maneuvers. The outcome of this study is expected to provide a useful basis for developing traffic operations strategies for managing mixed traffic streams consisting of MVs and autonomous vehicles.

A Comparison of cluster analysis based on profile of LPGA player profile in 2009 (2009년 여자프로골프선수 프로파일을 이용한 군집방법비교)

  • Min, Dae-Kee
    • Journal of the Korean Data and Information Science Society / v.21 no.3 / pp.471-480 / 2010
  • Cluster analysis is a useful method for finding the number of groups and the group membership of each observation. With the rapid development of computer applications in statistics, a variety of new clustering methods have been studied, such as the EM algorithm and self-organizing maps. The goal of cluster analysis is to find a number of groupings that is meaningful to the analyst. If the data are analyzed perfectly with cluster analysis, the same results can be obtained from discriminant analysis.

Selecting Technique of Accident Sections using K-mean Method (K-평균법을 이용한 고속도로 사고분석구간 분할기법 개발)

  • Lee, Ki-Young;Chang, Myung-Soon
    • International Journal of Highway Engineering / v.7 no.4 s.26 / pp.211-219 / 2005
  • Selecting analysis sections for traffic accidents serves to identify the causes of accidents clearly by grouping similar accidents, and to raise the effectiveness of improvement projects by prioritizing accidents. The existing method divides the road uniformly by mileage and takes no account of similarities among accidents. Consequently, a slider-length method that considers accident types rather than road mileage has recently come into wide use. This study proposes, as an application of the slider-length method, a method that uses the K-mean method, a non-hierarchical grouping technique from cluster analysis, to classify accidents that occurred at the nearest mileages into one group. To verify the proposed method, the K-mean method was compared with division at regular intervals on data covering a total length of 25.6 km of the Kyung-bu freeway in the Pusan direction; the K-mean method proved to be an effective method that considers the similarities and adjacencies of accidents.
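Grouping accidents by nearby mileage can be illustrated with a one-dimensional K-means sketch. It assumes accident locations are plain mileage values in km and that the initial centers are taken at evenly spaced positions in the sorted data; the function name, the initialization, and the sample mileages are hypothetical and not taken from the paper.

```python
def split_by_kmeans(mileages, k, max_iter=100):
    """Group accident mileages into up to k sections with 1-D K-means."""
    xs = sorted(mileages)
    n = len(xs)
    # Assumption: start from evenly spaced positions in the sorted data.
    centers = [xs[((2 * i + 1) * n) // (2 * k)] for i in range(k)]
    for _ in range(max_iter):
        groups = [[] for _ in range(k)]
        for x in xs:
            nearest = min(range(k), key=lambda i: abs(x - centers[i]))
            groups[nearest].append(x)
        updated = [sum(g) / len(g) if g else centers[i]
                   for i, g in enumerate(groups)]
        if updated == centers:
            break
        centers = updated
    return [g for g in groups if g]

# Hypothetical accident locations (km) along one direction of a freeway.
mileages = [10.4, 1.2, 20.6, 1.5, 10.9, 1.9, 20.1, 11.3]
sections = split_by_kmeans(mileages, k=3)
```

Unlike division at regular intervals, the section boundaries here follow where the accidents actually concentrate, so a cluster of nearby accidents is not split across two sections.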
