• Title/Summary/Keyword: k-means clustering method (k-평균군집방법)

A Development of Customer Segmentation by Using Data Mining Technique (데이터마이닝에 의한 고객세분화 개발)

  • Jin Seo-Hoon
    • The Korean Journal of Applied Statistics / v.18 no.3 / pp.555-565 / 2005
  • Knowing its customers is essential for a company to survive cut-throat competition. Companies need to manage the relationship with each and every customer and make each customer as profitable as possible. CRM (customer relationship management) has emerged as a key solution for managing profitable relationships. Customer segmentation is an essential component of successful CRM, and clustering, as a data mining technique, is very useful for building data-driven segmentation. This paper builds a customer segmentation for a credit card company as a case study. The segmentation was based only on transaction data that came from customers' activities. A two-step clustering approach, consisting of k-means clustering and agglomerative clustering, was applied to build the segmentation.
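
The two-step idea in this abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes that k-means first produces many fine clusters and that agglomerative clustering then merges their centroids, and it assumes complete linkage and farthest-first seeding (chosen here only to keep the example deterministic). The function names `kmeans`, `complete_linkage`, and `two_step` are hypothetical.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(p, centroids):
    """Index of the centroid closest to point p."""
    return min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))


def kmeans(points, k, iters=100):
    """Lloyd's k-means with farthest-first seeding (an assumption of this sketch)."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Seed with the point farthest from the centroids chosen so far.
        centroids.append(max(points, key=lambda p: min(dist(p, c) for c in centroids)))
    for _ in range(iters):
        labels = [nearest(p, centroids) for p in points]
        updated = []
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            # Mean of the members; an empty cluster keeps its old centroid.
            updated.append(tuple(sum(c) / len(members) for c in zip(*members))
                           if members else centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return centroids, [nearest(p, centroids) for p in points]


def complete_linkage(items, n_groups):
    """Merge items bottom-up until n_groups remain; returns lists of item indices."""
    groups = [[i] for i in range(len(items))]
    while len(groups) > n_groups:
        best = None
        for a in range(len(groups)):
            for b in range(a + 1, len(groups)):
                # Complete linkage: distance between the farthest pair of members.
                d = max(dist(items[i], items[j]) for i in groups[a] for j in groups[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        groups[a] = groups[a] + groups[b]
        del groups[b]
    return groups


def two_step(points, k_fine, n_segments):
    """Step 1: k-means into k_fine clusters. Step 2: merge their centroids."""
    centroids, labels = kmeans(points, k_fine)
    groups = complete_linkage(centroids, n_segments)
    fine_to_segment = {f: s for s, g in enumerate(groups) for f in g}
    return [fine_to_segment[l] for l in labels]
```

Calling `two_step(points, k_fine=6, n_segments=3)` on a list of coordinate tuples returns one segment label per point. Running k-means first keeps the quadratic-cost agglomerative step small, because it only has to merge `k_fine` centroids rather than every customer record.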

XML Document Clustering Technique by K-means algorithm through PCA (주성분 분석의 K 평균 알고리즘을 통한 XML 문서 군집화 기법)

  • Kim, Woo-Saeng
    • The KIPS Transactions:PartD / v.18D no.5 / pp.339-342 / 2011
  • Recently, efficient techniques for accessing, querying, and storing XML documents, which are frequently used on the Internet, have been actively studied. In this paper, we propose a new method to cluster XML documents efficiently. Each document is first represented as a vector in a feature vector space built from the names and levels of the elements of its corresponding tree; we then apply a K-means algorithm with Principal Component Analysis (PCA) to cluster the documents. Experiments show that the proposed method gives good results.
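
The abstract does not spell out how documents become vectors, so the following is only one plausible reading: each feature is an (element name, level) pair, and each vector entry counts how often that pair occurs in the document tree. The function names `element_level_counts` and `vectorize` are hypothetical, and treating the root as level 0 is an assumption. The PCA and K-means steps that would follow are not shown.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def element_level_counts(xml_text):
    """Count (element name, level) pairs in one XML document; the root is level 0."""
    counts = Counter()
    stack = [(ET.fromstring(xml_text), 0)]
    while stack:
        node, level = stack.pop()
        counts[(node.tag, level)] += 1
        # Children sit one level below their parent.
        stack.extend((child, level + 1) for child in node)
    return counts


def vectorize(documents):
    """Map documents to count vectors over a shared, sorted feature vocabulary."""
    per_doc = [element_level_counts(d) for d in documents]
    vocab = sorted(set().union(*per_doc))
    return vocab, [[c[f] for f in vocab] for c in per_doc]
```

For two toy documents, `<book><title/><author/><author/></book>` and `<article><title/><body><p/><p/></body></article>`, the shared vocabulary has six features and the two vectors overlap only on `('title', 1)`. Documents with similar structure therefore end up close together in the feature space.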

Comparison of clustering methods of microarray gene expression data (마이크로어레이 유전자 발현 자료에 대한 군집 방법 비교)

  • Lim, Jin-Soo;Lim, Dong-Hoon
    • Journal of the Korean Data and Information Science Society / v.23 no.1 / pp.39-51 / 2012
  • Cluster analysis has proven to be a useful tool for investigating the association structure among genes and samples in a microarray data set. We applied several cluster validation measures to evaluate the performance of clustering algorithms for analyzing microarray gene expression data, including hierarchical clustering, K-means, PAM, SOM and model-based clustering. The available validation measures fall into three general categories: internal, stability and biological. The performance of the clustering algorithms is evaluated using simulated and SRBCT microarray data. Our results on simulated data show that nearly every method performs well, recovering the same number of clusters as there are classes in the original data. For the SRBCT data, the best choice for the number of clusters is less clear than for the simulated data. PAM, SOM and the model-based method showed results similar to those on simulated data under the Silhouette width internal measure, as did PAM and the model-based method under the biological measure, while model-based clustering had the best value of the stability measure.
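
The Silhouette width named here as an internal measure can be sketched in a few lines. This is a generic illustration using Euclidean distance, not the paper's code. The function name `silhouette_width` is hypothetical, and it assumes at least two clusters are present.

```python
import math


def silhouette_width(points, labels):
    """Average silhouette width over all points (Euclidean distance)."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)

    total = 0.0
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            continue  # convention: a singleton cluster contributes s = 0
        # a: mean distance to the other members of the point's own cluster
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(sum(dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != l)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)
```

Values near 1 indicate compact, well-separated clusters, while values near or below 0 indicate points that sit closer to another cluster than to their own. Comparing the average width across candidate numbers of clusters is one common way to choose among them.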

Word Cluster-based Mobile Application Categorization (단어 군집 기반 모바일 애플리케이션 범주화)

  • Heo, Jeongman;Park, So-Young
    • Journal of the Korea Society of Computer and Information / v.19 no.3 / pp.17-24 / 2014
  • In this paper, we propose a mobile application categorization method using word cluster information. Because mobile application descriptions can be short, the proposed method uses word cluster seeds, as well as the words in the description, as categorization features. For the fragmented categories of mobile applications, the proposed method generates word clusters by applying the frequency of word occurrence per category to the K-means clustering algorithm. Since a mobile application description can include paragraphs unrelated to categorization, such as installation specifications, the proposed method uses only those word clusters that are useful for categorization. Experiments show that the proposed method improves recall (5.65%) by using the word cluster information.

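Application of Data Mining in the Credit Card Industry: Customer Segmentation Based on Customer Behavior (신용카드업에서 데이터마이닝의 활용 -고객행동기반의 고객세분화-)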

  • Jin, Seo-Hoon (진서훈);Ahn, Sang-Wook (안상욱)
    • Proceedings of the Korean Statistical Society Conference / 2004.11a / pp.171-174 / 2004
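  • As companies need a deeper understanding of their customers under intensifying competition, and as advances in information technology make it possible to record each activity as data, the strategic use of customer information, typified by CRM, has become very important. To this end, companies carry out customer segmentation, an essential tool for customer management and marketing grounded in an understanding of customers. This study presents the process of segmenting credit card customers into groups with similar usage behavior, based on their card usage patterns. The segmentation relies only on transaction information generated as customers actually used their cards, which is quite meaningful from a marketing perspective. For the segmentation, we used two data mining techniques: the k-means clustering method and hierarchical clustering with complete linkage.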

Clustering and classification to characterize daily electricity demand (시간단위 전력사용량 시계열 패턴의 군집 및 분류분석)

  • Park, Dain;Yoon, Sanghoo
    • Journal of the Korean Data and Information Science Society / v.28 no.2 / pp.395-406 / 2017
  • The purpose of this study is to identify patterns of daily electricity demand through clustering and classification. Hourly data were collected by KPX (Korea Power Exchange) between 2008 and 2012. Because electricity demand data are time series data, the time trend was removed before extracting the daily demand patterns. We considered k-means clustering, Gaussian mixture model clustering, and functional clustering in order to find the optimal clustering method. Classification analysis was conducted to understand the relationship with external factors such as day of the week, holidays, and weather. The data were divided into training and test data. The training data consisted of external factors and cluster labels between 2008 and 2011; the test data were the daily external factors in 2012. Decision tree, random forest, support vector machine, and naive Bayes classifiers were used. As a result, Gaussian model-based clustering and random forest showed the best prediction performance when the number of clusters was 8.

Study on Scaling Exponent for Classification of Regions using Scaling Property (스케일 성질을 이용한 군집 지역에서의 스케일 인자에 대한 연구)

  • Jung, Younghun;Kim, Sunghun;Ahn, Hyunjun;Heo, Jun-Haeng
    • Proceedings of the Korea Water Resources Association Conference / 2015.05a / pp.504-504 / 2015
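  • To design hydraulic structures, design hydrologic quantities can be estimated through frequency analysis. To apply regional frequency analysis, which complements at-site frequency analysis, delineating regions through cluster analysis is of primary importance. In addition, the scaling property is a method that can present the rainfall IDF curve as a function of duration for a given return period, using rainfall data observed for each duration to capture the spatio-temporal characteristics of rainfall. Therefore, this study applies the scaling property to rainfall data in the clustered regions, estimates the scaling exponent, and then explains hydrological homogeneity in terms of statistical characteristics. Prior to this study, four clustering methods (average linkage, Ward's method, the Two-Step method, and the K-means method) were applied; dividing the 104 rainfall stations in the Han River basin into four regions was judged appropriate, and the regions were delineated with the non-hierarchical k-means method. Based on the clustering results, this study estimates the scaling exponent for the rainfall stations in each of the four regions and presents hydrological homogeneity using statistical methods.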

Study on Fast HEVC Encoding with Hierarchical Motion Vector Clustering (움직임 벡터의 계층적 군집화를 통한 HEVC 고속 부호화 연구)

  • Lim, Jeongyun;Ahn, Yong-Jo;Sim, Donggyu
    • Journal of Broadcast Engineering / v.21 no.4 / pp.578-591 / 2016
  • In this paper, a fast encoding algorithm for the High Efficiency Video Coding (HEVC) encoder is studied. For encoding efficiency, the current HEVC reference software divides the input image into Coding Tree Units (CTUs), and each CTU is then recursively divided into CUs, up to the maximum depth, in a quad-tree structure for RDO (Rate-Distortion Optimization) during encoding. This is one of the reasons why the complexity of the encoding process is high. To reduce this complexity, this paper proposes a method that determines the maximum CU depth using hierarchical clustering at the pre-processing stage. The hierarchical clustering results represent an average combination of the motion vectors (MVs) of neighboring blocks. Experimental results showed that the proposed method achieves an average time saving of 16% with minimal BD-rate loss at 1080p video resolution. When combined with the previous fast algorithm, the proposed method achieves an average time saving of 45.13% with a 1.84% BD-rate loss.

News Clustering and Multi-Document Summarization for Real-time Issue Analysis (실시간 이슈 분석을 위한 뉴스 군집화 및 다중 문서 요약)

  • Yu, Hongyeon;Lee, Seungwoo;Ko, Youngjoong
    • Annual Conference on Human and Language Technology / 2018.10a / pp.132-137 / 2018
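  • Real-time issue analysis based on news requires technology that takes as input the sets of news articles generated in real time, clusters them incrementally, and automatically summarizes the information in each cluster. While clustering and summarization on static data have each been actively studied, research on incremental clustering and summarization for large volumes of data arriving in real time is very scarce. This paper therefore proposes an incremental, hierarchical news clustering and multi-document summarization method for analyzing large sets of news articles arriving in real time. For evaluation, we used real data from the two months of October and November 2016, and professionally trained researchers performed a qualitative evaluation based on Precision at k. As a result, over 12 automatically generated clusters, clustering performance averaged 66% (upper level $l_1$: 82%, lower level $l_2$: 43%) and summarization performance averaged 92%.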
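
The Precision at k measure used in this evaluation can be written down directly. The sketch below is a generic definition, not the authors' evaluation script. The function name `precision_at_k` and the 0/1 judgment list are hypothetical illustrations.

```python
def precision_at_k(relevance, k):
    """Fraction of the top-k ranked items judged relevant.

    relevance: 0/1 judgments in ranked order (1 = judged relevant).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    # Divide by k even when fewer than k items exist, as is conventional.
    return sum(relevance[:k]) / k
```

For hypothetical judgments `[1, 1, 0, 1, 0]`, three of the top five items are relevant, so precision at 5 is 3/5.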

A Comparison of cluster analysis based on profile of LPGA player profile in 2009 (2009년 여자프로골프선수 프로파일을 이용한 군집방법비교)

  • Min, Dae-Kee
    • Journal of the Korean Data and Information Science Society / v.21 no.3 / pp.471-480 / 2010
  • Cluster analysis is a useful method for finding the number of groups and the group to which each member belongs. With the rapid development of computer applications in statistics, a variety of new clustering methods have been studied, such as the EM algorithm and self-organizing maps. The goal of cluster analysis is to find a number of groupings that is meaningful to the analyst. If the data are analyzed perfectly with cluster analysis, the same results can be obtained from discriminant analysis.