• Title/Summary/Keyword: Clustering analysis


Clustering Algorithm using a Center Of Gravity for Grid-based Sample

  • Park, Hee-Chang;Ryu, Jee-Hyun
    • Proceedings of the Korean Data and Information Science Society Conference (한국데이터정보과학회:학술대회논문집)
    • /
    • 2003.05a
    • /
    • pp.77-88
    • /
    • 2003
  • Cluster analysis has been widely used in many applications, such as data analysis, pattern recognition, and image processing. However, clustering can take many hours to produce the desired clusters, because it is a primitive, exploratory technique and is often applied to large amounts of data. In this paper we propose a new clustering method, 'Clustering algorithm using a center of gravity for grid-based sample'. It is faster than traditional clustering methods while maintaining accuracy. It reduces running time by using a grid-based sample and keeps accuracy by using a representative point, the center of gravity.
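The abstract does not give the algorithm's details, so the following is only a minimal sketch of the general idea it names: bucket the data into grid cells and represent each non-empty cell by the center of gravity (centroid) of its points. The function name `grid_centers_of_gravity` and the `cell_size` parameter are hypothetical, not taken from the paper.

```python
from collections import defaultdict
from math import floor


def grid_centers_of_gravity(points, cell_size):
    """Map each non-empty grid cell to (centroid of its points, point count).

    Illustrative assumption: a uniform grid with square cells of side
    `cell_size`. The paper's actual grid construction is not described.
    """
    cells = defaultdict(list)
    for p in points:
        # Integer cell index along every dimension.
        key = tuple(floor(x / cell_size) for x in p)
        cells[key].append(p)

    reps = {}
    for key, members in cells.items():
        n = len(members)
        # Center of gravity: coordinate-wise mean of the cell's points.
        centroid = tuple(sum(coord) / n for coord in zip(*members))
        reps[key] = (centroid, n)
    return reps


points = [(0.1, 0.2), (0.3, 0.4), (0.2, 0.9), (5.1, 5.2), (5.3, 5.0)]
reps = grid_centers_of_gravity(points, cell_size=1.0)
for cell, (centroid, count) in sorted(reps.items()):
    print(cell, centroid, count)
```

A downstream clustering method could then run on the few representatives (optionally weighted by the counts) instead of on every point, which is where the running-time saving would come from.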


Comparison of Software Clustering using Split Based Tree Analysis (분기점 기반 트리 분석을 통한 소프트웨어 클러스터링 결과 비교)

  • Um, Jaechul;Lee, Chan-gun
    • Journal of Software Engineering Society
    • /
    • v.25 no.3
    • /
    • pp.59-62
    • /
    • 2012
  • We propose a novel metric for quantitatively comparing the clustering results generated by different software clustering algorithms. A quantitative evaluation of software clustering helps in understanding architectural changes in software. The concept of a split, which has been used to analyze genetic characters in bioinformatics, is applied to the analysis of software architecture.


Clustering-driven Pair Trading Portfolio Investment in Korean Stock Market (한국 주식시장에서의 군집화 기반 페어트레이딩 포트폴리오 투자 연구)

  • Cho, Poongjin;Lee, Minhyuk;Song, Jae Wook
    • Journal of Korean Society of Industrial and Systems Engineering
    • /
    • v.45 no.3
    • /
    • pp.123-130
    • /
    • 2022
  • Pair trading is a statistical arbitrage investment strategy. Traditionally, cointegration has been used in the pair-exploring step to discover pairs with similar price movements. Recently, clustering analysis has attracted many researchers' attention as a replacement for the cointegration method. This study tests a clustering-driven pair trading investment strategy in the Korean stock market. If a pair detected through clustering has a large spread during the spread-exploring period, the pair is included in the portfolio for backtesting. The profitability of the clustering-driven pair trading strategies is investigated using various profitability measures, such as the distribution of returns, cumulative returns, profitability by period, and sensitivity analysis on different parameters. The backtesting results show that the pair trading investment strategy is valid in the Korean stock market. More interestingly, the clustering-driven portfolio investments outperform the benchmarks. Notably, hierarchical clustering shows the best portfolio performance.

Customer Load Pattern Analysis using Clustering Techniques (클러스터링 기법을 이용한 수용가별 전력 데이터 패턴 분석)

  • Ryu, Seunghyoung;Kim, Hongseok;Oh, Doeun;No, Jaekoo
    • KEPCO Journal on Electric Power and Energy
    • /
    • v.2 no.1
    • /
    • pp.61-69
    • /
    • 2016
  • Understanding load patterns and customer classification is a basic step in analyzing the behavior of electricity consumers. To that end, there have been many studies on clustering customers' daily load data. Nowadays, the deployment of advanced metering infrastructure (AMI) and big-data technologies makes it easier to study customers' load data. In this paper, we study load clustering from the viewpoint of yearly and daily load patterns. We compare four clustering methods: K-means clustering, hierarchical clustering (average linkage and Ward's method), and DBSCAN (Density-Based Spatial Clustering of Applications with Noise). We also discuss the relationship between the clustering results and the Korean Standard Industrial Classification (KSIC), which is one possible label for customers' load data. We find that hierarchical clustering with Ward's method is suitable for clustering load data, and that KSIC can be characterized well by the daily load pattern but not by the yearly load pattern.

Arabic Text Clustering Methods and Suggested Solutions for Theme-Based Quran Clustering: Analysis of Literature

  • Bsoul, Qusay;Abdul Salam, Rosalina;Atwan, Jaffar;Jawarneh, Malik
    • Journal of Information Science Theory and Practice
    • /
    • v.9 no.4
    • /
    • pp.15-34
    • /
    • 2021
  • Text clustering is one of the most commonly used methods for detecting themes or types of documents. Text clustering is used in many fields, but it is still not effective enough for understanding Arabic text, especially with respect to term extraction, unsupervised feature selection, and clustering algorithms. In most cases, term extraction focuses on nouns. Clustering simplifies the understanding of an Arabic text such as the Quran, which is important not only for Muslims but for all people who want to know more about Islam. This paper discusses the complexity and limitations of theme-based Arabic text clustering in the Quran. Unsupervised feature selection does not consider the relationships between the selected features. One weakness of clustering algorithms is that the selection of the optimal initial centroid still depends on chance and manual settings. Consequently, this paper reviews the literature on the three major stages of Arabic clustering: term extraction, unsupervised feature selection, and clustering. Six experiments were conducted to demonstrate previously undiscussed problems related to the metrics used for feature selection and clustering. Suggestions to improve theme-based clustering of the Quran are presented and discussed.

Usability Analysis of Structured Abstracts in Journal Articles for Document Clustering (문서 클러스터링을 위한 학술지 논문의 구조적 초록 활용성 연구)

  • Choi, Sang-Hee;Lee, Jae-Yun
    • Journal of the Korean Society for Information Management
    • /
    • v.29 no.1
    • /
    • pp.331-349
    • /
    • 2012
  • Structured abstracts have been regarded as an essential source of information for representing the topics of journal articles. This study aims to provide an unconventional view of how to use structured abstracts through an in-depth analysis of their subfields. In this study, a structured abstract was segmented into four fields: purpose, design, findings, and values/implications. The fields were compared in a performance analysis of document clustering. As a result, the purpose statement of an abstract affected the performance of journal article clustering more than any other field. Furthermore, certain types of keywords were identified that should be excluded from document clustering to improve clustering performance, especially with the within-group average clustering method. These keywords were more strongly related to a specific abstract field, such as research design, than to the topic of an article.

Gene Expression Data Analysis Using Seed Clustering (시드 클러스터링 방법에 의한 유전자 발현 데이터 분석)

  • Shin Myoung
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.42 no.1
    • /
    • pp.1-7
    • /
    • 2005
  • Cluster analysis of microarray data has often been used to find biologically relevant groups of genes based on their expression levels. Since many functionally related genes tend to be co-expressed, identifying groups of genes with similar expression profiles allows the functions of unknown genes to be inferred from those of known genes in the same group. In this paper we present a novel clustering approach, called seed clustering, and investigate its applicability to microarray data analysis. In the seed clustering method, seed genes are first extracted by computational analysis of their expression profiles, and clusters are then generated by taking the seed genes as prototype vectors for the target clusters. Since it has strong mathematical foundations, the seed clustering method produces stable and consistent results in a systematic way. Our empirical results also indicate that the automatically extracted seed genes are well representative of potential clusters hidden in the data, and that the method's performance compares favorably with current approaches.

THE FUZZY CLUSTERING ALGORITHM AND SELF-ORGANIZING NEURAL NETWORKS TO IDENTIFY POTENTIALLY FAILING BANKS

  • Lee, Gi-Dong
    • Korea Digital Policy Society: Conference Proceedings (한국디지털정책학회:학술대회논문집)
    • /
    • 2005.06a
    • /
    • pp.485-493
    • /
    • 2005
  • Using 1991 FDIC financial statement data, we develop fuzzy clusters of the data set. We also identify the distinctive characteristics of the fuzzy clustering algorithm and compare the closest hard-partitioning result of the fuzzy clustering algorithm with the outcomes of two self-organizing neural networks. When nine clusters are used, our analysis shows that the fuzzy clustering method distinctly groups failed and extreme performance banks from control (healthy) banks. The experimental results also show that the fuzzy clustering method and the self-organizing neural networks are promising tools in identifying potentially failing banks.
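As a small illustration of what a "closest hard-partitioning result" can mean, the sketch below assigns each item to the cluster in which it has the highest fuzzy membership. This is a common convention and an assumption here, since the abstract does not state the rule used. The membership values are made up for illustration and are not FDIC data.

```python
def harden(memberships):
    """Return, for each row of fuzzy memberships, the index of the largest one."""
    return [max(range(len(row)), key=row.__getitem__) for row in memberships]


# Hypothetical memberships of three items in three fuzzy clusters.
memberships = [
    [0.70, 0.20, 0.10],
    [0.10, 0.30, 0.60],
    [0.25, 0.50, 0.25],
]
labels = harden(memberships)
print(labels)
```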


Sample Based Algorithm for k-Spatial Medians Clustering

  • Jin, Seo-Hoon;Jung, Byoung-Cheol
    • The Korean Journal of Applied Statistics
    • /
    • v.23 no.2
    • /
    • pp.367-374
    • /
    • 2010
  • As an alternative to k-means clustering, k-spatial medians clustering has many good points because of the advantages of the spatial median. However, it has not been widely used because it requires heavy computation. When the number of objects and the number of variables are large, the computation time becomes a serious problem. In this study we propose a fast algorithm for k-spatial medians clustering. The practical applicability of the algorithm is shown with some numerical studies.
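For readers unfamiliar with the spatial median, it is the point that minimizes the sum of Euclidean distances to the data, which makes it less sensitive to outliers than the mean. The sketch below computes it with Weiszfeld's iteration, a standard method chosen here for illustration. It is not the sample-based algorithm proposed in the paper, whose details the abstract does not give, and the function name and parameters are hypothetical.

```python
from math import dist


def spatial_median(points, iters=1000, tol=1e-10, eps=1e-12):
    """Approximate the spatial (geometric) median by Weiszfeld's iteration."""
    dim = len(points[0])
    # Start from the coordinate-wise mean.
    y = [sum(p[j] for p in points) / len(points) for j in range(dim)]
    for _ in range(iters):
        num = [0.0] * dim
        den = 0.0
        for p in points:
            # Clamp the distance to avoid division by zero when the
            # iterate lands on a data point (a known weak spot of the
            # plain Weiszfeld scheme).
            w = 1.0 / max(dist(p, y), eps)
            den += w
            for j in range(dim):
                num[j] += w * p[j]
        new_y = [c / den for c in num]
        if dist(new_y, y) < tol:
            return tuple(new_y)
        y = new_y
    return tuple(y)


# Four corners of a unit square plus one far outlier.
pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (100.0, 0.5)]
mean = tuple(sum(c) / len(pts) for c in zip(*pts))
median = spatial_median(pts)
print("mean:", mean)
print("spatial median:", median)
```

In this example the mean is dragged far toward the outlier, while the spatial median stays inside the square. Each iteration costs one pass over all points and dimensions, which hints at why the computation becomes heavy for large data sets.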

Two Phase Hierarchical Clustering Algorithm for Group Formation in Data Mining (데이터 마이닝에서 그룹 세분화를 위한 2단계 계층적 글러스터링 알고리듬)

  • Hwang, In-Soo (황인수)
    • Korean Management Science Review
    • /
    • v.19 no.1
    • /
    • pp.189-196
    • /
    • 2002
  • Data clustering is often one of the first steps in data mining analysis. It identifies groups of related objects that can be used as a starting point for exploring further relationships. This technique supports the development of population segmentation models, such as demographic-based customer segmentation. The purpose of this paper is to present the development of a two-phase hierarchical clustering algorithm for group formation. Applications of the algorithm to product-customer group formation in customer relationship management are also discussed. In computer simulations, the suggested algorithm outperforms the single-link method and k-means clustering.