• Title/Summary/Keyword: Clustering Problem

Document Clustering Using Semantic Features and Fuzzy Relations

  • Kim, Chul-Won;Park, Sun
    • Journal of information and communication convergence engineering / v.11 no.3 / pp.179-184 / 2013
  • Traditional clustering methods are usually based on the bag-of-words (BOW) model. A disadvantage of the BOW model is that it ignores the semantic relationship among terms in the data set. To resolve this problem, ontology or matrix factorization approaches are usually used. However, a major problem of the ontology approach is that it is usually difficult to find a comprehensive ontology that can cover all the concepts mentioned in a collection. This paper proposes a new document clustering method using semantic features and fuzzy relations for solving the problems of ontology and matrix factorization approaches. The proposed method can improve the quality of document clustering because the clustered documents use fuzzy relation values between semantic features and terms to distinguish clearly among dissimilar documents in clusters. The selected cluster label terms can represent the inherent structure of a document set better by using semantic features based on non-negative matrix factorization, which is used in document clustering. The experimental results demonstrate that the proposed method achieves better performance than other document clustering methods.

Clustering Characteristics and Class Hierarchy Generation in Object-Oriented Development (객체지향개발에서의 속성 클러스터링과 클래스 계층구조생성)

  • Lee Gun Ho
    • The KIPS Transactions:PartD / v.11D no.7 s.96 / pp.1443-1450 / 2004
  • Clustering the characteristics of a number of classes and defining the inheritance relations between those classes is a difficult and complex problem in the early stages of object-oriented software development. We discuss a traditional iterative approach for reusing existing classes in a library, and present an integrated approach to creating a number of new classes. This paper formulates the characteristic clustering problem as a zero-one integer program and presents a network solution method, with illustrative examples and basic rules for defining the inheritance relations between the classes. The network solution method is based on a distance parameter between every pair of objects with characteristics. We apply the approach to a real problem taken from industry.

Fuzzy Clustering with Genre Preference for Collaborative Filtering

  • Lee, Soojung
    • Journal of the Korea Society of Computer and Information / v.25 no.5 / pp.99-106 / 2020
  • The scalability problem inherent in collaborative filtering-based recommender systems has been an issue in related studies for decades. Clustering is a well-known technique for handling this problem, but it has not been actively studied because of its low performance. This paper adopts a clustering method to overcome the scalability problem, an inherent drawback of collaborative filtering systems. To handle the performance degradation caused by applying clustering to collaborative filtering, we adopt two strategies: first, we use fuzzy clustering; second, we propose and apply a similarity estimation method based on user preference for movie genres. The proposed method is evaluated through experiments and compared with several previous relevant methods in terms of major performance metrics. Experimental results show that the proposed method achieves superior prediction and rank accuracy, and recommendation accuracy comparable to the best method in our experiments.
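The abstract above does not say which fuzzy clustering variant is used. As a minimal sketch, assuming the standard fuzzy c-means membership rule with Euclidean distance, the degree to which one point (for example, a user's preference vector) belongs to each cluster could be computed as follows. The function name and the fuzzifier default `m=2.0` are illustrative assumptions, not taken from the paper.

```python
import math


def fuzzy_memberships(point, centers, m=2.0):
    """Fuzzy c-means membership of one point in each cluster (fuzzifier m > 1).

    Hypothetical helper for illustration; not the paper's implementation.
    """
    dists = [math.dist(point, c) for c in centers]
    # A point that coincides with one or more centers belongs fully to them.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    # u_i = 1 / sum_k (d_i / d_k)^(2 / (m - 1)); memberships sum to 1.
    return [1.0 / sum((d_i / d_k) ** power for d_k in dists) for d_i in dists]


centers = [(0.0, 0.0), (4.0, 0.0)]
u = fuzzy_memberships((1.0, 0.0), centers)
```

Unlike hard clustering, every point keeps a graded membership in each cluster, so a user near a cluster boundary can still contribute to neighbors in more than one cluster.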

Clustering Algorithm for Sequences of Categorical Values (범주형 값들이 순서를 가지고 있는 데이터들의 클러스터링 기법)

  • Oh Seung Joon;Kim Jae Yearn
    • Proceedings of the Society of Korea Industrial and System Engineering Conference / 2002.05a / pp.125-132 / 2002
  • We study a clustering algorithm for sequences of categorical values. Clustering is a data mining problem that has received significant attention from the database community. Traditional clustering algorithms deal with numerical or categorical data points. However, many important databases store sequences of categorical data. In this paper we introduce a new similarity measure and develop a hierarchical clustering algorithm. An experimental section shows the performance of the proposed approach.

Sample Based Algorithm for k-Spatial Medians Clustering

  • Jin, Seo-Hoon;Jung, Byoung-Cheol
    • The Korean Journal of Applied Statistics / v.23 no.2 / pp.367-374 / 2010
  • As an alternative to k-means clustering, k-spatial medians clustering has many merits owing to the advantages of the spatial median. However, it has not been widely used because it requires heavy computation. When the number of objects and the number of variables are large, the computation time becomes a serious problem. In this study we propose a fast algorithm for k-spatial medians clustering. The practical applicability of the algorithm is shown with some numerical studies.
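The abstract above does not describe the proposed sample-based algorithm, so the sketch below covers only the building block that makes k-spatial medians expensive: computing the spatial (geometric) median of one cluster. It assumes Weiszfeld's iterative reweighting, a common choice that the paper may or may not use; the function name, iteration cap, and tolerance are illustrative.

```python
import math


def spatial_median(points, iters=100, tol=1e-9):
    """Approximate the spatial median of `points` with Weiszfeld iterations.

    Hypothetical helper for illustration; not the paper's fast algorithm.
    """
    dim = len(points[0])
    # Start from the coordinate-wise mean.
    est = [sum(p[j] for p in points) / len(points) for j in range(dim)]
    for _ in range(iters):
        num = [0.0] * dim
        den = 0.0
        for p in points:
            d = math.dist(p, est)
            if d < tol:
                continue  # skip a point sitting on the estimate (avoids 1/0)
            w = 1.0 / d  # closer points get larger weight
            den += w
            for j in range(dim):
                num[j] += w * p[j]
        if den == 0.0:
            break  # every point coincides with the estimate
        new = [v / den for v in num]
        moved = math.dist(new, est)
        est = new
        if moved < tol:
            break
    return est


square = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
center = spatial_median(square)
robust = spatial_median([(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (100.0, 100.0)])
```

In the second call the mean is pulled out to (20.4, 20.4) by the single outlier, while the spatial median stays close to the unit square. That robustness is the advantage the abstract alludes to, and the inner iteration per cluster is the cost.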

Refinement of Document Clustering by Using NMF

  • Shinnou, Hiroyuki;Sasaki, Minoru
    • Proceedings of the Korean Society for Language and Information Conference / 2007.11a / pp.430-439 / 2007
  • In this paper, we use non-negative matrix factorization (NMF) to refine document clustering results. NMF is a dimensionality reduction method and is effective for document clustering, because a term-document matrix is high-dimensional and sparse. The initial matrix of the NMF algorithm can be regarded as a clustering result, so NMF can be used as a refinement method. First we perform min-max cut (Mcut), which is a powerful spectral clustering method, and then refine the result via NMF, which should yield an accurate clustering result. However, NMF often fails to improve the given clustering result. To overcome this problem, we use the Mcut objective function to stop the iteration of NMF.
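To make the refinement idea concrete, here is a minimal sketch of NMF on a term-document matrix, assuming the standard multiplicative updates for the squared-error objective. The abstract does not state which update rule the paper uses. The names `nmf`, `doc_labels`, and `h_init` are illustrative, and the Mcut-based stopping rule is not reproduced: this sketch simply runs a fixed number of iterations.

```python
import random


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def nmf(v, k, iters=300, seed=0, eps=1e-9, h_init=None):
    """Factor v (terms x docs) as w (terms x k) times h (k x docs), all non-negative.

    h_init, if given, must be strictly positive: multiplicative updates
    can never move an entry away from zero.
    """
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    if h_init is not None:
        h = [row[:] for row in h_init]
    else:
        h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # W <- W * (V H^T) / (W H H^T), elementwise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(n)]
        # H <- H * (W^T V) / (W^T W H), elementwise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(k)]
    return w, h


def doc_labels(h):
    """Assign each document (column of h) to the factor with the largest weight."""
    return [max(range(len(h)), key=lambda c: h[c][j]) for j in range(len(h[0]))]


# Toy term-document matrix: documents 0-1 share terms 0-1, documents 2-3 share terms 2-3.
v = [[2, 2, 0, 0], [1, 1, 0, 0], [0, 0, 3, 3], [0, 0, 1, 1]]
w, h = nmf(v, k=2)
labels = doc_labels(h)
```

Passing a soft cluster-indicator matrix from an earlier clustering as `h_init` (for example 1.0 for the assigned cluster and a small positive value elsewhere) corresponds to the refinement use described in the abstract, since the factorization then starts from that clustering instead of from random values.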

Heuristic algorithm to raise efficiency in clustering (군집의 효율향상을 위한 휴리스틱 알고리즘)

  • Lee, Seog-Hwan;Park, Seung-Hun
    • Journal of the Korea Safety Management & Science / v.11 no.3 / pp.157-166 / 2009
  • In this study, we developed a heuristic algorithm that clusters more efficiently than conventional algorithms. Conventional clustering algorithms have low clustering efficiency because they lack a solid method for selecting the initial cluster centers and have difficulty searching for a clustering solution. We propose the EMC (Expanded Moving Center) heuristic algorithm to resolve this problem of low efficiency: it selects the initial cluster centers and searches for a solution systematically. Clustering experiments were performed to evaluate the performance of the EMC heuristic algorithm. In terms of squared error, the EMC heuristic algorithm performed better than the other algorithms in a real case study, and its advantage grew considerably as the number of clusters increased.

New Sequential Clustering Combination for Rule Generation System (규칙 생성 시스템을 위한 새로운 연속 클러스터링 조합)

  • Kim, Sung Suk;Choi, Ho Jin
    • Journal of Internet Computing and Services / v.13 no.5 / pp.1-8 / 2012
  • In this paper, we propose a new data-driven clustering combination for a rule generation mechanism, based on numerical data. In a large and complicated space, a single clustering method achieves only limited performance. To overcome this limitation, a hybrid method can divide the problem into simpler cluster estimation steps. The proposed method combines mountain clustering with modified Chen clustering to extract detailed cluster information from complicated data distributions in a non-parametric space. With its advanced density-based operation, it can generate rules automatically when intelligent systems, including neural networks and fuzzy inference systems, are built from the clustering results. The results of the mechanism can also serve as information for a decision support system to infer useful knowledge, and the method can be extended to healthcare and medical decision support systems to help experts or specialists. We demonstrate and explain the usefulness of the proposed method through simulation results.

Clustering Algorithm for Time Series with Similar Shapes

  • Ahn, Jungyu;Lee, Ju-Hong
    • KSII Transactions on Internet and Information Systems (TIIS) / v.12 no.7 / pp.3112-3127 / 2018
  • Since time series clustering is performed without prior information, it is used for exploratory data analysis. In particular, clusters of time series with similar shapes can be used in various fields, such as business, medicine, finance, and communications. However, existing time series clustering algorithms have the problem that time series with different shapes end up in the same cluster. The reason is that the existing algorithms do not limit the size of the generated clusters and use a dimension reduction method that loses a large amount of information. In this paper, we propose a method that alleviates these disadvantages and finds better-quality clusters containing similarly shaped time series. In the data preprocessing step, we normalize the time series using z-transformation. Then, we use piecewise aggregate approximation (PAA) to reduce the dimension of the time series. In the clustering step, we use density-based spatial clustering of applications with noise (DBSCAN) to create preclusters. We then use a modified K-means algorithm to refine the preclusters containing differently shaped time series into subclusters containing only similarly shaped time series. In our experiments, our method showed better results than the existing method.
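The two preprocessing steps named in the abstract above, z-normalization and piecewise aggregate approximation, can be sketched as follows. This is a minimal illustration under stated assumptions: it uses the population standard deviation and requires the series length to be divisible by the number of segments, choices the abstract does not specify. The DBSCAN and modified K-means stages are not shown, and the function names are illustrative.

```python
import statistics


def z_normalize(series):
    """Rescale a series to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(series)
    std = statistics.pstdev(series)
    if std == 0:
        return [0.0] * len(series)  # a constant series has no shape to preserve
    return [(x - mean) / std for x in series]


def paa(series, segments):
    """Piecewise aggregate approximation: the mean of each equal-width frame."""
    n = len(series)
    if n % segments != 0:
        raise ValueError("this sketch assumes len(series) is divisible by segments")
    width = n // segments
    return [sum(series[i:i + width]) / width for i in range(0, n, width)]


# Two series with the same shape but different offset and scale.
a = paa(z_normalize([1, 2, 3, 4, 5, 6]), 3)
b = paa(z_normalize([10, 20, 30, 40, 50, 60]), 3)
reduced = paa([1, 2, 3, 4, 5, 6], 3)
```

After z-normalization the two series become identical, which is why shape-based clustering normalizes first. PAA then shortens each series from six points to three before any distance computation.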

Clustering Algorithm Considering Sensor Node Distribution in Wireless Sensor Networks

  • Yu, Boseon;Choi, Wonik;Lee, Taikjin;Kim, Hyunduk
    • Journal of Information Processing Systems / v.14 no.4 / pp.926-940 / 2018
  • In clustering-based approaches, cluster heads closer to the sink are usually burdened with much more relay traffic and thus tend to die early. To address this problem, distance-aware clustering approaches, such as energy-efficient unequal clustering (EEUC), have been proposed that adjust the cluster size according to the distance between the sink and each cluster head. However, the network lifetime of such approaches is highly dependent on the distribution of the sensor nodes because, in randomly distributed sensor networks, they do not guarantee that the cluster energy consumption will be proportional to the cluster size. To address this problem, we propose a novel approach called CACD (Clustering Algorithm Considering node Distribution), which is not only distance-aware but also node-density-aware. In CACD, clusters are allowed a limited number of member nodes, which is determined by the distance between the sink and the cluster head. Simulation results show that, in terms of network lifetime, CACD is 20%-50% more energy-efficient than previous work under various operational conditions.