Search | Korea Science

Refinement of Document Clustering by Using NMF

Shinnou, Hiroyuki;Sasaki, Minoru
- Proceedings of the Korean Society for Language and Information Conference / 2007.11a / pp.430-439 / 2007
In this paper, we use non-negative matrix factorization (NMF) to refine document clustering results. NMF is a dimensionality reduction method that is effective for document clustering because a term-document matrix is high-dimensional and sparse. The initial matrix of the NMF algorithm can be regarded as a clustering result, so NMF can be used as a refinement method. We first perform min-max cut (Mcut), a powerful spectral clustering method, and then refine the result via NMF, which should yield an accurate clustering result. However, NMF often fails to improve the given clustering result. To overcome this problem, we use the Mcut objective function to stop the NMF iteration.
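The refinement idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes standard multiplicative NMF updates for the Frobenius objective, a document similarity matrix S = XᵀX, and hypothetical names (`mcut_objective`, `refine_with_nmf`). The 0.1 off-cluster weight and the way W is initialized are assumptions made only for illustration.

```python
import numpy as np

def mcut_objective(S, labels, k):
    """Min-max cut value: sum over clusters of cut(C, rest) / within(C)."""
    total = 0.0
    for c in range(k):
        inside = labels == c
        within = S[np.ix_(inside, inside)].sum()
        cut = S[np.ix_(inside, ~inside)].sum()
        if within == 0:  # empty or degenerate cluster: treat as worst case
            return np.inf
        total += cut / within
    return total

def refine_with_nmf(X, init_labels, k, max_iter=50, eps=1e-9):
    """Refine a hard clustering of the columns of X (terms x documents).

    X must be non-negative. H (k x documents) is initialized from the
    given clustering; the iteration stops as soon as the Mcut value of
    the current labels gets worse.
    """
    init_labels = np.asarray(init_labels)
    n_docs = X.shape[1]
    # Assumed initialization: 1.0 for the assigned cluster, 0.1 elsewhere,
    # so that multiplicative updates can still move every entry.
    H = np.full((k, n_docs), 0.1)
    H[init_labels, np.arange(n_docs)] = 1.0
    W = X @ H.T + eps  # assumed: weighted term sums per cluster
    S = X.T @ X        # document-document similarity
    best_labels = init_labels.copy()
    best_score = mcut_objective(S, best_labels, k)
    for _ in range(max_iter):
        # Multiplicative updates for min ||X - WH||_F
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        labels = H.argmax(axis=0)
        score = mcut_objective(S, labels, k)
        if score > best_score:  # Mcut got worse: stop and keep the best
            break
        best_labels, best_score = labels, score
    return best_labels
```

By construction, the returned labels never have a higher Mcut value than the initial clustering, which mirrors the stopping rule described in the abstract.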

Document Clustering Method using Coherence of Cluster and Non-negative Matrix Factorization (비음수 행렬 분해와 군집의 응집도를 이용한 문서군집)

Kim, Chul-Won;Park, Sun
- Journal of the Korea Institute of Information and Communication Engineering / v.13 no.12 / pp.2603-2608 / 2009
Document clustering is an important method for document analysis and is used in many information retrieval applications. This paper proposes a new document clustering model that combines an NMF (non-negative matrix factorization) based clustering method with refinement of the documents in each cluster using cluster coherence. The proposed method can improve the quality of document clustering because documents are re-assigned to clusters using cluster coherence based on the similarity between documents, and because the semantic feature matrix and the semantic variable matrix used in document clustering represent the inherent structure of the document set well. The experimental results demonstrate that applying the proposed method to document clustering methods achieves better performance than the document clustering methods alone.
https://doi.org/10.6109/JKIICE.2009.13.12.2603

A Layer-based Dynamic Unequal Clustering Method in Large Scale Wireless Sensor Networks (대규모 무선 센서 네트워크에서 계층 기반의 동적 불균형 클러스터링 기법)

Kim, Jin-Su
- Journal of the Korea Academia-Industrial cooperation Society / v.13 no.12 / pp.6081-6088 / 2012
An unequal clustering method in wireless sensor networks is a technique that forms clusters of different sizes. This method decreases overall energy consumption by solving the hot spot problem. In this paper, I propose a layer-based dynamic unequal clustering method using the unequal clustering model. This method decreases overall energy consumption and keeps it balanced by using the optimal number of clusters and cluster head positions. I also show that the proposed method outperforms the previous clustering method in terms of network lifetime.
https://doi.org/10.5762/KAIS.2012.13.12.6081

A Study on K-Means Clustering

Bae, Wha-Soo;Roh, Se-Won
- Communications for Statistical Applications and Methods / v.12 no.2 / pp.497-508 / 2005
This paper studies K-means clustering, focusing on initialization, which affects the clustering results in K-means cluster analysis. Four methods (the MA method, the KA method, the Max-Min method and the Space Partition method) were compared. The clustering results show some differences among these methods: in particular, the MA method sometimes leads to incorrect clustering due to inappropriate initialization, depending on the type of data, and the Max-Min method is shown to be more effective than the other methods, especially when the data size is large.
https://doi.org/10.5351/CKSS.2005.12.2.497
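As a rough illustration of what a max-min style initialization for K-means can look like, the sketch below uses the common farthest-first rule: each new center is the point whose distance to its nearest already-chosen center is largest. This is an assumption about the general technique, not necessarily the exact variant compared in the paper; the function name `maxmin_init` and the choice of the first center (`first=0`) are hypothetical.

```python
import numpy as np

def maxmin_init(X, k, first=0):
    """Pick k initial centers from the rows of X by the farthest-first rule.

    `first` is the index of the first center (an arbitrary, assumed choice).
    """
    centers = [first]
    # d[i] = distance from point i to its nearest chosen center so far
    d = np.linalg.norm(X - X[first], axis=1)
    for _ in range(k - 1):
        nxt = int(d.argmax())  # point farthest from all chosen centers
        centers.append(nxt)
        d = np.minimum(d, np.linalg.norm(X - X[nxt], axis=1))
    return X[centers]

points = np.array([[0, 0], [0, 1], [10, 0], [10, 1], [5, 8]], dtype=float)
initial_centers = maxmin_init(points, k=3)
```

The returned rows can then be passed as starting centers to any standard K-means iteration.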

A Task Scheduling Method after Clustering for Data Intensive Jobs in Heterogeneous Distributed Systems

Hajikano, Kazuo;Kanemitsu, Hidehiro;Kim, Moo Wan;Kim, Hee-Dong
- Journal of Computing Science and Engineering / v.10 no.1 / pp.9-20 / 2016
Several task clustering heuristics are proposed for allocating tasks in heterogeneous systems to achieve a good response time in data intensive jobs. However, one of the challenging problems is the process in task scheduling after task allocation by task clustering. We propose a task scheduling method after task clustering, leveraging worst schedule length (WSL) as an upper bound of the schedule length. In our proposed method, a task in a WSL sequence is scheduled preferentially to make the WSL smaller. Experimental results by simulation show that the response time is improved in several task clustering heuristics. In particular, our proposed scheduling method with the task clustering outperforms conventional list-based task scheduling methods.
https://doi.org/10.5626/JCSE.2016.10.1.9

Clustering non-stationary advanced metering infrastructure data

Kang, Donghyun;Lim, Yaeji
- Communications for Statistical Applications and Methods / v.29 no.2 / pp.225-238 / 2022
In this paper, we propose a clustering method for advanced metering infrastructure (AMI) data in Korea. As AMI data presents non-stationarity, we consider time-dependent frequency domain principal components analysis, which is a proper method for locally stationary time series data. We develop a new clustering method based on time-varying eigenvectors, and our method provides a meaningful result that is different from the clustering results obtained by employing conventional methods, such as K-means and K-centres functional clustering. A simulation study demonstrates the superiority of the proposed approach. We further apply the clustering results to the evaluation of the electricity price system in South Korea, and validate the reform of the progressive electricity tariff system.
https://doi.org/10.29220/CSAM.2022.29.2.225

A Bayesian Model-based Clustering with Dissimilarities

Oh, Man-Suk;Raftery, Adrian
- Proceedings of the Korean Statistical Society Conference / 2003.10a / pp.9-14 / 2003
A Bayesian model-based clustering method is proposed for clustering objects on the basis of dissimilarities. This combines two basic ideas. The first is that the objects have latent positions in a Euclidean space, and that the observed dissimilarities are measurements of the Euclidean distances with error. The second idea is that the latent positions are generated from a mixture of multivariate normal distributions, each one corresponding to a cluster. We estimate the resulting model in a Bayesian way using Markov chain Monte Carlo. The method carries out multidimensional scaling and model-based clustering simultaneously, and yields good object configurations and good clustering results with reasonable measures of clustering uncertainties. In the examples we studied, the clustering results based on low-dimensional configurations were almost as good as those based on high-dimensional ones. Thus the method can be used as a tool for dimension reduction when clustering high-dimensional objects, which may be useful especially for visual inspection of clusters. We also propose a Bayesian criterion for choosing the dimension of the object configuration and the number of clusters simultaneously. This is easy to compute and works reasonably well in simulations and real examples.

Finding the Number of Clusters and Various Experiments Based on ASA Clustering Method (ASA 군집화를 이용한 군집수 결정 및 다양한 실험)

Yoon Bok-Sik
- Journal of the Korean Operations Research and Management Science Society / v.31 no.2 / pp.87-98 / 2006
In many cases of cluster analysis, we are forced to perform clustering without any prior knowledge of the number of clusters. However, some clustering methods, such as the k-means algorithm, require the number of clusters to be provided beforehand. In this study, we focus on the problem of determining the number of clusters in the given data. We follow the two-stage approach of the ASA clustering algorithm and mainly try to improve the performance of the first stage of the algorithm. We verify the usefulness of the method by applying it to various kinds of simulated data. We also apply the method to clustering two kinds of real-life qualitative data.

Clustering of Decision Making Units using DEA (DEA를 이용한 의사결정단위의 클러스터링)

Kim, Kyeongtaek
- Journal of Korean Society of Industrial and Systems Engineering / v.37 no.4 / pp.239-244 / 2014
The conventional clustering approaches are mostly based on minimizing the total dissimilarity of inputs and outputs. However, such approaches may not be helpful in some cases of clustering decision making units (DMUs) with production features that convert multiple inputs into multiple outputs, because they do not consider the converting functions. Data envelopment analysis (DEA) has been widely applied for efficiency estimation of such DMUs since it has non-parametric characteristics. We propose a new clustering method to identify groups of DMUs that are similar in terms of their input-output profiles. A real-world example is given to explain the use and effectiveness of the proposed method. We also calculate the similarity between its result and the result of a conventional clustering method applied to the example. After adding the efficiency value to the input of the K-means algorithm, we calculate a new similarity value and compare it with the previous one.
https://doi.org/10.11627/jkise.2014.37.4.239

Enhancing Text Document Clustering Using Non-negative Matrix Factorization and WordNet

Kim, Chul-Won;Park, Sun
- Journal of information and communication convergence engineering / v.11 no.4 / pp.241-246 / 2013
A classic document clustering technique may incorrectly classify documents into different clusters when documents that should belong to the same cluster do not have any shared terms. Recently, to overcome this problem, internal and external knowledge-based approaches have been used for text document clustering. However, the clustering results of these approaches are influenced by the inherent structure and the topical composition of the documents. Further, the organization of knowledge into an ontology is expensive. In this paper, we propose a new enhanced text document clustering method using non-negative matrix factorization (NMF) and WordNet. The semantic terms extracted as cluster labels by NMF can represent the inherent structure of a document cluster well. The proposed method can also improve the quality of document clustering that uses cluster labels and term weights based on term mutual information of WordNet. The experimental results demonstrate that the proposed method achieves better performance than the other text clustering methods.
https://doi.org/10.6109/jicce.2013.11.4.241

