• Title/Summary/Keyword: Gap clustering

Search Result 48, Processing Time 0.024 seconds

Improvement of Self Organizing Maps using Gap Statistic and Probability Distribution

  • Jun, Sung-Hae
    • International Journal of Fuzzy Logic and Intelligent Systems / v.8 no.2 / pp.116-120 / 2008
  • Clustering is a method for unsupervised learning. General clustering tools have depended on statistical methods and machine learning algorithms. One of the popular clustering algorithms based on machine learning is the self-organizing map (SOM), a neural network model for clustering. SOM and extended SOM have been used in diverse classification and clustering fields such as data mining. However, SOM has difficulty determining the optimal number of clusters. In this paper, we propose an improvement of SOM using the gap statistic and probability distributions. The gap statistic was introduced to estimate the number of clusters in a dataset, and we use it to address this problem of SOM. In addition, the weights of feature nodes are updated by probability distribution. After updating is completed according to prior and posterior distributions, the weights of SOM have probability distributions for optimal clustering. To verify the improved performance of our work, we conduct experiments comparing it with other learning algorithms on simulated data sets.

Word Segmentation in Handwritten Korean Text Lines based on GAP Clustering (GAP 군집화에 기반한 필기 한글 단어 분리)

  • Jeong, Seon-Hwa;Kim, Soo-Hyung
    • Journal of KIISE:Software and Applications / v.27 no.6 / pp.660-667 / 2000
  • In this paper, a word segmentation method for handwritten Korean text line images is proposed. The method uses gap information to segment words in line images, where a gap is defined as a white run obtained after vertical projection of the line image. Each gap is classified as either an inter-word gap or an inter-character gap based on gap distance. We adopt three distance measures that have been proposed for word segmentation of handwritten English text line images, and test three clustering techniques to find the best combination of gap metric and classification technique for Korean text line images. The experiment was conducted with 305 text line images extracted manually from live mail pieces. The experimental results demonstrate the superiority of the BB (Bounding Box) distance measure and the sequential clustering approach, for which the cumulative word segmentation accuracy up to the third hypothesis is 88.52%. The processing time is about 0.05 seconds per line image.
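
The gap definition in the abstract above (a white run in the vertical projection of a line image) can be illustrated with a minimal sketch. This is not the authors' method: it uses plain gap width instead of the BB distance, and a simple largest-jump split in place of the clustering techniques the paper compares. The function names `find_gaps`, `split_threshold`, and `segment_words` are hypothetical.

```python
from typing import List, Optional, Tuple


def find_gaps(profile: List[int]) -> List[Tuple[int, int]]:
    """Return (start, width) of each interior white run in a vertical projection.

    profile[x] is the number of ink pixels in column x. Runs touching the
    left or right edge are treated as margins and ignored.
    """
    gaps = []
    start = None
    for x, count in enumerate(profile):
        if count == 0:
            if start is None:
                start = x
        elif start is not None:
            gaps.append((start, x - start))
            start = None
    # A run still open at the end is the right margin and is never appended.
    if gaps and gaps[0][0] == 0:
        gaps.pop(0)  # drop the left margin
    return gaps


def split_threshold(widths: List[int]) -> Optional[float]:
    """Stand-in two-class split: cut at the largest jump between sorted widths.

    Returns None when there is no jump (fewer than two widths, or all equal).
    """
    ordered = sorted(widths)
    best_jump, threshold = 0, None
    for a, b in zip(ordered, ordered[1:]):
        if b - a > best_jump:
            best_jump = b - a
            threshold = (a + b) / 2
    return threshold


def segment_words(profile: List[int]) -> List[int]:
    """Return column positions of inter-word cuts (centres of the wide gaps)."""
    gaps = find_gaps(profile)
    threshold = split_threshold([width for _, width in gaps])
    if threshold is None:
        return []  # every gap is treated as inter-character
    return [start + width // 2 for start, width in gaps if width > threshold]


if __name__ == "__main__":
    # Toy projection profile: one wide gap (columns 6-9) among narrow ones.
    toy_profile = [0, 3, 4, 0, 5, 2, 0, 0, 0, 0, 6, 1, 0, 2, 0, 0]
    print(find_gaps(toy_profile))
    print(segment_words(toy_profile))
```

A real system would compute the profile from a binarized line image and, as the abstract reports, would likely do better with a bounding-box distance and sequential clustering than with this width-only split.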


A File Clustering Algorithm for Wear-leveling (마모도 평준화를 위한 File Clustering 알고리즘)

  • Lee, Taehwa;Cha, Jaehyuk
    • Journal of Digital Contents Society / v.14 no.1 / pp.51-57 / 2013
  • Storage devices based on flash memory have many attractive features such as high performance, low power consumption, shock resistance, and low weight, so they are replacing HDDs to a certain extent. A flash-based storage device has an FTL (Flash Translation Layer) that emulates a block storage device such as an HDD. Garbage collection, one of the major functions of the FTL, strongly affects the performance and lifetime of the device. However, there is no de facto standard for new garbage collection algorithms. To solve this problem, we propose the File Clustering Algorithm. The algorithm takes into account that pages from the same file tend to be updated at the same time, so these pages are clustered into the same block. For this mechanism, we propose a Page Allocation Policy in the FTL and use MIN-MAX GAP to guarantee wear leveling. To verify the algorithm, we use the TPC Benchmark. The performance evaluation reveals that the proposed algorithm has results comparable with the existing algorithms (No wear leveling, Hot/Cold) and shows approximately 690% improvement in terms of wear leveling.

Fuzzy Controller Modeling for Electromagnetic Levitation Systems based on Clustering Algorithm (클러스터링에 기초한 자기부상시스템의 퍼지제어기 모델링)

  • Kim, Min-Soo;Byun, Yeun-Sub;Lee, Kwan-Sup
    • Proceedings of the KSR Conference / 2006.11a / pp.145-159 / 2006
  • This paper describes the development of a clustering-based fuzzy controller for an electromagnetic suspension vehicle, using a gain scheduling method and a Kalman filter for a simplified single-magnet system. Electromagnetic suspension vehicle systems are highly nonlinear and essentially unstable. To achieve levitation control of the DC electromagnetic suspension system, we consider a fuzzy system modeling method based on a clustering algorithm, in which a set of input/output data is collected from a well-defined Linear Quadratic Gaussian (LQG) controller. Simulation results show that the proposed clustering-based fuzzy controller methodology robustly yields uniform performance with adequate gap response over the mass variation range.


New Generation Gap Models for Evolutionary Algorithm in Real Parameter Optimization (실수최적화 진화 알고리즘을 위한 새로운 세대차 모델)

  • Choi, Jun-Seok;Seo, Ki-Sung
    • Journal of the Korean Institute of Intelligent Systems / v.19 no.1 / pp.62-68 / 2009
  • Two new generation gap models with a modified parent-centric recombination (PCX) operator are proposed. First, the self-adaptation generation gap (SGG) model is a control method that keeps the probability that parents are replaced by offspring at a certain level, which yields better performance. Second, the virtual cluster generation gap (VCGG) model extends the distances among parents using clustering, which diversifies the individuals. In this model, the distances among parents can be controlled by the size of the clusters. To demonstrate the effectiveness of the two proposed approaches, experiments on three standard test problems are carried out and compared with the most competitive current approaches, CMA-ES and Generalized Generation Gap (G3) with PCX. The two proposed methods are shown to be consistently superior to the other approaches in the study.

Clustering of Web Objects with Similar Popularity Trends (유사한 인기도 추세를 갖는 웹 객체들의 클러스터링)

  • Loh, Woong-Kee
    • The KIPS Transactions:PartD / v.15D no.4 / pp.485-494 / 2008
  • Huge amounts of various web items such as keywords, images, and web pages are being made widely available on the Web. The popularities of such web items change continuously over time, and mining temporal patterns in the popularities of web items is an important problem that is useful for several web applications. For example, the temporal patterns in the popularities of search keywords help web search enterprises predict future popular keywords, enabling them to make pricing decisions when marketing search keywords to advertisers. However, the presence of millions of web items makes it difficult to scale up previous techniques for this problem. This paper proposes an efficient method for mining temporal patterns in the popularities of web items. We treat the popularities of web items as time series and propose the gap measure to quantify the similarity between the popularities of two web items. To reduce the computation overhead for this measure, an efficient method using the Fast Fourier Transform (FFT) is presented. We assume that the popularities of web items do not necessarily follow any probabilistic distribution and are not necessarily periodic. To find clusters of web items with similar popularity trends, we propose to use a density-based clustering algorithm based on the gap measure. Our experiments using the popularity trends of search keywords obtained from the Google Trends web site illustrate the scalability and usefulness of the proposed approach in real-world applications.

Document Clustering using Term reweighting based on NMF (NMF 기반의 용어 가중치 재산정을 이용한 문서군집)

  • Lee, Ju-Hong;Park, Sun
    • Journal of the Korea Society of Computer and Information / v.13 no.4 / pp.11-18 / 2008
  • Document clustering is an important method for document analysis and is used in many different information retrieval applications. This paper proposes a new document clustering model that uses term re-weighting based on NMF (non-negative matrix factorization) to cluster documents relevant to a user's requirement. The proposed model re-weights terms using user feedback to reduce the gap between the user's requirement for document classification and the document clusters produced by the machine. The proposed method can improve the quality of document clustering because the re-weighted terms, the semantic feature matrix, and the semantic variable matrix, which are used in document clustering, can better represent the inherent structure of the document set. The experimental results demonstrate that applying the proposed method to document clustering methods achieves better performance than the original document clustering methods.


Decomposition of a Text Block into Words Using Projection Profiles, Gaps and Special Symbols (투영 프로파일, GaP 및 특수 기호를 이용한 텍스트 영역의 어절 단위 분할)

  • Jeong Chang Bu;Kim Soo Hyung
    • Journal of KIISE:Software and Applications / v.31 no.9 / pp.1121-1130 / 2004
  • This paper proposes a method for line and word segmentation of machine-printed text blocks. To separate a text region into lines, it analyzes the horizontal projection profile and performs a recursive projection profile cut. For word segmentation, gaps in each text line are first found by connected component analysis, and between-word gaps are then identified by a hierarchical clustering method. In addition, a special symbol detection technique is applied to find two types of special symbols lying between words, using their morphological features. An experiment with 84 text regions from English and Korean documents shows that the proposed method achieves 99.92% word segmentation accuracy, while the commercial OCR software Armi 6.0 Pro™ achieves 97.58%.

Neural-based Blind Modeling of Mini-mill ASC Crown

  • Lee, Gang-Hwa;Lee, Dong-Il;Lee, Seung-Joon;Lee, Suk-Gyu;Kim, Shin-Il;Park, Hae-Doo;Park, Seung-Gap
    • Journal of the Korean Institute of Intelligent Systems / v.12 no.6 / pp.577-582 / 2002
  • A neural network can be trained to approximate an arbitrary nonlinear function of multivariate data, such as the mini-mill crown values in Automatic Shape Control. The trained weights of the neural network can evaluate or generalize the process data outside the training vectors. Sometimes, blind modeling of the process data is necessary for comparison with the scattered analytical model of the mini-mill process in isolated electro-mechanical forms. To come up with a viable model, we propose a blind neural-based range-division domain-clustering piecewise-linear modeling scheme. The basic ideas are: 1) dividing the range of the target data, 2) clustering the corresponding input space vectors, 3) training the neural network with clustered prototypes to smooth out the convergence, and 4) solving the resulting matrix equations with a pseudo-inverse to alleviate the ill-conditioning problem. The simulation results support the effectiveness of the proposed scheme, which opens a new approach to data analysis. Comparison with statistical regression shows that the proposed scheme obtains better modeling error uniformity and reduces the magnitudes of errors considerably. Approximately 10-fold better performance results.

A Theoretical Study of Designing Thesaurus Browser by Clustering Algorithm (클러스터링을 이용한 시소러스 브라우저의 설계에 대한 이론적 연구)

  • Seo, Hwi
    • Journal of Korean Library and Information Science Society / v.30 no.3 / pp.427-456 / 1999
  • This paper deals with the problems of information retrieval from full-text databases, which arise both from deficiencies in the search strategies or methods of the information searcher and from the difficulties of query representation, generation, extension, etc. In order to solve these problems, we should use automatic retrieval instead of the manual retrieval of the past. One way to narrow the gap between the terms used by writers and the queries issued by searchers is to search with the terms that the writers use. Thus, a precondition for solving these problems is that all areas of information retrieval, such as content analysis, information structure, query formation, and query evaluation, should be addressed in a coherent way. We need to deal with all areas of automatic information retrieval for the efficiency of retrieval, though this paper focuses on the design of a thesaurus browser. Accordingly, this paper presents theoretical analyses of the forms of information retrieval, automatic indexing, clustering techniques, the construction and representation of a thesaurus, and information retrieval techniques. As a result of these analyses, this paper presents a theoretical model, namely a thesaurus browser based on a clustering algorithm. The results of the paper will serve as a theoretical basis for a new retrieval algorithm.
