• Title/Summary/Keyword: unsupervised clustering

Arabic Text Clustering Methods and Suggested Solutions for Theme-Based Quran Clustering: Analysis of Literature

  • Bsoul, Qusay;Abdul Salam, Rosalina;Atwan, Jaffar;Jawarneh, Malik
    • Journal of Information Science Theory and Practice / v.9 no.4 / pp.15-34 / 2021
  • Text clustering is one of the most commonly used methods for detecting themes or types of documents. It is used in many fields, but it is not yet effective enough for understanding Arabic text, especially with respect to term extraction, unsupervised feature selection, and clustering algorithms. In most cases, term extraction focuses on nouns. Clustering simplifies the understanding of an Arabic text such as the Quran; this matters not only for Muslims but for all people who want to know more about Islam. This paper discusses the complexity and limitations of clustering the Arabic text of the Quran by theme. Unsupervised feature selection does not consider the relationships between the selected features. One weakness of clustering algorithms is that the selection of the optimal initial centroid still depends on chance and manual settings. Consequently, this paper reviews the literature on the three major stages of Arabic clustering: term extraction, unsupervised feature selection, and clustering. Six experiments were conducted to demonstrate previously undiscussed problems related to the metrics used for feature selection and clustering. Suggestions for improving theme-based clustering of the Quran are presented and discussed.

Implementation of an Artificial Odour Recognition System with Unsupervised Clustering Methods (Unsupervised clustering 방법을 갖는 인공 냄새인식 시스템의 구현)

  • Choi, Chan-Seok;Kim, Jeong-Do;Byun, Hyung-Gi
    • Journal of Sensor Science and Technology / v.10 no.6 / pp.310-316 / 2001
  • We designed and constructed an artificial odour recognition system (electronic nose system) that uses a metal-oxide sensor array to recognize and analyze various odours. We propose an unsupervised clustering method based on Euclidean distances so that a human observer can easily examine the multi-dimensional data measured by the sensor array. The method uses Principal Component Analysis (PCA) as the starting point for the Sammon Mapping Method (SMM). No prior assumptions are made about the classes to which the odours belong, and the error due to dimensionality reduction in PCA can be minimized without the rotation of clusters that occurs in SMM when the order of the data sets in a database is changed. The system with the proposed unsupervised clustering method was applied to assessing odour differences among volatile organic compounds (VOCs) and among Korean whiskies, and it demonstrated the best performance throughout the experimental trials.

An Overview of Unsupervised and Semi-Supervised Fuzzy Kernel Clustering

  • Frigui, Hichem;Bchir, Ouiem;Baili, Naouel
    • International Journal of Fuzzy Logic and Intelligent Systems / v.13 no.4 / pp.254-268 / 2013
  • For real-world clustering tasks, the input data is typically not easily separable, either because the data structure is highly complex or because clusters vary in size, density, and shape. Kernel-based clustering has proven to be an effective approach to partitioning such data. In this paper, we provide an overview of several fuzzy kernel clustering algorithms. We focus on methods that optimize a fuzzy C-means-type objective function, and we highlight the advantages and disadvantages of each method. In addition to the completely unsupervised algorithms, we also provide an overview of some semi-supervised fuzzy kernel clustering algorithms. These algorithms use partial supervision information to guide the optimization process and avoid local minima. We also provide an overview of the different approaches that have been used to extend kernel clustering to very large data sets.

Unsupervised Speaker Adaptation Based on Sufficient HMM Statistics (SUFFICIENT HMM 통계치에 기반한 UNSUPERVISED 화자 적응)

  • Ko Bong-Ok;Kim Chong-Kyo
    • Proceedings of the KSPS conference / 2003.05a / pp.127-130 / 2003
  • This paper describes an efficient method for unsupervised speaker adaptation. The method selects a subset of speakers who are acoustically close to a test speaker and calculates adapted model parameters from the previously stored sufficient HMM statistics of the selected speakers' data. Only a small amount of unsupervised data from the test speaker is required for the adaptation, and using the stored sufficient HMM statistics makes the adaptation quick. Compared with a pre-clustering method, the proposed method can obtain a better speaker cluster because the clustering result is determined online from the test speaker's data. Experimental results show that the proposed method improves on the speaker-independent model more than MLLR does. Moreover, the proposed method uses only one unsupervised sentence utterance, while MLLR usually uses more than ten supervised sentence utterances.

Classification of Traffic Flows into QoS Classes by Unsupervised Learning and KNN Clustering

  • Zeng, Yi;Chen, Thomas M.
    • KSII Transactions on Internet and Information Systems (TIIS) / v.3 no.2 / pp.134-146 / 2009
  • Traffic classification seeks to assign packet flows to an appropriate quality of service (QoS) class based on flow statistics, without the need to examine packet payloads. Classification proceeds in two steps: classification rules are first built by analyzing traffic traces, and the rules are then evaluated using test data. In this paper, we use a self-organizing map and K-means clustering as unsupervised machine learning methods to identify the inherent classes in traffic traces. Three clusters were discovered, corresponding to transactional, bulk data transfer, and interactive applications. The K-nearest neighbor classifier was found to be highly accurate for the traffic data and significantly better than a minimum mean distance classifier.
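
The two-step idea in this abstract (discover classes without labels, then classify new flows) can be illustrated with a minimal pure-Python sketch. Everything concrete here is an assumption for illustration: the two normalized flow features, the sample values, and the hand-picked initial centroids are hypothetical, and the self-organizing map stage is omitted. It is not the authors' implementation.

```python
import math
from collections import Counter

def kmeans(points, initial_centroids, iterations=20):
    """Plain Lloyd's algorithm; returns final centroids and one label per point."""
    centroids = list(initial_centroids)
    labels = []
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(len(centroids)),
                      key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, labels

def knn_predict(query, points, labels, k=3):
    """Majority label among the k training points closest to the query."""
    nearest = sorted(range(len(points)),
                     key=lambda i: math.dist(query, points[i]))[:k]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]

# Hypothetical flows as (normalized mean packet size, normalized duration).
flows = [(0.1, 0.1), (0.2, 0.1), (0.1, 0.2),
         (0.9, 0.9), (1.0, 0.8), (0.9, 1.0),
         (0.1, 0.9), (0.2, 1.0), (0.1, 1.0)]

# Step 1: unsupervised discovery of three classes (initial centroids chosen by hand).
centroids, labels = kmeans(flows, [flows[0], flows[3], flows[6]])

# Step 2: classify an unseen flow using the discovered classes as training labels.
new_flow_class = knn_predict((0.95, 0.9), flows, labels, k=3)
```

Passing the initial centroids explicitly keeps the sketch deterministic; in practice the result of K-means depends on that initialization.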

The Hyperspectral Image Classification with the Unsupervised SAM (무감독 SAM 기법을 이용한 하이퍼스펙트럴 영상 분류)

  • 김대성;김진곤;변영기;김용일
    • Proceedings of the Korean Society of Surveying, Geodesy, Photogrammetry, and Cartography Conference / 2004.04a / pp.159-164 / 2004
  • SAM (Spectral Angle Mapper) is a method that uses the similarity of the angle between pairs of signatures, instead of the spectral distance (MDC, MLC, etc.), for classification or clustering. In this paper, we applied unsupervised techniques (Unsupervised SAM and ISODATA) to a hyperspectral image (Hyperion), which has numerous narrow, contiguous spectral bands, and to a multispectral image (ETM+) to cluster signatures. The overall measured accuracies of USAM and ISODATA were 76.52% and 53.91% for the multispectral image and 63.04% and 53.91% for the hyperspectral image. From these results, we report that Unsupervised SAM is a better classification technique than ISODATA. We also believe that the spectral angle can potentially be one of the most accurate classifiers for both multispectral and hyperspectral images.
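
The angle measure that SAM relies on is the angle between two spectra treated as vectors, θ = arccos(a·b / (‖a‖‖b‖)). The following is a minimal sketch of that measure and of a nearest-reference assignment; the four-band spectra and the helper names are hypothetical, and the unsupervised procedure used in the paper (how reference signatures are found) is not described in the abstract and is not reproduced here.

```python
import math

def spectral_angle(a, b):
    """Angle in radians between two spectra treated as vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("spectral angle is undefined for a zero vector")
    # Clamp to [-1, 1] to guard against floating-point rounding.
    cos_theta = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.acos(cos_theta)

def assign_to_nearest(pixel, references):
    """Index of the reference spectrum that makes the smallest angle with the pixel."""
    angles = [spectral_angle(pixel, ref) for ref in references]
    return min(range(len(references)), key=angles.__getitem__)

# Hypothetical 4-band reference signatures.
references = [[0.1, 0.2, 0.6, 0.7], [0.5, 0.4, 0.2, 0.1]]

# The first reference scaled by 2, e.g. the same material under brighter illumination.
bright_pixel = [0.2, 0.4, 1.2, 1.4]
label = assign_to_nearest(bright_pixel, references)
```

Because the angle ignores vector length, the brighter pixel is still matched to the first reference, which is the property that distinguishes the angle from a distance-based measure.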

Reinforcement learning multi-agent using unsupervised learning in a distributed cloud environment

  • Gu, Seo-Yeon;Moon, Seok-Jae;Park, Byung-Joon
    • International Journal of Internet, Broadcasting and Communication / v.14 no.2 / pp.192-198 / 2022
  • Companies are building and using their own data analysis systems in the distributed cloud according to their business characteristics. However, as businesses and data types become more complex and diverse, the demand for more efficient analytics has increased. In response, this paper proposes an unsupervised learning-based data analysis agent to which reinforcement learning is applied for effective data analysis. The proposed agent consists of a reinforcement learning processing manager module and an unsupervised learning manager module. These two modules configure an agent with k-means clustering on multiple nodes and then perform distributed training on multiple data sets. This enables data analysis in a relatively short time compared with conventional systems that analyze large-scale data in one batch.

Determining the Optimal Number of Signal Clusters Using Iterative HMM Classification

  • Ernest, Duker Junior;Kim, Yoon Joong
    • International journal of advanced smart convergence / v.7 no.2 / pp.33-37 / 2018
  • In this study, we propose an iterative clustering algorithm that automatically clusters a set of unlabeled voice signal data into an optimal number of clusters and generates an HMM for each cluster. In the clustering process, the likelihoods of the clusters are calculated using iterative HMM training and testing while varying the number of clusters for the given data, and maximum likelihood estimation is used to determine the optimal number of clusters. We tested the effectiveness of this clustering algorithm on a small-vocabulary digit clustering task by mapping the unsupervised decoded output of the optimal cluster to the ground-truth transcription, and found that the two were highly correlated.

Multi-scale Cluster Hierarchy for Non-stationary Functional Signals of Mutual Fund Returns (Mutual Fund 수익률의 비정상 함수형 시그널을 위한 다해상도 클러스터 계층구조)

  • Kim, Dae-Lyong;Jung, Uk
    • Korean Management Science Review / v.24 no.2 / pp.57-72 / 2007
  • Many applications of scientific research have been coupled with functional data signal clustering techniques to discover novel characteristics that can be used to diagnose various issues. In this article, we present an interpretable multi-scale cluster hierarchy framework for clustering functional data using its multi-aspect frequency information. The suggested method focuses on how to effectively select transformed features/variables in an unsupervised manner so as to reduce the data dimension and achieve multi-purpose clustering. Specifically, we apply the method to mutual fund returns and form groups of superior-performing funds based on different aspects such as global patterns, seasonal variations, levels of noise, and their combinations. To show that our method produces a quality cluster hierarchy, we give empirical results from a simulation study and a set of real-life data. This research will contribute to financial market analysis and can be flexibly adapted to other research fields with clustering purposes.

An Unsupervised Clustering Technique of XML Documents based on Function Transform and FFT (함수 변환과 FFT에 기반한 조정자가 없는 XML 문서 클러스터링 기법)

  • Lee, Ho-Suk
    • The KIPS Transactions:PartD / v.14D no.2 / pp.169-180 / 2007
  • This paper discusses a new unsupervised XML document clustering technique based on a function transform and the FFT (Fast Fourier Transform). An XML document is transformed into a discrete function based on the hierarchical nesting structure of its elements. The discrete function is then transformed into a vector using the FFT. The vectors of two documents are compared using a weighted Euclidean distance metric. If the distance is lower than a prespecified threshold, the two documents are considered similar in structure and are grouped into the same cluster. XML clustering can be useful for the storage and searching of XML documents. The experiments were conducted with 800 synthetic documents and with 520 real documents. They showed that the function transform and FFT are effective for the incremental and unsupervised clustering of structurally similar XML documents.
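
The pipeline described in this abstract (document to discrete function, function to frequency vector, weighted Euclidean comparison against a threshold) can be sketched as follows. This is a rough illustration under several assumptions that the abstract does not specify: the discrete function is taken to be the nesting depth of each element in document order, the signal is zero-padded to a fixed length, only the magnitude spectrum is compared, the weights are uniform, and the threshold value is arbitrary. A naive DFT stands in for the FFT to keep the example dependency-free; an FFT would produce the same values faster.

```python
import cmath
import math
import xml.etree.ElementTree as ET

def depth_sequence(xml_text, length=16):
    """Nesting depth of each element in document order, zero-padded or truncated."""
    depths = []

    def walk(node, depth):
        depths.append(depth)
        for child in node:
            walk(child, depth + 1)

    walk(ET.fromstring(xml_text), 1)
    return (depths + [0] * length)[:length]

def dft_magnitudes(signal):
    """Magnitude spectrum from a naive O(n^2) discrete Fourier transform."""
    n = len(signal)
    return [abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(signal)))
            for k in range(n)]

def weighted_distance(u, v, weights):
    """Weighted Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, u, v)))

def same_cluster(xml_a, xml_b, threshold=1.0, length=16):
    """True if the two documents' spectra are closer than the threshold."""
    spectrum_a = dft_magnitudes(depth_sequence(xml_a, length))
    spectrum_b = dft_magnitudes(depth_sequence(xml_b, length))
    weights = [1.0] * length  # uniform weights are an assumption
    return weighted_distance(spectrum_a, spectrum_b, weights) < threshold

# Hypothetical documents: the first two share a structure, the third is flat.
doc_a = "<a><b><c/></b><d/></a>"
doc_b = "<x><y><z/></y><w/></x>"
doc_c = "<r><s/><t/><u/><v/><w/></r>"

similar = same_cluster(doc_a, doc_b)
different = same_cluster(doc_a, doc_c)
```

Tag names play no role in this sketch, so `doc_a` and `doc_b` are treated as structurally identical even though their element names differ.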