• 제목/요약/키워드: Unsupervised clustering

검색결과 224건 처리시간 0.027초

Arabic Text Clustering Methods and Suggested Solutions for Theme-Based Quran Clustering: Analysis of Literature

  • Bsoul, Qusay;Abdul Salam, Rosalina;Atwan, Jaffar;Jawarneh, Malik
    • Journal of Information Science Theory and Practice
    • /
    • 제9권4호
    • /
    • pp.15-34
    • /
    • 2021
  • Text clustering is one of the most commonly used methods for detecting themes or types of documents. Text clustering is used in many fields, but its effectiveness is still not sufficient to be used for the understanding of Arabic text, especially with respect to terms extraction, unsupervised feature selection, and clustering algorithms. In most cases, terms extraction focuses on nouns. Clustering simplifies the understanding of an Arabic text like the text of the Quran; it is important not only for Muslims but for all people who want to know more about Islam. This paper discusses the complexity and limitations of Arabic text clustering in the Quran based on their themes. Unsupervised feature selection does not consider the relationships between the selected features. One weakness of clustering algorithms is that the selection of the optimal initial centroid still depends on chances and manual settings. Consequently, this paper reviews literature about the three major stages of Arabic clustering: terms extraction, unsupervised feature selection, and clustering. Six experiments were conducted to demonstrate previously un-discussed problems related to the metrics used for feature selection and clustering. Suggestions to improve clustering of the Quran based on themes are presented and discussed.

Unsupervised clustering 방법을 갖는 인공 냄새인식 시스템의 구현 (Implementation of an Artificial Odour Recognition System with Unsupervised Clustering Methods)

  • 최찬석;김정도;변형기
    • 센서학회지
    • /
    • 제10권6호
    • /
    • pp.310-316
    • /
    • 2001
  • 다양한 냄새를 인식하고 분석하기 위하여 metal oxide 형 센서어레이를 이용한 인공 냄새인식 시스템(전자 코 시스템)을 설계 제작하였다. 센서어레이로부터 측정되는 다차원 데이터를 관측자로 하여금 쉽게 구별 할 수 있도록 Euclidean distance를 기본으로 하는 unsupervised clustering 방법을 제안한다. 제안된 방법은 주성분 분석법을 Sammon의 매핑법을 시작점으로 사용한 결합방법으로 특정냄새가 속해있는 cluster들에 대한 가정이 필요하지 않으며, 주성분 분석법에서 나타나는 차원축소로 인한 오차를 최소화하고 Sammon의 매핑법 사용으로 나타나는 데이터베이스의 입력순서에 따른 cluster들의 회전현상을 제거할 수 있다. 제안된 unsupervised clustering 방법으로 구현된 인공 냄새인식 시스템은 휘발성 유기화합물과 국산양주들의 냄새 차이를 각각 평가하는데 사용되어졌고 실험을 통하여 좋은 성능을 검증하였다.

  • PDF

An Overview of Unsupervised and Semi-Supervised Fuzzy Kernel Clustering

  • Frigui, Hichem;Bchir, Ouiem;Baili, Naouel
    • International Journal of Fuzzy Logic and Intelligent Systems
    • /
    • 제13권4호
    • /
    • pp.254-268
    • /
    • 2013
  • For real-world clustering tasks, the input data is typically not easily separable due to the highly complex data structure or when clusters vary in size, density and shape. Kernel-based clustering has proven to be an effective approach to partition such data. In this paper, we provide an overview of several fuzzy kernel clustering algorithms. We focus on methods that optimize an fuzzy C-mean-type objective function. We highlight the advantages and disadvantages of each method. In addition to the completely unsupervised algorithms, we also provide an overview of some semi-supervised fuzzy kernel clustering algorithms. These algorithms use partial supervision information to guide the optimization process and avoid local minima. We also provide an overview of the different approaches that have been used to extend kernel clustering to handle very large data sets.

SUFFICIENT HMM 통계치에 기반한 UNSUPERVISED 화자 적응 (Unsupervised Speaker Adaptation Based on Sufficient HMM Statistics)

  • 고봉옥;김종교
    • 대한음성학회:학술대회논문집
    • /
    • 대한음성학회 2003년도 5월 학술대회지
    • /
    • pp.127-130
    • /
    • 2003
  • This paper describes an efficient method for unsupervised speaker adaptation. This method is based on selecting a subset of speakers who are acoustically close to a test speaker, and calculating adapted model parameters according to the previously stored sufficient HMM statistics of the selected speakers' data. In this method, only a few unsupervised test speaker's data are required for the adaptation. Also, by using the sufficient HMM statistics of the selected speakers' data, a quick adaptation can be done. Compared with a pre-clustering method, the proposed method can obtain a more optimal speaker cluster because the clustering result is determined according to test speaker's data on-line. Experiment results show that the proposed method attains better improvement than MLLR from the speaker independent model. Moreover the proposed method utilizes only one unsupervised sentence utterance, while MLLR usually utilizes more than ten supervised sentence utterances.

  • PDF

Classification of Traffic Flows into QoS Classes by Unsupervised Learning and KNN Clustering

  • Zeng, Yi;Chen, Thomas M.
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • 제3권2호
    • /
    • pp.134-146
    • /
    • 2009
  • Traffic classification seeks to assign packet flows to an appropriate quality of service(QoS) class based on flow statistics without the need to examine packet payloads. Classification proceeds in two steps. Classification rules are first built by analyzing traffic traces, and then the classification rules are evaluated using test data. In this paper, we use self-organizing map and K-means clustering as unsupervised machine learning methods to identify the inherent classes in traffic traces. Three clusters were discovered, corresponding to transactional, bulk data transfer, and interactive applications. The K-nearest neighbor classifier was found to be highly accurate for the traffic data and significantly better compared to a minimum mean distance classifier.

무감독 SAM 기법을 이용한 하이퍼스펙트럴 영상 분류 (The Hyperspectral Image Classification with the Unsupervised SAM)

  • 김대성;김진곤;변영기;김용일
    • 한국측량학회:학술대회논문집
    • /
    • 한국측량학회 2004년도 춘계학술발표회논문집
    • /
    • pp.159-164
    • /
    • 2004
  • SAM(Spectral Angle Mapper) is the method using the similarly of the angle between pairs of signatures instead of the spectral distance(MDC, MLC etc.) for classification or clustering. In this paper, we applied unsupervised techniques(Unsupervised SAM and ISODATA) to the Hyperspectral Image(Hyperion) which has innumerable, narrow and contiguous spectral bands and Multispectral Image(ETM$\^$+/) for the clustering of signatures. The overall measured accuracies of the USAM and ISODATA of multispectral image were 76.52%, 53.91% and the USAM and ISODATA of hyperspectral image were 63.04%, 53.91%. From the results of our test, we report that the Unsupervised SAM is better classfication technique than ISODATA. Also we believe that the "Spectral Angle" can potentially be one of the most accurate classifier not only multispectral images but hyperspectral images.

  • PDF

Reinforcement learning multi-agent using unsupervised learning in a distributed cloud environment

  • Gu, Seo-Yeon;Moon, Seok-Jae;Park, Byung-Joon
    • International Journal of Internet, Broadcasting and Communication
    • /
    • 제14권2호
    • /
    • pp.192-198
    • /
    • 2022
  • Companies are building and utilizing their own data analysis systems according to business characteristics in the distributed cloud. However, as businesses and data types become more complex and diverse, the demand for more efficient analytics has increased. In response to these demands, in this paper, we propose an unsupervised learning-based data analysis agent to which reinforcement learning is applied for effective data analysis. The proposal agent consists of reinforcement learning processing manager and unsupervised learning manager modules. These two modules configure an agent with k-means clustering on multiple nodes and then perform distributed training on multiple data sets. This enables data analysis in a relatively short time compared to conventional systems that perform analysis of large-scale data in one batch.

Determining the Optimal Number of Signal Clusters Using Iterative HMM Classification

  • Ernest, Duker Junior;Kim, Yoon Joong
    • International journal of advanced smart convergence
    • /
    • 제7권2호
    • /
    • pp.33-37
    • /
    • 2018
  • In this study, we propose an iterative clustering algorithm that automatically clusters a set of voice signal data without a label into an optimal number of clusters and generates hmm model for each cluster. In the clustering process, the likelihood calculations of the clusters are performed using iterative hmm learning and testing while varying the number of clusters for given data, and the maximum likelihood estimation method is used to determine the optimal number of clusters. We tested the effectiveness of this clustering algorithm on a small-vocabulary digit clustering task by mapping the unsupervised decoded output of the optimal cluster to the ground-truth transcription, we found out that they were highly correlated.

Mutual Fund 수익률의 비정상 함수형 시그널을 위한 다해상도 클러스터 계층구조 (Multi-scale Cluster Hierarchy for Non-stationary Functional Signals of Mutual Fund Returns)

  • 김대룡;정욱
    • 경영과학
    • /
    • 제24권2호
    • /
    • pp.57-72
    • /
    • 2007
  • Many Applications of scientific research have coupled with functional data signal clustering techniques to discover novel characteristics that can be used for the diagnoses of several issues. In this article we present an interpretable multi-scale cluster hierarchy framework for clustering functional data using its multi-aspect frequency information. The suggested method focuses on how to effectively select transformed features/variables in unsupervised manner so that finally reduce the data dimension and achieve the multi-purposed clustering. Specially, we apply our suggested method to mutual fund returns and make superior-performing funds group based on different aspects such as global patterns, seasonal variations, levels of noise, and their combinations. To promise our method producing a quality cluster hierarchy, we give some empirical results under the simulation study and a set of real life data. This research will contribute to financial market analysis and flexibly fit to other research fields with clustering purposes.

함수 변환과 FFT에 기반한 조정자가 없는 XML 문서 클러스터링 기법 (An Unsupervised Clustering Technique of XML Documents based on Function Transform and FFT)

  • 이호석
    • 정보처리학회논문지D
    • /
    • 제14D권2호
    • /
    • pp.169-180
    • /
    • 2007
  • 본 논문은 함수 변환(Function Transform)과 FFT(Fast Fourier Transform)를 사용하는 새로운 XML 문서 클리스터링 기법에 대하여 논한다. 본 문서 클러스터링 기법은 조정자 없이 점진적으로 수행된다. XML 문서는 엘리먼트의 계층적인 구조에 기반하여 이산 함수로 변환된다. 이산 함수는 FFT를 사용하여 벡터로 변환된다. 문서를 나타내는 벡터는 가중치 유클리디안 거리 메트릭을 사용하여 비교된다. 비교 결과가 미리 정의된 값보다 작을 때에는 비교되는 두 개의 문서는 구조적으로 비슷한 것으로 간주되어 동일한 그룹으로 분류된다. XML 문서 클리스터링은 XML 문서의 저장과 검색에 유용하게 사용될 수 있다. 800개의 합서 문서와 520개의 실제 문서를 사용하여 실험하였다. 실험 결과는 함수변환과 FFT는 XML 문서를 엘리먼트의 구조를 기반으로 하여 점진적으로 조정자 없이 효과적으로 분류하는 것을 보여주었다.