• Title/Abstract/Keywords: Density-based Clustering

164 search results (processing time: 0.037 seconds)

AN EFFICIENT DENSITY BASED ANT COLONY APPROACH ON WEB DOCUMENT CLUSTERING

  • M. REKA
    • Journal of applied mathematics & informatics / Vol. 41, No. 6 / pp. 1327-1339 / 2023
  • Use of the World Wide Web (WWW) has been increasing as users seek more information, and the volume of document information available to end users through the internet keeps growing. The web's document search process is essential for finding documents relevant to user queries. As the number of web pages increases, it becomes increasingly challenging for users to find records that match their interests, and existing Document Information Retrieval (DIR) approaches are time-consuming for large document collections. To alleviate the problem, this paper presents a novel Spatial Clustering Ranking Pattern (SCRP) based Density Ant Colony Information Retrieval (DACIR) approach for query-based DIR. The first stage applies the Term Frequency Weight (TFW) technique to compute frequency-based weights for the query. Based on the weight scores, the documents are grouped and ranked using the proposed SCRP technique. Finally, based on the ranking, the DACIR algorithm retrieves the most relevant documents. The proposed method outperforms traditional information retrieval methods in the quality of returned objects while also running significantly faster.

Plurality Rule-based Density and Correlation Coefficient-based Clustering for K-NN

  • Aung, Swe Swe;Nagayama, Itaru;Tamaki, Shiro
    • IEIE Transactions on Smart Processing and Computing / Vol. 6, No. 3 / pp. 183-192 / 2017
  • k-nearest neighbor (K-NN) is a well-known machine learning classification algorithm that labels a sample according to its nearest training examples in feature space. However, K-NN is a lazy learning method. Therefore, if a K-NN-based system depends on a huge amount of historical data to achieve accurate predictions for a particular task, its processing-time performance gradually degrades. Many researchers consider only classification accuracy, but estimation speed also plays an essential role in real-time prediction systems. To compensate for this weakness, this paper proposes correlation coefficient-based clustering (CCC), aimed at improving the processing speed of K-NN, and plurality rule-based density (PRD), aimed at improving estimation accuracy. For experiments, we used real datasets (breast cancer, breast tissue, heart, and iris) from the University of California, Irvine (UCI) machine learning repository. Real traffic data collected from Ojana Junction, Route 58, Okinawa, Japan, was also used to demonstrate the efficiency of the method. On these datasets, the new approach showed better processing-time performance than classical K-NN. In addition, via experiments on real-world datasets, we compared the prediction accuracy of our approach with density peaks clustering based on K-NN and principal component analysis (DPC-KNN-PCA).

Spectral clustering based on the local similarity measure of shared neighbors

  • Cao, Zongqi;Chen, Hongjia;Wang, Xiang
    • ETRI Journal / Vol. 44, No. 5 / pp. 769-779 / 2022
  • Spectral clustering has become a typical and efficient clustering method used in a variety of applications. The critical step of spectral clustering is the similarity measurement, which largely determines the performance of the spectral clustering method. In this paper, we propose a novel spectral clustering algorithm based on the local similarity measure of shared neighbors. This similarity measurement exploits the local density information between data points based on the weight of the shared neighbors in a directed k-nearest neighbor graph with only one parameter k, that is, the number of nearest neighbors. Numerical experiments on synthetic and real-world datasets demonstrate that our proposed algorithm outperforms other existing spectral clustering algorithms in terms of the clustering performance measured via the normalized mutual information, clustering accuracy, and F-measure. As an example, the proposed method can provide an improvement of 15.82% in the clustering performance for the Soybean dataset.
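The abstract above does not give the exact weighting of shared neighbors, so the following is only a minimal Python sketch of the general idea: build a directed k-nearest-neighbor graph and score two points by how many neighbors they share. The function names and the normalization by `k` are assumptions made for illustration, not the authors' measure.

```python
from math import dist


def knn_sets(points, k):
    """For each point, return the index set of its k nearest other points.

    This is the out-neighborhood of each node in a directed k-NN graph.
    """
    result = []
    for i, p in enumerate(points):
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: dist(p, points[j]),
        )
        result.append(set(others[:k]))
    return result


def shared_neighbor_similarity(points, k):
    """Return a symmetric matrix S where S[i][j] is the fraction of the
    k nearest neighbors that points i and j have in common (assumed form)."""
    nn = knn_sets(points, k)
    n = len(points)
    return [
        [1.0 if i == j else len(nn[i] & nn[j]) / k for j in range(n)]
        for i in range(n)
    ]


# Two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
S = shared_neighbor_similarity(points, k=2)
```

A matrix like `S` could then serve as the affinity input to a spectral clustering routine; the only parameter to tune is `k`, matching the single-parameter design the abstract describes.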

A Density-based Clustering Method

  • Ahn, Sung Mahn;Baik, Sung Wook
    • Communications for Statistical Applications and Methods / Vol. 9, No. 3 / pp. 715-723 / 2002
  • This paper shows a clustering application of a density estimation method that utilizes the Gaussian mixture model. We define a "closeness measure" as a clustering criterion to see how close two given Gaussian components are. The closeness measure is defined as the ratio of log likelihood between two Gaussian components. According to simulations using artificial data, the clustering algorithm turned out to be very powerful in that it can correctly determine clusters in complex situations, and very flexible in that it can produce different sizes of clusters based on different threshold values.

Detection of Abnormal Region of Skin using Gabor Filter and Density-based Spatial Clustering of Applications with Noise

  • 전민성;최경주
    • 한국멀티미디어학회논문지 / Vol. 21, No. 2 / pp. 117-129 / 2018
  • In this paper, we suggest a new system that detects abnormal regions of skin. First, an illumination elimination algorithm that uses the LAB color model is applied to the input facial image to obtain a facial image that is robust to illumination, and then a Gabor filter is applied to detect the response to discontinuities. Finally, the density-based spatial clustering of applications with noise (DBSCAN) algorithm is applied to classify areas of wrinkles, dots, and other skin diseases. This method allows the user to check the skin condition in images taken in real life.

Classification of Subgroups of Solar and Heliospheric Observatory (SOHO) Sungrazing Kreutz Comet Group by the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) Clustering Algorithm

  • Ulkar Karimova;Yu Yi
    • Journal of Astronomy and Space Sciences / Vol. 41, No. 1 / pp. 35-42 / 2024
  • Sungrazing comets, known for their proximity to the Sun, are traditionally classified into broad groups like Kreutz, Marsden, Kracht, Meyer, and non-group comets. While existing methods successfully categorize these groups, finer distinctions within the Kreutz subgroup remain a challenge. In this study, we introduce an automated classification technique using the density-based spatial clustering of applications with noise (DBSCAN) algorithm to categorize sungrazing comets. Our method extends traditional classifications by finely categorizing the Kreutz subgroup into four distinct subgroups based on a comprehensive range of orbital parameters, providing critical insights into the origins and dynamics of these comets. Corroborative analyses validate the accuracy and effectiveness of our method, offering a more efficient framework for understanding the categorization of sungrazing comets.

A Method of Color Image Segmentation Based on DBSCAN (Density Based Spatial Clustering of Applications with Noise) Using Compactness of Superpixels and Texture Information

  • 이정환
    • 디지털산업정보학회논문지 / Vol. 11, No. 4 / pp. 89-97 / 2015
  • In this paper, a method of color image segmentation based on DBSCAN (Density Based Spatial Clustering of Applications with Noise) using compactness of superpixels and texture information is presented. The DBSCAN algorithm can generate clusters in large data sets by looking at the local density of data samples, using only two input parameters: the minimum number of data points and the neighborhood distance. Superpixel algorithms group pixels into perceptually meaningful atomic regions, which can be used to replace the rigid structure of the pixel grid. Each superpixel consists of pixels with similar features such as luminance, color, and texture. Superpixels are more efficient than pixels for large-scale image processing. In this paper, superpixels are generated by the widely used SLIC (simple linear iterative clustering) algorithm. Superpixel characteristics are described by compactness, uniformity, boundary precision, and recall, of which compactness is an important feature. Each superpixel is represented by Lab color space values, compactness, and texture information. The DBSCAN clustering method is applied to this feature space to segment a color image. To evaluate the performance of the proposed method, computer simulations are carried out on several outdoor images. The experimental results show that the proposed algorithm can provide good segmentation results on various images.
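To make the two DBSCAN parameters mentioned in the abstract above concrete, here is a minimal pure-Python sketch of the standard algorithm on toy 2-D points. It is not the paper's superpixel pipeline: the names `eps` (neighborhood distance) and `min_pts` (minimum number of data points) and the sample data are assumptions chosen for illustration.

```python
from math import dist

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i], including i."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, NOISE for outliers."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned to a cluster or marked as noise
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        cluster += 1  # points[i] is a core point: start a new cluster
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point: expand
    return labels


points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
# labels == [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

The isolated point at `(50, 50)` has too few neighbors within `eps`, so it is labeled as noise rather than forced into a cluster. This brute-force neighbor search is quadratic in the number of points; a spatial index would be needed for large data sets.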

An Overview of Unsupervised and Semi-Supervised Fuzzy Kernel Clustering

  • Frigui, Hichem;Bchir, Ouiem;Baili, Naouel
    • International Journal of Fuzzy Logic and Intelligent Systems / Vol. 13, No. 4 / pp. 254-268 / 2013
  • For real-world clustering tasks, the input data is typically not easily separable because the data structure is highly complex or the clusters vary in size, density, and shape. Kernel-based clustering has proven to be an effective approach to partition such data. In this paper, we provide an overview of several fuzzy kernel clustering algorithms. We focus on methods that optimize a fuzzy C-means-type objective function. We highlight the advantages and disadvantages of each method. In addition to the completely unsupervised algorithms, we also provide an overview of some semi-supervised fuzzy kernel clustering algorithms. These algorithms use partial supervision information to guide the optimization process and avoid local minima. We also provide an overview of the different approaches that have been used to extend kernel clustering to handle very large data sets.

Intelligent LoRa-Based Positioning System

  • Chen, Jiann-Liang;Chen, Hsin-Yun;Ma, Yi-Wei
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 16, No. 9 / pp. 2961-2975 / 2022
  • The Location-Based Service (LBS) is one of the most well-known services on the Internet, and positioning is its primary function. This study proposes an intelligent LoRa-based positioning system, called AI@LBS, to provide accurate location data. The fingerprint mechanism, combined with an unsupervised clustering algorithm, filters out signal noise and improves computing stability and accuracy. In this study, data noise is filtered using the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm, increasing the positioning accuracy from 95.37% to 97.38%. The problem of data imbalance is addressed using SMOTE (Synthetic Minority Over-sampling Technique), increasing the positioning accuracy from 97.38% to 99.17%. A field test on the NTUST campus (www.ntust.edu.tw) revealed that the AI@LBS system can reduce the average distance error to 0.48 m.

Gaussian pdfs Clustering Using a Divergence Measure-based Neural Network

  • 박동철;권오현
    • 한국통신학회논문지 / Vol. 29, No. 5C / pp. 627-631 / 2004
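  • An algorithm is proposed that can efficiently cluster the GPDFs (Gaussian Probability Density Functions) of a speech recognition model. The proposed algorithm is a new form of CNN (Centroid Neural Network) that uses a divergence measure as the distance between data. It was tested on an application that reduces memory usage for speech recognition in hardware environments with limited resources. For the CDHMM (Continuous Density Hidden Markov Model) speech recognition model, it reduced the number of GPDFs by about 31.3% more than the existing method based on the Dk-means (Divergence-based k-means) algorithm while maintaining recognition performance. Compared with using all GPDFs without any clustering algorithm, it compressed the GPDFs by about 61.8% while maintaining recognition performance. Recognition performance was also maintained in an evaluation on noisy data at an SNR of 10 dB.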
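The abstract above does not specify which divergence is used, so the following Python sketch only illustrates what a divergence-based distance between two Gaussian densities can look like. It assumes univariate Gaussians and the symmetrized Kullback-Leibler divergence; both the choice of divergence and the function names are assumptions, not the authors' definition.

```python
from math import log


def kl_gauss(m1, v1, m2, v2):
    """KL(N(m1, v1) || N(m2, v2)) for univariate Gaussians.

    m1, m2 are means; v1, v2 are variances (must be positive).
    """
    return 0.5 * (log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)


def symmetric_divergence(m1, v1, m2, v2):
    """Symmetrized KL divergence: the sum of both directions (assumed form)."""
    return kl_gauss(m1, v1, m2, v2) + kl_gauss(m2, v2, m1, v1)


# Identical Gaussians are at distance 0; the distance grows as the means
# move apart or the variances differ.
d_same = symmetric_divergence(0.0, 1.0, 0.0, 1.0)
d_shift = symmetric_divergence(0.0, 1.0, 1.0, 1.0)
```

Unlike Euclidean distance between mean vectors, a measure of this kind also accounts for differences in variance, which is why it suits clustering objects that are themselves probability densities.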