• Title/Summary/Keyword: Random clustering


Spectral Clustering with Sparse Graph Construction Based on Markov Random Walk

  • Cao, Jiangzhong;Chen, Pei;Ling, Bingo Wing-Kuen;Yang, Zhijing;Dai, Qingyun
    • KSII Transactions on Internet and Information Systems (TIIS), v.9 no.7, pp.2568-2584, 2015
  • Spectral clustering has become one of the most popular clustering approaches in recent years. The similarity graph constructed on the data is one of the key factors that influence the performance of spectral clustering. However, the similarity graphs constructed by existing methods usually contain some unreliable edges. To construct a reliable similarity graph for spectral clustering, this paper proposes an efficient method based on the Markov random walk (MRW). In the proposed method, the MRW model is defined on the raw k-NN graph, and the neighbors of each sample are determined by the MRW transition probabilities. Because higher-order transition probabilities capture complex relationships among the data, the neighbors selected by the proposed method are more reliable than those selected by existing methods. Experiments on synthetic and real-world datasets are used for performance evaluation and comparison. The results show that the graph obtained by the proposed method reflects the structure of the data better than those of state-of-the-art methods and effectively improves the performance of spectral clustering.
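The neighbor-selection step can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' implementation: the raw k-NN graph is taken to be symmetric and unweighted, the MRW transition matrix is its row-normalized adjacency, and each sample keeps the k samples with the highest t-step transition probability. The names `mrw_neighbors`, `k`, and `t` are hypothetical.

```python
import math


def knn_graph(points, k):
    """Symmetric, unweighted k-NN adjacency matrix (an assumption for illustration)."""
    n = len(points)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        nearest = sorted((math.dist(points[i], points[j]), j) for j in range(n) if j != i)
        for _, j in nearest[:k]:
            adj[i][j] = adj[j][i] = 1.0
    return adj


def transition_matrix(adj):
    """Row-normalize the adjacency matrix into one-step MRW probabilities P."""
    out = []
    for row in adj:
        total = sum(row)
        out.append([w / total for w in row])
    return out


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][m] * b[m][j] for m in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def mrw_neighbors(points, k, t):
    """Hypothetical helper: keep, for each sample, the k samples with the largest P^t entries."""
    p = transition_matrix(knn_graph(points, k))
    pt = p
    for _ in range(t - 1):
        pt = matmul(pt, p)  # t-step transition probabilities
    n = len(points)
    return [sorted((j for j in range(n) if j != i), key=lambda j: -pt[i][j])[:k]
            for i in range(n)]


pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(mrw_neighbors(pts, k=2, t=3))
```

Raising `t` lets probability mass flow along multi-step paths, which is the property the abstract relies on; in this toy graph a sample can only pick neighbors reachable within its own connected component.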

A Clustering Approach for Feature Selection in Microarray Data Classification Using Random Forest

  • Aydadenta, Husna;Adiwijaya, Adiwijaya
    • Journal of Information Processing Systems, v.14 no.5, pp.1167-1175, 2018
  • Microarray data play an essential role in diagnosing and detecting cancer. Microarray analysis allows gene expression levels in specific cell samples to be examined, with thousands of genes analyzed simultaneously. However, microarray data have very few samples and very high dimensionality, so a dimensionality reduction process is required to classify them. Dimensionality reduction can eliminate redundancy in the data, so that the features used in classification are only those highly correlated with the class. There are two types of dimensionality reduction: feature selection and feature extraction. In this paper, we use the k-means algorithm as a clustering approach for feature selection. The proposed approach groups features with the same characteristics into one cluster so that redundancy in the microarray data is removed. The features in each cluster are ranked using the Relief algorithm to obtain the best-scoring element of each cluster. The best elements of all clusters are selected and used as features in the classification process, which uses the Random Forest algorithm. In simulations, the proposed approach achieved accuracies of 85.87%, 98.9%, and 89% on the Colon, Lung Cancer, and Prostate Tumor datasets, respectively, which is higher than the accuracy of Random Forest without clustering.
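The feature-selection pipeline (cluster the features, rank them with Relief, keep the best feature per cluster) might look roughly like the sketch below. Assumptions: features are clustered with plain Lloyd's k-means seeded with the first `n_clusters` feature columns, Relief is the basic two-class variant that visits every sample once and uses the Euclidean nearest hit and miss, and the final Random Forest step is omitted to keep the example dependency-free. All function names are hypothetical.

```python
import math


def kmeans(vectors, k, iters=20):
    """Plain Lloyd's k-means, naively seeded with the first k vectors."""
    centers = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(v, centers[c])) for v in vectors]
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def relief(X, y):
    """Basic two-class Relief: reward separating the nearest miss, penalize separating the nearest hit."""
    n, d = len(X), len(X[0])
    span = [(max(r[f] for r in X) - min(r[f] for r in X)) or 1.0 for f in range(d)]
    weights = [0.0] * d
    for i, x in enumerate(X):
        hits = [j for j in range(n) if j != i and y[j] == y[i]]
        misses = [j for j in range(n) if y[j] != y[i]]
        hit = X[min(hits, key=lambda j: math.dist(x, X[j]))]
        miss = X[min(misses, key=lambda j: math.dist(x, X[j]))]
        for f in range(d):
            weights[f] += (abs(x[f] - miss[f]) - abs(x[f] - hit[f])) / (span[f] * n)
    return weights


def select_features(X, y, n_clusters):
    """Cluster the feature columns, then keep the highest-Relief feature of each cluster."""
    columns = [list(col) for col in zip(*X)]  # one vector per feature
    labels = kmeans(columns, n_clusters)
    scores = relief(X, y)
    best = {}
    for f, c in enumerate(labels):
        if c not in best or scores[f] > scores[best[c]]:
            best[c] = f
    return sorted(best.values())


X = [[0, 5, 1, 4], [1, 5, 0, 6], [0, 5, 1, 5],
     [9, 5, 10, 6], [10, 5, 9, 4], [9, 6, 10, 5]]
y = [0, 0, 0, 1, 1, 1]
print(select_features(X, y, n_clusters=2))
```

In this toy data, features 0 and 2 are near-duplicates that separate the classes, so they fall into the same cluster and only one of them survives, which is the redundancy-removal effect the abstract describes. The selected columns would then be passed to a classifier such as Random Forest.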

Detection of Moving Objects in Crowded Scenes using Trajectory Clustering via Conditional Random Fields Framework (Conditional Random Fields 구조에서 궤적군집화를 이용한 혼잡 영상의 이동 객체 검출)

  • Kim, Hyeong-Ki;Lee, Gwang-Gook;Kim, Whoi-Yul
    • Journal of Korea Multimedia Society, v.13 no.8, pp.1128-1141, 2010
  • This paper proposes a method for detecting moving objects in crowded scenes using trajectory clustering. Unlike previous appearance-based approaches, the proposed method uses only motion information to isolate moving objects. Feature points are first extracted from the input frames and then tracked to create feature trajectories. Based on the assumption that feature points originating from the same object show similar motion as the object moves, the method detects moving objects by clustering trajectories with similar motion. For this purpose, an energy function based on spatial proximity, motion coherence, and temporal continuity is defined to measure the similarity between two trajectories, and clustering is achieved by minimizing this energy function in a conditional random fields (CRFs) framework. Previous methods cannot separate trajectories that were falsely merged during clustering, whereas the proposed method can rearrange falsely merged trajectories during iteration because the clustering is solved by energy minimization in CRFs. Experimental results on three different crowded scenes show a detection rate of about 94% with a false alarm rate of 7%.
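One way to picture the energy-minimization idea is the following sketch. It is not the paper's energy function or CRF inference: the pairwise cost (mean spatial distance plus mean velocity difference over shared frames, with too little temporal overlap treated as infinitely dissimilar), the merge threshold `tau`, and the use of iterated conditional modes (ICM) as a simple stand-in for CRF energy minimization are all assumptions. Because every sweep may move a trajectory to a different cluster, a falsely merged trajectory can be reassigned, which mirrors the property described above.

```python
import math


def pair_cost(a, b, w_motion=1.0, min_overlap=2):
    """Dissimilarity of two trajectories given as dicts {frame: (x, y)}."""
    shared = sorted(set(a) & set(b))
    if len(shared) < min_overlap:
        return math.inf  # temporal continuity: too little co-existence to compare
    spatial = sum(math.dist(a[f], b[f]) for f in shared) / len(shared)
    motion = sum(
        math.dist((a[g][0] - a[f][0], a[g][1] - a[f][1]),
                  (b[g][0] - b[f][0], b[g][1] - b[f][1]))
        for f, g in zip(shared, shared[1:])
    ) / (len(shared) - 1)
    return spatial + w_motion * motion


def cluster_trajectories(trajs, tau, sweeps=10):
    """ICM on E = sum over same-cluster pairs of (cost - tau); a stand-in for CRF inference."""
    n = len(trajs)
    cost = [[pair_cost(trajs[i], trajs[j]) if i != j else 0.0 for j in range(n)]
            for i in range(n)]
    labels = list(range(n))  # start with every trajectory in its own cluster

    def local_energy(i, lbl):
        # Similar pairs (cost < tau) lower the energy, dissimilar pairs raise it.
        return sum(cost[i][j] - tau for j in range(n) if j != i and labels[j] == lbl)

    for _ in range(sweeps):
        changed = False
        for i in range(n):
            options = sorted(set(labels) | {max(labels) + 1})  # existing clusters or a new one
            best = min(options, key=lambda lbl: local_energy(i, lbl))
            if local_energy(i, best) < local_energy(i, labels[i]):
                labels[i] = best  # may also split a falsely merged trajectory off its cluster
                changed = True
        if not changed:
            break
    return labels


A1 = {t: (t, 0) for t in range(5)}
A2 = {t: (t, 1) for t in range(5)}
B1 = {t: (10 - t, 10) for t in range(5)}
B2 = {t: (10 - t, 11) for t in range(5)}
print(cluster_trajectories([A1, A2, B1, B2], tau=3.0))
```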

Enhanced Locality Sensitive Clustering in High Dimensional Space

  • Chen, Gang;Gao, Hao-Lin;Li, Bi-Cheng;Hu, Guo-En
    • Transactions on Electrical and Electronic Materials, v.15 no.3, pp.125-129, 2014
  • A dataset can be clustered by merging the bucket indices that come from the random projections of locality sensitive hashing functions. For this to work, the merging interval must first be calculated. To improve the feasibility of large-scale data clustering in high-dimensional space, we propose an enhanced Locality Sensitive Hashing Clustering Method. First, multiple hashing functions are generated. Second, data points are projected to bucket indices. Third, the bucket indices are clustered to obtain class labels. Experimental results on synthetic datasets show that this method achieves high accuracy with much faster clustering. These properties make it well suited to clustering data in high-dimensional space.
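The three steps (generate hash functions, project points to bucket indices, cluster the bucket indices) can be sketched with the standard p-stable LSH family h(x) = floor((a·x + b) / w), where a has Gaussian entries and b is uniform in [0, w). The merging rule used here, joining two points whose bucket indices differ by at most `interval` in every hash, and the parameter values are assumptions for illustration, not the paper's interval calculation. The pairwise loop is quadratic and only meant for readability.

```python
import math
import random


def make_hashes(dim, n_hashes, w, seed=0):
    """Draw (a, b) pairs for h(x) = floor((a . x + b) / w)."""
    rng = random.Random(seed)
    return [([rng.gauss(0, 1) for _ in range(dim)], rng.uniform(0, w)) for _ in range(n_hashes)]


def bucket_index(x, hashes, w):
    """Project one point to its tuple of bucket indices."""
    return tuple(math.floor((sum(ai * xi for ai, xi in zip(a, x)) + b) / w) for a, b in hashes)


def lsh_cluster(points, n_hashes=4, w=4.0, interval=1, seed=0):
    """Hypothetical merging rule: union points whose indices are within `interval` in every hash."""
    hashes = make_hashes(len(points[0]), n_hashes, w, seed)
    keys = [bucket_index(p, hashes, w) for p in points]
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if all(abs(a - b) <= interval for a, b in zip(keys[i], keys[j])):
                parent[find(i)] = find(j)
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(len(points))]  # labels 0..m-1


pts = [(0, 0), (0.1, 0), (0, 0.1), (100, 100), (100.1, 100), (100, 100.1)]
print(lsh_cluster(pts))
```

Since two projections that differ by less than `w` land in buckets at most one apart, nearby points always pass the `interval=1` test, while widely separated points differ by several buckets in at least one hash with high probability.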

Comparison of Initial Seeds Methods for K-Means Clustering (K-Means 클러스터링에서 초기 중심 선정 방법 비교)

  • Lee, Shinwon
    • Journal of Internet Computing and Services, v.13 no.6, pp.1-8, 2012
  • Clustering methods are divided into hierarchical clustering, partitional clustering, and others. The K-Means algorithm is a partitional clustering method that is well suited to clustering large numbers of documents quickly and easily. Its disadvantage is that randomly chosen initial centers can produce different results, so a better choice is to place the initial centers as far from each other as possible. We propose a new method of selecting initial centers in K-Means clustering that uses triangle height to choose the initial cluster centers. The resulting centers are distributed evenly, and the clustering result is more accurate than with randomly selected initial centers. Although selecting the centers takes extra time, the method reduces total clustering time by minimizing the number of assignment and recalculation steps. Compared with the standard algorithm, the average running time is reduced by 38.4%.
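The triangle-height rule is not described here in enough detail to reproduce, so the sketch below instead shows a different, well-known seeding scheme that follows the same principle of placing initial centers far apart: farthest-point (maximin) seeding, followed by ordinary Lloyd's k-means iterations until the centers stop moving. Function names are hypothetical.

```python
import math


def farthest_point_seeds(points, k):
    """Maximin seeding: start at the first point, then add the point farthest from all chosen seeds."""
    seeds = [points[0]]
    while len(seeds) < k:
        seeds.append(max(points, key=lambda p: min(math.dist(p, s) for s in seeds)))
    return seeds


def kmeans_from_seeds(points, seeds, max_iters=100):
    """Lloyd's k-means from the given seeds; stops when the centers no longer move."""
    centers = [list(s) for s in seeds]
    labels = []
    for _ in range(max_iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(len(centers)), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Recalculation step: move each center to the mean of its members.
        new_centers = []
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new_centers.append([sum(v) / len(members) for v in zip(*members)] if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


pts = [(0, 0), (0, 1), (10, 0), (10, 1), (5, 20)]
seeds = farthest_point_seeds(pts, 3)
print(seeds)                            # [(0, 0), (5, 20), (10, 1)]
print(kmeans_from_seeds(pts, seeds)[0])  # [0, 0, 2, 2, 1]
```

Well-separated seeds tend to start each center inside a different group, which is why fewer assignment and recalculation rounds are usually needed than with random seeds.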

New Optimization Algorithm for Data Clustering (최적화에 기반 한 데이터 클러스터링 알고리즘)

  • Kim, Ju-Mi
    • Journal of Intelligence and Information Systems, v.13 no.3, pp.31-45, 2007
  • Handling large data is one of the critical issues facing the data mining community, particularly for computationally intensive tasks such as data clustering. Random sampling of instances is one possible means of handling large data, but a pervasive problem with this approach is how to deal with the resulting noise in the evaluation of the learning algorithm. This paper develops a new optimization-based clustering approach using an algorithm specifically designed for noisy performance evaluations. Numerical results show that this algorithm performs better than other algorithms such as PAM and CLARA. Moreover, by using partial data, this algorithm achieves substantial savings in computational time without sacrificing solution quality.
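The source of evaluation noise can be made concrete with a small sketch: the k-medoids cost is estimated from a random subsample of the data, so repeated evaluations of the same medoids give different values. The random-swap search that averages several noisy estimates before accepting a move is an assumption chosen for illustration; it is not the paper's optimization algorithm, PAM, or CLARA, and all names are hypothetical.

```python
import math
import random


def medoid_cost(points, medoids):
    """Exact k-medoids cost: total distance of each point to its nearest medoid."""
    return sum(min(math.dist(p, m) for m in medoids) for p in points)


def sampled_cost(points, medoids, sample_size, rng):
    """Noisy estimate of the full cost from a random subsample, scaled to the full data size."""
    sample = rng.sample(points, sample_size)
    return medoid_cost(sample, medoids) * len(points) / sample_size


def noisy_swap_search(points, k, sample_size, repeats=5, rounds=200, seed=0):
    """Random-swap medoid search that compares averaged noisy cost estimates (illustrative only)."""
    rng = random.Random(seed)
    medoids = rng.sample(points, k)
    for _ in range(rounds):
        candidate = list(medoids)
        candidate[rng.randrange(k)] = rng.choice(points)
        # Averaging several subsample estimates damps the evaluation noise.
        old = sum(sampled_cost(points, medoids, sample_size, rng) for _ in range(repeats)) / repeats
        new = sum(sampled_cost(points, candidate, sample_size, rng) for _ in range(repeats)) / repeats
        if new < old:
            medoids = candidate
    return medoids


rng = random.Random(1)
pts = ([(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(100)]
       + [(rng.gauss(8, 1), rng.gauss(8, 1)) for _ in range(100)])
meds = noisy_swap_search(pts, k=2, sample_size=20)
print(meds, medoid_cost(pts, meds))
```

Each evaluation touches only `sample_size` points instead of the whole dataset, which is where the computational savings come from; the price is that a worse candidate can occasionally look better by chance.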

Clustering and classification to characterize daily electricity demand (시간단위 전력사용량 시계열 패턴의 군집 및 분류분석)

  • Park, Dain;Yoon, Sanghoo
    • Journal of the Korean Data and Information Science Society, v.28 no.2, pp.395-406, 2017
  • The purpose of this study is to identify patterns of daily electricity demand through clustering and classification. Hourly data were collected by KPX (Korea Power Exchange) between 2008 and 2012. Because electricity demand data are time series, the time trend was removed before identifying daily demand patterns. We considered k-means clustering, Gaussian mixture model clustering, and functional clustering to find the optimal clustering method. Classification analysis was then conducted to understand the relationship between the clusters and external factors: day of the week, holidays, and weather. The data were divided into training and test data. The training data consisted of the external factors and cluster numbers from 2008 to 2011, and the test data consisted of the daily external factors for 2012. Decision trees, random forests, support vector machines, and naive Bayes were used as classifiers. Gaussian mixture model clustering combined with random forest showed the best prediction performance when the number of clusters was 8.
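The preprocessing step (removing the time trend and cutting the hourly series into daily profiles) might be sketched as follows. The abstract does not say how the trend was removed; an ordinary least-squares linear trend is assumed here, and the resulting 24-value daily vectors are what a clustering method such as k-means or a Gaussian mixture model would then group. Function names are hypothetical.

```python
def linear_detrend(series):
    """Subtract an ordinary least-squares line a + b*t (assumed detrending method; needs >= 2 values)."""
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    slope = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series)) / sxx
    intercept = y_mean - slope * t_mean
    return [y - (intercept + slope * t) for t, y in enumerate(series)]


def daily_profiles(hourly, hours_per_day=24):
    """Detrend an hourly series and split it into complete daily vectors (a trailing partial day is dropped)."""
    resid = linear_detrend(hourly)
    return [resid[d:d + hours_per_day]
            for d in range(0, len(resid) - hours_per_day + 1, hours_per_day)]


# Three days of a steadily growing load with no daily shape: every residual is (near) zero.
profiles = daily_profiles([3 + 0.5 * t for t in range(72)])
print(len(profiles), max(abs(v) for p in profiles for v in p))
```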

Terminal-based Dynamic Clustering Algorithm in Multi-Cell Cellular System

  • Ni, Jiqing;Fei, Zesong;Xing, Chengwen;Zhao, Di;Kuang, Jingming
    • KSII Transactions on Internet and Information Systems (TIIS), v.6 no.9, pp.2086-2097, 2012
  • A terminal-based dynamic clustering algorithm is proposed for a multi-cell scenario in which the user selects cooperative base stations (BSs) from a predetermined static BS set based on dynamic channel conditions. First, the user transmission rate is derived under linear precoding and a per-cell feedback scheme. Then, the dynamic clustering algorithm is implemented based on two criteria: (a) the transmission rate should meet the user's quality of service (QoS) requirement; (b) the rate increment should exceed a predetermined constant threshold. By adopting random vector quantization (RVQ), the optimized number of cooperative BSs and the corresponding channel conditions are presented. Numerical results show that the proposed method effectively improves system resource utilization.
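The two selection criteria can be illustrated with a greedy sketch. The rate model log2(1 + Σ SNR) over the cooperating BSs is a deliberately simplified assumption that ignores precoding and RVQ feedback error, and the per-BS SNR input, the strongest-first order, and all names are hypothetical.

```python
import math


def rate(snrs):
    """Toy rate model (assumption): log2(1 + sum of linear SNRs from the cooperating BSs)."""
    return math.log2(1 + sum(snrs))


def select_cooperative_bss(bs_snrs, qos_rate, threshold):
    """Greedy terminal-side choice of cooperative BSs from a static candidate set.

    bs_snrs: hypothetical dict {bs_id: linear SNR seen by this user}.
    """
    order = sorted(bs_snrs, key=bs_snrs.get, reverse=True)
    chosen = [order[0]]  # the strongest BS always serves the user
    current = rate([bs_snrs[order[0]]])
    for bs in order[1:]:
        if current >= qos_rate:  # criterion (a): the QoS rate is already met
            break
        new_rate = rate([bs_snrs[b] for b in chosen + [bs]])
        if new_rate - current <= threshold:  # criterion (b): the increment is too small
            break
        chosen.append(bs)
        current = new_rate
    return chosen, current


print(select_cooperative_bss({"A": 7.0, "B": 8.0, "C": 0.1}, qos_rate=4.0, threshold=0.5))
```

Stopping as soon as either criterion fails keeps the cooperative set small, so backhaul and feedback resources are spent only when an extra BS meaningfully raises the user's rate.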

A Fusion of Data Mining Techniques for Predicting Movement of Mobile Users

  • Duong, Thuy Van T.;Tran, Dinh Que
    • Journal of Communications and Networks, v.17 no.6, pp.568-581, 2015
  • Predicting the locations of users with portable devices such as IP phones, smartphones, iPads, and iPods in public wireless local area networks (WLANs) plays a crucial role in location management and network resource allocation. Many machine learning and data mining techniques, such as sequential pattern mining and clustering, have been widely used for this task. However, these approaches have two deficiencies. First, because sequential pattern techniques are based on profiles of individual mobility behavior, they may fail to predict the movements of new users or of users moving along novel paths. Second, predicting a user's movement from the similar mobility behaviors in a cluster can significantly degrade accuracy because regular and random movements cannot be distinguished. In this paper, we propose a novel fusion technique that combines clustering and sequential pattern mining to exploit mobility rules discovered from multiple similar users. The proposed technique comprises two algorithms, clustering-based-sequential-pattern-mining (CSPM) and sequential-pattern-mining-based-clustering (SPMC), which can compensate for missing information in a personal profile and filter out some of the noise caused by users' random movements. Experimental results show that our approach outperforms existing approaches in terms of efficiency and prediction accuracy.
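The clustering-then-mining idea behind CSPM can be illustrated with a much simplified sketch: users are assumed to be already clustered, and the only "sequential patterns" mined are first-order `current -> next` location rules with a minimum support. This is not the paper's CSPM or SPMC algorithm; the data layout and names are hypothetical. Because rules come from all users in a cluster, a new user with no history can still receive a prediction, and the support threshold discards one-off (random) moves.

```python
from collections import Counter, defaultdict


def mine_rules(trajectories, min_support):
    """Count 'current -> next' transitions; keep the most frequent one if seen >= min_support times."""
    counts = defaultdict(Counter)
    for traj in trajectories:
        for cur, nxt in zip(traj, traj[1:]):
            counts[cur][nxt] += 1
    rules = {}
    for cur, counter in counts.items():
        nxt, support = counter.most_common(1)[0]
        if support >= min_support:
            rules[cur] = nxt
    return rules


def predict_next(user, location, clusters, trajectories, min_support=2):
    """Predict from rules mined over every user in the same cluster (CSPM-style idea)."""
    peers = [u for u, c in clusters.items() if c == clusters[user]]
    rules = mine_rules([t for u in peers for t in trajectories[u]], min_support)
    return rules.get(location)  # None when no sufficiently supported rule exists


clusters = {"u1": 0, "u2": 0, "u3": 1, "u4": 0}
trajs = {"u1": [["lib", "lab", "cafe"]], "u2": [["lib", "lab", "gym"]],
         "u3": [["lib", "gym"], ["lib", "gym"]], "u4": []}  # u4 is a new user
print(predict_next("u4", "lib", clusters, trajs))
```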

Keyphrase Extraction Using Active Learning and Clustering (Active Learning과 군집화를 이용한 고정키어구 추출)

  • Lee, Hyun-Woo;Cha, Jeong-Won
    • MALSORI, no.66, pp.87-103, 2008
  • We describe a new active learning method in the conditional random fields (CRFs) framework for keyphrase extraction. To reduce annotation effort, we use diversity and representativeness measures. We select highly diverse training candidates based on sentence confidence values, and we select highly representative candidates by clustering the part-of-speech patterns of their contexts. In experiments on a dialog corpus, our method achieves 86.80% while using 88% less training corpus than the supervised method. The results show that the proposed method outperforms previous methods. Additionally, the proposed method can easily be applied to other applications because its implementation is independent of the application.
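The two selection measures can be sketched as follows, under assumptions not taken from the paper: each candidate is a hypothetical `(sentence_id, confidence, pos_pattern)` tuple, low model confidence is used as the selection signal, and candidates sharing an identical part-of-speech pattern stand in for a cluster. Larger clusters are visited first, and from each the least-confident sentence is sent for annotation.

```python
from collections import defaultdict


def select_for_annotation(candidates, budget):
    """Pick up to `budget` sentence ids to annotate.

    candidates: hypothetical list of (sentence_id, confidence, pos_pattern) tuples.
    Representativeness: identical POS patterns form a group (a crude stand-in for clustering).
    Informativeness: within a group, the least-confident sentence is chosen.
    """
    groups = defaultdict(list)
    for sid, conf, pattern in candidates:
        groups[pattern].append((conf, sid))
    ranked = sorted(groups.values(), key=len, reverse=True)  # most frequent patterns first
    return [min(group)[1] for group in ranked[:budget]]


cands = [("s1", 0.9, "NN VB"), ("s2", 0.4, "NN VB"), ("s3", 0.7, "NN VB"),
         ("s4", 0.2, "JJ NN"), ("s5", 0.5, "DT NN"), ("s6", 0.6, "DT NN")]
print(select_for_annotation(cands, budget=2))
```

Taking one sentence per group keeps the labeled set diverse, while ordering groups by size spends the annotation budget on the context patterns that occur most often.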