• Title/Abstract/Keywords: Data Clustering

Improved Parameter Inference for Low-Cost 3D LiDAR-Based Object Detection on Clustering Algorithms

  • 김다현;안준호
    • 인터넷정보학회논문지 / Vol. 23, No. 6 / pp.71-78 / 2022
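  • This paper proposes an algorithm for 3D object detection that processes point cloud data from a 3D LiDAR. Unlike 2D LiDAR data, 3D LiDAR data is very large and hard to process in three dimensions. The paper reviews a range of 3D LiDAR-based studies and describes how 3D LiDAR data is processed. We propose a method that uses clustering to process 3D LiDAR data for object detection. We also design an algorithm that fuses the LiDAR with a camera for clear and accurate 3D object detection. In addition, we study models for clustering 3D LiDAR data and the hyperparameter values suited to each model. For clustering 3D LiDAR data, DBSCAN gave the most accurate results, and we compare and analyze its hyperparameter values. We expect this work to support future research on object detection with 3D LiDAR.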
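The sketch below is a textbook DBSCAN in plain Python. It only illustrates the clustering step the abstract credits with the best accuracy, and it assumes each LiDAR return is an (x, y, z) tuple. The `eps` and `min_pts` values and the toy point cloud are hypothetical, not the hyperparameters analyzed in the paper. A real point-cloud pipeline would replace this O(n²) neighbor search with a spatial index such as a k-d tree.

```python
import math
from collections import deque

NOISE = -1

def region_query(points, idx, eps):
    """Indices of all points within Euclidean distance eps of points[idx] (itself included)."""
    center = points[idx]
    return [j for j, q in enumerate(points) if math.dist(center, q) <= eps]

def dbscan(points, eps, min_pts):
    """Return a cluster id (0, 1, ...) or NOISE for every (x, y, z) point."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be absorbed as a border point
            continue
        labels[i] = cluster_id  # i is a core point: start a new cluster
        queue = deque(neighbors)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: reachable, but not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point: keep growing
        cluster_id += 1
    return labels

# Toy point cloud: two compact objects and one isolated return (hypothetical values).
cloud = [
    (0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.1, 0.0), (0.1, 0.1, 0.1),
    (5.0, 5.0, 0.0), (5.1, 5.0, 0.0), (5.0, 5.1, 0.1),
    (20.0, 0.0, 3.0),
]
print(dbscan(cloud, eps=0.3, min_pts=3))  # [0, 0, 0, 0, 1, 1, 1, -1]
```

Raising `min_pts` to 5 in this toy cloud turns every point into noise. Small clouds like this show how sensitive the result is to the two hyperparameters.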

Machine Learning-Based Transactions Anomaly Prediction for Enhanced IoT Blockchain Network Security and Performance

  • Nor Fadzilah Abdullah;Ammar Riadh Kairaldeen;Asma Abu-Samah;Rosdiadee Nordin
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 18, No. 7 / pp.1986-2009 / 2024
  • The integration of blockchain technology with the rapid growth of Internet of Things (IoT) devices has enabled secure and decentralised data exchange. However, security vulnerabilities and performance limitations remain significant challenges in IoT blockchain networks. This work proposes a novel approach that combines transaction representation and machine learning techniques to address these challenges. Several clustering techniques were used to group unlabelled transaction data by their intrinsic characteristics: k-means, DBSCAN, Gaussian Mixture Models (GMM), and hierarchical clustering. Classifier-based models for predicting anomalous transactions were then developed from the resulting labelled data. Accuracy, precision, recall, and F1-score were used to evaluate how well the models identify the minority class, which represents suspicious transactions or security threats. The classifiers were also evaluated on both balanced and unbalanced data. Compared with unbalanced data, balanced data yielded average improvements of approximately 15.85% in accuracy, 88.76% in precision, 60% in recall, and 74.36% in F1-score. Each classifier therefore delivered consistently better predictive performance across these metrics when trained on balanced data. Moreover, the k-means and GMM clustering techniques outperformed the other techniques in identifying security threats, underscoring the importance of appropriate feature selection and clustering methods. The findings have practical implications for reinforcing security and efficiency in real-world IoT blockchain networks, paving the way for future investigations and advancements.
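The abstract reports gains from balanced data but does not say how the data were balanced. The sketch below assumes simple random oversampling of the minority class. That is a swapped-in technique, not necessarily the authors' method. The sketch also shows how precision, recall, and F1 are computed for the minority ("suspicious") class. Function names and data are hypothetical.

```python
import random

def minority_metrics(y_true, y_pred, positive=1):
    """Precision, recall and F1 for the positive (minority, 'suspicious') class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def random_oversample(X, y, seed=0):
    """Balance classes by adding random copies of each smaller class's rows (assumed method)."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    target = max(len(rows) for rows in by_class.values())
    X_bal, y_bal = [], []
    for label, rows in by_class.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        X_bal.extend(rows + extra)
        y_bal.extend([label] * target)
    return X_bal, y_bal

y_true = [0, 0, 0, 0, 0, 0, 1, 1]  # 1 = suspicious transaction (minority class)
y_pred = [0, 0, 0, 0, 0, 1, 1, 0]
print(minority_metrics(y_true, y_pred))  # (0.5, 0.5, 0.5)

X = [[i] for i in range(8)]  # hypothetical one-feature transactions
X_bal, y_bal = random_oversample(X, y_true)
print(y_bal.count(0), y_bal.count(1))  # 6 6
```

Oversampling should be applied only to the training split. Otherwise duplicated minority rows leak into the evaluation data and inflate the metrics.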

A Design of Clustering Classification Systems using Satellite Remote Sensing Images Based on Design Patterns

  • 김동연;김진일
    • 정보처리학회논문지B / Vol. 9B, No. 3 / pp.319-326 / 2002
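  • This paper designs and implements a clustering classification system, an unsupervised classification technique, for processing satellite images. The system applies several design patterns, including the Factory and Strategy patterns. These make it easy to support new satellite image formats and new clustering techniques, and they keep the design extensible. The system implements sequential clustering, K-Means clustering, ISODATA, and Fuzzy C-Means clustering, and it was tested with Landsat TM satellite images as classifier input. The results show that clustering classification can be used effectively in two ways. It can extract training areas from satellite images for which no prior knowledge is available, and it can classify satellite images in real time. The resulting system also offers good reusability and extensibility.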
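The abstract credits the Factory and Strategy patterns for the system's extensibility. The sketch below shows the general idea under that assumption. Each clustering technique implements one hypothetical `ClusteringStrategy` interface, and a factory maps a technique name to a configured strategy. Adding ISODATA or Fuzzy C-Means would then mean adding one class and one registry entry. The class and function names are illustrative, not taken from the paper, and nothing here implies the original system was written in Python.

```python
import math
from abc import ABC, abstractmethod

class ClusteringStrategy(ABC):
    """Strategy interface: every clustering technique is used through fit_predict()."""
    @abstractmethod
    def fit_predict(self, pixels):
        """Return one cluster label per pixel."""

class SequentialClustering(ClusteringStrategy):
    """Single-pass clustering: a pixel joins the first cluster whose seed lies
    within `threshold`, otherwise it becomes the seed of a new cluster."""
    def __init__(self, threshold):
        self.threshold = threshold

    def fit_predict(self, pixels):
        seeds, labels = [], []
        for p in pixels:
            for k, seed in enumerate(seeds):
                if math.dist(p, seed) <= self.threshold:
                    labels.append(k)
                    break
            else:  # no existing seed was close enough
                seeds.append(p)
                labels.append(len(seeds) - 1)
        return labels

class KMeansClustering(ClusteringStrategy):
    """Plain K-Means, seeded with the first k pixels for simplicity."""
    def __init__(self, k, iterations=10):
        self.k, self.iterations = k, iterations

    def fit_predict(self, pixels):
        centers = [list(p) for p in pixels[:self.k]]
        labels = [0] * len(pixels)
        for _ in range(self.iterations):
            labels = [min(range(self.k), key=lambda c: math.dist(p, centers[c]))
                      for p in pixels]
            for c in range(self.k):
                members = [p for p, lab in zip(pixels, labels) if lab == c]
                if members:
                    centers[c] = [sum(col) / len(members) for col in zip(*members)]
        return labels

# Factory: a new technique (e.g. ISODATA, Fuzzy C-Means) only needs a registry entry.
_REGISTRY = {"sequential": SequentialClustering, "kmeans": KMeansClustering}

def create_classifier(name, **params):
    if name not in _REGISTRY:
        raise ValueError(f"unknown clustering technique: {name}")
    return _REGISTRY[name](**params)

pixels = [(10, 12), (11, 13), (80, 85), (82, 84)]  # toy two-band pixel values
for name, params in [("sequential", {"threshold": 5}), ("kmeans", {"k": 2})]:
    print(name, create_classifier(name, **params).fit_predict(pixels))
```

The calling code depends only on `create_classifier` and `fit_predict`, so swapping techniques never touches the image-loading side of the system.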

Combining Distributed Word Representation and Document Distance for Short Text Document Clustering

  • Kongwudhikunakorn, Supavit;Waiyamai, Kitsana
    • Journal of Information Processing Systems / Vol. 16, No. 2 / pp.277-300 / 2020
  • This paper presents a method for clustering short text documents, such as news headlines, social media statuses, or instant messages. Because these documents are usually short and sparse, an appropriate technique is required to discover the knowledge hidden in them. The objective of this paper is to identify the combination of document representation, document distance, and document clustering that yields the best clustering quality. Document representations are expanded with external knowledge encoded as a distributed representation. To cluster documents, a K-means partitioning-based clustering technique is applied, with document similarity measured by word mover's distance. To validate the effectiveness of the proposed method, experiments compared its clustering quality against several leading methods. The proposed method produced document clusters with higher precision, recall, F1-score, and adjusted Rand index on both real-world and standard data sets. Manual inspection of the clustering results further showed that the topic of each document cluster is clearly reflected by its member documents.
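Word mover's distance (WMD) solves an optimal-transport problem between the word embeddings of two documents. The sketch below swaps in the cheaper relaxed WMD so it fits in standard-library Python. Relaxed WMD is a known lower bound on WMD that drops one of the transport constraints. It is not the exact distance used in the paper. The two-dimensional embeddings are hypothetical stand-ins for a trained distributed representation.

```python
import math
from collections import Counter

def nbow(tokens):
    """Normalized bag-of-words: each distinct word's share of the document."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: c / total for word, c in counts.items()}

def relaxed_wmd(doc_a, doc_b, emb):
    """Relaxed word mover's distance (a lower bound on WMD): each word's weight
    travels entirely to its nearest word in the other document, and the larger
    of the two one-directional costs is returned."""
    def one_way(src, dst):
        return sum(weight * min(math.dist(emb[w], emb[v]) for v in dst)
                   for w, weight in src.items())
    a, b = nbow(doc_a), nbow(doc_b)
    return max(one_way(a, b), one_way(b, a))

# Hypothetical 2-D "embeddings"; real ones would come from a trained word-vector model.
emb = {
    "stocks": (0.9, 0.1), "market": (1.0, 0.0), "shares": (0.8, 0.2),
    "rain": (0.0, 1.0), "storm": (0.1, 0.9),
}
d1, d2, d3 = ["stocks", "market"], ["shares", "market"], ["storm", "rain"]
print(relaxed_wmd(d1, d2, emb) < relaxed_wmd(d1, d3, emb))  # True
```

The two finance snippets share no words, yet they come out closer to each other than to the weather snippet. That is the property that makes embedding-based distances useful for sparse short texts.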

A Dimensionality Assessment for Polytomously Scored Items Using DETECT

  • Kim, Hae-Rim
    • Communications for Statistical Applications and Methods / Vol. 7, No. 2 / pp.597-603 / 2000
  • A versatile dimensionality assessment index, DETECT, was developed for binary item response data by Kim (1994). The present paper extends DETECT to polytomously scored item data. A simulation study shows that DETECT differentiates multidimensional data from unidimensional data well, yielding larger DETECT values when the data are multidimensional. Further investigation is needed into dimensionally meaningful clustering methods for polytomous data, counterparts to HAC for binary data, that are particularly sensitive to polytomous items.
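The sketch below computes a DETECT-style index for a given item partition. For each item pair it takes the covariance conditional on the score on the remaining items. It then averages these over all pairs, counting pairs in the same cluster positively and pairs in different clusters negatively. This follows the commonly cited form of the index. Three details are assumptions of this sketch, not taken from the paper: conditioning on the rest score, size-weighted averaging over score groups, and no scaling constant. The full procedure additionally searches for the partition that maximizes the index. Because the sketch only sums item scores, it works for polytomous (0, 1, 2, ...) items as well as binary ones.

```python
from collections import defaultdict
from itertools import combinations

def conditional_cov(responses, i, j):
    """Covariance of items i and j within groups of examinees that share the same
    score on all other items, averaged with weights proportional to group size."""
    groups = defaultdict(list)
    for row in responses:
        rest = sum(row) - row[i] - row[j]
        groups[rest].append((row[i], row[j]))
    weighted, used = 0.0, 0
    for pairs in groups.values():
        n = len(pairs)
        if n < 2:
            continue  # one examinee carries no covariance information
        mean_i = sum(a for a, _ in pairs) / n
        mean_j = sum(b for _, b in pairs) / n
        cov = sum((a - mean_i) * (b - mean_j) for a, b in pairs) / n
        weighted += n * cov
        used += n
    return weighted / used if used else 0.0

def detect_index(responses, partition):
    """Average signed conditional covariance over all item pairs:
    +1 for pairs in the same cluster of `partition`, -1 otherwise."""
    pairs = list(combinations(range(len(partition)), 2))
    total = sum((1 if partition[i] == partition[j] else -1) * conditional_cov(responses, i, j)
                for i, j in pairs)
    return total / len(pairs)

# Toy polytomous data (scores 0-2): items 0-1 track one trait, items 2-3 another.
responses = [(a, a, b, b) for a in range(3) for b in range(3)]
print(detect_index(responses, [0, 0, 1, 1]) > 0 > detect_index(responses, [0, 1, 0, 1]))  # True
```

The partition that matches the two traits scores positive, and the mismatched one scores negative. That is the behaviour the abstract describes: larger values when the data are multidimensional and the items are grouped correctly.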

Optimization Driven MapReduce Framework for Indexing and Retrieval of Big Data

  • Abdalla, Hemn Barzan;Ahmed, Awder Mohammed;Al Sibahee, Mustafa A.
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 14, No. 5 / pp.1886-1908 / 2020
  • With technical advances, the amount of big data grows day by day, to the point that traditional software tools struggle to handle it. In addition, imbalanced data within big data is a major concern for the research community. To manage big data effectively and deal with imbalanced data, this paper proposes a new indexing algorithm for retrieving big data in the MapReduce framework. In the mappers, data are clustered with the Sparse Fuzzy C-Means (Sparse FCM) algorithm. The reducer combines the clusters generated by the mappers and clusters the data again with Sparse FCM. Two-level query matching then locates the requested data. The first level determines the cluster, and the second level accesses the requested data within that cluster. The data are ranked with the proposed Monarch chaotic whale optimization algorithm (M-CWOA), which combines Monarch butterfly optimization (MBO) [22] and the chaotic whale optimization algorithm (CWOA) [21]. The Parametric Enabled-Similarity Measure (PESM) is adapted to match the similarities between two datasets. The proposed M-CWOA outperformed the other methods, with a maximal precision of 0.9237, recall of 0.9371, and F1-score of 0.9223.
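The two-level query matching can be pictured with the sketch below. It assumes clustering (Sparse FCM in the paper) has already produced cluster centroids together with their member records. It uses plain Euclidean distance in place of PESM, which it does not reproduce, and it leaves out the M-CWOA ranking step. Function names and data are hypothetical.

```python
import math

def nearest(query, vectors):
    """Index of the vector closest to `query` (Euclidean distance)."""
    return min(range(len(vectors)), key=lambda k: math.dist(query, vectors[k]))

def two_level_match(query, centroids, clusters):
    """Level 1: choose the cluster whose centroid is nearest the query.
    Level 2: search only that cluster's records for the nearest one."""
    c = nearest(query, centroids)
    members = clusters[c]
    return c, members[nearest(query, members)]

# Hypothetical output of the reducer: two centroids and their member records.
centroids = [(0.0, 0.0), (10.0, 10.0)]
clusters = [[(0.5, 0.2), (-0.3, 0.1)], [(9.8, 10.4), (10.6, 9.9)]]
print(two_level_match((10.5, 10.0), centroids, clusters))  # (1, (10.6, 9.9))
```

The second level scans one cluster instead of the whole collection, which is where the retrieval speed-up comes from. The price is that a record sitting near a cluster boundary can be missed if its own cluster's centroid is not the nearest one.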

A Classification Algorithm Based on Data Clustering and Data Reduction for Intrusion Detection System over Big Data

  • Wang, Qiuhua;Ouyang, Xiaoqin;Zhan, Jiacheng
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 13, No. 7 / pp.3714-3732 / 2019
  • With the rapid development of networks, Intrusion Detection Systems (IDS) play an increasingly important role in network applications. Many data mining algorithms are used to build IDS. However, the big data era has brought massive volumes of data. On large-scale data sets, most data mining algorithms carry a high computational burden that makes IDS much less efficient. To build an efficient IDS over big data, we propose a classification algorithm based on data clustering and data reduction. In the training stage, the Mini Batch K-Means algorithm divides the training data into clusters of similar size, and the center of each cluster serves as its index. We then select representative instances for each cluster to reduce the data, and we build a K-Nearest Neighbor (KNN) detection model from the clusters of representative instances. In the detection stage, we sort the clusters by the distance between the test sample and each cluster index, take the k nearest clusters, and search them for the k nearest neighbors. Experimental results show that searching for neighbors through cluster indexes significantly reduces computational complexity. Classifying with the reduced set of representative instances also improves efficiency while maintaining high accuracy.
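The detection stage described above can be sketched as follows. The sketch assumes training has already produced a cluster index: cluster centers plus the representative instances kept for each cluster. Neither Mini Batch K-Means nor the paper's selection of representative instances is reproduced. The data layout, function name, and 2-D toy features are hypothetical.

```python
import math
from collections import Counter

def cluster_knn_predict(x, index, k):
    """KNN restricted to the k clusters whose centers are nearest to x.

    `index` is a list of (center, members) pairs; `members` holds the
    (features, label) representative instances kept for that cluster.
    """
    # 1. Rank clusters by the distance from x to each cluster center (the index).
    ranked = sorted(index, key=lambda entry: math.dist(x, entry[0]))
    candidates = [inst for _, members in ranked[:k] for inst in members]
    # 2. Find the k nearest neighbors among the candidates only.
    neighbors = sorted(candidates, key=lambda inst: math.dist(x, inst[0]))[:k]
    # 3. Majority vote over the neighbors' labels.
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

# Hypothetical index built during training (2-D features for readability).
index = [
    ((0.0, 0.0), [((0.1, 0.0), "normal"), ((0.0, 0.2), "normal")]),
    ((1.0, 1.0), [((0.9, 1.1), "normal"), ((1.1, 0.9), "attack")]),
    ((5.0, 5.0), [((5.1, 4.9), "attack"), ((4.8, 5.2), "attack")]),
]
print(cluster_knn_predict((4.9, 5.0), index, k=2))  # attack
```

With many clusters, step 1 is the only part that touches every cluster, and it compares against one center per cluster. Step 2 then works on a small candidate set instead of the full training data.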

An Introduction of Two-Step K-means Clustering Applied to Microarray Data

  • 박대훈;김연태;김성신;이춘환
    • 한국지능시스템학회논문지 / Vol. 17, No. 2 / pp.167-172 / 2007
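  • Genetic information and its by-products have been studied in many ways. DNA microarray technology has produced large amounts of data that are hard to analyze with existing research methods. To handle such large volumes of data, this paper proposes partitioned clustering based on the K-means clustering algorithm. We verify the usefulness of the proposed method by applying it to microarray data obtained from rice genes. A comparison with the results of the conventional K-means clustering algorithm confirms the superiority of the proposed algorithm.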
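The abstract does not spell out how the data are partitioned. The sketch below assumes one common reading of two-step K-means, which is not taken from the paper. First, split the data into chunks and run K-means on each chunk. Second, run K-means once more on the pooled chunk centers, so that no single run has to cluster the full data set. The toy 2-D points stand in for gene expression profiles.

```python
import math
import random

def kmeans(points, k, iterations=20, seed=0):
    """Lloyd's K-means; returns the k cluster centers."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centers[c]))
            groups[nearest].append(p)
        for c, members in enumerate(groups):
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers

def two_step_kmeans(points, k, n_chunks):
    """Step 1: K-means on each chunk of the data separately.
    Step 2: K-means on the pooled chunk centers to get the final k centers."""
    chunks = [points[i::n_chunks] for i in range(n_chunks)]
    local_centers = [c for chunk in chunks for c in kmeans(chunk, k)]
    return kmeans(local_centers, k)

# Toy 2-D "expression profiles": one group near (0, 0), one near (10, 10).
points = [(x, x) for x in (0.0, 0.2, 0.4, 0.6)] + [(10 + x, 10 + x) for x in (0.0, 0.2, 0.4, 0.6)]
print(sorted(two_step_kmeans(points, k=2, n_chunks=2)))  # approximately [[0.3, 0.3], [10.3, 10.3]]
```

Strided chunking (`points[i::n_chunks]`) keeps every chunk representative of the whole data set in this toy case. If the chunks were contiguous and sorted, a chunk could miss a group entirely.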

A Study on Three Phase Partial Discharge Pattern Classification with the Aid of Optimized Polynomial Radial Basis Function Neural Networks

  • 오성권;김현기;김정태
    • 전기학회논문지 / Vol. 62, No. 4 / pp.544-553 / 2013
  • In this paper, we propose a Radial Basis Function Neural Network (RBFNN) pattern classifier for diagnosing three-phase partial discharge. Conventional methods map partial discharge/noise data onto a 3-PARD map and decide whether partial discharge occurs in one of the three phases or at the neutral point. However, that decision relies on the subjective knowledge of a skilled expert. To solve this problem, both the mapping of the data and the classification of phases are handled with the general 3-PARD map and the PA method. The phases in which partial discharge or noise discharge occurs are then identified. Next, a fuzzy clustering-based polynomial RBFNN classifier classifies and identifies the type of partial discharge occurring on an arbitrary phase. The classifier is optimized by using the PSO algorithm to identify three parameters: the learning rate, the momentum coefficient, and the fuzzification coefficient of FCM fuzzy clustering. Both virtually simulated data and experimental data acquired in the field are used to evaluate the performance of the three-phase partial discharge pattern classifier.
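In FCM-based RBF networks of this kind, FCM membership degrees are commonly used as the hidden-layer activations. The fuzzification coefficient m, one of the parameters tuned here with PSO, controls how soft those memberships are. The sketch below computes the standard FCM membership of a single input given fixed cluster centers. The centers and m values are illustrative. The polynomial output layer and the PSO search are not shown.

```python
import math

def fcm_memberships(x, centers, m=2.0):
    """Fuzzy C-Means membership of input x in each cluster:
    u_i = 1 / sum_j (d_i / d_j) ** (2 / (m - 1)), with m > 1 the fuzzification coefficient."""
    if m <= 1:
        raise ValueError("fuzzification coefficient m must be > 1")
    d = [math.dist(x, c) for c in centers]
    on_center = d.count(0.0)
    if on_center:  # x coincides with one or more centers: share membership among them
        return [1.0 / on_center if di == 0.0 else 0.0 for di in d]
    p = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dj) ** p for dj in d) for di in d]

centers = [(0.0, 0.0), (4.0, 0.0)]  # hypothetical cluster centers
print(fcm_memberships((1.0, 0.0), centers, m=2.0))  # close to [0.9, 0.1]
print(fcm_memberships((1.0, 0.0), centers, m=3.0))  # close to [0.75, 0.25]: larger m, softer
```

Moving m from 2 to 3 shifts membership toward the farther center. This is why m is worth optimizing together with the learning rate and momentum.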

Density Aware Energy Efficient Clustering Protocol for Normally Distributed Sensor Networks

  • Su, Xin;Choi, Dong-Min;Moh, Sang-Man;Chung, Il-Yong
    • 한국멀티미디어학회논문지 / Vol. 13, No. 6 / pp.911-923 / 2010
  • In wireless sensor networks (WSNs), cluster-based data routing protocols reduce energy consumption and link maintenance cost. Unfortunately, most clustering protocols have been designed for uniformly distributed sensor networks. However, some urgent situations do not allow thousands of sensor nodes to be deployed uniformly. For example, when air vehicles or balloons deploy the sensor nodes, the resulting topology is normally distributed. To improve energy efficiency in such sensor networks, this paper proposes a new cluster formation algorithm named DAEEC (Density Aware Energy-Efficient Clustering). The algorithm defines two kinds of clusters, Low Density (LD) clusters and High Density (HD) clusters, distinguished by the number of nodes participating in each cluster. During data routing, the HD clusters help neighboring LD clusters forward the sensed data to the central base station. By taking deployment density into account, DAEEC distributes energy dissipation evenly among all sensor nodes, improving network lifetime and average energy savings. Moreover, because the HD clusters are densely deployed, they can operate in the manner of our earlier algorithm EEVAR (Energy Efficient Variable Area Routing Protocol) to save energy. According to the performance analysis, DAEEC outperforms conventional data routing schemes in terms of energy consumption and network lifetime.
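The abstract says two things about the cluster types. LD and HD clusters are distinguished by how many nodes join them, and HD clusters forward data for neighboring LD clusters. The sketch below illustrates that idea with a hypothetical fixed member-count threshold and a nearest-HD-head relay rule. Neither the threshold nor the relay rule comes from the paper, and the sketch models neither energy consumption nor EEVAR.

```python
import math

def classify_clusters(clusters, threshold):
    """Label a cluster 'HD' if at least `threshold` nodes joined it, else 'LD'."""
    return {cid: "HD" if len(info["members"]) >= threshold else "LD"
            for cid, info in clusters.items()}

def pick_relays(clusters, density):
    """Map each LD cluster to the HD cluster whose head is nearest to its own head."""
    hd_ids = [cid for cid, kind in density.items() if kind == "HD"]
    relays = {}
    for cid, kind in density.items():
        if kind == "LD" and hd_ids:
            head = clusters[cid]["head"]
            relays[cid] = min(hd_ids, key=lambda h: math.dist(head, clusters[h]["head"]))
    return relays

# Hypothetical field: a dense core (A, B) and sparse edges (C, D), as with airdropped nodes.
clusters = {
    "A": {"head": (50, 50), "members": list(range(40))},
    "B": {"head": (60, 45), "members": list(range(35))},
    "C": {"head": (90, 90), "members": list(range(5))},
    "D": {"head": (10, 20), "members": list(range(4))},
}
density = classify_clusters(clusters, threshold=20)
print(density)                         # {'A': 'HD', 'B': 'HD', 'C': 'LD', 'D': 'LD'}
print(pick_relays(clusters, density))  # {'C': 'B', 'D': 'A'}
```

Under a normal deployment the dense centre ends up holding the HD clusters. Routing the sparse edge clusters through them spreads the forwarding load over the many nodes available there.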