• Title/Summary/Keyword: Data Clustering

Frequent Itemset Creation using Bit Transaction Clustering in Data Mining (데이터 마이닝에서 비트 트랜잭션 클러스터링을 이용한 빈발항목 생성)

  • Kim Eui-Chan;Hwang Byung-Yeon
    • The KIPS Transactions: Part D / v.13D no.3 s.106 / pp.293-298 / 2006
  • Large amounts of data are stored in databases. To obtain information from them, we use queries, but the information retrieved this way is basic and simple. Data mining offers a variety of methods; in this paper, we address clustering and association rules. We present a method for finding better association rules and solve a problem with existing association rule mining. We propose and apply a new clustering method suited to association rules; unlike existing clustering methods, it is based on neither distance nor category. If we find association rules for each cluster, we obtain not only the existing rules found across all transactions but also rules that characterize individual clusters. Through this study, we expect to reduce the number of transaction accesses in large databases and to find associations within small groups.
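The abstract does not spell out the clustering rule, so the Python sketch below only illustrates the general idea under stated assumptions. Transactions are stored as integer bit vectors over a hypothetical five-item catalogue. They are grouped by a simple greedy Jaccard-overlap rule, which stands in for the paper's unspecified method. Frequent itemsets are then counted per cluster with bitwise AND. The names `to_mask`, `cluster_by_overlap`, and `frequent_itemsets` are illustrative and do not come from the paper.

```python
from itertools import combinations

# Hypothetical item catalogue: bit i of a transaction mask means ITEMS[i] is present.
ITEMS = ["bread", "milk", "beer", "diapers", "eggs"]


def to_mask(items):
    """Encode one transaction as an integer bit vector."""
    mask = 0
    for name in items:
        mask |= 1 << ITEMS.index(name)
    return mask


def cluster_by_overlap(masks, min_jaccard=0.5):
    """Assumed stand-in clustering: a transaction joins the first cluster whose leader
    shares at least `min_jaccard` of their combined items, else it starts a new cluster."""
    clusters = []  # list of (leader_mask, member_masks)
    for m in masks:
        for leader, members in clusters:
            union = bin(leader | m).count("1")
            if union and bin(leader & m).count("1") / union >= min_jaccard:
                members.append(m)
                break
        else:
            clusters.append((m, [m]))
    return [members for _, members in clusters]


def frequent_itemsets(masks, min_ratio, max_size=2):
    """Relative-support counting with bitwise AND: itemset s is in transaction t iff t & s == s."""
    result = {}
    if not masks:
        return result
    for size in range(1, max_size + 1):
        for combo in combinations(range(len(ITEMS)), size):
            s = sum(1 << i for i in combo)
            support = sum(1 for t in masks if t & s == s)
            if support / len(masks) >= min_ratio:
                result[tuple(ITEMS[i] for i in combo)] = support
    return result


transactions = [
    ["bread", "milk"], ["bread", "milk", "eggs"], ["bread", "milk"],
    ["beer", "diapers"], ["beer", "diapers", "eggs"], ["beer", "diapers"],
]
masks = [to_mask(t) for t in transactions]
print(frequent_itemsets(masks, 0.6))          # whole database
for members in cluster_by_overlap(masks):
    print(frequent_itemsets(members, 0.6))    # per cluster
```

With a relative support threshold of 0.6, the whole database yields no frequent itemsets. Each cluster, by contrast, yields its own pair ({bread, milk} and {beer, diapers}). This is the kind of cluster-characteristic pattern the abstract describes.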

Privacy-Preserving K-means Clustering using Homomorphic Encryption in a Multiple Clients Environment (다중 클라이언트 환경에서 동형 암호를 이용한 프라이버시 보장형 K-평균 클러스터링)

  • Kwon, Hee-Yong;Im, Jong-Hyuk;Lee, Mun-Kyu
    • The Journal of Korean Institute of Next Generation Computing / v.15 no.4 / pp.7-17 / 2019
  • Machine learning is one of the most accurate techniques for predicting and analyzing various phenomena. K-means clustering is a machine learning technique that partitions given data into clusters of similar data. Because analysis based on more data generally performs better, K-means clustering can be performed in a model consisting of a server that calculates the cluster centroids and a number of clients that provide data to the server. However, if the clients' data contain private information, the server in this model can infringe on the clients' privacy. To solve this problem in a model with multiple clients, we propose a privacy-preserving K-means clustering method that performs machine learning while concealing private information using homomorphic encryption.
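Homomorphic encryption lets the server combine client data it cannot read. The following is a minimal sketch of one K-means centroid update, built on several assumptions that the abstract does not confirm:

- The scheme is additively homomorphic Paillier encryption with a toy key. The abstract does not name the scheme.
- A separate key holder decrypts only aggregated totals.
- Points are integer 2-D coordinates.

Each client encrypts its per-cluster coordinate sums and counts. The server multiplies ciphertexts to add them, and only the totals are ever decrypted.

```python
import math
import random

# Toy Paillier key pair built from small primes: for illustration only, NOT secure.
P, Q = 293, 433
N = P * Q
N2 = N * N
LAM = math.lcm(P - 1, Q - 1)  # math.lcm needs Python 3.9+
MU = pow(LAM, -1, N)          # valid because the generator is g = N + 1


def encrypt(m, rng=random):
    """Encrypt a non-negative integer m < N."""
    while True:
        r = rng.randrange(1, N)
        if math.gcd(r, N) == 1:
            break
    return pow(N + 1, m, N2) * pow(r, N, N2) % N2


def decrypt(c):
    return (pow(c, LAM, N2) - 1) // N * MU % N


def add(c1, c2):
    """Multiplying Paillier ciphertexts adds the hidden plaintexts."""
    return c1 * c2 % N2


def local_stats(points, centroids):
    """Client side: assign own points to the nearest centroid, return per-cluster [sum_x, sum_y, count]."""
    stats = [[0, 0, 0] for _ in centroids]
    for x, y in points:
        j = min(range(len(centroids)),
                key=lambda k: (x - centroids[k][0]) ** 2 + (y - centroids[k][1]) ** 2)
        stats[j][0] += x
        stats[j][1] += y
        stats[j][2] += 1
    return stats


clients = [[(1, 2), (2, 1), (20, 21)], [(0, 1), (22, 19), (21, 20)]]
centroids = [(0, 0), (20, 20)]

# Each client encrypts its statistics before sending them to the server.
encrypted = [[[encrypt(v) for v in s] for s in local_stats(pts, centroids)] for pts in clients]

# Server: adds ciphertexts component-wise without learning any single client's values.
aggregate = encrypted[0]
for client in encrypted[1:]:
    aggregate = [[add(a, b) for a, b in zip(sa, sb)] for sa, sb in zip(aggregate, client)]

# Key holder: decrypts only the totals and computes the updated centroids.
totals = [[decrypt(c) for c in s] for s in aggregate]
new_centroids = [(sx / n, sy / n) if n else old for (sx, sy, n), old in zip(totals, centroids)]
print(totals, new_centroids)
```

Real deployments need large primes (moduli of 2048 bits or more) and a cryptographically secure random source. Negative or fractional coordinates would also need an encoding step, which this sketch omits.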

Inter-clustering Cooperative Relay Selection Schemes for 5G Device-to-device Communication Networks

  • Nasaruddin, Nasaruddin;Yunida, Yunida;Adriman, Ramzi
    • Journal of information and communication convergence engineering / v.20 no.3 / pp.143-152 / 2022
  • The ongoing adoption of 5G will increase the data traffic, throughput, multimedia services, and power consumption for future wireless applications and services, including sensor and mobile networks. Multipath fading on wireless channels also reduces the system performance and increases energy consumption. To address these issues, device-to-device (D2D) and cooperative communications have been proposed. In this study, we propose two inter-clustering models using the relay selection method to improve system performance and increase energy efficiency in cooperative D2D networks. We develop two inter-clustering models and present their respective algorithms. Subsequently, we run a computer simulation to evaluate each model's outage probability (OP) performance, throughput, and energy efficiency. The simulation results show that inter-clustering model II has the lowest OP, highest throughput, and highest energy efficiency compared with inter-clustering model I and the conventional inter-clustering-based multirelay method. These results demonstrate that inter-clustering model II is well-suited for use in 5G overlay D2D and cellular communications.
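The abstract does not describe models I and II in detail. The sketch below therefore shows only a generic building block of cooperative relaying: selecting a decode-and-forward relay by the max-min rule and estimating outage probability by Monte Carlo simulation. It rests on several assumptions:

- Fading is Rayleigh, so SNR is exponentially distributed.
- Every link has the same mean SNR.
- The outage threshold is an illustrative value.
- The half-duplex rate loss is ignored.

All names and values are hypothetical.

```python
import math
import random


def simulated_outage(num_relays, mean_snr=10.0, threshold=3.0, trials=20000, seed=1):
    """Monte Carlo outage estimate. num_relays == 0 means direct transmission; otherwise
    pick the relay maximizing min(SNR source->relay, SNR relay->destination)."""
    rng = random.Random(seed)
    outages = 0
    for _ in range(trials):
        if num_relays == 0:
            snr = rng.expovariate(1 / mean_snr)
        else:
            snr = max(min(rng.expovariate(1 / mean_snr), rng.expovariate(1 / mean_snr))
                      for _ in range(num_relays))
        if snr < threshold:
            outages += 1
    return outages / trials


def analytic_outage(num_relays, mean_snr=10.0, threshold=3.0):
    """Closed form under the same assumptions, used to check the simulation."""
    if num_relays == 0:
        return 1 - math.exp(-threshold / mean_snr)
    # The min of two i.i.d. exponentials with mean s is exponential with mean s / 2.
    return (1 - math.exp(-2 * threshold / mean_snr)) ** num_relays


for k in (0, 1, 2, 4):
    print(k, round(simulated_outage(k), 3), round(analytic_outage(k), 3))
```

Under these equal-mean assumptions, a single relay performs worse than the direct link. Choosing the best of several relays, however, lowers the outage probability sharply.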

Context-awareness Clustering with Adaptive Learning Algorithm (상황인식 기반 클러스터링의 적응적 자율 학습 분할 알고리즘)

  • Jeon, Il-Kyu;Lee, Kang-whan
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2022.10a / pp.612-614 / 2022
  • This paper proposes a clustering algorithm for mobile nodes that enables more efficient clustering by using context-aware attribute information with adaptive learning. Typically, data are provided to classify the interrelationships among cluster properties. In comparative clustering, however, new properties can be mistakenly treated as contaminated information. To solve this problem, we present a context-awareness learning-based model that analyzes clustering attribute parameters from node properties using accumulated property information.

An Abnormal Worker Movement Detection System Based on Data Stream Processing and Hierarchical Clustering

  • Duong, Dat Van Anh;Lan, Doi Thi;Yoon, Seokhoon
    • International Journal of Internet, Broadcasting and Communication / v.14 no.4 / pp.88-95 / 2022
  • Detecting anomalies in human movement is an important task in industrial applications, such as monitoring industrial disasters or accidents and recognizing unauthorized factory intruders. In this paper, we propose an abnormal worker movement detection system based on data stream processing and hierarchical clustering. In the proposed system, Apache Spark is used for streaming the location data of people. A hierarchical clustering-based anomalous trajectory detection algorithm is designed for detecting anomalies in human movement. The algorithm is integrated into Apache Spark for detecting anomalies from location data. Specifically, the location information is streamed to Apache Spark using the message queuing telemetry transport protocol. Then, Apache Spark processes and stores location data in a data frame. When there is a request from a client, the processed data in the data frame is taken and put into the proposed algorithm for detecting anomalies. A real mobility trace of people is used to evaluate the proposed system. The obtained results show that the system has high performance and can be used for a wide range of industrial applications.
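The following is a minimal sketch of hierarchical clustering-based trajectory anomaly detection. It assumes trajectories already resampled to the same number of points, a mean point-wise Euclidean distance, and single-linkage agglomeration cut at a distance threshold. It flags members of clusters smaller than `min_size` as anomalous. The Spark and MQTT streaming parts are omitted. The distance and flagging rules are assumptions rather than the paper's algorithm.

```python
import math


def traj_distance(a, b):
    """Mean Euclidean distance between corresponding points of two equal-length trajectories."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)


def single_linkage(trajs, cut):
    """Agglomerative single-linkage clustering, stopped once the closest pair of
    clusters is farther apart than `cut`."""
    clusters = [[i] for i in range(len(trajs))]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(traj_distance(trajs[a], trajs[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > cut:
            break
        clusters[i].extend(clusters.pop(j))  # j > i, so index i stays valid
    return clusters


def anomalies(trajs, cut, min_size=2):
    """Indices of trajectories that end up in clusters with fewer than min_size members."""
    return sorted(i for c in single_linkage(trajs, cut) if len(c) < min_size for i in c)


trajs = [
    [(0, 0), (1, 0), (2, 0), (3, 0)],          # workers walking along a corridor
    [(0, 0.2), (1, 0.1), (2, 0.2), (3, 0.1)],
    [(0, -0.1), (1, -0.2), (2, 0), (3, -0.1)],
    [(0, 0), (1, 2), (2, 4), (3, 6)],          # a worker leaving the usual path
]
print(anomalies(trajs, cut=1.0))
```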

Parallel k-Modes Algorithm for Spark Framework (스파크 프레임워크를 위한 병렬적 k-Modes 알고리즘)

  • Chung, Jaehwa
    • KIPS Transactions on Software and Data Engineering / v.6 no.10 / pp.487-492 / 2017
  • Clustering is a technique used in big data analysis and data mining that groups data by measuring the similarities between them. Among various clustering methods, the k-Modes algorithm is the representative choice for categorical data. To improve the performance of iteration-centric tasks such as k-Modes, the distributed, concurrent framework Spark has recently received great attention because it overcomes the limitations of Hadoop. Spark provides an environment that can process large amounts of data in main memory using an abstraction called RDD (Resilient Distributed Dataset). Spark provides MLlib, a dedicated machine learning library, but MLlib includes only k-means, which can process only continuous data, so categorical data cannot be clustered with it. In this paper, we design RDDs for the k-Modes algorithm to cluster categorical data in the Spark environment and implement an algorithm that operates effectively. Experiments show that the performance of the proposed algorithm scales linearly in the Spark environment.
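The core k-Modes iteration can be sketched in plain Python. Records are assigned to the mode with the smallest Hamming distance, and each mode is then recomputed attribute by attribute as the most frequent category. This omits the paper's RDD design. In Spark, the assignment step would naturally be a `map` over an RDD of records, and the mode update could be a `reduceByKey` over category counts, but that mapping is an assumption.

```python
from collections import Counter


def hamming(a, b):
    """Number of attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def assign(records, modes):
    """Group each record with its nearest mode (ties go to the lower index)."""
    groups = [[] for _ in modes]
    for r in records:
        j = min(range(len(modes)), key=lambda k: hamming(r, modes[k]))
        groups[j].append(r)
    return groups


def k_modes(records, initial_modes, max_iter=10):
    modes = [tuple(m) for m in initial_modes]
    for _ in range(max_iter):
        groups = assign(records, modes)
        # New mode: most frequent category per attribute; empty clusters keep their old mode.
        new_modes = [
            tuple(Counter(col).most_common(1)[0][0] for col in zip(*g)) if g else old
            for g, old in zip(groups, modes)
        ]
        if new_modes == modes:
            break
        modes = new_modes
    return modes, assign(records, modes)


records = [
    ("red", "small", "round"), ("red", "small", "oval"), ("red", "medium", "round"),
    ("blue", "large", "square"), ("blue", "large", "rect"), ("green", "large", "square"),
]
modes, groups = k_modes(records, [records[0], records[3]])
print(modes, [len(g) for g in groups])
```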

An Energy Consumption Model using Hierarchical Unequal Clustering Method (계층적 불균형 클러스터링 기법을 이용한 에너지 소비 모델)

  • Kim, Jin-Su;Shin, Seung-Soo
    • Journal of the Korea Academia-Industrial cooperation Society / v.12 no.6 / pp.2815-2822 / 2011
  • In wireless sensor networks, clustering forms clusters so that data can be aggregated and transmitted together, allowing nodes to use energy efficiently. In this paper, I propose a hierarchical unequal clustering method using a cluster group model. It divides the entire network into two layers: data aggregated in layer 2, which consists of cluster groups, is sent to layer 1, where it is re-aggregated before the total data is sent to the base station. This method decreases overall energy consumption by combining the cluster group model with a multi-hop communication architecture, and it solves the hot-spot problem by establishing unequal clusters. I also show that the proposed hierarchical unequal clustering method outperforms previous clustering methods in terms of network energy efficiency.
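The energy argument for multi-hop, unequal clustering can be illustrated with the first-order radio model widely used in LEACH-style studies. The sketch keeps only the free-space d² loss term, and its constants are commonly quoted values assumed here. It compares one long transmission with two half-length hops. It also adds an EEUC-style unequal competition radius, in which nodes nearer the base station form smaller clusters to relieve hot spots. Neither formula is taken from this paper's cluster group model.

```python
E_ELEC = 50e-9     # J/bit spent by transmitter or receiver electronics (assumed value)
EPS_AMP = 100e-12  # J/bit/m^2 free-space amplifier energy (assumed value)


def tx_energy(bits, distance):
    """First-order radio model, free-space (d^2) term only."""
    return E_ELEC * bits + EPS_AMP * bits * distance ** 2


def rx_energy(bits):
    return E_ELEC * bits


def two_hop_cost(bits, distance):
    """Relay halfway: two transmissions over distance / 2 plus one reception at the relay."""
    return 2 * tx_energy(bits, distance / 2) + rx_energy(bits)


def competition_radius(d_to_bs, d_min, d_max, r_max, c=0.5):
    """EEUC-style unequal radius: clusters shrink as nodes get closer to the base station."""
    return (1 - c * (d_max - d_to_bs) / (d_max - d_min)) * r_max


print(tx_energy(4000, 200), two_hop_cost(4000, 200))
print(competition_radius(50, 50, 150, 60), competition_radius(150, 50, 150, 60))
```

For 4000 bits over 200 m, the direct cost is about 16.2 mJ, against about 8.6 mJ for two hops. With c = 0.5, the radius rule gives 30 m at d_min and 60 m at d_max.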

Performance Analysis of Hierarchical Routing Protocols for Sensor Network (센서 네트워크를 위한 계층적 라우팅 프로토콜의 성능 분석)

  • Seo, Byung-Suk;Yoon, Sang-Hyun;Kim, Jong-Hyun
    • Journal of the Korea Society for Simulation / v.21 no.4 / pp.47-56 / 2012
  • In this study, we use the parallel simulator PASENS (Parallel SEnsor Network Simulator) to predict the power consumption and data reception rate of hierarchical routing protocols for sensor networks: LEACH (Low-Energy Adaptive Clustering Hierarchy), TL-LEACH (Two Level Low-Energy Adaptive Clustering Hierarchy), M-LEACH (Multi hop Low-Energy Adaptive Clustering Hierarchy), and LEACH-C (LEACH-Centralized). According to the simulation results, M-LEACH shows the highest data reception rate over wider areas, since more sensor nodes are involved in data transmission. LEACH-C, in which the sink node considers the residual energy and location of every node to determine the cluster heads, achieves the most efficient energy consumption and is best suited to narrow areas where a long sensor network lifetime is needed.
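LEACH, the baseline of all four protocols, elects cluster heads in a distributed way. A node that has not yet served as head in the current epoch of 1/p rounds uses the threshold T(n) = p / (1 - p (r mod 1/p)); all other nodes use 0. The minimal sketch below assumes p = 1/epoch, so the threshold can be computed exactly as 1 / (epoch - r mod epoch). It does not model energy, TL-LEACH's two levels, M-LEACH's multi-hop routing, or LEACH-C's centralized selection. The function names are hypothetical.

```python
import random


def leach_threshold(epoch, r, eligible):
    """LEACH threshold T(n) = p / (1 - p * (r mod 1/p)) with p = 1/epoch,
    rewritten as 1 / (epoch - r mod epoch) to avoid floating-point drift."""
    if not eligible:
        return 0.0
    return 1 / (epoch - r % epoch)


def elect_heads(num_nodes, epoch, rounds, seed=0):
    rng = random.Random(seed)
    eligible = [True] * num_nodes
    history = []
    for r in range(rounds):
        if r % epoch == 0:
            eligible = [True] * num_nodes  # new epoch: every node may serve again
        heads = [n for n in range(num_nodes)
                 if rng.random() < leach_threshold(epoch, r, eligible[n])]
        for n in heads:
            eligible[n] = False
        history.append(heads)
    return history


history = elect_heads(num_nodes=100, epoch=20, rounds=20)
counts = [sum(n in heads for heads in history) for n in range(100)]
print([len(h) for h in history])
```

The threshold reaches 1 in the last round of an epoch. As a result, every node serves as cluster head exactly once per epoch.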

Cluster Merging Using Enhanced Density based Fuzzy C-Means Clustering Algorithm (개선된 밀도 기반의 퍼지 C-Means 알고리즘을 이용한 클러스터 합병)

  • Han, Jin-Woo;Jun, Sung-Hae;Oh, Kyung-Whan
    • Journal of the Korean Institute of Intelligent Systems / v.14 no.5 / pp.517-524 / 2004
  • Fuzzy set theory has been widely used for clustering in machine learning and data mining since it was introduced in the 1960s. In particular, the fuzzy C-means algorithm remains a popular fuzzy clustering algorithm. Fuzzy C-means assigns each element to every cluster with a membership value. Like general clustering algorithms such as K-means, it is affected by the locations of the initial cluster centers and by the choice of cluster size. Because this initial setup is subjective, the results can be inappropriate depending on the circumstances. To solve this problem, we propose cluster merging using an enhanced density-based fuzzy C-means clustering algorithm. Our algorithm determines the initial cluster size and centers from the properties of the training data, using a grid to decide them. In the experiments, standard machine learning benchmark data are used to compare the performance of our algorithm with that of other algorithms.
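The fuzzy C-means updates can be sketched together with a grid-based initialization in the spirit of the abstract. The grid rule here uses the means of the c most populated cells of a uniform grid as initial centers. It is a hypothetical reading of "uses grid for deciding initial cluster center". The paper's density criterion and its cluster merging step are not reproduced. Memberships follow the standard update u_ij = 1 / Σ_k (d_ij / d_kj)^(2/(m-1)), where d_ij is the distance from point j to center i. The fuzzifier is m = 2.

```python
import math


def grid_init(points, c, cells=4):
    """Hypothetical grid initialization: bin 2-D points into a cells x cells grid over their
    bounding box and return the means of the c most populated cells."""
    lo_x, hi_x = min(p[0] for p in points), max(p[0] for p in points)
    lo_y, hi_y = min(p[1] for p in points), max(p[1] for p in points)

    def index(v, lo, hi):
        if hi == lo:
            return 0
        return min(int((v - lo) / (hi - lo) * cells), cells - 1)

    buckets = {}
    for p in points:
        key = (index(p[0], lo_x, hi_x), index(p[1], lo_y, hi_y))
        buckets.setdefault(key, []).append(p)
    densest = sorted(buckets.values(), key=len, reverse=True)[:c]
    return [(sum(p[0] for p in b) / len(b), sum(p[1] for p in b) / len(b)) for b in densest]


def fuzzy_c_means(points, centers, m=2.0, iters=50):
    """Standard FCM: alternate membership and center updates; u[j][i] is point j's
    membership in cluster i (from the last iteration)."""
    for _ in range(iters):
        u = []
        for p in points:
            d = [math.dist(p, v) for v in centers]
            zeros = d.count(0.0)
            if zeros:  # point coincides with a center: crisp membership there
                u.append([1.0 / zeros if di == 0.0 else 0.0 for di in d])
            else:
                u.append([1 / sum((di / dk) ** (2 / (m - 1)) for dk in d) for di in d])
        new_centers = []
        for i, old in enumerate(centers):
            w = [row[i] ** m for row in u]
            total = sum(w)
            if total == 0:
                new_centers.append(old)
            else:
                new_centers.append((sum(wj * p[0] for wj, p in zip(w, points)) / total,
                                    sum(wj * p[1] for wj, p in zip(w, points)) / total))
        centers = new_centers
    return centers, u


points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (9, 9), (9, 10), (10, 9), (10, 10), (9.5, 9.5)]
init = grid_init(points, c=2)
centers, memberships = fuzzy_c_means(points, init)
print(init, centers)
```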

Variable Clustering Management for Multiple Streaming of Distributed Mobile Service (분산 모바일 서비스의 다중 스트리밍을 위한 가변 클러스터링 관리)

  • Jeong, Taeg-Won;Lee, Chong-Deuk
    • Journal of the Korean Institute of Intelligent Systems / v.19 no.4 / pp.485-492 / 2009
  • In the mobile service environment, patterns generated by temporal synchronization are streamed with different instance values. This paper proposes a variable clustering management method that manages multiple data streams dynamically to support flexible clustering. Unlike conventional streaming methods, the method manages synchronization effectively in the data streaming environment and manages clustering streaming at two levels: the structural presentation level and the fitness presentation level. At the structural presentation level, the stream structure is presented using level matching and accumulation matching, and clustering management is carried out by managing dynamic and static segments. The proposed method was evaluated by simulation against the k-means method, the C/S server method, and the CDN method, and the results showed that it performs better than these methods.