• Title/Summary/Keyword: Unsupervised algorithm

A Novel of Data Clustering Architecture for Outlier Detection to Electric Power Data Analysis (전력데이터 분석에서 이상점 추출을 위한 데이터 클러스터링 아키텍처에 관한 연구)

  • Jung, Se Hoon;Shin, Chang Sun;Cho, Young Yun;Park, Jang Woo;Park, Myung Hye;Kim, Young Hyun;Lee, Seung Bae;Sim, Chun Bo
    • KIPS Transactions on Software and Data Engineering / v.6 no.10 / pp.465-472 / 2017
  • In the past, researchers mainly analyzed power data with supervised machine learning techniques and identified patterns through data mining. These older classification and analysis techniques are reaching their limits now that electric power data has grown in size and can be provided in real time. This study therefore proposes a clustering architecture for analyzing large-scale electric power data. The proposed clustering process addresses the shortcomings of the K-means algorithm, an unsupervised learning technique, and can automate the entire process from the collection of electric power data to its analysis. Power data is categorized and analyzed at three levels: the raw data level, the clustering level, and the user interface level. In addition, the study identifies K, the ideal number of clusters, based on principal component analysis and the normal distribution, and proposes a modified K-means algorithm that reduces the data categorized as outliers in order to increase clustering efficiency.

Development of a data analysis system for preventing school violence based on AI unsupervised learning (AI 비지도 학습 기반의 학교폭력 예방 데이터 분석 시스템 개발)

  • Jung, Soyeong;Ma, Youngji;Koo, Dukhoi
    • Journal of The Korean Association of Information Education / v.25 no.5 / pp.741-750 / 2021
  • School violence has long been recognized as a social problem, and various efforts have been made to prevent it. In this study, we propose a system that can prevent school violence by analyzing data on the frequency of conversations between students, friendship and preference to be in the same group. This data was quantified using a Likert scale questionnaire, and also grouped into the appropriate number of clusters using the K-means algorithm. Additionally, the homeroom teacher observed the frequency and nature of conversations between students, and targeted specific individuals or groups for counseling and intervention, with the aim of reducing school violence. Data analysis revealed that the teachers' qualitative observations were consistent with the quantified data based on student questionnaires, and therefore applicable as quantitative data towards the identification and understanding of student relationships within the classroom. The study has potential limitations. The data used is subjective and based on peer evaluations which can be inconsistent as the students may use different criteria to evaluate one another. It is expected that this study will help homeroom teachers in their efforts to prevent school violence by understanding the relationships between students within the classroom.

Centroid Neural Network with Bhattacharyya Kernel (Bhattacharyya 커널을 적용한 Centroid Neural Network)

  • Lee, Song-Jae;Park, Dong-Chul
    • The Journal of Korean Institute of Communications and Information Sciences / v.32 no.9C / pp.861-866 / 2007
  • This paper proposes a clustering algorithm for Gaussian Probability Distribution Function (GPDF) data called Centroid Neural Network with a Bhattacharyya Kernel (BK-CNN). The proposed BK-CNN is based on the unsupervised competitive Centroid Neural Network (CNN) and employs a kernel method for data projection. The kernel method projects data from the low-dimensional input feature space into a higher-dimensional feature space so that nonlinear problems in the input space can be solved linearly in the feature space. To cluster GPDF data, the Bhattacharyya kernel is used to measure the distance between two probability distributions for data projection. With the kernel method incorporated, BK-CNN can deal with nonlinear separation boundaries and can allocate more code vectors in regions where GPDF data are densely distributed. When applied to GPDF data in an image classification problem, the experimental results show that BK-CNN improves average classification accuracy by 1.7%-4.3% over conventional algorithms such as k-means, Self-Organizing Map (SOM), and CNN with a Bhattacharyya distance, denoted Bk-Means, B-SOM, and B-CNN, respectively.
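
As a rough illustration of measuring the distance between two probability distributions, the sketch below computes the standard closed-form Bhattacharyya distance between two univariate Gaussians. The function names are hypothetical, and the kernel shown (the exponential of the negative distance, which equals the Bhattacharyya coefficient) is an assumption for illustration; the kernel actually used in BK-CNN may be defined differently.

```python
from math import exp, log


def bhattacharyya_distance(mu1, sigma1, mu2, sigma2):
    """Bhattacharyya distance between N(mu1, sigma1^2) and N(mu2, sigma2^2)."""
    var_sum = sigma1 ** 2 + sigma2 ** 2
    # Term driven by the separation of the means.
    mean_term = 0.25 * (mu1 - mu2) ** 2 / var_sum
    # Term driven by the mismatch of the spreads (zero when sigma1 == sigma2).
    spread_term = 0.5 * log(var_sum / (2.0 * sigma1 * sigma2))
    return mean_term + spread_term


def bhattacharyya_kernel(mu1, sigma1, mu2, sigma2):
    """Assumed kernel form: exp(-distance), i.e. the Bhattacharyya coefficient."""
    return exp(-bhattacharyya_distance(mu1, sigma1, mu2, sigma2))


if __name__ == "__main__":
    print(bhattacharyya_distance(0.0, 1.0, 2.0, 1.0))
    print(bhattacharyya_kernel(0.0, 1.0, 2.0, 1.0))
```

Identical distributions give a distance of zero and a kernel value of one, and the kernel value shrinks toward zero as the distributions move apart.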

EEG Signal Classification based on SVM Algorithm (SVM(Support Vector Machine) 알고리즘 기반의 EEG(Electroencephalogram) 신호 분류)

  • Rhee, Sang-Won;Cho, Han-Jin;Chae, Cheol-Joo
    • Journal of the Korea Convergence Society / v.11 no.2 / pp.17-22 / 2020
  • In this paper, we measured users' EEG signals, classified them using the Support Vector Machine (SVM) algorithm, and measured the classification accuracy. EEG signals were measured separately for men and women using a single-channel EEG device, and the measurements were analyzed using R. The data was split 80:20 into training and test sets, and applying the combination of specific vectors with the highest SVM classification performance yielded a recognition rate of 93.2%. The results suggest that a user's EEG signal can be recognized with about 93.2% accuracy using only simple linear SVM classification, which can be used in various ways for biometrics based on EEG signals.

Unsupervised Segmentation of Objects using Genetic Algorithms (유전자 알고리즘 기반의 비지도 객체 분할 방법)

  • 김은이;박세현
    • Journal of the Institute of Electronics Engineers of Korea CI / v.41 no.4 / pp.9-21 / 2004
  • This paper proposes a genetic algorithm (GA)-based segmentation method that can automatically extract and track moving objects. The proposed method consists mainly of spatial and temporal segmentation: the spatial segmentation divides each frame into regions with accurate boundaries, and the temporal segmentation divides each frame into background and foreground areas. The spatial segmentation is performed using chromosomes that evolve through distributed genetic algorithms (DGAs). However, unlike in standard DGAs, the chromosomes are initialized from the segmentation result of the previous frame, and only unstable chromosomes corresponding to actual moving object parts are then evolved by mating operators. For the temporal segmentation, adaptive thresholding is performed based on the intensity difference between two consecutive frames. The spatial and temporal segmentation results are then combined for object extraction, and tracking is performed using the natural correspondence established by the proposed spatial segmentation method. The proposed method has two main advantages. First, it does not require any a priori information. Second, it enhances search efficiency and incorporates a tracking algorithm within its own architecture. These advantages were confirmed by experiments in which the proposed method was successfully applied to well-known and natural video sequences.
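
The temporal segmentation step, thresholding the intensity difference between two consecutive frames, might look roughly like the sketch below. The abstract does not state the thresholding rule, so the rule used here (mean plus `k` standard deviations of the frame difference) is an assumption, as are the function name `temporal_mask` and the toy frames.

```python
from statistics import mean, pstdev


def temporal_mask(prev_frame, curr_frame, k=1.0):
    """Label each pixel as foreground (1) or background (0).

    Frames are lists of rows of grayscale intensities. The threshold adapts
    to each frame pair: mean + k * std of the absolute differences
    (an assumed rule, not taken from the paper).
    """
    diffs = [
        [abs(c - p) for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev_frame, curr_frame)
    ]
    flat = [d for row in diffs for d in row]
    threshold = mean(flat) + k * pstdev(flat)
    mask = [[1 if d > threshold else 0 for d in row] for row in diffs]
    return mask, threshold


if __name__ == "__main__":
    # Hypothetical 3x3 frames: only the center pixel changes substantially.
    previous = [[10, 10, 10], [10, 10, 10], [10, 10, 10]]
    current = [[10, 11, 10], [10, 90, 10], [9, 10, 10]]
    mask, threshold = temporal_mask(previous, current)
    for row in mask:
        print(row)
```

Small intensity changes from noise stay below the adaptive threshold, so only the pixel with a large change is marked as foreground.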

Automatic Extraction of Focused Video Object from Low Depth-of-Field Image Sequences (낮은 피사계 심도의 동영상에서 포커스 된 비디오 객체의 자동 검출)

  • Park, Jung-Woo;Kim, Chang-Ick
    • Journal of KIISE:Software and Applications / v.33 no.10 / pp.851-861 / 2006
  • The paper proposes a novel unsupervised video object segmentation algorithm for image sequences with low depth-of-field (DOF), a popular photographic technique that conveys the photographer's intention by giving clear focus only to an object-of-interest (OOI). The proposed algorithm consists of two main modules. The first module automatically extracts OOIs from the first frame by separating sharply focused OOIs from out-of-focus foreground or background objects. The second module tracks the OOIs through the rest of the video sequence, with the aim of running the system in real time or at least semi-real time. The experimental results indicate that the proposed algorithm provides an effective tool that can serve as a basis for applications such as video analysis for virtual reality, immersive video systems, photo-realistic video scene generation, and video indexing systems.

Efficient Data Clustering using Fast Choice for Number of Clusters (빠른 클러스터 개수 선정을 통한 효율적인 데이터 클러스터링 방법)

  • Kim, Sung-Soo;Kang, Bum-Su
    • Journal of Korean Society of Industrial and Systems Engineering / v.41 no.2 / pp.1-8 / 2018
  • The K-means algorithm is one of the most popular and widely used clustering methods because it is easy to implement and very efficient. However, it must be run with a fixed number of clusters, because it evaluates clustering solutions using only the intra-cluster distance. The silhouette is a useful and stable validity index for choosing a clustering solution and its number of clusters for unsupervised data, because it considers both intra-cluster and inter-cluster distances. However, this index carries a high computational burden because a quality measure is computed for each data object. The objective of this paper is to propose a fast and simple speed-up method that overcomes this limitation so that the silhouette can be used for effective large-scale data clustering. In the first step, the proposed method calculates and saves the distances between data objects once. In the second step, this distance matrix is used to calculate the relative distance rate ($V_j$) of each data object j, and this rate is used to choose a suitable number of clusters without much computation time. In the third step, the proposed efficient heuristic algorithm (group search optimization, GSO, in this paper) searches for the global optimum while saving computation, using good initial solutions generated probabilistically from $V_j$. Experiments and analysis on the Ruspini, Iris, Wine, and Breast Cancer datasets from the UCI machine learning repository confirm that the proposed method saves significant computation time compared with using the original silhouette alone. The advantage over the previous method is especially large for larger datasets.
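
To make the silhouette index concrete, the sketch below computes the standard mean silhouette from a distance matrix that is calculated once and then reused, in the spirit of the first step described above. It does not implement the relative distance rate $V_j$ or the GSO search, whose definitions are not given in the abstract; the function names and the toy points are hypothetical.

```python
from math import dist


def distance_matrix(points):
    """Compute all pairwise Euclidean distances once so they can be reused."""
    return [[dist(p, q) for q in points] for p in points]


def mean_silhouette(d, labels):
    """Mean silhouette over all points, given a precomputed distance matrix."""
    clusters = {}
    for i, label in enumerate(labels):
        clusters.setdefault(label, []).append(i)
    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own or len(clusters) < 2:
            scores.append(0.0)  # convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster.
        a = sum(d[i][j] for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(d[i][j] for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        largest = max(a, b)
        scores.append((b - a) / largest if largest > 0 else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
    d = distance_matrix(points)
    print(mean_silhouette(d, [0, 0, 1, 1]))  # compact, well-separated grouping
    print(mean_silhouette(d, [0, 1, 0, 1]))  # grouping that mixes far-apart points
```

Because the distance matrix is built once, candidate labelings with different numbers of clusters can be scored without recomputing any distances; the per-object loop is still the part that grows expensive on large datasets.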

Repeated K-means Clustering Algorithm For Radar Sorting (레이더 군집화를 위한 반복 K-means 클러스터링 알고리즘)

  • Dong Hyun Park;Dong-ho Seo;Jee-hyeon Baek;Won-jin Lee;Dong Eui Chang
    • Journal of the Korea Institute of Military Science and Technology / v.26 no.5 / pp.384-391 / 2023
  • In modern electronic warfare, many radar emitters operate at once, so radar receivers receive high-density signal pulses that occur simultaneously. To analyze the radar signals more accurately and identify enemies, sorting the high-density radar signals before analysis is very important. Recently, machine learning algorithms, specifically K-means clustering, have been studied as a way to improve the accuracy of radar signal sorting. One challenge faced by these studies is that the clustering results can vary depending on how the initial points are selected and how many clusters are set. This paper introduces a repeated K-means clustering algorithm that aims to cluster all data accurately by identifying and addressing false clusters in the radar sorting problem. To verify the performance of the proposed algorithm, experiments are conducted on simulated signals produced by a signal generator.
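
The sensitivity to initial points can be seen in a small sketch of plain K-means (Lloyd's iteration). This is not the repeated K-means algorithm proposed in the paper, whose details the abstract does not give; the four points and both sets of starting centroids are hypothetical.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, initial_centroids, max_iter=100):
    """Plain K-means started from the given centroids.

    Returns (labels, centroids, sse), where sse is the sum of squared
    distances from each point to its assigned centroid.
    """
    centroids = [tuple(c) for c in initial_centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    sse = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, sse


if __name__ == "__main__":
    # Corners of a wide rectangle: the natural clusters are left and right.
    points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]
    print(kmeans(points, [(0.0, 0.5), (4.0, 0.5)]))  # starts near the natural split
    print(kmeans(points, [(2.0, 0.0), (2.0, 1.0)]))  # starts on a top/bottom split
```

Both runs converge, but the second settles on a top/bottom split with a much larger within-cluster error, which is the kind of initialization-dependent result the abstract describes.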

Comparison of Association Rule Learning and Subgroup Discovery for Mining Traffic Accident Data (교통사고 데이터의 마이닝을 위한 연관규칙 학습기법과 서브그룹 발견기법의 비교)

  • Kim, Jeongmin;Ryu, Kwang Ryel
    • Journal of Intelligence and Information Systems / v.21 no.4 / pp.1-16 / 2015
  • Traffic accidents have been one of the major causes of death worldwide for the last several decades. According to World Health Organization statistics, approximately 1.24 million deaths occurred on the world's roads in 2010. To reduce future traffic accidents, multipronged approaches have been adopted, including traffic regulations, injury-reducing technologies, and driver training programs. Records on traffic accidents are generated and maintained for this purpose. To make these records meaningful and effective, it is necessary to analyze the relationships between traffic accidents and related factors such as vehicle design, road design, weather, and driver behavior. Insight derived from these analyses can be used in accident prevention approaches. Traffic accident data mining is the activity of finding useful knowledge about such relationships that is not well known and that may interest the user. Many studies on mining accident data have been reported over the past two decades. Most of them focused on predicting the risk of accidents from accident-related factors, using supervised learning methods such as decision trees, logistic regression, k-nearest neighbors, and neural networks. However, the prediction models derived from these algorithms are too complex for humans to understand, because the main purpose of these algorithms is prediction, not explanation of the data. Some studies use unsupervised clustering algorithms to divide the data into several groups, but the derived groups are still not easy for humans to understand, so additional analysis is needed. Rule-based learning methods are adequate when we want to derive a comprehensible form of knowledge about the target domain. They derive a set of if-then rules that represent the relationship between the target feature and other features. Rules are fairly easy for humans to understand, so they can provide insight and comprehensible results. Association rule learning and subgroup discovery are representative rule-based learning methods for descriptive tasks. These two methods have been used in a wide range of areas, including transaction analysis, accident data analysis, detection of statistically significant patient risk groups, and discovery of key persons in social communities. We use both the association rule learning method and the subgroup discovery method to discover useful patterns in a traffic accident dataset consisting of many features, including driver profile, accident location, accident type, vehicle information, and regulation violations. The association rule learning method, an unsupervised learning method, searches for frequent item sets in the data and translates them into rules. In contrast, the subgroup discovery method is a supervised learning method that discovers rules for user-specified concepts that satisfy a certain degree of generality and unusualness. Depending on which aspect of the data we focus on, we may combine multiple relevant features of interest into a synthetic target feature and give it to the rule learning algorithms. After a set of rules is derived, postprocessing steps make the rule set more compact and easier to understand by removing uninteresting or redundant rules. We conducted a set of experiments mining our traffic accident data in both unsupervised and supervised modes to compare these rule-based learning algorithms. The experiments reveal that association rule learning, in its pure unsupervised mode, can discover some hidden relationships among the features. Under a supervised learning setting with a combinatorial target feature, however, the subgroup discovery method finds good rules much more easily than the association rule learning method, which requires a lot of effort to tune its parameters.
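
The two quantities that association rule learning typically tunes, support and confidence, can be sketched as follows. The records are a made-up toy dataset, not the traffic accident data used in the paper, and the function names are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate of P(consequent | antecedent) for the rule antecedent -> consequent."""
    antecedent = set(antecedent)
    both = antecedent | set(consequent)
    n_antecedent = sum(1 for t in transactions if antecedent <= t)
    n_both = sum(1 for t in transactions if both <= t)
    return n_both / n_antecedent if n_antecedent else 0.0


if __name__ == "__main__":
    # Hypothetical accident records, each a set of attribute values.
    records = [
        {"rain", "night", "speeding"},
        {"rain", "night"},
        {"rain", "speeding"},
        {"clear", "day"},
        {"rain", "night", "speeding"},
    ]
    print(support(records, {"rain", "night"}))
    print(confidence(records, {"rain", "night"}, {"speeding"}))
```

A frequent item set is one whose support exceeds a chosen minimum, and a rule is kept when its confidence also exceeds a minimum; choosing those two thresholds is part of the parameter tuning effort mentioned above.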

Study on Dimension Reduction algorithm for unsupervised clustering of the DMR's RF-fingerprinting features (무선단말기 RF-fingerprinting 특징의 비지도 클러스터링을 위한 차원축소 알고리즘 연구)

  • Young-Giu Jung;Hak-Chul Shin;Sun-Phil Nah
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.23 no.3 / pp.83-89 / 2023
  • Clustering techniques based on RF fingerprints extract the characteristic signatures of transmitters that are embedded in the transmission waveforms. The output of the RF-Fingerprint feature extraction algorithm for clustering identical Digital Mobile Radios (DMRs) is a high-dimensional feature, typically with 512 or more dimensions. While such high-dimensional features may be effective for classifiers, they are not suitable as inputs to clustering algorithms. Therefore, this paper proposes a dimension reduction algorithm that effectively reduces the dimensionality of the RF-Fingerprint features while maintaining the fingerprinting characteristics of the DMRs. It also proposes a clustering algorithm that can effectively cluster the reduced features. The proposed algorithm reduces the multi-dimensional RF-Fingerprint features using t-SNE, which is based on KL divergence, and performs clustering using Density Peaks Clustering (DPC). The performance analysis uses a dataset of 3000 samples collected from 10 Motorola XiR and 10 Wintech N-Series DMRs. The RF-Fingerprinting-based clustering algorithm formed 20 clusters, and all performance metrics, including Homogeneity, Completeness, and V-measure, reached 99.4%.
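
The center-selection idea behind Density Peaks Clustering can be sketched on points that are already low-dimensional (the t-SNE reduction step is not shown). This follows the commonly described DPC quantities, a local density and a distance to the nearest denser point; the paper's own variant, cutoff choice, and data are not given in the abstract, so the cutoff, the toy points, and the function name are assumptions.

```python
from math import dist


def density_peak_centers(points, cutoff, n_centers):
    """Pick cluster centers as points with high density and high separation.

    rho[i]   = number of other points closer than cutoff to point i
    delta[i] = distance to the nearest point with strictly higher rho
               (for the densest points: distance to the farthest point)
    """
    n = len(points)
    d = [[dist(p, q) for q in points] for p in points]
    rho = [sum(1 for j in range(n) if j != i and d[i][j] < cutoff) for i in range(n)]
    delta = []
    for i in range(n):
        denser = [d[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(denser) if denser else max(d[i]))
    # Rank by the product rho * delta; centers score high on both.
    gamma = [r * dl for r, dl in zip(rho, delta)]
    centers = sorted(range(n), key=lambda i: gamma[i], reverse=True)[:n_centers]
    return rho, delta, centers


if __name__ == "__main__":
    # Two hypothetical 1-D groups, around 0.1 and around 5.1.
    points = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (5.2,)]
    rho, delta, centers = density_peak_centers(points, cutoff=0.15, n_centers=2)
    print(rho, centers)
```

A complete DPC run would then assign each remaining point to the cluster of its nearest denser neighbor; that assignment step is omitted here to keep the sketch short.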