• Title/Summary/Keyword: K-means clustering technique

Image Clustering Using Machine Learning : Study of InceptionV3 with K-means Methods. (머신 러닝을 사용한 이미지 클러스터링: K-means 방법을 사용한 InceptionV3 연구)

  • Nindam, Somsauwt;Lee, Hyo Jong
    • Annual Conference of KIPS / 2021.11a / pp.681-684 / 2021
  • In this paper, we study image clustering without labels using machine learning techniques. We propose an unsupervised machine learning technique for an image clustering model that automatically categorizes images into groups. Our experiment focuses on the Inception convolutional neural network (InceptionV3) combined with k-means to cluster images. We collected the public datasets Food-K5, Flowers, Handwritten Digit, and Cats-dogs, together with our own dataset, Rice Germination, and the owner dataset, Palm print. The experiment has three parts. First, all images are stripped of their labels and merged into a single dataset. Second, the dataset is loaded into InceptionV3 to extract image features, which are passed to k-means to form cluster groups over the six classes. Lastly, model accuracy is evaluated from the confusion matrix using precision, recall, and F1. With this method we obtained: 1) Handwritten Digit (precision = 1.000, recall = 1.000, F1 = 1.00), 2) Food-K5 (precision = 0.975, recall = 0.945, F1 = 0.96), 3) Palm print (precision = 1.000, recall = 0.999, F1 = 1.00), 4) Cats-dogs (precision = 0.997, recall = 0.475, F1 = 0.64), 5) Flowers (precision = 0.610, recall = 0.982, F1 = 0.75), and 6) our Rice Germination dataset (precision = 0.997, recall = 0.943, F1 = 0.97). The experiment achieved an accuracy of 0.8908, which indicates that the proposed model is strong enough to differentiate the images and group them into clusters.
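
As a quick consistency check on the figures in the abstract above, the sketch below recomputes F1 as the harmonic mean of each reported precision and recall pair. It assumes the reported F1 values were derived from exactly those two numbers, which the abstract does not state; the function name `f1_score` and the dictionary layout are illustrative only.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs as reported in the abstract.
reported = {
    "Handwritten Digit": (1.000, 1.000),
    "Food-K5": (0.975, 0.945),
    "Palm print": (1.000, 0.999),
    "Cats-dogs": (0.997, 0.475),
    "Flowers": (0.610, 0.982),
    "Rice Germination": (0.997, 0.943),
}

# Recompute F1 for every dataset, rounded to two decimals as in the abstract.
f1_by_dataset = {
    name: round(f1_score(p, r), 2) for name, (p, r) in reported.items()
}

for name, f1 in f1_by_dataset.items():
    print(f"{name}: F1 = {f1:.2f}")
```

Under that assumption the recomputed values agree with the reported ones to two decimals, including the low Cats-dogs score, which follows from its recall of 0.475.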

Location Recommendation Customize System Using Opinion Mining (오피니언마이닝을 이용한 사용자 맞춤 장소 추천 시스템)

  • Choi, Eun-jeong;Kim, Dong-keun
    • Journal of the Korea Institute of Information and Communication Engineering / v.21 no.11 / pp.2043-2051 / 2017
  • Alongside the growing interest in big data, interest in application fields that process big data is also growing. Opinion mining is a big data processing technique widely used to provide personalized services to users. In this paper, users' textual reviews of places are processed with opinion mining, and the users' sentiment is analyzed through k-means clustering. Users whose sentiment falls into a similar category under the clustering are assigned the same numerical value. We propose a method that predicts preference with a collaborative filtering recommendation system using the assigned numerical values, and shows recommended content to users by placing markers on a map in descending order of predicted value.

Demand-based charging strategy for wireless rechargeable sensor networks

  • Dong, Ying;Wang, Yuhou;Li, Shiyuan;Cui, Mengyao;Wu, Hao
    • ETRI Journal / v.41 no.3 / pp.326-336 / 2019
  • A wireless power transfer technique can solve the power capacity problem in wireless rechargeable sensor networks (WRSNs). The charging strategy is a widespread research problem. In this paper, we propose a demand-based charging strategy (DBCS) for WRSNs. We improve the charging programming in four ways: the clustering method, the selection of to-be-charged nodes, the charging path, and the charging schedule. First, we propose a multipoint improved K-means (MIKmeans) clustering algorithm to balance energy consumption, which groups nodes based on location, residual energy, and historical contribution. Second, we propose the dynamic selection algorithm for charging nodes (DSACN) to select on-demand charging nodes. Third, we design simulated annealing based on performance and efficiency (SABPE) to optimize the charging path for a mobile charging vehicle (MCV) and reduce the charging time. Last, we propose the DBCS to enhance the efficiency of the MCV. Simulations reveal that the strategy achieves better performance by shortening the charging path, thus increasing communication effectiveness and residual energy utility.

Comparison of Clustering Techniques in Flight Approach Phase using ADS-B Track Data (공항 근처 ADS-B 항적 자료에서의 클러스터링 기법 비교)

  • Jong-Chan Park;Heon Jin Park
    • The Journal of Bigdata / v.6 no.2 / pp.29-38 / 2021
  • In aviation safety management, route deviation is a risk factor that can lead to serious accidents. In this study, an anomaly score is calculated by classifying tracks through clustering and measuring the distance from the cluster center. Tracks within 100 km of the airport were extracted from ADS-B track data received over one year. The tracks were vectorized using linear interpolation, with latitude, longitude, and altitude as 3D coordinates. PCA reduced the dimension to the axes that represent more than 90% of the overall data distribution, and k-means clustering, hierarchical clustering, and PAM were applied. The number of clusters was selected using the silhouette measure, and an anomaly score was calculated as the distance from the cluster center. We compare the number of clusters for each clustering technique and evaluate the clustering results through the silhouette measure.
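
The scoring idea in the abstract above (cluster the tracks, then score each track by its distance from the cluster center) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline: it uses plain Lloyd's k-means with the first `k` points as initial centers, skips the PCA and silhouette steps, and runs on made-up 2-D feature vectors. The names `kmeans`, `anomaly_score`, and `tracks` are hypothetical.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means; the first k points serve as initial centroids."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        # Assignment step: group each point with its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        for c, members in enumerate(clusters):
            if members:
                centroids[c] = [sum(axis) / len(members) for axis in zip(*members)]
    return centroids


def anomaly_score(point, centroids):
    """Distance from a point to its nearest cluster center."""
    return min(math.dist(point, c) for c in centroids)


# Hypothetical 2-D track features (stand-ins for PCA-reduced track vectors).
tracks = [(0.0, 0.1), (5.0, 5.1), (0.2, 0.0), (5.2, 4.9), (0.1, 0.2), (4.9, 5.0)]
centres = kmeans(tracks, k=2)

typical = anomaly_score((0.1, 0.1), centres)   # lies inside one cluster
deviant = anomaly_score((2.5, 9.0), centres)   # far from both clusters
print(f"typical: {typical:.3f}, deviant: {deviant:.3f}")
```

A track far from every center receives a larger score than one near a center, which is the property the abstract relies on for flagging route deviations.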

A Study on Fault Diagnosis of Boiler Tube Leakage based on Neural Network using Data Mining Technique in the Thermal Power Plant (데이터마이닝 기법을 이용한 신경망 기반의 화력발전소 보일러 튜브 누설 고장 진단에 관한 연구)

  • Kim, Kyu-Han;Lee, Heung-Seok;Jeong, Hee-Myung;Kim, Hyung-Su;Park, June-Ho
    • The Transactions of The Korean Institute of Electrical Engineers / v.66 no.10 / pp.1445-1453 / 2017
  • In this paper, we propose a fault detection model based on a multi-layer neural network, using data mining techniques, for faults caused by boiler tube leakage in a thermal power plant. Major measurement data related to the faults are analyzed using statistical methods. Based on the analysis results, the number of inputs to the proposed fault detection model is reduced. Then, each input is clustered into normal data and fault data by applying the K-means algorithm, one of the data mining techniques. The neural network was trained on the fault data and tested on detecting boiler tube leakage faults.

An Optimized e-Lecture Video Search and Indexing framework

  • Medida, Lakshmi Haritha;Ramani, Kasarapu
    • International Journal of Computer Science & Network Security / v.21 no.8 / pp.87-96 / 2021
  • The demand for e-learning through video lectures is rapidly increasing because of its diverse advantages over traditional learning methods. This has led to massive volumes of web-based lecture videos. Indexing and retrieving a lecture video or a lecture video topic has thus proved to be an exceptionally challenging problem. Many techniques reported in the literature are either visual or audio based, but not both. Since the visual and audio components are equally important for content-based indexing and retrieval, the current work addresses both. A framework for automatic topic-based indexing and search that depends on the innate content of the lecture videos is presented. The text from the slides is extracted using the proposed Merged Bounding Box (MBB) text detector. The text from the audio component is extracted using Google Speech Recognition (GSR) technology. This hybrid approach generates the indexing keywords from the merged transcripts of both the video and audio component extractors. The search within the indexed documents is optimized based on the Naïve Bayes (NB) classification and K-Means clustering models. This optimized search retrieves results by searching only the relevant document cluster in the predefined categories rather than the whole lecture video corpus. The work is carried out on a dataset generated by assigning categories to lecture video transcripts gathered from e-learning portals. Search performance is assessed by accuracy and time taken. Further, the improved accuracy of the proposed indexing technique is compared with the accepted chain indexing technique.

Clustering Technique Using Relevance of Data and Applied Algorithms (데이터와 적용되는 알고리즘의 연관성을 이용한 클러스터링 기법)

  • Han Woo-Yeon;Nam Mi-Young;Rhee PhillKyu
    • The KIPS Transactions:PartB / v.12B no.5 s.101 / pp.577-586 / 2005
  • Many algorithms have been proposed for face recognition, one of the most successful applications in the image processing, pattern recognition, and computer vision fields. Recent research examines which facial attributes make recognizing the target harder or easier. In this paper, we propose a method to improve recognition performance using the relevance between face data and the applied algorithms, because the recognition performance of each algorithm changes with facial attributes (illumination and expression). In the experiment, we use an n-tuple classifier, PCA, and Gabor wavelets as recognition algorithms, and we propose three vectorization methods. First, we cluster the test data using the k-means algorithm and estimate the fitness of the three recognition algorithms for each cluster; then we compose new clusters by merging the clusters that select the same algorithm. We estimate the similarity of the test data to the new clusters and recognize the target using the nearest cluster. As a result, recognition performance improves over that of a single algorithm without clustering.

A Image Contrast Enhancement Technique by Histogram Distribution Alteration Using Clustering Algorithm (클러스터링 알고리듬을 이용한 히스토그램 변경에 의한 영상 대비 향상 기법)

  • 김남진;김용수
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2003.09b / pp.177-180 / 2003
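  • Images acquired with physical devices such as television cameras, vidicon cameras, digital detectors, and scanners may be dark because of ambient lighting, or may have poor contrast because of the physical properties of the imaging device and image transmission. In this paper, we propose a technique that enhances the contrast of acquired low-contrast images. The proposed technique uses the K-means algorithm to select the crossover point automatically. Selecting this optimal crossover point is treated as a two-class problem of separating the acquired image into object and background, to which the K-means algorithm is applied. The image is split in two at the resulting crossover point, and histogram equalization is applied. The index of fuzziness is used to measure the degree of enhancement. The proposed technique was applied to low-contrast images, and the results were compared with those of histogram equalization.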
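
The crossover-point step described in the abstract above can be illustrated with a small sketch. It rests on several assumptions that the abstract does not confirm: the crossover point is taken as the midpoint between the two K-means centers, the dark and bright halves are each equalized onto their own sub-range of gray levels, and the image is a flat list of 8-bit intensities. All function names are hypothetical.

```python
def kmeans_crossover(pixels, iterations=50):
    """Two-class 1-D k-means on gray levels; returns the midpoint of the two centers."""
    low, high = float(min(pixels)), float(max(pixels))
    for _ in range(iterations):
        boundary = (low + high) / 2
        dark = [p for p in pixels if p <= boundary]
        bright = [p for p in pixels if p > boundary]
        if not dark or not bright:
            break
        new_low, new_high = sum(dark) / len(dark), sum(bright) / len(bright)
        if (new_low, new_high) == (low, high):
            break  # centers stopped moving
        low, high = new_low, new_high
    return (low + high) / 2


def equalize_range(values, lo, hi):
    """Map each gray level to lo + (hi - lo) * CDF(level), i.e. equalize onto [lo, hi]."""
    ordered = sorted(values)
    n = len(ordered)
    cdf = {}
    for rank, value in enumerate(ordered, start=1):
        cdf[value] = rank / n  # the last rank seen for a level is its cumulative share
    return {level: round(lo + (hi - lo) * share) for level, share in cdf.items()}


def enhance(pixels, max_level=255):
    """Split at the k-means crossover point and equalize each half separately."""
    t = kmeans_crossover(pixels)
    dark = [p for p in pixels if p <= t]
    bright = [p for p in pixels if p > t]
    dark_map = equalize_range(dark, 0, int(t)) if dark else {}
    bright_map = equalize_range(bright, int(t) + 1, max_level) if bright else {}
    return [dark_map[p] if p <= t else bright_map[p] for p in pixels], t


# Hypothetical low-contrast image: all intensities squeezed into 40..110.
pixels = [40, 42, 45, 50, 52, 90, 95, 100, 105, 110]
enhanced, crossover = enhance(pixels)
print(crossover, enhanced)
```

Equalizing the two halves separately keeps dark pixels below the crossover point and bright pixels above it, so the object and background separation found by K-means survives the contrast stretch.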

A Image Contrast Enhancement Technique Using Clustering Algorithm (클러스터링 알고리듬을 이용한 영상 대비 향상 기법)

  • 김남진;김용수
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2004.04a / pp.188-191 / 2004
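  • When shooting with a video camera at night, the poor surroundings and image transmission can produce low-contrast images that are blurred or distorted by various kinds of noise. In this paper, we propose a technique that enhances the contrast of acquired low-contrast images. MPEG-2, a video compression standard, compresses the luminance and chrominance signals separately because human vision is more sensitive to luminance than to chrominance. After extracting only the luminance signal, the K-means algorithm is used to select the crossover point automatically. Selecting this optimal crossover point is treated as a two-class problem of separating the acquired image into object and background, to which the K-means algorithm is applied. The image is split in two at the resulting crossover point, and histogram equalization is applied. The index of fuzziness is used to measure the degree of enhancement. The proposed technique was applied to low-contrast images, and the results were compared with those of histogram equalization.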

Parallel k-Modes Algorithm for Spark Framework (스파크 프레임워크를 위한 병렬적 k-Modes 알고리즘)

  • Chung, Jaehwa
    • KIPS Transactions on Software and Data Engineering / v.6 no.10 / pp.487-492 / 2017
  • Clustering is a technique used to measure similarities between data in the big data analysis and data mining fields. Among the various clustering methods, the k-Modes algorithm is representative for categorical data. To increase the performance of iteration-centric tasks such as k-Modes, Spark, a distributed and concurrent framework, has recently received great attention because it overcomes the limitations of Hadoop. Spark provides an environment that can process large amounts of data in main memory using abstract objects called RDDs. Spark provides MLlib, a dedicated machine learning library, but MLlib includes only k-means, which handles only continuous data, so categorical data cannot be processed. In this paper, we design RDDs for the k-Modes algorithm for categorical data clustering in the Spark environment and implement an algorithm that operates effectively. Experiments show that the performance of the proposed algorithm scales linearly in the Spark environment.
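
For readers unfamiliar with how k-Modes differs from k-means, the sketch below shows the two substitutions it makes for categorical data: Hamming distance instead of Euclidean distance, and per-attribute modes instead of means. It is a single-machine illustration in plain Python, not the RDD-based Spark design from the paper; seeding with the first `k` distinct records and the sample records themselves are assumptions made for the example.

```python
from collections import Counter


def hamming(a, b):
    """Number of attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def k_modes(records, k, iterations=10):
    """Single-machine k-modes; the first k distinct records seed the modes."""
    modes = list(dict.fromkeys(records))[:k]
    labels = []
    for _ in range(iterations):
        # Assignment step: nearest mode by Hamming distance.
        labels = [
            min(range(len(modes)), key=lambda c: hamming(r, modes[c]))
            for r in records
        ]
        # Update step: each mode takes the most frequent value per attribute.
        new_modes = []
        for c in range(len(modes)):
            members = [r for r, lab in zip(records, labels) if lab == c]
            if not members:
                new_modes.append(modes[c])  # keep the old mode for an empty cluster
                continue
            new_modes.append(
                tuple(Counter(col).most_common(1)[0][0] for col in zip(*members))
            )
        if new_modes == modes:
            break  # converged
        modes = new_modes
    return modes, labels


# Hypothetical categorical records: (color, size, shape).
records = [
    ("red", "small", "round"),
    ("blue", "large", "square"),
    ("red", "small", "oval"),
    ("blue", "large", "round"),
    ("red", "medium", "round"),
    ("blue", "large", "square"),
]
modes, labels = k_modes(records, k=2)
print(modes, labels)
```

In a distributed setting the assignment step is independent per record and the update step is a per-cluster frequency count, which is what makes the algorithm a natural fit for a map-and-aggregate framework.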