• Title/Summary/Keyword: k-means clustering algorithm


A Study on the Application Modeling of SNS Big-data for a Micro-Targeting using K-Means Clustering (K-평균 군집을 이용한 마이크로타겟팅을 위한 SNS 빅데이터 활용 모델링에 관한 연구)

  • Song, Jeo;Lee, Sang Moon
    • Proceedings of the Korean Society of Computer Information Conference / 2015.01a / pp.321-324 / 2015
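  • This paper proposes a model that collects consumer opinions found on SNS (evaluations, opinions, impressions, and usage reviews of specific products, brands, or companies) and analyzes this SNS big data so that companies can use it in business activities such as new product development and market entry or expansion. Building on that analysis, the model provides micro-targeting information that supports marketing centered on micro-trends, which are becoming increasingly small-group and personalized. The study covers the collection, storage, and analysis of SNS data; in particular, the micro-targeting information is implemented using Mahout's Euclidean-distance-based similarity and the K-means clustering algorithm.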


Comparison of Initial Seeds Methods for K-Means Clustering (K-Means 클러스터링에서 초기 중심 선정 방법 비교)

  • Lee, Shinwon
    • Journal of Internet Computing and Services / v.13 no.6 / pp.1-8 / 2012
  • Clustering methods are divided into hierarchical clustering, partitioning clustering, and others. The K-Means algorithm is a partitioning method well suited to clustering large numbers of documents quickly and easily. Its disadvantage is that randomly chosen initial centers lead to different results, so a better choice is to place the initial centers as far from each other as possible. We propose a new method of selecting initial centers in K-Means clustering that uses the height of a triangle. With this method the centers are distributed evenly, and the result is more accurate than with randomly selected initial centers. Selecting the centers takes time, but it reduces total clustering time by minimizing the number of allocation and recalculation steps. Compared with the standard algorithm, the average time consumed is reduced by 38.4%.
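
The idea of spreading initial centers far apart can be illustrated with a short sketch. The abstract does not detail the triangle-height method, so the code below substitutes a different, well-known technique, farthest-first traversal; the function name `farthest_first_centers` and the sample points are assumptions made for illustration only.

```python
import math


def farthest_first_centers(points, k):
    """Pick k initial centers by farthest-first traversal.

    Illustrative stand-in: this is NOT the triangle-height method of the paper.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centers = [points[0]]  # assumption: start from the first point for determinism
    # Distance from every point to its nearest chosen center so far.
    nearest = [math.dist(p, centers[0]) for p in points]
    while len(centers) < k:
        # The next center is the point farthest from all current centers.
        idx = max(range(len(points)), key=nearest.__getitem__)
        centers.append(points[idx])
        nearest = [min(d, math.dist(p, points[idx])) for p, d in zip(points, nearest)]
    return centers


# Hypothetical 2-D points forming three loose groups.
points = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.1, 4.9), (0.0, 9.0)]
centers = farthest_first_centers(points, 3)
```

Each returned center comes from a different group of the sample data, which is the property the abstract asks of a good initialization. The centers would then seed the usual allocation and recalculation loop.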

K-Means Clustering Algorithm and CPA based Collinear Multiple Static Obstacle Collision Avoidance for UAVs (K-평균 군집화 알고리즘 및 최근접점 기반 무인항공기용 공선상의 다중 정적 장애물 충돌 회피)

  • Hyeji Kim;Hyeok Kang;Seongbong Lee;Hyeongseok Kim;Dongjin Lee
    • Journal of Advanced Navigation Technology / v.26 no.6 / pp.427-433 / 2022
  • Collision avoidance for UAVs requires obstacle detection, collision recognition, and avoidance technologies. In this paper, considering multiple collinear static obstacles, we propose an obstacle detection algorithm using LiDAR and a collision recognition and avoidance algorithm based on the closest point of approach (CPA). Before obstacle detection, preprocessing removes the ground from the LiDAR measurement data. We then detect and classify obstacles in the preprocessed data using the K-means clustering algorithm. We also estimate the absolute positions of the detected obstacles using relative navigation and correct the estimated positions with a low-pass filter. To avoid the detected static obstacles, we use the CPA-based collision recognition and avoidance algorithm. Information on the obstacles to be avoided is updated using the distance between obstacles, and collision recognition and avoidance are performed with the updated obstacle information. Finally, by analyzing obstacle location estimation, collision recognition, and collision avoidance results in the Gazebo simulation environment, we verified that collision avoidance is performed successfully.
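
As background for the CPA step, the sketch below computes the standard closest point of approach between a vehicle and an obstacle under a constant-velocity assumption. It is a textbook formulation, not the authors' algorithm; the function name, the 2-D coordinates, and the 2 m safety radius are hypothetical.

```python
import math


def closest_point_of_approach(p_uav, v_uav, p_obs, v_obs=(0.0, 0.0)):
    """Return (t_cpa, d_cpa), assuming both bodies keep constant velocity.

    A static obstacle is the default case: v_obs = (0, 0).
    """
    r = [po - pu for po, pu in zip(p_obs, p_uav)]  # relative position
    v = [vo - vu for vo, vu in zip(v_obs, v_uav)]  # relative velocity
    speed_sq = sum(c * c for c in v)
    if speed_sq == 0.0:
        t_cpa = 0.0  # no relative motion: the separation never changes
    else:
        # Time that minimizes |r + v t|, clamped so it is never in the past.
        t_cpa = max(0.0, -sum(a * b for a, b in zip(r, v)) / speed_sq)
    d_cpa = math.sqrt(sum((a + b * t_cpa) ** 2 for a, b in zip(r, v)))
    return t_cpa, d_cpa


# UAV at the origin flying along +x at 2 m/s; obstacle 10 m ahead, 1 m to the side.
t_cpa, d_cpa = closest_point_of_approach((0.0, 0.0), (2.0, 0.0), (10.0, 1.0))
collision_risk = d_cpa < 2.0  # hypothetical safety radius in metres
```

In this example the miss distance falls inside the assumed safety radius, so the obstacle would be flagged for avoidance.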

Class Imbalance Resolution Method and Classification Algorithm Suggesting Based on Dataset Type Segmentation (데이터셋 유형 분류를 통한 클래스 불균형 해소 방법 및 분류 알고리즘 추천)

  • Kim, Jeonghun;Kwahk, Kee-Young
    • Journal of Intelligence and Information Systems / v.28 no.3 / pp.23-43 / 2022
  • As AI (Artificial Intelligence) is applied in various industries, interest in algorithm selection is increasing. Algorithm selection is largely determined by the experience of the data scientist; an inexperienced data scientist instead selects an algorithm through meta-learning based on dataset characteristics. Because that selection process is a black box, however, it has not been possible to know on what basis an algorithm recommendation was derived. Accordingly, this study uses k-means cluster analysis to classify datasets into types according to their characteristics and to explore suitable classification algorithms and class imbalance resolution methods for each type. As a result, four types were derived, and an appropriate class imbalance resolution method and classification algorithm were recommended for each dataset type.

Design of Dynamic Buffer Assignment and Message model for Large-scale Process Monitoring of Personalized Health Data (개인화된 건강 데이터의 대량 처리 모니터링을 위한 메시지 모델 및 동적 버퍼 할당 설계)

  • Jeon, Young-Jun;Hwang, Hee-Joung
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.15 no.6 / pp.187-193 / 2015
  • The ICT healing platform aims to prevent chronic diseases and to issue early disease warnings based on personal information such as bio-signals and lifestyle habits. The 2-step open system (TOS) was designed with a relay between the healing platform and the storage of personal health data. It also used a publish/subscribe (pub/sub) service based on large-scale connections to transmit (monitor) data processing in real time. In the early design of TOS pub/sub, however, the same buffers were allocated regardless of connection idling and message type when encoding connection messages with the deflate algorithm. The dynamic buffer allocation proposed in this study works as follows: the message transmission types of each connection are first queued; features are extracted from each queue, computed, and converted into a vector through tf-idf, which is then fed into k-means clustering to form clusters; connections assigned to a cluster have their resources re-allocated according to that cluster's resource table; and the centroid of each cluster selects, in advance, a queuing pattern to represent the cluster and presents it as a resource reference table (encoding efficiency by buffer size). The proposed design trades off computation resources for clustering and feature calculation against network bandwidth in order to allocate the encoding buffer resources of TOS efficiently to network connections, thereby increasing the tps (number of real-time data processing and monitoring connections per unit hour) of TOS.
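
The queue, tf-idf, k-means, and resource-table steps described above can be sketched roughly as follows. Everything concrete here is an assumption for illustration: the message-type names, the smoothed idf formula, seeding the centroids with the first k vectors, and the buffer sizes in `buffer_sizes` do not come from the paper.

```python
import math
from collections import Counter


def euclidean(p, q):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def tfidf_vectors(queues):
    """Turn each connection's queue of message types into a tf-idf vector."""
    vocab = sorted({t for q in queues for t in q})
    n = len(queues)
    df = Counter(t for q in queues for t in set(q))
    # Smoothed idf (assumption) so a type present in every queue keeps some weight.
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for q in queues:
        tf = Counter(q)
        vectors.append([tf[t] / len(q) * idf[t] for t in vocab])
    return vectors


def kmeans(vectors, k, iterations=20):
    """Plain Lloyd iterations; the first k vectors seed the centroids (assumption)."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: euclidean(v, centroids[c])) for v in vectors]
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical per-connection queues of message types.
queues = [
    ["heartbeat", "heartbeat", "heartbeat", "status"],
    ["upload", "upload", "upload", "status"],
    ["heartbeat", "heartbeat", "status", "status"],
    ["upload", "upload", "status", "upload"],
]
labels, centroids = kmeans(tfidf_vectors(queues), k=2)

# Hypothetical resource table: encoding buffer size in bytes per cluster.
buffer_sizes = {0: 4096, 1: 65536}
allocation = [buffer_sizes[lab] for lab in labels]
```

Connections with similar message mixes land in the same cluster and therefore share a buffer size, which is the re-allocation step the abstract describes.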

A Study for Personalized resource Allocation Method by Workload Clustering Analysis in the Container-based Web VDI System (컨테이너 기반 웹 VDI 시스템에서 군집 분석을 통한 사용자 워크로드 맞춤형 자원 할당 방법 연구)

  • Baek, Hyeon-Ji;Huh, Eui-Nam
    • Proceedings of the Korea Information Processing Society Conference / 2017.04a / pp.50-52 / 2017
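  • The global VDI market is showing steady growth, driven by the continued growth of the cloud computing market and the popularity of virtualization. VDI services are also expected to be used across a wide range of fields such as healthcare, education, and finance. However, existing VDI services allocate resources statically and therefore cannot provide resources tailored to user workloads. This paper therefore containerizes VDI, taking advantage of containers' faster execution compared with conventional VDI, and presents a workload classification method based on cluster analysis with the K-means algorithm over VDI container resource usage data, in order to distribute resources according to each user's workload.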

Blind Channel Estimation through Clustering in Backscatter Communication Systems (후방산란 통신시스템에서 군집화를 통한 블라인드 채널 추정)

  • Kim, Soo-Hyun;Lee, Donggu;Sun, Young-Ghyu;Sim, Issac;Hwang, Yu-Min;Shin, Yoan;Kim, Dong-In;Kim, Jin-Young
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.20 no.2 / pp.81-86 / 2020
  • Ambient backscatter communication has the drawback that transmission power is limited, because data is transmitted using ambient RF signals. To improve transmission efficiency between transceivers, a channel estimator capable of estimating the channel state at the receiver is needed. In this paper, we use the K-means algorithm to improve the performance of a channel estimator based on the EM algorithm. The simulation uses MSE as the performance metric to verify the proposed channel estimator. Setting the initial values through K-means improves performance compared with channel estimation using the general EM algorithm.

A Study on Comparison of Clustering Algorithm-based Methods for Acquiring Training Sets for Social Image Classification (소셜 이미지 분류를 위한 클러스터링 알고리즘 기반 트레이닝 집합 획득 기법의 비교)

  • Jeong, Jin-Woo;Lee, Dong-Ho
    • Proceedings of the Korea Information Processing Society Conference / 2011.04a / pp.1294-1297 / 2011
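  • Recently, user-generated media sharing and search sites such as Flickr and YouTube have grown explosively, and various studies have attempted to use them effectively in multimedia information retrieval services. In particular, research on retrieving images effectively using the tags assigned to them is actively under way. However, because social images provided by users span a very wide range of scopes and topics, it is difficult to obtain training sets for classifying social images and assigning tags. This paper examines methods for obtaining training sets from a collection of social images using the clustering algorithms K-Means, K-Medoids, and Affinity Propagation. It also compares and analyzes the results of classifying social images with the training sets obtained from each algorithm.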

Fast VQ Codebook Design by Sucessively Bisectioning of Principle Axis (주축의 연속적 분할을 통한 고속 벡터 양자화 코드북 설계)

  • Kang, Dae-Seong;Seo, Seok-Bae;Kim, Dai-Jin
    • Journal of KIISE:Software and Applications / v.27 no.4 / pp.422-431 / 2000
  • This paper proposes a new codebook generation method, called PCA-based VQ, that incorporates the PCA (Principal Component Analysis) technique into VQ (Vector Quantization) codebook design. The PCA technique reduces the data dimensions by transforming input image vectors into feature vectors. The cluster of feature vectors in the transformed domain is bisected into two subclusters by an optimally chosen partitioning hyperplane. We expedite the search for the optimal partitioning hyperplane, which is the most time-consuming step, by considering that (1) the optimal partitioning hyperplane is perpendicular to the first principal axis of the feature vectors, (2) it is located at the equilibrium point of the left and right clusters' distortions, and (3) the left and right clusters' distortions can be adjusted incrementally. This principal-axis bisection is applied successively to the cluster whose distortion is reduced the most by bisection, until the total distortion of the clusters falls to the desired level. Simulation results show that the proposed PCA-based VQ method is promising because its reconstruction performance is as good as that of the SOFM (Self-Organizing Feature Maps) method and its codebook generation is as fast as that of the K-means method.
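
A single bisection step of this kind might look like the sketch below. It is a simplification under stated assumptions: the first principal axis is found by power iteration, and the hyperplane is placed at the cluster centroid rather than at the distortion-equilibrium point the paper uses. The function names and sample data are hypothetical.

```python
import math


def principal_axis(vectors, iterations=100):
    """Return (mean, first principal axis) via power iteration on the covariance."""
    n, dim = len(vectors), len(vectors[0])
    mean = [sum(col) / n for col in zip(*vectors)]
    centered = [[x - m for x, m in zip(v, mean)] for v in vectors]
    cov = [[sum(c[i] * c[j] for c in centered) / n for j in range(dim)]
           for i in range(dim)]
    axis = [1.0] * dim  # assumption: start vector not orthogonal to the principal axis
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * axis[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            break  # all vectors identical: no direction to split along
        axis = [x / norm for x in nxt]
    return mean, axis


def bisect_cluster(vectors):
    """Split a cluster with a hyperplane perpendicular to its first principal axis.

    Simplification: the hyperplane passes through the centroid, not through the
    distortion-equilibrium point described in the abstract.
    """
    mean, axis = principal_axis(vectors)
    left, right = [], []
    for v in vectors:
        # Signed position of v along the principal axis, relative to the centroid.
        projection = sum((x - m) * a for x, m, a in zip(v, mean, axis))
        (left if projection < 0.0 else right).append(v)
    return left, right


# Hypothetical feature vectors: two groups separated along the x direction.
data = [(0.0, 0.1), (1.0, -0.1), (2.0, 0.0), (10.0, 0.1), (11.0, -0.1), (12.0, 0.0)]
left, right = bisect_cluster(data)
```

Repeating `bisect_cluster` on whichever subcluster benefits most would grow the codebook one codeword at a time, in the spirit of the successive bisection the abstract outlines.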


Clustering of Aids-to-Navigation Sensor Data Using K-means (K-means를 활용한 항로표지 센서 데이터 군집화)

  • 김두환;성상하;최형림
    • Proceedings of the Korean Institute of Navigation and Port Research Conference / 2022.06a / pp.54-55 / 2022
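  • Aids to navigation installed at sea provide position information for the safe navigation of ships and collect various kinds of marine information through attached sensors. However, because they operate in the special environment of open water far from land, their maintenance takes considerable time and cost. Although various information is currently collected through the attached sensors, there is no information that distinguishes normal from abnormal data, which makes fault diagnosis difficult. In this study, to diagnose faults in aid-to-navigation sensors, we therefore used K-means, an unsupervised machine learning algorithm, to cluster the data into normal and abnormal groups, and confirmed that the data were separated well. As future work, comparison and analysis are needed to determine whether the data in the two clusters are in fact normal or abnormal.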
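
A two-cluster split of sensor readings can be sketched in a few lines. The example assumes scalar readings and seeds the two centers with the minimum and maximum value; the voltage figures are invented for illustration and are not data from the study.

```python
def two_means_1d(values, iterations=50):
    """Split scalar readings into two clusters with Lloyd's algorithm (k = 2)."""
    low, high = min(values), max(values)  # assumption: seed with the extremes
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assign each reading to the nearer of the two centers.
        new_labels = [0 if abs(v - low) <= abs(v - high) else 1 for v in values]
        group0 = [v for v, lab in zip(values, new_labels) if lab == 0]
        group1 = [v for v, lab in zip(values, new_labels) if lab == 1]
        # Move each center to the mean of its members.
        if group0:
            low = sum(group0) / len(group0)
        if group1:
            high = sum(group1) / len(group1)
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
    return labels, (low, high)


# Hypothetical battery-voltage readings; the two low values mimic a failing sensor.
readings = [12.1, 12.0, 11.9, 12.2, 3.1, 12.0, 2.8]
labels, centers = two_means_1d(readings)
```

K-means only separates the readings into two groups. Deciding which group is actually abnormal still needs the follow-up comparison that the abstract names as future work.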
