• Title/Abstract/Keywords: K-means clustering

Search Result 273, Processing Time 0.028 seconds

Daily Behavior Pattern Extraction using Time-Series Behavioral Data of Dairy Cows and k-Means Clustering (행동 시계열 데이터와 k-평균 군집화를 통한 젖소의 일일 행동패턴 검출)

  • Lee, Seonghun;Park, Gicheol;Park, Jaehwa
    • Journal of Software Assessment and Valuation / v.17 no.1 / pp.83-92 / 2021
  • Sensor systems and ICT are increasingly being applied in dairy science to accumulate data and improve dairy productivity. However, these efforts concern only factors that directly affect productivity, such as the number of animals and the amount of milk produced, while research on the physiology of dairy cows, which fundamentally underlies productivity, remains insufficient. This paper proposes a basic approach for extracting daily behavior patterns from hourly behavioral data of dairy cows in order to identify health status and stress. k-means clustering produced four clusters, and their plausibility was checked by visualizing the data in each group together with each group's representative. We hope these results lead to further research on detecting abnormalities and signs of disease in dairy cows.
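The clustering step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each day is represented as a vector of 24 hourly activity values, uses synthetic data (`synthetic_day` is hypothetical), and seeds the centroids with a deterministic farthest-first rule chosen here only to make the example reproducible.

```python
import math

def farthest_first_init(points, k):
    """Deterministic seeding (an assumption of this sketch): start at the
    first point, then repeatedly add the point farthest from all chosen seeds."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids

def kmeans(points, k, max_iters=100):
    """Plain k-means (Lloyd's algorithm); returns (centroids, labels)."""
    centroids = farthest_first_init(points, k)
    labels = None
    for _ in range(max_iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels

def synthetic_day(group, day):
    """Hypothetical daily profile: high activity in one 6-hour block,
    low activity elsewhere, plus a small per-day offset."""
    base = [50.0 if 6 * group <= h < 6 * group + 6 else 10.0 for h in range(24)]
    return [v + 0.5 * day for v in base]

# 4 made-up behavior patterns x 5 days each, ordered by pattern.
days = [synthetic_day(g, d) for g in range(4) for d in range(5)]
centroids, labels = kmeans(days, k=4)
```

Each returned centroid is a 24-value profile that can serve as the representative of its group, which is the kind of per-group representative the abstract mentions visualizing.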

A Design of Clustering Classification Systems using Satellite Remote Sensing Images Based on Design Patterns (디자인 패턴을 적용한 위성영상처리를 위한 군집화 분류시스템의 설계)

  • Kim, Dong-Yeon;Kim, Jin-Il
    • The KIPS Transactions:PartB / v.9B no.3 / pp.319-326 / 2002
  • In this paper, we have designed and implemented clustering classification systems (unsupervised classifiers) for processing satellite remote sensing images. The implemented systems adopt various design patterns, including a factory pattern and a strategy pattern, to support various satellite image formats and to keep the systems compatible. The clustering systems consist of sequential clustering, K-Means clustering, ISODATA clustering, and Fuzzy C-Means clustering classifiers. The systems were tested using a Landsat TM satellite image as the classification input. The results show that these clustering systems are well suited to extracting sample data for classifying satellite images about which there is no prior knowledge. The systems can also provide real-time clustering tools, compatibility, and component reusability.

Word Cluster-based Mobile Application Categorization (단어 군집 기반 모바일 애플리케이션 범주화)

  • Heo, Jeongman;Park, So-Young
    • Journal of the Korea Society of Computer and Information / v.19 no.3 / pp.17-24 / 2014
  • In this paper, we propose a mobile application categorization method using word cluster information. Because a mobile application description can be short, the proposed method uses the word cluster seeds as well as the words in the description as categorization features. To handle the fragmented categories of mobile applications, the proposed method generates the word clusters by applying the K-means clustering algorithm to the frequency of word occurrence per category. Since a mobile application description can include paragraphs unrelated to categorization, such as installation specifications, the proposed method uses only the word clusters that are useful for categorization. Experiments show that the proposed method improves the recall (5.65%) by using the word cluster information.

Determination of Optimal Cluster Size Using Bootstrap and Genetic Algorithm (붓스트랩 기법과 유전자 알고리즘을 이용한 최적 군집 수 결정)

  • Park, Min-Jae;Jun, Sung-Hae;Oh, Kyung-Whan
    • Journal of the Korean Institute of Intelligent Systems / v.13 no.1 / pp.12-17 / 2003
  • How well the number of clusters is chosen affects the clustering result. In the K-means algorithm, clustering performance varies greatly with the initial K, yet in most clustering processes the initial number of clusters is set by prior knowledge or subjective judgment, which may not be optimal. In this paper, a genetic-algorithm-based approach is proposed to determine the number of clusters automatically and to improve the clustering result. An initial population based on the attributes is generated to search for the optimal number of clusters. The fitness value is defined as the inverse of the sum of dissimilarities, so the search converges toward better overall performance. The mutation operation is used to address the local minima problem. Finally, bootstrap resampling is used to reduce the computational time cost.

Proposal of a Monitoring System to Determine the Possibility of Contact with Confirmed Infectious Diseases Using K-means Clustering Algorithm and Deep Learning Based Crowd Counting (K-평균 군집화 알고리즘 및 딥러닝 기반 군중 집계를 이용한 전염병 확진자 접촉 가능성 여부 판단 모니터링 시스템 제안)

  • Lee, Dongsu;ASHIQUZZAMAN, AKM;Kim, Yeonggwang;Sin, Hye-Ju;Kim, Jinsul
    • Smart Media Journal / v.9 no.3 / pp.122-129 / 2020
  • The possibility that asymptomatic COVID-19 carriers around the world are unaware of their infection and can spread it to people around them remains an important issue, as the public is not free from anxiety and fear over the spread of the epidemic. In this paper, the K-means clustering algorithm and deep learning-based crowd counting are proposed to determine the possibility of contact with confirmed cases of infectious diseases. After 300 iterations over all input training images, the PSNR value was 21.51, and the final MAE value for the entire data set was 67.984. This MAE is the mean absolute error for CCTV scenes of fewer than 4,000 people each, and the system includes the calculation of the distance between the confirmed patient and the surrounding persons and of the infection rate, the net group of potential patient movements, and the prediction of the infection rate.

Detection of inappropriate advertising content on SNS using k-means clustering technique (k-평균 군집화 기법을 활용한 SNS의 부적절한 광고성 콘텐츠 탐지)

  • Lee, Dong-Hwan;Lim, Heui-Seok
    • Proceedings of the Korea Information Processing Society Conference / 2021.11a / pp.570-573 / 2021
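  • As the number of people using SNS grows, the data being generated has increased in volume and become highly varied. However, not all of it is beneficial: inappropriate content that is negative, antisocial, or gambling-related coexists with useful information. The need to filter content appropriately for each user is therefore growing. In this study, we collected the hashtags of content on the SNS Instagram and turned them into a data set. We then applied the k-means clustering technique to group content with similar characteristics, and computed the Silhouette Coefficient and Keyword Diversity of each cluster to judge the appropriateness of the content.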
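The per-cluster Silhouette Coefficient mentioned in this abstract can be sketched as below. This is an illustrative sketch only: it assumes the content has already been turned into numeric feature vectors and assigned cluster labels, uses Euclidean distance, and the sample points are made up. Keyword Diversity is not defined in the abstract, so it is not reproduced here.

```python
import math

def silhouette_scores(points, labels):
    """Silhouette value s(i) = (b - a) / max(a, b) for every point, where
    a = mean distance to the other members of the point's own cluster and
    b = smallest mean distance to the members of any other cluster.
    Requires at least two clusters; singleton clusters score 0."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)
            continue
        # The point itself contributes distance 0, so divide by len - 1.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores

def cluster_silhouettes(points, labels):
    """Mean silhouette value of each cluster, keyed by cluster label."""
    sums, counts = {}, {}
    for s, lab in zip(silhouette_scores(points, labels), labels):
        sums[lab] = sums.get(lab, 0.0) + s
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: sums[lab] / counts[lab] for lab in sums}

# Made-up 2-D feature vectors forming two tight, well-separated groups.
sample = [(0, 0), (0, 1), (10, 10), (10, 11)]
good = cluster_silhouettes(sample, [0, 0, 1, 1])  # matches the groups
bad = cluster_silhouettes(sample, [0, 1, 0, 1])   # cuts across the groups
```

Values close to 1 indicate a compact, well-separated cluster, while values near or below 0 indicate that members sit closer to another cluster than to their own.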

A hybrid method to compose an optimal gene set for multi-class classification using mRMR and modified particle swarm optimization (mRMR과 수정된 입자군집화 방법을 이용한 다범주 분류를 위한 최적유전자집단 구성)

  • Lee, Sunho
    • The Korean Journal of Applied Statistics / v.33 no.6 / pp.683-696 / 2020
  • The aim of this research is to find an optimal gene set that provides highly accurate multi-class classification with a minimum number of genes. A two-stage procedure is proposed. First, based on the minimum redundancy and maximum relevance (mRMR) framework, several statistics for ranking differentially expressed genes and K-means clustering for reducing redundancy between genes are used as a data filtering procedure. Second, a particle swarm optimization is modified to select a small subset of informative genes. Two well-known multi-class microarray data sets, ALL and SRBCT, are analyzed to demonstrate the effectiveness of this hybrid method.

Determining the number of Clusters in On-Line Document Clustering Algorithm (온라인 문서 군집화에서 군집 수 결정 방법)

  • Jee, Tae-Chang;Lee, Hyun-Jin;Lee, Yill-Byung
    • The KIPS Transactions:PartB / v.14B no.7 / pp.513-522 / 2007
  • Clustering divides given data and automatically uncovers the hidden meaning in the data. It analyzes data that are difficult for people to check in detail and forms several clusters of data with similar characteristics. An on-line document clustering system, which groups similar documents using search engine results, aims to make information retrieval more convenient. Document clustering is performed automatically without human intervention, so the number of clusters, which affects the clustering result, should also be decided automatically. In addition, one characteristic of an on-line system is that it must guarantee a fast response time. This paper proposes a method for determining the number of clusters automatically using geometrical information. The proposed method consists of two stages. In the first stage, the cluster centers are projected onto a low-dimensional plane; in the second stage, clusters are combined according to the distances between their centers in that plane. Experiments with real data showed that clustering performance improved and that the response time was suitable for on-line use.
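The second stage described in this abstract, combining clusters whose projected centers lie close together, can be sketched as follows. This is an illustration under stated assumptions, not the paper's method: the centers are assumed to be already projected onto a 2-D plane (the projection itself is not specified in the abstract), and `threshold` is a hypothetical merge distance. Centers are merged transitively with a small union-find structure.

```python
import math

def merge_close_centers(centers_2d, threshold):
    """Group cluster centers whose pairwise distance in the plane is at most
    `threshold` (transitively). Returns one merged-group id per center."""
    n = len(centers_2d)
    parent = list(range(n))

    def find(i):
        # Find the representative of i, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(centers_2d[i], centers_2d[j]) <= threshold:
                parent[find(i)] = find(j)  # union the two groups

    # Renumber the representatives as 0, 1, 2, ... in order of appearance.
    remap = {}
    return [remap.setdefault(find(i), len(remap)) for i in range(n)]

# Made-up projected centers: two near-duplicate pairs and one isolated center.
projected = [(0.0, 0.0), (0.5, 0.0), (5.0, 5.0), (5.2, 5.1), (10.0, 0.0)]
groups = merge_close_centers(projected, threshold=1.0)
num_clusters = len(set(groups))
```

The number of distinct group ids is then the automatically determined number of clusters; the pairwise loop is quadratic in the number of centers, which stays small when it runs on centers rather than on documents.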

A Comparison of Cluster Analyses and Clustering of Sensory Data on Hanwoo Bulls (군집분석 비교 및 한우 관능평가데이터 군집화)

  • Kim, Jae-Hee;Ko, Yoon-Sil
    • The Korean Journal of Applied Statistics / v.22 no.4 / pp.745-758 / 2009
  • Cluster analysis is the automated search for groups of related observations in a data set. Many techniques have been proposed for grouping observations into clusters, and a variety of measures have been suggested for validating the results of a cluster analysis. In this paper, we compare complete linkage, Ward's method, K-means, and model-based clustering, and compute validity measures such as connectivity, the Dunn index, and silhouette on data simulated from multivariate distributions. We also select a clustering algorithm and determine the number of clusters of Korean consumers based on their palatability scores for Hanwoo bull beef cooked by the BBQ method.
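One of the validity measures named in this abstract, the Dunn index, can be sketched as below. This is a minimal sketch using the common definition (smallest distance between points in different clusters divided by the largest within-cluster diameter) with Euclidean distance; the paper may use a different variant, and the sample points are made up.

```python
import math
from itertools import combinations

def dunn_index(points, labels):
    """Dunn index: minimum inter-cluster point distance divided by the
    maximum intra-cluster diameter. Higher is better. Requires at least two
    clusters and at least one cluster with two distinct points."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    # Largest distance between two points of the same cluster.
    max_diameter = max(
        max((math.dist(p, q) for p, q in combinations(members, 2)), default=0.0)
        for members in clusters.values()
    )
    # Smallest distance between two points of different clusters.
    min_separation = min(
        math.dist(p, q)
        for a, b in combinations(list(clusters.values()), 2)
        for p in a
        for q in b
    )
    return min_separation / max_diameter

# Made-up points: two vertical pairs, 10 units apart horizontally.
sample_points = [(0, 0), (0, 1), (10, 0), (10, 1)]
dunn_good = dunn_index(sample_points, [0, 0, 1, 1])  # groups the pairs
dunn_bad = dunn_index(sample_points, [0, 1, 0, 1])   # splits the pairs
```

Comparing the index across candidate partitions, as in the two labelings above, is how such a measure can be used to choose between algorithms or numbers of clusters.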

A Personalized Music Recommendation System with a Time-weighted Clustering (시간 가중치와 가변형 K-means 기법을 이용한 개인화된 음악 추천 시스템)

  • Kim, Jae-Kwang;Yoon, Tae-Bok;Kim, Dong-Moon;Lee, Jee-Hyong
    • Journal of the Korean Institute of Intelligent Systems / v.19 no.4 / pp.504-510 / 2009
  • Personalized, adaptive services have recently become a center of interest worldwide. However, such services for music are not yet widespread, because analyzing music information is more difficult than analyzing text information. In this paper, we propose a music recommendation system that provides personalized services. The system keeps a user's listening list and analyzes it to select pieces of music similar to the user's preferences. For the analysis, the system extracts properties from the sound wave of the music and records the time at which the user listens to it. Based on the properties, each piece of music is mapped to a point in the property space, and the listening time is converted into the weight of that point. If we then select and analyze the groups the user chooses frequently, we can understand the user's taste. However, it is not easy to predict how many groups will form. To solve this problem, we apply the K-means clustering algorithm to the weighted points, modified so that the number of clusters changes dynamically. The modification limits the cluster diameter, so the algorithm can be applied effectively when the range of the data is known. With this algorithm we can find the center of each group and recommend music similar to that group. We also consider when the music was released: when recommending, the system selects pieces of music that are close to the user's preferences and were released in the same period. We performed experiments with one hundred pieces of music, and the results show that the proposed algorithm is effective.
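One way a size limit can let the number of clusters change during K-means on weighted points is sketched below. This is not the authors' algorithm, whose details the abstract does not give: the rule used here (open a new cluster whenever a point lies farther than a hypothetical `max_radius` from every existing center) and the sample points and weights are assumptions made purely for illustration.

```python
import math

def radius_limited_kmeans(points, weights, max_radius, max_iters=50):
    """K-means variant with weighted centers and a growing cluster count.
    A point farther than `max_radius` from every center seeds a new cluster.
    Returns (centers, labels) with unused clusters removed."""
    centers = [list(points[0])]
    labels = []
    for _ in range(max_iters):
        # Assignment step, opening new clusters where the limit is exceeded.
        labels = []
        for p in points:
            d, c = min((math.dist(p, ctr), i) for i, ctr in enumerate(centers))
            if d > max_radius:
                centers.append(list(p))
                c = len(centers) - 1
            labels.append(c)
        # Update step: each center becomes the weighted mean of its members.
        new_centers = []
        for c, old in enumerate(centers):
            members = [(p, w) for p, w, lab in zip(points, weights, labels) if lab == c]
            total = sum(w for _, w in members)
            if total > 0:
                dims = range(len(old))
                new_centers.append([sum(w * p[d] for p, w in members) / total for d in dims])
            else:
                new_centers.append(list(old))  # keep an empty cluster in place
        if new_centers == centers:
            break  # centers stopped moving
        centers = new_centers
    # Drop clusters that ended up empty and renumber the labels.
    used = sorted(set(labels))
    return [centers[c] for c in used], [used.index(lab) for lab in labels]

# Made-up 2-D property vectors; the second piece carries a larger weight.
tracks = [(0, 0), (1, 0), (10, 0), (11, 0)]
track_weights = [1, 3, 1, 1]
found_centers, found_labels = radius_limited_kmeans(tracks, track_weights, max_radius=3)
```

In this example the heavier point pulls its group's center toward itself, which mirrors the idea of letting the listening-time weight shape each group's center before recommending pieces near it.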