• Title/Summary/Keyword: K-means clustering


Non-Keyword Model for the Improvement of Vocabulary Independent Keyword Spotting System (가변어휘 핵심어 검출 성능 향상을 위한 비핵심어 모델)

  • Kim, Min-Je;Lee, Jung-Chul
    • The Journal of the Acoustical Society of Korea / v.25 no.7 / pp.319-324 / 2006
  • We propose two new methods for non-keyword modeling to improve the performance of a speaker- and vocabulary-independent keyword spotting system. The first method is decision tree clustering of monophones at the state level, instead of the monophone clustering method based on the K-means algorithm. The second method is multi-state multiple mixture modeling at the syllable level, rather than a single-state multiple mixture model for the non-keyword. To evaluate our methods, we used the ETRI speech DB for training and for the keyword spotting test (closed test). We also conducted an open test to spot 100 keywords in 400 sentences uttered by 4 speakers in an office environment. The experimental results showed that the decision tree-based state clustering method improved keyword spotting by 28%/29% (closed/open test) over the monophone clustering method based on the K-means algorithm, and that multi-state non-keyword modeling at the syllable level improved it by 22%/2% (closed/open test) over the single-state non-keyword model. These results show that the two proposed methods improve keyword spotting performance.

Analysis of Characteristics of Clusters of Middle School Students Using K-Means Cluster Analysis (K-평균 군집분석을 활용한 중학생의 군집화 및 특성 분석)

  • Jaebong, Lee
    • Journal of The Korean Association For Science Education / v.42 no.6 / pp.611-619 / 2022
  • The purpose of this study is to explore the possibility of applying big data analysis to evaluation data in science education in order to provide appropriate feedback to students, at a time when interest in educational data mining has been increasing. We use the evaluation data of 2,576 students who answered 24 questions of the national assessment of educational achievement, and we apply K-means cluster analysis, an unsupervised machine learning method, for clustering. As a result of clustering, students were divided into six clusters. Middle-ranking students were spread across more clusters than upper- or lower-ranking students. According to the results of the cluster analysis, the most important factor influencing clustering is academic achievement, and each cluster shows different characteristics in terms of content domains, subject competencies, and affective characteristics. Among the affective domains, learning motivation is important in the lower-ranking achievement cluster; among the subject competencies, scientific inquiry and problem-solving competency and scientific communication competency have a major influence. In the content domain, achievement in motion and energy and in matter are important factors that distinguish the characteristics of the clusters. As a result, we can provide students with customized learning feedback based on the characteristics of each cluster. We discuss implications of these results for science education, such as the possible uses of the results of this study, balanced learning across content domains, enhancement of subject competency, and improvement of scientific attitude.

Clustering and classification of residential noise sources in apartment buildings based on machine learning using spectral and temporal characteristics (주파수 및 시간 특성을 활용한 머신러닝 기반 공동주택 주거소음의 군집화 및 분류)

  • Jeong-hun Kim;Song-mi Lee;Su-hong Kim;Eun-sung Song;Jong-kwan Ryu
    • The Journal of the Acoustical Society of Korea / v.42 no.6 / pp.603-616 / 2023
  • In this study, machine learning-based clustering and classification of residential noise in apartment buildings was conducted using frequency and temporal characteristics. First, a residential noise source dataset was constructed. The dataset consisted of floor impact, airborne, plumbing and equipment, environmental, and construction noise. The residential noise was clustered with the K-Means clustering method. For frequency characteristics, Leq and Lmax values were derived in 1/1 and 1/3 octave bands for each sound source. For temporal characteristics, Leq values were derived every 6 ms through sound pressure level analysis over 5 s. The number of clusters k in the K-Means clustering method was determined using the silhouette coefficient and the elbow method. Clustering the residential noise sources by frequency characteristics resulted in three clusters for both the Leq and Lmax analyses. Clustering by temporal characteristics resulted in 9 clusters for Leq and 11 clusters for Lmax. Clustering by frequency characteristics grouped the sources according to the proportion of the low-frequency band. Then, to make use of the clustering results, the residential noise sources were classified using three machine learning methods. The classification results showed the highest accuracy and f1-score for data labeled with Leq values in 1/3 octave bands, and the highest accuracy and f1-score for classifying residential noise sources with an Artificial Neural Network (ANN) model using both frequency and temporal features, with 93 % accuracy and 92 % f1-score.
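The abstract above names the silhouette coefficient as one criterion for choosing k. As a minimal sketch of that criterion (not the authors' implementation), the function below computes the mean silhouette for a given labeling. The function name `mean_silhouette`, the toy 2-D points, and the two candidate labelings are all assumptions made for illustration.

```python
import math

def mean_silhouette(points, labels):
    """Average silhouette coefficient over all points (range -1 to 1)."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            continue  # common convention: a singleton cluster contributes 0
        # a: mean distance to the other members of the same cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != l)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)

# Hypothetical toy data: two tight groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
score_k2 = mean_silhouette(points, [0, 0, 0, 1, 1, 1])  # natural 2-way split
score_k3 = mean_silhouette(points, [0, 0, 1, 2, 2, 2])  # forced 3-way split
best_k = 2 if score_k2 > score_k3 else 3
```

In practice each candidate k would be clustered with K-means first and the k with the highest mean silhouette kept; the labelings are hard-coded here only to keep the sketch short.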

Selection and Evaluation of Vertiports of Urban Air Mobility (UAM) in the Seoul Metropolitan Area using the K-means Algorithm (K-means 알고리즘을 활용한 수도권 도심항공 모빌리티(UAM) 수직이착륙장 위치 선정 및 평가)

  • Jeong, Jun-Young;Hwang, Ho-Yon
    • Journal of Advanced Navigation Technology / v.25 no.1 / pp.8-16 / 2021
  • In this paper, vertiport locations were selected and evaluated for operating urban air mobility (UAM) in the Seoul metropolitan area. Demand data were derived from a survey of the commuting population and plotted on a map using MATLAB. To cluster the data, MATLAB's built-in K-means function was used, and the cluster centers were taken as the vertiport locations; the accuracy and reliability of the clustering were evaluated with the silhouette technique. The selected locations were also checked against satellite maps to confirm that they were suitable as actual vertiport sites, and where a location was not appropriate, the final vertiports were selected through a repositioning process.

Recruiting Ranking Techniques Based on Hybrid Using Clustering (군집화를 이용한 하이브리드 기반 채용검색 랭킹 기법)

  • Cho, Bo-Yun
    • Proceedings of the Korea Information Processing Society Conference / 2012.11a / pp.1587-1590 / 2012
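  • As Internet use has become widespread, the amount of information is increasing rapidly. As a result, job seekers need more time and effort than before to retrieve the information they want from IR systems. In this paper, we propose a hybrid search system that extracts documents with the TF (Term Frequency) technique and combines a content-based approach, based on the Doc_ID frequency of the extracted documents, with a clustering technique. The link numbers of the job postings clicked by job seekers are clustered using the K-means algorithm. Each generated cluster is treated as a single document, and, together with the existing documents, the documents related to the search topic are ranked at a dynamic ratio. Performance was evaluated through comparative experiments against existing IR techniques. The experimental results confirmed that the proposed method outperforms the existing method.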

Energy Efficient Cluster Routing Method Using Machine Learning in WSN (무선 센서 네트워크에서의 머신러닝을 활용한 에너지 효율적인 클러스터 라우팅 방안 연구)

  • Mi-Young, Kang
    • Journal of the Korea Institute of Information and Communication Engineering / v.27 no.1 / pp.124-130 / 2023
  • In this paper, we aim to extend network lifetime by improving the energy efficiency of sensor nodes in a wireless sensor network, using machine learning in the form of the K-means clustering algorithm. A wireless sensor network is a wireless network composed of physical sensor devices that include batteries. Because of the characteristics of sensor nodes, all resources must be used efficiently to minimize energy consumption and maximize network lifetime. A cluster-based approach is used to manage groups of relatively large numbers of nodes. The proposed protocol improves on the existing LEACH algorithm with a clustering algorithm that selects a cluster head using a cluster-based approach and a location-based approach. Performance was measured using MATLAB simulation. In the experiments, K-means clustering was applied to the energy efficiency part. The results confirm that using K-means improves energy efficiency and extends the lifetime of the entire network.

Development of Mining model through reproducibility assessment in Adverse drug event surveillance system (약물부작용감시시스템에서 재현성 평가를 통한 마이닝 모델 개발)

  • Lee, Young-Ho;Yoon, Young-Mi;Lee, Byung-Mun;Hwang, Hee-Joung;Kang, Un-Gu
    • Journal of the Korea Society of Computer and Information / v.14 no.3 / pp.183-192 / 2009
  • ADESS (Adverse Drug Event Surveillance System) is a system that identifies adverse drug events using adverse drug signals. It is more effective for adverse drug surveillance than current methods such as voluntary reporting or chart review. In this study, we built a clinical data mart (CDM) for the development of ADESS. Data reliability in the CDM was secured by applying data quality management, and the most suitable number of clusters (n=4) was obtained through reproducibility assessment in unsupervised learning techniques of knowledge discovery. Using this number of clusters (n=4), K-means, Kohonen, and two-step clustering models were produced, and we confirmed that the K-means algorithm yields the clustering closest to the actual adverse drug events.

Pruning Methodology for Reducing the Size of Speech DB for Corpus-based TTS Systems (코퍼스 기반 음성합성기의 데이터베이스 축소 방법)

  • 최승호;엄기완;강상기;김진영
    • The Journal of the Acoustical Society of Korea / v.22 no.8 / pp.703-710 / 2003
  • Because of their human-like synthesized speech quality, corpus-based text-to-speech (CB-TTS) systems have recently been studied actively worldwide. However, their large speech database (DB) severely restricts their application. In this paper we propose and evaluate three DB reduction algorithms designed to address this drawback. The first method is based on a K-means clustering approach, which selects k representatives among multiple instances. The second method keeps only those unit instances that are selected during synthesis, using domain-restricted text as input to the synthesizer. The third method is a hybrid of the first two and uses a large text as input to the system. After the given sentences are synthesized, the unit instances used and their occurrence information are extracted. As the next step, a modified K-means clustering is applied, which also takes into account the occurrence information of the selected unit instances. Finally, we compare the three pruning methods by evaluating the synthesized speech quality at similar DB reduction rates. Based on perceptual listening tests, we concluded that the last method performs best among the three algorithms. Moreover, the results show that the last method is able to reduce the DB size without loss of speech quality.
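The first method in the abstract above selects k representatives among multiple instances with K-means. A minimal sketch of one way to do that follows: run plain Lloyd's K-means, then keep the real instance closest to each centroid. This is an illustrative assumption, not the paper's algorithm; the function names, the 2-D feature vectors, and the choice of the first k points as initial seeds are all hypothetical.

```python
import math

def kmeans(points, k, iters=50):
    """Plain Lloyd's K-means; returns (centroids, labels)."""
    # Simplifying assumption: use the first k points as initial seeds.
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing, so the centroids are final
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = [sum(x) / len(members) for x in zip(*members)]
    return centroids, labels

def representatives(points, k):
    """Keep, for each cluster, the actual instance nearest its centroid."""
    centroids, labels = kmeans(points, k)
    reps = []
    for c in range(k):
        members = [p for p, l in zip(points, labels) if l == c]
        if members:
            reps.append(min(members, key=lambda p: math.dist(p, centroids[c])))
    return reps

# Hypothetical feature vectors for six instances of the same synthesis unit.
instances = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1),
             (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
kept = representatives(instances, 2)
```

Returning real instances rather than the centroids themselves matters for pruning, because a centroid is an average and need not correspond to any stored unit.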

Refining Initial Seeds using Max Average Distance for K-Means Clustering (K-Means 클러스터링 성능 향상을 위한 최대평균거리 기반 초기값 설정)

  • Lee, Shin-Won;Lee, Won-Hee
    • Journal of Internet Computing and Services / v.12 no.2 / pp.103-111 / 2011
  • Clustering methods are divided into hierarchical clustering, partitioning clustering, and others. When the number of documents is huge, hierarchical clustering takes too much time. In this paper we deal with the K-Means algorithm, a partitioning clustering method that is well suited to clustering many documents quickly and easily. We propose a new method for selecting the initial seeds of the K-Means algorithm. In this method, the initial seeds are selected so that they are positioned as far away from each other as possible.
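The abstract above states only that the initial seeds are chosen to lie as far from each other as possible. A minimal greedy sketch of that idea follows, where each new seed is the point with the largest average distance to the seeds already chosen. The starting rule (the point farthest from the overall mean), the function name, and the toy data are assumptions; the paper's exact procedure may differ.

```python
import math

def far_apart_seeds(points, k):
    """Greedily pick k seeds that are far from each other (illustrative heuristic)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    # Assumption: start from the point farthest from the overall mean.
    mean = [sum(x) / len(points) for x in zip(*points)]
    chosen = [max(range(len(points)), key=lambda i: math.dist(points[i], mean))]
    while len(chosen) < k:
        # Next seed: the unchosen point with the largest average distance
        # to the seeds selected so far.
        def avg_dist(i):
            return sum(math.dist(points[i], points[s]) for s in chosen) / len(chosen)
        candidates = [i for i in range(len(points)) if i not in chosen]
        chosen.append(max(candidates, key=avg_dist))
    return [points[i] for i in chosen]

# Hypothetical 2-D data with three visible groups.
points = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (0.0, 5.0), (0.1, 5.2)]
seeds = far_apart_seeds(points, 3)
```

On this toy data the three seeds land in three different groups, which is the property the abstract is after; the returned seeds would then be passed to K-means as its initial centroids.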

Soft Island Model based on K-means Clustering (K-Mean 군집을 기반으로 하는 소프트 아일랜드 모델)

  • Ichinkhorloo, Gotovsuren;Shin, Seong-Yoon;Lee, Hyun-Chang
    • Proceedings of the Korean Society of Computer Information Conference / 2020.07a / pp.561-562 / 2020
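  • In this study, multiple populations based on k-means clustering are proposed to realize an ensemble of multiple strategies. This yields a new DE variant, KSDE, in which similar individuals in the population apply the same mutation strategy, while dissimilar subpopulations migrate information through the soft island model (SIM).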
