• Title/Summary/Keyword: K-means clustering


Influence of Self-driving Data Set Partition on Detection Performance Using YOLOv4 Network (YOLOv4 네트워크를 이용한 자동운전 데이터 분할이 검출성능에 미치는 영향)

  • Wang, Xufei;Chen, Le;Li, Qiutan;Son, Jinku;Ding, Xilong;Song, Jeongyoung
    • The Journal of the Institute of Internet, Broadcasting and Communication, v.20 no.6, pp.157-165, 2020
  • As neural networks and self-driving data sets continue to develop, partitioning the data set is one way to improve a network model's performance in detecting moving objects. Within the Darknet framework, the YOLOv4 (You Only Look Once v4) network model was trained and tested on the Udacity data set. The Udacity data set was divided into training, validation, and test subsets according to 7 different ratios. The K-means++ algorithm was used to perform dimension clustering of the object boxes in each of the 7 groups. By tuning the hyperparameters of the YOLOv4 network during training, optimal model parameters were obtained for each of the 7 groups. These model parameters were then used to run detection on the 7 corresponding test sets and compare the results. The experimental results showed that YOLOv4 can effectively detect the large, medium, and small moving objects represented by Truck, Car, and Pedestrian in the Udacity data set. When the ratio of training, validation, and test sets is 7:1.5:1.5, the optimal YOLOv4 model parameters achieve the highest detection performance: mAP50 reaches 80.89%, mAP75 reaches 47.08%, and the detection speed reaches 10.56 FPS.
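The dimension clustering step can be illustrated with a minimal sketch. The abstract does not state the distance measure or the number of clusters, so this sketch assumes the 1 - IoU distance that is commonly used when clustering box dimensions for YOLO anchors. The function names and the toy box sizes are hypothetical and are not taken from the paper.

```python
import random

def iou_wh(box, anchor):
    """IoU of two (width, height) boxes aligned at a shared corner."""
    inter = min(box[0], anchor[0]) * min(box[1], anchor[1])
    union = box[0] * box[1] + anchor[0] * anchor[1] - inter
    return inter / union

def kmeanspp_init(boxes, k, rng):
    """k-means++ seeding: later centers are drawn with probability
    proportional to the squared distance (1 - IoU) to the nearest center."""
    centers = [rng.choice(boxes)]
    while len(centers) < k:
        weights = [min(1.0 - iou_wh(b, c) for c in centers) ** 2 for b in boxes]
        if sum(weights) == 0:  # every box coincides with an existing center
            centers.append(rng.choice(boxes))
        else:
            centers.append(rng.choices(boxes, weights=weights, k=1)[0])
    return centers

def anchor_kmeans(boxes, k, iters=50, seed=0):
    """Cluster (width, height) pairs; returns k anchors sorted by area."""
    rng = random.Random(seed)
    centers = kmeanspp_init(boxes, k, rng)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for b in boxes:
            # Assign each box to the center with the highest IoU.
            j = max(range(k), key=lambda i: iou_wh(b, centers[i]))
            groups[j].append(b)
        new_centers = [
            (sum(w for w, _ in g) / len(g), sum(h for _, h in g) / len(g))
            if g else centers[i]
            for i, g in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return sorted(centers, key=lambda c: c[0] * c[1])

# Hypothetical box sizes in pixels (small, medium, and large objects).
boxes = [(12, 30), (10, 28), (14, 33), (60, 45), (55, 40), (65, 50),
         (180, 120), (200, 140), (190, 130)]
anchors = anchor_kmeans(boxes, k=3)
```

Using IoU rather than Euclidean distance keeps large boxes from dominating the clustering, which is why it is the usual choice for anchor selection.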

The Effect of the Number of Phoneme Clusters on Speech Recognition (음성 인식에서 음소 클러스터 수의 효과)

  • Lee, Chang-Young
    • The Journal of the Korea Institute of Electronic Communication Sciences, v.9 no.11, pp.1221-1226, 2014
  • To improve the efficiency of speech recognition, we investigate the effect of the number of phoneme clusters. For this purpose, codebooks with varying numbers of phoneme clusters are prepared using a modified k-means clustering algorithm. Subsequent processing uses fuzzy vector quantization (FVQ) and hidden Markov models (HMM) for the speech recognition test. The results show two distinct regimes. For a large number of phoneme clusters, recognition performance is roughly independent of the number of clusters. For a small number of phoneme clusters, however, the recognition error rate increases nonlinearly as the number decreases. Numerical calculation shows that this nonlinear regime might be modeled by a power-law function. The results also show that about 166 phoneme clusters would be optimal for recognizing 300 isolated words, which amounts to roughly 3 variations per phoneme.
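As an illustration of what modeling the small-cluster regime with a power law could look like, the sketch below fits y = a * x**b by ordinary least squares in log-log space. The abstract gives neither the fitted coefficients nor the fitting procedure, so the function name, the method, and the synthetic data points are all assumptions made for illustration.

```python
import math

def fit_power_law(xs, ys):
    """Least-squares fit of y = a * x**b via a straight line in log-log space."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x, mean_y = sum(lx) / n, sum(ly) / n
    # Slope of the log-log line is the power-law exponent b.
    b = (sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
         / sum((u - mean_x) ** 2 for u in lx))
    a = math.exp(mean_y - b * mean_x)
    return a, b

# Synthetic data (not from the paper): error rate generated from an exact power law.
cluster_counts = [10, 20, 40, 80]
error_rates = [5.0 * n ** -0.7 for n in cluster_counts]
a, b = fit_power_law(cluster_counts, error_rates)
```

A negative exponent b corresponds to the behavior described above: the error rate rises ever more steeply as the number of clusters shrinks.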

Analyzing the Co-occurrence of Endangered Brackish-Water Snails with Other Species in Ecosystems Using Association Rule Learning and Clustering Analysis (연관 규칙 학습과 군집분석을 활용한 멸종위기 기수갈고둥과 생태계 내 종 간 연관성 분석)

  • Sung-Ho Lim;Yuno Do
    • Korean Journal of Ecology and Environment, v.57 no.2, pp.83-91, 2024
  • This study utilizes association rule learning and clustering analysis to explore the co-occurrence and relationships within ecosystems, focusing on the endangered brackish-water snail Clithon retropictum, classified as Class II endangered wildlife in Korea. The goal is to analyze co-occurrence patterns between brackish-water snails and other species to better understand their roles within the ecosystem. By examining co-occurrence patterns and relationships among species in large datasets, association rule learning aids in identifying significant relationships. Meanwhile, K-means and hierarchical clustering analyses are employed to assess ecological similarities and differences among species, facilitating their classification based on ecological characteristics. The findings reveal a significant level of relationship and co-occurrence between brackish-water snails and other species. This research underscores the importance of understanding these relationships for the conservation of endangered species like C. retropictum and for developing effective ecosystem management strategies. By emphasizing the role of a data-driven approach, this study contributes to advancing our knowledge on biodiversity conservation and ecosystem health, proposing new directions for future research in ecosystem management and conservation strategies.
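The standard measures used in association rule learning (support, confidence, and lift) can be sketched on presence/absence records as follows. The abstract does not report which measures or thresholds the authors used, and the survey records, the placeholder names `species_A` and `species_B`, and the function names below are hypothetical.

```python
def support(transactions, items):
    """Fraction of transactions (here: survey sites) containing all given items."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)

def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence, and lift of the rule antecedent -> consequent."""
    s_a = support(transactions, antecedent)
    s_c = support(transactions, consequent)
    s_ac = support(transactions, set(antecedent) | set(consequent))
    confidence = s_ac / s_a      # P(consequent | antecedent)
    lift = confidence / s_c      # > 1 suggests positive co-occurrence
    return s_ac, confidence, lift

# Hypothetical presence records: one set of observed species per survey site.
sites = [
    {"C. retropictum", "species_A"},
    {"C. retropictum", "species_A", "species_B"},
    {"C. retropictum"},
    {"species_B"},
]
metrics = rule_metrics(sites, {"species_A"}, {"C. retropictum"})
```

In this toy data the rule species_A -> C. retropictum has a lift above 1, meaning the two co-occur more often than independence would predict.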

Outlier Detection By Clustering-Based Ensemble Model Construction (클러스터링 기반 앙상블 모델 구성을 이용한 이상치 탐지)

  • Park, Cheong Hee;Kim, Taegong;Kim, Jiil;Choi, Semok;Lee, Gyeong-Hoon
    • KIPS Transactions on Software and Data Engineering, v.7 no.11, pp.435-442, 2018
  • Outlier detection is the task of identifying data samples that deviate significantly from the distribution of normal data. Most outlier detection methods compute an outlier score indicating how far a data sample departs from the normal state, and flag the sample as an outlier when its score exceeds a given threshold. However, because the range of outlier scores differs from one data set to another and outliers are far rarer than normal data, choosing the threshold is very difficult. Furthermore, in practice it is not easy to acquire data containing enough outliers for learning. In this paper, we propose a clustering-based outlier detection method that constructs a model of the normal data region using only normal data and performs binary classification of new data samples as outliers or normal data. We then extend it to an ensemble method by dividing the given normal data into chunks, constructing a clustering model for each chunk, and combining the decisions of the models, and we apply it to streaming data with dynamic changes. Experimental results on real and artificial data show that the proposed method achieves high performance.
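One possible reading of the chunk-wise ensemble idea is sketched below. The abstract does not specify how the normal region is modeled or how the decisions are combined, so this sketch makes its own assumptions: each chunk is modeled by k-means centroids with a per-cluster radius, and the ensemble decides by majority vote. All names and the synthetic data are hypothetical, and this is not the authors' method.

```python
import math
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on tuples of coordinates."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda i: math.dist(p, centers[i]))
            groups[j].append(p)
        centers = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return centers

def fit_chunk_model(chunk, k):
    """Model of the normal region: centroids plus, for each centroid,
    the largest distance to any training point assigned to it."""
    centers = kmeans(chunk, k)
    radii = [0.0] * k
    for p in chunk:
        j = min(range(k), key=lambda i: math.dist(p, centers[i]))
        radii[j] = max(radii[j], math.dist(p, centers[j]))
    return centers, radii

def is_outlier(model, x):
    """A sample is an outlier if it lies outside every cluster's radius."""
    centers, radii = model
    return all(math.dist(x, c) > r for c, r in zip(centers, radii))

def ensemble_is_outlier(models, x):
    """Majority vote over the per-chunk models."""
    votes = sum(is_outlier(m, x) for m in models)
    return votes > len(models) / 2

# Synthetic normal-only training data: two Gaussian blobs, split into chunks.
rng = random.Random(1)
normal = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(100)]
normal += [(rng.gauss(5, 0.5), rng.gauss(5, 0.5)) for _ in range(100)]
rng.shuffle(normal)
chunks = [normal[i:i + 50] for i in range(0, len(normal), 50)]
models = [fit_chunk_model(c, k=2) for c in chunks]
far_point_flagged = ensemble_is_outlier(models, (100.0, 100.0))
```

Because each model yields a binary decision directly, no score threshold has to be tuned, which matches the motivation given in the abstract. For streaming data, older chunk models could be replaced as new chunks arrive.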

Effect of food-related lifestyle, and SNS use and recommended information utilization on dining out (혼밥 및 외식소비 관련 식생활라이프스타일과 SNS 이용 및 추천정보활용의 영향)

  • Jin A Jang
    • Journal of Nutrition and Health, v.56 no.5, pp.573-588, 2023
  • Purpose: This study aimed to examine social networking service (SNS) use and recommended information utilization (SURU) according to the food-related lifestyles (FRLs) of consumers and to analyze how the interaction between the FRL and SURU affects the practice of eating alone and visiting restaurants. Methods: Data on 4,624 adults in their 20s to 50s were collected from the 2021 Consumer Behavior Survey for Food. Statistical methods included factor analysis, K-means cluster analysis, the complex samples general linear model, the complex samples Rao-Scott χ² test, and the general linear model. Results: Three factors were extracted from the FRL data (convenience pursuit, rational consumption pursuit, and gastronomy pursuit), and the subjects were classified into three groups: the rational consumption, convenient gastronomy, and smart gourmet groups. An examination of the difference in SURU according to the FRL showed that the smart gourmet group had the highest score. An analysis of the effects of the FRL and SURU on eating alone revealed that both the main effect and the interaction effect were significant (p < 0.01, p < 0.001). The higher the SURU, the higher the frequency of eating alone in the convenience pursuit and gastronomy pursuit groups. The main and interaction effects of the FRL and SURU on the frequency of eating out were also significant (p < 0.01, p < 0.001). In all the FRL groups, the higher the SURU level, the higher the frequency of visiting restaurants. Specifically, the two groups with convenience and gastronomic tendencies showed a steeper increase. Conclusion: This study provides important basic data for research on consumer behavior related to food SNS, market segmentation of restaurant consumers, and the future development of marketing strategies using SNS.

Community Analysis Based on Functional Feeding Groups of Benthic Macro Invertebrate in Wangpi-cheon (왕피천 저서성 대형무척추동물의 섭식기능군을 이용한 군집분석)

  • Park, Young-Jun;Lim, Heon-Myong;Kim, Ki-Dong;Cho, Young-Ho;Nam, Sang-Ho;Kwon, Oh-Seok
    • Korean Journal of Environment and Ecology, v.24 no.5, pp.556-565, 2010
  • Community analysis based on the functional feeding groups of benthic macroinvertebrates in Wangpi-cheon was carried out using the results of four field surveys conducted from October 2007 to May 2008. A total of 138 species of benthic macroinvertebrates in 58 families, 16 orders, 6 classes, and 4 phyla were collected during the field surveys. The EPT index was high at 61.59%, indicating that the stream ecosystem of Wangpi-cheon is very clean and healthy. In this study, the functional feeding groups in Wangpi-cheon fell into two groups. First, the scrapers and collector-gatherers group, which is normally dominant in midstream reaches, showed higher dominance in the main stream than in the tributaries. Second, the shredders group showed higher dominance in the tributaries than in the main stream, which is a general characteristic of upstream reaches. Cluster analysis based on the similarity index grouped the study sites into a natural area (group A) and an artificially disturbed area (group B; sites 8 and 11) where embankments, banks, and levees had been built nearby. The natural area (group A) was further divided into two groups with the characteristics of the main stream (sites 1, 2, 3, 4, and 7) and the tributaries (sites 5, 6, 9, and 10), respectively.

Recognition of Flat Type Signboard using Deep Learning (딥러닝을 이용한 판류형 간판의 인식)

  • Kwon, Sang Il;Kim, Eui Myoung
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography, v.37 no.4, pp.219-231, 2019
  • Specifications are set for each type of signboard, but the shapes and sizes of the signboards actually installed are not uniform. In addition, because signboard colors are not defined, signboards come in a wide variety of colors. Signboard recognition might seem similar to the recognition of road signs and license plates, but the nature of signboards means they cannot be recognized in the same way. In this study, we propose a methodology that uses the deep learning-based Faster R-CNN algorithm to recognize flat type signboards, which are the main targets among illegal and old signboards, and to automatically extract signboard areas. The process of recognizing flat type signboards from signboard images captured with smartphone cameras is divided into two steps. First, deep learning was used to identify flat type signboards among images of various signboard types, with an accuracy of about 71%. Next, a boundary recognition algorithm was applied to recognize the boundary area of each flat type signboard, and the boundary was recognized with an accuracy of 85%.

Identification of Employee Experience Factors and Their Influence on Job Satisfaction (직원경험 요인 파악 및 직무 만족도에 끼치는 영향력 분석)

  • Juhyeon Lee;So-Hyun Lee;Hee-Woong Kim
    • Information Systems Review, v.25 no.2, pp.181-203, 2023
  • As companies compete fiercely to attract outstanding individuals, employee job satisfaction has become important. Many companies therefore try to invest in improving job satisfaction by identifying employees' everyday experiences and difficulties. However, due to a lack of understanding of the employee experience, these investments are not paying off. This study examined the relationship between employee experience and job satisfaction using employee reviews and company ratings from Glassdoor, one of the largest employee communities worldwide. We use text mining techniques such as K-means clustering and LDA topic-based sentiment analysis to extract key experience factors by job level, and DistilBERT sentiment analysis to measure the sentiment score of each employee experience factor. The extracted employee experience factors and their sentiment scores were then analyzed quantitatively to determine the relationship between each factor and job satisfaction. The results show a significant difference between the workplace experiences of managers and general employees. In addition, the employee experiences that affect job satisfaction also differed between positions; for example, customer relationship and autonomy did not affect the satisfaction of managers. By using text mining and quantitative modeling based on the theory of work adjustment to identify and verify the main factors of employee experience, this study extends the research literature. The results are also applicable to personnel management strategies for improving employees' job satisfaction and are expected ultimately to improve corporate productivity.

Mixed-effects zero-inflated Poisson regression for analyzing the spread of COVID-19 in Daejeon (혼합효과 영과잉 포아송 회귀모형을 이용한 대전광역시 코로나 발생 동향 분석)

  • Kim, Gwanghee;Lee, Eunjee
    • The Korean Journal of Applied Statistics, v.34 no.3, pp.375-388, 2021
  • This paper aims to help prevent the spread of COVID-19 by analyzing confirmed cases of COVID-19 in Daejeon. A high volume of visitors, downtown location, and psychological fatigue from prolonged social distancing were considered as risk factors associated with the spread of COVID-19. The response variable was the weekly number of confirmed cases in each administrative district. The explanatory variables were the number of passengers getting off at bus stations in each administrative district and the time elapsed since the Korean government imposed distancing in daily life. We employed a mixed-effects zero-inflated Poisson regression model because the case counts were measured repeatedly and contained excess zeros. We conducted k-means clustering to identify three groups of administrative districts with different characteristics in terms of the number of bars, the population size, and the distance to the closest college. Because the number of confirmed cases might vary with districts' characteristics, the cluster membership was incorporated as a categorical explanatory variable. We found that COVID-19 was more prevalent in districts with larger populations and in downtown districts. As the number of passengers getting off in a downtown district increased, the number of confirmed cases increased significantly.
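To make the "excess zeros" point concrete, the sketch below evaluates the standard zero-inflated Poisson probability mass function, in which a count is a structural zero with probability pi and otherwise follows a Poisson distribution with mean lam. The parameter values are arbitrary illustrations, not estimates from the paper, and the random effects and covariates of the full mixed-effects model are deliberately left out.

```python
import math

def zip_pmf(k, lam, pi):
    """Zero-inflated Poisson: P(Y=0) = pi + (1 - pi) * exp(-lam),
    P(Y=k) = (1 - pi) * Poisson(k; lam) for k >= 1."""
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    return pi * (k == 0) + (1 - pi) * poisson

# Arbitrary illustrative parameters (not fitted values).
lam, pi = 2.0, 0.3
p_zero_poisson = math.exp(-lam)      # zero probability under a plain Poisson
p_zero_zip = zip_pmf(0, lam, pi)     # larger, because of the structural zeros
mean_zip = sum(k * zip_pmf(k, lam, pi) for k in range(60))  # close to (1 - pi) * lam
```

In a regression setting, lam would typically depend on the explanatory variables through a log link, with a district-level random effect accounting for the repeated weekly measurements.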

Hotspot Analysis of Urban Crime Using Space-Time Scan Statistics (시공간검정통계량을 이용한 도시범죄의 핫스팟분석)

  • Jeong, Kyeong-Seok;Moon, Tae-Heon;Jeong, Jae-Hee
    • Journal of the Korean Association of Geographic Information Studies, v.13 no.3, pp.14-28, 2010
  • The aim of this study is to identify crime hotspot areas using spatio-temporal cluster analysis, which searches time ranges and space ranges simultaneously, as an alternative to existing hotspot analysis, which only identifies the distribution patterns of crime occurrence in urban areas. First, crime data were collected from criminal registers provided by the official police authority in M city, Gyeongnam, and crime occurrence patterns were mapped using Geographic Information Systems (GIS). Second, the spatio-temporal distribution of crime was examined using the Ripley K-function and space-time scan statistics. The results showed that the risk of crime was significantly clustered at relatively few places, and that the spatio-temporal crime clusters differed from those predicted by existing spatial hotspot analyses such as kernel density analysis and k-means clustering analysis. The results of this study are expected not only to serve as valuable reference data for urban planning and crime prevention through environmental design (CPTED), but also to inform the allocation of police resources and the improvement of public security services.
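For readers unfamiliar with the Ripley K-function, a naive estimator without edge correction can be sketched as below. The abstract does not say which estimator or edge correction the authors used, so this form, the function name, and the toy coordinates are assumptions for illustration only.

```python
import math

def ripley_k(points, r, area):
    """Naive Ripley's K estimate (no edge correction):
    K(r) = area / (n * (n - 1)) * number of ordered pairs within distance r."""
    n = len(points)
    close_pairs = sum(
        1
        for i, p in enumerate(points)
        for j, q in enumerate(points)
        if i != j and math.dist(p, q) <= r
    )
    return area * close_pairs / (n * (n - 1))

# Toy example: four incidents at the corners of a unit square
# inside a study region of area 9 (units are arbitrary).
incidents = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
k_at_1 = ripley_k(incidents, r=1.0, area=9.0)
csr_at_1 = math.pi * 1.0 ** 2  # expected K(r) under complete spatial randomness
```

Values of K(r) above pi * r**2 indicate that points are more clustered at scale r than a completely random pattern would be.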