• Title/Summary/Keyword: k means cluster analysis


Factors Affecting the Quality of Korean Soybean Paste, Doenjang (한국 된장의 품질에 영향을 미치는 요인)

  • Shim, Hye-Jeoung;Yun, Jeong-hyun;Koh, Kyung-Hee
    • Journal of Applied Biological Chemistry / v.61 no.4 / pp.357-365 / 2018
  • Korean doenjang was made by traditional methods for this study, and its physicochemical properties, antioxidant capacity, and sensory properties were monitored at six-month intervals for three years. The collected data were analyzed comprehensively using principal component analysis (PCA) followed by k-means clustering to determine the optimal intake period and the sensory factors associated with acceptance. PCA separated the doenjang samples by year of aging, and k-means clustering then grouped the samples into Clusters 1, 2, and 3. Cluster 3, containing doenjang aged for 30 and 36 months, showed high total phenolic content, antioxidant capacity, superoxide dismutase (SOD)-like activity, and 2,2-diphenyl-1-picrylhydrazyl (DPPH) radical scavenging capacity. Cluster 3 also showed higher acceptance together with higher levels of free amino acids and organic acids. The sensory factors associated with acceptance were umami taste and brown color. In conclusion, although doenjang is usually considered ready after 12 months of aging, this study proposes consuming doenjang aged for 30 months on the basis of its antioxidant activity and sensory properties.
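The PCA stage of the PCA-then-k-means analysis summarized above can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes a numeric matrix with one row per doenjang sample (for example, one per aging time) and one column per measured property, and it standardizes each column first. The abstract does not say whether the data were scaled, so that step is an assumption, and it requires that no column be constant. The function name `pca_scores` is hypothetical.

```python
import numpy as np

def pca_scores(X, n_components=2):
    """Standardize the columns of X, then project onto the leading principal components.

    Returns (scores, explained_variance_ratio) for the first n_components.
    Assumes no column of X is constant (its standard deviation would be zero).
    """
    X = np.asarray(X, dtype=float)
    # Z-score each variable so properties measured on different scales are comparable.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # SVD of the standardized matrix: Z = U S Vt; the rows of Vt are the loadings.
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    # Component scores; identical to Z @ Vt[:n_components].T.
    scores = U[:, :n_components] * S[:n_components]
    # Share of total variance carried by each component, in decreasing order.
    explained = S**2 / np.sum(S**2)
    return scores, explained[:n_components]
```

The returned scores (one row per sample) are the kind of low-dimensional input that a subsequent k-means step would cluster.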

Construction of Onion Sentiment Dictionary using Cluster Analysis (군집분석을 이용한 양파 감성사전 구축)

  • Oh, Seungwon;Kim, Min Soo
    • Journal of the Korean Data Analysis Society / v.20 no.6 / pp.2917-2932 / 2018
  • Many studies have developed production-forecasting models to address the supply imbalance of onions, a vegetable closely tied to Korean cuisine. Because onions can be stored, however, forecasting production alone is not enough to resolve this imbalance. This paper therefore builds a sentiment dictionary for predicting onion prices from internet news articles, which contain information about onion production and various price factors and are easy to access in daily life. Onion-related articles from 2012 to 2016 were used, and four kinds of TF-IDF were compared by classifying the documents according to onion wholesale prices. Positive and negative price words were classified with three clustering methods, k-means, DBSCAN (density-based spatial clustering of applications with noise), and GMM (Gaussian mixture model) clustering, and GMM clustering produced three meaningful dictionaries. To assess the validity of the constructed dictionaries, articles classified as reporting price rises or drops were used in a logistic regression, which achieved 85.7% accuracy.
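The abstract compares four kinds of TF-IDF without naming them. As a hedged sketch, the function below computes one common TF-IDF weighting plus two widely used variations (sublinear term frequency and a smoothed IDF). These options are assumptions chosen for illustration and are not necessarily the four variants used in the paper. Documents are assumed to be already tokenized into lists of terms, and the function name `tfidf` is hypothetical.

```python
import math
from collections import Counter

def tfidf(docs, sublinear_tf=False, smooth_idf=False):
    """Compute TF-IDF weights for tokenized documents.

    docs: list of token lists. Returns one {term: weight} dict per document.
    sublinear_tf: use 1 + log(count) instead of the raw count.
    smooth_idf: use log((1 + N) / (1 + df)) + 1 instead of log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: the number of documents in which each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        row = {}
        for term, count in counts.items():
            tf = 1 + math.log(count) if sublinear_tf else count
            if smooth_idf:
                idf = math.log((1 + n_docs) / (1 + df[term])) + 1
            else:
                idf = math.log(n_docs / df[term])
            row[term] = tf * idf
        weights.append(row)
    return weights
```

Under the unsmoothed IDF, a term that appears in every article gets zero weight, which is one reason smoothed variants are often compared alongside it.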

Evaluation of Shopping Items: Focused on Purchase of Foreign Tourists in South Korea

  • Jeong, Dong-Bin
    • East Asian Journal of Business Economics (EAJBE) / v.7 no.2 / pp.21-30 / 2019
  • Purpose - This work categorizes the 21 shopping items that foreign tourists purchased in South Korea in 2017 and measures the dissimilarity (or similarity) between items using a distance matrix together with hierarchical and k-means cluster analyses, based on several purpose-of-visit attributes. Multidimensional scaling (MDS) is also applied to visualize the proximities among shopping items with respect to these attributes. Research design and methodology - The data come from a face-to-face survey conducted in 2017 by the Ministry of Culture, Sports and Tourism of foreign tourists from 20 countries who purchased shopping items in South Korea. The CLUSTER, PROXIMITIES, and ALSCAL modules of IBM SPSS 23.0 are used for the analysis. Results - Through a two-step cluster analysis, the 21 shopping items are classified into five groups with homogeneous traits, and the position of each cluster and of the shopping items belonging to it can be located. Conclusions - The results make it possible to assess the patterns and characteristics of each shopping item relative to the others, provide useful information for promoting shopping tourism based on how foreign tourists actually perceive these items, and can be applied in practice across the tourism industry.
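The study ran PROXIMITIES and ALSCAL in SPSS. ALSCAL is an alternating least-squares scaling procedure; the sketch below instead uses classical (Torgerson) MDS, a simpler metric technique, to show how a distance matrix among items can be turned into two-dimensional coordinates. It is not a reimplementation of ALSCAL. It assumes a hypothetical matrix `X` with one row per shopping item and one numeric column per purpose-of-visit attribute; both function names are illustrative.

```python
import numpy as np

def euclidean_distance_matrix(X):
    """Pairwise Euclidean distances between the rows of X."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def classical_mds(D, n_dims=2):
    """Classical (Torgerson) MDS: coordinates whose distances approximate D."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    # Double-center the squared distances: B = -1/2 * J D^2 J, with J = I - 11'/n.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    # B is symmetric; eigh returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:n_dims]
    # Clip small negative eigenvalues (numerical noise or non-Euclidean input) to zero.
    vals = np.clip(eigvals[order], 0, None)
    return eigvecs[:, order] * np.sqrt(vals)
```

When the distances are exactly Euclidean in `n_dims` dimensions, the embedding reproduces them up to rotation and reflection; with real survey profiles it gives a least-squares-style approximation that can be plotted as a proximity map.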

Cluster analysis of city-level carbon mitigation in South Korea

  • Zhuo Li
    • Journal of the Korea Society of Computer and Information / v.28 no.7 / pp.189-198 / 2023
  • Climate change is worsening, with heatwaves, typhoons, and heavy snowfalls increasing in recent years. Following the 25th Conference of the Parties to the United Nations Framework Convention on Climate Change (COP25), countries worldwide have reached a consensus on achieving carbon neutrality. Cities play a crucial role in both carbon mitigation and economic development. Considering economic and environmental factors, we selected 63 cities in South Korea and analyzed their carbon emissions using the elbow method and the k-means clustering algorithm. The results show that South Korean cities can be grouped into six clusters: technology-intensive, light-manufacturing-intensive, central-innovation-intensive, heavy-manufacturing-intensive, service-intensive, and rural and household-intensive cities. Specific suggestions are provided for improving city-level carbon mitigation.
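The elbow method plots the total within-cluster sum of squares (WCSS) against the number of clusters k and looks for the point where adding more clusters stops paying off. The sketch below makes two assumptions explicit. `within_cluster_ss` scores a single clustering, and `elbow_k` applies one common automatic rule: choose the k whose point lies farthest from the straight line joining the first and last points of the normalized curve. The abstract does not say how the elbow was chosen, and it is often read off a plot by eye. In practice, the WCSS values would come from running k-means once per candidate k. Both function names are hypothetical, and the example values in the test are made up.

```python
import numpy as np

def within_cluster_ss(X, labels):
    """Total within-cluster sum of squared distances to each cluster's mean."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    total = 0.0
    for c in np.unique(labels):
        members = X[labels == c]
        total += ((members - members.mean(axis=0)) ** 2).sum()
    return total

def elbow_k(ks, wcss):
    """Pick the k whose point lies farthest from the chord joining the curve's endpoints.

    Assumes ks is increasing and wcss decreases from the first to the last value.
    """
    ks = np.asarray(ks, dtype=float)
    w = np.asarray(wcss, dtype=float)
    # Normalize both axes to [0, 1] so the WCSS scale does not dominate the distance.
    x = (ks - ks[0]) / (ks[-1] - ks[0])
    y = (w - w[-1]) / (w[0] - w[-1])
    # After normalization the chord runs from (0, 1) to (1, 0), i.e. the line x + y = 1.
    dist = np.abs(x + y - 1) / np.sqrt(2)
    return int(ks[np.argmax(dist)])
```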

An Analysis of Human Body Shape of Junior High School Girls by Using Plan Photogrammetry (평면사진 계측에 의한 여중생의 체형분석)

  • Kim Kyung Sook;Lee Choon Kye
    • Journal of the Korean Society of Clothing and Textiles / v.14 no.3 s.35 / pp.208-215 / 1990
  • The purpose of this study is to provide fundamental data for designing dress forms (dummies) better suited to ready-made clothing, by classifying somatic types and analyzing the morphological characteristics of each type. Side-view silhouettes of 90 junior high school girls aged 13 to 16 in the Seoul urban area were measured by plane photography, and the raw data were examined by principal component analysis; three components were extracted and interpreted to explain the variation in body form. Cluster analysis was then carried out on the three components, and the subjects were classified into four clusters. The main outcomes are as follows. 1. Principal component analysis yielded three components: 1) the first principal component represents the degree of erectness or stoop of the figure; 2) the second principal component represents stature length, or growth rate; 3) the third principal component is an obesity component. 2. Cluster analysis on the three principal components yielded four clusters: 1) Cluster 1 (29% of the total) is characterized by lower stature; 2) Cluster 2 (21% of the total) is characterized by a backward-leaning somatotype and the longest legs; 3) Cluster 3 (23% of the total) is characterized by a thick back of the neck; 4) Cluster 4 (27% of the total) is characterized by a forward-leaning somatotype and the greatest stature.


A Study on the Relationship between Skill and Competition Score Factors of KLPGA Players Using Canonical Correlation Biplot and Cluster Analysis (정준상관 행렬도와 군집분석을 응용한 KLPGA 선수의 기술과 경기성적요인에 대한 연관성 분석)

  • Choi, Tae-Hoon;Choi, Yong-Seok
    • The Korean Journal of Applied Statistics / v.21 no.3 / pp.429-439 / 2008
  • A canonical correlation biplot is a two-dimensional plot for graphically investigating, in canonical correlation analysis, the relationship between two sets of variables and the relationship between observations and variables. Biplots in general are useful for giving a graphical description of data. However, neither general biplots nor canonical correlation biplots give concise interpretations of the relationships between variables and observations when the number of observations is large. To overcome this problem, Choi and Kim (2008) recently proposed interpreting biplots by applying K-means cluster analysis. In this study, we apply their method to investigate the relationship between skill and competition score factors of KLPGA players using a canonical correlation biplot and cluster analysis.
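Canonical correlation analysis, which underlies the canonical correlation biplot, finds pairs of linear combinations of two variable sets that are maximally correlated. The sketch below computes only the canonical correlations, not the biplot itself. It assumes `X` holds one variable set (for example, skill measures) and `Y` the other (for example, competition score measures), with one row per player. It uses a standard QR/SVD numerical route, which is not necessarily the authors' method, and assumes both centered matrices have full column rank and more rows than columns. The function name is hypothetical.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Canonical correlations between variable sets X (n x p) and Y (n x q).

    After centering, the singular values of Qx' Qy (Qx, Qy orthonormal bases of
    the two column spaces) are the canonical correlations, in decreasing order.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Orthonormal bases for the column spaces of the centered data (reduced QR).
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    # These singular values are the cosines of the principal angles between the spaces.
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    # Guard against round-off pushing a value slightly above 1.
    return np.clip(s, 0.0, 1.0)
```

With a single variable on each side, the only canonical correlation equals the absolute Pearson correlation, which makes a convenient sanity check.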

Applications of Cluster Analysis in Biplots (행렬도에서 군집분석의 활용)

  • Choi, Yong-Seok;Kim, Hyoung-Young
    • Communications for Statistical Applications and Methods / v.15 no.1 / pp.65-76 / 2008
  • Biplots are the multivariate analogue of scatter plots. They approximate the multivariate distribution of a sample in a few dimensions, typically two, and superimpose on this display representations of the variables on which the samples are measured (Gower and Hand, 1996, Chapter 1). The relationships between the observations and the variables can therefore be seen easily, which makes biplots useful for giving a graphical description of data. However, this method does not give concise interpretations of the relationships between variables and observations when the number of observations is large. In this study, we therefore propose interpreting biplots by applying K-means cluster analysis. The results show that the relationships between the clusters and the variables can be easily interpreted, so this method gives a more useful graphical description of the data than one based on the raw observations.
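The idea of reading a biplot through clusters can be sketched under stated assumptions. The biplot is taken to be the ordinary SVD biplot of column-centered data, with row markers G = U S^alpha and column markers H = V S^(1-alpha) (alpha = 1 by default). Gower and Hand discuss other scalings, so this choice is an assumption. The cluster labels are supplied from elsewhere; in the proposed method they would come from K-means. The function names are illustrative.

```python
import numpy as np

def biplot_coordinates(X, alpha=1.0):
    """2-D biplot markers from the SVD of column-centered data X (n x p).

    Row (observation) markers G = U S^alpha and column (variable) markers
    H = V S^(1 - alpha), so G @ H.T is the best rank-2 approximation of the centered data.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    G = U[:, :2] * S[:2] ** alpha
    H = Vt[:2].T * S[:2] ** (1 - alpha)
    return G, H

def cluster_centroids_in_biplot(G, labels):
    """Mean row marker of each cluster, to plot one point per cluster instead of every observation."""
    labels = np.asarray(labels)
    return {c: G[labels == c].mean(axis=0) for c in np.unique(labels)}
```

Plotting one centroid per cluster next to the variable markers H replaces a crowded cloud of observation points, which is the interpretive gain the abstract describes.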

The Study on Typology of Internet Shopping Style in Internet Shopping Mall Users (인터넷쇼핑몰 이용 소비자의 쇼핑스타일 유형에 관한 연구)

  • Moon Sook-jae;Lee Youn Hee;Cheon Hyejung
    • Journal of the Korean Home Economics Association / v.43 no.9 s.211 / pp.1-13 / 2005
  • The purposes of this study were to classify internet shopping mall users by their shopping styles and to define the characteristics of each resulting cluster. Questionnaires were completed by 338 men and women who had used internet shopping malls at least once during the previous six months. After factor analysis and k-means cluster analysis, the internet shopping styles were classified into four clusters. Cluster I, named 'high brand proneness', has a low score on devotee tendency. Cluster II, named 'high value proneness', is characterized by a high score on seeking substance. Cluster III, called 'steadiness orientation', has low scores on seeking trend and substance. Cluster IV, named 'individuality inclination', has a low score on seeking trend. The four clusters differ in socio-demographic and environmental characteristics such as gender, age, educational level, occupation, and internet usage time. Theoretical and practical implications are discussed.

Group Search Optimization Data Clustering Using Silhouette (실루엣을 적용한 그룹탐색 최적화 데이터클러스터링)

  • Kim, Sung-Soo;Baek, Jun-Young;Kang, Bum-Soo
    • Journal of the Korean Operations Research and Management Science Society / v.42 no.3 / pp.25-34 / 2017
  • K-means is a popular and efficient data clustering method, but it uses only intra-cluster distance as its validity criterion and requires the number of clusters to be fixed in advance. For unsupervised data, k-means is therefore of little use without a suitable number of clusters. This paper proposes Group Search Optimization (GSO) using the silhouette to find the optimal clustering solution, including the number of clusters, for unsupervised data. The silhouette can serve as a validity index for deciding both the number of clusters and the optimal solution, because it considers intra- and inter-cluster distances simultaneously. The performance of GSO using the silhouette is validated through experiments on, and analysis of, several data sets.
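The silhouette index mentioned in the abstract can be computed as below. This is the standard mean silhouette width with Euclidean distances. It sketches only the validity index, not the Group Search Optimization procedure. Scoring points in singleton clusters as 0 is a common convention adopted here as an assumption, and the function name is hypothetical.

```python
import numpy as np

def mean_silhouette(X, labels):
    """Average silhouette width: s(i) = (b(i) - a(i)) / max(a(i), b(i)).

    a(i): mean distance from point i to the other members of its own cluster.
    b(i): smallest mean distance from point i to the members of another cluster.
    Points in singleton clusters get s(i) = 0. Requires at least two clusters.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    # Pairwise Euclidean distance matrix.
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    s = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        n_same = same.sum() - 1  # other members of i's cluster
        if n_same == 0:
            continue  # singleton cluster: s(i) stays 0
        a = D[i, same].sum() / n_same  # D[i, i] is 0, so including i does not change the sum
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        denom = max(a, b)
        s[i] = (b - a) / denom if denom > 0 else 0.0
    return s.mean()
```

Values close to 1 indicate compact, well-separated clusters and negative values indicate many misassigned points, so an optimizer can compare candidate solutions with different numbers of clusters by this single score.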

K-means Clustering for Environmental Indicator Survey Data

  • Park, Hee-Chang;Cho, Kwang-Hyun
    • Korean Data and Information Science Society: Conference Proceedings (한국데이터정보과학회:학술대회논문집) / 2005.04a / pp.185-192 / 2005
  • There are many data mining techniques, such as association rules, decision trees, neural networks, clustering, genetic algorithms, Bayesian networks, and memory-based reasoning. We analyze the 2003 Gyeongnam social indicator survey data using the k-means clustering technique to obtain environmental information. Clustering is the process of grouping data into clusters so that objects within a cluster are highly similar to one another. Among the various clustering techniques, this paper uses k-means clustering, which is classified as a partitional clustering method. The k-means clustering results can be applied to environmental preservation and environmental improvement.
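The k-means procedure referred to in the abstract can be sketched with Lloyd's algorithm, which alternates between assigning each observation to its nearest centroid and moving each centroid to the mean of its members. This is a minimal illustration, not the authors' implementation. It assumes the survey responses have been encoded as a numeric matrix `X` with one row per respondent. It initializes from randomly chosen observations with a fixed seed and keeps a centroid in place if its cluster empties. The function name and parameters are hypothetical.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's k-means. Returns (labels, centers) for k clusters."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)

    def assign(centers):
        # Squared Euclidean distance from every point to every centroid.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        return d2.argmin(axis=1)

    # Initialize with k distinct observations chosen at random.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(centers)  # assignment step
        # Update step: each centroid moves to the mean of its members;
        # an empty cluster keeps its previous centroid.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Final assignment so the returned labels match the returned centroids.
    return assign(centers), centers
```

Because the result depends on the starting centroids, it is common to run several seeds and keep the solution with the smallest within-cluster sum of squares.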
