• Title/Abstract/Keywords: k means cluster analysis

372 search results; processing time 0.025 seconds

Country Clustering Based on Environmental Factors Influencing on Software Piracy

  • 서보밀;심준호
    • 한국정보시스템학회지:정보시스템연구 / Vol. 26, No. 4 / pp.227-246 / 2017
  • Purpose: As the importance of software has grown in recent years, the software market has continued to expand. However, its development is being adversely affected by software piracy. In this study, we classify countries around the world based on the macro-environmental factors that influence software piracy, and we identify the differences in software piracy among the resulting types. Design/methodology/approach: This study uses a data-driven approach. From the BSA, the World Bank, and the OECD, we collect data from 1990 to 2015 on 127 environmental variables for 225 countries. Cronbach's $\alpha$ analysis, item-to-total correlation analysis, and exploratory factor analysis derive 15 constructs from the data. We apply a two-step approach to cluster analysis: in the first step, hierarchical cluster analysis determines the number of clusters to be 5, and in the second step, K-means clustering classifies the countries. We conduct ANOVA and MANOVA to verify the differences in the environmental factors and software piracy among the derived clusters. Findings: The five clusters are identified as underdeveloped countries, developing countries, developed countries, world powers, and developing country with large market. There are statistically significant differences in the environmental factors among the clusters. In addition, there are statistically significant differences in software piracy rate, pirated value, and legal software sales among the clusters.
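The two-step approach described in this abstract can be sketched in plain Python. This is only an illustration under stated assumptions, not the authors' procedure: the average-linkage rule, the "largest jump in merge height" rule for picking the number of clusters, the farthest-first seeding, and the toy points are all choices made here for the example.

```python
import math

def agglomerate(points):
    """Step 1 (assumed average linkage): return the merge heights in order."""
    clusters = [[p] for p in points]
    heights = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                total = sum(math.dist(a, b) for a in clusters[i] for b in clusters[j])
                d = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        heights.append(d)
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return heights

def choose_k(heights):
    """Assumed rule: cut where successive merge heights jump the most.
    Needs at least three points, so that there is at least one gap."""
    n = len(heights) + 1
    gaps = [heights[t + 1] - heights[t] for t in range(len(heights) - 1)]
    t = max(range(len(gaps)), key=gaps.__getitem__)
    return n - (t + 1)  # clusters remaining after merge number t + 1

def farthest_first(points, k):
    """Deterministic seeding (an assumption; random seeding is also common)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers

def kmeans(points, k, max_iter=100):
    """Step 2: Lloyd's algorithm with the number of clusters fixed by step 1."""
    centers = farthest_first(points, k)
    labels = []
    for _ in range(max_iter):
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's old center
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers

# Toy data (made up for illustration): three well-separated groups of four points.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (0, 10), (0, 11), (1, 10), (1, 11)]
k = choose_k(agglomerate(points))
labels, centers = kmeans(points, k)
```

In practice the variables would usually be standardized before either step, since both steps rely on Euclidean distance.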

Analysis of Female Lower Body Shapes for the Development of Slacks Patterns: Exploring Body Clusters Using Machine Learning

  • Ji Min Kim
    • International Journal of Advanced Culture Technology / Vol. 12, No. 3 / pp.434-440 / 2024
  • SIZE KOREA updates body measurement data every five years, providing essential information for the fashion industry. This anthropometric data is widely used to diagnose consumer body shapes and develop optimal clothing sizes. Artificial intelligence, particularly machine learning, excels in predicting such body shape classifications. This study seeks to enhance the suitability of clothing design by applying the new analytical methodology of machine learning techniques to better capture and classify the unique body shapes of Korean women. In this study, machine learning techniques such as K-means clustering, Silhouette analysis, and Decision Tree analysis were used to classify the lower body shapes of Korean women in their twenties and identify standard body shapes useful for slacks design. The results showed that the lower body of the age group could be classified into three categories: 'small stature' (the majority), 'tall with an average lower body volume,' and 'medium height with a fuller lower body' (the smallest share). The three-cluster approach is validated through Silhouette analysis, which minimizes misclassification. Decision Tree analysis then further defines the criteria for these clusters, highlighting waist height and hip depth as the most significant factors, achieving a classification accuracy of 90.6%. While this study is not directly related to Robotic Process Automation, its detailed analysis of body shapes for slacks patterns can aid RPA in clothing production. Future research should continue integrating machine learning in human body and fashion design studies.
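For readers unfamiliar with the Silhouette analysis mentioned in this abstract, the following is a minimal sketch of the standard silhouette coefficient, s = (b - a) / max(a, b), where a is a point's mean distance to its own cluster and b is its mean distance to the nearest other cluster. The function name, the convention of scoring singleton clusters as 0, and the toy data are assumptions for illustration; the study's own data and tooling are not shown in the abstract.

```python
import math

def silhouette_scores(points, labels):
    """Per-point silhouette coefficients for a given labeling."""
    if len(set(labels)) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, p in enumerate(points):
        dists = {}
        for j, q in enumerate(points):
            if i != j:
                dists.setdefault(labels[j], []).append(math.dist(p, q))
        own = dists.pop(labels[i], None)
        if not own:
            scores.append(0.0)  # singleton cluster: scored 0 by convention
            continue
        a = sum(own) / len(own)                              # cohesion
        b = min(sum(d) / len(d) for d in dists.values())     # separation
        scores.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return scores

# Toy one-dimensional data (made up): two obvious groups.
points = [(0.0,), (1.0,), (10.0,), (11.0,)]
good = silhouette_scores(points, [0, 0, 1, 1])
bad = silhouette_scores(points, [0, 1, 0, 1])
mean_good = sum(good) / len(good)
mean_bad = sum(bad) / len(bad)
```

A mean score close to 1 indicates compact, well-separated clusters, while a negative mean suggests that many points sit closer to another cluster than to their own. Comparing the mean across candidate cluster counts is one common way to support a choice such as three clusters.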

Drought Classification Method for Jeju Island using Standard Precipitation Index

  • 박재규;이준호;양성기;김민철;양세창
    • 한국환경과학회지 / Vol. 25, No. 11 / pp.1511-1519 / 2016
  • Jeju Island relies on subterranean water for over 98% of its water resources, so continued study of drought under climate change is necessary. In this study, the representative standardized precipitation index (SPI) is classified by various criteria, and the spatial characteristics and applicability of drought in Jeju Island are evaluated from the results. Calculating the SPI (SPI 3, 6, 9, and 12) for four weather stations showed that SPI 12 was relatively simple compared to SPI 6. It was also verified that the fluctuation of the SPI was greater for short-term data and that long-term data were relatively more useful for judging extreme drought. Cluster analysis was performed using the K-means technique with two variables extracted by factor analysis; the clustering terminated after seven iterations, and two clusters were eventually formed.

The Classification of Spatial Patterns Considering Formation Parameters of Urban Climate - The case of Changwon city, South Korea -

  • 송봉근;박경훈
    • 환경영향평가 / Vol. 20, No. 3 / pp.299-311 / 2011
  • The objective of this paper is to present a methodology for classifying spatial patterns that considers the parameters of urban form, which play a significant role in the formation of the urban climate. The urban morphological parameters, i.e., building coverage, impervious pavement, vegetation, water, farmland, and land-use types, were used to classify the spatial patterns by K-means cluster analysis. The presented methodology was applied to Changwon city, South Korea. According to the results of the cluster analysis, the spatial patterns were classified into 24 patterns. First, the spatial patterns (A-1, A-2, A-3, B-1, B-2, B-3, C-1, C-2, C-3, D-1, D-2, D-3, E-1, E-2, E-3, F-1, F-2, F-3, G-1, G-2, G-3) distributed in the rural and suburban areas can have positive impacts on the urban climate environment through cold air generation and wind corridors. On the other hand, the spatial patterns of the downtown area, including A-4, B-4, C-4, and D-4, are expected to have negative impacts on the urban climate owing to artificial heat emission or wind flow obstruction. Finally, future research is needed to analyze, through field surveys, the climatic properties associated with these spatial patterns.

The Studies on the Statistical Reliability and Significancy of the Questionnaire for the Sasang Constitution

  • 이화섭;안탁원
    • 혜화의학회지 / Vol. 12, No. 2 / pp.177-197 / 2004
  • 1. The values of Cronbach's alpha for the Taeyang, Taeum, Soyang, and Soeum questionnaires were 0.7955, 0.7776, 0.8545, and 0.8601, respectively. These results indicate a highly satisfactory level of internal consistency for the questionnaire. 2. If deleting an item increases Cronbach's alpha, then deleting that item improves reliability. Therefore, any items whose deletion yields a substantially greater alpha than the overall alpha may need to be removed from the questionnaire to improve its reliability. 3. Factor analysis was performed on the 81 questionnaire items. Based on the scree plot and the number and decrement of eigenvalues greater than one, a three- or four-factor solution was most significant. 4. Hierarchical cluster analysis was performed on the 81 Sasang constitution questionnaire items. The results suggested that two or four clusters could be identified as homogeneous groups. 5. Hierarchical cluster analysis was performed on the 1046 responders. The results suggested that two, three, or four clusters might be identified as homogeneous groups. Furthermore, there were statistically significant differences among the groups by ANOVA (P<0.0001).
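Points 1 and 2 of this abstract rest on the standard formula for Cronbach's alpha, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), and on recomputing it with each item left out. A minimal sketch follows; the function names and the four-respondent toy data are assumptions for illustration and have nothing to do with the study's questionnaire data.

```python
from statistics import variance

def cronbach_alpha(rows):
    """rows: one list of item scores per respondent (same item order in each)."""
    k = len(rows[0])
    item_vars = [variance(col) for col in zip(*rows)]   # sample variance per item
    total_var = variance([sum(r) for r in rows])        # variance of total scores
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

def alpha_if_item_deleted(rows):
    """Alpha recomputed with each item removed in turn."""
    k = len(rows[0])
    return [cronbach_alpha([r[:i] + r[i + 1:] for r in rows]) for i in range(k)]

# Toy data (made up): items 1 and 2 agree perfectly, item 3 does not.
rows = [[1, 1, 4],
        [2, 2, 1],
        [3, 3, 3],
        [4, 4, 2]]
overall = cronbach_alpha(rows)
deleted = alpha_if_item_deleted(rows)
```

Here removing the third item raises alpha well above the overall value, which is the situation point 2 describes: such an item is a candidate for deletion.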

Regional Health Disparities of Self-Rated Health Using Cluster Analysis in South Korea

  • 허민희;백세종;김영진;노진원
    • 보건행정학회지 / Vol. 33, No. 2 / pp.118-128 / 2023
  • Background: Personal socio-economic capabilities are crucial because they affect health inequalities. These multidimensional inequalities across regions have become structured and entrenched. This study aimed to analyze health vulnerabilities by regional cluster and to identify regional disparities in self-rated health using nationally representative cross-sectional data. Methods: This study used personal and regional data. Data from the Community Health Survey 2021 were analyzed. K-means cluster analysis was applied to 250 si-gun-gu using administrative regional data. The clusters were based on three areas drawn from the conceptual framework for action on the social determinants of health: physical environment, health-related behaviors and biological factors, and the psychosocial environment. Binary logistic regression analyses were then conducted to examine the differences in self-rated health status among the regional clusters, controlling for human biology, environment, lifestyle, and healthcare organization factors. Results: The most vulnerable group was group 3, the moderately vulnerable group was group 1, and the least vulnerable group was group 2. Group 2 was more likely to have high self-rated health status than the moderately vulnerable group (odds ratio [OR], 1.023; p<0.001), and group 3 showed lower self-rated health status than the moderately vulnerable group (OR, 0.775; p<0.001). However, the moderately vulnerable group had significantly higher self-rated health status than the most vulnerable group (group 2: OR, 1.023; p<0.001; group 3: OR, 0.775; p<0.001). Conclusion: These results demonstrate that community members' health status is influenced by regional determinants of health as well as individual-level factors. They also contribute to understanding the importance of specific and differentiated interventions, such as locally tailored support programs that consider both individual and regional health determinants.

Ecotourism Visitors' Motivation/Attitude-Based Market Segmentation - Focused on Visitors at the Daebu Haesolgil, Gyeonggi Province -

  • 정윤정;김성일
    • 한국조경학회지 / Vol. 46, No. 3 / pp.46-57 / 2018
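  • To improve understanding of the visitor market for ecotourism sites in Korea, this study segmented the market based on the ecotourism motivations and attitudes of domestic tourists visiting Korean ecotourism sites. A survey was conducted among tourists visiting Course 1 of the Daebu Haesolgil in Gyeonggi Province from late November 2016 to early January 2017, and the responses of 434 participants were used in the analysis. Exploratory factor analysis and K-means cluster analysis were used to segment the market based on visitors' motivations and attitudes. As a result, visitors could be classified as 'nature-exploring responsible tourists,' 'passive nature-exploring tourists,' and 'nature-exploring and socializing responsible tourists.' The 'nature exploration' motivation was prominent in all three segments, but the segments differed in their levels of the 'escape from daily life,' 'health promotion,' and 'socializing' motivations and in their ecotourism attitudes. In addition, the segments showed statistically significant differences in demographic characteristics, travel characteristics, and satisfaction, confirming that the ecotourism motivation and attitude scales used in this study are useful criteria for segmenting the ecotourism visitor market. Whereas previous studies confirmed the possibility of market segmentation by measuring only tourists' environment-related attitudes, this study is significant in that it also measures and analyzes attitudes toward responsible behavior with respect to the local community and residents, and thus confirms the possibility of market segmentation with criteria that are more comprehensive and specific to ecotourism sites.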

Analysis of News Articles on Child Welfare Policies in South Korea: K-Means Clustering

  • 김은주;김성광;박빛나
    • 동서간호학연구지 / Vol. 29, No. 2 / pp.185-195 / 2023
  • Purpose: This study analyzes changes in child welfare policies and provides insights based on the collection and classification of newspaper articles. Methods: Articles related to child welfare policies were collected from 1990, during the Kim, Young-sam administration, to May 9, 2022, under the Moon, Jae-in administration. K-Means clustering and keyword Term Frequency-Inverse Document Frequency analysis were used to cluster and analyze newspaper articles with similar themes. Results: The administrations of Kim, Young-sam, Kim, Dae-jung, Roh, Moo-hyun, and Park, Geun-hye were classified into two clusters, and the Lee, Myung-bak and Moon, Jae-in administrations were classified into three clusters. Conclusion: South Korea's child welfare policies have focused on ensuring the safety and healthy development of children through diverse policy initiatives over the years. However, challenges related to child protection and child abuse persist. This requires additional resources and budget allocation. It is important to establish a comprehensive support system for children and families, including comprehensive nursing support.
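The Term Frequency-Inverse Document Frequency weighting named in this abstract can be illustrated with a short sketch. It uses one common textbook variant, tf = count / document length and idf = ln(N / df); libraries often use smoothed variants instead, and the abstract does not say which one the study used. The whitespace tokenizer and the three example "articles" are made up for illustration.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one {term: weight} dict per document (unsmoothed textbook variant)."""
    tokenized = [d.lower().split() for d in docs]  # naive tokenizer (assumption)
    n = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append({t: (c / len(toks)) * math.log(n / df[t])
                        for t, c in counts.items()})
    return vectors

# Made-up headlines, not data from the study.
docs = ["child abuse prevention policy",
        "child care subsidy policy",
        "child abuse reporting policy"]
vectors = tfidf(docs)
```

Terms that occur in every document ("child", "policy") receive weight 0, while terms specific to one document ("subsidy") receive the highest weight. Vectors of this kind are what a K-means step would then group by similarity.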

A Study on the effects of air pollution on circulatory health using spatial data

  • 박진옥;최일수;나명환
    • 품질경영학회지 / Vol. 44, No. 3 / pp.677-688 / 2016
  • Purpose: In this study, we examine the effects of air pollution on circulatory disease mortality in South Korea from 2005 to 2013 using the air pollution index. Methods: We cluster the regions of high mortality risk with SaTScan™ 9.3.1 and compare the result with the regional distribution of air pollution. To estimate the model, we use Geographically Weighted Regression (GWR) to account for the spatial heterogeneity of data collected by administrative district. Because GWR is a spatial analysis technique that uses spatial information, a regression model is estimated for each region on the assumption that the regression coefficients differ by region. Results: When the model was estimated on the collected air pollution index and circulatory disease mortality data combined with the spatial information, GWR was found to resolve the problem of spatial autocorrelation and to improve model fit compared with the OLS regression model. Conclusion: GWR is used to select the air pollution affecting the disease each year, and K-means cluster analysis reveals the characteristics of the regional distribution of air pollution.

Experimental Evaluation of Distance-based and Probability-based Clustering

  • Kwon, Na Yeon;Kim, Jang Il;Dollein, Richard;Seo, Weon Joon;Jung, Yong Gyu
    • International journal of advanced smart convergence / Vol. 2, No. 1 / pp.36-41 / 2013
  • Decision-making involves extracting information that can be acted on in the future; it refers to the process of discovering new models latent in the data. In other words, it means digging into the data to uncover hidden patterns and the relationships among them. This information is found by applying modeling techniques and sophisticated statistical analysis to the data in order to find relationships among useful patterns. This process is called data mining, a key technology for marketing databases. Accordingly, cluster analysis, which can extract information from large data sets without a clear criterion, is being actively researched. The EM and K-means methods are particularly widely used, and experiments are conducted to evaluate how the results of these distance-based and probability-based analyses vary with the size of the data.
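The difference between the distance-based (K-means) and probability-based (EM) methods compared in this abstract shows up most clearly at the assignment step. The sketch below contrasts a hard nearest-mean assignment with the soft responsibilities of an E-step for a one-dimensional Gaussian mixture. The function names and parameter values are assumptions for illustration; this is not the paper's experimental setup, and the M-step that re-estimates the parameters is omitted.

```python
import math

def hard_assign(x, means):
    """K-means style: index of the nearest mean (ties go to the lower index)."""
    return min(range(len(means)), key=lambda k: abs(x - means[k]))

def soft_assign(x, means, sigmas, weights):
    """EM E-step: posterior probability of each Gaussian component given x."""
    dens = [w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
            for m, s, w in zip(means, sigmas, weights)]
    total = sum(dens)
    return [d / total for d in dens]

# Assumed mixture: two equally weighted unit-variance components at 0 and 4.
means, sigmas, weights = [0.0, 4.0], [1.0, 1.0], [0.5, 0.5]
near = soft_assign(1.0, means, sigmas, weights)    # clearly closer to the first mean
middle = soft_assign(2.0, means, sigmas, weights)  # exactly between the two means
```

For a point midway between the two means, the hard rule must still pick one cluster, whereas the soft rule splits the point evenly between both components. That retained uncertainty is the main practical difference between the two families.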