• Title/Summary/Keyword: k means cluster analysis

Comprehensive review on Clustering Techniques and its application on High Dimensional Data

  • Alam, Afroj;Muqeem, Mohd;Ahmad, Sultan
    • International Journal of Computer Science & Network Security / v.21 no.6 / pp.237-244 / 2021
  • Clustering is one of the most powerful unsupervised machine learning techniques for dividing instances into homogeneous groups, called clusters. It is mainly used to generate good-quality clusters through which hidden patterns and knowledge can be discovered in large datasets. It has wide application in fields such as medicine, healthcare, gene expression, image processing, agriculture, fraud detection, and profitability analysis. The goal of this paper is to explore both hierarchical and partitioning clustering and to understand their problems together with various approaches to solving them. Among clustering methods, K-means is preferred over others because of its linear time complexity. The paper also addresses data mining on high-dimensional datasets, the problems involved, and the existing approaches to them.
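
The linear-time claim in the abstract above can be made concrete with the standard K-means formulation. The symbols below (n instances, d dimensions, k clusters, t iterations, centroids \mu_j, clusters C_j) are notation assumed here for illustration, not taken from the paper.

```latex
% K-means objective: within-cluster sum of squared distances
J = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2,
\qquad
\mu_j = \frac{1}{\lvert C_j \rvert} \sum_{x_i \in C_j} x_i

% One Lloyd iteration compares each of the n points with each of the k centroids
% in d dimensions, so t iterations cost
T_{\text{K-means}} = O(n \, k \, d \, t)

% By contrast, agglomerative hierarchical clustering starts from all pairwise
% distances, which already costs
T_{\text{pairwise}} = O(n^2 \, d)
```

For fixed k, d, and t, the K-means cost grows linearly in n, which is the sense in which it scales better than methods built on a pairwise distance matrix.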

Market Segmentation Based on Types of Motivations to Visit Coffee Shops (커피전문점 방문동기유형에 따른 시장세분화)

  • Lee, Yong-Sook;Kim, Eun-Jung;Park, Heung-Jin
    • The Korean Journal of Franchise Management / v.7 no.1 / pp.21-29 / 2016
  • Purpose - The primary purpose of this study is to inform effective marketing methods through market segmentation of coffee shops, by determining how motivations to visit coffee shops relate to the demographic profile of visitors and the characteristics of their visits, and thereby to gain a better understanding of customers in the coffee market. Research design, data, and methodology - Data were collected through self-administered questionnaires given to coffee shop users in Daejeon, Korea. After excluding unusable responses, 253 samples were used in the analysis. The data were analyzed through frequency, reliability, and factor analysis using SPSS 20.0. Factor analysis was conducted with principal component analysis and varimax rotation to derive factors with eigenvalues of one or more. In addition, cluster analysis, multivariate ANOVA, and cross-tab analysis were used for market segmentation based on the types of motivation for coffee shop visits. The cluster analysis proceeded as follows: four clusters were derived through hierarchical clustering, and k-means cluster analysis was then carried out using the mean values of the four clusters as the initial seed values. Result - The factor analysis delineated four dimensions of motivation to visit coffee shops: ostentation motivation, hedonic motivation, esthetic motivation, and utility motivation. The cluster analysis yielded four clusters: utility and esthetic seekers, hedonic seekers, utility seekers, and ostentation seekers. To further specify the profile of the four clusters, each cluster was cross-tabulated with socio-demographics and characteristics of coffee shop visits. The four clusters differ significantly from one another on the four types of motivation for coffee shop visits. Conclusions - This study has empirically examined the differences in the demographic profile of visitors and the characteristics of coffee shop visits by motivation to visit coffee shops. There are significant differences according to age, educational background, marital status, occupation, and monthly income. In addition, coffee shop usage patterns (frequency of visits, relationship with companions, purpose of visit, information sources, brand type, average expense per visit, and important selection attributes) differed significantly depending on motivations for coffee shop visits.
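
The two-stage procedure described in this abstract (hierarchical clustering to obtain the clusters, then k-means seeded with their means) can be sketched roughly as follows. This is a minimal illustration only: the study used SPSS, the abstract does not name the linkage method (centroid linkage is assumed here), and the function names and the toy 2-D `points` are hypothetical stand-ins for respondents' factor scores.

```python
import math

def mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def agglomerate(points, k):
    """Stage 1: repeatedly merge the two clusters with the closest centroids
    (centroid linkage, an assumption) until k clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = math.dist(mean(clusters[i]), mean(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

def kmeans(points, seeds, max_iter=100):
    """Stage 2: Lloyd's k-means started from the given seed centroids."""
    centroids = list(seeds)
    labels = None
    for _ in range(max_iter):
        # Assign every point to its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        # Move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = mean(members)
    return labels, centroids

def two_stage_cluster(points, k):
    """Seed k-means with the means of the k hierarchical clusters."""
    seeds = [mean(c) for c in agglomerate(points, k)]
    return kmeans(points, seeds)

# Hypothetical toy data: four well-separated pairs of 2-D points.
points = [(0, 0), (0, 1), (10, 0), (10, 1), (0, 10), (1, 10), (10, 10), (10, 11)]
labels, centroids = two_stage_cluster(points, k=4)
```

Seeding k-means this way removes the dependence on random initial centroids, which is presumably why the two methods are combined; the k-means pass then lets individual cases move between clusters, which a purely agglomerative method never does.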

A Study on Green Consumer Segmentation Based on Socio-Demographics and Behavioral Responses: Renewing the Relationships between Socio-demographics and Green Behavior

  • Kim, Young Doo
    • Asia Marketing Journal / v.17 no.1 / pp.1-26 / 2015
  • In the 21st century, green consumer behavior, playing one of the core roles of sustainability, is still an important issue to green-related stakeholders. Because one of the major objectives of green-consumer research is an improvement of behaviors aligned with greening, this paper revisited socio-demographic variables and shed light on segmenting and profiling green consumers based on the connection between socio-demographic variables and green behaviors. Using correlations, factor analysis, analysis of variance, k-means cluster analysis and χ²-tests, this paper shows that socio-demographic variables differentially impact green-consumer behaviors. In order to profile green consumers, this paper additionally attempts to segment green-consumer groups. The results also coincide with former findings that socio-demographic variables relate significantly to segmented green-consumer group behaviors. General findings are summarized as: 1) older people used green practices more strongly than younger people, 2) females demonstrated better energy-saving and recycling practices compared to males, 3) marital status also significantly influenced green-related behaviors, 4) subjective social class had a significant influence on green-related behaviors, 5) education level and income, however, weakly influenced or showed no impact on green-related behaviors, and 6) a green consumer was classified as an 'active green consumer,' 'utilitarian green consumer,' or 'inactivated green consumer.' The utilitarian green consumer group distinctively behaved more strongly in energy-saving and recycling practices compared to the inactivated green consumer group, whereas active green consumers behaved more strongly on the whole, when compared to those in the inactivated green consumer group.

An Analysis of the Cognitive Processes of 5-Year-Old Children : A Focus on a Performance of Cognitive Assessment System Based on Gender, Monthly Age, and Tendencies towards Hyperactivity (만 5세 유아의 인지과정 특성 분석 : 성별, 월령, 과잉행동성향에 따른 CAS 수행 결과를 중심으로)

  • Park, Sae-Rom;Park, Hye-Jun
    • Korean Journal of Child Studies / v.31 no.4 / pp.139-157 / 2010
  • This study investigated the cognitive processes of 5-year-old children, with a particular focus on gender, monthly age, and tendencies towards hyperactivity, through their performance on the Cognitive Assessment System (CAS; Das & Naglieri, 1997). Children with tendencies towards hyperactivity were identified using the Conners' Teacher Rating Scale (CTRS). The subjects were 75 five-year-old children in Seoul and surrounding metropolitan areas. Data were analyzed by means of descriptive statistics, an independent-samples t-test, Pearson's correlation coefficient, one-way ANOVA, and K-means cluster analysis. Our results were as follows: (1) The sub-factors of the CAS and the CTRS were negatively correlated, except for a positive correlation between the planning factor and the hyperactivity factor. (2) Girls exhibited significantly higher CAS scores than boys in planning and sequential processing. (3) The upper monthly age group (68-71 months) showed significantly higher planning scores than the lower monthly age group (60-63 months). (4) The CAS scores of the children with tendencies towards hyperactivity were lower than those of normal children. (5) The CAS profiles of the 5-year-old children were divided into 4 groups with distinctive characteristics by means of K-means cluster analysis.

Analysis of Document Clustering Varing Cluster Centroid Decisions (클러스터 중심 결정 방법에 따른 문서 클러스터링 성능 분석)

  • 오형진;변동률;이신원;박순철;정성종;안동언
    • Proceedings of the IEEK Conference / 2002.06c / pp.99-102 / 2002
  • The K-means clustering algorithm is a very popular clustering technique used in the field of information retrieval. In this paper, we address a problem of the K-means algorithm from the viewpoint of how centroids are created, and we suggest a method that reflects document features and considers the context of each document when determining the new centroids. For the experiment, we used an automatic document summarizer to summarize the Reuters-21578 newswire test dataset and achieved a 20% improvement in recall.

Analysis of The Partial Discharge Pattern in XLPE Insulator due to Variation of Statistical Distribution (분포통계변화에 따른 XLPE 절연체의 부분방전 패턴해석)

  • Kim, Tag-Yong;Lee, Hyuk-Jin;Cho, Kyung-Soon;Shin, Hyun-Taek;Yeon, Kyu-Ho;Lee, Chung-Ho;Hong, Jin-Woong
    • Proceedings of the Korean Institute of Electrical and Electronic Material Engineers Conference / 2006.06a / pp.83-84 / 2006
  • In this paper, we examine the discharge characteristics of cross-linked polyethylene (hereafter XLPE) according to thickness. A power-frequency voltage was applied by the step method, and the discharge calibration was set to 50 [pC] (slope=8.333). After the voltage was applied, the discharges that occurred and their number were detected for 10 [sec] (600 [cycle]). Determining the input pattern is difficult because the discharge pattern is irregular. Therefore, we investigated the pattern using K-means analysis and the Weibull function. We also investigated the variation of the centroids and clusters.

Analysis of Combined Yeast Cell Cycle Data by Using the Integrated Analysis Program for DNA chip (DNA chip 통합분석 프로그램을 이용한 효모의 세포주기 유전자 발현 통합 데이터의 분석)

  • 양영렬;허철구
    • KSBB Journal / v.16 no.6 / pp.538-546 / 2001
  • An integrated data analysis program for DNA chips, containing normalization, FDM analysis, various kinds of clustering methods, PCA, and SVD, was applied to analyze combined yeast cell cycle data. This paper compares several clustering algorithms, such as K-means, SOM, and fuzzy c-means, and their results. For further analysis, the clustering results from the integrated analysis program were used for function assignments to each cluster and for motif analysis. These results provide an integrated analysis view of DNA chip data.

A study on the quantitative risk grade assessment of initial mass production for weapon systems (초도양산 군수품에 대한 정량적 위험등급평가 방안 연구)

  • Jung, Yeongtak;Ham, Younghoon;Roh, Taegoo;Ahn, Manki;Ko, Kyungwa
    • Journal of Korean Society for Quality Management / v.46 no.3 / pp.441-452 / 2018
  • Purpose: The purpose of this paper is to study quantitative risk grade assessment for objective government quality assurance activities based on risk management in the initial mass production of weapon systems. Methods: The defense quality management regulations and foreign risk assessment documents were consulted to analyze problems in performing quality assurance activities. The failure rate data, maintainability, and cost of products were studied to quantify risk likelihood and impact. The analyzed data were classified into risk grades using the K-means cluster analysis method. Results: The results show that the proposed method can objectively evaluate risk grade. The analyzed results are clustered into three levels: high, middle, and low. Two products are allocated to high, eleven to low, and seven to middle. Conclusion: In this paper, quantitative risk grade assessment methods were presented by analyzing risk ratings based on objective data. The findings showed that the methods would be effective for the initial mass production of weapon systems.

County-Based Vulnerability Evaluation to Agricultural Drought Using Principal Component Analysis - The case of Gyeonggi-do - (주성분 분석법을 이용한 시군단위별 농업가뭄에 대한 취약성 분석에 관한 연구 - 경기도를 중심으로 -)

  • Jang, Min-Won
    • Journal of Korean Society of Rural Planning / v.12 no.1 s.30 / pp.37-48 / 2006
  • The objectives of this study were to develop a method for evaluating regional vulnerability to agricultural drought and to classify the vulnerability patterns. To test the method, 24 city or county areas of Gyeonggi-do were chosen. First, statistical data and digital maps relevant to agricultural drought were defined, and input data of 31 items were set up from 5 categories: land use factor, water resource factor, climate factor, topographic and soil factor, and agricultural production foundation factor. Second, to simplify the factors, principal component analysis was carried out, and 4 principal components explaining about 80.8% of the total variance were extracted. The principal components were interpreted as the vulnerability components of scale factor, geographical factor, weather factor, and agricultural production foundation factor. Next, the DVIP (Drought Vulnerability Index for Paddy) was calculated using the factor scores from the principal components. Last, by means of statistical cluster analysis on the DVIP, the study area was classified into 5 patterns, A to E. Cluster A corresponds to areas where the agricultural industry is insignificant and the agricultural foundation is poorly equipped, and cluster B includes typical agricultural areas where the cultivation areas are large but irrigation facilities are still insufficient. Cluster C covers areas that are vulnerable to climate change, and cluster D applies to areas with extensive forests and high-elevation farmlands. The last cluster, E, indicates areas where the farmlands are small but most of them are irrigated.
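
The component-selection step in this abstract (keeping the leading principal components that together explain a target share of the total variance) can be illustrated with a small sketch. The eigenvalues below are hypothetical and chosen only to make the arithmetic easy to follow; they are not the study's values, and the function names are assumptions.

```python
def explained_variance_ratio(eigenvalues):
    """Share of total variance carried by each component, largest first."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    return [ev / total for ev in ordered]

def components_needed(eigenvalues, target):
    """Smallest number of leading components whose cumulative share of the
    total variance reaches `target` (a fraction between 0 and 1)."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for count, ev in enumerate(ordered, start=1):
        running += ev
        if running / total >= target:
            return count
    return len(ordered)

# Hypothetical eigenvalues of a correlation matrix (not from the paper).
eigenvalues = [4.0, 2.0, 1.5, 0.5, 1.0, 1.0]
ratios = explained_variance_ratio(eigenvalues)
n_components = components_needed(eigenvalues, target=0.8)
```

With these illustrative numbers the cumulative shares are 0.40, 0.60, 0.75, and 0.85, so four components are needed to pass an 80% target. The factor scores on the retained components would then feed the index calculation and the cluster analysis.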

The recent research wave in ecotourism research using keyword network analysis (키워드 네트워크 분석을 활용한 생태관광연구 경향 분석)

  • Lee, Jae-Hyuck;Son, Yong-Hoon
    • Journal of Korean Society of Rural Planning / v.22 no.2 / pp.45-55 / 2016
  • Since 1970, when the concept of ecotourism was introduced, many studies on ecotourism have appeared. Reviewing these studies is necessary for future ecotourism research. Some review studies on ecotourism exist. However, these approaches are limited by subjectivity, and some kinds of papers have not been reviewed. This study uses keyword network analysis, a technique used in big data analysis, to overcome these limitations. A total of 2,455 foreign studies and 163 domestic studies with 'ecotourism' among their keywords were analyzed. As a result, three clusters ('Sustainable tourism development', 'Ecological conservation', 'Ecotourist analysis') appeared in ecotourism studies. In addition, these clusters are closely related to region. 'Sustainable tourism development' is related to Eurasia, Australia, and Europe. 'Ecological conservation' is related to Africa. 'Ecotourist analysis' is related to North America. In particular, 'Resident participation' and 'Stakeholder' appeared many times in the Asia region. These results show that ecotourism studies are interpreted in regional contexts. That is, because the single word 'ecotourism' is used in different contexts, a regional approach is needed for its precise use. In Korea, the keywords are focused on ecotourists and development. As Korea has many ecotour villages, studies on resident participation need to be supplemented.