• Title/Summary/Keyword: K-means Clustering

Search Result 1,099, Processing Time 0.04 seconds

Analysis of Utilization Characteristics, Health Behaviors and Health Management Level of Participants in Private Health Examination in a General Hospital (일개 종합병원의 민간 건강검진 수검자의 검진이용 특성, 건강행태 및 건강관리 수준 분석)

  • Kim, Yoo-Mi;Park, Jong-Ho;Kim, Won-Joong
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.14 no.1
    • /
    • pp.301-311
    • /
    • 2013
  • This study aims to analyze characteristics, health behaviors and health management level related to private health examination recipients in one general hospital. To achieve this, we analyzed 150,501 cases of private health examination data for 11 years from 2001 to 2011 for 20,696 participants in 2011 in a Dae-Jeon general hospital health examination center. The cluster analysis for classify private health examination group is used z-score standardization of K-means clustering method. The logistic regression analysis, decision tree and neural network analysis are used to periodic/non-periodic private health examination classification model. 1,000 people were selected as a customer management business group that has high probability to be non-periodic private health examination patients in new private health examination. According to results of this study, private health examination group was categorized by new, periodic and non-periodic group. New participants in private health examination were more 30~39 years old person than other age groups and more patients suspected of having renal disease. Periodic participants in private health examination were more male participants and more patients suspected of having hyperlipidemia. Non-periodic participants in private health examination were more smoking and sitting person and more patients suspected of having anemia and diabetes mellitus. As a result of decision tree, variables related to non-periodic participants in private health examination were sex, age, residence, exercise, anemia, hyperlipidemia, diabetes mellitus, obesity and liver disease. In particular, 71.4% of non-periodic participants were female, non-anemic, non-exercise, and suspicious obesity person. To operation of customized customer management business for private health examination will contribute to efficiency in health examination center.

Development of Customer Sentiment Pattern Map for Webtoon Content Recommendation (웹툰 콘텐츠 추천을 위한 소비자 감성 패턴 맵 개발)

  • Lee, Junsik;Park, Do-Hyung
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.4
    • /
    • pp.67-88
    • /
    • 2019
  • Webtoon is a Korean-style digital comics platform that distributes comics content produced using the characteristic elements of the Internet in a form that can be consumed online. With the recent rapid growth of the webtoon industry and the exponential increase in the supply of webtoon content, the need for effective webtoon content recommendation measures is growing. Webtoons are digital content products that combine pictorial, literary and digital elements. Therefore, webtoons stimulate consumer sentiment by making readers have fun and engaging and empathizing with the situations in which webtoons are produced. In this context, it can be expected that the sentiment that webtoons evoke to consumers will serve as an important criterion for consumers' choice of webtoons. However, there is a lack of research to improve webtoons' recommendation performance by utilizing consumer sentiment. This study is aimed at developing consumer sentiment pattern maps that can support effective recommendations of webtoon content, focusing on consumer sentiments that have not been fully discussed previously. Metadata and consumer sentiments data were collected for 200 works serviced on the Korean webtoon platform 'Naver Webtoon' to conduct this study. 488 sentiment terms were collected for 127 works, excluding those that did not meet the purpose of the analysis. Next, similar or duplicate terms were combined or abstracted in accordance with the bottom-up approach. As a result, we have built webtoons specialized sentiment-index, which are reduced to a total of 63 emotive adjectives. By performing exploratory factor analysis on the constructed sentiment-index, we have derived three important dimensions for classifying webtoon types. The exploratory factor analysis was performed through the Principal Component Analysis (PCA) using varimax factor rotation. The three dimensions were named 'Immersion', 'Touch' and 'Irritant' respectively. Based on this, K-Means clustering was performed and the entire webtoons were classified into four types. Each type was named 'Snack', 'Drama', 'Irritant', and 'Romance'. For each type of webtoon, we wrote webtoon-sentiment 2-Mode network graphs and looked at the characteristics of the sentiment pattern appearing for each type. In addition, through profiling analysis, we were able to derive meaningful strategic implications for each type of webtoon. First, The 'Snack' cluster is a collection of webtoons that are fast-paced and highly entertaining. Many consumers are interested in these webtoons, but they don't rate them well. Also, consumers mostly use simple expressions of sentiment when talking about these webtoons. Webtoons belonging to 'Snack' are expected to appeal to modern people who want to consume content easily and quickly during short travel time, such as commuting time. Secondly, webtoons belonging to 'Drama' are expected to evoke realistic and everyday sentiments rather than exaggerated and light comic ones. When consumers talk about webtoons belonging to a 'Drama' cluster in online, they are found to express a variety of sentiments. It is appropriate to establish an OSMU(One source multi-use) strategy to extend these webtoons to other content such as movies and TV series. Third, the sentiment pattern map of 'Irritant' shows the sentiments that discourage customer interest by stimulating discomfort. Webtoons that evoke these sentiments are hard to get public attention. Artists should pay attention to these sentiments that cause inconvenience to consumers in creating webtoons. Finally, Webtoons belonging to 'Romance' do not evoke a variety of consumer sentiments, but they are interpreted as touching consumers. They are expected to be consumed as 'healing content' targeted at consumers with high levels of stress or mental fatigue in their lives. The results of this study are meaningful in that it identifies the applicability of consumer sentiment in the areas of recommendation and classification of webtoons, and provides guidelines to help members of webtoons' ecosystem better understand consumers and formulate strategies.

A Study on the Satisfaction of Self-Employed (만족도를 이용한 자영업에 관한 연구)

  • Oh, Yu-Jin
    • The Korean Journal of Applied Statistics
    • /
    • v.22 no.2
    • /
    • pp.281-296
    • /
    • 2009
  • This study examines the job and life satisfactions of the self-employed. It uses the Korean Labour and Income Panel Study(KLIPS, hereafter) data for 1998 and 2004. We examine the phases of satisfaction and what variables influence satisfaction for both years and compare the results in order to see what changed between the two regimes. We make use of k-means clustering to divide self-employed into similar degrees of satisfaction. As a result, we are able to classify the self-employed into three groups(low, medium and high) both for the two regimes. High groups consists of relatively younger, well-educated, low working dates, higher proportion of woman than other groups. As a result of regression analysis, we have some evidence that women are more satisfied than men for job satisfaction and that the existence of income is more important than the amount of income for life satisfaction. The age, education, satisfaction for working place, and health are significant to both satisfactions.

Estimation of Drought Rainfall by Regional Frequency Analysis Using L and LH-Moments (II) - On the method of LH-moments - (L 및 LH-모멘트법과 지역빈도분석에 의한 가뭄우량의 추정 (II)- LH-모멘트법을 중심으로 -)

  • Lee, Soon-Hyuk;Yoon , Seong-Soo;Maeng , Sung-Jin;Ryoo , Kyong-Sik;Joo , Ho-Kil;Park , Jin-Seon
    • Journal of The Korean Society of Agricultural Engineers
    • /
    • v.46 no.5
    • /
    • pp.27-39
    • /
    • 2004
  • In the first part of this study, five homogeneous regions in view of topographical and geographically homogeneous aspects except Jeju and Ulreung islands in Korea were accomplished by K-means clustering method. A total of 57 rain gauges were used for the regional frequency analysis with minimum rainfall series for the consecutive durations. Generalized Extreme Value distribution was confirmed as an optimal one among applied distributions. Drought rainfalls following the return periods were estimated by at-site and regional frequency analysis using L-moments method. It was confirmed that the design drought rainfalls estimated by the regional frequency analysis were shown to be more appropriate than those by the at-site frequency analysis. In the second part of this study, LH-moment ratio diagram and the Kolmogorov-Smirnov test on the Gumbel (GUM), Generalized Extreme Value (GEV), Generalized Logistic (GLO) and Generalized Pareto (GPA) distributions were accomplished to get optimal probability distribution. Design drought rainfalls were estimated by both at-site and regional frequency analysis using LH-moments and GEV distribution, which was confirmed as an optimal one among applied distributions. Design rainfalls were estimated by at-site and regional frequency analysis using LH-moments, the observed and simulated data resulted from Monte Carlotechniques. Design drought rainfalls derived by regional frequency analysis using L1, L2, L3 and L4-moments (LH-moments) method have shown higher reliability than those of at-site frequency analysis in view of RRMSE (Relative Root-Mean-Square Error), RBIAS (Relative Bias) and RR (Relative Reduction) for the estimated design drought rainfalls. Relative efficiency were calculated for the judgment of relative merits and demerits for the design drought rainfalls derived by regional frequency analysis using L-moments and L1, L2, L3 and L4-moments applied in the first report and second report of this study, respectively. Consequently, design drought rainfalls derived by regional frequency analysis using L-moments were shown as more reliable than those using LH-moments. Finally, design drought rainfalls for the classified five homogeneous regions following the various consecutive durations were derived by regional frequency analysis using L-moments, which was confirmed as a more reliable method through this study. Maps for the design drought rainfalls for the classified five homogeneous regions following the various consecutive durations were accomplished by the method of inverse distance weight and Arc-View, which is one of GIS techniques.

Study about Library and Information Center's Image of Library and Information Science Students as Workplace (문헌정보학과 학생의 직장으로서의 도서관·정보센터 이미지 분석)

  • Cho, Jane;Lee, Jiwon
    • Journal of the Korean Society for Library and Information Science
    • /
    • v.50 no.3
    • /
    • pp.113-132
    • /
    • 2016
  • Positioning technique which has been widely used for making marketing strategy by analyzing customer's image also has been used for public and test-taker's image analysis about public facilities, entrepreneurs, universities. This study analyze image of library and Information science students who trying to find a job in library fields about diverse types of library and information centers by Positioning technique. As a result of Similarity cognition analysis by multidimensional Scaling and K-means clustering, it was found that students recognize that public, national, university, school library are similar, on the other hand, portal company and special library are different from those types. In the jobs, user service jobs and technical service jobs are recognized as separated clusters, and cultural program job is also recognized dissimilarly from those clusters. By the way, images about work satisfaction and stability of employment shows high in national library; high wage shows high in portal company; employee's growth potential shows high in special library; job importance shows high in reference service jobs; difficulty shows high in content's job. Anyway, in the workplace selection, almost students regard stability of employment as top priorities, accordingly they prefers public library at most. Such a preference concentration tendency is strongly appeared in local university students than in metropolitan area students as a result of Pearson's chi-square test.

Development of Brain Tumor Detection using Improved Clustering Method on MRI-compatible Robotic Assisted Surgery (MRI 영상 유도 수술 로봇을 위한 개선된 군집 분석 방법을 이용한 뇌종양 영역 검출 개발)

  • Kim, DaeGwan;Cha, KyoungRae;Seung, SungMin;Jeong, Semi;Choi, JongKyun;Roh, JiHyoung;Park, ChungHwan;Song, Tae-Ha
    • Journal of Biomedical Engineering Research
    • /
    • v.40 no.3
    • /
    • pp.105-115
    • /
    • 2019
  • Brain tumor surgery may be difficult, but it is also incredibly important. The technological improvements for traditional brain tumor surgeries have always been a focus to improve the precision of surgery and release the potential of the technology in this important area of the body. The need for precision during brain tumor surgery has led to an increase in Robotic-assisted surgeries (RAS). One of the challenges to the widespread acceptance of RAS in the neurosurgery is to recognize invisible tumor accurately. Therefore, it is important to detect brain tumor size and location because surgeon tries to remove as much tumor as possible. In this paper, we proposed brain tumor detection procedures for MRI (Magnetic Resonance Imaging) system. A method of automatic brain tumor detection is needed to accurately target the location of the lesion during brain tumor surgery and to report the location and size of the lesion. In the qualitative assessment, the proposed method showed better results than those obtained with other brain tumor detection methods. Comparisons among all assessment criteria indicated that the proposed method was significantly superior to the threshold method with respect to all assessment criteria. The proposed method was effective for detecting brain tumor.

The Need for Paradigm Shift in Semantic Similarity and Semantic Relatedness : From Cognitive Semantics Perspective (의미간의 유사도 연구의 패러다임 변화의 필요성-인지 의미론적 관점에서의 고찰)

  • Choi, Youngseok;Park, Jinsoo
    • Journal of Intelligence and Information Systems
    • /
    • v.19 no.1
    • /
    • pp.111-123
    • /
    • 2013
  • Semantic similarity/relatedness measure between two concepts plays an important role in research on system integration and database integration. Moreover, current research on keyword recommendation or tag clustering strongly depends on this kind of semantic measure. For this reason, many researchers in various fields including computer science and computational linguistics have tried to improve methods to calculating semantic similarity/relatedness measure. This study of similarity between concepts is meant to discover how a computational process can model the action of a human to determine the relationship between two concepts. Most research on calculating semantic similarity usually uses ready-made reference knowledge such as semantic network and dictionary to measure concept similarity. The topological method is used to calculated relatedness or similarity between concepts based on various forms of a semantic network including a hierarchical taxonomy. This approach assumes that the semantic network reflects the human knowledge well. The nodes in a network represent concepts, and way to measure the conceptual similarity between two nodes are also regarded as ways to determine the conceptual similarity of two words(i.e,. two nodes in a network). Topological method can be categorized as node-based or edge-based, which are also called the information content approach and the conceptual distance approach, respectively. The node-based approach is used to calculate similarity between concepts based on how much information the two concepts share in terms of a semantic network or taxonomy while edge-based approach estimates the distance between the nodes that correspond to the concepts being compared. Both of two approaches have assumed that the semantic network is static. That means topological approach has not considered the change of semantic relation between concepts in semantic network. However, as information communication technologies make advantage in sharing knowledge among people, semantic relation between concepts in semantic network may change. To explain the change in semantic relation, we adopt the cognitive semantics. The basic assumption of cognitive semantics is that humans judge the semantic relation based on their cognition and understanding of concepts. This cognition and understanding is called 'World Knowledge.' World knowledge can be categorized as personal knowledge and cultural knowledge. Personal knowledge means the knowledge from personal experience. Everyone can have different Personal Knowledge of same concept. Cultural Knowledge is the knowledge shared by people who are living in the same culture or using the same language. People in the same culture have common understanding of specific concepts. Cultural knowledge can be the starting point of discussion about the change of semantic relation. If the culture shared by people changes for some reasons, the human's cultural knowledge may also change. Today's society and culture are changing at a past face, and the change of cultural knowledge is not negligible issues in the research on semantic relationship between concepts. In this paper, we propose the future directions of research on semantic similarity. In other words, we discuss that how the research on semantic similarity can reflect the change of semantic relation caused by the change of cultural knowledge. We suggest three direction of future research on semantic similarity. First, the research should include the versioning and update methodology for semantic network. Second, semantic network which is dynamically generated can be used for the calculation of semantic similarity between concepts. If the researcher can develop the methodology to extract the semantic network from given knowledge base in real time, this approach can solve many problems related to the change of semantic relation. Third, the statistical approach based on corpus analysis can be an alternative for the method using semantic network. We believe that these proposed research direction can be the milestone of the research on semantic relation.

Analysis of the Seasonal Concentration Differences of Particulate Matter According to Land Cover of Seoul - Focusing on Forest and Urbanized Area - (서울시 토지피복에 따른 계절별 미세먼지 농도 차이 분석 - 산림과 시가화지역을 중심으로 -)

  • Choi, Tae-Young;Moon, Ho-Gyeong;Kang, Da-In;Cha, Jae-Gyu
    • Journal of Environmental Impact Assessment
    • /
    • v.27 no.6
    • /
    • pp.635-646
    • /
    • 2018
  • This study sought to identify the characteristics of seasonal concentration differences of particulate matter influenced by land cover types associated with particulate matter emission and reductions, namely forest and urbanized regions. PM10 and PM2.5 was measured with quantitative concentration in 2016 on 23 urban air monitoring stations in Seoul, classified the stations into 3 groups based on the ratio of urbanized and forest land covers within a range of 3km around station, and analysed the differences in particulate matter concentration by season. The center values for the urbanized and forest land covers by group were 53.4% and 34.6% in Group A, 61.8% and 16.5% in Group B, and 76.3% and 6.7% in Group C. The group-specific concentration of PM10 and PM2.5 by season indicated that the concentration of Group A, with high ratio of forests, was the lowest in all seasons, and the concentration of Group C, with high ratio of urbanized regions, had the highest concentration from spring to autumn. These inter-group differences were statistically significant. The concentration of Group C was lower than Group B in the winter; however, the differences between Groups B to C in the winter were not statistically significant. Group A concentration compared to the high-concentration groups by season was lower by 8.5%, 11.2%, 8.0%, 6.8% for PM10 in the order of spring, summer, autumn and winter, and 3.5%, 10.0%, 4.1% and 3.3% for PM2.5. The inter-group concentration differences for both PM10 and PM2.5 were the highest in the summer and grew smaller in the winter, this was thought to be because the forests' ability to reduce particulate matter emissions was the most pronounced during the summer and the least pronounced during the winter. The influence of urbanized areas on particulate matter concentration was lower compared to the influence of forests. This study provided evidence that the particulate matter concentration was lower for regions with higher ratios of forests, and subsequent studies are required to identify the role of green space to manage particulate matter concentration in cities.

A Hybrid Forecasting Framework based on Case-based Reasoning and Artificial Neural Network (사례기반 추론기법과 인공신경망을 이용한 서비스 수요예측 프레임워크)

  • Hwang, Yousub
    • Journal of Intelligence and Information Systems
    • /
    • v.18 no.4
    • /
    • pp.43-57
    • /
    • 2012
  • To enhance the competitive advantage in a constantly changing business environment, an enterprise management must make the right decision in many business activities based on both internal and external information. Thus, providing accurate information plays a prominent role in management's decision making. Intuitively, historical data can provide a feasible estimate through the forecasting models. Therefore, if the service department can estimate the service quantity for the next period, the service department can then effectively control the inventory of service related resources such as human, parts, and other facilities. In addition, the production department can make load map for improving its product quality. Therefore, obtaining an accurate service forecast most likely appears to be critical to manufacturing companies. Numerous investigations addressing this problem have generally employed statistical methods, such as regression or autoregressive and moving average simulation. However, these methods are only efficient for data with are seasonal or cyclical. If the data are influenced by the special characteristics of product, they are not feasible. In our research, we propose a forecasting framework that predicts service demand of manufacturing organization by combining Case-based reasoning (CBR) and leveraging an unsupervised artificial neural network based clustering analysis (i.e., Self-Organizing Maps; SOM). We believe that this is one of the first attempts at applying unsupervised artificial neural network-based machine-learning techniques in the service forecasting domain. Our proposed approach has several appealing features : (1) We applied CBR and SOM in a new forecasting domain such as service demand forecasting. (2) We proposed our combined approach between CBR and SOM in order to overcome limitations of traditional statistical forecasting methods and We have developed a service forecasting tool based on the proposed approach using an unsupervised artificial neural network and Case-based reasoning. In this research, we conducted an empirical study on a real digital TV manufacturer (i.e., Company A). In addition, we have empirically evaluated the proposed approach and tool using real sales and service related data from digital TV manufacturer. In our empirical experiments, we intend to explore the performance of our proposed service forecasting framework when compared to the performances predicted by other two service forecasting methods; one is traditional CBR based forecasting model and the other is the existing service forecasting model used by Company A. We ran each service forecasting 144 times; each time, input data were randomly sampled for each service forecasting framework. To evaluate accuracy of forecasting results, we used Mean Absolute Percentage Error (MAPE) as primary performance measure in our experiments. We conducted one-way ANOVA test with the 144 measurements of MAPE for three different service forecasting approaches. For example, the F-ratio of MAPE for three different service forecasting approaches is 67.25 and the p-value is 0.000. This means that the difference between the MAPE of the three different service forecasting approaches is significant at the level of 0.000. Since there is a significant difference among the different service forecasting approaches, we conducted Tukey's HSD post hoc test to determine exactly which means of MAPE are significantly different from which other ones. In terms of MAPE, Tukey's HSD post hoc test grouped the three different service forecasting approaches into three different subsets in the following order: our proposed approach > traditional CBR-based service forecasting approach > the existing forecasting approach used by Company A. Consequently, our empirical experiments show that our proposed approach outperformed the traditional CBR based forecasting model and the existing service forecasting model used by Company A. The rest of this paper is organized as follows. Section 2 provides some research background information such as summary of CBR and SOM. Section 3 presents a hybrid service forecasting framework based on Case-based Reasoning and Self-Organizing Maps, while the empirical evaluation results are summarized in Section 4. Conclusion and future research directions are finally discussed in Section 5.

Determination of Tumor Boundaries on CT Images Using Unsupervised Clustering Algorithm (비교사적 군집화 알고리즘을 이용한 전산화 단층영상의 병소부위 결정에 관한 연구)

  • Lee, Kyung-Hoo;Ji, Young-Hoon;Lee, Dong-Han;Yoo, Seoung-Yul;Cho, Chul-Koo;Kim, Mi-Sook;Yoo, Hyung-Jun;Kwon, Soo-Il;Chun, Jun-Chul
    • Journal of Radiation Protection and Research
    • /
    • v.26 no.2
    • /
    • pp.59-66
    • /
    • 2001
  • It is a hot issue to determine the spatial location and shape of tumor boundary in fractionated stereotactic radiotherapy (FSRT). We could get consecutive transaxial plane images from the phantom (paraffin) and 4 patients with brain tumor using helical computed tomography(HCT). K-means classification algorithm was adjusted to change raw data pixel value in CT images into classified average pixel value. The classified images consists of 5 regions that ate tumor region (TR), normal region (NR), combination region (CR), uncommitted region (UR) and artifact region (AR). The major concern was how to separate the normal region from tumor region in the combination area. Relative average deviation analysis was adjusted to alter average pixel values of 5 regions into 2 regions of normal and tumor region to define maximum point among average deviation pixel values. And then we drawn gross tumor volume (GTV) boundary by connecting maximum points in images using semi-automatic contour method by IDL(Interactive Data Language) program. The error limit of the ROI boundary in homogeneous phantom is estimated within ${\pm}1%$. In case of 4 patients, we could confirm that the tumor lesions described by physician and the lesions described automatically by the K-mean classification algorithm and relative average deviation analyses were similar. These methods can make uncertain boundary between normal and tumor region into clear boundary. Therefore it will be useful in the CT images-based treatment planning especially to use above procedure apply prescribed method when CT images intermittently fail to visualize tumor volume comparing to MRI images.

  • PDF