• Title/Summary/Keyword: Association-based Dissimilarity


Categorical Data Clustering Analysis Using Association-based Dissimilarity (연관성 기반 비유사성을 활용한 범주형 자료 군집분석)

  • Lee, Changki; Jung, Uk
    • Journal of Korean Society for Quality Management / v.47 no.2 / pp.271-281 / 2019
  • Purpose: The purpose of this study is to propose a more efficient distance measure for categorical data cluster analysis that takes into account the relationships between categorical variables. Methods: In this study, the association-based dissimilarity was employed to calculate the distance between two categorical observations, and the resulting distances were applied to the PAM (Partitioning Around Medoids) clustering algorithm to verify its effectiveness. The dissimilarity between two values of a categorical variable is calculated as a mixture of dissimilarities between the conditional probability distributions of the other categorical variables, given each of these two values. This method is particularly suitable for datasets whose categorical variables are highly correlated. Results: Simulations using several real-life datasets showed that the proposed distance, which considers relationships among the categorical variables, generally yielded better clustering performance than the Hamming distance. In addition, as the number of correlated variables increased, the difference in performance between the two clustering methods based on different distance measures became more statistically significant. Conclusion: This study showed that incorporating the relationships between categorical variables through the proposed method improved the results of cluster analysis.
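The idea of an association-based dissimilarity can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: the distance between two conditional distributions is assumed to be the total variation distance, the "mixture" is assumed to be an equal-weight average over the other variables, and the PAM routine uses a naive initialization (the first k points) in place of the BUILD phase, followed by the usual SWAP search. All function names and the toy data are hypothetical.

```python
from collections import Counter


def conditional_dist(data, i, a, j):
    """Empirical distribution of variable j among rows where variable i == a."""
    values = [row[j] for row in data if row[i] == a]
    return {v: c / len(values) for v, c in Counter(values).items()}


def tv_distance(p, q):
    """Total variation distance between two discrete distributions (dicts)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def value_dissimilarity(data, i, a, b):
    """Dissimilarity of values a and b of variable i: average over the other
    variables j of the distance between P(X_j | X_i = a) and P(X_j | X_i = b).
    Assumes at least two variables and that a and b both occur in the data."""
    if a == b:
        return 0.0
    others = [j for j in range(len(data[0])) if j != i]
    total = sum(
        tv_distance(conditional_dist(data, i, a, j), conditional_dist(data, i, b, j))
        for j in others
    )
    return total / len(others)


def association_distance(data, x, y):
    """Distance between two observations: sum of per-variable value dissimilarities."""
    return sum(value_dissimilarity(data, i, x[i], y[i]) for i in range(len(x)))


def hamming(x, y):
    """Baseline: number of variables on which the two observations differ."""
    return sum(xi != yi for xi, yi in zip(x, y))


def pam(dist, k):
    """PAM-style k-medoids on a precomputed distance matrix. The first k points
    stand in for the BUILD phase; the loop is the SWAP phase, repeated until no
    medoid/non-medoid swap lowers the total distance to the nearest medoid."""
    n = len(dist)
    medoids = list(range(k))

    def cost(meds):
        return sum(min(dist[p][m] for m in meds) for p in range(n))

    best = cost(medoids)
    improved = True
    while improved:
        improved = False
        for slot in range(k):
            for h in range(n):
                if h in medoids:
                    continue
                candidate = medoids[:slot] + [h] + medoids[slot + 1:]
                c = cost(candidate)
                if c < best - 1e-12:
                    medoids, best, improved = candidate, c, True
    labels = [min(range(k), key=lambda j: dist[p][medoids[j]]) for p in range(n)]
    return medoids, labels


# Hypothetical toy data: the three variables move together.
data = [("a", "x", "p"), ("a", "x", "p"), ("b", "y", "q"), ("b", "y", "q"), ("c", "x", "p")]
print(hamming(data[0], data[4]))                     # 1
print(association_distance(data, data[0], data[4]))  # 0.0
D = [[association_distance(data, u, v) for v in data] for u in data]
print(pam(D, 2))
```

In the toy data, the values `a` and `c` co-occur with identical values of the other variables, so the association-based distance treats them as equivalent while the Hamming distance still counts a mismatch. This is the situation the abstract describes, where correlated variables carry information that a simple mismatch count ignores.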

Evaluation on Development Performances of E-Commerce for 50 Major Cities in China (중국 주요 50개 도시의 전자상거래 발전성과에 대한 평가)

  • Jeong, Dong-Bin; Wang, Qiang
    • Journal of Distribution Science / v.14 no.1 / pp.67-74 / 2016
  • Purpose - In this paper, the degree of similarity and dissimilarity between pairs of 50 major cities in China is measured on the basis of three evaluation variables (internet businessman index, internet shopping index and e-commerce development index). A dissimilarity distance matrix is used to analyze the similarity and dissimilarity between each pair of the fifty cities by expressing dissimilarity as distance; a higher value signifies a higher degree of dissimilarity between two cities. Cluster analysis is used to classify the 50 cities into a number of groups such that similar cities are placed in the same group. In addition, the multidimensional scaling (MDS) technique provides a visual representation for exploring the pattern of proximities among the 50 major cities based on the three development performance attributes. Research design, data, and methodology - This research is based on the 2013 report provided by AliResearch in China (1/1/2013~11/30/2013) and uses multivariate methods, namely a dissimilarity distance matrix, cluster analysis and MDS, via the CLUSTER, KMEANS, PROXIMITIES and ALSCAL procedures in SPSS 21.0. Results - This research applies two types of cluster analysis and MDS to the three development performance measures from the 2013 AliResearch report. The results confirm that the cities can be grouped into four clusters sharing similar characteristics. MDS is used to position both the clusters and the 50 major cities belonging to each cluster. Because all three values for Shenzhen, Guangzhou and Hangzhou (which belong to cluster 1) are very large, these cities are superior to the other cities on all three evaluation attributes.
Twelve cities (Beijing, ShangHai, Jinghua, ZhuHai, XiaMen, SuZhou, NanJing, DongWan, ZhangShan, JiaXing, NingBo and FoShan), which belong to cluster 3, are inferior to those of cluster 1 on all three attributes, but they can be expected to lead the next e-commerce revolution. In terms of the internet businessman index, on the other hand, Tainan, Taizhong and Gaoxiong (which belong to cluster 2) are superior to the others; however, these three cities are inferior to the others on the internet shopping index. The remaining major cities, which belong to cluster 4, are relatively inferior on all three evaluation attributes, and this is expected to spur the creative innovation and entrepreneurship that drive e-commerce development in China as a whole. Conclusions - This study offers implications to help e-government officials and companies in both Korea and China formulate strategies. It is expected to provide useful information for understanding the recent state of e-commerce in China by examining the development performance of 50 major cities. Therefore, marketing, branding and communication tailored to online Chinese consumers should be developed. Such efforts could include incentives like loyalty points and coupons that encourage consumers, as well as building in-house logistics networks.
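The pipeline above (a dissimilarity matrix over three indices, followed by k-means grouping) can be sketched in plain Python. This is only an illustration under assumptions, not a reproduction of the SPSS PROXIMITIES and KMEANS procedures: z-score standardization, Euclidean distance and a deterministic farthest-first choice of initial centroids are all assumptions, and the six cities and their index values are hypothetical.

```python
import math


def standardize(rows):
    """Z-score each column (population SD) so the three indices share one scale."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, sds)] for r in rows]


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def distance_matrix(rows):
    """Symmetric dissimilarity matrix: a larger entry means a more dissimilar pair."""
    return [[euclidean(u, v) for v in rows] for u in rows]


def kmeans(rows, k, iters=100):
    """Lloyd's k-means with deterministic farthest-first initial centroids."""
    centroids = [list(rows[0])]
    while len(centroids) < k:
        far = max(rows, key=lambda r: min(euclidean(r, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: euclidean(r, centroids[j])) for r in rows]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical (businessman, shopping, development) index values for six cities.
cities = ["A", "B", "C", "D", "E", "F"]
scores = [(9.0, 8.5, 9.2), (8.8, 9.0, 8.9), (5.0, 5.5, 5.2),
          (5.2, 4.8, 5.1), (1.0, 1.2, 0.9), (1.1, 0.8, 1.3)]
z = standardize(scores)
dist = distance_matrix(z)
labels = kmeans(z, 3)
print(dict(zip(cities, labels)))
```

Standardizing first keeps an index with a larger numeric range from dominating the distances; whether the original study standardized its three indices is not stated in the abstract.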

Assessment of Educational Conditions for 28 National Universities in South Korea

  • Jeong, Dong-Bin
    • Asian Journal of Business Environment / v.7 no.1 / pp.25-29 / 2017
  • Purpose - In this paper, we categorize and segment the 28 national universities in South Korea and measure the degree of dissimilarity (or similarity) between pairs of universities using a dissimilarity distance matrix and cluster analysis, based on seven quantitative indicators of educational conditions in 2015 (percentage of small-scale courses, percentage of lectures taught by faculty, books held per student, material purchases per student, percentage of building capacity, percentage of real estate capacity and rate of accommodation). In addition, multidimensional scaling (MDS) techniques provide a visual representation for exploring the patterns of proximities among the 28 national universities based on the seven attributes of educational conditions. Research design, data, and methodology - This work uses the 2015 Announcement of University Information provided by the Ministry of Education of South Korea and performs multivariate analyses with the CLUSTER, PROXIMITIES and ALSCAL modules in IBM SPSS 23.0. Results - Applying two-stage cluster analysis, we confirm that the 28 national universities can be categorized into five clusters with similar traits. MDS is used to position the clusters and the 28 national universities belonging to each cluster. Conclusions - Based on these results, the types and traits of each national university can be assessed relative to the others and used in practice to strengthen each university's competitiveness.
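Hierarchical clustering of the kind run with SPSS CLUSTER can be sketched directly on a dissimilarity matrix. The abstract does not state the linkage rule, so average (between-groups) linkage is assumed here; the five one-dimensional "composite scores" are hypothetical stand-ins for the seven-indicator profiles, and the function name is illustrative.

```python
def average_linkage(dist, k):
    """Agglomerative clustering with average (between-groups) linkage.

    Starts from singletons and repeatedly merges the pair of clusters with the
    smallest mean pairwise distance until k clusters remain. Returns the final
    clusters and the merge history as (merged members, merge height) pairs."""
    clusters = [[i] for i in range(len(dist))]
    history = []
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pairs = [dist[i][j] for i in clusters[a] for j in clusters[b]]
                d = sum(pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merged = clusters[a] + clusters[b]
        history.append((merged, d))
        clusters[a] = merged
        del clusters[b]  # b > a, so index a is unaffected
    return clusters, history


# Hypothetical one-dimensional composite scores for five universities.
scores = [1.0, 1.2, 5.0, 5.3, 9.0]
dist = [[abs(x - y) for y in scores] for x in scores]
clusters, history = average_linkage(dist, 3)
print(clusters)  # [[0, 1], [2, 3], [4]]
for members, height in history:
    print(members, round(height, 2))
```

The merge heights in `history` correspond to an agglomeration schedule; a large jump between successive heights is one common heuristic for deciding how many clusters to keep.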

A Spatial Statistical Approach to Residential Differentiation (I): Developing a Spatial Separation Measure (거주지 분화에 대한 공간통계학적 접근 (I): 공간 분리성 측도의 개발)

  • Lee, Sang-Il
    • Journal of the Korean Geographical Society / v.42 no.4 / pp.616-631 / 2007
  • Residential differentiation is an academic theme that has received enormous attention in urban studies, because residential segregation can be seen as one of the best indicators of the socio-spatial dialectics occurring in urban space. Measuring how one population group is differentiated from another in terms of residential space has been a focal point of residential segregation studies. The index of dissimilarity has been the most widely used measure. Despite its popularity, however, it has been criticized for its inability to capture the degree of spatial clustering that unevenly distributed population groups usually display. Further, the spatial indices of segregation that were introduced to remedy the problems of the index of dissimilarity also have drawbacks: no significance testing methods have been provided for them, and recent advances in spatial statistics have not been extensively exploited. Thus, the main purpose of this research is to devise a spatial separation measure that gauges not only how unevenly two population groups are distributed over urban space, but also how strongly these uneven distributions are spatially clustered (spatial dependence). The main results are as follows. First, a new measure is developed by integrating spatial association measures and spatial chi-square statistics, and a significance testing method based on the generalized randomization test is provided. Second, a case study of residential differentiation among educational attainment groups in major Korean metropolitan cities demonstrates the applicability of the analytical framework presented in the paper.
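The criticism of the index of dissimilarity can be made concrete. The sketch below computes the conventional index, D = 1/2 * sum_i |a_i/A - b_i/B|, for two hypothetical layouts of four tracts in a row, and contrasts it with a global Moran's I (binary contiguity weights) on each tract's share of group A. Moran's I is used here only as a generic spatial autocorrelation statistic; it is not the spatial separation measure developed in the paper, and the counts are made up.

```python
def dissimilarity_index(a, b):
    """Index of dissimilarity D = 1/2 * sum_i |a_i/A - b_i/B| over areal units."""
    total_a, total_b = sum(a), sum(b)
    return 0.5 * sum(abs(x / total_a - y / total_b) for x, y in zip(a, b))


def morans_i(values, neighbours):
    """Global Moran's I with binary weights; neighbours[i] lists units adjacent to i."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    w_total = sum(len(nb) for nb in neighbours)
    cross = sum(z[i] * z[j] for i in range(n) for j in neighbours[i])
    return (n / w_total) * cross / sum(zi * zi for zi in z)


# Four tracts in a row (hypothetical counts); each tract borders the next one.
neighbours = [[1], [0, 2], [1, 3], [2]]
clustered = ([10, 10, 0, 0], [0, 0, 10, 10])
checkerboard = ([10, 0, 10, 0], [0, 10, 0, 10])
for name, (grp_a, grp_b) in [("clustered", clustered), ("checkerboard", checkerboard)]:
    share_a = [x / (x + y) for x, y in zip(grp_a, grp_b)]
    print(name, dissimilarity_index(grp_a, grp_b), round(morans_i(share_a, neighbours), 3))
```

Both layouts give D = 1.0 because D ignores where the tracts are, while Moran's I is positive for the clustered layout and negative for the checkerboard. That gap between unevenness and spatial clustering is what the paper's measure is designed to capture.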

Cognitive Modeling of Unusual Association with Declarative Knowledge by Positive Affect (긍정적 감정에 따른 선언적 지식에 관한 비전형적 연상 과정에 대한 인지모델링)

  • Park, Sung-Jin; Myung, Ro-Hae
    • Journal of Korean Institute of Industrial Engineers / v.41 no.1 / pp.43-49 / 2015
  • The aim of this study was to model unusual association with declarative knowledge under positive affect using the ACT-R cognitive architecture. Existing cognitive modeling research tends to focus on strong and negative cognitive moderators. Mild positive affect, however, has far-reaching effects on problem solving and decision making. Typically, subjects with positive affect were more likely than subjects with neutral affect to respond with unusual associates in a word association task. In this study, a cognitive model using the ACT-R cognitive architecture was developed to show the effect of positive affect on the cognitive organization of memory. First, we organized the memory structure for the stimulus word 'palm' based on published results from a word association task. Then, to represent the memory structure reorganized by positive affect, we decreased the ACT-R parameter that reflects the weight given to the dissimilarity between the stimulus word and the associate word. As a result, no significant difference in associate probabilities was found between the model predictions and existing empirical data. Thus, the effect of positive affect on unusual association can be modeled in ACT-R by decreasing the weight of the dissimilarity. This study is useful for model-based evaluation of the effects of positive affect in complex tasks involving memory, such as creative problem solving.
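The mechanism can be illustrated with a simplified version of ACT-R partial matching. In ACT-R, the weight on cue/chunk dissimilarity during partial matching is the mismatch penalty (`:mp`), which is presumably the parameter the abstract refers to. The sketch keeps only base-level activation plus the mismatch term (spreading activation and noise sampling are omitted) and uses the Boltzmann retrieval-probability approximation with temperature t = sqrt(2) * s. The associates, similarity values and parameter settings are hypothetical, not the paper's fitted model.

```python
import math


def retrieval_probabilities(base_levels, similarities, mp, s=0.25):
    """Retrieval probability of each candidate chunk under partial matching.

    Activation is simplified to A_i = B_i + mp * M_i (spreading activation and
    noise sampling omitted). M_i <= 0, where 0 means identical to the cue and
    more negative means more dissimilar. Probabilities use the Boltzmann
    (softmax) form with temperature t = sqrt(2) * s."""
    t = math.sqrt(2) * s
    activations = [b + mp * m for b, m in zip(base_levels, similarities)]
    weights = [math.exp(a / t) for a in activations]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical associates of the cue 'palm', ordered from typical to unusual.
associates = ["hand", "tree", "pilot"]
base = [0.0, 0.0, 0.0]
sims = [-0.1, -0.5, -0.9]
neutral = retrieval_probabilities(base, sims, mp=1.5)
positive = retrieval_probabilities(base, sims, mp=0.5)  # lower mismatch penalty
for word, p_n, p_p in zip(associates, neutral, positive):
    print(f"{word:6s} neutral={p_n:.3f} positive={p_p:.3f}")
```

Lowering the penalty shrinks the activation gap between similar and dissimilar associates, so probability mass shifts toward the unusual one. This is the qualitative effect the abstract attributes to positive affect.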

Evaluation of Shopping Items: Focused on Purchase of Foreign Tourists in South Korea

  • Jeong, Dong-Bin
    • East Asian Journal of Business Economics (EAJBE) / v.7 no.2 / pp.21-30 / 2019
  • Purpose - In this work, we categorize the 21 shopping items that foreign tourists purchase in South Korea and examine the level of dissimilarity (or similarity) between pairs of items using a distance matrix and both hierarchical and k-means cluster analyses, based on several purpose-of-visit attributes in 2017. In addition, the multidimensional scaling (MDS) method is applied to visualize the proximities among shopping items based on the purpose-of-visit attributes. Research design and methodology - This study uses a face-to-face survey, conducted in 2017 by the Ministry of Culture, Sports and Tourism, of foreign tourists from 20 countries who purchased shopping items in South Korea. The CLUSTER, PROXIMITIES and ALSCAL modules in IBM SPSS 23.0 are used to perform the analysis. Results - Through two-step cluster analysis, we find that the 21 shopping items can be classified into five groups with homogeneous traits. We can then position the clusters and the shopping items belonging to each cluster. Conclusions - Based on these results, we can assess the patterns and characteristics of each shopping item relative to one another, obtain useful information for promoting shopping tourism based on foreign tourists' actual perceptions, and apply the findings in practice across the tourism industry.
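MDS turns a dissimilarity matrix into coordinates whose distances approximate the original dissimilarities. The study used ALSCAL, an iterative alternating-least-squares procedure; as a simpler stand-in, the sketch below implements classical (Torgerson) metric MDS with only the standard library: double-center the squared distances, then take the top eigenvectors scaled by the square roots of their eigenvalues, found here by power iteration with deflation. It assumes the dissimilarities are close to Euclidean. The four points are hypothetical.

```python
import math


def double_center(dist):
    """B = -1/2 * J D^2 J, where J is the centering matrix."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(r) / n for r in sq]
    col_mean = [sum(sq[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row_mean) / n
    return [[-0.5 * (sq[i][j] - row_mean[i] - col_mean[j] + grand)
             for j in range(n)] for i in range(n)]


def power_iteration(mat, iters=1000):
    """Dominant eigenpair of a symmetric matrix (assumed positive semidefinite)."""
    n = len(mat)
    v = [float(i + 1) for i in range(n)]  # non-constant: B maps the all-ones vector to 0
    for _ in range(iters):
        w = [sum(mat[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-12:
            return 0.0, [0.0] * n  # nothing left to extract
        v = [x / norm for x in w]
    lam = sum(v[i] * sum(mat[i][j] * v[j] for j in range(n)) for i in range(n))
    return lam, v


def classical_mds(dist, dims=2):
    """Classical MDS: top eigenvectors of B scaled by sqrt(eigenvalue)."""
    n = len(dist)
    b = double_center(dist)
    coords = [[] for _ in range(n)]
    for _ in range(dims):
        lam, v = power_iteration(b)
        scale = math.sqrt(max(lam, 0.0))
        for i in range(n):
            coords[i].append(scale * v[i])
        # Deflate so the next pass finds the next eigenpair.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords


# Hypothetical items whose true layout is a 4 x 2 rectangle.
points = [(0, 0), (4, 0), (0, 2), (4, 2)]
dist = [[math.dist(p, q) for q in points] for p in points]
coords = classical_mds(dist)
print([[round(c, 3) for c in row] for row in coords])
```

The recovered map matches the original up to rotation and reflection, which is why MDS plots are read for relative positions and groupings rather than for the direction of the axes.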

Estimation of Scour Depth at Bridges and Comparative Analysis between Estimated and Measured Scour Depths (교량에서의 세굴깊이 산정 및 산정치와 실측치의 비교분석)

  • Yun, Yong-Nam; Lee, Jae-Su; Ho, Jeong-Seok
    • Journal of Korea Water Resources Association / v.30 no.5 / pp.477-485 / 1997
  • Recent domestic and international bridge failures due to pier and abutment scour have emphasized the need for better methods of scour depth estimation. This paper compares hydraulic analyses of the Namhan River Bridge over the Namhan River using the one-dimensional models WSPRO and HEC-2 and the two-dimensional model TABS-MD, based on the procedures presented in HEC-18, published by the U.S. Federal Highway Administration. A comparison of the scour depths estimated from the one-dimensional and two-dimensional model results is presented. In addition, field measurements were performed before and after a flood using a sounding instrument, the Fathometer (DE-719C), and a comparison between estimated and measured scour depths at the bridge is also presented. The results show a large difference between estimated and measured scour depths, attributed to the dissimilarity between laboratory and field conditions. It is also difficult to measure the maximum scour depth accurately because of refilling. Therefore, developing scour measuring equipment that can be used during peak floods, and deriving empirical models appropriate for domestic river systems, appear urgent.
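The abstract does not say which HEC-18 equations were applied. For local pier scour, HEC-18 recommends the Colorado State University (CSU) equation, written here as ys / y1 = 2.0 K1 K2 K3 K4 (a / y1)^0.65 Fr1^0.43. The sketch below implements that form in SI units. The correction factors default to 1.0 only as placeholders (real values come from the HEC-18 tables), abutment and contraction scour are not covered, and the flow conditions and "measured" depth are made up for illustration.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2 (SI units assumed throughout)


def csu_pier_scour(y1, v1, a, k1=1.0, k2=1.0, k3=1.0, k4=1.0):
    """Local pier scour depth ys (m) from the CSU equation:
    ys / y1 = 2.0 * K1 * K2 * K3 * K4 * (a / y1) ** 0.65 * Fr1 ** 0.43

    y1: flow depth directly upstream of the pier (m)
    v1: mean velocity directly upstream of the pier (m/s)
    a:  pier width (m)
    k1..k4: correction factors for pier nose shape, flow angle of attack,
            bed condition and bed-material armoring; 1.0 is a placeholder,
            not a recommended value."""
    fr1 = v1 / math.sqrt(G * y1)  # Froude number upstream of the pier
    return y1 * 2.0 * k1 * k2 * k3 * k4 * (a / y1) ** 0.65 * fr1 ** 0.43


# Hypothetical flood conditions and field sounding, for illustration only.
estimated = csu_pier_scour(y1=4.0, v1=2.5, a=2.0)
measured = 1.8  # m, made-up value
print(f"estimated {estimated:.2f} m, measured {measured:.2f} m, "
      f"ratio {estimated / measured:.2f}")
```

Because the equation was fitted largely to laboratory data, a ratio well away from 1 in such a comparison is consistent with the laboratory/field dissimilarity the abstract reports. Refilling after the flood also means a post-flood sounding can understate the true maximum.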


Association-based Unsupervised Feature Selection for High-dimensional Categorical Data (고차원 범주형 자료를 위한 비지도 연관성 기반 범주형 변수 선택 방법)

  • Lee, Changki; Jung, Uk
    • Journal of Korean Society for Quality Management / v.47 no.3 / pp.537-552 / 2019
  • Purpose: The development of information technology makes it easy to utilize high-dimensional categorical data. In this regard, the purpose of this study is to propose a novel method for selecting the proper categorical variables in high-dimensional categorical data. Methods: The proposed feature selection method consists of three steps. (1) The first step defines the goodness-to-pick measure. In this paper, a categorical variable is considered relevant if it has relationships with other variables; according to this definition, the goodness-to-pick measure is calculated from the normalized conditional entropy with the other variables. (2) The second step finds the relevant feature subset of the original variable set by deciding whether each variable is relevant. (3) The third step eliminates redundant variables from the relevant feature subset. Results: Our experimental results showed that the proposed feature selection method generally yielded better classification performance on high-dimensional categorical data than using no feature selection, especially as the number of irrelevant categorical variables increased. Moreover, as the number of irrelevant categorical variables with imbalanced categorical values increased, the difference in accuracy between the proposed method and the existing methods it was compared with grew. Conclusion: The experimental results confirm that the proposed method consistently produces high classification accuracy on high-dimensional categorical data. Therefore, the proposed method is promising for effective use in high-dimensional settings.
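Steps (1) and (2) can be sketched with entropy estimates from value counts. The abstract does not give the exact formula, so this assumes one plausible reading: the goodness-to-pick of X_i is the average, over the other variables X_j, of the normalized entropy reduction 1 - H(X_i | X_j) / H(X_i) (often called the uncertainty coefficient), and step (2) keeps variables whose score reaches a hypothetical threshold. Step (3), redundancy elimination, is not shown. Function names and data are illustrative.

```python
import math
from collections import Counter


def entropy(xs):
    """Shannon entropy (bits) of a sequence of categorical values."""
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())


def conditional_entropy(xs, ys):
    """H(X | Y) = sum over y of P(Y = y) * H(X | Y = y)."""
    n = len(ys)
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(y, []).append(x)
    return sum(len(g) / n * entropy(g) for g in groups.values())


def goodness_to_pick(columns, i):
    """Assumed score: mean normalized entropy reduction of X_i given each other X_j."""
    hx = entropy(columns[i])
    if hx == 0:
        return 0.0  # a constant variable carries no information
    others = [j for j in range(len(columns)) if j != i]
    return sum(1 - conditional_entropy(columns[i], columns[j]) / hx
               for j in others) / len(others)


def relevant_subset(columns, threshold):
    """Step (2): keep variables whose score reaches the (hypothetical) threshold."""
    return [i for i in range(len(columns)) if goodness_to_pick(columns, i) >= threshold]


# Hypothetical data: X0 and X1 are copies of each other, X2 is independent noise.
X0 = ["a", "a", "b", "b", "a", "a", "b", "b"]
X1 = ["x", "x", "y", "y", "x", "x", "y", "y"]
X2 = ["p", "q", "p", "q", "p", "q", "p", "q"]
columns = [X0, X1, X2]
print([round(goodness_to_pick(columns, i), 3) for i in range(3)])
print(relevant_subset(columns, threshold=0.25))
```

The independent X2 scores 0 and is dropped, while X0 and X1 both pass. Because they carry the same information, they are exactly the kind of pair that step (3) is meant to reduce to one variable.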

A Study on Interactions of Competitive Promotions Between the New and Used Cars (신차와 중고차간 프로모션의 상호작용에 대한 연구)

  • Chang, Kwangpil
    • Asia Marketing Journal / v.14 no.1 / pp.83-98 / 2012
  • In a market where new and used cars compete with each other, we run the risk of obtaining biased estimates of the cross elasticity between them if we focus only on new cars or only on used cars. Unfortunately, most previous studies of the automobile industry have focused only on new car models, without taking into account the effect of used cars' pricing policies on new cars' market shares and vice versa, resulting in inadequate predictions of reactive pricing in response to competitors' rebates or price discounts. However, there are some exceptions. Purohit (1992) and Sullivan (1990) looked at the new and used car markets simultaneously to examine the effect of new car model launches on used car prices. But their studies are limited in that they used the average used car prices reported in the NADA Used Car Guide instead of actual transaction prices, and some of the conflicting results may be due to this problem in the data. Park (1998) recognized this problem and used actual prices in his study. His work is notable in that he investigated the qualitative effect of new car model launches on used car pricing policy in terms of the reinforcement of brand equity. The current work also uses actual prices, like Park (1998), but explores the quantitative aspect of competitive price promotion between new and used cars of the same model. In this study, I develop a model that assumes that the cross elasticity between new and used cars of the same model is higher than that among new and used cars of different models. Specifically, I apply a nested logit model that assumes car model choice at the first stage and the choice between new and used cars at the second stage. This proposed model is compared to the IIA (Independence of Irrelevant Alternatives) model, which assumes no decision hierarchy, so that new and used cars of all models are substitutable at a single stage. 
The data for this study are drawn from the Power Information Network (PIN), an affiliate of J.D. Power and Associates. PIN collects sales transaction data from a sample of dealerships in major metropolitan areas in the U.S. These are retail transactions, i.e., sales or leases to final consumers, excluding fleet sales and including both new and used car sales. Each observation in the PIN database contains the transaction date, the manufacturer, model year, make, model, trim and other car information, the transaction price, consumer rebates, the interest rate, term, and amount financed (when the vehicle is financed or leased), etc. I used data for compact cars sold from January to June 2009. The new and used cars of the top nine selling models are included in the study: Mazda 3, Honda Civic, Chevrolet Cobalt, Toyota Corolla, Hyundai Elantra, Ford Focus, Volkswagen Jetta, Nissan Sentra, and Kia Spectra. These models accounted for 87% of category unit sales. Empirical application of the nested logit model showed that the proposed model outperformed the IIA model in both the calibration and holdout samples. The other comparison model, which assumes the choice between new and used cars at the first stage and car model choice at the second stage, turned out to be misspecified, since its dissimilarity parameter (i.e., the inclusive or category value parameter) was estimated to be greater than 1. Post hoc analysis based on the estimated parameters was conducted using the modified Lanczos iterative method. This method is intuitively appealing. For example, suppose a new car offers a certain rebate and initially gains market share. In response, a used car of the same model keeps decreasing its price until it regains the lost market share and restores the status quo, and the new car settles at a lower market share because of the used car's reaction. 
The method enables us to find the price discount needed to maintain the status quo and the equilibrium market shares of the new and used cars. In the first simulation, I used the Jetta as the focal brand to see how its new and used cars set prices, rebates or APR interactively, assuming that reacting cars respond to price promotions to maintain the status quo. The simulation results showed that the IIA model underestimates cross elasticities and therefore suggests a less aggressive used car price discount in response to a new car rebate than the proposed nested logit model does. In the second simulation, I used the Elantra to reconfirm the result for the Jetta and reached the same conclusion. In the third simulation, I had the Corolla offer a $1,000 rebate to see what the best response would be for the Elantra's new and used cars. Interestingly, the used Elantra could maintain the status quo by offering a smaller price discount ($160) than the new Elantra ($205). Future research might explore the plausibility of alternative nested logit models. For example, the NUB model, which assumes the choice between new and used cars at the first stage and brand choice at the second stage, could be a possibility even though it was rejected in the current study because of misspecification (its dissimilarity parameter turned out to be greater than 1). The NUB model may have been rejected because of true misspecification or because of the structure of data collected from typical car dealerships. A typical car dealership displays both new and used cars of the same model. For this reason, the BNU model, which assumes brand choice at the first stage and the choice between new and used cars at the second stage, may have been favored in the current study, since in this market environment customers first choose a dealership (brand) and then choose between new and used cars. However, if there were dealerships that carried both new and used cars of various models, the NUB model might fit the data as well as the BNU model. 
Which model better describes the data is an empirical question. In addition, it would be interesting to test a probabilistic mixture of the BNU and NUB models on a new dataset.
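The substitution pattern behind the BNU model can be sketched with the standard two-level nested logit formula. Each nest's inclusive value is IV_k = ln sum_{j in k} exp(V_j / lam_k). The nest is chosen with probability proportional to exp(lam_k IV_k), and an alternative within the nest with probability proportional to exp(V_i / lam_k). When every lam = 1, this collapses to the multinomial logit (IIA) case. The two-brand market, zero baseline utilities and the 0.5 "rebate" utility boost are hypothetical, and the Lanczos-based reaction simulation from the study is not reproduced.

```python
import math


def nested_logit(utilities, nests, lam):
    """Two-level nested logit choice probabilities.

    utilities: alternative -> systematic utility V
    nests:     nest -> list of alternatives in that nest
    lam:       nest -> dissimilarity (inclusive-value) parameter; values in
               (0, 1] are consistent with utility maximisation, and setting
               every lam to 1 reduces the model to multinomial logit (IIA)."""
    expo = {k: {a: math.exp(utilities[a] / lam[k]) for a in alts}
            for k, alts in nests.items()}
    inclusive = {k: math.log(sum(e.values())) for k, e in expo.items()}
    denom = sum(math.exp(lam[k] * inclusive[k]) for k in nests)
    probs = {}
    for k, e in expo.items():
        p_nest = math.exp(lam[k] * inclusive[k]) / denom
        within = sum(e.values())
        for a, val in e.items():
            probs[a] = p_nest * val / within
    return probs


# Hypothetical BNU structure: brand (model) first, then new vs used.
nests = {"Jetta": ["Jetta new", "Jetta used"], "Elantra": ["Elantra new", "Elantra used"]}
base = {alt: 0.0 for alts in nests.values() for alt in alts}
rebate = {**base, "Jetta new": 0.5}  # a rebate raises the new Jetta's utility

for label, lam_value in [("IIA (lam = 1)", 1.0), ("nested (lam = 0.5)", 0.5)]:
    lam = {k: lam_value for k in nests}
    before = nested_logit(base, nests, lam)
    after = nested_logit(rebate, nests, lam)
    change = {alt: after[alt] / before[alt] for alt in before}
    print(label, {alt: round(r, 3) for alt, r in change.items()})
```

Under IIA, every competing alternative loses the same proportion of its share. With lam < 1, the used Jetta loses proportionally more than the Elantra alternatives, so an IIA model would understate the pressure on the same-model used car, in line with the underestimated cross elasticities the simulations report.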
