• Title/Abstract/Keywords: Dissimilarity Distance Matrix

Search results: 11 (processing time 0.028 s)

Evaluation on Development Performances of E-Commerce for 50 Major Cities in China

  • 정동빈;왕강
    • 유통과학연구 / Vol. 14, No. 1 / pp.67-74 / 2016
  • Purpose - This paper shows the degree of similarity and dissimilarity between pairs of 50 major cities in China on the basis of three evaluation variables (internet businessman index, internet shopping index and e-commerce development index). A dissimilarity distance matrix is used to analyze the similarity and dissimilarity between each pair of the fifty cities by expressing dissimilarity as a distance; a higher value signifies a higher degree of dissimilarity between two cities. Cluster analysis is used to classify the 50 cities into groups such that similar cities are placed in the same group. In addition, multidimensional scaling (MDS) provides a visual representation for exploring the pattern of proximities among the 50 cities based on the three development performance attributes. Research design, data, and methodology - This research is based on the 2013 report provided by AliResearch in China (1/1/2013~11/30/2013) and uses multivariate methods such as the dissimilarity distance matrix, cluster analysis and MDS through the CLUSTER, KMEANS, PROXIMITIES and ALSCAL procedures in SPSS 21.0. Results - Two types of cluster analysis and MDS are applied to the three development performance measures from the 2013 AliResearch report. The cities can be grouped into four clusters that share similar characteristics, and MDS is used to position both the clusters and the 50 cities belonging to each cluster. Because all the values for Shenzhen, Guangzhou and Hangzhou (which belong to cluster 1) are very large, these cities are superior to the other cities in all three evaluation attributes. Twelve cities (Beijing, ShangHai, Jinghua, ZhuHai, XiaMen, SuZhou, NanJing, DongWan, ZhangShan, JiaXing, NingBo and FoShan), which belong to cluster 3, are inferior to those of cluster 1 in all three attributes but can be expected to lead the next e-commerce revolution. Tainan, Taizhong and Gaoxiong (which belong to cluster 2) rank above the other cities on the internet businessman index but below them on the internet shopping index. The remaining cities, which belong to cluster 4, are relatively inferior in all three evaluation attributes; the authors argue that this evokes creative innovation and entrepreneurship, which leads to e-commerce development in China as a whole. Conclusions - This study suggests implications that can help e-government officers and companies in both Korea and China form strategies. It is expected to provide useful information for understanding the recent state of e-commerce in China by reviewing the development performance of 50 major cities. Therefore, marketing, branding and communication relevant to online Chinese consumers should be developed. Such efforts include incentives like loyalty points and coupons that encourage consumers, and building in-house logistics networks.
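
The dissimilarity distance matrix described in this abstract can be illustrated with a minimal Python sketch. It assumes plain Euclidean distance on the three indices; the abstract does not state which distance measure or standardization the SPSS PROXIMITIES run used. The city names and scores are hypothetical placeholders, not data from the paper.

```python
import math

# Hypothetical scores on the three indices (internet businessman,
# internet shopping, e-commerce development); NOT data from the paper.
scores = {
    "CityA": (30.0, 25.0, 27.0),
    "CityB": (28.0, 24.0, 26.0),
    "CityC": (10.0, 12.0, 11.0),
}

def dissimilarity_matrix(points):
    """Return item names and the symmetric Euclidean distance matrix."""
    names = list(points)
    matrix = [[math.dist(points[a], points[b]) for b in names] for a in names]
    return names, matrix

def closest_pair(names, matrix):
    """Return the two distinct items with the smallest dissimilarity."""
    pairs = [
        (matrix[i][j], names[i], names[j])
        for i in range(len(names))
        for j in range(i + 1, len(names))
    ]
    return min(pairs)[1:]

names, D = dissimilarity_matrix(scores)
for name, row in zip(names, D):
    print(name, [round(value, 2) for value in row])
print("most similar pair:", closest_pair(names, D))
```

A larger entry means a less similar pair, so the smallest off-diagonal entry identifies the two items that a hierarchical clustering would typically merge first.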

A practical application of cluster analysis using SPSS

  • Kim, Dae-Hak
    • Journal of the Korean Data and Information Science Society / Vol. 20, No. 6 / pp.1207-1212 / 2009
  • The basic objective of cluster analysis is to discover natural groupings of items or variables. In general, clustering is conducted on a similarity (or dissimilarity) matrix or on the original input text data. Various measures of similarity (or dissimilarity) between objects (or variables) have been developed. We introduce a real application of the clustering procedure in SPSS when only the distance matrix of the objects (or variables) is given as input data. This is particularly helpful for cluster analysis of huge data sets in which the size of the proximity matrix exceeds 1000. The SPSS syntax commands for clustering with matrix input data are given with numerical examples.

On Optimizing Dissimilarity-Based Classifications Using a DTW and Fusion Strategies

  • 김상운;김승환
    • 전자공학회논문지CI / Vol. 47, No. 2 / pp.21-28 / 2010
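  • This paper reports experimental results for a method that optimizes dissimilarity-based classification (DBC) by successively applying dynamic time warping (DTW) and a multiple fusion strategy (MFS). DBC is a way of designing classifiers that, instead of using the feature values of samples, measures the dissimilarities among samples in order to classify sample patterns. With DTW, dissimilarity is measured in two steps. First, the samples are adjusted, using correlation coefficients, so that an optimal matching path for aligning the object samples can be found. Then the dissimilarity between the adjusted samples is measured with a conventional distance measure. In MFS, fusion is applied not only when combining classifiers but also when generating the dissimilarity matrix. That is, several dissimilarity matrices produced with DTW are combined into a new dissimilarity matrix, and multiple base classifiers are then trained in this matrix space and combined again. Experiments on benchmark image databases confirm that the proposed method can improve classification performance compared with conventional methods. These results suggest that the proposed method could also be used in other high-dimensional applications such as multimedia information retrieval.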
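
As a rough illustration of how a DTW-based dissimilarity matrix can be built, the following Python sketch uses the textbook DTW recurrence with an absolute-difference local cost. This is an assumption made for illustration: the paper's own variant aligns samples using correlation coefficients, which is not reproduced here, and the sample sequences are hypothetical.

```python
import math

def dtw_distance(a, b):
    """Textbook DTW cost between two numeric sequences.

    Local cost is the absolute difference (an assumption; the paper
    describes a correlation-based alignment instead).
    """
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible warping moves.
            cost[i][j] = step + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]

# Hypothetical 1-D samples of different lengths.
samples = [[1, 2, 3], [1, 2, 2, 3], [3, 2, 1]]
D = [[dtw_distance(x, y) for y in samples] for x in samples]
for row in D:
    print(row)
```

Each row of `D` can then serve as the representation of one sample in the dissimilarity space, which is where a dissimilarity-based classifier would be trained.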

Genetic variation and relationship of Artemisia capillaris Thunb.(Compositae) by RAPD analysis

  • Kim, Jung-Hyun;Kim, Dong-Kap;Kim, Joo-Hwan
    • 한국자원식물학회지 / Vol. 22, No. 3 / pp.242-247 / 2009
  • Randomly Amplified Polymorphic DNA (RAPD) analysis was performed to define the genetic variation and relationships of Artemisia capillaris. Fifteen populations, selected by distribution and habitat, were collected for the RAPD analysis. RAPD markers were observed mainly between 300 bp and 1600 bp. A total of 72 scorable markers from 7 primers were used to generate the genetic matrix; 69 bands were polymorphic and only 3 were monomorphic. A genetic dissimilarity matrix based on Nei's genetic distance (1972) and a UPGMA phenogram were produced from the data matrix. Populations of Artemisia capillaris clustered with high genetic affinities, and the cluster patterns were correlated with distributional patterns. Two large groups were formed: a southern area group and a middle area group. The closest OTUs were GW2 and GG1 in the middle area group, and GB1 from the southern area group clustered with OTUs in the middle area group. The RAPD data were useful for defining the genetic variation and relationships of A. capillaris.

Geodesic Clustering for Covariance Matrices

  • Lee, Haesung;Ahn, Hyun-Jung;Kim, Kwang-Rae;Kim, Peter T.;Koo, Ja-Yong
    • Communications for Statistical Applications and Methods / Vol. 22, No. 4 / pp.321-331 / 2015
  • The K-means clustering algorithm is a popular and widely used method for clustering. For covariance matrices, we consider a geodesic clustering algorithm based on the K-means clustering framework in consideration of symmetric positive definite matrices as a Riemannian (non-Euclidean) manifold. This paper considers a geodesic clustering algorithm for data consisting of symmetric positive definite (SPD) matrices, utilizing the Riemannian geometric structure for SPD matrices and the idea of a K-means clustering algorithm. A K-means clustering algorithm is divided into two main steps for which we need a dissimilarity measure between two matrix data points and a way of computing centroids for observations in clusters. In order to use the Riemannian structure, we adopt the geodesic distance and the intrinsic mean for symmetric positive definite matrices. We demonstrate our proposed method through simulations as well as application to real financial data.
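
The dissimilarity measure mentioned in this abstract can be sketched as follows. The abstract does not say which Riemannian metric is used, so this example assumes the commonly used affine-invariant metric, under which the geodesic distance between SPD matrices A and B is the square root of the sum of squared logarithms of the eigenvalues of A⁻¹B. It relies on NumPy, and the matrices are hypothetical.

```python
import numpy as np

def geodesic_distance(a, b):
    """Affine-invariant geodesic distance between two SPD matrices.

    Assumed metric (not confirmed by the abstract):
    d(A, B) = sqrt(sum_i log(lambda_i)^2), where lambda_i are the
    eigenvalues of A^{-1} B, which are real and positive for SPD inputs.
    """
    eigenvalues = np.linalg.eigvals(np.linalg.solve(a, b)).real
    return float(np.sqrt(np.sum(np.log(eigenvalues) ** 2)))

# Hypothetical 2 x 2 covariance matrices.
A = np.array([[2.0, 0.5], [0.5, 1.0]])
B = np.array([[1.0, 0.2], [0.2, 3.0]])
print(geodesic_distance(A, B))
```

In a K-means style loop, this function would take the place of Euclidean distance in the assignment step; the centroid step would need an intrinsic mean, which is not shown here.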

WAVELET-BASED FOREST AREAS CLASSIFICATION BY USING HIGH RESOLUTION IMAGERY

  • Yoon Bo-Yeol;Kim Choen
    • 대한원격탐사학회:학술대회논문집 / 대한원격탐사학회, Proceedings of ISRS 2005 / pp.698-701 / 2005
  • This paper examines the extraction of information on forest areas from high resolution imagery using wavelet transformation. First, study areas are selected at spots where one or more species are distributed, with reference to a forest type map. Next, each study area is cut to 256 x 256 pixels because of image processing problems with large data volumes. Prior to wavelet transformation, five texture parameters (contrast, dissimilarity, entropy, homogeneity, Angular Second Moment (ASM)) are calculated using the Gray Level Co-occurrence Matrix (GLCM). The five texture images are produced with a 3x3 shifting window, a distance of 1 pixel, and an angle of 45 degrees. The Daubechies 4 wavelet basis function is selected as the wavelet function. The results are summarized in three points. First, wavelet-transformed images derived from the contrast and dissimilarity texture parameters have an effect on the detection of edge elements and could be used for forest road detection. Second, wavelet fusion images derived from the texture parameters and the original image can be applied to forest area classification because homogeneous forest type structures cluster together. Third, for grading evaluation of forest fire damaged areas, combining established classification methods with GLCM texture extraction and wavelet transformation is expected to yield high accuracy in forest areas (and other areas).
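
The five GLCM texture parameters named in this abstract follow standard definitions, sketched below in plain Python. The example computes one non-symmetric co-occurrence matrix over a whole (hypothetical) image at distance 1 and 45 degrees; the 3x3 shifting window of the paper, which would repeat this per pixel neighbourhood, is not reproduced.

```python
import math
from collections import Counter

def glcm(image, d_row=-1, d_col=1):
    """Normalized co-occurrence probabilities for one pixel offset.

    The default offset (-1, +1) is one pixel up and one to the right,
    i.e. distance 1 at 45 degrees. The matrix is not symmetrized.
    """
    counts = Counter()
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            rr, cc = r + d_row, c + d_col
            if 0 <= rr < rows and 0 <= cc < cols:
                counts[(image[r][c], image[rr][cc])] += 1
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}

def texture_features(p):
    """Standard GLCM statistics from probabilities p[(i, j)]."""
    return {
        "contrast": sum(v * (i - j) ** 2 for (i, j), v in p.items()),
        "dissimilarity": sum(v * abs(i - j) for (i, j), v in p.items()),
        "entropy": -sum(v * math.log(v) for v in p.values()),
        "homogeneity": sum(v / (1 + (i - j) ** 2) for (i, j), v in p.items()),
        "asm": sum(v * v for v in p.values()),
    }

# Hypothetical 4 x 4 image with gray levels 0..3.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
print(texture_features(glcm(image)))
```

Contrast and dissimilarity both grow with the gray-level difference between neighbouring pixels (quadratically and linearly, respectively), which is consistent with the abstract's remark that they respond to edge elements.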

Assessment of Educational Conditions for 28 National Universities in South Korea

  • Jeong, Dong-Bin
    • Asian Journal of Business Environment / Vol. 7, No. 1 / pp.25-29 / 2017
  • Purpose - In this paper, we categorize and segment the 28 national universities in South Korea by cluster analysis and measure the degree of dissimilarity (or similarity) between pairs of universities with a dissimilarity distance matrix, based on seven quantitative measures of educational conditions (percentage of small-scale courses, percentage of lecture by the faculty, collection of books per student, material purchase per student, percentage of building capacity, percentage of real estate capacity and rate of accommodation) in 2015. In addition, multidimensional scaling (MDS) provides a visual representation for exploring patterns of proximities among the 28 national universities based on the seven attributes of educational conditions. Research design, data, and methodology - This work uses the 2015 Announcement of University Information provided by the Ministry of Education in South Korea and applies multivariate analyses with the CLUSTER, PROXIMITIES and ALSCAL modules in IBM SPSS 23.0. Results - By applying two-stage cluster analysis, we confirm that the 28 national universities can be categorized into five clusters with similar traits. MDS is used to position the clusters and the 28 national universities belonging to each cluster. Conclusions - The type and traits of each national university can be assessed in relative terms, and the results can be used in practice to improve each university's competitiveness.

An Efficient Multidimensional Scaling Method based on CUDA and Divide-and-Conquer

  • 박성인;황규백
    • 한국정보과학회논문지:컴퓨팅의 실제 및 레터 / Vol. 16, No. 4 / pp.427-431 / 2010
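  • Multidimensional scaling (MDS) maps high-dimensional data into a low-dimensional space so as to represent the similarities among the data. It is mainly used for feature selection and data visualization. Among MDS methods, classical multidimensional scaling is difficult to apply when the number of objects is large because it requires a long running time and a large amount of memory. This is because an eigenpair problem must be solved for an $n \times n$ dissimilarity matrix based on Euclidean distance (where n is the number of objects). Consequently, running time grows as n increases, and the growing memory usage limits the size of data to which the method can be applied. To alleviate these problems, this paper proposes an efficient MDS method that uses CUDA, a GPGPU technology, together with a divide-and-conquer technique, and shows through various experiments that the proposed method can be very efficient when the number of objects is large.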
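
For reference, the classical MDS baseline that this abstract describes (an eigenpair problem on an n x n matrix) can be sketched with NumPy as below. This is the plain CPU textbook procedure, not the paper's CUDA and divide-and-conquer method; the sample points are hypothetical.

```python
import numpy as np

def classical_mds(d, k=2):
    """Embed n objects in k dimensions from an n x n Euclidean distance matrix."""
    d = np.asarray(d, dtype=float)
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double-centre the squared distances to obtain a Gram matrix.
    gram = -0.5 * centering @ (d ** 2) @ centering
    eigenvalues, eigenvectors = np.linalg.eigh(gram)  # ascending order
    top = np.argsort(eigenvalues)[::-1][:k]           # k largest eigenpairs
    scale = np.sqrt(np.clip(eigenvalues[top], 0.0, None))
    return eigenvectors[:, top] * scale

# Hypothetical points; only their pairwise distances are given to MDS.
points = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0], [3.0, 4.0]])
D = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
coords = classical_mds(D, k=2)
print(coords)
```

The dense n x n matrices and the full eigendecomposition in this sketch are the memory and running-time costs that the abstract identifies as the obstacle for large n.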

Evaluation of Shopping Items: Focused on Purchase of Foreign Tourists in South Korea

  • Jeong, Dong-Bin
    • 동아시아경상학회지 / Vol. 7, No. 2 / pp.21-30 / 2019
  • Purpose - In this work, we categorize the 21 shopping items that foreign tourists purchase in South Korea and examine the level of dissimilarity (or similarity) between items using a distance matrix and both hierarchical and k-means cluster analyses, based on several purpose-of-visit attributes in 2017. In addition, multidimensional scaling (MDS) is applied to visualize the proximities among shopping items based on the purpose-of-visit attributes. Research design and methodology - This study uses a face-to-face survey, carried out in 2017 by the Ministry of Culture, Sports and Tourism, of foreign tourists from 20 countries who purchased shopping items in South Korea. The CLUSTER, PROXIMITIES and ALSCAL modules in IBM SPSS 23.0 are used. Results - Through two-step cluster analysis, we ascertain that the 21 shopping items can be classified into five groups with homogeneous traits. We can position the clusters and the shopping items belonging to each cluster. Conclusions - We can assess the patterns and characteristics of each shopping item in relative terms, obtain useful information for promoting shopping tourism based on foreign tourists' actual perceptions, and apply the results in practice to the tourism industry.

Relationships between Genetic Diversity and Fusarium Toxin Profiles of Winter Wheat Cultivars

  • Goral, Tomasz;Stuper-Szablewska, Kinga;Busko, Maciej;Boczkowska, Maja;Walentyn-Goral, Dorota;Wisniewska, Halina;Perkowski, Juliusz
    • The Plant Pathology Journal / Vol. 31, No. 3 / pp.226-244 / 2015
  • Fusarium head blight is one of the most important and most common diseases of winter wheat. To better understand this disease and to assess the correlations between different factors, 30 cultivars of this cereal were evaluated over a two-year period. Fusarium head blight resistance was evaluated and the concentration of trichothecene mycotoxins was analysed. Grain samples originated from plants inoculated with Fusarium culmorum and from plants naturally infected with Fusarium species. The genetic distance between the tested cultivars was determined and the data were analysed using multivariate data analysis methods. Genetic dissimilarity of the wheat cultivars ranged between 0.06 and 0.78, and cluster analysis of genetic distance grouped them into three distinct groups. Wheat cultivars differed in resistance to spike and kernel infection and in resistance to the spread of Fusarium within a spike (type II). In grain samples from inoculated plots, only B trichothecenes (deoxynivalenol, 3-acetyldeoxynivalenol and nivalenol) produced by F. culmorum were present. In control samples, trichothecenes of groups A (HT-2 toxin, T-2 toxin, T-2 tetraol, T-2 triol, scirpentriol, diacetoxyscirpenol) and B were detected. On the basis of the Fusarium head blight assessment and the analysis of trichothecene concentration in the grain, relationships between morphological characters, Fusarium head blight (FHB) resistance and mycotoxins in the grain of wheat cultivars were examined. The results were used to create matrices of distance between cultivars for trichothecene concentration in inoculated and naturally infected grain as well as for FHB resistance. Correlations between genetic distance and resistance/mycotoxin profiles were calculated using the Mantel test. A highly significant correlation between genetic distance and mycotoxin distance was found for the samples inoculated with Fusarium culmorum. Significant but weak relationships were found between the genetic distance matrix and the matrices for FHB resistance or trichothecene concentration in naturally infected grain.
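
The Mantel test mentioned in this abstract correlates two distance matrices over the same objects and assesses significance by permuting the object labels of one matrix. The following plain-Python sketch shows a simple one-sided permutation version; the permutation count, seed, and the two hypothetical measurement vectors are illustrative assumptions, not the settings or data of the study.

```python
import math
import random

def upper_triangle(matrix, order):
    """Upper-triangle entries after relabelling objects by `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]

def pearson(x, y):
    """Pearson correlation of two equally long lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def mantel(d1, d2, permutations=999, seed=0):
    """Return (r, one-sided p) for two distance matrices on the same objects."""
    rng = random.Random(seed)
    identity = list(range(len(d1)))
    x = upper_triangle(d1, identity)
    observed = pearson(x, upper_triangle(d2, identity))
    hits = 0
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)  # permute rows and columns of d2 together
        if pearson(x, upper_triangle(d2, order)) >= observed - 1e-12:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)

# Hypothetical one-dimensional measurements for five objects.
xs = [0.0, 1.0, 3.0, 7.0, 12.0]
ys = [0.1, 0.9, 3.5, 6.0, 13.0]
d_a = [[abs(a - b) for b in xs] for a in xs]
d_b = [[abs(a - b) for b in ys] for a in ys]
r, p = mantel(d_a, d_b)
print(r, p)
```

Rows and columns are permuted together because the entries of a distance matrix are not independent observations, which is why an ordinary correlation test on the flattened matrices would not be appropriate.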