• Title/Summary/Keyword: Dissimilarity

Semi-supervised learning using similarity and dissimilarity

  • Seok, Kyung-Ha
    • Journal of the Korean Data and Information Science Society / v.22 no.1 / pp.99-105 / 2011
  • We propose a semi-supervised learning algorithm based on a form of regularization that incorporates similarity and dissimilarity penalty terms. Our approach uses a graph-based encoding of similarity and dissimilarity. We also present a model-selection method that uses cross-validation to choose the hyperparameters affecting the performance of the proposed method. Simulations using two types of data sets demonstrate that the proposed method is promising.
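
To make the similarity and dissimilarity penalties concrete, here is a minimal Python sketch under stated assumptions: labels are in {-1, +1}, the similarity penalty is the usual graph term w_ij(f_i - f_j)^2, and the dissimilarity penalty is taken to be w_ij(f_i + f_j)^2, which is smallest when two linked nodes receive opposite signs. The squared loss, these penalty forms and the names `ssl_sim_dissim` and `solve` are illustrative choices, not necessarily the paper's exact formulation; `gamma_s` and `gamma_d` play the role of the hyperparameters that cross-validation would tune.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int, float]  # (node i, node j, weight)


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ssl_sim_dissim(n: int, labels: Dict[int, float], sim_edges: List[Edge],
                   dissim_edges: List[Edge], gamma_s: float = 1.0,
                   gamma_d: float = 1.0, ridge: float = 1e-6) -> List[float]:
    """Hypothetical objective (minimized in closed form):
         sum_{i labeled} (f_i - y_i)^2
       + gamma_s * sum_{similar (i,j)}    w_ij (f_i - f_j)^2
       + gamma_d * sum_{dissimilar (i,j)} w_ij (f_i + f_j)^2
       + ridge * sum_i f_i^2
    """
    a = [[ridge if i == j else 0.0 for j in range(n)] for i in range(n)]
    b = [0.0] * n
    for i, y in labels.items():      # data-fit term on labeled nodes
        a[i][i] += 1.0
        b[i] += y
    for i, j, w in sim_edges:        # pulls f_i and f_j toward the same value
        a[i][i] += gamma_s * w
        a[j][j] += gamma_s * w
        a[i][j] -= gamma_s * w
        a[j][i] -= gamma_s * w
    for i, j, w in dissim_edges:     # pushes f_i and f_j toward opposite signs
        a[i][i] += gamma_d * w
        a[j][j] += gamma_d * w
        a[i][j] += gamma_d * w
        a[j][i] += gamma_d * w
    return solve(a, b)               # zero gradient: A f = b


f = ssl_sim_dissim(4, {0: 1.0}, [(0, 1, 1.0), (2, 3, 1.0)], [(1, 2, 1.0)])
print([round(v, 3) for v in f])
```

In this four-node chain only node 0 is labeled (+1); nodes 0-1 and 2-3 are similar and nodes 1-2 are dissimilar, so the solution is close to [1, 1, -1, -1]: the dissimilarity edge propagates the opposite label to nodes 2 and 3.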

Design and evaluation of a dissimilarity-based anomaly detection method for mobile wireless networks (이동 무선망을 위한 비유사도 기반 비정상 행위 탐지 방법의 설계 및 평가)

  • Lee, Hwa-Ju;Bae, Ihn-Han
    • Journal of the Korean Data and Information Science Society / v.20 no.2 / pp.387-399 / 2009
  • Mobile wireless networks continue to be plagued by identity theft and intrusion. Both problems can be addressed in two different ways: misuse detection or anomaly-based detection. In this paper, we propose a dissimilarity-based anomaly detection method that can effectively identify abnormal behavior, such as abnormal mobility patterns, in mobile wireless networks. In the proposed algorithm, a normal profile is constructed from the normal mobility patterns of mobile nodes. The dissimilarity between observed behavior and this normal profile is then computed with a weighted dissimilarity measure. If the value of the weighted dissimilarity measure exceeds the dissimilarity threshold, which is a system parameter, an alert message is generated. The performance of the proposed method is evaluated through simulation, and the results show that it outperforms other anomaly detection methods that use dissimilarity measures.
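
The detection rule described above (normal profile, weighted dissimilarity, threshold alert) can be sketched as follows. The profile as a per-feature mean, the weighted mean absolute deviation, and the names `build_profile`, `weighted_dissimilarity` and `is_anomalous` are hypothetical; the abstract does not give the paper's actual mobility features or weighting scheme.

```python
from typing import List, Sequence


def build_profile(normal_patterns: List[Sequence[float]]) -> List[float]:
    """Hypothetical normal profile: per-feature mean over normal mobility feature vectors."""
    n = len(normal_patterns)
    return [sum(p[k] for p in normal_patterns) / n for k in range(len(normal_patterns[0]))]


def weighted_dissimilarity(x: Sequence[float], profile: Sequence[float],
                           weights: Sequence[float]) -> float:
    """Weighted mean absolute deviation of an observation from the profile (an assumed measure)."""
    return sum(w * abs(a - b) for w, a, b in zip(weights, x, profile)) / sum(weights)


def is_anomalous(x: Sequence[float], profile: Sequence[float],
                 weights: Sequence[float], threshold: float) -> bool:
    """Raise an alert when the dissimilarity exceeds the threshold (a system parameter)."""
    return weighted_dissimilarity(x, profile, weights) > threshold


profile = build_profile([[1.0, 0.0, 2.0], [1.0, 0.0, 4.0]])
print(profile, is_anomalous([3.0, 1.0, 3.0], profile, [1.0, 1.0, 2.0], 0.5))
```

Here the profile is [1.0, 0.0, 3.0], and the observation [3.0, 1.0, 3.0] has dissimilarity (2 + 1 + 0) / 4 = 0.75, so it triggers an alert at a threshold of 0.5.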

Spousal Dissimilarity in Age and Education and Marital Stability among Transnational Couples in Korea: A Test of the Transnational Openness Hypothesis (국제결혼 부부의 연령 및 교육수준 격차와 결혼안정성: 국제결혼개방성 가설의 검증)

  • Kim, Doo-Sub
    • Korea journal of population studies / v.35 no.1 / pp.1-30 / 2012
  • This study explores the effects of spousal dissimilarity on marital stability among transnational couples in Korea. Utilizing micro-data from the 2009 Korean National Multi-culture Family Survey, this paper examines whether formation of transnational marriage generally involves positive assortative matching on age and education. Indices of age dissimilarity and educational dissimilarity are calculated for each country of origin of the foreign wife, and their relationships to the average duration of marriage are analyzed. This study also conducts a micro-level analysis of whether age and educational dissimilarity between spouses helps explain variations in marital duration and probability of getting divorced. Results show greater incidences of spousal dissimilarity in age and educational attainment among transnational couples, which supports the transnational openness hypothesis proposed in this paper. The extant hypothesis that spousal dissimilarity increases the risk of marital dissolution and shortens the duration of marriage is not found to fit transnational couples in Korea.

On Optimizing Dissimilarity-Based Classifications Using a DTW and Fusion Strategies (DTW와 퓨전기법을 이용한 비유사도 기반 분류법의 최적화)

  • Kim, Sang-Woon;Kim, Seung-Hwan
    • Journal of the Institute of Electronics Engineers of Korea CI / v.47 no.2 / pp.21-28 / 2010
  • This paper reports experimental results on optimizing dissimilarity-based classification (DBC) by simultaneously using dynamic time warping (DTW) and a multiple fusion strategy (MFS). DBC is a way of defining classifiers among classes that are based not on the feature measurements of individual samples but on a suitable dissimilarity measure among the samples. In DTW, the dissimilarity is measured in two steps: first, the object samples are aligned by finding the best warping path with a correlation coefficient-based DTW technique; then, the dissimilarity distance between the aligned objects is computed with conventional measures. In MFS, fusion strategies are used repeatedly, both in generating dissimilarity matrices and in designing classifiers: the dissimilarity matrices obtained with the DTW technique are first combined into a new matrix, and after several base classifiers are trained on the new matrix, their results are combined again. Experimental results on well-known benchmark databases demonstrate that the proposed mechanism further improves classification accuracy compared with previous approaches. These results suggest that the method could also be applied to other high-dimensional tasks, such as multimedia information retrieval.
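
A rough sketch of DTW-based dissimilarity representations with one fusion step, under these assumptions: plain DTW with an absolute or squared local cost stands in for the paper's correlation coefficient-based DTW; the resulting matrices are fused by element-wise averaging; and only the matrix-level fusion is shown, with a single 1-NN classifier in place of fused base classifiers. All function names are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Cost = Callable[[float, float], float]


def dtw(a: Sequence[float], b: Sequence[float],
        cost: Cost = lambda x, y: abs(x - y)) -> float:
    """Classic DTW: minimal cumulative local cost over all monotone warping paths."""
    n, m = len(a), len(b)
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = cost(a[i - 1], b[j - 1]) + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def dissimilarity_matrix(objs, protos, cost: Cost) -> List[List[float]]:
    """Row r is the dissimilarity representation of objs[r] with respect to the prototypes."""
    return [[dtw(o, p, cost) for p in protos] for o in objs]


def fuse(mats: List[List[List[float]]]) -> List[List[float]]:
    """Element-wise average of several dissimilarity matrices (one simple fusion rule)."""
    k = len(mats)
    return [[sum(m[r][c] for m in mats) / k for c in range(len(mats[0][0]))]
            for r in range(len(mats[0]))]


# Two local costs give two DTW dissimilarity matrices that are then fused.
COSTS: List[Cost] = [lambda x, y: abs(x - y), lambda x, y: (x - y) ** 2]


def represent(objs, protos) -> List[List[float]]:
    return fuse([dissimilarity_matrix(objs, protos, c) for c in COSTS])


def nn_classify(train_rows, train_labels, test_row):
    """1-NN with Euclidean distance in the fused dissimilarity space."""
    best = min(range(len(train_rows)), key=lambda i: math.dist(train_rows[i], test_row))
    return train_labels[best]


protos = [[0.0, 0.0, 0.0, 0.0], [5.0, 5.0, 5.0, 5.0]]
train_rows = represent(protos, protos)
test_row = represent([[0.0, 1.0, 0.0, 0.0]], protos)[0]
print(nn_classify(train_rows, ["low", "high"], test_row))
```

Using the prototypes themselves as training objects keeps the example tiny; the series [0, 1, 0, 0] gets the fused representation [1, 55] and is assigned to "low". Representing objects by rows of dissimilarities to prototypes is what lets ordinary vector classifiers run on top of DTW.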

Studies on structural interaction and performance of cement composite using Molecular Dynamics

  • Sindu, B.S.;Alex, Aleena;Sasmal, Saptarshi
    • Advances in Computational Design / v.3 no.2 / pp.147-163 / 2018
  • Cementitious composites are multiphase heterogeneous materials with a distinct dissimilarity in strength under compression and tension (high under compression and very low under tension). At the macro scale, the phenomenon can be explained by the physical heterogeneity and pores in the material. Interestingly, however, this dissimilarity originates at the molecular level, where there is no heterogeneity. Molecular dynamics-based computational investigations are therefore carried out on cement clinkers and calcium silicate hydrate (C-S-H) under tension and compression to trace the origin of the dissimilarity. The study also investigates the effect of strain rate, size of the computational volume and presence of unstructured atoms on the obtained response. Certain types of molecular interactions and molecular structural parameters are identified as responsible for the dissimilarity in behavior. Hence, a judiciously modified or tailored molecular structure could not only reduce the extent of dissimilarity but also incorporate desired properties into heterogeneous composites. The findings of this study could help in scientifically altering the structure of cementitious composites to attain desired mechanical properties.

A Dissimilarity with Dice-Jaro-Winkler Test Case Prioritization Approach for Model-Based Testing in Software Product Line

  • Sulaiman, R. Aduni;Jawawi, Dayang N.A.;Halim, Shahliza Abdul
    • KSII Transactions on Internet and Information Systems (TIIS) / v.15 no.3 / pp.932-951 / 2021
  • The effectiveness of testing in Model-based Testing (MBT) for Software Product Lines (SPL) can be improved by considering fault detection in test cases. Without such consideration, the test cases in a test suite are listed in random order. Test Case Prioritization (TCP) is a regression testing technique that aims to detect faults as early as possible by reordering test cases based on their fault detection rate. However, few studies have measured faults in MBT for SPL. This paper proposes a TCP approach based on dissimilarity and string-based distance, called Last Minimal for Local Maximal Distance (LM-LMD) with Dice-Jaro-Winkler Dissimilarity. It adopts Local Maximum Distance as the prioritization algorithm and the Dice-Jaro-Winkler similarity measure to evaluate the distance among test cases. The work is based on test cases generated from statecharts in the SPL domain. The results are promising: LM-LMD with Dice-Jaro-Winkler Dissimilarity outperformed the original Local Maximum Distance, Global Maximum Distance and Enhanced All-yes Configuration algorithms in terms of Average Percentage of Faults Detected (APFD) and average prioritization time.
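
The sketch below illustrates the ingredients named above: Jaro-Winkler and bigram Dice string similarities, a dissimilarity-driven ordering, and the APFD metric. It is not LM-LMD: the ordering is a generic greedy farthest-first (maximin) rule seeded with the first test case, and averaging Dice and Jaro-Winkler into one dissimilarity is an assumption, since the abstract does not say how the two are combined. Test cases are assumed to be encoded as strings (for example, statechart paths), and all function names are hypothetical.

```python
from typing import Dict, List, Set


def jaro(s: str, t: str) -> float:
    if s == t:
        return 1.0
    if not s or not t:
        return 0.0
    window = max(max(len(s), len(t)) // 2 - 1, 0)
    s_hit, t_hit = [False] * len(s), [False] * len(t)
    matches = 0
    for i, ch in enumerate(s):  # count matching characters within the window
        for j in range(max(0, i - window), min(len(t), i + window + 1)):
            if not t_hit[j] and t[j] == ch:
                s_hit[i] = t_hit[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    k = half_transpositions = 0
    for i, ch in enumerate(s):  # matched characters that appear in a different order
        if s_hit[i]:
            while not t_hit[k]:
                k += 1
            if ch != t[k]:
                half_transpositions += 1
            k += 1
    trans = half_transpositions / 2
    return (matches / len(s) + matches / len(t) + (matches - trans) / matches) / 3


def jaro_winkler(s: str, t: str, p: float = 0.1) -> float:
    """Jaro similarity boosted by a common prefix of up to 4 characters."""
    j = jaro(s, t)
    prefix = 0
    for a, b in zip(s[:4], t[:4]):
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


def dice(s: str, t: str) -> float:
    """Dice coefficient on sets of character bigrams."""
    x = {s[i:i + 2] for i in range(len(s) - 1)}
    y = {t[i:i + 2] for i in range(len(t) - 1)}
    if not x and not y:
        return 1.0 if s == t else 0.0
    return 2 * len(x & y) / (len(x) + len(y))


def dissimilarity(s: str, t: str) -> float:
    # ASSUMPTION: combine the two similarities by a simple average.
    return 1.0 - (dice(s, t) + jaro_winkler(s, t)) / 2


def prioritize(tests: Dict[str, str]) -> List[str]:
    """Greedy farthest-first: next pick maximizes its minimum dissimilarity to those already chosen."""
    names = list(tests)
    order, remaining = names[:1], names[1:]
    while remaining:
        nxt = max(remaining,
                  key=lambda n: min(dissimilarity(tests[n], tests[o]) for o in order))
        order.append(nxt)
        remaining.remove(nxt)
    return order


def apfd(order: List[str], faults: Dict[str, Set[str]]) -> float:
    """APFD = 1 - sum(TF_i)/(n*m) + 1/(2n); every fault must be exposed by some test in order."""
    n, m = len(order), len(faults)
    tf = [min(order.index(t) + 1 for t in revealing) for revealing in faults.values()]
    return 1 - sum(tf) / (n * m) + 1 / (2 * n)


print(prioritize({"t1": "abcab", "t2": "abcac", "t3": "xyzxy"}))
```

With these inputs the completely different `t3` is scheduled second, giving ["t1", "t3", "t2"]. For the APFD formula, the order ["A", "B", "C"] with fault f1 exposed by B and f2 by C gives 1 - 5/6 + 1/6 = 1/3.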

Evaluation on Development Performances of E-Commerce for 50 Major Cities in China (중국 주요 50개 도시의 전자상거래 발전성과에 대한 평가)

  • Jeong, Dong-Bin;Wang, Qiang
    • Journal of Distribution Science / v.14 no.1 / pp.67-74 / 2016
  • Purpose - In this paper, the degree of similarity and dissimilarity between pairs of 50 major cities in China is shown on the basis of three evaluation variables (internet businessman index, internet shopping index and e-commerce development index). A dissimilarity distance matrix is used to analyze both the similarity and the dissimilarity between each pair of the fifty cities, treating dissimilarity as distance: a higher value signifies a higher degree of dissimilarity between two cities. Cluster analysis is used to classify the 50 cities into groups such that similar cities are placed in the same group. In addition, the multidimensional scaling (MDS) technique provides a visual representation for exploring the pattern of proximities among the 50 cities based on the three development performance attributes. Research design, data, and methodology - This research uses the 2013 report provided by AliResearch in China (1/1/2013~11/30/2013) and applies multivariate methods such as a dissimilarity distance matrix, cluster analysis and MDS, using the CLUSTER, KMEANS, PROXIMITIES and ALSCAL procedures in SPSS 21.0. Results - This research applies two types of cluster analysis and MDS to three development performance attributes based on the 2013 AliResearch report. The results confirm that the cities can be grouped into four clusters sharing similar characteristics. MDS is used to position both the clusters and the 50 major cities belonging to each cluster. Since all the values for Shenzhen, Guangzhou and Hangzhou (cluster 1) are very large, these cities are superior to the other cities in all three evaluation attributes.
Twelve cities (Beijing, ShangHai, Jinghua, ZhuHai, XiaMen, SuZhou, NanJing, DongWan, ZhangShan, JiaXing, NingBo and FoShan), which belong to cluster 3, are inferior to those of cluster 1 in all three attributes, but they are expected to lead the next wave of the e-commerce revolution. In terms of the internet businessman index, on the other hand, Tainan, Taizhong, and Gaoxiong (cluster 2) are superior to the others; however, these three cities are inferior to the others on the internet shopping index. The remaining major cities, which belong to cluster 4, are relatively inferior in all three evaluation attributes, which in turn is expected to spur innovation and entrepreneurship that drive e-commerce development across China as a whole. Conclusions - This study suggests implications to help e-government officials and companies in both Korea and China formulate strategies. It is expected to provide useful information for understanding the recent state of e-commerce in China by examining the development performance of 50 major cities. Accordingly, marketing, branding and communication should be developed for online Chinese consumers. Such efforts could include incentives like loyalty points and coupons to encourage consumers, as well as building in-house logistics networks.
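
The distance-matrix and clustering steps can be sketched in Python as below. The study used SPSS procedures; this is an independent illustration that standardizes the three indices, builds a Euclidean dissimilarity matrix, and runs k-means with a deterministic farthest-first start (an assumption, not SPSS's initialization rule). The six rows in the example are made-up values, not AliResearch data, and MDS is omitted.

```python
import math
from typing import List, Sequence


def zscore(rows: List[Sequence[float]]) -> List[List[float]]:
    """Standardize each attribute (column) to mean 0 and sample SD 1."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / (len(c) - 1)) for c, m in zip(cols, means)]
    return [[(x - m) / s if s > 0 else 0.0 for x, m, s in zip(r, means, sds)] for r in rows]


def distance_matrix(rows: List[Sequence[float]]) -> List[List[float]]:
    """Symmetric dissimilarity matrix: Euclidean distance between every pair of cities."""
    return [[math.dist(a, b) for b in rows] for a in rows]


def kmeans(rows: List[Sequence[float]], k: int, iters: int = 100) -> List[int]:
    # Deterministic farthest-first initialization (assumption).
    centers = [list(rows[0])]
    while len(centers) < k:
        far = max(rows, key=lambda r: min(math.dist(r, c) for c in centers))
        centers.append(list(far))
    assign: List[int] = []
    for _ in range(iters):
        new = [min(range(k), key=lambda c: math.dist(r, centers[c])) for r in rows]
        if new == assign:
            break  # converged: assignments unchanged
        assign = new
        for c in range(k):  # move each center to the mean of its members
            members = [r for r, a in zip(rows, assign) if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


# Hypothetical (businessman, shopping, development) index values for six cities.
rows = [[90, 85, 88], [88, 90, 86], [20, 25, 22], [22, 20, 25], [50, 10, 30], [48, 12, 28]]
z = zscore(rows)
print(kmeans(z, 3))
```

Standardizing first keeps any one index from dominating the distances; with this toy data each consecutive pair of cities ends up in its own cluster.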

A Comparative Experiment on Dimensional Reduction Methods Applicable for Dissimilarity-Based Classifications (비유사도-기반 분류를 위한 차원 축소방법의 비교 실험)

  • Kim, Sang-Woon
    • Journal of the Institute of Electronics and Information Engineers / v.53 no.3 / pp.59-66 / 2016
  • This paper presents an empirical evaluation of dimensionality reduction strategies by which dissimilarity-based classification (DBC) can be implemented efficiently. In DBC, classification is based not on feature measurements of individual objects (a set of attributes) but on a suitable dissimilarity measure among the objects (pairwise object comparisons). One problem of DBC is the high dimensionality of the dissimilarity space when a large number of objects are involved. Two kinds of solutions to this issue have been proposed in the literature: prototype selection (PS)-based methods and dimension reduction (DR)-based methods. In this paper, instead of the PS-based or DR-based methods, a way of performing DBC in eigenspaces (ES) is considered and empirically compared. In ES-based DBC, classification is performed as follows: first, a set of principal eigenvectors is extracted from the training data set using principal component analysis; second, an eigenspace is spanned by a subset of the extracted and selected eigenvectors; third, after distances among the objects projected into the eigenspace are measured using $l_p$-norms as the dissimilarity, classification is performed. The experimental results, obtained using the nearest neighbor rule on artificial and real-life benchmark data sets, demonstrate that when the dimensionality of the eigenspace is selected appropriately, ES-based DBC can achieve higher classification accuracy than the PS-based and DR-based methods.
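
The three ES-based DBC steps (PCA, projection onto an eigenspace, then $l_p$ distances with the nearest neighbor rule) can be sketched in pure Python as follows. Power iteration with deflation stands in for a full eigendecomposition and is adequate only for small dimensionalities; the function names and the default choices of `k` (eigenspace dimension) and `p` (norm order) are illustrative assumptions, not the paper's settings.

```python
import math
import random
from typing import List, Sequence, Tuple

Vec = List[float]


def mean_and_cov(x: List[Sequence[float]]) -> Tuple[Vec, List[Vec]]:
    n, d = len(x), len(x[0])
    mean = [sum(r[j] for r in x) / n for j in range(d)]
    c = [[r[j] - mean[j] for j in range(d)] for r in x]
    cov = [[sum(row[a] * row[b] for row in c) / (n - 1) for b in range(d)] for a in range(d)]
    return mean, cov


def principal_eigenvectors(cov: List[Vec], k: int, iters: int = 500) -> List[Vec]:
    """Top-k eigenvectors via power iteration with deflation (small d only)."""
    d = len(cov)
    a = [row[:] for row in cov]
    rng = random.Random(0)  # fixed seed for a reproducible start vector
    vecs = []
    for _k in range(k):
        v = [rng.uniform(0.5, 1.5) for _ in range(d)]
        for _ in range(iters):
            w = [sum(a[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(t * t for t in w))
            v = [t / norm for t in w]  # assumes the (deflated) matrix is not zero
        lam = sum(v[i] * sum(a[i][j] * v[j] for j in range(d)) for i in range(d))
        vecs.append(v)
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]  # deflate
    return vecs


def project(x: Sequence[float], mean: Vec, vecs: List[Vec]) -> Vec:
    return [sum((xi - mi) * vi for xi, mi, vi in zip(x, mean, v)) for v in vecs]


def lp_dist(a: Sequence[float], b: Sequence[float], p: float = 2.0) -> float:
    return sum(abs(s - t) ** p for s, t in zip(a, b)) ** (1.0 / p)


def es_dbc_classify(train, labels, test_point, k: int = 1, p: float = 2.0):
    mean, cov = mean_and_cov(train)
    vecs = principal_eigenvectors(cov, k)
    z_train = [project(x, mean, vecs) for x in train]
    z_test = project(test_point, mean, vecs)
    best = min(range(len(train)), key=lambda i: lp_dist(z_train[i], z_test, p))
    return labels[best]


train = [[-5.0, 0.1], [-4.0, -0.2], [4.0, 0.3], [5.0, -0.1]]
print(es_dbc_classify(train, [0, 0, 1, 1], [4.5, 0.0]))
```

In this toy set almost all variance lies along the first axis, so a one-dimensional eigenspace keeps the class structure and the point [4.5, 0.0] is assigned label 1.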

Fuzzy Clustering of Fuzzy Data using a Dissimilarity Measure (비유사도 척도를 이용한 퍼지 데이터에 대한 퍼지 클러스터링)

  • Lee, Geon-Myeong
    • Journal of KIISE:Software and Applications / v.26 no.9 / pp.1114-1124 / 1999
  • The objective of clustering is to group a set of data into a number of clusters so as to maximize the similarity between data belonging to the same cluster and minimize the similarity between data belonging to different clusters. Data describing real-world objects often have both numeric and qualitative non-numeric attributes, and their values are frequently given not as exact values but as vague values due to observation error, uncertainty, subjective judgement, and so on. This paper proposes a dissimilarity measure for data with numeric and non-numeric attributes whose vague values are represented as fuzzy values, introduces a fuzzy clustering method for such data using the proposed measure, and presents experimental results showing the applicability of the clustering method and the dissimilarity measure.
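
The abstract does not spell out the proposed measure, so the following sketch substitutes two common choices as assumptions: the vertex distance for numeric attributes given as triangular fuzzy numbers, and one minus a fuzzy Jaccard similarity for non-numeric attributes given as fuzzy sets over categories. Per-attribute values are averaged, and the resulting dissimilarities are plugged into the standard fuzzy c-means membership formula in place of Euclidean distances. All names are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple, Union

TFN = Tuple[float, float, float]   # triangular fuzzy number (left, peak, right)
FuzzySet = Dict[str, float]        # category -> membership degree in [0, 1]
Value = Union[TFN, FuzzySet]


def tfn_distance(a: TFN, b: TFN) -> float:
    """Vertex distance between two triangular fuzzy numbers."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / 3)


def fuzzy_set_dissimilarity(a: FuzzySet, b: FuzzySet) -> float:
    """1 - fuzzy Jaccard similarity (sum of mins over sum of maxes)."""
    keys = set(a) | set(b)
    inter = sum(min(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    union = sum(max(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    return 1.0 - inter / union if union > 0 else 0.0


def dissimilarity(x: Sequence[Value], y: Sequence[Value],
                  scales: Sequence[Optional[float]]) -> float:
    """Average per-attribute dissimilarity; numeric distances are divided by a scale (e.g. range)."""
    parts = []
    for xa, ya, s in zip(x, y, scales):
        if isinstance(xa, dict):
            parts.append(fuzzy_set_dissimilarity(xa, ya))
        else:
            parts.append(tfn_distance(xa, ya) / s)
    return sum(parts) / len(parts)


def fcm_memberships(d: Sequence[float], m: float = 2.0) -> List[float]:
    """Fuzzy c-means membership of one object given its dissimilarities to c prototypes."""
    zeros = [i for i, v in enumerate(d) if v == 0]
    if zeros:  # object coincides with one or more prototypes
        return [1.0 / len(zeros) if v == 0 else 0.0 for v in d]
    e = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dk) ** e for dk in d) for di in d]


x = [(1.0, 2.0, 3.0), {"red": 1.0}]
y = [(1.0, 2.0, 3.0), {"red": 0.5, "orange": 0.5}]
print(dissimilarity(x, y, [10.0, None]), fcm_memberships([1.0, 3.0]))
```

The two objects differ only in the fuzzy colour attribute (dissimilarity 2/3), so their overall dissimilarity is 1/3; an object at dissimilarities 1 and 3 from two prototypes gets memberships 0.9 and 0.1 with fuzzifier m = 2.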

Categorical Data Clustering Analysis Using Association-based Dissimilarity (연관성 기반 비유사성을 활용한 범주형 자료 군집분석)

  • Lee, Changki;Jung, Uk
    • Journal of Korean Society for Quality Management / v.47 no.2 / pp.271-281 / 2019
  • Purpose: The purpose of this study is to propose a more efficient distance measure for categorical data cluster analysis that takes into account the relationships between categorical variables. Methods: The association-based dissimilarity was employed to calculate the distance between two categorical data observations, and the resulting distance was applied to the PAM clustering algorithm to verify its effectiveness. The dissimilarity between two values of a categorical variable is calculated as a mixture of the dissimilarities between the conditional probability distributions of the other categorical variables given these two values. This method is particularly suitable for data sets whose categorical variables are highly correlated. Results: Simulation results on several real-life data sets showed that the proposed distance, which considers relationships among the categorical variables, generally yielded better clustering performance than the Hamming distance. In addition, as the number of correlated variables increased, the difference in performance between the two clustering methods based on different distance measures became more statistically significant. Conclusion: This study revealed that accounting for the relationships between categorical variables through the proposed method positively affected the results of cluster analysis.
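
A minimal sketch of an association-based dissimilarity. It assumes total variation distance between the conditional distributions and a plain average over the other variables as the "mixture"; the paper's exact weighting may differ. The resulting record distance could be passed to any PAM (k-medoids) implementation, which is not shown, and all names are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence

Row = Sequence[Hashable]


def conditional(data: List[Row], i: int, a: Hashable, j: int) -> Dict[Hashable, float]:
    """Empirical P(A_j = . | A_i = a); assumes value a occurs in column i."""
    rows = [r for r in data if r[i] == a]
    counts = Counter(r[j] for r in rows)
    return {v: c / len(rows) for v, c in counts.items()}


def total_variation(p: Dict[Hashable, float], q: Dict[Hashable, float]) -> float:
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def value_dissimilarity(data: List[Row], i: int, a: Hashable, b: Hashable) -> float:
    """Average over the other variables j of TV(P(A_j | A_i=a), P(A_j | A_i=b))."""
    if a == b:
        return 0.0
    others = [j for j in range(len(data[0])) if j != i]
    return sum(total_variation(conditional(data, i, a, j), conditional(data, i, b, j))
               for j in others) / len(others)


def association_distance(data: List[Row], x: Row, y: Row) -> float:
    return sum(value_dissimilarity(data, i, x[i], y[i]) for i in range(len(x)))


def hamming(x: Row, y: Row) -> int:
    return sum(a != b for a, b in zip(x, y))


data = [("red", "hot", "summer"), ("red", "hot", "summer"), ("orange", "hot", "summer"),
        ("blue", "cold", "winter"), ("blue", "cold", "winter")]
print(association_distance(data, data[0], data[2]), hamming(data[0], data[2]))
```

Because "red" and "orange" co-occur with exactly the same values of the other variables, the association-based distance between the first and third records is 0, whereas the Hamming distance counts them as different (1); the records with no shared context are 3 apart under both measures.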