• Title/Summary/Keyword: euclidean similarity

Similarity Measurement Between Titles and Abstracts Using Bijection Mapping and Phi-Correlation Coefficient

  • John N. Mlyahilu;Jong-Nam Kim
    • Journal of the Institute of Convergence Signal Processing / v.23 no.3 / pp.143-149 / 2022
  • This paper presents a quantitative measure of the relationship between a research title and its abstract, using journal articles indexed in the Korean Citation Index (KCI) database. We propose a machine learning-based similarity metric that does not assume normality of the dataset and that addresses the imbalanced-dataset and zero-variance problems that affect most rule-based algorithms. The algorithm avoids the limitations of the Pearson correlation coefficient (r) and also handles imbalanced datasets. A total of 107 journal articles collected from the database were used to build a corpus containing the authors, year of publication, title, and abstract of each article. In the experiments, the proposed algorithm achieved higher correlation coefficient values than cosine similarity, Euclidean distance, and the Pearson correlation coefficient, reaching a maximum correlation of 1, whereas the other measures returned not-a-number (NaN) values in some experiments. From these results, we conclude that an effective title must have a high correlation coefficient with its abstract.
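
The zero-variance problem mentioned in this abstract can be illustrated with a minimal sketch. It does not reproduce the paper's bijection mapping, which the abstract does not describe; it only shows the textbook Pearson and phi coefficients, and the 0/1 term-presence vectors are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson r; NaN when either vector has zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    denom = math.sqrt(sxx * syy)
    return sxy / denom if denom > 0 else float("nan")

def phi(x, y):
    """Phi coefficient of two 0/1 vectors, from the 2x2 contingency table."""
    pairs = list(zip(x, y))
    n11 = pairs.count((1, 1))
    n10 = pairs.count((1, 0))
    n01 = pairs.count((0, 1))
    n00 = pairs.count((0, 0))
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return (n11 * n00 - n10 * n01) / denom if denom > 0 else float("nan")

# Hypothetical term-presence vectors for a title and its abstract.
title_terms = [1, 1, 0, 0, 1]
abstract_terms = [1, 1, 0, 1, 1]
print(phi(title_terms, abstract_terms))   # equals Pearson r on 0/1 data
print(pearson([1, 1, 1], [1, 0, 1]))      # nan: the first vector is constant
```

The textbook phi coefficient is also undefined when a row or column of the contingency table is empty, so any fix for that case would have to come from the paper's own mapping step.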

Trajectory Clustering in Road Network Environment (도로 네트워크 환경을 위한 궤적 클러스터링)

  • Bak, Ji-Haeng;Won, Jung-Im;Kim, Sang-Wook
    • The KIPS Transactions:PartD / v.16D no.3 / pp.317-326 / 2009
  • Recently, there have been many research efforts on trajectory information. Most of them focus on objects moving in Euclidean space. Many real-world applications such as telematics, however, deal with objects that move only over road networks, where movement is highly restricted. Thus, the existing methods targeting Euclidean space cannot be directly applied to the road network space. This paper proposes a new clustering scheme for a large volume of trajectory information of objects moving over road networks. To this end, we first define a trajectory on a road network as the sequence of road segments that a moving object has passed. Next, we propose a similarity measurement scheme that judges the degree of similarity by considering the total length of matched road segments. Based on this similarity measurement, we propose a new clustering algorithm for trajectories by modifying and adjusting the FastMap and hierarchical clustering schemes. To evaluate the performance of the proposed clustering scheme, we also develop a trajectory generator based on the observation that most objects tend to move from the starting point to the destination along their shortest path, and perform a variety of experiments using the trajectories thus generated. The results show that our scheme achieves an accuracy of over 95% compared with clusterings judged by human beings.
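
A similarity based on the total length of matched road segments might look like the sketch below. The normalisation by the combined trajectory length is an assumption, since the abstract does not give the exact formula, and the segment IDs and lengths are hypothetical.

```python
def trajectory_similarity(traj_a, traj_b, segment_length):
    """Share of road length two trajectories have in common, in [0, 1].

    Each trajectory is a sequence of road-segment IDs. Dividing by the
    combined length is an assumed normalisation, not the paper's formula.
    """
    segs_a, segs_b = set(traj_a), set(traj_b)
    matched = sum(segment_length[s] for s in segs_a & segs_b)
    total = (sum(segment_length[s] for s in segs_a)
             + sum(segment_length[s] for s in segs_b))
    return 2 * matched / total if total > 0 else 0.0

# Hypothetical road segments with lengths in metres.
lengths = {"s1": 100, "s2": 200, "s3": 50, "s4": 150}
sim = trajectory_similarity(["s1", "s2", "s3"], ["s2", "s3", "s4"], lengths)
print(sim)  # 250 m matched out of 350 m + 400 m
```

A distance such as `1 - sim` could then be passed to a hierarchical clustering routine.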

A New Similarity Measure based on RMF and Its Application to Linguistic Approximation (상대적 소수 함수에 기반을 둔 새로운 유사성 측도와 언어 근사에의 응용)

  • Choe, Dae-Yeong
    • The KIPS Transactions:PartB / v.8B no.5 / pp.463-468 / 2001
  • We propose a new similarity measure based on the relative membership function (RMF). The RMF is suggested as a simple way to represent the relativity between fuzzy subsets. Since the shape of the RMF is determined by the values of its parameters, we can represent the relativity between fuzzy subsets by adjusting only those parameter values. Hence, we can easily reflect the relativity among individuals or cultural differences when we represent subjectivity using fuzzy subsets. In this case, the parameters may be regarded as feature points that determine the structure of a fuzzy subset. As a result, the degree of similarity between fuzzy subsets can be computed quickly from the parameters of the RMF. We use the Euclidean distance to compute the degree of similarity between fuzzy subsets represented by the RMF. In addition, we present a new linguistic approximation method as an application of the proposed similarity measure and show a numerical example.
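
Comparing fuzzy subsets through their parameter vectors could be sketched as follows. The two-parameter representation and the `1 / (1 + d)` conversion from distance to similarity are assumptions made for illustration; the abstract does not specify the RMF parameters or the conversion.

```python
import math

def parameter_similarity(params_a, params_b):
    """Similarity in (0, 1] from the Euclidean distance between parameter vectors.

    The 1 / (1 + d) mapping is an assumed choice, not taken from the paper.
    """
    return 1.0 / (1.0 + math.dist(params_a, params_b))

# Hypothetical (centre, spread) parameters for "tall" as judged by two groups.
tall_group_a = [180.0, 10.0]
tall_group_b = [183.0, 14.0]
print(parameter_similarity(tall_group_a, tall_group_b))  # distance is 5.0
```

Because only a handful of parameters are compared, the cost does not depend on how finely the membership functions would otherwise be sampled.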

Genetic Distances between Two Echiuran Populations Discriminated by PCR

  • Yoon, Jong-Man
    • Development and Reproduction / v.23 no.4 / pp.377-384 / 2019
  • Genomic DNA extracted from representatives of two populations, Gunsan and Chinese, of Urechis spp. was amplified using PCR with several primers. The band-sharing (BS) value between individual no. 05 from the Gunsan population and no. 22 from the Chinese population was 0.206, which was the lowest recognized value. Oligonucleotide primer OPC-04 revealed 44 unique loci, which distinguished the Chinese population. Primer OPB-17 allowed the discovery of 22 loci shared by the two populations, which were present in all samples. Based on the average BS results, individuals from the Gunsan population demonstrated lower BS values (0.661±0.012) than did those from the Chinese population (0.788±0.014; p<0.05). The shortest genetic distance (GD) displaying a noteworthy molecular difference was between individuals CHINESE no. 12 and no. 13 (GD=0.027). Individual no. 06 from the Gunsan population was most distantly related to CHINESE no. 22 (GD=0.703). A group tree of the two populations was constructed by UPGMA Euclidean GD analysis based on a total of 543 fragments generated using six primers. The explicit markers recognized in this study will be used for genetic analysis, as well as to evaluate the species security and proliferation of echiuran individuals in intertidal regions of the Korean Peninsula.
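
A band-sharing value can be computed as in the sketch below. The abstract does not state its formula; this uses a commonly cited definition, BS = 2·N_ab / (N_a + N_b), as an assumption, and the band positions are hypothetical.

```python
def band_sharing(bands_a, bands_b):
    """BS = 2 * shared bands / (bands in A + bands in B), in [0, 1].

    This definition is assumed; the abstract does not give the formula.
    """
    bands_a, bands_b = set(bands_a), set(bands_b)
    total = len(bands_a) + len(bands_b)
    return 2 * len(bands_a & bands_b) / total if total > 0 else 0.0

# Hypothetical band positions (fragment sizes in bp) for two individuals.
individual_a = {100, 250, 400, 800}
individual_b = {100, 400, 650}
print(band_sharing(individual_a, individual_b))  # 2 shared bands out of 4 + 3
```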

Genetic Differences in Natural and Cultured River Pufferfish Populations by PCR Analysis

  • Yoon, Jong-Man
    • Development and Reproduction / v.24 no.4 / pp.327-335 / 2020
  • Genomic DNA (gDNA) extracted from two populations of natural and cultured river pufferfish (Takifugu obscurus) was amplified by polymerase chain reaction (PCR). The complexity of the fragments derived from the two locations varied dramatically. The genetic distance (GD) between individuals no. 15 and no. 12 in the cultured population was 0.053, which was the lowest observed. The oligonucleotide primer OPC-11 identified 88 unique loci shared within each population reflecting the natural population. The OPC-05 primer identified 44 loci shared by the two populations. The average band-sharing (BS) values of individuals in the natural population (0.683±0.014) were lower than those of individuals from the cultured population (0.759±0.009) (p<0.05). The shortest GD demonstrating a significant molecular difference was found between cultured individuals no. 15 and no. 12 (GD=0.053). Individual no. 02 of the natural population was most distantly related to cultured individual no. 22 (GD=0.827). A cluster tree was built using unweighted pair group method with arithmetic mean (UPGMA) Euclidean GD analysis based on a total of 578 fragments derived from five primers in the two populations. The markers identified in this study represent the genetic structure, species security, and proliferation of river pufferfish in the rivers of the Korean Peninsula.
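
UPGMA tree building from a genetic distance matrix can be sketched as follows. This is a generic textbook version, not the software used in the study, and the labels and distances are hypothetical.

```python
from itertools import combinations

def upgma(labels, matrix):
    """Build a nested-tuple tree by UPGMA from a symmetric distance matrix."""
    # Active clusters: id -> (subtree, number of leaves).
    clusters = {i: (labels[i], 1) for i in range(len(labels))}
    dist = {(i, j): matrix[i][j]
            for i, j in combinations(range(len(labels)), 2)}
    next_id = len(labels)
    while len(clusters) > 1:
        i, j = min(dist, key=dist.get)          # closest pair of clusters
        (tree_i, n_i), (tree_j, n_j) = clusters.pop(i), clusters.pop(j)
        del dist[(i, j)]
        for k in clusters:
            # Size-weighted mean keeps every leaf pair equally weighted.
            d_ik = dist.pop((min(i, k), max(i, k)))
            d_jk = dist.pop((min(j, k), max(j, k)))
            dist[(k, next_id)] = (n_i * d_ik + n_j * d_jk) / (n_i + n_j)
        clusters[next_id] = ((tree_i, tree_j), n_i + n_j)
        next_id += 1
    return next(iter(clusters.values()))[0]

# Hypothetical distances: two natural (N) and two cultured (C) individuals.
names = ["N01", "N02", "C01", "C02"]
gd = [[0.0, 0.1, 0.8, 0.8],
      [0.1, 0.0, 0.8, 0.8],
      [0.8, 0.8, 0.0, 0.2],
      [0.8, 0.8, 0.2, 0.0]]
tree = upgma(names, gd)
print(tree)
```

With these distances the two natural individuals merge first, then the two cultured ones, and the two pairs join last.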

Face Image Retrieval by Using Eigenface Projection Distance (고유영상 투영거리를 이용한 얼굴영상 검색)

  • Lim, Kil-Taek
    • Journal of Korea Society of Industrial Information Systems / v.14 no.5 / pp.43-51 / 2009
  • In this paper, we propose an efficient method of face retrieval using PCA (principal component analysis) based features. A coarse-to-fine strategy is adopted: the retrieval results are first sorted in a lower-dimensional eigenface space, and the high-ranking candidates are then rearranged in a higher-dimensional eigenface space. To evaluate the similarity between a query face image and a class reference image, we use the PD (projection distance), MQDF (modified quadratic distance function), and MED (minimum Euclidean distance). The experimental results show that the proposed method, which rearranges the retrieval results incrementally using the projection distance, is efficient for face image retrieval.
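
The coarse-to-fine idea can be sketched as below. The sketch uses plain Euclidean distance rather than the paper's projection distance, and it assumes that the feature vectors are already eigenface coefficients ordered by decreasing eigenvalue; the gallery values are hypothetical.

```python
import math

def prefix_distance(u, v, dims):
    """Euclidean distance using only the first `dims` coefficients."""
    return math.dist(u[:dims], v[:dims])

def coarse_to_fine(query, gallery, low_dims, high_dims, top_k):
    """Rank all faces in a low-dimensional space, then re-rank the top_k
    candidates using more coefficients. The tail keeps its coarse order."""
    coarse = sorted(
        gallery, key=lambda name: prefix_distance(query, gallery[name], low_dims))
    head = sorted(
        coarse[:top_k],
        key=lambda name: prefix_distance(query, gallery[name], high_dims))
    return head + coarse[top_k:]

# Hypothetical eigenface coefficients for three reference faces.
gallery = {
    "A": [1.0, 0.0, 5.0, 0.0],
    "B": [1.1, 0.1, 0.0, 0.1],
    "C": [4.0, 4.0, 0.0, 0.0],
}
query = [1.0, 0.0, 0.0, 0.0]
ranking = coarse_to_fine(query, gallery, low_dims=2, high_dims=4, top_k=2)
print(ranking)
```

Here face A wins the coarse pass on two coefficients, but the fine pass on four coefficients moves B ahead of it.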

A Study on the Synthetic ECG Generation for User Recognition (사용자 인식을 위한 가상 심전도 신호 생성 기술에 관한 연구)

  • Kim, Min Gu;Kim, Jin Su;Pan, Sung Bum
    • Smart Media Journal / v.8 no.4 / pp.33-37 / 2019
  • Because ECG signals are time-series data acquired as time elapses, it is important that the comparative data be the same size as the enrolled data every time. This paper proposes a GAN (Generative Adversarial Network) model based on an auxiliary classifier to generate synthetic ECG signals, which may address the data size mismatch. Cosine similarity and cross-correlation are used to examine the similarity of the synthetic ECG signals. The analysis shows that the average cosine similarity was 0.991 and the average Euclidean distance similarity based on cross-correlation was 0.25. These results indicate that the data size difference issue can be resolved, and that synthetic ECG signals similar to real ECG signals can be generated even when the registered data are not the same size as the comparative data.
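
Cosine similarity between a real and a synthetic signal of equal length can be sketched as follows. The sample values are hypothetical, and the sketch does not cover the cross-correlation-based measure, which the abstract does not define.

```python
import math

def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length signals."""
    if len(x) != len(y):
        raise ValueError("signals must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm > 0 else 0.0

# Hypothetical samples from one real and one synthetic heartbeat.
real_beat = [0.0, 0.1, 0.9, 0.2, -0.1, 0.0]
synthetic_beat = [0.0, 0.2, 0.8, 0.3, -0.1, 0.0]
print(cosine_similarity(real_beat, synthetic_beat))
```

The length check reflects the point made in the abstract: the measure is only defined when both signals have the same number of samples.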

Hierarchic Document Clustering in OPAC (OPAC에서 자동분류 열람을 위한 계층 클러스터링 연구)

  • 노정순
    • Journal of the Korean Society for information Management / v.21 no.1 / pp.93-117 / 2004
  • This study develops a hierarchic clustering model for document classification and browsing in OPAC systems. Two automatic indexing techniques (with and without controlled terms), two term weighting methods (based on term frequency and binary weight), five similarity coefficients (Dice, Jaccard, Pearson, Cosine, and Squared Euclidean), and three hierarchic clustering algorithms (Between Average Linkage, Within Average Linkage, and Complete Linkage method) were tested on a document collection of 175 books and theses on library and information science. The best document clusters resulted from the Between Average Linkage or Complete Linkage method with the Jaccard or Dice coefficient, applied to automatic indexing with controlled terms in binary vectors. The clusters from Between Average Linkage with Jaccard more closely resemble a decimal classification structure.
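
For binary term weights, the Jaccard and Dice coefficients reduce to set operations on the index terms of two documents. A minimal sketch with hypothetical index terms:

```python
def jaccard(terms_a, terms_b):
    """|A ∩ B| / |A ∪ B| for two sets of index terms."""
    union = terms_a | terms_b
    return len(terms_a & terms_b) / len(union) if union else 0.0

def dice(terms_a, terms_b):
    """2 |A ∩ B| / (|A| + |B|) for two sets of index terms."""
    total = len(terms_a) + len(terms_b)
    return 2 * len(terms_a & terms_b) / total if total else 0.0

# Hypothetical controlled terms assigned to two books.
book_a = {"library", "catalog", "indexing"}
book_b = {"library", "indexing", "retrieval", "classification"}
print(jaccard(book_a, book_b))  # 2 shared terms out of 5 distinct terms
print(dice(book_a, book_b))     # 2 * 2 shared terms out of 3 + 4 terms
```

The two coefficients always rank document pairs in the same order, since Dice equals 2J / (1 + J) for Jaccard value J.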

VRTEC : Multi-step Retrieval Model for Content-based Video Query (VRTEC : 내용 기반 비디오 질의를 위한 다단계 검색 모델)

  • 김창룡
    • Journal of the Korean Institute of Telematics and Electronics T / v.36T no.1 / pp.93-102 / 1999
  • In this paper, we propose a data model and a retrieval method for content-based video query. After a video is partitioned into frame sets of the same length, called video-windows, each video-window can be mapped to a point in a multidimensional space. A video can then be represented as a trajectory by connecting neighboring video-windows in that space. The similarity between two video-windows is defined as the Euclidean distance between the two points in the multidimensional space, and the similarity between two video segments of arbitrary length is obtained by comparing the corresponding trajectories. A new retrieval method with filtering and refinement steps is developed, which returns correct results and increases retrieval speed by approximately 4.7 times compared with a method without the filtering and refinement steps.
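
The video-window model can be sketched as below. Mapping a window to the mean of its per-frame feature vectors and comparing only equal-length trajectories are both assumptions made for illustration; the abstract specifies neither the mapping nor the trajectory comparison, and the filtering step is not covered.

```python
import math

def to_trajectory(frames, window_len):
    """Split per-frame feature vectors into fixed-length windows and map each
    window to one point (here: the mean vector, an assumed mapping).
    An incomplete trailing window is dropped."""
    points = []
    for start in range(0, len(frames) - window_len + 1, window_len):
        window = frames[start:start + window_len]
        points.append([sum(col) / window_len for col in zip(*window)])
    return points

def trajectory_distance(traj_a, traj_b):
    """Mean Euclidean distance between corresponding video-window points."""
    if not traj_a or len(traj_a) != len(traj_b):
        raise ValueError("trajectories must be non-empty and equally long")
    return sum(math.dist(p, q) for p, q in zip(traj_a, traj_b)) / len(traj_a)

# Hypothetical 2-D frame features for two four-frame clips.
clip_a = [[0, 0], [2, 2], [4, 4], [6, 6]]
clip_b = [[0, 0], [2, 2], [4, 7], [6, 9]]
traj_a = to_trajectory(clip_a, window_len=2)   # [[1.0, 1.0], [5.0, 5.0]]
traj_b = to_trajectory(clip_b, window_len=2)   # [[1.0, 1.0], [5.0, 8.0]]
print(trajectory_distance(traj_a, traj_b))
```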

Performance Improvement of Deep Clustering Networks for Multi Dimensional Data (다차원 데이터에 대한 심층 군집 네트워크의 성능향상 방법)

  • Lee, Hyunjin
    • Journal of Korea Multimedia Society / v.21 no.8 / pp.952-959 / 2018
  • Clustering is one of the most fundamental algorithms in machine learning. The performance of clustering is affected by the distribution of the data, and it degrades as the amount of data or the number of dimensions grows. For this reason, we use a stacked autoencoder, one of the deep learning algorithms, to reduce the dimension of the data and generate a feature vector that best represents the input data. We use k-means, a well-known algorithm, for clustering. Since the dimension-reduced feature vectors are still multidimensional, we use the cosine similarity as well as the Euclidean distance when calculating the similarity between a cluster center and the data as vectors, in order to increase performance. A deep clustering network combining a stacked autoencoder and k-means retrains the network when the k-means result changes. When retraining the network, the loss function of the stacked autoencoder and the loss function of k-means are combined to improve the performance and stability of the network. Experiments on benchmark image and document datasets empirically validated the power of the proposed algorithm.
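
A k-means assignment step that uses both the Euclidean distance and the cosine similarity could look like the sketch below. The additive combination and the weight `alpha` are assumptions; the abstract does not say how the two measures are combined, and the autoencoder is omitted.

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity; defined as 1.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm > 0 else 1.0

def assign(point, centers, alpha=1.0):
    """Index of the center with the smallest combined score.

    score = Euclidean distance + alpha * cosine distance (assumed combination).
    """
    scores = [math.dist(point, c) + alpha * cosine_distance(point, c)
              for c in centers]
    return scores.index(min(scores))

# Hypothetical 2-D feature vectors produced by an encoder.
centers = [[1.0, 0.0], [0.0, 1.0]]
print(assign([0.9, 0.2], centers))  # closest to the first center
print(assign([0.1, 0.8], centers))  # closest to the second center
```

Setting `alpha=0` recovers the standard Euclidean k-means assignment.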