• Title/Summary/Keyword: cosine similarity measure

Fingerprint Matching Based on Dimension Reduced DCT Feature Vectors

  • Bharkad, Sangita; Kokare, Manesh
    • Journal of Information Processing Systems / v.13 no.4 / pp.852-862 / 2017
  • In this work, a Discrete Cosine Transform (DCT)-based approach with reduced feature dimensionality is proposed for fingerprint matching. The DCT is applied to a small region around the core point of the fingerprint image. The performance of the proposed method is evaluated on a small database from Bologna University and two large databases from FVC2000. A dimensionally reduced feature vector is formed using only approximately 19%, 7%, and 6% of the DCT coefficients for the Bologna University database and the two FVC2000 databases, respectively. We compared the results of the proposed method with the discrete wavelet transform (DWT) method, the rotated wavelet filters (RWFs) method, and the combinations DWT+RWF and DWT+(HL+LH subbands of RWF). Compared with the DWT-based feature extraction method, the proposed method reduces the false acceptance rate from approximately 18% to 4% on DB1 (Bologna University database), from approximately 29% to 16% on DB2 (FVC2000), and from approximately 26% to 17% on DB3 (FVC2000).
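The abstract describes applying a DCT to a window around the core point and keeping only a small fraction of the coefficients. The sketch below illustrates that idea under stated assumptions: the window size (64 x 64), ordering coefficients by increasing frequency index u + v, and Euclidean distance as the matcher are all hypothetical choices, not the authors' exact settings. `core_region` is a hypothetical helper, and the core point is assumed to be already located.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis matrix (rows are frequencies)."""
    k = np.arange(n)[:, None]
    x = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * x + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)  # the DC row has its own scale factor
    return m


def dct2(block: np.ndarray) -> np.ndarray:
    """Separable 2-D DCT-II of a square block: D @ B @ D^T."""
    d = dct_matrix(block.shape[0])
    return d @ block @ d.T


def core_region(img: np.ndarray, core: tuple, size: int) -> np.ndarray:
    """Hypothetical helper: crop a size x size window centred on the core point (row, col)."""
    r, c = core
    top, left = r - size // 2, c - size // 2
    return img[top:top + size, left:left + size].astype(float)


def dct_feature(img: np.ndarray, core: tuple, size: int = 64,
                keep_fraction: float = 0.07) -> np.ndarray:
    """Keep only the lowest-frequency fraction of the DCT coefficients, ordered by u + v."""
    coeffs = dct2(core_region(img, core, size))
    u, v = np.indices(coeffs.shape)
    order = np.argsort((u + v).ravel(), kind="stable")  # DC first, then rising frequency
    n_keep = max(1, int(round(keep_fraction * coeffs.size)))
    return coeffs.ravel()[order[:n_keep]]


def match_distance(f1: np.ndarray, f2: np.ndarray) -> float:
    """Assumed matcher: Euclidean distance between feature vectors (smaller = more similar)."""
    return float(np.linalg.norm(f1 - f2))


rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(300, 300))
f_a = dct_feature(img, core=(150, 150))
f_b = dct_feature(img, core=(152, 149))
print(f_a.shape, match_distance(f_a, f_b))
```

With a 64 x 64 window and `keep_fraction=0.07`, each fingerprint is reduced to 287 numbers instead of 4096, which is the kind of dimensionality reduction the abstract reports.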

Analysis of Performance Improvement of Collaborative Filtering based on Neighbor Selection Criteria (이웃 선정 조건에 따른 협력 필터링의 성능 향상 분석)

  • Lee, Soojung
    • The Journal of Korean Association of Computer Education / v.18 no.4 / pp.55-62 / 2015
  • Recommender systems based on collaborative filtering have been used successfully in various areas because they make searching for information more convenient. Similarity measurement is critical to the performance of these systems because it is the criterion for selecting recommenders (neighbors). This study analyzes the distributions of similarity values produced by traditional measures and investigates the relationship between similarity and the number of co-rated items. Based on this analysis, it proposes a method for selecting reliable recommenders by restricting similarity values, which compensates for the drawbacks of previous measures. Experimental results showed that restricting neighbor similarities with upper and lower thresholds yields better performance than previous methods, especially when fewer nearest neighbors are consulted. Maximum improvements of 0.047 for cosine similarity and 0.03 for Pearson correlation were achieved. These results suggest that a collaborative filtering system using Pearson or cosine similarity should not consult neighbors with very high or very low similarity.
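A minimal sketch of the neighbor-restriction idea, assuming user-based collaborative filtering with cosine similarity computed over co-rated items and a similarity-weighted average as the predictor. The threshold values `lower=0.1` and `upper=0.95` are hypothetical placeholders; the study's actual thresholds are not given here.

```python
import numpy as np


def cosine_on_corated(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity over the items both users rated (NaN marks 'unrated')."""
    mask = ~np.isnan(a) & ~np.isnan(b)
    if not mask.any():
        return float("nan")
    x, y = a[mask], b[mask]
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    return float(x @ y / denom) if denom else float("nan")


def predict(ratings: np.ndarray, user: int, item: int, k: int = 5,
            lower: float = 0.1, upper: float = 0.95) -> float:
    """Predict ratings[user, item] from the k most similar neighbours whose
    similarity lies strictly inside the (hypothetical) band (lower, upper)."""
    candidates = []
    for other in range(ratings.shape[0]):
        if other == user or np.isnan(ratings[other, item]):
            continue
        s = cosine_on_corated(ratings[user], ratings[other])
        if not np.isnan(s) and lower < s < upper:  # drop very low and very high similarities
            candidates.append((s, ratings[other, item]))
    if not candidates:
        return float("nan")
    candidates.sort(key=lambda t: t[0], reverse=True)
    top = candidates[:k]
    weights = np.array([s for s, _ in top])
    values = np.array([r for _, r in top])
    return float(weights @ values / weights.sum())


R = np.array([
    [5.0, 3.0, np.nan],  # target user: item 2 is unrated
    [5.0, 3.0, 4.0],     # identical on co-rated items, similarity ~1.0
    [1.0, 5.0, 2.0],
    [4.0, 4.0, 3.0],
])
print(predict(R, user=0, item=2))              # neighbours inside the band only
print(predict(R, user=0, item=2, upper=1.01))  # no upper restriction
```

In this toy matrix, user 1 looks like a perfect match only because the two users share just two co-rated items; this is the kind of unreliable, very-high similarity the abstract argues should be excluded.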

A Technique to Detect Change-Coupled Files Using the Similarity of Change Types and Commit Time (변경 유형의 유사도 및 커밋 시간을 이용한 파일 변경 결합도)

  • Kim, Jung Il; Lee, Eun Joo
    • KIPS Transactions on Software and Data Engineering / v.3 no.2 / pp.65-72 / 2014
  • Change coupling measures how strongly two entities are related through their changes. When two source files have frequently been changed together, they are regarded as change-coupled and are likely to be changed together again in the near future. In previous studies, the change coupling between two files is defined by the number of times they were changed together, that is, the number of commits they share. However, this frequency-based technique is limited by 'tangled changes', which occur frequently in development environments that use version control systems. A tangled change is a commit in which several code hunks that have no relation to each other are changed at the same time. In this paper, the change types of the code hunks are used to define change coupling, in addition to the common commit times of the target files. First, a frequency vector of change types is built for each file from the extracted change types; then, the similarity of the files' change patterns is calculated using the cosine similarity measure. We conducted case studies on the open-source projects Eclipse JDT and CDT. The results show the applicability of the proposed method compared with previous studies.
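The core computation described above, a change-type frequency vector per file compared with cosine similarity, can be sketched as follows. The change-type labels are hypothetical examples, and `change_coupled` is a hypothetical rule for combining the co-commit count with the cosine score; the paper's exact combination is not stated in the abstract.

```python
import math
from collections import Counter


def change_vector(change_types):
    """Frequency vector (as a Counter) of the change types recorded for one file."""
    return Counter(change_types)


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity of two sparse frequency vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def change_coupled(common_commits: int, u: Counter, v: Counter,
                   min_commits: int = 3, min_sim: float = 0.8) -> bool:
    """Hypothetical rule: enough shared commits AND similar change patterns."""
    return common_commits >= min_commits and cosine(u, v) >= min_sim


# Hypothetical change-type labels extracted from the hunks of three files
file_a = change_vector(["METHOD_ADD", "STATEMENT_UPDATE", "STATEMENT_UPDATE"])
file_b = change_vector(["STATEMENT_UPDATE", "METHOD_ADD"])
file_c = change_vector(["COMMENT_UPDATE"])

print(round(cosine(file_a, file_b), 3))
print(cosine(file_a, file_c))
print(change_coupled(5, file_a, file_b))
```

A file pair that shares many commits only because of tangled changes would typically show dissimilar change-type profiles (as `file_a` and `file_c` do), so the cosine term can filter it out.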

A Study of CBIR(Content-based Image Retrieval) Computer-aided Diagnosis System of Breast Ultrasound Images using Similarity Measures of Distance (거리 기반 유사도 측정을 통한 유방 초음파 영상의 내용 기반 검색 컴퓨터 보조 진단 시스템에 관한 연구)

  • Kim, Min-jeong; Cho, Hyun-chong
    • The Transactions of The Korean Institute of Electrical Engineers / v.66 no.8 / pp.1272-1277 / 2017
  • To assist radiologists in characterizing breast masses, computer-aided diagnosis (CADx) systems have been studied. A CADx system can improve radiologists' diagnostic accuracy by providing objective information about breast masses. Morphological and texture features were extracted from breast ultrasound images. Based on these features, the CADx system retrieves masses similar to a query mass from a reference library using a k-nearest neighbor (k-NN) approach. Eight distance-based similarity measures, namely Euclidean and Chebyshev (Minkowski family), Canberra and Lorentzian ($F_2$ family), Wave Hedges and Motyka (Intersection family), and Cosine and Dice (Inner Product family), were evaluated by ROC (receiver operating characteristic) analysis. The Inner Product family measures used with the k-NN classifier provided slightly better performance in classifying malignant and benign masses than the Minkowski, $F_2$, and Intersection family measures.
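To make the eight measures concrete, here is a sketch of each as a distance between non-negative feature vectors (similarities such as Cosine, Dice, and Motyka are turned into distances as 1 - s), plus a k-NN retrieval that scores a query by the fraction of malignant neighbors. The definitions follow the forms commonly given in surveys of distance/similarity measures; the two-feature reference library and its labels are hypothetical, not data from the study.

```python
import numpy as np


def _safe_div(num, den):
    """Element-wise num / den, with 0 wherever den == 0."""
    return np.divide(num, den, out=np.zeros_like(num), where=den > 0)


def euclidean(p, q):
    return float(np.sqrt(np.sum((p - q) ** 2)))


def chebyshev(p, q):
    return float(np.max(np.abs(p - q)))


def canberra(p, q):
    return float(np.sum(_safe_div(np.abs(p - q), p + q)))


def lorentzian(p, q):
    return float(np.sum(np.log1p(np.abs(p - q))))


def wave_hedges(p, q):
    return float(np.sum(_safe_div(np.abs(p - q), np.maximum(p, q))))


def motyka(p, q):  # 1 - sum(min) / sum(p + q)
    return float(np.sum(np.maximum(p, q)) / np.sum(p + q))


def cosine(p, q):  # 1 - cosine similarity
    return 1.0 - float(p @ q / (np.linalg.norm(p) * np.linalg.norm(q)))


def dice(p, q):  # 1 - 2 p.q / (p.p + q.q)
    return float(np.sum((p - q) ** 2) / (p @ p + q @ q))


MEASURES = {
    "euclidean": euclidean, "chebyshev": chebyshev, "canberra": canberra,
    "lorentzian": lorentzian, "wave_hedges": wave_hedges, "motyka": motyka,
    "cosine": cosine, "dice": dice,
}


def knn_malignancy_score(query, library, labels, measure, k=5):
    """Retrieve the k reference masses closest to the query and return the
    fraction labelled malignant (1); such scores can feed an ROC analysis."""
    q = np.asarray(query, dtype=float)
    lib = np.asarray(library, dtype=float)
    d = np.array([measure(q, ref) for ref in lib])
    nearest = np.argsort(d, kind="stable")[:k]
    return float(np.mean(np.asarray(labels)[nearest]))


library = np.array([[1.0, 0.2], [1.1, 0.3], [0.2, 1.0], [0.3, 1.2]])
labels = np.array([0, 0, 1, 1])  # hypothetical: 0 = benign, 1 = malignant
for name, fn in MEASURES.items():
    print(name, knn_malignancy_score([1.0, 0.25], library, labels, fn, k=2))
```

Note that Cosine and Dice ignore or damp overall feature magnitude, while the Minkowski measures do not; which behaviour suits a given feature set is exactly what the ROC comparison in the study evaluates.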

An Effective Metric for Measuring the Degree of Web Page Changes (효과적인 웹 문서 변경도 측정 방법)

  • Kwon, Shin-Young; Kim, Sung-Jin; Lee, Sang-Ho
    • Journal of KIISE:Databases / v.34 no.5 / pp.437-447 / 2007
  • A variety of similarity metrics have been used to measure the degree of web page changes. In this paper, we first define criteria, based on six important types of web page changes, for evaluating the effectiveness of similarity metrics. Second, we propose a new similarity metric suited to measuring the degree of web page changes. Using real and synthesized web pages, we analyze five existing metrics (byte-wise comparison, TF-IDF cosine distance, word distance, edit distance, and shingling) and ours under the proposed criteria. The analysis shows that our metric represents changes more effectively than the other metrics. We expect that our study can help users select an appropriate metric for particular web applications.
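Two of the existing metrics named above behave very differently when content is merely reordered. The sketch below implements a TF-IDF cosine distance and a w-shingling resemblance between an old and a new version of a page. The whitespace tokenizer, the smoothed idf variant ln((1 + N) / (1 + df)) + 1, the shingle width w = 4, and the tiny corpus are assumptions for illustration; the paper's own metric is not reproduced here.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def tfidf_vector(text, corpus_docs):
    """TF-IDF weights: raw term count x smoothed idf = ln((1 + N) / (1 + df)) + 1."""
    n = len(corpus_docs)
    weights = {}
    for term, count in Counter(tokenize(text)).items():
        df = sum(term in doc for doc in corpus_docs)
        weights[term] = count * (math.log((1 + n) / (1 + df)) + 1.0)
    return weights


def tfidf_cosine_distance(old, new, corpus):
    """1 - cosine similarity of the two versions' TF-IDF vectors (0.0 = no change)."""
    docs = [set(tokenize(d)) for d in corpus]
    u, v = tfidf_vector(old, docs), tfidf_vector(new, docs)
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return 1.0 - dot / (nu * nv) if nu and nv else 1.0


def shingles(text, w=4):
    """Set of contiguous w-word sequences (a shorter text yields one shingle)."""
    toks = tokenize(text)
    return {tuple(toks[i:i + w]) for i in range(max(1, len(toks) - w + 1))}


def shingle_resemblance(old, new, w=4):
    """Jaccard resemblance of the two versions' shingle sets (1.0 = unchanged)."""
    a, b = shingles(old, w), shingles(new, w)
    return len(a & b) / len(a | b)


old = "the quick brown fox jumps over the lazy dog"
new = "over the lazy dog the quick brown fox jumps"  # same words, reordered
corpus = [old, new, "a different page about cats"]
print(tfidf_cosine_distance(old, new, corpus))  # ~0: bag of words is unchanged
print(shingle_resemblance(old, new))            # 1/3: word order changed
```

The contrast illustrates why criteria based on specific change types are useful: a TF-IDF metric reports no change for a pure reordering, whereas shingling does.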

Similarity Measurement Between Titles and Abstracts Using Bijection Mapping and Phi-Correlation Coefficient

  • John N. Mlyahilu; Jong-Nam Kim
    • Journal of the Institute of Convergence Signal Processing / v.23 no.3 / pp.143-149 / 2022
  • This paper presents a quantitative measure of the relationship between a research title and its abstract, using articles from various journals indexed in the Korean Citation Index (KCI) database. We propose a machine learning-based similarity metric that does not assume normality of the data and addresses the imbalanced-dataset and zero-variance problems that affect most rule-based algorithms. The advantages of this algorithm are that it eliminates the limitations of the Pearson correlation coefficient (r) and handles imbalanced datasets. A total of 107 journal articles collected from the database were used to build a corpus recording the authors, year of publication, title, and abstract of each article. In the experiments, the proposed algorithm achieved higher correlation values than cosine similarity, Euclidean distance, and the Pearson correlation coefficient, reaching a maximum correlation of 1, whereas the other measures produced not-a-number (NaN) values in some experiments. From these results, we conclude that an effective title should have a high correlation coefficient with its abstract.
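For reference, the standard phi coefficient between two binary vectors is computed from their 2 x 2 contingency table. The sketch below assumes, hypothetically, that the "bijection mapping" assigns each vocabulary word one position so a title and an abstract become 0/1 presence vectors of equal length; the vocabulary and texts are made-up examples. This plain version returns NaN when a row or column total is zero and does not reproduce whatever handling of that case the authors use.

```python
import math


def binary_vector(text, vocabulary):
    """Hypothetical mapping: one position per vocabulary word, 1 if the word occurs."""
    words = set(text.lower().split())
    return [1 if w in words else 0 for w in vocabulary]


def phi_coefficient(x, y):
    """Phi coefficient of two equal-length 0/1 vectors from their 2x2 table."""
    n11 = sum(1 for a, b in zip(x, y) if a and b)
    n10 = sum(1 for a, b in zip(x, y) if a and not b)
    n01 = sum(1 for a, b in zip(x, y) if not a and b)
    n00 = sum(1 for a, b in zip(x, y) if not a and not b)
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return (n11 * n00 - n10 * n01) / denom if denom else float("nan")


vocab = sorted(set("cosine similarity of titles and abstracts fingerprint matching dct".split()))
title = "cosine similarity of titles"
abstract = "similarity of titles and abstracts"
print(phi_coefficient(binary_vector(title, vocab), binary_vector(abstract, vocab)))
```

Here the table is n11 = 3, n10 = 1, n01 = 2, n00 = 3, so phi = (9 - 2) / sqrt(4 * 5 * 5 * 4) = 0.35.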

A Model-Based Image Steganography Method Using Watson's Visual Model

  • Fakhredanesh, Mohammad; Safabakhsh, Reza; Rahmati, Mohammad
    • ETRI Journal / v.36 no.3 / pp.479-489 / 2014
  • This paper presents a model-based image steganography method based on Watson's visual model. Model-based steganography assumes a model for cover image statistics. This approach, however, has some weaknesses, including perceptual detectability. We propose using Watson's visual model to improve the perceptual undetectability of model-based steganography. The proposed method prevents visually perceptible changes during embedding. First, the maximum acceptable change in each discrete cosine transform (DCT) coefficient is extracted based on Watson's visual model. Then, a model is fitted to a low-precision histogram of these coefficients, and the message bits are encoded to this model. Finally, the encoded message bits are embedded in those coefficients whose maximum possible changes are visually imperceptible. Experimental results show that changes resulting from the proposed method are perceptually undetectable, whereas standard model-based steganography leaves perceptually detectable changes. This perceptual undetectability is achieved without any significant change in perceptual quality (measured by the structural similarity measure) or in security (evaluated with two steganalysis methods).

Measuring Similarity of Android Applications Using Method Reference Frequency and Manifest Information (메소드 참조 빈도와 매니페스트 정보를 이용한 안드로이드 애플리케이션들의 유사도 측정)

  • Kim, Gyoosik; Hamedani, Masoud Reyhani; Cho, Seong-je; Kim, Seong Baeg
    • The Journal of Korean Institute of Next Generation Computing / v.13 no.3 / pp.15-25 / 2017
  • As software grows in value and importance, software theft and piracy have become much larger problems. Tackling them requires an accurate method for detecting software theft and piracy. In particular, software theft is relatively easy for Android applications (apps), yet illegal apps have not been properly screened in Android markets. In this paper, we propose a method that effectively measures the similarity between Android apps to detect software theft at the executable file level. The method extracts method reference frequencies and manifest information through static analysis of executable Android apps and uses them as the main features for similarity measurement. Each app is represented as an n-dimensional feature vector, and cosine similarity is used as the similarity measure. We demonstrate the effectiveness of the proposed method by comparing its accuracy with that of typical source code-based similarity measurement methods. Experiments on Android apps for which both source and executable files are available show that the similarity measured at the executable file level is almost equivalent to the well-known similarity measured at the source file level.
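A sketch of how the two feature groups could be placed in one n-dimensional vector and compared with cosine similarity. The layout (method-reference counts plus one binary entry per manifest item, with `ref:` and `manifest:` key prefixes) and the method-reference strings are hypothetical illustrations; the abstract does not specify the exact encoding or weighting.

```python
import math
from collections import Counter


def app_vector(method_refs, manifest_items):
    """Hypothetical layout: method-reference counts plus one binary entry per
    manifest item (e.g. a permission), stored together in one sparse vector."""
    vec = Counter({"ref:" + m: c for m, c in Counter(method_refs).items()})
    for item in set(manifest_items):
        vec["manifest:" + item] = 1
    return vec


def cosine(u: Counter, v: Counter) -> float:
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


refs = ["Landroid/app/Activity;->onCreate",
        "Ljava/lang/String;->length", "Ljava/lang/String;->length"]
original = app_vector(refs, ["android.permission.INTERNET"])
# A hypothetical repackaged copy: same code, one extra permission in the manifest
repackaged = app_vector(refs, ["android.permission.INTERNET", "android.permission.SEND_SMS"])
print(round(cosine(original, repackaged), 4))
```

Because the features come from the executable rather than the source, the comparison works on apps whose source code is unavailable, which is the setting the abstract targets.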

A Tracking Method of Same Drug Sales Accounts through Similarity Analysis of Instagram Profiles and Posts

  • Eun-Young Park; Jiyeon Kim; Chang-Hoon Kim
    • Journal of the Korea Society of Computer and Information / v.29 no.2 / pp.109-118 / 2024
  • With the increasing number of social media users worldwide, cases of social media being abused to perpetrate various crimes are increasing. In particular, drug distribution through social media is emerging as a serious social problem. Through social media channels, clever marketing stimulates teenagers' curiosity about drugs. Further, social media makes drug purchases easy because sellers and consumers can readily reach each other. Among the various social media platforms, we focused on Instagram, the platform most used by young adults aged 19 to 24 in South Korea. We collected four types of information from Instagram: profile photos, introductions, image posts, and text posts; then, we analyzed the similarity within each type of collected information. Profile photos and image posts were compared using SSIM (Structural Similarity Index Measure), while introductions and text posts were compared using Jaccard and cosine similarity. Through this analysis, the similarity among accounts was measured for each information type, and accounts with similarity above the significance level were determined to be the same drug sales account. Logistic regression analysis on these information types confirmed that profile photos, introductions, and text posts, but not image posts, are valid information for tracking the same drug sales account.
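The three similarity techniques named above can be sketched as follows: Jaccard similarity on word sets and cosine similarity on word counts for text, and SSIM for images. The SSIM here is a simplified single-window version computed over the whole image with the usual constants C1 = (0.01 L)^2 and C2 = (0.03 L)^2; standard SSIM averages the same formula over local windows. Whitespace tokenization and the sample texts are assumptions.

```python
import math
from collections import Counter

import numpy as np


def jaccard(a: str, b: str) -> float:
    """|A & B| / |A | B| over the sets of words in each text."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0


def cosine(a: str, b: str) -> float:
    """Cosine similarity of the word-count vectors of two texts."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    na = math.sqrt(sum(c * c for c in ca.values()))
    nb = math.sqrt(sum(c * c for c in cb.values()))
    return dot / (na * nb) if na and nb else 0.0


def global_ssim(x, y, data_range=255.0) -> float:
    """Single-window SSIM over whole images (a simplification of windowed SSIM)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return float(((2 * mx * my + c1) * (2 * cov + c2))
                 / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2)))


intro_a = "open 24h dm for prices fast delivery"
intro_b = "dm for prices open 24h"
print(jaccard(intro_a, intro_b), round(cosine(intro_a, intro_b), 3))

rng = np.random.default_rng(2)
photo = rng.integers(0, 256, size=(32, 32))
print(global_ssim(photo, photo), global_ssim(photo, 255 - photo))
```

Jaccard only asks which words appear, while cosine also weighs how often they appear; using both gives two views of the same introduction or post text.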

Development of An Automatic Classification System for Game Reviews Based on Word Embedding and Vector Similarity (단어 임베딩 및 벡터 유사도 기반 게임 리뷰 자동 분류 시스템 개발)

  • Yang, Yu-Jeong; Lee, Bo-Hyun; Kim, Jin-Sil; Lee, Ki Yong
    • The Journal of Society for e-Business Studies / v.24 no.2 / pp.1-14 / 2019
  • Given the characteristics of game software, it is important to quickly identify users' needs after a game's launch and reflect them in the game. However, most sites where users can download games and post reviews, such as the Google Play Store, provide only very limited and ambiguous classification categories for game reviews. Therefore, in this paper, we develop an automatic classification system for game reviews that sorts reviews into categories that are clearer and more useful for game providers. The system converts the words in reviews into vectors using word2vec, a representative word embedding model, and classifies each review into the most relevant categories by measuring the similarity between those vectors and each category. In particular, because the similarity measure directly affects the system's classification performance, we compared three representative similarity measures, Euclidean similarity, cosine similarity, and extended Jaccard similarity, in a real environment to choose the best one. Furthermore, to allow a review to be classified into multiple categories, we use a threshold-based multi-category classification method. Experiments on real reviews collected from the Google Play Store confirmed that the system achieves up to 95% accuracy.
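A minimal sketch of the pipeline described above: a review vector is taken as the mean of its word vectors, compared with each category vector, and assigned to every category whose similarity clears a threshold. The three-dimensional toy embeddings stand in for a trained word2vec model, the category vectors are hypothetically built from seed words, Euclidean similarity is taken as 1 / (1 + distance) (one common conversion), and the threshold is a placeholder.

```python
import numpy as np

# Hypothetical toy embeddings; a real system would load trained word2vec vectors.
EMB = {
    "lag": np.array([0.9, 0.1, 0.0]),
    "crash": np.array([0.8, 0.2, 0.1]),
    "price": np.array([0.1, 0.9, 0.0]),
    "expensive": np.array([0.0, 0.8, 0.2]),
    "story": np.array([0.1, 0.0, 0.9]),
}
# Hypothetical category vectors: mean of a few seed words per category
CATEGORIES = {
    "performance": np.mean([EMB["lag"], EMB["crash"]], axis=0),
    "pricing": np.mean([EMB["price"], EMB["expensive"]], axis=0),
    "story": EMB["story"],
}


def review_vector(review):
    """Mean of the embeddings of the review's known words (None if none are known)."""
    vecs = [EMB[w] for w in review.lower().split() if w in EMB]
    return np.mean(vecs, axis=0) if vecs else None


def cosine_sim(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def euclidean_sim(a, b):
    return 1.0 / (1.0 + float(np.linalg.norm(a - b)))


def extended_jaccard(a, b):
    dot = float(a @ b)
    return dot / (float(a @ a) + float(b @ b) - dot)


def classify(review, categories=CATEGORIES, sim=cosine_sim, threshold=0.8):
    """Threshold-based multi-category assignment, best-scoring category first."""
    v = review_vector(review)
    if v is None:
        return []
    scores = {name: sim(v, c) for name, c in categories.items()}
    return sorted((n for n, s in scores.items() if s >= threshold), key=lambda n: -scores[n])


print(classify("the game has lag and crash"))
print(classify("lag price", threshold=0.7))
```

Swapping `sim` for `euclidean_sim` or `extended_jaccard` reproduces, in miniature, the comparison of measures the abstract describes; each measure puts scores on a different scale, so the threshold must be tuned per measure.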