• Title/Summary/Keyword: similarity-based

Learning Discriminative Fisher Kernel for Image Retrieval

  • Wang, Bin;Li, Xiong;Liu, Yuncai
    • KSII Transactions on Internet and Information Systems (TIIS) / v.7 no.3 / pp.522-538 / 2013
  • Content-based image retrieval has become an increasingly important research topic because of its wide range of applications. It is highly challenging on large-scale databases with large variance. Retrieval systems rely on a key component: predefined or learned similarity measures over images. We note that similarity measures can potentially be improved if information about the data distribution is exploited in a more sophisticated way. In this paper, we propose a similarity measure learning approach for image retrieval. The similarity measure, the so-called Fisher kernel, is derived from the probabilistic distribution of images and is a function of the observed data, hidden variables, and model parameters. The hidden variables encode high-level information that is powerful for discrimination and that previous methods failed to exploit. We further propose a discriminative learning method for the similarity measure, i.e., encouraging the learned similarity to take a large value for a pair of images with the same label and a small value for a pair of images with distinct labels. The learned similarity measure, which fully exploits the data distribution, is well adapted to the dataset and is expected to improve the retrieval system. We evaluate the proposed method on the Corel-1000, Corel5k, Caltech101, and MIRFlickr 25,000 databases. The results show that the proposed method achieves competitive performance.
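As a rough illustration of the generic Fisher kernel idea mentioned in this abstract, the sketch below computes the kernel for a one-dimensional Gaussian model. It is an assumption-laden toy: the Gaussian model, the function names `fisher_score` and `fisher_kernel`, and the parameters `mu` and `sigma` are all hypothetical, and it does not reproduce the paper's hidden-variable model or its discriminative learning step.

```python
def fisher_score(x, mu, sigma):
    """Gradient of log N(x | mu, sigma^2) with respect to (mu, sigma)."""
    d_mu = (x - mu) / sigma ** 2
    d_sigma = ((x - mu) ** 2 - sigma ** 2) / sigma ** 3
    return (d_mu, d_sigma)


def fisher_kernel(x, y, mu, sigma):
    """Fisher kernel K(x, y) = U_x^T I^{-1} U_y for a 1-D Gaussian.

    The Fisher information of a Gaussian in the (mu, sigma)
    parameterisation is diagonal: diag(1 / sigma^2, 2 / sigma^2).
    """
    ux = fisher_score(x, mu, sigma)
    uy = fisher_score(y, mu, sigma)
    info = (1.0 / sigma ** 2, 2.0 / sigma ** 2)
    # Diagonal information matrix, so the inverse is element-wise.
    return sum(a * b / i for a, b, i in zip(ux, uy, info))
```

Two observations that move the model parameters in the same direction get a large kernel value, which is the sense in which the kernel reflects the data distribution.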

Similarity measurement based on Min-Hash for Preserving Privacy

  • Cha, Hyun-Jong;Yang, Ho-Kyung;Song, You-Jin
    • International Journal of Advanced Culture Technology / v.10 no.2 / pp.240-245 / 2022
  • Because of the importance of information, encryption algorithms are heavily used. Encrypted raw data is secure, but problems arise when the decryption key is exposed. In particular, large-scale Internet sites such as Facebook and Amazon suffer serious damage when user data is exposed. Recently, research into a new fourth-generation encryption technology that can protect user-related data without requiring a key for encryption has been attracting attention, as has data clustering technology that uses encryption. In this paper, we try to reduce key exposure by using homomorphic encryption, and we aim to maintain privacy through similarity measurement. Previously studied methods of measuring similarity become time-consuming and expensive as the size and scope of the data increase. Min-Hash, by contrast, efficiently estimates the similarity between two sets from their signatures. Min-Hash is widely used for plagiarism detection, graph and image analysis, and genetic analysis. This paper therefore addresses privacy using homomorphic encryption and presents a model for efficient similarity measurement using Min-Hash.
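The Min-Hash estimate described in this abstract can be sketched as follows. This is a minimal plain-Python illustration, not the paper's model: the helper names (`minhash_signature`, `estimate_jaccard`, `exact_jaccard`) and the choice of seeded SHA-1 as the hash family are assumptions, and the homomorphic-encryption layer is not covered.

```python
import hashlib


def _hash(seed, item):
    """Seeded 64-bit hash of a string item (one member of the hash family)."""
    data = f"{seed}:{item}".encode("utf-8")
    return int.from_bytes(hashlib.sha1(data).digest()[:8], "big")


def minhash_signature(items, num_hashes=128):
    """Signature = the minimum hash value of the set under each hash function."""
    items = set(items)
    if not items:
        raise ValueError("cannot build a signature for an empty set")
    return [min(_hash(seed, it) for it in items) for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of matching signature positions estimates Jaccard similarity."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


def exact_jaccard(a, b):
    """Exact Jaccard similarity, for comparison with the estimate."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)
```

Comparing two fixed-length signatures costs the same regardless of set size, which is why the estimate scales better than computing the exact set overlap. More hash functions give a tighter estimate at the cost of longer signatures.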

A DoS Detection Method Based on Composition Self-Similarity

  • Jian-Qi, Zhu;Feng, Fu;Kim, Chong-Kwon;Ke-Xin, Yin;Yan-Heng, Liu
    • KSII Transactions on Internet and Information Systems (TIIS) / v.6 no.5 / pp.1463-1478 / 2012
  • Based on the theory of local-world networks, this paper presents the composition self-similarity (CSS) of network traffic for the first time for the study of DoS detection. We propose the concept of the composition distribution graph and design the related operations. The $(R/S)^d$ algorithm is designed for calculating the Hurst parameter. Based on the composition distribution graph and Kullback-Leibler (KL) divergence, we propose the composition self-similarity anomaly detection (CSSD) method for detecting DoS attacks. We evaluate the effectiveness of the proposed method. Compared with other entropy-based anomaly detection methods, our method is more accurate and more sensitive in detecting DoS attacks.
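To make the role of KL divergence in this abstract concrete, here is a minimal sketch that compares the composition of a traffic window against a baseline. The protocol labels, the smoothing constant `eps`, and the function names are hypothetical, and the paper's composition distribution graph and $(R/S)^d$ algorithm are not reproduced.

```python
import math
from collections import Counter


def distribution(items, support, eps=1e-9):
    """Smoothed relative frequency of each category in `support`."""
    counts = Counter(x for x in items if x in support)
    total = sum(counts.values()) + eps * len(support)
    # eps keeps every probability positive so the log ratio is defined.
    return {k: (counts[k] + eps) / total for k in support}


def kl_divergence(p, q):
    """KL(p || q) = sum_k p[k] * log(p[k] / q[k]) over a shared support."""
    return sum(p[k] * math.log(p[k] / q[k]) for k in p if p[k] > 0)
```

A detector built on this idea would flag a window whose divergence from the baseline exceeds some threshold. How that threshold is chosen is outside what the abstract states.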

Determining Absolute Interpolation Weights for Neighborhood-Based Collaborative Filtering

  • Kim, Hyoung-Do
    • Management Science and Financial Engineering / v.16 no.2 / pp.53-65 / 2010
  • Despite the overall success of neighbor-based collaborative filtering (CF) methods, fundamental questions remain about neighbor selection and the prediction mechanism, including arbitrary similarity, over-fitted interpolation weights, and the lack of trust consideration between neighbors. This paper proposes a simple method to compute absolute interpolation weights based on similarity values. To supplement the method, two schemes are additionally devised: high-quality neighbor selection and trust metrics based on co-ratings. The former requires that the similarity of one or more neighbors be better than a pre-specified level that is higher than the minimum level. The latter gives higher trust to neighbors that have more co-ratings. Experimental results show that the proposed method outperforms pure IBCF by about 8%. Furthermore, it can easily be combined with other predictors to achieve better prediction quality.
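The general idea of similarity-derived weights with a co-rating trust factor can be sketched as below. This is only an illustration under stated assumptions: the weighting formula, the `min_sim` threshold, and the `trust_cap` parameter are hypothetical choices, not the paper's actual definition of absolute interpolation weights.

```python
def predict_rating(neighbors, min_sim=0.1, trust_cap=50):
    """Predict a rating from (similarity, co_rating_count, rating) triples.

    Assumed scheme: weight = similarity * trust, where trust grows
    linearly with the number of co-ratings and saturates at trust_cap.
    """
    numerator = 0.0
    denominator = 0.0
    for similarity, co_ratings, rating in neighbors:
        if similarity < min_sim:
            continue  # drop low-quality neighbors
        trust = min(co_ratings / trust_cap, 1.0)
        weight = similarity * trust
        numerator += weight * rating
        denominator += weight
    # No usable neighbor means no prediction.
    return numerator / denominator if denominator > 0 else None
```

With equal similarities, a neighbor backed by many co-ratings pulls the prediction toward its own rating more strongly than a neighbor with only a few.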

A METHOD OF IMAGE DATA RETRIEVAL BASED ON SELF-ORGANIZING MAPS

  • Lee, Mal-Rey;Oh, Jong-Chul
    • Journal of applied mathematics & informatics / v.9 no.2 / pp.793-806 / 2002
  • Feature-based similarity retrieval has become an important research issue in image database systems. The features of image data are useful for discriminating between images. In this paper, we propose a high-speed k-Nearest Neighbor search algorithm based on Self-Organizing Maps. A Self-Organizing Map (SOM) provides a mapping from high-dimensional feature vectors onto a two-dimensional space. The mapping preserves the topology of the feature vectors, and the resulting map is called a topological feature map. A topological feature map preserves the mutual relations (similarity) in the feature space of the input data and clusters mutually similar feature vectors in neighboring nodes. Each node of the topological feature map holds a node vector and the similar images that are closest to that node vector. The topological feature map also contains empty nodes, in which no image is classified. We evaluate the performance of our algorithm using color feature vectors extracted from images. Promising results have been obtained in the experiments.
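The retrieval step over a topological feature map might look like the sketch below. It assumes the map has already been trained, and the data layout (`node_vectors` and `buckets` keyed by grid position) and the `radius` parameter are hypothetical; the paper's actual search algorithm may differ. The result is an approximate k-nearest-neighbor answer because only nodes near the best matching unit are inspected.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def best_matching_unit(node_vectors, query):
    """Grid position of the node whose node vector is closest to the query."""
    return min(node_vectors, key=lambda pos: euclidean(node_vectors[pos], query))


def knn_via_map(node_vectors, buckets, query, k, radius=1):
    """Approximate k-NN: rank only images stored near the best matching unit.

    node_vectors: {(row, col): vector}
    buckets: {(row, col): [(image_name, vector), ...]}; empty nodes
             may be missing or map to an empty list.
    """
    row0, col0 = best_matching_unit(node_vectors, query)
    candidates = []
    for (row, col), images in buckets.items():
        if abs(row - row0) <= radius and abs(col - col0) <= radius:
            candidates.extend(images)
    candidates.sort(key=lambda image: euclidean(image[1], query))
    return [name for name, _ in candidates[:k]]
```

Because similar feature vectors land in neighboring nodes, most images never need a distance computation, which is where the speed-up over a full scan would come from.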

An Effective WSSENet-Based Similarity Retrieval Method of Large Lung CT Image Databases

  • Zhuang, Yi;Chen, Shuai;Jiang, Nan;Hu, Hua
    • KSII Transactions on Internet and Information Systems (TIIS) / v.16 no.7 / pp.2359-2376 / 2022
  • With the exponential growth of medical image big data, represented by high-resolution CT images (CTIs), high-resolution CTI data is of great importance for clinical research and diagnosis. This paper takes lung CTIs as an example. Retrieving CTIs similar to an input CTI from a large-scale lung CTI database can effectively assist physicians in diagnosis. Compared with conventional content-based image retrieval (CBIR) methods, CBIR for lung CTIs demands higher retrieval accuracy in both the contour shape and the internal details of the organ. In traditional supervised deep learning networks, learning relies on the labeling of CTIs, which is a very time-consuming task. To address this issue, the paper proposes a Weakly Supervised Similarity Evaluation Network (WSSENet) to efficiently support similarity analysis of lung CTIs. We conducted extensive experiments to verify the effectiveness of the WSSENet, on which the CBIR is based.

A Similarity Computation Algorithm Based on the Pitch and Rhythm of Music Melody (선율의 음높이와 리듬 정보를 이용한 음악의 유사도 계산 알고리즘)

  • Mo, Jong-Sik;Kim, So-Young;Ku, Kyong-I;Han, Chang-Ho;Kim, Yoo-Sung
    • The Transactions of the Korea Information Processing Society / v.7 no.12 / pp.3762-3774 / 2000
  • Advances in computer hardware and information processing technologies have raised the need for multimedia information retrieval systems. To date, multimedia information systems have been developed for text and image information. More recently, multimedia information systems for video and audio information, especially musical information, have been growing steadily. Recent music information retrieval systems support not only retrieval based on meta-information such as composer and title but also content-based retrieval. Content-based retrieval in music information retrieval systems uses the similarity value between the user query and the music information stored in the music database. In this paper, we therefore developed a similarity computation algorithm in which the pitches and lengths of each corresponding pair of notes are used as the fundamental factors for computing the similarity between pieces of musical information. We also conducted an experiment to validate the proposed algorithm. The experimental results show that the proposed similarity computation algorithm can correctly determine, based on melodies, whether two music files are analogous to each other.
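A note-by-note comparison of pitch and length, as described in this abstract, could be sketched as follows. The scoring formula is an assumption made for illustration: notes are hypothetical `(midi_pitch, duration)` pairs with positive durations, pitch closeness is scaled by one octave, and the two factors are blended by `pitch_weight`. The paper's actual formula is not given in the abstract.

```python
def note_similarity(a, b, pitch_weight=0.5, octave=12):
    """Similarity in [0, 1] of two notes given as (midi_pitch, duration)."""
    pitch_a, length_a = a
    pitch_b, length_b = b
    # Pitch score falls linearly to 0 at one octave of difference.
    pitch_score = max(0.0, 1.0 - abs(pitch_a - pitch_b) / octave)
    # Length score is the ratio of the shorter to the longer duration.
    length_score = min(length_a, length_b) / max(length_a, length_b)
    return pitch_weight * pitch_score + (1.0 - pitch_weight) * length_score


def melody_similarity(melody_a, melody_b, pitch_weight=0.5):
    """Average similarity of corresponding notes; unmatched notes score 0."""
    if not melody_a or not melody_b:
        return 0.0
    total = sum(
        note_similarity(a, b, pitch_weight) for a, b in zip(melody_a, melody_b)
    )
    return total / max(len(melody_a), len(melody_b))
```

This version compares absolute pitches, so a transposed copy of a melody scores lower than an exact copy. Comparing pitch intervals instead would be one way to relax that.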

Shape-Based Subsequence Retrieval Supporting Multiple Models in Time-Series Databases (시계열 데이터베이스에서 복수의 모델을 지원하는 모양 기반 서브시퀀스 검색)

  • Won, Jung-Im;Yoon, Jee-Hee;Kim, Sang-Wook;Park, Sang-Hyun
    • The KIPS Transactions:PartD / v.10D no.4 / pp.577-590 / 2003
  • Shape-based retrieval is defined as the operation that searches for the (sub)sequences whose shapes are similar to that of a query sequence regardless of their actual element values. In this paper, we propose a similarity model suitable for shape-based retrieval and present an indexing method that supports the similarity model. The proposed similarity model enables accurate retrieval of similar shapes by providing combinations of various shape-preserving transformations such as normalization, moving average, and time warping. Our indexing method stores every distinct subsequence concisely in a disk-based suffix tree for efficient and adaptive query processing. We allow the user to dynamically choose a similarity model suitable for a given application. More specifically, we allow the user to determine the parameter p of the distance function $L_p$ when submitting a query. Extensive experiments show that our approach not only successfully finds the subsequences whose shapes are similar to a query shape but also significantly outperforms sequential search.
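The user-selectable $L_p$ distance and two of the shape-preserving transformations named in this abstract can be sketched as below. The function names are hypothetical, normalization is assumed here to mean subtracting the mean, and time warping and the suffix-tree index are not covered.

```python
import math


def normalize(seq):
    """Shift a sequence to zero mean (assumed form of normalization)."""
    mean = sum(seq) / len(seq)
    return [x - mean for x in seq]


def moving_average(seq, window):
    """Smooth a sequence with a sliding window of the given size."""
    if not 1 <= window <= len(seq):
        raise ValueError("window must be between 1 and len(seq)")
    return [sum(seq[i:i + window]) / window
            for i in range(len(seq) - window + 1)]


def lp_distance(a, b, p=2):
    """L_p distance between equal-length sequences; p may be math.inf."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    diffs = [abs(x - y) for x, y in zip(a, b)]
    if p == math.inf:
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)
```

After normalization, two sequences with the same shape at different offsets have distance zero, which is the "regardless of actual element values" property. Choosing p changes how strongly single large deviations dominate the distance.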

A Study on Forecasting Accuracy Improvement of Case Based Reasoning Approach Using Fuzzy Relation (퍼지 관계를 활용한 사례기반추론 예측 정확성 향상에 관한 연구)

  • Lee, In-Ho;Shin, Kyung-Shik
    • Journal of Intelligence and Information Systems / v.16 no.4 / pp.67-84 / 2010
  • In business, forecasting is the work of anticipating what is expected to happen in the future in order to make managerial decisions and plans. Accurate forecasting is therefore very important for major managerial decision making and is the basis for various business strategies. However, it is very difficult to make an unbiased and consistent estimate because of the uncertainty and complexity of the future business environment. That is why we should use a scientific forecasting model to support business decision making and make an effort to minimize the model's forecasting error, which is the difference between the observation and the estimate. Nevertheless, minimizing the error is not an easy task. Case-based reasoning is a problem-solving method that uses similar past cases to solve the current problem. To build successful case-based reasoning models, it is very important to retrieve not only the most similar case but also the most relevant case. To retrieve similar and relevant cases from past cases, the measurement of similarities between cases is a key factor. In particular, if the cases contain symbolic data, it is more difficult to measure the distances. The purpose of this study is to improve the forecasting accuracy of the case-based reasoning approach using fuzzy relation and composition. Two methods are adopted to measure the similarity between cases containing symbolic data. One derives the similarity matrix following binary logic (judging whether two symbolic values are the same); the other derives the similarity matrix following fuzzy relation and composition. This study is conducted in the following order: data gathering and preprocessing, model building and analysis, validation analysis, and conclusion. First, in the data gathering and preprocessing step, we collect a data set that includes categorical dependent variables. The data set is cross-sectional, and its independent variables include several qualitative variables expressed as symbolic data. The research data consist of many financial ratios and the corresponding bond ratings of Korean companies. The ratings we employ in this study cover all bonds rated by one of the bond rating agencies in Korea. Our total sample includes 1,816 companies whose commercial papers were rated in the period 1997-2000. Credit grades are defined as outputs and classified into 5 rating categories (A1, A2, A3, B, C) according to credit levels. Second, in the model building and analysis step, we derive the similarity matrix following binary logic and fuzzy composition to measure the similarity between cases containing symbolic data. The types of fuzzy composition used in this process are max-min, max-product, and max-average. The analysis is then carried out by the case-based reasoning approach with the derived similarity matrix. Third, in the validation analysis step, we validate the model with the McNemar test based on hit ratio. Finally, we draw a conclusion from the study. As a result, the similarity measuring method using fuzzy relation and composition shows good forecasting performance compared with the similarity measuring method using binary logic for similarity measurement between two symbolic data. However, the differences in forecasting performance among the types of fuzzy composition are not statistically significant. The contributions of this study are as follows. We propose another methodology in which fuzzy relation and fuzzy composition can be applied to similarity measurement between two symbolic data, which is the most important factor in building a case-based reasoning model.
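The three fuzzy compositions named in this abstract (max-min, max-product, max-average) and the binary-logic alternative can be sketched as follows. The function names and the example relation matrices are hypothetical; how the study builds its fuzzy relations from bond-rating data is not stated in the abstract.

```python
def binary_similarity(a, b):
    """Binary-logic similarity of two symbolic values: 1.0 if equal, else 0.0."""
    return 1.0 if a == b else 0.0


def fuzzy_compose(r, s, mode="max-min"):
    """Compose fuzzy relations r (n x m) and s (m x k), given as nested lists.

    Entry (i, j) is the maximum over the inner index t of
    combine(r[i][t], s[t][j]), where combine depends on the mode.
    """
    combine = {
        "max-min": min,
        "max-product": lambda a, b: a * b,
        "max-average": lambda a, b: (a + b) / 2.0,
    }[mode]
    inner = len(s)
    return [
        [max(combine(r[i][t], s[t][j]) for t in range(inner))
         for j in range(len(s[0]))]
        for i in range(len(r))
    ]
```

Unlike the binary measure, composition can assign a nonzero similarity to two different symbolic values when both are related to a common third value.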