• Title/Summary/Keyword: Similarity Query

Content-based Retrieval System using Image Shape Features (영상 형태 특징을 이용한 내용 기반 검색 시스템)

  • 황병곤;정성호;이상열
    • Journal of Korea Society of Industrial Information Systems / v.6 no.1 / pp.33-38 / 2001
  • In this paper, we present an image retrieval system using shape features. Preprocessing to obtain the shape features includes edge extraction using chain codes. The shape features consist of the center of mass, the standard deviation, and the ratio of major-axis to minor-axis length. Similarity is estimated by comparing the features of the query image with the features of the images in the database, and candidate images are retrieved in order of similarity. The experimental results are insensitive to scale, rotation, and translation. We evaluate the performance of the shape features for image retrieval on a database of over 170 images. On average, recall and precision are 0.72 and 0.83, respectively, which suggests that the proposed method is useful.
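
As a rough illustration of the kind of shape features this abstract lists, the Python sketch below computes a centroid, the standard deviation of boundary-point distances from the centroid, and a major-to-minor axis ratio. The function names, the input format (a list of (x, y) boundary points), and the covariance-based axis estimate are assumptions made for illustration; the abstract does not give the paper's exact definitions. A small helper for the recall and precision measures is included as well.

```python
import math


def shape_features(points):
    """Return ((cx, cy), radial_std, axis_ratio) for a list of (x, y) boundary points.

    Assumption: the axis ratio is estimated from the eigenvalues of the
    2x2 covariance matrix of the points, which may differ from the paper.
    """
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n

    # Standard deviation of the distances from the centroid to each point.
    radii = [math.hypot(x - cx, y - cy) for x, y in points]
    mean_r = sum(radii) / n
    radial_std = math.sqrt(sum((r - mean_r) ** 2 for r in radii) / n)

    # Covariance matrix entries of the point cloud.
    sxx = sum((x - cx) ** 2 for x, _ in points) / n
    syy = sum((y - cy) ** 2 for _, y in points) / n
    sxy = sum((x - cx) * (y - cy) for x, y in points) / n

    # Closed-form eigenvalues of a symmetric 2x2 matrix.
    half_trace = (sxx + syy) / 2
    root = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    lam_major, lam_minor = half_trace + root, half_trace - root
    axis_ratio = math.sqrt(lam_major / lam_minor) if lam_minor > 0 else math.inf
    return (cx, cy), radial_std, axis_ratio


def precision_recall(retrieved, relevant):
    """Precision and recall of a retrieved id list against a relevant id list."""
    hits = len(set(retrieved) & set(relevant))
    return hits / len(retrieved), hits / len(relevant)
```

Because the radial standard deviation and the axis ratio are computed relative to the centroid, they do not change under translation; the axis ratio is also unchanged by rotation and uniform scaling, which is consistent with the invariance the abstract reports.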

Image Search Using Interpolated Color Histograms (히스토그램 보간에 의한 영상 검색)

  • Lee, Hyo-Jong
    • The KIPS Transactions:PartB / v.9B no.5 / pp.701-706 / 2002
  • Sets of color features have been used efficiently to measure the similarity of given images. However, the color features are too large to implement an indexing scheme effectively. In this paper, a new method is proposed to retrieve similar images using an interpolated color histogram. The idea is similar to previously reported methods that use the distributions of color histograms. The new method differs in that simplified color histograms decide the similarity between a query image and target images. To represent the distribution of the color histograms, the best order of the interpolating polynomial was determined by simulation. After a histogram distribution is represented in polynomial form, only a few polynomial coefficients are indexed and stored in a database as a color descriptor. The new method has been applied to real images and achieved satisfactory results.
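
The idea of storing a few polynomial coefficients in place of a full histogram can be sketched as follows. This is a minimal sketch under stated assumptions: it uses an ordinary least-squares fit over bin positions normalised to [0, 1], and the names `fit_polynomial` and `descriptor_distance` are hypothetical. The abstract does not specify the paper's fitting procedure or distance measure.

```python
def fit_polynomial(ys, degree):
    """Least-squares polynomial coefficients (lowest order first).

    ys holds histogram bin values; bin i is placed at x = i / (len(ys) - 1),
    so len(ys) must be at least 2. Assumption: least squares, not the
    paper's exact interpolation scheme.
    """
    n = len(ys)
    xs = [i / (n - 1) for i in range(n)]
    m = degree + 1

    # Normal equations: a[j][k] = sum of x^(j+k), b[j] = sum of y * x^j.
    a = [[sum(x ** (j + k) for x in xs) for k in range(m)] for j in range(m)]
    b = [sum(y * x ** j for x, y in zip(xs, ys)) for j in range(m)]

    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, m):
            factor = a[row][col] / a[col][col]
            for k in range(col, m):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]

    # Back substitution.
    coeffs = [0.0] * m
    for row in range(m - 1, -1, -1):
        tail = sum(a[row][k] * coeffs[k] for k in range(row + 1, m))
        coeffs[row] = (b[row] - tail) / a[row][row]
    return coeffs


def descriptor_distance(c1, c2):
    """Euclidean distance between two coefficient vectors (an assumed measure)."""
    return sum((p - q) ** 2 for p, q in zip(c1, c2)) ** 0.5
```

With a low polynomial order, a histogram of hundreds of bins is reduced to a handful of numbers per color channel, which is what makes the descriptor cheap to index.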

A Dynamic Locality Sensitive Hashing Algorithm for Efficient Security Applications

  • Mohammad Y. Khanafseh;Ola M. Surakhi
    • International Journal of Computer Science & Network Security / v.24 no.5 / pp.79-88 / 2024
  • The information retrieval domain deals with the retrieval of unstructured data such as text documents. Searching documents is a main component of a modern information retrieval system. Locality Sensitive Hashing (LSH) is one of the most popular methods for searching documents in a high-dimensional space. The main benefit of LSH is its theoretical guarantee of query accuracy in a multi-dimensional space. LSH can be further enhanced by adding a bit to its steps. In this paper, a new Dynamic Locality Sensitive Hashing (DLSH) algorithm is proposed as an improved version of the LSH algorithm. It relies on hierarchical selection of the LSH parameters (number of bands, number of shingles, and number of permutation lists), based on the similarity achieved by the algorithm, to optimize search accuracy and increase its score. The technique was applied to several tampered file structures, and its performance was evaluated. In some circumstances, the matching accuracy of DLSH exceeds 95% with the optimal parameter values selected for the number of bands, the number of shingles, and the number of permutation lists. This result makes the DLSH algorithm suitable for many critical applications that depend on accurate searching, such as forensics technology.
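
For readers unfamiliar with the three parameters named in this abstract, the sketch below shows a plain MinHash-based LSH pipeline: character shingles, a signature with one minimum per hash function, and banding to pick candidate pairs. It is not the DLSH algorithm itself, whose parameter-selection procedure is not described in the abstract. Seeded SHA-1 hashes stand in for random permutations, and all function names are hypothetical.

```python
import hashlib


def shingles(text, k):
    """Set of all length-k character substrings of text."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def _seeded_hash(seed, shingle):
    # A seeded hash plays the role of one random permutation (assumption).
    digest = hashlib.sha1(f"{seed}:{shingle}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(shingle_set, num_hashes):
    """One minimum hash value per seed; shingle_set must be non-empty."""
    return [min(_seeded_hash(seed, s) for s in shingle_set)
            for seed in range(num_hashes)]


def band_keys(signature, num_bands):
    """Split the signature into bands; each band becomes a bucket key."""
    rows = len(signature) // num_bands
    return [(b, tuple(signature[b * rows:(b + 1) * rows]))
            for b in range(num_bands)]


def is_candidate_pair(sig_a, sig_b, num_bands):
    """Two documents are candidates if they agree on at least one whole band."""
    return bool(set(band_keys(sig_a, num_bands)) & set(band_keys(sig_b, num_bands)))


def estimated_similarity(sig_a, sig_b):
    """Fraction of matching signature positions, an estimate of Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)
```

More bands with fewer rows per band make it easier for a pair to become a candidate, so the band count trades recall against the number of false candidates; this is the kind of trade-off a parameter-selection scheme would tune.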

Efficient Multi-Step k-NN Search Methods Using Multidimensional Indexes in Large Databases (대용량 데이터베이스에서 다차원 인덱스를 사용한 효율적인 다단계 k-NN 검색)

  • Lee, Sanghun;Kim, Bum-Soo;Choi, Mi-Jung;Moon, Yang-Sae
    • Journal of KIISE / v.42 no.2 / pp.242-254 / 2015
  • In this paper, we address the problem of improving the performance of multi-step k-NN search using multi-dimensional indexes. Due to the information loss of lower-dimensional transformations, existing multi-step k-NN search solutions produce a large tolerance (i.e., a large search range) and thus incur a large number of candidates, which are retrieved by a range query. These many candidates lead to overwhelming I/O and CPU overheads in the postprocessing step. To overcome this problem, we propose two efficient solutions that improve search performance by reducing the tolerance of the range query and, accordingly, the number of candidates. First, we propose a tolerance reduction-based (approximate) solution that forcibly decreases the tolerance, which is determined by a k-NN query on the index, by the average ratio of high- and low-dimensional distances. Second, we propose a coefficient control-based (exact) solution that uses c·k instead of k in the k-NN query to obtain a tighter tolerance and performs a range query using this tighter tolerance. Experimental results show that the proposed solutions significantly reduce the number of candidates and, accordingly, improve search performance in comparison with the existing multi-step k-NN solution.
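
The baseline multi-step k-NN search that this abstract sets out to improve can be sketched as follows. This is a minimal sketch, not the paper's proposed solutions: a linear scan stands in for the multi-dimensional index, truncation to the first `low_dim` coordinates stands in for the lower-dimensional transformation (it lower-bounds Euclidean distance), and all names are hypothetical.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def project(vector, low_dim):
    """Assumed transformation: keep the first low_dim coordinates."""
    return vector[:low_dim]


def multi_step_knn(data, query, k, low_dim):
    """Return (indices of the exact k nearest neighbours, number of candidates)."""
    q_low = project(query, low_dim)
    low = [project(v, low_dim) for v in data]
    ids = range(len(data))

    # Step 1: k-NN query in the low-dimensional space (index lookup in practice).
    nearest_low = sorted(ids, key=lambda i: dist(low[i], q_low))[:k]

    # Step 2: tolerance = largest true distance among those k objects.
    tolerance = max(dist(data[i], query) for i in nearest_low)

    # Step 3: low-dimensional range query with that tolerance.
    candidates = [i for i in ids if dist(low[i], q_low) <= tolerance]

    # Step 4: postprocessing with true high-dimensional distances.
    result = sorted(candidates, key=lambda i: dist(data[i], query))[:k]
    return result, len(candidates)
```

The candidate count returned alongside the result is the quantity the abstract targets: a looser tolerance in step 2 means more candidates to fetch and verify in step 4.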

Algorithms for Indexing and Integrating MPEG-7 Visual Descriptors (MPEG-7 시각 정보 기술자의 인덱싱 및 결합 알고리즘)

  • Song, Chi-Ill;Nang, Jong-Ho
    • Journal of KIISE:Software and Applications / v.34 no.1 / pp.1-10 / 2007
  • This paper proposes a new indexing mechanism for MPEG-7 visual descriptors, especially the Dominant Color and Contour Shape descriptors, that guarantees an efficient similarity search for multimedia databases whose visual metadata are represented with MPEG-7. Since the similarity metric used in the Dominant Color descriptor is based on a Gaussian mixture model, the descriptor itself can be transformed into a color histogram in which the distribution of the color values follows a Gaussian distribution. The transformed Dominant Color descriptor (i.e., the color histogram) is then indexed by the proposed indexing mechanism. For the indexing of the Contour Shape descriptor, we use a two-pass algorithm. In the first pass, since the similarity of two shapes can be roughly measured with global parameters such as the eccentricity and circularity used in the Contour Shape descriptor, dissimilar image objects are excluded using these global parameters. In the second pass, the similarities between the query and the remaining image objects are measured with the peak parameters of the Contour Shape descriptor. This two-pass approach helps to reduce the computational resources needed to measure the similarity of image objects using the Contour Shape descriptor. This paper also proposes two integration schemes for visual descriptors for efficient retrieval from a multimedia database. One uses the weight of a descriptor as a yardstick to determine the number of similar image objects selected with respect to that descriptor, and the other uses the weight as the degree of importance of the descriptor in the global similarity measurement. Experimental results show that the proposed indexing and integration schemes produce a remarkable speed-up compared to the exact similarity search, although there is some loss in accuracy because of the approximated computation in indexing. The proposed schemes could be used to build a multimedia database represented in MPEG-7 that guarantees efficient retrieval.
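
The two integration schemes described in this abstract might look roughly like the sketch below. The function names, the dictionary-based inputs, and the rounding rule for the per-descriptor quota are all assumptions for illustration; the abstract does not give the paper's exact formulas.

```python
def select_by_quota(rankings, weights, total):
    """Scheme 1 (assumed form): each descriptor contributes a share of the
    result list proportional to its weight.

    rankings maps a descriptor name to object ids sorted by similarity;
    weights maps a descriptor name to a weight; total is the list size
    the weights are applied to.
    """
    selected = []
    for name, ranked_ids in rankings.items():
        quota = round(weights[name] * total)
        for obj_id in ranked_ids[:quota]:
            if obj_id not in selected:  # avoid duplicates across descriptors
                selected.append(obj_id)
    return selected


def weighted_distance(distances, weights):
    """Scheme 2 (assumed form): weighted sum of per-descriptor distances."""
    return sum(weights[name] * d for name, d in distances.items())
```

For example, with equal weights on a color and a shape descriptor and `total=4`, the first scheme takes the top two objects from each ranking, while the second scheme would instead score every object by the weighted sum of its color and shape distances.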

CS-Tree : Cell-based Signature Index Structure for Similarity Search in High-Dimensional Data (CS-트리 : 고차원 데이터의 유사성 검색을 위한 셀-기반 시그니쳐 색인 구조)

  • Song, Gwang-Taek;Jang, Jae-U
    • The KIPS Transactions:PartD / v.8D no.4 / pp.305-312 / 2001
  • Recently, high-dimensional index structures have been required for similarity search in database applications such as multimedia databases and data warehousing. In this paper, we propose a new cell-based signature tree, called the CS-tree, which supports efficient storage and retrieval of high-dimensional feature vectors. The proposed CS-tree partitions a high-dimensional feature space into a group of cells and represents a feature vector by its corresponding cell signature. By using cell signatures rather than real feature vectors, it is possible to reduce the height of the CS-tree, leading to efficient retrieval performance. In addition, we present a similarity search algorithm that efficiently prunes the search space based on cells. Finally, we compare the performance of the CS-tree with that of the X-tree, which is considered an efficient high-dimensional index structure, in terms of insertion time, retrieval time for a k-nearest neighbor query, and storage overhead. The experimental results show that the CS-tree achieves better retrieval performance than the X-tree.
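
A cell signature of the general kind this abstract describes can be sketched as below. This is an illustration under stated assumptions, not the CS-tree itself: each dimension of a vector in [0, 1] is split into 2^b equal cells, the per-dimension cell indices are packed into one integer, and a lower-bound distance from a query to a cell is used for pruning. All names are hypothetical.

```python
import math


def cell_indices(vector, bits_per_dim):
    """Cell index per dimension for a vector with components in [0, 1]."""
    cells = 1 << bits_per_dim
    return [min(int(v * cells), cells - 1) for v in vector]


def pack_signature(indices, bits_per_dim):
    """Concatenate the per-dimension cell indices into a single integer."""
    signature = 0
    for idx in indices:
        signature = (signature << bits_per_dim) | idx
    return signature


def min_distance_to_cell(query, indices, bits_per_dim):
    """Smallest possible Euclidean distance from query to any point in the cell."""
    cells = 1 << bits_per_dim
    total = 0.0
    for q, idx in zip(query, indices):
        lo, hi = idx / cells, (idx + 1) / cells
        gap = max(lo - q, 0.0, q - hi)  # zero when q lies inside [lo, hi]
        total += gap * gap
    return math.sqrt(total)
```

During a k-NN search, a cell whose minimum distance already exceeds the current k-th best distance cannot contain a better answer, so every vector that shares that signature can be skipped without reading it.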

Image Retrieval Using the Fusion of Spatial Histogram and Wavelet Moments (공간 히스토그램과 웨이브렛 모멘트의 융합에 의한 영상검색)

  • Seo, Sang-Yong;Kim, Nam-Cheol
    • Journal of the Institute of Electronics Engineers of Korea SP / v.38 no.4 / pp.434-441 / 2001
  • We present an image retrieval method that improves retrieval performance through the effective fusion of a spatial histogram and wavelet moments. In this method, the similarity of spatial histograms and the similarity of wavelet moments are fused when computing the similarity between a query image and a DB image. That is, the wavelet moment feature, which is represented at multiple resolutions, and the spatial histogram feature, which is robust to translation and rotation, are used together to improve retrieval performance. To evaluate the performance of the proposed method, we use the Brodatz texture DB, the MPEG-7 T1 DB, and the Corel Draw Photo DB. Experimental results show that the proposed method yields 5.3% and 13.8% better performance on the Brodatz DB, and 15.5% and 3.2% better performance on the Corel Draw Photo DB, than the histogram method and the wavelet moment method, respectively.

Complete Sequence of a Gene Encoding KAR3-Related Kinesin-like Protein in Candida albicans

  • Kim Min-Kyoung;Lee Young Mi;Kim Wankee;Choi Wonja
    • Journal of Microbiology / v.43 no.5 / pp.406-410 / 2005
  • In contrast to Saccharomyces cerevisiae, little is known about the kinesin-like protein (KLP) in Candida albicans. The motor domain of kinesin, or KLP, contains a subregion that is well conserved from yeast to humans. A similarity search, with the murine ubiquitous kinesin heavy chain region as a query, revealed 6 contigs that contain putative KLPs in the genome of C. albicans. Of these, an open reading frame (ORF) of 375 amino acids, temporarily designated CaKAR3, was noticeably short compared with the closely related S. cerevisiae KAR3 (ScKAR3) of 729 amino acids. This finding prompted us to isolate a $\lambda$ genomic clone containing the complete CaKAR3 ORF, and here the complete sequence of CaKAR3 is reported. CaKAR3 is a C-terminal motor protein of 687 amino acids, encoded by a non-disrupting gene. When compared with ScKAR3, the amino-terminal region of 112 amino acids was unique, the middle part of 306 amino acids exhibited 25% identity and 44% similarity, and the remaining C-terminal motor domain exhibited 64% identity and 78% similarity. The sequence has been submitted to GenBank under accession number AY182242.

Time-Series Data Prediction using Hidden Markov Model and Similarity Search for CRM (CRM을 위한 은닉 마코프 모델과 유사도 검색을 사용한 시계열 데이터 예측)

  • Cho, Young-Hee;Jeon, Jin-Ho;Lee, Gye-Sung
    • Journal of the Korea Society of Computer and Information / v.14 no.5 / pp.19-28 / 2009
  • The prediction of time-series data has long been a research issue, and a number of methods have been proposed in the literature. In this paper, we propose a method that examines similarities among time-series data using a Hidden Markov Model and likelihood, and determines the future direction of the data movement. The query sequence is modeled by a Hidden Markov Model, and the model is then examined over the pre-recorded time series to find the subsequence with the greatest similarity to the model. The similarity is evaluated by likelihood. Once the best subsequence is chosen, the portion that follows it is used to predict the next phase of the data movement. A number of experiments with different parameters have been conducted to confirm the validity of the method. We used KOSPI data to verify the suggested method.
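
The search-then-predict loop described in this abstract can be sketched as a sliding-window scan. In this minimal sketch, the `score` callback is a placeholder for the HMM likelihood the paper uses: any function that returns a higher value for a better match can be passed in. The function name and parameters are hypothetical.

```python
def predict_next(history, query_len, horizon, score):
    """Find the best-scoring window in history and return what followed it.

    history   -- the pre-recorded time series (a list of numbers)
    query_len -- length of each candidate subsequence
    horizon   -- how many following values to return as the prediction
    score     -- callable(window) -> float; stands in for the HMM likelihood
    """
    best_start, best_score = None, float("-inf")
    # Only consider windows that still have `horizon` values after them.
    last_start = len(history) - query_len - horizon
    for start in range(last_start + 1):
        window = history[start:start + query_len]
        s = score(window)
        if s > best_score:
            best_start, best_score = start, s
    if best_start is None:
        raise ValueError("history is too short for the given query_len and horizon")
    end = best_start + query_len
    return history[end:end + horizon]
```

For instance, passing a score such as the negative squared difference to a query sequence would return the values that followed the closest historical match; replacing it with a model likelihood gives the likelihood-based variant the abstract describes.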

Content-based Image Retrieval using an Improved Chain Code and Hidden Markov Model (개선된 chain code와 HMM을 이용한 내용기반 영상검색)

  • 조완현;이승희;박순영;박종현
    • Proceedings of the IEEK Conference / 2000.09a / pp.375-378 / 2000
  • In this paper, we propose a novel content-based image retrieval system using both a Hidden Markov Model (HMM) and an improved chain code. A Gaussian Mixture Model (GMM) is applied to statistically model the color information of the image, and the Deterministic Annealing EM (DAEM) algorithm is employed to estimate the parameters of the GMM. This result is used to segment the given image. We use an improved chain code, which is invariant to rotation, translation, and scale, to extract the shape feature vectors for each image in the database. These are stored in the database together with each HMM, whose parameters (A, B, $\pi$) are estimated by the Baum-Welch algorithm. For a feature vector obtained in the same way from the query image, the occurrence probability for each image is computed using the forward algorithm of the HMM. We use these probabilities for image retrieval and present the images with the highest similarity.
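
The forward algorithm mentioned in this abstract computes the probability of an observation sequence under an HMM with parameters (A, B, π). The sketch below assumes a discrete HMM with observations given as integer symbol indices; how the paper quantises its chain-code features into symbols is not stated in the abstract, and the `rank_images` helper is hypothetical.

```python
def forward_probability(obs, pi, A, B):
    """P(obs | model) for a discrete HMM.

    obs -- non-empty list of observation symbol indices
    pi  -- initial state probabilities, pi[i]
    A   -- state transition probabilities, A[i][j]
    B   -- emission probabilities, B[i][symbol]
    """
    n = len(pi)
    # Initialisation: alpha_1(i) = pi_i * b_i(o_1)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    # Induction: alpha_{t+1}(j) = (sum_i alpha_t(i) * a_ij) * b_j(o_{t+1})
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    # Termination: sum over the final states.
    return sum(alpha)


def rank_images(obs, models):
    """Sort image ids by the probability their HMM assigns to the query sequence.

    models maps an image id to its (pi, A, B) tuple (an assumed layout).
    """
    return sorted(models,
                  key=lambda name: forward_probability(obs, *models[name]),
                  reverse=True)
```

For long sequences the raw probabilities underflow, so practical implementations usually scale alpha at each step or work with log probabilities.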
