• Title/Summary/Keyword: Similarity Query

Search Result 246, Processing Time 0.021 seconds

An Efficient String Similarity Search Technique based on Generating Inverted Lists of Variable-Length Grams (가변길이 그램의 역리스트 생성을 이용한 효율적인 유사 문자열 검색 기법)

  • Kim, Jongik
    • Journal of KIISE
    • /
    • v.43 no.11
    • /
    • pp.1275-1280
    • /
    • 2016
  • Existing techniques for string similarity search first generate a set of candidate strings and then verify the candidates. The efficiency of string similarity search is highly dependent on candidate generation methods. State of the art techniques select fixed length q-grams from a query string and generate candidates using inverted lists of the selected q-grams. In this paper, we propose a technique to generate candidates using variable length grams of a query string and develop a dynamic programming algorithm that selects an optimal combination of variable length grams from a query string. Experimental results show that the proposed technique improves the performance of string similarity search compared with the existing techniques.

Semantic Similarity Search using the Signature Tree (시그니처 트리를 사용한 의미적 유사성 검색 기법)

  • Kim, Ki-Sung;Im, Dong-Hyuk;Kim, Cheol-Han;Kim, Hyoung-Joo
    • Journal of KIISE:Databases
    • /
    • v.34 no.6
    • /
    • pp.546-553
    • /
    • 2007
  • As ontologies are used widely, interest for semantic similarity search is also increasing. In this paper, we suggest a query evaluation scheme for k-nearest neighbor query, which retrieves k most similar objects to the query object. We use the best match method to calculate the semantic similarity between objects and use the signature tree to index annotation information of objects in database. The signature tree is usually used for the set similarity search. When we use the signature tree in similarity search, we are required to predict the upper-bound of similarity for a node; the highest similarity value which can be found when we traverse into the node. So we suggest a prediction function for the best match similarity function and prove the correctness of the prediction. And we modify the original signature tree structure for same signatures not to be stored redundantly. This improved structure of signature tree not only reduces the size of signature tree but also increases the efficiency of query evaluation. We use the Gene Ontology(GO) for our experiments, which provides large ontologies and large amount of annotation data. Using GO, we show that proposed method improves query efficiency and present several experimental results varying the page size and using several node-splitting methods.

Fuzzy Query Processing through Two-level Similarity Relation Matrices Construction (2계층 유사관계행렬 구축을 통한 질의 처리)

  • 이기영
    • Journal of the Korea Computer Industry Society
    • /
    • v.4 no.10
    • /
    • pp.587-598
    • /
    • 2003
  • This paper construct two-level word similarity relation matrices about title and to scientific treatise. As guide keyword similarity relation matrices which is constructed to co-occurrence frequency base same time keeps recall rater by query expansion by tolerance relation, it is index structure to improve the precision rate by two-level contents base retrieval. Therefore, draw area knowledge through subject analysis and reasoned user's information request and area knowledge to fuzzy logic base. This research is research to improve vocabulary mismatch problem and information expression having essentially on query.

  • PDF

Content similarity matching for video sequence identification

  • Kim, Sang-Hyun
    • International Journal of Contents
    • /
    • v.6 no.3
    • /
    • pp.5-9
    • /
    • 2010
  • To manage large database system with video, effective video indexing and retrieval are required. A large number of video retrieval algorithms have been presented for frame-wise user query or video content query, whereas a few video identification algorithms have been proposed for video sequence query. In this paper, we propose an effective video identification algorithm for video sequence query that employs the Cauchy function of histograms between successive frames and the modified Hausdorff distance. To effectively match the video sequences with a low computational load, we make use of the key frames extracted by the cumulative Cauchy function and compare the set of key frames using the modified Hausdorff distance. Experimental results with several color video sequences show that the proposed algorithm for video identification yields remarkably higher performance than conventional algorithms such as Euclidean metric, and directed divergence methods.

Shape-Based Subsequence Retrieval Supporting Multiple Models in Time-Series Databases (시계열 데이터베이스에서 복수의 모델을 지원하는 모양 기반 서브시퀀스 검색)

  • Won, Jung-Im;Yoon, Jee-Hee;Kim, Sang-Wook;Park, Sang-Hyun
    • The KIPS Transactions:PartD
    • /
    • v.10D no.4
    • /
    • pp.577-590
    • /
    • 2003
  • The shape-based retrieval is defined as the operation that searches for the (sub) sequences whose shapes are similar to that of a query sequence regardless of their actual element values. In this paper, we propose a similarity model suitable for shape-based retrieval and present an indexing method for supporting the similarity model. The proposed similarity model enables to retrieve similar shapes accurately by providing the combination of various shape-preserving transformations such as normalization, moving average, and time warping. Our indexing method stores every distinct subsequence concisely into the disk-based suffix tree for efficient and adaptive query processing. We allow the user to dynamically choose a similarity model suitable for a given application. More specifically, we allow the user to determine the parameter p of the distance function $L_p$ when submitting a query. The result of extensive experiments revealed that our approach not only successfully finds the subsequences whose shapes are similar to a query shape but also significantly outperforms the sequence search.

A Ranking Technique of XML Documents using Path Similarity for Expanded Query Processing (확장된 질의 처리를 위해 경로간 의미적 유사도를 고려한 XML 문서 순위화 기법)

  • Kim, Hyun-Joo;Park, So-Mi;Park, Seog
    • Journal of KIISE:Databases
    • /
    • v.37 no.2
    • /
    • pp.113-120
    • /
    • 2010
  • XML is broadly using for data storing and processing. XML is specified its structural characteristic and user can query with XPath when information from data document is needed. XPath query can process when the tern and structure of document and query is matched with each other. However, nowadays there are lots of data documents which are made by using different terminology and structure therefore user can not know the exact idea of target data. In fact, there are many possibilities that target data document has information which user is find or a similar ones. Accordingly user query should be processed when their term usage or structural characteristic is slightly different with data document. In order to do that we suggest a XML document ranking method based on path similarity. The method can measure a semantic similarity between user query and data document using three steps which are position, node and relaxation factors.

Video Retrieval System supporting Content-based Retrieval and Scene-Query-By-Example Retrieval (비디오의 의미검색과 예제기반 장면검색을 위한 비디오 검색시스템)

  • Yoon, Mi-Hee;Cho, Dong-Uk
    • The KIPS Transactions:PartB
    • /
    • v.9B no.1
    • /
    • pp.105-112
    • /
    • 2002
  • In order to process video data effectively, we need to save its content on database and a content-based retrieval method which processes various queries of all users is required. In this paper, we present VRS(Video Retrieval System) which provides similarity query, SQBE(Scene Query By Example) query, and content-based retrieval by combining the feature-based retrieval and the annotation-based retrieval. The SQBE query makes it possible for a user to retrieve scones more exactly by inserting and deleting objects based on a retrieved scene. We proposed query language and query processing algorithm for SQBE query, and carried out performance evaluation on similarity retrieval. The proposed system is implemented with Visual C++ and Oracle.

A Study of Similarity Measures on Multidimensional Data Sequences Using Semantic Information (의미 정보를 이용한 다차원 데이터 시퀀스의 유사성 척도 연구)

  • Lee, Seok-Lyong;Lee, Ju-Hong;Chun, Seok-Ju
    • The KIPS Transactions:PartD
    • /
    • v.10D no.2
    • /
    • pp.283-292
    • /
    • 2003
  • One-dimensional time-series data have been studied in various database applications such as data mining and data warehousing. However, in the current complex business environment, multidimensional data sequences (MDS') become increasingly important in addition to one-dimensional time-series data. For example, a video stream can be modeled as an MDS in the multidimensional space with respect to color and texture attributes. In this paper, we propose the effective similarity measures on which the similar pattern retrieval is based. An MDS is partitioned into segments, each of which is represented by various geometric and semantic features. The similarity measures are defined on the basis of these segments. Using the measures, irrelevant segments are pruned from a database with respect to a given query. Both data sequences and query sequences are partitioned into segments, and the query processing is based upon the comparison of the features between data and query segments, instead of scanning all data elements of entire sequences.

Semantic Conceptual Relational Similarity Based Web Document Clustering for Efficient Information Retrieval Using Semantic Ontology

  • Selvalakshmi, B;Subramaniam, M;Sathiyasekar, K
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.15 no.9
    • /
    • pp.3102-3119
    • /
    • 2021
  • In the modern rapid growing web era, the scope of web publication is about accessing the web resources. Due to the increased size of web, the search engines face many challenges, in indexing the web pages as well as producing result to the user query. Methodologies discussed in literatures towards clustering web documents suffer in producing higher clustering accuracy. Problem is mitigated using, the proposed scheme, Semantic Conceptual Relational Similarity (SCRS) based clustering algorithm which, considers the relationship of any document in two ways, to measure the similarity. One is with the number of semantic relations of any document class covered by the input document and the second is the number of conceptual relation the input document covers towards any document class. With a given data set Ds, the method estimates the SCRS measure for each document Di towards available class of documents. As a result, a class with maximum SCRS is identified and the document is indexed on the selected class. The SCRS measure is measured according to the semantic relevancy of input document towards each document of any class. Similarly, the input query has been measured for Query Relational Semantic Score (QRSS) towards each class of documents. Based on the value of QRSS measure, the document class is identified, retrieved and ranked based on the QRSS measure to produce final population. In both the way, the semantic measures are estimated based on the concepts available in semantic ontology. The proposed method had risen efficient result in indexing as well as search efficiency also has been improved.

A Content-Based Image Retrieval using Object Segmentation Method (물체 분할 기법을 이용한 내용기반 영상 검색)

  • 송석진;차봉현;김명호;남기곤;이상욱;주재흠
    • Journal of the Institute of Convergence Signal Processing
    • /
    • v.4 no.1
    • /
    • pp.1-8
    • /
    • 2003
  • Various methods have been studying to maintain and apply the multimedia inform abruptly increasing over all social fields, in recent years. For retrieval of still images, we is implemented content-based image retrieval system in this paper that make possible to retrieve similar objects from image database after segmenting query object from background if user request query. Query image is processed median filtering to remove noise first and then object edge is detected it by canny edge detection. And query object is segmented from background by using convex hull. Similarity value can be obtained by means of histogram intersection with database image after securing color histogram from segmented image. Also segmented image is processed gray convert and wavelet transform to extract spacial gray distribution and texture feature. After that, Similarity value can be obtained by means of banded autocorrelogram and energy. Final similar image can be retrieved by adding upper similarity values that it make possible to not only robust in background but also better correct object retrieval by using object segmentation method.

  • PDF