• Title/Summary/Keyword: Similarity Search


Trajectory Search Algorithm for Spatio-temporal Similarity of Moving Objects on Road Network (도로 네트워크에서 이동 객체를 위한 시공간 유사 궤적 검색 알고리즘)

  • Kim, Young-Chang;Vista, Rabindra;Chang, Jae-Woo
    • Journal of Korea Spatial Information System Society / v.9 no.1 / pp.59-77 / 2007
  • Advances in mobile technologies and supporting techniques require an effective representation and analysis of moving objects. Similarity search over moving-object trajectories is an active research area in data mining. In this paper, we propose a trajectory search algorithm for the spatio-temporal similarity of moving objects on a road network. To this end, we define a spatio-temporal distance between two trajectories of moving objects on road networks and propose a new method for measuring spatio-temporal similarity based on the actual road network distance. In addition, we propose a similar-trajectory search algorithm that retrieves spatio-temporally similar trajectories in the road network. The algorithm uses a signature file to retrieve candidate trajectories efficiently. Finally, we provide a performance analysis to show the efficiency of the proposed algorithm.
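
The candidate-retrieval step mentioned in this abstract can be illustrated with a generic signature-file filter. This is a minimal sketch only: the abstract does not describe the signature layout, so the 64-bit width, the hashing of road-segment ids, and the names `segment_bits`, `signature`, and `candidates` are all assumptions made for illustration, not the authors' design.

```python
import hashlib

SIG_BITS = 64  # assumed signature width (not given in the abstract)


def segment_bits(segment_id: str, k: int = 2) -> int:
    """Map one road-segment id to a pattern of up to k set bits."""
    mask = 0
    for i in range(k):
        digest = hashlib.sha256(f"{i}:{segment_id}".encode()).digest()
        mask |= 1 << (int.from_bytes(digest[:4], "big") % SIG_BITS)
    return mask


def signature(segments) -> int:
    """Superimpose (bitwise OR) the bit patterns of all segments."""
    sig = 0
    for seg in segments:
        sig |= segment_bits(seg)
    return sig


def candidates(query_segments, signature_file):
    """Return ids whose signature covers every bit of the query signature.

    The filter never drops a trajectory that contains all query segments,
    but it may return false positives that a later exact check must remove.
    """
    q = signature(query_segments)
    return [tid for tid, sig in signature_file.items() if (sig & q) == q]


# Hypothetical trajectories, each given as a list of road-segment ids.
signature_file = {
    "t1": signature(["e1", "e2", "e3"]),
    "t2": signature(["e7", "e8"]),
}
print(candidates(["e1", "e2"], signature_file))
```

The point of the design is that the bitwise test is far cheaper than comparing trajectories directly, so the expensive spatio-temporal distance only needs to be computed for the surviving candidates.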


Software Similarity Measurement based on Dependency Graph using Harmony Search

  • Yun, Ho Yeong;Joe, Yong Joon;Jung, Byung Ok;Shin, Dong myung;Bahng, Hyo Keun
    • Journal of the Korea Society of Computer and Information / v.21 no.12 / pp.1-10 / 2016
  • In this paper, we attempt to prevent certain problem cases by tracing the history of open source software and its modifications and building a genogram from source code similarity. Open source software is used actively and widely in many areas and contributes to their development. However, there are many inadvertent cases, such as ignoring licenses or infringing intellectual property, that can lead to litigation. To prevent such situations, we analyze source code similarity using program dependence graphs, a task that resembles the subgraph isomorphism problem, a typical NP-complete problem. To solve the subgraph isomorphism problem, we use harmony search, a metaheuristic algorithm, and compare its results with those of a genetic algorithm. For future work, we represent open source software as program dependence graphs and analyze their similarity.

An Implementation of XML document searching system based on Structure and Semantics Similarity (구조와 내용 유사도에 기반한 XML 웹 문서 검색시스템 구축)

  • Park Uchang;Seo Yeojin
    • Journal of Internet Computing and Services / v.6 no.2 / pp.99-115 / 2005
  • Extensible Markup Language (XML) is an Internet standard used to express and convert data. Finding the necessary information in XML documents requires a search system designed for them. In this research, we have developed a search system that finds documents matching the structure and content of a given XML document, making the best use of XML structure. The search metrics take into account similarity in tag names, tag values, and the structure of tags. After a search, the system displays the results ranked by aggregate similarity. Three query methods are provided: conventional keyword search; search with tag names and their values; and search with XML documents. These three methods let users choose the one that best suits their preference, which increases the usefulness of the system.


A Method for Time Warping Based Similarity Search in Sequence Databases (시퀀스 데이터베이스를 위한 타임 워핑 기반 유사 검색)

  • Kim, Sang-Wook;Park, Sang-Hyun
    • Journal of Industrial Technology / v.20 no.B / pp.219-226 / 2000
  • In this paper, we propose a novel method for similarity search that supports time warping. Our primary goal is to improve search performance in large databases without false dismissal. To attain this goal, we devise a new distance function $D_{tw-lb}$ that consistently underestimates the time warping distance and also satisfies the triangle inequality. $D_{tw-lb}$ uses a 4-tuple feature vector extracted from each sequence and is invariant to time warping. For efficient processing, we employ a multidimensional index that uses the 4-tuple feature vector as indexing attributes and $D_{tw-lb}$ as a distance function. We prove that our method does not incur false dismissal. To verify the superiority of our method, we perform extensive experiments. The results reveal that our method achieves a speedup of up to 43 times on real-world S&P 500 stock data.
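
The lower-bounding idea behind $D_{tw-lb}$ can be sketched as follows. The abstract does not say what the 4-tuple contains or which base distance is used, so this sketch assumes the features are (first, last, maximum, minimum) of each sequence and that the time warping distance sums $|a - b|$ along the warping path. The function names `dtw`, `features`, and `lower_bound` are illustrative, not taken from the paper.

```python
def dtw(s, q):
    """Time warping distance: minimum total |a - b| over all warping paths."""
    inf = float("inf")
    n, m = len(s), len(q)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(s[i - 1] - q[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def features(s):
    """Assumed 4-tuple: first, last, greatest and smallest element."""
    return (s[0], s[-1], max(s), min(s))


def lower_bound(s, q):
    """Largest feature difference; never exceeds dtw(s, q) under the assumptions above."""
    return max(abs(a - b) for a, b in zip(features(s), features(q)))


s = [0, 5, 1]
q = [2, 2, 2]
print(lower_bound(s, q), dtw(s, q))  # the first value never exceeds the second

# Stretching a sequence along the time axis leaves its features unchanged.
print(features([1, 2, 3, 4]) == features([1, 1, 2, 3, 3, 4]))
```

Under these assumptions the bound holds because every warping path must pair the two first elements and the two last elements, and must pair the larger maximum (or the smaller minimum) with some element of the other sequence that is at least as far away as that sequence's own extreme. A sequence whose lower bound already exceeds the query tolerance can therefore be discarded without computing the full time warping distance, and no true answer is lost.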


GORank: Semantic Similarity Search for Gene Products using Gene Ontology (GORank: Gene Ontology를 이용한 유전자 산물의 의미적 유사성 검색)

  • Kim, Ki-Sung;Yoo, Sang-Won;Kim, Hyoung-Joo
    • Journal of KIISE:Databases / v.33 no.7 / pp.682-692 / 2006
  • Searching for gene products that have similar biological functions is crucial in bioinformatics. Modern biological databases provide functional descriptions of gene products using the Gene Ontology (GO). In this paper, we propose a technique for semantic similarity search over gene products using GO annotation information. For this purpose, we define an information-theoretic measure of semantic similarity between gene products and propose an algorithm for semantic similarity search using this measure. We adapt Fagin's Threshold Algorithm to process the semantic similarity query as follows. First, we redefine the threshold for our measure, because our similarity function is not monotonic. Then we propose cluster-skipping and an access ordering of the inverted index lists to reduce the number of disk accesses. Experiments with real GO and annotation data show that GORank is efficient and scalable.
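
For orientation, the classic form of Fagin's Threshold Algorithm that GORank adapts can be sketched as below. This is the textbook version for a monotonic aggregate (a plain sum), not the redefined threshold the abstract refers to. The function name `threshold_topk`, the dictionary-based lists, and the convention that a missing score counts as 0 are assumptions made for illustration.

```python
import heapq


def threshold_topk(lists, k):
    """Top-k objects by summed score.

    `lists` holds one dict per attribute, mapping object id -> score >= 0.
    An object missing from a list is assumed to score 0 there.
    """
    # Sorted access: each list ordered by descending score.
    sorted_lists = [sorted(l.items(), key=lambda kv: -kv[1]) for l in lists]
    seen = {}
    max_depth = max(len(sl) for sl in sorted_lists)
    for depth in range(max_depth):
        threshold = 0.0
        for sl in sorted_lists:
            if depth < len(sl):
                obj, score = sl[depth]
                threshold += score
                if obj not in seen:
                    # Random access: fetch the object's score in every list.
                    seen[obj] = sum(l.get(obj, 0.0) for l in lists)
        top = heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])
        # Stop once k seen objects score at least as high as any unseen one can.
        if len(top) == k and top[-1][1] >= threshold:
            return top
    return heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])


# Hypothetical scores for three objects over two attributes.
lists = [
    {"a": 0.9, "b": 0.8, "c": 0.1},
    {"a": 0.2, "b": 0.7, "c": 0.6},
]
print(threshold_topk(lists, k=1))
```

The early stop is only valid when the aggregate is monotonic, since the threshold must bound the score of every object not yet seen. That is the property the abstract says its similarity function lacks, which is why the threshold has to be redefined there.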

Similarity Search Algorithm Based on Hyper-Rectangular Representation of Video Data Sets (비디오 데이터 세트의 하이퍼 사각형 표현에 기초한 비디오 유사성 검색 알고리즘)

  • Lee, Seok-Lyong
    • The KIPS Transactions:PartD / v.11D no.4 / pp.823-834 / 2004
  • In this research, similarity search algorithms are presented for large video data streams. A video stream that consists of a number of frames can be expressed as a sequence in a multidimensional data space by representing each frame with a multidimensional vector. By analyzing various characteristics of the sequence, it is partitioned into multiple video segments and clusters, which are represented by hyper-rectangles. Using the hyper-rectangles of video segments and clusters, similarity functions between two video streams are defined, and two similarity search algorithms are proposed based on these functions: one using hyper-rectangles and one using representative frames. The former guarantees correctness, while the latter focuses on efficiency at a slight cost in correctness. Experiments on different types of video streams and synthetically generated stream data show the strength of the proposed algorithms.
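
To illustrate how a hyper-rectangle can stand in for a group of frames, the sketch below builds the bounding hyper-rectangle of a segment and computes the minimum Euclidean distance between two such rectangles. The abstract does not give the actual similarity functions, so `bounding_rect` and `min_dist` are generic, assumed building blocks rather than the paper's definitions.

```python
import math


def bounding_rect(frames):
    """Smallest axis-aligned hyper-rectangle enclosing the frame vectors."""
    dims = range(len(frames[0]))
    low = tuple(min(f[d] for f in frames) for d in dims)
    high = tuple(max(f[d] for f in frames) for d in dims)
    return low, high


def min_dist(r1, r2):
    """Minimum Euclidean distance between two hyper-rectangles (0 if they overlap)."""
    (low1, high1), (low2, high2) = r1, r2
    total = 0.0
    for a_lo, a_hi, b_lo, b_hi in zip(low1, high1, low2, high2):
        # Per-dimension gap between the two intervals, or 0 when they overlap.
        gap = max(a_lo - b_hi, b_lo - a_hi, 0.0)
        total += gap * gap
    return math.sqrt(total)


# Hypothetical 2-dimensional frame vectors for two video segments.
seg_a = bounding_rect([(0, 0), (1, 2)])
seg_b = bounding_rect([(4, 6), (5, 7)])
print(min_dist(seg_a, seg_b))
```

Because no pair of frames drawn from the two segments can be closer than this rectangle-to-rectangle distance, a filter built on it cannot wrongly discard a matching segment, which is the kind of guarantee that a correctness-preserving search relies on.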

Effects of Perceived Similarity between Consumers and Product Reviewers on Consumer Behaviors (상품후기 작성자에 대해 상품후기 독자가 느끼는 유사성이 상품후기 독자에게 미치는 영향)

  • Kim, Ji-Young;Suh, Eung-Kyo;Suh, Kil-Soo
    • Asia pacific journal of information systems / v.18 no.3 / pp.67-90 / 2008
  • Prior to making choices among online products and services, consumers often search online product reviews written by other consumers. Online product reviews have a great influence on consumer behavior because they are believed to be more reliable than information provided by sellers. However, ever-growing lists of product reviews make it difficult for consumers to find the right information efficiently. A customized search mechanism is a method of providing personalized information that fits the user's requirements. This study examines the effects of a customized search mechanism and of perceived similarity between consumers and product reviewers on consumer behaviors. More specifically, we address the following research questions: (1) Can a customized search mechanism increase perceived similarity between product review authors and readers? (2) Are product reviews perceived as more credible when they are written by authors whom readers perceive as similar to themselves? (3) Does the credibility of product reviews have a positive impact on acceptance of product reviews? (4) Does acceptance of product reviews influence the purchase intention of the readers? To examine these research questions, a lab experiment with a between-subject factor design (whether or not a customized search mechanism is provided) was employed. To enhance mundane realism and increase the generalizability of the findings, the experiment sites were built based on a real online store, cherrya.com (http://www.cherrya.com/). Sixty participants were drawn from a pool of undergraduate and graduate students at a large university. Participation was voluntary; all participants received 5,000 won to encourage their motivation and involvement in the experiment tasks. In addition, 15 participants, selected by a random draw, received 30,000 won to actually purchase the product that they had decided to buy during the experiment.
Of the 60 participants, 25 were male and 35 were female. In examining the homogeneity between the two groups, t-tests revealed no significant difference in gender, age, academic years, online shopping experience, or Internet usage. To test our research model, we tested the measurement models and the structural models using PLS Graph version 3.00. The analysis confirmed individual item reliability, internal consistency, and discriminant validity of the measurements. The results show that participants perceive product reviews as more credible when they are written by authors perceived as similar to themselves, that the credibility of product reviews has a positive impact on acceptance of product reviews, and that acceptance of product reviews influences the purchase intention of the readers. However, a customized search mechanism did not increase perceived similarity between product review authors and readers. The results imply that there is an urgent need to develop a better customized search tool in order to increase perceived similarity between product review authors and readers.

An Efficient Algorithm for Similarity Search in Large Biosequence Database (대용량 유전체를 위한 효율적인 유사성 검색 알고리즘)

  • Jeong, In-Seon;Park, Kyoung-Wook;Lim, Hyeong-Seok
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / v.9 no.2 / pp.1073-1076 / 2005
  • Since the size of biosequence databases grows exponentially every year, it has become impractical to use the Smith-Waterman algorithm for exact sequence similarity search. For fast sequence similarity search, researchers have proposed heuristic methods that use the frequency of characters in subsequences. These methods have the defect that different sequences may be treated as the same sequence. Because they use only the frequency of characters, their accuracy is lower than that of the Smith-Waterman algorithm. In this paper, we propose an algorithm that processes queries efficiently by indexing the frequency of characters together with the positional information of characters in subsequences. Experiments show that our algorithm improves the accuracy of sequence similarity search by approximately 5-20% over heuristic algorithms that use only the frequency of characters.
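
The defect described in this abstract is easy to reproduce: two different subsequences with the same character counts map to the same frequency vector. The sketch below shows the collision and one hypothetical way of adding positional information (the sum of each character's positions). The abstract does not specify how positions are encoded, so `freq_pos_vector` is an assumption for illustration only.

```python
from collections import Counter


def freq_vector(seq, alphabet="ACGT"):
    """Character counts only: the feature used by frequency-based heuristics."""
    counts = Counter(seq)
    return tuple(counts[c] for c in alphabet)


def freq_pos_vector(seq, alphabet="ACGT"):
    """Counts plus the sum of 1-based positions per character (assumed encoding)."""
    counts = Counter(seq)
    pos_sum = Counter()
    for i, c in enumerate(seq, start=1):
        pos_sum[c] += i
    return tuple((counts[c], pos_sum[c]) for c in alphabet)


a, b = "ACGT", "TGCA"
print(freq_vector(a) == freq_vector(b))          # same counts, so they collide
print(freq_pos_vector(a) == freq_pos_vector(b))  # positions tell them apart
```

A positional feature of this kind still admits collisions for some inputs, so it narrows the candidate set rather than replacing an exact alignment.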


Semantic Similarity Search using the Signature Tree (시그니처 트리를 사용한 의미적 유사성 검색 기법)

  • Kim, Ki-Sung;Im, Dong-Hyuk;Kim, Cheol-Han;Kim, Hyoung-Joo
    • Journal of KIISE:Databases / v.34 no.6 / pp.546-553 / 2007
  • As ontologies become widely used, interest in semantic similarity search is also increasing. In this paper, we suggest a query evaluation scheme for the k-nearest neighbor query, which retrieves the k objects most similar to the query object. We use the best match method to calculate the semantic similarity between objects and use the signature tree to index the annotation information of objects in the database. The signature tree is usually used for set similarity search. To use the signature tree in similarity search, we must predict the upper bound of similarity for a node, that is, the highest similarity value that can be found by traversing into the node. We therefore suggest a prediction function for the best match similarity function and prove the correctness of the prediction. We also modify the original signature tree structure so that identical signatures are not stored redundantly. This improved structure not only reduces the size of the signature tree but also increases the efficiency of query evaluation. We use the Gene Ontology (GO) for our experiments, which provides large ontologies and a large amount of annotation data. Using GO, we show that the proposed method improves query efficiency, and we present several experimental results that vary the page size and use several node-splitting methods.

Video Data Modeling for Supporting Structural and Semantic Retrieval (구조 및 의미 검색을 지원하는 비디오 데이타의 모델링)

  • 복경수;유재수;조기형
    • Journal of KIISE:Databases / v.30 no.3 / pp.237-251 / 2003
  • In this paper, we propose a video retrieval system that efficiently searches the logical structure and semantic contents of video data. The proposed system employs a layered modelling method that organizes video data into a raw data layer, a content layer, and a key frame layer. The layered model represents the logical structures and semantic contents of video data in the content layer. The proposed system also supports various types of searches, such as text search, visual-feature-based similarity search, spatio-temporal-relationship-based similarity search, and semantic contents search.