• Title/Summary/Keyword: Trie


A Practical Approximate Sub-Sequence Search Method for DNA Sequence Databases (DNA 시퀀스 데이타베이스를 위한 실용적인 유사 서브 시퀀스 검색 기법)

  • Won, Jung-Im;Hong, Sang-Kyoon;Yoon, Jee-Hee;Park, Sang-Hyun;Kim, Sang-Wook
    • Journal of KIISE:Databases / v.34 no.2 / pp.119-132 / 2007
  • In molecular biology, approximate subsequence search is one of the most important operations. In this paper, we propose an accurate and efficient method for approximate subsequence search in large DNA databases. The proposed method adopts a binary trie as its primary structure and stores in it all the window subsequences extracted from a DNA sequence. For approximate subsequence search, it traverses the binary trie breadth-first and, using dynamic programming, retrieves all matching subsequences along the traversed paths. However, because the method stores only window subsequences of a predetermined length, long query sequences incur a large post-processing cost. To overcome this problem, we divide a query sequence into shorter pieces, search for each piece, and then merge the results. To verify the superiority of the proposed method, we conducted a series of performance experiments. The results show that the proposed method, which also requires less storage space, is 4 to 17 times faster than the suffix-tree-based method. Even for long query sequences, our method is more than an order of magnitude faster than both the suffix-tree-based method and the Smith-Waterman algorithm.
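
To make the search step concrete, here is a minimal Python sketch. It assumes a 2-bit encoding of nucleotides (A=00, C=01, G=10, T=11), so each base spans two levels of the binary trie. The names `build_trie`, `approx_search`, and the window length `w` are illustrative, not the paper's.

Every length-`w` window is inserted into the trie, and the trie is then walked breadth-first. Each path carries one column of the edit-distance DP table, and a path is pruned once every entry of its column exceeds the error bound `k`. Windows shorter than `w` at the end of the sequence are not indexed, and the query-splitting and merging step for long queries is omitted.

```python
from collections import deque

# Assumed 2-bit encoding: each nucleotide occupies two levels of the binary trie.
CODE = {"A": (0, 0), "C": (0, 1), "G": (1, 0), "T": (1, 1)}
DECODE = {bits: base for base, bits in CODE.items()}


class Node:
    __slots__ = ("child", "offsets")

    def __init__(self):
        self.child = [None, None]  # one child per bit value
        self.offsets = []          # start positions of windows ending here (leaves)


def build_trie(seq, w):
    """Insert every length-w window subsequence of seq into a binary trie."""
    root = Node()
    for start in range(len(seq) - w + 1):
        node = root
        for base in seq[start:start + w]:
            for bit in CODE[base]:
                if node.child[bit] is None:
                    node.child[bit] = Node()
                node = node.child[bit]
        node.offsets.append(start)
    return root


def leaf_offsets(node):
    """All window start positions stored in the subtree below node."""
    out, stack = [], [node]
    while stack:
        n = stack.pop()
        out.extend(n.offsets)
        stack.extend(c for c in n.child if c is not None)
    return out


def approx_search(root, query, k):
    """Breadth-first trie traversal carrying one edit-distance DP column per path.

    Returns sorted (offset, length) pairs such that seq[offset:offset+length]
    is within edit distance k of query (length is capped by the window size).
    """
    m = len(query)
    hits = set()
    # col[i] = edit distance between query[:i] and the bases spelled so far
    queue = deque([(root, list(range(m + 1)), 0)])
    while queue:
        node, col, depth = queue.popleft()
        for b0 in (0, 1):
            half = node.child[b0]
            if half is None:
                continue
            for b1 in (0, 1):
                nxt = half.child[b1]
                if nxt is None:
                    continue
                base = DECODE[(b0, b1)]
                new = [col[0] + 1]
                for i in range(1, m + 1):
                    cost = 0 if query[i - 1] == base else 1
                    new.append(min(col[i] + 1, new[i - 1] + 1, col[i - 1] + cost))
                if new[m] <= k:
                    for off in leaf_offsets(nxt):
                        hits.add((off, depth + 1))
                if min(new) <= k:  # column minima never decrease, so prune otherwise
                    queue.append((nxt, new, depth + 1))
    return sorted(hits)
```

For example, `approx_search(build_trie("ACGTACGGT", 5), "ACGG", 0)` returns `[(4, 4)]`, the only exact occurrence. With `k=1`, the result also includes `(0, 4)`, since `ACGT` is one substitution away from `ACGG`.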

A Structure of Korean Electronic Dictionary using the Finite State Transducer (Finite State Transducer를 이용한 한국어 전자 사전의 구조)

  • Baek, Dae-Ho;Lee, Ho;Rim, Hae-Chang
    • Annual Conference on Human and Language Technology / 1995.10a / pp.181-187 / 1995
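  • Korean information processing systems such as Korean morphological analyzers perform many electronic dictionary lookups, so the performance of the electronic dictionary strongly affects the performance of the whole system. This paper therefore proposes an electronic dictionary structure based on a Finite State Transducer (FST) that occupies little storage and searches quickly. The proposed dictionary represents headwords as a Deterministic Finite State Automaton (DFA) and uses a DFA state-minimization algorithm to remove redundant states wherever they occur, so it needs little storage. Because the FST is mapped onto a one-dimensional array and a lookup consists only of state transitions within this array, search is very fast. In addition, as with a TRIE structure, a single search can find all possible headwords for the input word. Experiments show that the size of the FST-based dictionary does not grow in proportion to the number of headwords and that lookup time is unaffected by the number of headwords. In an experiment looking up about 2.37 million words, it was faster than dictionaries using TRIE or $B^+$-tree structures.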
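
The dictionary structure can be approximated with the following Python sketch. A trie over the headwords is minimized bottom-up by merging states that have the same finality and the same outgoing transitions. The result is then laid out in flat arrays, so a lookup is just a sequence of state transitions.

This is a simplification under several assumptions:

- It builds an acceptor only; the transducer outputs are omitted.
- The flat `base`/`arcs` layout is a stand-in for whatever array mapping the authors used.
- "All possible headwords" is read as all headwords that are prefixes of the input, as in a trie common-prefix search.

All function names are hypothetical.

```python
class State:
    def __init__(self):
        self.final = False
        self.edges = {}  # input symbol -> State


def build_trie(words):
    """Plain trie over the headwords (the starting DFA)."""
    root = State()
    for word in words:
        s = root
        for ch in word:
            s = s.edges.setdefault(ch, State())
        s.final = True
    return root


def minimize(state, register):
    """Bottom-up minimization of an acyclic DFA: states with the same finality
    and the same transitions (to already-merged children) are merged."""
    for ch, child in list(state.edges.items()):
        state.edges[ch] = minimize(child, register)
    key = (state.final, tuple(sorted((ch, id(c)) for ch, c in state.edges.items())))
    return register.setdefault(key, state)


def flatten(root):
    """Map the minimized DFA onto flat arrays: the transitions of state s are
    arcs[base[s]:base[s + 1]], stored as (symbol, target_state) pairs."""
    order, index, stack = [], {}, [root]
    while stack:
        s = stack.pop()
        if id(s) in index:
            continue
        index[id(s)] = len(order)  # the root gets index 0
        order.append(s)
        stack.extend(s.edges.values())
    base, arcs, final = [], [], []
    for s in order:
        base.append(len(arcs))
        final.append(s.final)
        arcs.extend((ch, index[id(s.edges[ch])]) for ch in sorted(s.edges))
    base.append(len(arcs))
    return base, arcs, final


def prefix_matches(tables, text):
    """Return every headword that is a prefix of text, in one left-to-right scan."""
    base, arcs, final = tables
    s, out = 0, []
    for i, ch in enumerate(text):
        nxt = None
        for label, target in arcs[base[s]:base[s + 1]]:
            if label == ch:
                nxt = target
                break
        if nxt is None:
            break
        s = nxt
        if final[s]:
            out.append(text[:i + 1])
    return out
```

Consider the headwords `tap`, `taps`, `top`, and `tops`. Their eight trie states collapse into five, and `prefix_matches(tables, "topsoil")` returns `["top", "tops"]`.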

Certains problèmes fondamentaux de la syntaxe reconsidérés du point de vue de la syntaxe positionnelle (위치통사론을 통해 살펴 본 몇 가지 통사론의 본질적 문제)

  • Leem, Jai-Ho
    • Lingua Humanitatis / v.7 / pp.271-289 / 2005
  • In this article, we discuss certain syntactic problems by drawing on Milner's linguistic theory. We call into question the independence and the identity of syntactic structure, the relation between the syntactic plane and the lexical plane, the geometric character of syntax, and so on. The discussion is not only linguistic but also interdisciplinary and epistemological, insofar as we examine the nature of the syntactic entity and the "scientific" method of syntax that gives access to it. According to Milner, one must distinguish the place of a lexical term from the syntactic position, which is the syntactic entity. Strictly speaking, the former is not syntactic. Unlike the latter, however, it is observable, and it serves as the basis for conjecturing the syntactic dimension, that is, the positional system. The theoretical apparatus of Milner's linguistic theory is nothing other than the set of propositions that allows one, in the absence of direct observation, to conjecture the positional system on the basis of the system of places. To assert the independence of syntactic structure amounts to saying that there is a cut between the positional system and the system of places. In other words, without this cut, one cannot speak of the independence of syntactic structure. Once distinguished in this way, the two systems enter into relations that, according to Milner, are either natural or non-natural (by distortion).
The natural relation is a lexico-syntactic relation that arises when a lexical term occupies a syntactic position whose category is identical to its own. Milner assumes this natural occupation relation to be a tendency of natural language. The non-natural occupation relation, by contrast, is "paradoxical" in the sense that it arises from a more or less "anomalous" encounter between the lexical occupant and the occupied syntactic position. The degree of anomaly that a language allows can be measured empirically and must vary with the language concerned. The geometric character of syntax leads us to call into question, among other things, the empiricity and materiality of syntactic geometry. On these subjects, our theses are as follows: the nature of syntactic geometry is not a priori but empirical, and the geometry of syntax can and must be constructed with the help of "empirical" logic.

A Bit-Map Trie for the High-Speed Longest Prefix Search of IP Addresses (고속의 최장 IP 주소 프리픽스 검색을 위한 비트-맵 트라이)

  • 오승현;안종석
    • Journal of KIISE:Information Networking / v.30 no.2 / pp.282-292 / 2003
  • This paper proposes an efficient data structure for forwarding IPv4 and IPv6 packets at gigabit speeds in backbone routers. LPM (Longest Prefix Matching) search becomes a bottleneck for router performance because its complexity grows in proportion to the forwarding table size and the address length. To speed up forwarding, this paper introduces a data structure named BMT (Bit-Map Trie) that minimizes frequent main-memory accesses. All the necessary search computations in BMT are performed over a small index table stored in cache. To build this small index table from the trie representation of the forwarding table, BMT represents each link pointer to a child node, and each node pointer to the corresponding forwarding-table entry, with a single bit. Conventional tries perform poorly when their height grows with longer addresses. To address this, BMT uses a binary search algorithm to determine the trie level at which the search should start. Simulation experiments show that BMT compacts an IPv4 backbone router's forwarding table into an index smaller than 512 KB. It achieves an average lookup time of 250 ns per packet on a Pentium II processor, almost the same as the fastest conventional lookup algorithms.
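
The one-bit-per-pointer idea can be illustrated with a small Python sketch of a binary trie stored in level order. Each node has two child bits and one prefix bit. A child's position and a next hop's position are recovered by counting set bits (rank) instead of following stored pointers.

This is an assumption-based illustration, not the published BMT layout:

- Addresses and prefixes are bit strings.
- `build_bitmap_trie`, `rank`, and `lookup` are hypothetical names.
- The binary search over starting levels is left out.

```python
from collections import deque


def build_bitmap_trie(routes):
    """routes: list of (prefix_bit_string, next_hop), e.g. ("1011", "C")."""
    # 1) Ordinary pointer-based binary trie.
    root = {}
    for bits, hop in routes:
        node = root
        for b in bits:
            node = node.setdefault(b, {})
        node["hop"] = hop
    # 2) Level-order layout: bits 2i and 2i+1 say whether node i has a 0/1 child,
    #    bit i of prefix_bits says whether node i carries a forwarding entry.
    child_bits = prefix_bits = 0
    hops = []
    queue, i = deque([root]), 0
    while queue:
        node = queue.popleft()
        for pos, b in enumerate("01"):
            if b in node:
                child_bits |= 1 << (2 * i + pos)
                queue.append(node[b])
        if "hop" in node:
            prefix_bits |= 1 << i
            hops.append(node["hop"])
        i += 1
    return child_bits, prefix_bits, hops


def rank(bitmap, p):
    """Number of 1 bits in positions [0, p)."""
    return bin(bitmap & ((1 << p) - 1)).count("1")


def lookup(trie, addr_bits):
    """Longest-prefix match: remember the last node on the path that has an entry."""
    child_bits, prefix_bits, hops = trie
    node, best = 0, None
    for depth in range(len(addr_bits) + 1):
        if prefix_bits >> node & 1:
            best = hops[rank(prefix_bits, node)]
        if depth == len(addr_bits):
            break
        p = 2 * node + int(addr_bits[depth])
        if not child_bits >> p & 1:
            break
        node = rank(child_bits, p + 1)  # children are numbered in level order
    return best
```

Suppose the routes cover the empty prefix, `1`, `10`, and `1011`. Then `lookup(trie, "10010000")` returns `"B"`, the next hop of the longest matching prefix `10`.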

An effective algorithm for checking subsumption relation on string data containing wildcard characters (Wildcard character를 포함하는 String Data 사이의 Subsumption 관계 확인을 위한 효율적인 알고리즘)

  • 김도한;박희진;백은옥
    • Proceedings of the Korean Information Science Society Conference / 2004.10a / pp.712-714 / 2004
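  • This paper considers a set of strings containing wildcard characters and identifies the subsumption relations among them, in order to obtain the set of strings that carry more specific information. To this end, we considered two methods. The first is a simple adaptation of the existing suffix tree algorithm so that it can handle strings containing wildcard characters; the second handles such strings using a set of tries.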
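
The trie-set approach can be sketched in Python, but only under an assumed definition of subsumption, since the abstract does not give one:

- All strings have equal length.
- A single-character wildcard `*` in the more general string matches any symbol at that position, including another `*`.
- A string is dropped from the result if it subsumes some other string in the set.

`insert`, `subsumed_by`, and `most_specific` are hypothetical names. The search branches over every child of a trie node only where the pattern has a wildcard.

```python
WILD = "*"  # assumed single-character wildcard symbol


def insert(trie, s):
    """Add s to a dict-based trie; the key None marks the end of a string."""
    node = trie
    for ch in s:
        node = node.setdefault(ch, {})
    node[None] = s


def subsumed_by(trie, pattern):
    """Stored strings (other than pattern itself) that pattern subsumes."""
    out, stack = [], [(trie, 0)]
    while stack:
        node, i = stack.pop()
        if i == len(pattern):
            if None in node and node[None] != pattern:
                out.append(node[None])
            continue
        ch = pattern[i]
        if ch == WILD:  # a wildcard matches every child symbol
            children = [v for k, v in node.items() if k is not None]
        else:           # a concrete symbol matches only itself
            children = [node[ch]] if ch in node else []
        stack.extend((c, i + 1) for c in children)
    return out


def most_specific(strings):
    """Keep only the strings that do not subsume any other string in the set."""
    unique = set(strings)
    trie = {}
    for s in unique:
        insert(trie, s)
    general = {s for s in unique if subsumed_by(trie, s)}
    return sorted(unique - general)
```

Consider `["AB*", "ABC", "A**", "XYZ", "ABD"]`. Both `AB*` and `A**` are more general than other members, so `most_specific` keeps `["ABC", "ABD", "XYZ"]`.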

A Study on the Implementation of Small Capacity Dictionary for Mobile Equipments Using a CBDS tree (CBDS 트리를 이용한 모바일 기기용 저용량 사전 구현에 관한 연구)

  • Jung Kyu-Cheol;Lee Jin-Hwan;Jang Hye-Suk;Park Ki-hong
    • Journal of the Korea Society of Computer and Information / v.10 no.5 s.37 / pp.33-40 / 2005
  • Many low-cost mobile devices have recently been produced, and they are used for study and business. Their weak points, however, are small storage capacity and relatively low processing speed, so general database programs or key-search algorithms can degrade system performance. To solve these problems, we applied the CBDS (Compact Binary Digital Search) trie to the mobile environment and achieved our goals of fast search and low-capacity indexing. For evaluation, we compared it with Java classes such as TreeSet. Search was slightly slower than with the tree-based TreeSet, but storage space decreased by 29 percent, so we believe the approach is of practical use.

A Parallel Match Method for Path-oriented Query Processing in XML Databases (XML 데이타베이스에서 경로-지향 질의처리를 위한 병렬 매치 방법)

  • Park Hee-Sook;Cho Woo-Hyun
    • Journal of KIISE:Databases / v.32 no.5 / pp.558-566 / 2005
  • XML is the new standard for data representation and exchange on the Internet. In this paper, we describe a new approach for evaluating path-oriented queries against XML documents. We propose the Parallel Match Indexing Fabric, which uses path signatures to speed up the evaluation of path-oriented queries. We also design a parallel match algorithm that matches the path signature of an input query against the path signatures of the elements stored in the database. To construct the parallel match index, we first build a binary trie over all path signatures of an XML document and then transform this trie into the Parallel Match Indexing Fabric. The Parallel Match Indexing Fabric and the parallel match algorithm are then used to execute the search operation of a path-oriented query. In the proposed approach, the time complexity of the algorithm is proportional to the logarithm of the number of path signatures in the XML document.
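
A binary trie over path signatures supports a simple pruned search, sketched below in Python. The signature scheme is an assumption: each tag sets two bits of a 16-bit word chosen by an MD5 hash, and a path signature ORs together its tags' bits. The code shows only a sequential subset search on the binary trie, not the Parallel Match Indexing Fabric or its parallel match algorithm.

Where the query signature has a 1 bit, only the 1-branch can lead to a match. Where it has a 0 bit, both branches must be explored. Because signatures are superimposed codes, the results are candidates that may include false positives and should be checked against the actual paths.

```python
import hashlib

BITS = 16  # assumed signature width in bits


def tag_signature(tag):
    """Set two pseudo-random bits per tag (superimposed coding)."""
    h = int(hashlib.md5(tag.encode()).hexdigest(), 16)
    return (1 << (h % BITS)) | (1 << ((h >> 8) % BITS))


def path_signature(path):
    """OR together the signatures of all tags on a path such as /lib/book/title."""
    sig = 0
    for tag in path.strip("/").split("/"):
        sig |= tag_signature(tag)
    return sig


def insert(trie, sig, elem):
    """Index an element under its signature bits, most significant bit first."""
    node = trie
    for d in range(BITS):
        node = node.setdefault(sig >> (BITS - 1 - d) & 1, {})
    node.setdefault("elems", []).append(elem)


def search(node, qsig, depth=0):
    """Collect elements whose signature covers every 1 bit of qsig."""
    if depth == BITS:
        return list(node.get("elems", []))
    want = qsig >> (BITS - 1 - depth) & 1
    out = []
    for b in ((1,) if want else (0, 1)):  # a 1 in the query forces the 1-branch
        if b in node:
            out += search(node[b], qsig, depth + 1)
    return out
```

For instance, after indexing `/lib/book/title`, `/lib/book/author`, and `/lib/magazine/title`, a search with the signature of `/book/title` is guaranteed to return `/lib/book/title`. It may also return other candidates.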

Region Query Reconstruction Method Using Trie-Structured Quad Tree in USN Middleware (USN 미들웨어에서 트라이 구조 쿼드 트리를 이용한 영역 질의 재구성 기법)

  • Cho, Sook-Kyoung;Jeong, Mi-Young;Jung, Hyun-Meen;Kim, Jong-Hoon
    • Journal of Korea Spatial Information System Society / v.10 no.1 / pp.15-28 / 2008
  • In ubiquitous sensor network (USN) environments, processing region queries is essential for user-demand services. The R-tree is a preferred technique for processing region queries in in-network query environments. The USN middleware must accurately select the sensors to which a region query is forwarded, because the lifetime of the sensors determines the lifetime of the whole sensor network. When an R-tree is used, however, a region query is forwarded blindly wherever an R-tree MBR (Minimum Bounding Rectangle) intersects the query region, including areas where no sensors exist. To solve this problem, we propose a region query reconstruction method that uses a trie-structured quad tree at the base station to accurately select the sensors within the query region. We observed that the proposed method has a longer response time than the R-tree, but it is useful for reducing communication cost and energy consumption.
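
The Python sketch below builds a trie-structured (PR) quadtree, in which each cell splits at its midpoint regardless of where the sensors lie. It answers a rectangular region query with both the sensors inside the region and the leaf cells that contain them. Those cells could serve as the reconstructed query forwarded into the network. The class and method names, the cell capacity of one sensor, and the depth limit are assumptions, and the forwarding step itself is not modeled.

```python
class QuadNode:
    """Trie-structured (PR) quadtree: cells split at their midpoints, independent of data."""

    def __init__(self, x0, y0, x1, y1, depth=0):
        self.box = (x0, y0, x1, y1)
        self.depth = depth
        self.sensors = []  # (id, x, y) tuples, stored only at leaves
        self.kids = None   # four children after a split

    def insert(self, sid, x, y, cap=1, max_depth=12):
        if self.kids is not None:
            self._child_for(x, y).insert(sid, x, y, cap, max_depth)
            return
        self.sensors.append((sid, x, y))
        if len(self.sensors) > cap and self.depth < max_depth:
            x0, y0, x1, y1 = self.box
            mx, my = (x0 + x1) / 2, (y0 + y1) / 2
            d = self.depth + 1
            self.kids = [QuadNode(x0, y0, mx, my, d), QuadNode(mx, y0, x1, my, d),
                         QuadNode(x0, my, mx, y1, d), QuadNode(mx, my, x1, y1, d)]
            old, self.sensors = self.sensors, []
            for s in old:  # push existing sensors down into the new quadrants
                self._child_for(s[1], s[2]).insert(*s, cap, max_depth)

    def _child_for(self, x, y):
        x0, y0, x1, y1 = self.box
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        return self.kids[(1 if x >= mx else 0) + (2 if y >= my else 0)]

    def region_query(self, qx0, qy0, qx1, qy1):
        """Return (ids of sensors inside the query, leaf cells that hold them)."""
        x0, y0, x1, y1 = self.box
        if qx1 < x0 or qx0 > x1 or qy1 < y0 or qy0 > y1:
            return [], []  # cell does not intersect the query rectangle
        if self.kids is None:
            hit = [sid for sid, x, y in self.sensors
                   if qx0 <= x <= qx1 and qy0 <= y <= qy1]
            return hit, ([self.box] if hit else [])
        ids, cells = [], []
        for kid in self.kids:
            i, c = kid.region_query(qx0, qy0, qx1, qy1)
            ids += i
            cells += c
        return ids, cells
```

Cells that intersect the query but hold no qualifying sensor are dropped from the reconstructed query. That dropping is the behavior the abstract contrasts with forwarding along every intersecting R-tree MBR.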

Two-dimensional Binary Search Tree for Packet Classification at Internet Routers (인터넷 라우터에서의 패킷 분류를 위한 2차원 이진 검색 트리)

  • Lee, Goeun;Lim, Hyesook
    • Journal of the Institute of Electronics and Information Engineers / v.52 no.6 / pp.21-31 / 2015
  • Internet users want real-time services for various multimedia applications. Network traffic has grown rapidly, and the amount of data that the Internet must carry has increased exponentially. A packet is the basic unit of data transfer on the Internet, and packet classification is one of the most challenging functions that routers must perform at wire speed. Among the known packet classification algorithms, the area-based quad-trie (AQT) algorithm is an efficient one that can look up five header fields simultaneously. As a representative space-decomposition algorithm, the AQT requires little memory to store classification rules, but it does not provide high-speed classification. In this paper, we propose a new packet classification algorithm that applies binary search to the codewords of the AQT to overcome this limitation. Simulation shows that the proposed algorithm requires fewer rule comparisons per input packet than the AQT.
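
The codeword idea can be illustrated with a simplified two-field Python sketch. The sketch makes these assumptions:

- A rule's source and destination prefixes are interleaved bit by bit into base-4 digits.
- The rule is indexed by the codeword whose length equals its shorter prefix, loosely modeled on AQT.
- The class name `AQTIndex`, the priority rule (lower index wins), and the restriction to two header fields are all hypothetical.

The sorted codeword list is probed with `bisect_left` for each prefix of the packet's codeword, and the candidate rules are then verified against their full prefixes. This is not the authors' algorithm.

```python
from bisect import bisect_left


def quad_code(src_bits, dst_bits, depth):
    """Interleave the first `depth` bits of two fields into base-4 digits."""
    return "".join(str(2 * int(s) + int(d))
                   for s, d in zip(src_bits[:depth], dst_bits[:depth]))


class AQTIndex:
    def __init__(self, rules):
        # rules: list of (name, src_prefix_bits, dst_prefix_bits), highest priority first
        self.rules = rules
        self.by_code = {}
        for rid, (_, sp, dp) in enumerate(rules):
            code = quad_code(sp, dp, min(len(sp), len(dp)))
            self.by_code.setdefault(code, []).append(rid)
        self.codes = sorted(self.by_code)  # sorted codeword list for binary search

    def classify(self, src, dst):
        """src and dst are equal-length bit strings; returns the best rule name."""
        pkt = quad_code(src, dst, len(src))
        best = None
        for length in range(len(pkt) + 1):
            prefix = pkt[:length]
            i = bisect_left(self.codes, prefix)
            if i < len(self.codes) and self.codes[i] == prefix:
                for rid in self.by_code[prefix]:
                    _, sp, dp = self.rules[rid]
                    if src.startswith(sp) and dst.startswith(dp):
                        if best is None or rid < best:
                            best = rid
        return None if best is None else self.rules[best][0]
```

With rules `("r0", "10", "0")`, `("r1", "1", "11")`, and a default `("r2", "", "")`, a packet with source `1010` and destination `0110` is classified as `r0`.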