• Title/Summary/Keyword: Search Tree

Search results: 636

Improvement of algorithm for calculating word count using character hash and binary search tree (문자 해시와 이원 탐색 트리를 이용한 어절 빈도 계산 알고리즘의 성능 개선)

  • Park, Il-Nam;Kang, Seung-Shik
    • Proceedings of the Korea Information Processing Society Conference / 2010.11a / pp.599-602 / 2010
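  • Internet search sites offer real-time search ranking services that rank the words users search for. Ranking these words requires a word frequency count that reveals how each word is distributed. Word frequencies can be computed with a BST (binary search tree), but because the words users search for vary widely in length and form, the BST grows deep during counting and the computation takes a long time. This paper proposes an algorithm that uses a character hash to speed up search in a deep BST. In a comparison of frequency counting speed, the method improved performance by 9.3% when the character hash range required 1 KB of additional memory, by 24.3% with 10 KB of additional hash space, and by 40.6% with 236 KB.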
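The idea in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the hash function (`char_hash`, based only on the first character), the bucket count `NUM_BUCKETS`, and all function names are assumptions made for the example. Each hash bucket holds the root of its own small BST, so every lookup walks a shallower tree than a single global BST would need.

```python
class Node:
    """One BST node holding a word and its frequency."""
    __slots__ = ("word", "count", "left", "right")

    def __init__(self, word):
        self.word = word
        self.count = 1
        self.left = None
        self.right = None


def bst_add(root, word):
    """Insert word below root, or bump its count; return the (possibly new) root."""
    if root is None:
        return Node(word)
    node = root
    while True:
        if word == node.word:
            node.count += 1
            return root
        branch = "left" if word < node.word else "right"
        child = getattr(node, branch)
        if child is None:
            setattr(node, branch, Node(word))
            return root
        node = child


def bst_items(root):
    """Yield (word, count) pairs in sorted order (iterative in-order walk)."""
    stack, node = [], root
    while stack or node:
        while node:
            stack.append(node)
            node = node.left
        node = stack.pop()
        yield node.word, node.count
        node = node.right


NUM_BUCKETS = 256  # hypothetical table size, not taken from the paper


def char_hash(word):
    """Hypothetical character hash: bucket chosen by the first character."""
    return ord(word[0]) % NUM_BUCKETS


def count_words(words):
    """Count word frequencies with one small BST per hash bucket."""
    buckets = [None] * NUM_BUCKETS
    for w in words:
        if not w:
            continue  # skip empty tokens
        h = char_hash(w)
        buckets[h] = bst_add(buckets[h], w)
    counts = {}
    for root in buckets:
        for word, c in bst_items(root):
            counts[word] = c
    return counts


counts = count_words("to be or not to be".split())
```

A larger hash range spreads words over more buckets and makes each tree shallower, at the cost of the extra table memory the abstract reports.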

A Simple Algorithm for Fast Codebook Search in Image Vector Quantization (벡터 양자화에서 벡터의 특성을 이용한 단축 탐색방법)

  • Koh, Jong-Seog;Kim, Jae-Kyoon;Kim, Seong-Dae
    • Proceedings of the KIEE Conference / 1987.07b / pp.1434-1437 / 1987
  • We present a very simple algorithm for reducing the encoding (codebook search) complexity of vector quantization (VQ) by exploiting features of the vector currently being encoded. With a vector dimension of 16 (=$4{\times}4$) and 256 codewords, the proposed VQ shows a slight performance degradation of about 0.1-0.9 dB while searching only 16 or 32 of the 256 codewords, i.e., with just 1/16 or 1/8 of the search complexity of a full-search VQ. The proposed scheme is also compared with tree-search VQ and shown to be slightly superior in SNR performance and memory requirement.
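The abstract does not say which vector features are used, so the sketch below assumes one plausible choice purely for illustration: the codebook is sorted by codeword mean, and only a window of codewords whose means are closest to the input vector's mean is searched. The names `build_index`, `windowed_search`, and `window` are hypothetical.

```python
import bisect


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(v):
    return sum(v) / len(v)


def build_index(codebook):
    """Sort codeword indices by mean (the assumed feature)."""
    order = sorted(range(len(codebook)), key=lambda i: mean(codebook[i]))
    means = [mean(codebook[i]) for i in order]
    return order, means


def full_search(codebook, v):
    """Baseline: compare v against every codeword."""
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i], v))


def windowed_search(codebook, index, v, window):
    """Search only `window` codewords whose means are nearest to mean(v)."""
    order, means = index
    pos = bisect.bisect_left(means, mean(v))
    lo = max(0, min(pos - window // 2, len(order) - window))
    hi = min(len(order), lo + window)
    return min(order[lo:hi], key=lambda i: sq_dist(codebook[i], v))


codebook = [[0, 0], [10, 10], [20, 20], [30, 30]]
index = build_index(codebook)
best_full = full_search(codebook, [11, 9])
best_fast = windowed_search(codebook, index, [11, 9], window=2)
```

With 256 codewords, a window of 16 or 32 would correspond to the 1/16 and 1/8 search complexities quoted in the abstract; the restricted search can miss the true nearest codeword, which is where the small dB loss would come from.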


Efficient Analysis of Korean Dependency Structures Using Beam Search Algorithms (Beam Search 알고리즘을 이용한 효율적인 한국어 의존 구조 분석)

  • Kim, Hark-Soo;Seo, Jung-Yun
    • Annual Conference on Human and Language Technology / 1998.10c / pp.281-286 / 1998
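  • Syntactic analysis is the stage of natural language processing that takes the output of morphological analysis as input and determines the relations between syntactic units. However, the results of syntactic analysis carry a great deal of ambiguity, and this ambiguity causes considerable complexity in later processing stages. Various studies have addressed this problem, one of which uses statistics extracted from large amounts of data. However, assigning statistics to every generated parse tree and ranking the trees is a very time-consuming job, so a method for effectively reducing the number of trees that can be generated is needed. This paper proposes an improved beam search algorithm to solve this problem and compares it with existing methods. A parser using the proposed beam search algorithm achieved the same syntactic structure accuracy while generating only about one third of the trees generated by a parser without beam search.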
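For readers unfamiliar with the technique, here is a generic beam search sketch. It is not the improved algorithm from the paper: it assumes each position offers a list of `(choice, log_score)` options (for example, candidate heads for a word), ignores tree well-formedness constraints, and simply keeps the `beam_width` best partial hypotheses at every step.

```python
import heapq


def beam_search(steps, beam_width):
    """Return up to beam_width (total_score, choices) hypotheses, best first.

    steps: one list per position, each containing (choice, log_score) pairs.
    """
    beam = [(0.0, [])]  # start with a single empty hypothesis
    for options in steps:
        # Extend every surviving hypothesis with every option at this position.
        expanded = [
            (score + s, choices + [c])
            for score, choices in beam
            for c, s in options
        ]
        # Prune: keep only the best beam_width hypotheses.
        beam = heapq.nlargest(beam_width, expanded, key=lambda h: h[0])
    return beam


# Toy input: two positions, two scored options each (hypothetical scores).
steps = [
    [("a", -0.1), ("b", -2.0)],
    [("x", -0.5), ("y", -0.2)],
]
hypotheses = beam_search(steps, beam_width=2)
```

Without pruning, the number of hypotheses grows as the product of the option counts; the beam caps it at `beam_width` per step, which is the kind of reduction in generated trees the abstract describes.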


RepWeb: A Web-Based Search Tool for Repeat-Related Literatures

  • Woo, Tae-Ha;Kim, Young-Uk;Kwon, Je-Keun;Seo, Jung-Min
    • Genomics & Informatics / v.5 no.2 / pp.88-91 / 2007
  • Repetitive sequences such as SINE, LINE, and LTR elements form a major part of eukaryotic genomes. A literature search tool that summarizes the information contained within repeat elements would provide biologists in the field of genomics with a useful tool for analyzing genomic sequence features. We developed a Java program designed to make literature access easier by using two search engines simultaneously. RepWeb is a web-based search system that provides a user-friendly interface for searching reference data and journals for information related to repeat elements by querying Google Scholar and PubMed simultaneously. Its interface displays repeat element-related biological information and includes useful functions such as production of a repeat tree, clickable links to PubMed and Google Scholar, exporting, and sorting by date, author, journal, and title.

A 0.5-2.0 GHz Dual-Loop SAR-controlled Duty-Cycle Corrector Using a Mixed Search Algorithm

  • Han, Sangwoo;Kim, Jongsun
    • JSTS:Journal of Semiconductor Technology and Science / v.13 no.2 / pp.152-156 / 2013
  • This paper presents a fast-lock dual-loop successive approximation register-controlled duty-cycle corrector (SARDCC) circuit using a mixed (binary+sequential) search algorithm. A wider duty-cycle correction range, higher operating frequency, and higher duty-cycle correction accuracy have been achieved by utilizing the dual-loop architecture and the binary search SAR that achieves the fast duty-cycle correcting property. By transforming the binary search SAR into a sequential search counter after the first DCC lock-in, the proposed dual-loop SARDCC keeps the closed-loop characteristic and tracks variations in process, voltage, and temperature (PVT). The measured duty-cycle error is less than ${\pm}0.86\%$ for a wide input duty-cycle range of 15-85% over a wide frequency range of 0.5-2.0 GHz. The proposed dual-loop SARDCC is fabricated in a 0.18-${\mu}m$, 1.8-V CMOS process and occupies an active area of $0.075\,\mathrm{mm}^2$.
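The mixed search can be modelled in software as below. This is a behavioural sketch only, under stated assumptions: the duty cycle is assumed to rise monotonically with a digital control code, and `make_comparator`, its linear duty model, and the numeric values are hypothetical rather than taken from the paper. A binary (SAR) search finds the initial lock in one pass over the bits, then a one-step-per-cycle sequential search follows slow drift.

```python
def sar_lock(overshoots, bits):
    """Binary search: find the largest code that does not overshoot the target."""
    code = 0
    for bit in reversed(range(bits)):  # MSB first
        trial = code | (1 << bit)
        if not overshoots(trial):
            code = trial  # keep the bit
    return code


def track_step(overshoots, code, bits):
    """Sequential search: move one LSB toward the target, clamped to range."""
    step = -1 if overshoots(code) else 1
    return max(0, min((1 << bits) - 1, code + step))


def make_comparator(offset, target=500):
    """Hypothetical duty model in per mille: duty = offset + 2 * code."""
    return lambda code: offset + 2 * code > target


BITS = 8
locked = sar_lock(make_comparator(offset=300), BITS)

# Simulate a PVT drift by changing the offset, then track sequentially.
drifted = make_comparator(offset=310)
code = locked
for _ in range(10):
    code = track_step(drifted, code, BITS)
```

The binary phase needs only `BITS` comparisons to lock, while the sequential phase never jumps far from the locked code, so the loop stays closed and dithers by one LSB around the target.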

A Study on the Implementation of Small Capacity Dictionary for Mobile Equipments Using a CBDS tree (CBDS 트리를 이용한 모바일 기기용 저용량 사전 구현에 관한 연구)

  • Jung Kyu-Cheol;Lee Jin-Hwan;Jang Hye-Suk;Park Ki-hong
    • Journal of the Korea Society of Computer and Information / v.10 no.5 s.37 / pp.33-40 / 2005
  • Many low-cost mobile devices have been produced recently and are being used for study and business. However, they have weaknesses: small storage capacity and fairly slow systems. Using general database programs or key-search algorithms on them can degrade system performance. To solve these problems, we applied a CBDS (Compact Binary Digital Search) trie to a mobile environment. As a result, we achieved our goals of fast searching and low-capacity indexing. For evaluation, we compared it with Java classes such as TreeSet. Searching was slightly slower than with the B-tree based TreeSet, but storage space decreased by 29 percent. We therefore think it would be of practical use.


A Study on Clustering and Identifying Gene Sequences using Suffix Tree Clustering Method and BLAST (서픽스트리 클러스터링 방법과 블라스트를 통합한 유전자 서열의 클러스터링과 기능검색에 관한 연구)

  • Han, Sang-Il;Lee, Sung-Gun;Kim, Kyung-Hoon;Lee, Ju-Yeong;Kim, Young-Han;Hwang, Kyu-Suk
    • Journal of Institute of Control, Robotics and Systems / v.11 no.10 / pp.851-856 / 2005
  • DNA and protein data from diverse species are discovered daily and deposited in public archives in their established formats. Database systems in the public archives provide not only an easy-to-use, flexible interface to the public but also in silico analysis tools for unidentified sequence data. Among these tools, multiple sequence alignment [1] methods that rely on pairwise alignment and the Smith-Waterman algorithm [2] enable us to identify unknown DNA and protein sequences or phylogenetic relations among several species. However, in the existing multiple alignment method, the runtime increases exponentially as the number of sequences increases. To remedy this problem, we adopted a parallel-processing suffix tree algorithm that can search for common subsequences at one time without pairwise alignment. Cross-matching subsequences, which trigger inexact matching among the searched common subsequences, might also be produced, so this paper suggests a cross-matching masking process. To identify the function of the clusters generated by suffix tree clustering, BLAST was combined with the clustering tool. Our clustering and annotating tool is summarized in the following steps: (1) construction of the suffix tree; (2) masking of cross-matching pairs; (3) clustering of gene sequences; and (4) annotating gene clusters by BLAST search. The system was successfully evaluated with 22 gene sequences in the pyruvate pathway of bacteria, producing 7 clusters and finding representative common subsequences of each cluster.
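Step (1) and the search for common subsequences can be illustrated with a deliberately naive sketch. It builds an uncompressed suffix trie (quadratic in sequence length) rather than a true suffix tree, runs serially rather than in parallel, and omits masking, clustering, and BLAST; the function names and the toy sequences are assumptions for the example. Each trie node records which sequences pass through it, so a substring shared by every sequence is found without any pairwise alignment.

```python
def build_suffix_trie(sequences):
    """Insert every suffix of every sequence; tag nodes with sequence ids."""
    root = {"children": {}, "ids": set()}
    for seq_id, seq in enumerate(sequences):
        for start in range(len(seq)):
            node = root
            for ch in seq[start:]:
                node = node["children"].setdefault(
                    ch, {"children": {}, "ids": set()}
                )
                node["ids"].add(seq_id)
    return root


def longest_common_substring(sequences):
    """Return a longest substring that occurs in all sequences."""
    root = build_suffix_trie(sequences)
    everyone = set(range(len(sequences)))
    best = ""
    stack = [(root, "")]
    while stack:
        node, prefix = stack.pop()
        for ch, child in node["children"].items():
            # Only descend while every sequence still shares this path.
            if child["ids"] == everyone:
                candidate = prefix + ch
                if len(candidate) > len(best):
                    best = candidate
                stack.append((child, candidate))
    return best


common = longest_common_substring(["GATTACA", "ATTAC", "CATTAG"])
```

A production tool would use a linear-time generalized suffix tree instead of this trie; the point here is only that one shared index answers the common-substring question for all sequences at once.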

Design and Implementation of an Extended Directory System for Management of the DFR Attributes (DFR 속성 관리를 위한 확장된 디렉토리 시스템의 설계 및 구현)

  • Im, Jae-Hong;Kim, Yeong-Jun
    • The Transactions of the Korea Information Processing Society / v.3 no.6 / pp.1542-1552 / 1996
  • This paper presents the design and implementation of an extended directory system that manages the attributes of DFR (Document Filing and Retrieval) objects and provides an effective search operation on them. To this end, it describes the design and implementation of a configuration model for interworking between the DFR and directory systems, an association mechanism between the two applications' operations, an extended directory schema, and an extended DIT (Directory Information Tree). The interworking between the DFR system and a directory system implemented with QUIPU 8.0 of the ISODE (ISO Development Environment) directory system is tested in a distributed environment. Based on these results, an extended model of the directory system is proposed for providing search operations on the objects of various applications.


The Vocabulary Recognition Optimize using Acoustic and Lexical Search (음향학적 및 언어적 탐색을 이용한 어휘 인식 최적화)

  • Ahn, Chan-Shik;Oh, Sang-Yeob
    • Journal of Korea Multimedia Society / v.13 no.4 / pp.496-503 / 2010
  • Speech recognition systems are developed as standalone systems, and when they are used on a mobile terminal they show a low recognition rate because of limited memory size and audio compression. This study proposes a system that improves vocabulary recognition performance by separating acoustic search from lexical search. Acoustic search is carried out on the mobile terminal, and lexical search is carried out on the server. Feature vectors are extracted from the speech signal and phonemes are recognized using a GMM; the recognized phoneme list is transmitted to the server, which performs lexical search using a lexical tree search algorithm. The system achieved a vocabulary-dependent recognition rate of 98.01%, a vocabulary-independent recognition rate of 97.71%, and a recognition speed of 1.58 seconds.

Feature-Based Image Retrieval using SOM-Based R*-Tree

  • Shin, Min-Hwa;Kwon, Chang-Hee;Bae, Sang-Hyun
    • Proceedings of the KAIS Fall Conference / 2003.11a / pp.223-230 / 2003
  • Feature-based similarity retrieval has become an important research issue in multimedia database systems. The features of multimedia data are useful for discriminating between multimedia objects (e.g., documents, images, video, music scores). For example, images are represented by their color histograms, texture vectors, and shape descriptors, which are usually high-dimensional data. The performance of conventional multidimensional data structures (e.g., the R-tree family, K-D-B tree, grid file, TV-tree) tends to deteriorate as the number of dimensions of the feature vectors increases. The R*-tree is the most successful variant of the R-tree. In this paper, we propose a SOM-based R*-tree as a new indexing method for high-dimensional feature vectors. The SOM-based R*-tree combines a SOM and an R*-tree to achieve search performance that scales better to high dimensionalities. Self-Organizing Maps (SOMs) provide a mapping from high-dimensional feature vectors onto a two-dimensional space. The mapping preserves the topology of the feature vectors. The map is called a topological feature map, and it preserves the mutual relationships (similarity) in the feature space of the input data, clustering mutually similar feature vectors in neighboring nodes. Each node of the topological feature map holds a codebook vector. A best-matching-image-list (BMIL) holds the similar images that are closest to each codebook vector. In a topological feature map, there are empty nodes in which no image is classified. When we build an R*-tree, we use the codebook vectors of the topological feature map with the empty nodes eliminated, because empty nodes cause unnecessary disk access and degrade retrieval performance. We experimentally compare the retrieval time cost of a SOM-based R*-tree with that of a SOM and an R*-tree using color feature vectors extracted from 40,000 images. The results show that the SOM-based R*-tree outperforms both the SOM and the R*-tree because it reduces the number of nodes required to build the R*-tree and the retrieval time cost.
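The BMIL construction and empty-node elimination described above can be sketched as follows. The sketch assumes the SOM has already been trained, so the codebook vectors are simply given; SOM training and the R*-tree itself are omitted, and the names `best_matching_unit` and `build_bmil`, along with the toy data, are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_matching_unit(codebook, v):
    """Return the grid position whose codebook vector is closest to v."""
    return min(codebook, key=lambda node: sq_dist(codebook[node], v))


def build_bmil(codebook, features):
    """Map each node to the images closest to it, dropping empty nodes."""
    bmil = {node: [] for node in codebook}
    for image_id, v in features.items():
        bmil[best_matching_unit(codebook, v)].append(image_id)
    # Empty nodes are eliminated so they never reach the index.
    return {node: ids for node, ids in bmil.items() if ids}


# A 2x2 map with assumed, already-trained codebook vectors.
codebook = {
    (0, 0): [0.0, 0.0],
    (0, 1): [1.0, 0.0],
    (1, 0): [0.0, 1.0],
    (1, 1): [1.0, 1.0],
}
features = {
    "img1": [0.1, 0.0],
    "img2": [0.9, 0.1],
    "img3": [0.2, 0.1],
}
bmil = build_bmil(codebook, features)
```

Only the codebook vectors of the surviving nodes would then be inserted into the R*-tree, so the tree indexes far fewer entries than one built over every image feature vector.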
