• Title/Summary/Keyword: String Matching

Search Result 101, Processing Time 0.028 seconds

Searching for Variants Using Trie-Index (트라이 인덱스를 이용한 이형태 검색)

  • Park, In-Cheol
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.10 no.8
    • /
    • pp.1986-1992
    • /
    • 2009
  • A user often searches a data by inputting a variant such as the abbreviation or substring of a word, or a misspelled word. The simple approach to the searching for variants is to build a variants dictionary. However, it entails enormous cost and time and can not handle variants by misspelling. Approximate searching, searching by approximate string matching, is a good approach to the searching. A problem in the approach is that it cannot handle variants by abbreviations. This paper propose a method for searching various variants including abbreviations and misspelled words, by using the trie indexing. First, this paper shows a variant matching method with the calculation of path weighted-metric. In addition, it provides variant searching algorithm to reduce the search time.

Large Scale Voice Dialling using Speaker Adaptation (화자 적응을 이용한 대용량 음성 다이얼링)

  • Kim, Weon-Goo
    • Journal of Institute of Control, Robotics and Systems
    • /
    • v.16 no.4
    • /
    • pp.335-338
    • /
    • 2010
  • A new method that improves the performance of large scale voice dialling system is presented using speaker adaptation. Since SI (Speaker Independent) based speech recognition system with phoneme HMM uses only the phoneme string of the input sentence, the storage space could be reduced greatly. However, the performance of the system is worse than that of the speaker dependent system due to the mismatch between the input utterance and the SI models. A new method that estimates the phonetic string and adaptation vectors iteratively is presented to reduce the mismatch between the training utterances and a set of SI models using speaker adaptation techniques. For speaker adaptation the stochastic matching methods are used to estimate the adaptation vectors. The experiments performed over actual telephone line shows that proposed method shows better performance as compared to the conventional method. with the SI phonetic recognizer.

The Consensus String Problem based on Radius is NP-complete (거리반경기반 대표문자열 문제의 NP-완전)

  • Na, Joong-Chae;Sim, Jeong-Seop
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.36 no.3
    • /
    • pp.135-139
    • /
    • 2009
  • The problems to compute the distances or similarities of multiple strings have been vigorously studied in such diverse fields as pattern matching, web searching, bioinformatics, computer security, etc. One well-known method to compare multiple strings in the given set is finding a consensus string which is a representative of the given set. There are two objective functions that are frequently used to find a consensus string, one is the radius and the other is the consensus error. The radius of a string x with respect to a set S of strings is the smallest number r such that the distance between the string x and each string in S is at most r. A consensus string based on radius is a string that minimizes the radius with respect to a given set. The consensus error of a string with respect to a given set S is the sum of the distances between x and all the strings in S. A consensus string of S based on consensus error is a string that minimizes the consensus error with respect to S. In this paper, we show that the problem of finding a consensus string based on radius is NP-complete when the distance function is a metric.

A Traffic Pattern Matching Hardware for a Contents Security System (콘텐츠 보안 시스템용 트래픽 패턴 매칭 하드웨어)

  • Choi, Young;Hong, Eun-Kyung;Kim, Tae-Wan;Paek, Seung-Tae;Choi, Il-Hoon;Oh, Hyeong-Cheol
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.46 no.1
    • /
    • pp.88-95
    • /
    • 2009
  • This paper presents a traffic pattern matching hardware that can be used in high performance network applications. The presented hardware is designed for a contents security system which is to block various kinds of information drain or intrusion activities. The hardware consists of two parts: the header lookup and string pattern matching parts. For implementing the header lookup part in hardware, the TCAMs(ternary CAMs) are popularly used. Since the TCAM approach is inefficient in terms of the hardware and memory costs and the power consumption, however, we adopt and modify an alternative approach based on the comparator arrays and the HiCuts tree. Our implementation results, using Xilinx FPGA XC4VSX55, show that our design can reduce the usage of the FPGA slices by about 26%, and the Block RAM by about 58%. In the design of string pattern matching part, we design and use a hashing module based on cellular automata, which is hardware efficient and consumes less power by adaptively changing its configuration to reduce the collision rates.

A String Reconstruction Algorithm and Its Application to Exponentiation Problems (문자열 재구성 알고리즘 및 멱승문제 응용)

  • Sim, Jeong-Seop;Lee, Mun-Kyu;Kim, Dong-Kyue
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.35 no.9_10
    • /
    • pp.476-484
    • /
    • 2008
  • Most string problems and their solutions are relevant to diverse applications such as pattern matching, data compression, recently bioinformatics, and so on. However, there have been few works on the relations between string problems and cryptographic problems. In this paper, we consider the following string reconstruction problems and show how these problems can be applied to cryptography. Given a string x of length n over a constant-sized alphabet ${\sum}$ and a set W of strings of lengths at most an integer $k({\leq}n)$, the first problem is to find the sequence of strings in W that reconstruct x by the minimum number of concatenations. We propose an O(kn+L)-time algorithm for this problem, where L is the sum of all lengths of strings in a given set, using suffix trees and a shortest path algorithm for directed acyclic graphs. The other is a dynamic version of the first problem and we propose an $O(k^3n+L)$-time algorithm. Finally, we show that exponentiation problems that arise in cryptography can be successfully reduced to these problems and propose a new solution for exponentiation.

Order preserving matching with k mismatches (k개의 오차를 허용하는 순위 패턴 매칭)

  • Lee, Inbok
    • Smart Media Journal
    • /
    • v.9 no.2
    • /
    • pp.33-38
    • /
    • 2020
  • Order preserving matching refers to the problem of reporting substrings of a given text where there exists order isomorphism with the pattern. In this paper, we propose a new algorithm based on filtering and evaluation. The proposed algorithm is simple and easy to implement, and runs in linear time on average. Experimental results show that it works efficiently with real world data.

The Ontology based Resource Discovery (온톨로지 기반의 정보검색)

  • 정은경;김영민;변영철;이상준
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2003.10a
    • /
    • pp.121-123
    • /
    • 2003
  • 본 연구에서는 현재 웹 환경에서의 단순한 string matching 검색에 대한 한계점을 해결하기 위해서 온톨로지 기반의 자원검색 방안을 논한다. 테스트 베드로서 제주도의 숙박, 관광정보에 따른 온톨로지를 DAML+OIL언어로 생성하고 Jena에서 지원하는 API를 이용하여 사용자가 원하는 정보검색을 수행할 수 있는 테스트 베드 구축 방안도 제시한다.

  • PDF

Wine Label Character Recognition in Mobile Phone Images using a Lexicon-Driven Post-Processing (사전기반 후처리를 이용한 모바일 폰 영상에서 와인 라벨 문자 인식)

  • Lim, Jun-Sik;Kim, Soo-Hyung;Lee, Chil-Woo;Lee, Guee-Sang;Yang, Hyung-Jung;Lee, Myung-Eun
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.16 no.5
    • /
    • pp.546-550
    • /
    • 2010
  • In this paper, we propose a method for the postprocessing of cursive script recognition in Wine Label Images. The proposed method mainly consists of three steps: combination matrix generation, character combination filtering, string matching. Firstly, the combination matrix generation step detects all possible combinations from a recognition result for each of the pieces. Secondly, the unnecessary information in the combination matrix is removed by comparing with bigram of word in the lexicon. Finally, string matching step decides the identity of result as a best matched word in the lexicon based on the levenshtein distance. An experimental result shows that the recognition accuracy is 85.8%.

A Novel Cryptosystem Based on Steganography and Automata Technique for Searchable Encryption

  • Truong, Nguyen Huy
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.14 no.5
    • /
    • pp.2258-2274
    • /
    • 2020
  • In this paper we first propose a new cryptosystem based on our data hiding scheme (2,9,8) introduced in 2019 with high security, where encrypting and hiding are done at once, the ciphertext does not depend on the input image size as existing hybrid techniques of cryptography and steganography. We then exploit our automata approach presented in 2019 to design two algorithms for exact and approximate pattern matching on secret data encrypted by our cryptosystem. Theoretical analyses remark that these algorithms both have O(n) time complexity in the worst case, where for the approximate algorithm, we assume that it uses ⌈(1-ε)m)⌉ processors, where ε, m and n are the error of our string similarity measure and lengths of the pattern and secret data, respectively. In searchable encryption, our cryptosystem is used by users and our pattern matching algorithms are performed by cloud providers.