• Title/Summary/Keyword: Approximate searching

Searching Sequential Patterns by Approximation Algorithm (근사 알고리즘을 이용한 순차패턴 탐색)

  • Sarlsarbold, Garawagchaa;Hwang, Young-Sup
    • Journal of the Korea Society of Computer and Information / v.14 no.5 / pp.29-36 / 2009
  • Sequential pattern mining, which discovers frequent subsequences as patterns in a sequence database, is an important data mining problem with broad applications. Because a sequential pattern in DNA sequences can be a motif, we studied how to find sequential patterns in DNA sequences. Most previously proposed mining algorithms follow a sequential pattern definition based on exact matching, so in practice they cannot cope with noisy environments and inaccurate data. These problems occur frequently in DNA sequences, which are biological data. We investigated an approximate matching method to deal with such cases. Our idea is based on the observation that all occurrences of a frequent pattern can be classified into groups, which we call approximated patterns. The existing PrefixSpan algorithm can successfully find sequential patterns in a long sequence; we improved it to find approximate sequential patterns. The experimental results showed that the number of repeats from the proposed method was 5 times more than that of PrefixSpan when the pattern length was 4.
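
As a rough illustration of why approximate matching reports more repeats than exact matching, the sketch below counts the windows of a DNA string that lie within a given number of mismatches of a pattern. It is not the modified PrefixSpan algorithm from the paper: the function names, the Hamming-distance criterion, the restriction to contiguous occurrences, and the sample sequence are all assumptions made for this example.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def approximate_support(sequence: str, pattern: str, max_mismatches: int) -> int:
    """Count windows of `sequence` within `max_mismatches` of `pattern`."""
    m = len(pattern)
    return sum(
        hamming(sequence[i:i + m], pattern) <= max_mismatches
        for i in range(len(sequence) - m + 1)
    )


# Hypothetical sample data, chosen only for illustration.
dna = "ACGTACGAACGTTCGT"
exact = approximate_support(dna, "ACGT", 0)   # exact occurrences only
approx = approximate_support(dna, "ACGT", 1)  # also allows one mismatch
```

Allowing one mismatch groups near-identical occurrences such as ACGA and TCGT with ACGT, so the approximate count can never be smaller than the exact one.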

A Practical Approximate Sub-Sequence Search Method for DNA Sequence Databases (DNA 시퀀스 데이타베이스를 위한 실용적인 유사 서브 시퀀스 검색 기법)

  • Won, Jung-Im;Hong, Sang-Kyoon;Yoon, Jee-Hee;Park, Sang-Hyun;Kim, Sang-Wook
    • Journal of KIISE:Databases / v.34 no.2 / pp.119-132 / 2007
  • In molecular biology, approximate subsequence search is one of the most important operations. In this paper, we propose an accurate and efficient method for approximate subsequence search in large DNA databases. The proposed method adopts a binary trie as its primary structure and stores all the window subsequences extracted from a DNA sequence. For approximate subsequence search, it traverses the binary trie in a breadth-first fashion and retrieves all the matched subsequences from the traversed path within the trie by a dynamic programming technique. However, the proposed method stores only window subsequences of a pre-determined length, and thus suffers from large post-processing time in the case of long query sequences. To overcome this problem, we divide a query sequence into shorter pieces, search for those pieces, and then merge their results. To verify the superiority of the proposed method, we conducted a performance evaluation via a series of experiments. The results reveal that the proposed method, which requires smaller storage space, achieves a 4- to 17-fold improvement in performance over the suffix-tree-based method. Even when the query sequence is long, our method is more than an order of magnitude faster than the suffix-tree-based method and the Smith-Waterman algorithm.
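
The dynamic programming step mentioned in the abstract can be illustrated with the classic approximate substring recurrence (often attributed to Sellers), applied here to a plain string rather than to the paths of a binary trie. The function name, the unit edit costs, and the sample strings are assumptions for this sketch, not details taken from the paper.

```python
def approximate_matches(text: str, query: str, max_dist: int) -> list:
    """Return (end_index, distance) for each position in `text` where some
    substring ending there is within `max_dist` edits of `query`."""
    m = len(query)
    # prev[i]: best edit distance between query[:i] and a substring of the
    # text ending just before the current character.
    prev = list(range(m + 1))
    hits = []
    for j, ch in enumerate(text):
        cur = [0] * (m + 1)  # cur[0] stays 0: a match may start at any position
        for i in range(1, m + 1):
            cost = 0 if query[i - 1] == ch else 1
            cur[i] = min(
                prev[i - 1] + cost,  # match or substitution
                prev[i] + 1,         # extra character in the text
                cur[i - 1] + 1,      # query character left unmatched
            )
        if cur[m] <= max_dist:
            hits.append((j, cur[m]))
        prev = cur
    return hits


# Hypothetical sample data.
exact_hits = approximate_matches("GATTACA", "TAC", 0)
near_hits = approximate_matches("GATTACA", "TAC", 1)
```

Only one column of the table is kept at a time, so memory use grows with the query length rather than with the text length.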

Optimal Network Design with Hooke-and-Jeeves Algorithm (Hooke-and-Jeeves 기법에 의한 최적가로망설계)

  • 장현봉;박창호
    • Journal of Korean Society of Transportation / v.6 no.1 / pp.5-16 / 1988
  • This paper develops an optimal network design method using continuous design variables. A modified Hooke-and-Jeeves algorithm is implemented to solve a nonlinear programming problem that is approximately equivalent to the real network design problem, with system efficiency criteria and improvement cost as the objective function. The method was tested for various forms of initial solution and for various initial step sizes of link improvements. At each search point where the objective function is evaluated, a link flow problem was solved under the user equilibrium principle using the Frank-Wolfe algorithm. The results obtained are quite promising in terms of the number of evaluations and the speed of convergence. Suggestions are given for selecting an efficient initial solution, initial step size, and convergence criteria. An approximate method is also suggested for reducing computation time.
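
For readers unfamiliar with Hooke-and-Jeeves pattern search, the following is a minimal sketch of a simplified textbook variant, not the modified algorithm used in the paper. The toy quadratic `toy_cost` stands in for the real objective, which in the paper requires solving a user-equilibrium assignment at every evaluation; all names and default parameters here are assumptions.

```python
def explore(f, base, f_base, step):
    """Exploratory move: try +step, then -step on each coordinate in turn,
    keeping any change that lowers the objective."""
    x, fx = list(base), f_base
    for i in range(len(x)):
        for delta in (step, -step):
            trial = x.copy()
            trial[i] += delta
            f_trial = f(trial)
            if f_trial < fx:
                x, fx = trial, f_trial
                break
    return x, fx


def hooke_jeeves(f, x0, step=1.0, shrink=0.5, tol=1e-6, max_iter=10_000):
    """Derivative-free minimization by exploratory and pattern moves."""
    base = list(x0)
    f_base = f(base)
    for _ in range(max_iter):
        if step < tol:
            break
        x, fx = explore(f, base, f_base, step)
        if fx < f_base:
            # Pattern move: jump further along the direction that just helped,
            # then explore around the landing point.
            pattern = [2 * xi - bi for xi, bi in zip(x, base)]
            base, f_base = x, fx
            px, f_px = explore(f, pattern, f(pattern), step)
            if f_px < f_base:
                base, f_base = px, f_px
        else:
            step *= shrink  # no improvement at this scale: refine the step
    return base, f_base


def toy_cost(x):
    """Hypothetical stand-in objective with its minimum at (1, -2)."""
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2


best_x, best_f = hooke_jeeves(toy_cost, [0.0, 0.0])
```

Because every objective evaluation is expensive in the network design setting, the initial point and initial step size strongly affect the number of evaluations, which is consistent with the abstract's focus on those choices.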

A Faster Algorithm for Target Search (근사적 확률을 이용한 표적 탐색)

  • Jeong, Seong-Jin;Hong, Seong-Pil;Jo, Seong-Jin;Park, Myeong-Ju
    • Proceedings of the Korean Operations and Management Science Society Conference / 2006.11a / pp.57-59 / 2006
  • The purpose of a search problem is to maximize the probability of detecting a target under limited search capability. In particular, for a moving target such as a wrecked ship or a submarine, the detection rate obtained by searching an expected location decreases as time elapses from the moment the initial information is received. An algorithm for searching for a moving target with such properties should therefore produce a search route as quickly as possible. Existing studies are of limited practical use because the computation time they require increases with problem size (i.e., the number of search areas and the search time). In this study, we show that, by using an approximate computation of probability, our approach takes more reasonable computation time than preceding studies even when the problem size is extended to a practical scale.

Topology Optimization Considering Reliability (신뢰성을 고려한 위상최적설계)

  • Min, Seung-Jae;Bang, Seung-Hyun
    • Proceedings of the KSME Conference / 2004.04a / pp.468-473 / 2004
  • A new reliability-based topology optimization method is proposed that uses the single-loop single-vector approach, which approximates the search for the most probable point in the probabilistic design domain analytically, to reduce computation time and to deal with several constraints so that practical design requirements can be handled. To examine uncertainties in the topology design of a structure, the modulus of elasticity of the material and the applied loadings are considered as probabilistic design variables. The results of design examples show that the proposed method is efficient, curtailing the time for the optimization process, and accurate, satisfying the specified reliability.

Flaw Detection in Ceramics using Hough transform and Least squares

  • Hong, Dong-Jin;Cha, Eui-Young
    • Journal of the Korea Society of Computer and Information / v.20 no.10 / pp.23-29 / 2015
  • In this paper, we suggest a method of detecting defects by applying the Hough transform and least squares to ceramic images obtained from non-destructive testing. In such images, the background area, where no defect exists, commonly shows a gradual change of luminosity in the vertical direction. To extract the background area used in defect detection, the Hough transform is performed to rotate the ceramic image so that the direction of overall luminosity change lies as close to the vertical direction as possible. Least squares is then applied to the rotated image to approximate the contrast value of the background area. The extracted background area is used for extracting defects from the ceramic images. We applied this method to ceramic images acquired from non-destructive testing and confirmed that the extracted background area could be effectively used to search for the section where a defect exists and to detect the defect.

Design of Genetic Algorithm-based Parking System for an Autonomous Vehicle

  • Xiong, Xing;Choi, Byung-Jae
    • International Journal of Fuzzy Logic and Intelligent Systems / v.9 no.4 / pp.275-280 / 2009
  • A Genetic Algorithm (GA) is a search technique used to find exact or approximate solutions to optimization and search problems. This paper discusses the design of a genetic algorithm-based intelligent parking system, a search strategy based on the model of evolution that is applied to the parking problem. The genetic algorithm is used to find a series of optimal angles for the moving vehicle so that it can enter a parking space autonomously. This algorithm makes the planning simpler and the movement more effective. Finally, we present some simulation results.

An Empirical Study of Base Pivot Choosing Method for Approximate Word Searching (근사 단어 검색 효율성 개선을 위한 기준 Pivot 선택방법 실험적 연구)

  • Yoon, Tai-Jin;Chung, Woo-Keun;Cho, Hwan-Gue
    • Proceedings of the Korean Information Science Society Conference / 2010.06c / pp.271-274 / 2010
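  • A Hangul approximate word search system can respond effectively to search queries that contain user errors, but it is very slow, which makes practical use difficult. When the sequence alignment techniques commonly used for DNA search are applied, the query must be compared against every string in the database, so searching takes a long time. To address this, we use a Hangul approximate word search system that exploits the fact that edit distance satisfies the properties of a metric space, and thereby filter the candidate words that need to be compared by actual sequence alignment. The most important element of this system is the method for choosing the Base-Pivots, which serve as reference axes. In this paper, we analyze efficient Base-Pivot selection methods through experiments.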
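
The filtering idea rests on the triangle inequality: for any pivot p, |d(q, p) - d(w, p)| <= d(q, w), so a word whose pivot distances differ from the query's by more than the search radius cannot match and need not be aligned. The sketch below is a minimal illustration under assumed details; the function names, the English sample words, and the pivot choice are hypothetical, and the paper's system works on Hangul with its own pivot selection.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete from a
                           cur[j - 1] + 1,             # insert into a
                           prev[j - 1] + (ca != cb)))  # match or substitute
        prev = cur
    return prev[-1]


def build_index(words, pivots):
    """Precompute each word's coordinates: its distance to every pivot."""
    return {w: [edit_distance(w, p) for p in pivots] for w in words}


def search(query, index, pivots, radius):
    """Return indexed words within `radius` edits of `query`."""
    q = [edit_distance(query, p) for p in pivots]
    results = []
    for word, coords in index.items():
        # Triangle-inequality filter: skip words that provably cannot match.
        if any(abs(qc - wc) > radius for qc, wc in zip(q, coords)):
            continue
        if edit_distance(query, word) <= radius:  # verify surviving candidates
            results.append(word)
    return results


# Hypothetical sample data.
words = ["search", "starch", "march", "pivot", "metric"]
pivots = ["search", "pivot"]
index = build_index(words, pivots)
found = search("serch", index, pivots, radius=1)
```

The filter never discards a true match, so the result equals that of a full scan; a better pivot choice only changes how many candidates survive to the verification step.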

Korean Approximate String Searching System by Hierarchical Metric Space Structure (계층적 메트릭 공간(metric space) 구조의 한글 근사 단어 검색 시스템)

  • Yoon, Taijin;Cho, Hwan-Gue
    • Proceedings of the Korea Information Processing Society Conference / 2010.04a / pp.397-400 / 2010
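  • In our previous work, we applied an approximate string search system to a filtering system for disguised profanity and were able to reduce the number of sequence alignments dramatically. A Hangul approximate search system built on a multi-dimensional data structure can raise the accuracy of its search results by increasing the number of Base-Pivots (BPs), the reference axes, but the time needed to compute the coordinates of a query word grows with the number of BPs. This is not a problem when searching small data sets, but for large data sets such as a Korean dictionary containing more than 60,000 words, the number of BPs required also increases and a great deal of computation time is needed. In this paper, we propose a method that improves performance by organizing the existing approximate word search system hierarchically, which reduces the number of BPs required. We verified the effectiveness of this idea through experiments: it showed a performance improvement of about 20% on the existing set of 6,000 profane words.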

A DNA Index Structure using Frequency and Position Information of Genetic Alphabet (염기문자의 빈도와 위치정보를 이용한 DNA 인덱스구조)

  • Kim Woo-Cheol;Park Sang-Hyun;Won Jung-Im;Kim Sang-Wook;Yoon Jee-Hee
    • Journal of KIISE:Databases / v.32 no.3 / pp.263-275 / 2005
  • In a large DNA database, indexing techniques are widely used for rapid approximate sequence searching. However, most indexing techniques require more space than the original database and are difficult to integrate seamlessly with a DBMS. In this paper, we suggest a space-efficient, disk-based indexing and query processing algorithm for approximate DNA sequence searching, specifically exact match queries, wildcard match queries, and k-mismatch queries. Our indexing method places a sliding window at every possible location of a DNA sequence and extracts its signature by considering the occurrence frequency of each nucleotide. It then stores the set of signatures in a multi-dimensional index, such as an R*-tree. In particular, by assigning a weight to each position of a window, it prevents signatures from being concentrated around a few spots in the index space. Our query processing algorithm converts a query sequence into a multi-dimensional rectangle and searches the index for the signatures that overlap the rectangle. Experiments with real biological data sets revealed that the proposed method is at least three times, twice, and several orders of magnitude faster than the suffix-tree-based method for exact match, wildcard match, and k-mismatch queries, respectively.
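
The signature-and-rectangle idea can be sketched in a simplified form. The version below uses unweighted nucleotide counts and a linear scan in place of the position weights and R*-tree described in the abstract, and it assumes the query has the same length as the window. All names and the sample sequence are hypothetical. The filter relies on a simple fact: each mismatch changes any one nucleotide count by at most 1, so with at most k mismatches every count lies within k of the query's count.

```python
from collections import Counter

ALPHABET = "ACGT"


def signature(window: str) -> tuple:
    """Occurrence count of each nucleotide, in ALPHABET order."""
    counts = Counter(window)
    return tuple(counts[c] for c in ALPHABET)


def build_index(sequence: str, w: int) -> list:
    """Signature of every length-w window, paired with its start position."""
    return [(i, signature(sequence[i:i + w]))
            for i in range(len(sequence) - w + 1)]


def k_mismatch_search(sequence: str, index: list, query: str, k: int) -> list:
    """Start positions of windows within k mismatches of `query`."""
    q = signature(query)
    lo = [c - k for c in q]  # lower corner of the query rectangle
    hi = [c + k for c in q]  # upper corner of the query rectangle
    hits = []
    for pos, sig in index:
        # Filter step: keep only signatures inside the rectangle.
        if all(l <= s <= h for l, s, h in zip(lo, sig, hi)):
            window = sequence[pos:pos + len(query)]
            # Verification step: count mismatches on the actual window.
            if sum(a != b for a, b in zip(window, query)) <= k:
                hits.append(pos)
    return hits


# Hypothetical sample data.
dna = "ACGTACGAACGTTCGT"
idx = build_index(dna, 4)
exact_positions = k_mismatch_search(dna, idx, "ACGT", 0)
near_positions = k_mismatch_search(dna, idx, "ACGT", 1)
```

Plain counts ignore nucleotide order, so different windows with the same composition share a signature and are separated only at verification; this clustering of signatures is the effect the abstract says the position weights are meant to reduce.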