Search | Korea Science

Order preserving matching with k mismatches (k개의 오차를 허용하는 순위 패턴 매칭)

Lee, Inbok
- Smart Media Journal / v.9 no.2 / pp.33-38 / 2020
Order-preserving matching is the problem of reporting the substrings of a given text that are order-isomorphic to the pattern. In this paper, we propose a new algorithm based on filtering and evaluation. The proposed algorithm is simple and easy to implement, and it runs in linear time on average. Experimental results show that it works efficiently on real-world data.
https://doi.org/10.30693/SMJ.2020.9.2.33
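
The notion of order isomorphism in the abstract above can be illustrated with a naive checker. This is only a sketch of the exact (zero-mismatch) problem, not the filtering-and-evaluation algorithm of the paper, whose details the abstract does not give. The function names `dense_ranks` and `order_preserving_matches` are hypothetical, and the quadratic-or-worse running time is far from the linear average time reported.

```python
def dense_ranks(values):
    """Map each value to its rank among the distinct values (ties share a rank)."""
    position = {v: r for r, v in enumerate(sorted(set(values)))}
    return [position[v] for v in values]


def order_preserving_matches(text, pattern):
    """Return start indices of windows of `text` order-isomorphic to `pattern`.

    Two sequences are order-isomorphic when their elements have the same
    relative order, which holds exactly when their dense rank vectors agree.
    Naive approach: re-rank every window from scratch.
    """
    m = len(pattern)
    target = dense_ranks(pattern)
    return [
        i
        for i in range(len(text) - m + 1)
        if dense_ranks(text[i:i + m]) == target
    ]


if __name__ == "__main__":
    # The pattern shape "low, high, middle" occurs at offsets 0 and 2.
    print(order_preserving_matches([10, 20, 15, 30, 25, 40], [1, 3, 2]))
```

Comparing rank vectors rather than raw values is what lets a pattern such as `[1, 3, 2]` match `[10, 20, 15]`: only the relative order matters, not the magnitudes.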

Parallel Computation For The Edit Distance Based On The Four-Russians' Algorithm (4-러시안 알고리즘 기반의 편집거리 병렬계산)

Kim, Young Ho;Jeong, Ju-Hui;Kang, Dae Woong;Sim, Jeong Seop
- KIPS Transactions on Computer and Communication Systems / v.2 no.2 / pp.67-74 / 2013
Approximate string matching problems have been studied in diverse fields. Recently, fast approximate string matching algorithms have been used to reduce the time and cost of next-generation sequencing. To measure the amount of error between two strings, we use a distance function such as the edit distance. Given two strings X (|X| = m) and Y (|Y| = n) over an alphabet $\Sigma$, the edit distance between X and Y is the minimum number of edit operations needed to convert X into Y. The edit distance can be computed using the well-known dynamic programming technique in O(mn) time and space. It can also be computed using the Four-Russians' algorithm, whose preprocessing step runs in $O((3|\Sigma|)^{2t} t^2)$ time and $O((3|\Sigma|)^{2t} t)$ space and whose computation step runs in O(mn/t) time and O(mn) space, where t is the block size. In this paper, we present a parallelized version of the computation step of the Four-Russians' algorithm. Our algorithm computes the edit distance between X and Y in O(m+n) time using m/t threads. We implemented both the sequential version and our parallelized version of the Four-Russians' algorithm using CUDA to compare their execution times. When t = 1 and t = 2, our algorithm runs about 10 times and 3 times faster than the sequential algorithm, respectively.
https://doi.org/10.3745/KTCCS.2013.2.2.067
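
The O(mn) dynamic programming baseline mentioned in the abstract above can be sketched as follows. The sketch assumes the usual unit-cost operations (insertion, deletion, substitution), which the abstract does not spell out, and the function name `edit_distance` is hypothetical. It does not implement the Four-Russians' block preprocessing or the parallel computation step.

```python
def edit_distance(x, y):
    """Unit-cost edit distance via the full (m+1) x (n+1) table: O(mn) time and space."""
    m, n = len(x), len(y)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # delete all i characters of x[:i]
    for j in range(n + 1):
        d[0][j] = j  # insert all j characters of y[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if x[i - 1] == y[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
    return d[m][n]


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))
```

The Four-Russians' technique described in the abstract speeds up this table by precomputing the results for all possible t-by-t blocks, so that the main loop advances one block rather than one cell at a time.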

Fast, Flexible Text Search Using Genomic Short-Read Mapping Model

Kim, Sung-Hwan;Cho, Hwan-Gue
- ETRI Journal / v.38 no.3 / pp.518-528 / 2016
Searching an extensive document database for documents that are locally similar to a given query document, and then detecting the similar regions between such documents, is considered an essential task in information retrieval and data management. In this paper, we present a framework for this task. The proposed framework employs short-read mapping, a method used in bioinformatics to reveal similarities between genomic sequences. Documents are treated as biological objects; consequently, edit operations between locally similar documents are viewed as an evolutionary process. Accordingly, we are able to apply evolution tracing to detect similar regions between documents. In addition, we propose heuristic methods to address issues that arise at different stages of the framework, for example, a frequency-based fragment ordering method and a locality-aware interval aggregation method. Extensive experiments covering various search scenarios indicate that the proposed framework outperforms existing methods.
https://doi.org/10.4218/etrij.16.0115.0594

Parallel Computation for Extended Edit Distances Using the Shared Memory on GPU (GPU의 공유메모리를 활용한 확장편집거리 병렬계산)

Kim, Youngho;Na, Joong Chae;Sim, Jeong Seop
- KIPS Transactions on Computer and Communication Systems / v.4 no.7 / pp.213-218 / 2015
Given two strings X and Y (|X| = m, |Y| = n) over an alphabet $\Sigma$, the extended edit distance between X and Y can be computed using dynamic programming in O(mn) time and space. Recently, a parallel algorithm was presented that computes the extended edit distance between X and Y in O(m+n) time and O(mn) space using m threads. In this paper, we present an improved parallel algorithm that uses the shared memory on a GPU. The experimental results show that our parallel algorithm runs about 19 to 25 times faster than the previous parallel algorithm.
https://doi.org/10.3745/KTCCS.2015.4.7.213

Ontology Alignment based on Parse Tree Kernel using Structural and Semantic Information (구조 및 의미 정보를 활용한 파스 트리 커널 기반의 온톨로지 정렬 방법)

Son, Jeong-Woo;Park, Seong-Bae
- Journal of KIISE:Software and Applications / v.36 no.4 / pp.329-334 / 2009
Ontology alignment has two major problems. First, the features used for ontology alignment are usually defined by experts, so some critical features may well be excluded from the feature set. Second, the semantic and structural similarities are usually computed independently and then combined in an ad hoc way, with weights determined heuristically. This paper proposes the modified parse tree kernel (MPTK) for ontology alignment. To compute the similarity between entities in the ontologies, a tree is adopted as the representation of an ontology. After an ontology is transformed into a set of trees, their similarity is computed using MPTK without explicit enumeration of features. In computing the similarity between trees, approximate string matching is adopted so that the similarity naturally reflects not only the structural information but also the semantic information. According to a series of experiments with a standard data set, the kernel method outperforms other structural similarities such as GMO. In addition, the proposed method shows state-of-the-art performance in ontology alignment.


