• Title/Summary/Keyword: Suffix Structure Analysis

Suffix Tree Constructing Algorithm for Large DNA Sequences Analysis (대용량 DNA서열 처리를 위한 서픽스 트리 생성 알고리즘의 개발)

  • Choi, Hae-Won
    • Journal of Korea Society of Industrial Information Systems / v.15 no.1 / pp.37-46 / 2010
  • A suffix tree is an efficient data structure that exposes the internal structure of a string and allows efficient solutions to a wide range of complex string problems, particularly in computational biology. However, as the volume of biological data explodes, it becomes infeasible to construct suffix trees in main memory, so an efficient technique is needed to construct the trees in secondary storage. In this paper, we present a method for constructing a suffix tree on disk for a large set of DNA strings using a new index scheme. We also show a typical application example that uses the suffix tree on disk.
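
To make the data structure named in this abstract concrete, the following is a minimal sketch of a naive, in-memory suffix trie used for substring lookup. It is not the paper's disk-based method or index scheme; the names `Node`, `build_suffix_trie`, and `find_occurrences` are hypothetical, and the quadratic memory use of this naive version illustrates why main-memory construction breaks down for large DNA sets.

```python
class Node:
    """One trie node; `starts` holds the start index of every suffix passing through it."""

    def __init__(self):
        self.children = {}
        self.starts = []


def build_suffix_trie(text):
    """Insert every suffix of `text` into a trie (O(n^2) time and space, illustration only)."""
    root = Node()
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            if ch not in node.children:
                node.children[ch] = Node()
            node = node.children[ch]
            # Any pattern ending at this node occurs in `text` beginning at `start`.
            node.starts.append(start)
    return root


def find_occurrences(root, pattern):
    """Return the sorted start positions of `pattern`, or [] if it is absent or empty."""
    node = root
    for ch in pattern:
        node = node.children.get(ch)
        if node is None:
            return []
    return sorted(node.starts)


trie = build_suffix_trie("GATTACA")
print(find_occurrences(trie, "A"))
print(find_occurrences(trie, "TA"))
```

A lookup walks at most `len(pattern)` edges, independent of the text length, which is the property that makes suffix structures attractive for sequence analysis.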

Comparison Architecture for Large Number of Genomic Sequences

  • Choi, Hae-won;Ryoo, Myung-Chun;Park, Joon-Ho
    • Journal of Information Technology and Architecture / v.9 no.1 / pp.11-19 / 2012
  • A suffix tree is an efficient data structure because it reveals the detailed internal structure of the given sequences in linear time. However, it is difficult to build a suffix tree for a large number of sequences because of memory size constraints. Therefore, to compare multi-megabase genomic sequence sets using suffix trees, the suffix tree algorithms need to be reworked. We introduce a new method for constructing a suffix tree of a large number of sequences on secondary storage. Our algorithm divides three files, in a designated sequence, into parts, storing references to the locations of edges in hash tables. In our experiments, we used 1,300,000 EST sequences (around 300 MB) to generate a suffix tree on disk.

Korean Semantic Role Labeling Based on Suffix Structure Analysis and Machine Learning (접사 구조 분석과 기계 학습에 기반한 한국어 의미 역 결정)

  • Seok, Miran;Kim, Yu-Seop
    • KIPS Transactions on Software and Data Engineering / v.5 no.11 / pp.555-562 / 2016
  • Semantic Role Labeling (SRL) determines the semantic relation between a predicate and its arguments in a sentence. Korean semantic role labeling has faced difficulty because the structure of the language differs from that of English, which makes it hard to apply the approaches developed so far. As a result, previously proposed methods have not achieved satisfactory performance compared to English and Chinese. To address these problems, we focus on suffix information analysis, such as josa (case suffix) and eomi (verbal ending) analysis. Korean, like Japanese, is an agglutinative language with a well-defined suffix structure in its words. Agglutinative languages can have free word order because of this developed suffix structure. Arguments with a single morpheme are then labeled with statistics. In addition, machine learning algorithms such as Support Vector Machine (SVM) and Conditional Random Fields (CRF) are used to model the SRL problem for arguments that are not labeled in the suffix analysis phase. The proposed method is intended to reduce the range of argument instances to which machine learning approaches, which can produce uncertain and inaccurate role labels, must be applied. In experiments, we use 15,224 arguments and obtain an F1-score of approximately 83.24%, an improvement of about 4.85 percentage points over the state-of-the-art Korean SRL research.

An Efficient Suffix Tree Reconstructing Algorithm for Biological Sequence Analysis (DNA 분석에 효율적인 서픽스 트리 재구성 알고리즘)

  • Choi, Hae-Won;Jung, Young-Seok;Kim, Sang-Jin
    • Journal of Digital Convergence / v.12 no.12 / pp.265-275 / 2014
  • This paper introduces a new algorithm for reconstructing the suffix tree of a character string when a substring is deleted from the string or a string is inserted into it as a substring. The algorithm has two main functions, delete-structure and insert-structure. Its main objective is to save the time needed to construct the suffix tree of an edited string when the suffix tree of the original string is available. We tested the performance of this algorithm with some DNA sequences. This test shows that delete-reconstructing can save time when the length of the deleted subsequence is less than 30% of the original sequence, and that insert-reconstructing takes less time with regard to the length of the inserted sequence.

An Analysis of the Word-Final Cluster of the Syllable Structure (음절구조의 어말 자음군에 관한 분석)

  • Oh, Kwan-Young
    • English Language & Literature Teaching / v.10 no.2 / pp.67-87 / 2004
  • The purpose of this paper is to show how the coda of a syllable and word-final clusters are represented in the English syllable structure. Previous theories of the syllable assume that there is only one segment in the coda position. Theories that license only one segment in the coda make it difficult to syllabicate the word-final cluster appropriately when it contains more than two segments. I considered three approaches: the previous syllable structure (Selkirk, 1982; Borowsky, 1989), sonority sequencing (Giegerich, 1992; Roca, 1999), and feature analysis (Goldsmith, 1990). However, none of the considered methods gives a satisfactory explanation of word-final clusters. Finally, I suggest a modified syllable representation as an alternative, placing two different appendixes under the Phonological Word, which forms a constituent above the syllable node. This makes it possible to explain the formerly problematic word-final clusters, including morphological information as an inflectional suffix in the structure.
