• Title/Summary/Keyword: sequence alignment

An Analysis System for Whole Genomic Sequence Using String B-Tree (스트링 B-트리를 이용한 게놈 서열 분석 시스템)

  • Choe, Jeong-Hyeon;Jo, Hwan-Gyu
    • The KIPS Transactions:PartA / v.8A no.4 / pp.509-516 / 2001
  • As a result of many genome projects, the genomic sequences of many organisms have been revealed. Various methods, such as global alignment and local alignment, are used to analyze these sequences, and k-mer analysis is one such method. The k-mer analysis examines the frequencies of all k-mers, or their symmetry, where a k-mer is a subsequence of k bases. However, existing in-memory algorithms cannot be applied to k-mer analysis because a whole genomic sequence is usually a very large text, so efficient data structures and algorithms are needed. The string B-tree is a data structure that supports external memory and is well suited to pattern matching. In this paper, we improve the string B-tree so that it can be applied efficiently to k-mer analysis, and we present the results of k-mer analysis for C. elegans and 30 other genomic sequences. We also present a visualization system that lets users investigate the distribution and symmetry of the frequencies of all k-mers using CGR (Chaos Game Representation). Finally, we describe a method for finding the signature, that is, the part of a sequence that is similar to the whole genomic sequence.
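
The counting step behind k-mer analysis, and the placement of k-mer frequencies on a CGR grid, can be sketched in Python as follows. This is an in-memory illustration only: the paper's point is that whole genomes need an external-memory structure such as the string B-tree, which this sketch does not implement. The corner assignment for A, C, G and T is an assumed convention, and the names `kmer_counts` and `fcgr` are hypothetical.

```python
from collections import Counter

# Assumed CGR corner convention: each base contributes one (x, y) bit pair.
CORNER = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer, skipping windows with non-ACGT symbols."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def fcgr(counts: Counter, k: int) -> list[list[int]]:
    """Place k-mer counts on a 2^k x 2^k frequency chaos game grid."""
    size = 1 << k
    grid = [[0] * size for _ in range(size)]
    for kmer, n in counts.items():
        x = y = 0
        # As in the CGR iteration, later bases land in more significant bits,
        # so the last base selects the coarsest quadrant.
        for base in kmer:
            cx, cy = CORNER[base]
            x = (x >> 1) | (cx << (k - 1))
            y = (y >> 1) | (cy << (k - 1))
        grid[y][x] += n
    return grid
```

For example, `kmer_counts("ACGTACGT", 2)` counts AC, CG and GT twice each and TA once, and `fcgr` puts those four counts into four distinct cells of a 4 x 4 grid.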

Optimized and Portable FPGA-Based Systolic Cell Architecture for Smith-Waterman-Based DNA Sequence Alignment

  • Shah, Hurmat Ali;Hasan, Laiq;Koo, Insoo
    • Journal of information and communication convergence engineering / v.14 no.1 / pp.26-34 / 2016
  • The alignment of DNA sequences is one of the important processes in bioinformatics. The Smith-Waterman algorithm (SWA) finds optimal local alignments but is computationally expensive. Among platforms for implementing SWA, the field-programmable gate array (FPGA) performs best in terms of cost, speed-up, and ease of reconfiguration. The performance of an FPGA-based SWA depends on an efficient design of the cell, its basic implementation unit. In this paper, we present an optimized systolic cell design that avoids the approaches taken by previous cell designs: oversimplification, very-large-scale integration (VLSI)-level design, and direct mapping of the iterative equations. The proposed design makes efficient use of hardware resources and is portable because it does not depend on gate-level details. Our cell design, which implements a linear gap penalty, achieved a 32× performance improvement over a general-purpose processor (GPP) platform and surpassed the hardware utilization of another implementation by a factor of 4.23.
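
The computation that each systolic cell performs is the Smith-Waterman recurrence with a linear gap penalty. The Python sketch below is a software model, not the paper's hardware design: it fills the score matrix one anti-diagonal at a time, because every cell on anti-diagonal i + j = d depends only on diagonals d - 1 and d - 2, which is the independence a linear systolic array exploits. The scoring values (match +2, mismatch -1, gap 1) and the function names are assumptions for illustration.

```python
def sw_cell(diag: int, up: int, left: int, a: str, b: str,
            match: int = 2, mismatch: int = -1, gap: int = 1) -> int:
    """One cell update: the Smith-Waterman recurrence with a linear gap penalty."""
    s = match if a == b else mismatch
    return max(0, diag + s, up - gap, left - gap)


def sw_wavefront(x: str, y: str) -> int:
    """Best local alignment score, computed one anti-diagonal at a time."""
    n, m = len(x), len(y)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    for d in range(2, n + m + 1):
        # Cells on this anti-diagonal are mutually independent, so a systolic
        # array could evaluate them in the same step on different cells.
        for i in range(max(1, d - m), min(n, d - 1) + 1):
            j = d - i
            H[i][j] = sw_cell(H[i - 1][j - 1], H[i - 1][j], H[i][j - 1],
                              x[i - 1], y[j - 1])
            best = max(best, H[i][j])
    return best
```

A plain row-by-row fill gives the same score; only the evaluation order differs, which is what makes the anti-diagonal schedule attractive for hardware.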

An efficient algorithm for multiple sequence alignment (복수 염기서열 정렬을 위한 한 유용성 알고리즘)

  • Kim, Jin;Song, Min-Dong
    • Proceedings of the Korean Information Science Society Conference / 1998.10c / pp.51-53 / 1998
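  • Multiple sequence alignment, which aligns the sequences of three or more DNA or protein molecules, is an essential tool for studying evolutionary relationships among sequences, gene regulation, and protein structure and function. The multiple sequence alignment problem belongs to the class of NP-complete problems, and dynamic programming is the most widely used algorithm for solving it. Dynamic programming can produce an optimal alignment for a given set of input sequences. However, it requires long running times, and, because of the properties of dynamic programming, it sometimes fails to obtain an optimal alignment for the given input sequences. In this study, to address these problems, we applied a genetic algorithm to the multiple sequence alignment problem, and this paper describes the design of the genetic algorithm and how it is applied. Using the proposed genetic algorithm, we reduced the long running time that was a drawback of dynamic programming and obtained optimal alignments that dynamic programming could not provide.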
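
As a rough illustration of the genetic-algorithm idea (not the authors' design, which the abstract does not detail), the sketch below evolves a population of gapped alignments under a sum-of-pairs score. Everything here is an assumption: the scoring values, the gap-moving mutation, truncation selection with elitism, and the absence of crossover, which is left out only to keep the example short.

```python
import random


def sp_score(rows: list[str], match: int = 1, mismatch: int = 0, gap: int = -1) -> int:
    """Sum-of-pairs score: every pair of rows is compared column by column."""
    total = 0
    for col in zip(*rows):
        for i in range(len(col)):
            for j in range(i + 1, len(col)):
                a, b = col[i], col[j]
                if a == "-" and b == "-":
                    continue  # gap-gap pairs are ignored
                if a == "-" or b == "-":
                    total += gap
                else:
                    total += match if a == b else mismatch
    return total


def mutate(rows: list[str], rng: random.Random) -> list[str]:
    """Move one gap to a random position within one randomly chosen row."""
    rows = list(rows)
    r = rng.randrange(len(rows))
    gaps = [i for i, c in enumerate(rows[r]) if c == "-"]
    if not gaps:
        return rows
    g = rng.choice(gaps)
    chars = list(rows[r][:g] + rows[r][g + 1:])
    chars.insert(rng.randrange(len(chars) + 1), "-")
    rows[r] = "".join(chars)
    return rows


def ga_align(seqs: list[str], generations: int = 200, pop_size: int = 30,
             extra: int = 2, seed: int = 0) -> list[str]:
    """Evolve gapped alignments of seqs and return the best one found."""
    rng = random.Random(seed)
    width = max(map(len, seqs)) + extra  # extra columns leave room for gaps
    start = [s + "-" * (width - len(s)) for s in seqs]
    population = [start] + [mutate(start, rng) for _ in range(pop_size - 1)]
    for _ in range(generations):
        # Truncation selection: the better half survives unchanged (elitism).
        population.sort(key=sp_score, reverse=True)
        survivors = population[: pop_size // 2]
        children = [mutate(rng.choice(survivors), rng)
                    for _ in range(pop_size - len(survivors))]
        population = survivors + children
    best = max(population, key=sp_score)
    # Drop columns that are gaps in every row; they do not change the score.
    keep = [k for k in range(width) if any(row[k] != "-" for row in best)]
    return ["".join(row[k] for k in keep) for row in best]
```

Because the best alignments always survive, the score of the returned alignment can never fall below that of the unaligned, end-padded starting point.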

A Classification Method for Deformed Words Using Multiple Sequence Alignment (다중서열정렬을 이용한 변형단어집합의 분류 기법)

  • Kim, Sung-Hwan;Cho, Hwan-Gue
    • Proceedings of the Korean Information Science Society Conference / 2012.06b / pp.264-266 / 2012
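  • Processing deformed words on the Internet is a problem relevant to many fields, such as information retrieval, machine translation, web mining, and profanity and spam filtering. In particular, for data collection and analysis tasks such as tracking how words are deformed over time, it is necessary to determine whether a given word belongs to a class consisting of a set of deformed words. In this paper, we propose a transformation technique that treats a set of deformed words belonging to the same class as a single representative string by performing multiple sequence alignment on the set, and we introduce a technique that uses it to classify effectively whether a given word belongs to that class. Experimental results show that the proposed technique achieved a specificity of 89.1% at a sensitivity of 93.4%, with no loss of classification performance compared with exhaustive pairwise comparison, while classification speed improved by a factor of 16.5.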
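
One way to picture treating an aligned set of variants as a single representative string is a column profile: each alignment column becomes the set of symbols seen there (including the gap), and a new word is classified by its edit distance to that profile. This is a hedged sketch; the profile representation, the unit costs, and the `max_cost` threshold are assumptions, not the paper's parameters. The speed-up intuition is that one comparison against the profile replaces a comparison against every member of the class.

```python
def column_profile(aligned: list[str]) -> list[set[str]]:
    """Collapse aligned variant words into one profile: each column becomes the
    set of symbols seen there, including '-' when some variant has a gap."""
    return [set(col) for col in zip(*aligned)]


def profile_distance(word: str, profile: list[set[str]]) -> int:
    """Edit distance from word to the profile with unit costs."""
    n, m = len(word), len(profile)
    # D[i][j]: cheapest way to match word[:i] against the first j columns.
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for j in range(1, m + 1):
        D[0][j] = D[0][j - 1] + (0 if "-" in profile[j - 1] else 1)
    for i in range(1, n + 1):
        D[i][0] = i
        for j in range(1, m + 1):
            col = profile[j - 1]
            D[i][j] = min(
                D[i - 1][j - 1] + (0 if word[i - 1] in col else 1),  # take the column
                D[i][j - 1] + (0 if "-" in col else 1),  # skip the column
                D[i - 1][j] + 1,  # extra character not in the profile
            )
    return D[n][m]


def belongs(word: str, aligned: list[str], max_cost: int = 1) -> bool:
    """Assign word to the class if it is close enough to the class profile."""
    return profile_distance(word, column_profile(aligned)) <= max_cost
```

With the hypothetical class `["spam", "sp-m", "5pam"]`, the unseen variant "5pm" has distance 0 to the profile, while an unrelated word such as "eggs" is rejected.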

On heuristics for multiple sequence alignment (복수 염기서열 정렬을 위한 휴리스틱에 관하여)

  • Kim, Jin;Chang, Yeon-Ah;Choi, Hong-Sik
    • Proceedings of the Korean Information Science Society Conference / 1999.10a / pp.661-663 / 1999
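  • Multiple sequence alignment is an essential tool for studying evolutionary relationships among sequences and protein structure and function. In most cases, dynamic programming can provide an optimal sequence alignment. However, because of the gap cost function it uses, it fails to produce an optimal alignment in certain special cases. In this paper, we propose a heuristic method for improving alignments obtained by dynamic programming and analyze its performance on real protein data.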
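
The abstract does not specify the heuristic, so the sketch below shows a generic post-processing step of the same kind: a hill-climbing pass over an existing alignment that slides one gap at a time by one position within its row and keeps any move that raises the sum-of-pairs score. The scoring values and the move set are assumptions, and this is not presented as the authors' method.

```python
def sp_score(rows: list[str], match: int = 1, mismatch: int = 0, gap: int = -1) -> int:
    """Sum-of-pairs score with a linear gap cost; gap-gap pairs are ignored."""
    total = 0
    for col in zip(*rows):
        for i in range(len(col)):
            for j in range(i + 1, len(col)):
                a, b = col[i], col[j]
                if a == "-" and b == "-":
                    continue
                if a == "-" or b == "-":
                    total += gap
                else:
                    total += match if a == b else mismatch
    return total


def neighbours(row: str):
    """Rows obtained by swapping one gap with an adjacent residue."""
    for i in range(len(row) - 1):
        a, b = row[i], row[i + 1]
        if (a == "-") != (b == "-"):  # exactly one of the pair is a gap
            yield row[:i] + b + a + row[i + 2:]


def refine(rows: list[str], max_rounds: int = 100) -> list[str]:
    """Greedy first-improvement hill climbing over single gap shifts."""
    rows = list(rows)
    best = sp_score(rows)
    for _ in range(max_rounds):
        improved = False
        for r in range(len(rows)):
            for candidate in neighbours(rows[r]):
                trial = rows[:r] + [candidate] + rows[r + 1:]
                score = sp_score(trial)
                if score > best:
                    rows, best, improved = trial, score, True
                    break  # move on; this row is revisited in the next round
        if not improved:
            break
    return rows
```

For example, `refine(["AC-GT", "ACG-T"])` moves the first row's gap one column to the right so that the two gaps line up, raising the score from 1 to 4. Swapping a gap with an adjacent residue never reorders residues, so each row still spells its original sequence.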

Alignment of Tilted TEM Images for 3D Reconstruction (3차원 복원을 위하여 특정 투사각도에서 획득한 TEM 영상열의 정렬)

  • Lee, Jun-Ho;Lee, Ji-Ho;Kim, Dong-Sik
    • Proceedings of the IEEK Conference / 2007.07a / pp.207-208 / 2007
  • In this paper, a sequence of tilted images acquired with transmission electron microscopy (TEM) for 3D reconstruction is aligned using the fiducial marker method. For performance comparison, a direct correlation method between adjacent tilted images is also applied. Using real TEM tilted images, we successfully performed the alignment.
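
Both alignment strategies reduce to estimating how far one tilted image is displaced relative to its neighbour. The pure-Python sketch below models only in-plane translation, a simplifying assumption: real tilt series also involve rotation and tilt-dependent foreshortening. With fiducial markers, the least-squares translation is the mean displacement of the matched markers; with direct correlation, the shift is the offset that maximizes the normalized cross-correlation over a small search window. The function names, the search radius, and the image representation (lists of pixel rows) are illustrative assumptions.

```python
from math import sqrt


def marker_shift(markers_a, markers_b):
    """Least-squares translation mapping matched fiducial markers in A onto B."""
    n = len(markers_a)
    dx = sum(xb - xa for (xa, _), (xb, _) in zip(markers_a, markers_b)) / n
    dy = sum(yb - ya for (_, ya), (_, yb) in zip(markers_a, markers_b)) / n
    return dx, dy


def ncc(a, b):
    """Normalized cross-correlation of two equal-length value lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den if den else 0.0


def correlation_shift(img_a, img_b, radius=3):
    """Integer (dx, dy) such that img_b[y + dy][x + dx] best matches img_a[y][x]."""
    h, w = len(img_a), len(img_a[0])
    best, best_shift = -2.0, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Compare only the region where both images overlap after shifting.
            ys = range(max(0, -dy), min(h, h - dy))
            xs = range(max(0, -dx), min(w, w - dx))
            a = [img_a[y][x] for y in ys for x in xs]
            b = [img_b[y + dy][x + dx] for y in ys for x in xs]
            if len(a) < 2:
                continue
            score = ncc(a, b)
            if score > best:
                best, best_shift = score, (dx, dy)
    return best_shift
```

For instance, `marker_shift([(1, 2), (5, 5), (9, 1)], [(3, 3), (7, 6), (11, 2)])` returns `(2.0, 1.0)`.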

Performance Improvement of BLAST using Grid Computing and Implementation of Genome Sequence Analysis System (그리드 컴퓨팅을 이용한 BLAST 성능개선 및 유전체 서열분석 시스템 구현)

  • Kim, Dong-Wook;Choi, Han-Suk
    • The Journal of the Korea Contents Association / v.10 no.7 / pp.81-87 / 2010
  • This paper proposes G-BLAST (BLAST using Grid Computing), an integrated software package for BLAST searches that operates in a heterogeneous distributed environment. G-BLAST is a basic local alignment search tool for DNA sequences that uses grid computing, and it employs a 'database splicing' method to improve BLAST search performance using existing computing resources. G-BLAST improved on the performance of existing BLAST searches in gene sequence analysis. It also implements a pipeline and a data management method that let users easily manage and analyze BLAST search results. The speed and efficiency of BLAST searches with the proposed G-BLAST system were confirmed in a heterogeneous distributed computing environment.
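
The abstract does not describe how 'database splicing' divides the work, so the following Python sketch shows one common way to distribute a BLAST-style search across nodes: partition the database sequences into chunks of roughly equal total length (greedy longest-first assignment), search each chunk independently, and merge the per-chunk hits by score. The partitioning strategy, the `(score, subject_id)` hit tuples, and the function names are assumptions; a real deployment would run an actual BLAST search on each chunk.

```python
import heapq


def partition(seqs: dict[str, str], n_chunks: int) -> list[list[str]]:
    """Greedy longest-first split of sequence IDs into chunks of balanced length."""
    loads = [(0, c) for c in range(n_chunks)]  # min-heap of (residues, chunk index)
    heapq.heapify(loads)
    chunks = [[] for _ in range(n_chunks)]
    for sid in sorted(seqs, key=lambda s: len(seqs[s]), reverse=True):
        load, c = heapq.heappop(loads)  # currently lightest chunk
        chunks[c].append(sid)
        heapq.heappush(loads, (load + len(seqs[sid]), c))
    return chunks


def merge_hits(per_chunk_hits: list[list[tuple[float, str]]],
               top: int) -> list[tuple[float, str]]:
    """Combine (score, subject_id) hits from every chunk; keep the best `top`."""
    return heapq.nlargest(top, (h for hits in per_chunk_hits for h in hits))
```

One caveat applies to any such split: E-values depend on the size of the searched database, so per-chunk results are only comparable if each chunk is scored against the size of the whole database. How G-BLAST handles this is not stated in the abstract.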

Cloning and characterization of a cDNA encoding a paired box protein, PAX7, from black sea bream, Acanthopagrus schlegelii

  • Choi, Jae Hoon;Han, Dan Hee;Gong, Seung Pyo
    • Journal of Animal Reproduction and Biotechnology / v.36 no.4 / pp.314-322 / 2021
  • Paired box protein PAX7 is a key molecule in the specification and maintenance of muscle satellite cells and in skeletal muscle regeneration. In this study, we identified and characterized the cDNA and amino acid sequences of PAX7 from black sea bream (Acanthopagrus schlegelii) through molecular cloning and sequence analysis. The A. schlegelii PAX7 cDNA comprised 1,524 bp encoding 507 amino acids, and multiple sequence alignment of the translated amino acid sequence showed that it contains three domains, a paired DNA-binding domain, a homeobox domain and an OAR domain, which were well conserved across the animal species investigated. Pairwise sequence alignment indicated that A. schlegelii PAX7 has an amino acid sequence identical to that of yellowfin seabream (A. latus) and shares 99.8% identity and similarity with that of gilt-head bream (Sparus aurata). Molecular phylogenetic analysis confirmed that A. schlegelii PAX7 forms a monophyletic group with teleost PAX7 sequences and is most closely related to those of fish in the family Sparidae, including A. latus and S. aurata. In the analysis of tissue-specific mRNA expression, PAX7 expression was identified specifically in skeletal muscle tissue, with weak expression also in gonad tissue. Cultured cells derived from skeletal muscle tissue expressed PAX7 mRNA at early passages, but expression was no longer observed after several rounds of subculture.
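
Two figures in the abstract can be checked or reproduced directly. The reported lengths are mutually consistent if the 1,524 bp refers to the coding sequence: 507 codons plus one stop codon is 508 × 3 = 1,524 bp. Pairwise identity is the fraction of aligned positions carrying identical residues; the sketch below uses one common convention (identical positions divided by all alignment columns, gap columns included), which may differ from the convention of the tool the authors used, and it does not compute similarity, which additionally needs a substitution matrix. The sequences in the example are made-up fragments, not PAX7 data.

```python
def percent_identity(aln_a: str, aln_b: str) -> float:
    """Percent of alignment columns where both rows carry the same residue.

    Assumed convention: gap columns count in the denominator and never match.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    same = sum(1 for x, y in zip(aln_a, aln_b) if x == y and x != "-")
    return 100.0 * same / len(aln_a)
```

For example, `percent_identity("MKT-AYIA", "MKTQAYLA")` returns 75.0: six of the eight columns are identical, while one column holds a gap and one a mismatch.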

An Analysis of Trip Chain of Freight Travel using Sequence Alignment Methods (Sequence Alignment 기법을 활용한 화물 통행의 Trip Chain 분석)

  • Joh, Chang-Hyeon
    • Journal of the Economic Geographical Society of Korea / v.14 no.4 / pp.540-552 / 2011
  • Freight travel patterns have been studied less than passenger travel. Nonetheless, freight travel has become increasingly important in the urban travel sector, so research on freight travel demand is increasingly needed. This paper aims to identify, for each truck and cargo tonnage class, the characteristics of average travel patterns, of efficiency or performance, and of freight trip chains with respect to destination location, destination type, and freight type. The study analyzed nationwide data from a freight travel behavior survey and was intended to establish an initial framework for the decision-making principles of freight travel, an approach already common in passenger travel research. The findings suggest that these characteristics differ clearly among trucks and cargo of different tonnages. The results are expected to provide important insights for developing relevant transportation policy measures.
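
In the sequence alignment approach named in the title, each trip chain can be treated as a string of stops, and two chains compared by the minimum number of insertions, deletions and substitutions needed to turn one into the other. The sketch below computes that unit-cost (Levenshtein) distance; the stop codes (D for depot, C for customer, W for warehouse) and the unit costs are hypothetical, and the abstract does not state which cost scheme the study used.

```python
from typing import Sequence


def chain_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Unit-cost edit distance between two trip chains (sequences of stop types)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # drop a stop from a
                           cur[j - 1] + 1,            # add a stop from b
                           prev[j - 1] + (x != y)))   # keep or substitute
        prev = cur
    return prev[-1]
```

For example, the chain depot, customer, customer, depot ("DCCD") differs from "DCWCD" by a single inserted warehouse stop, so their distance is 1. A matrix of such pairwise distances is what clustering or typology methods for trip chains would typically start from.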

Mining Approximate Sequential Patterns in a Large Sequence Database (대용량 순차 데이터베이스에서 근사 순차패턴 탐색)

  • Kum, Hye-Chung;Chang, Joong-Hyuk
    • The KIPS Transactions:PartD / v.13D no.2 s.105 / pp.199-206 / 2006
  • Sequential pattern mining is an important data mining task with broad applications. However, conventional methods may face inherent difficulties when mining databases with long sequences and noise: they may generate a huge number of short, trivial patterns yet fail to find interesting patterns shared by many sequences. To overcome these problems, this paper proposes approximate sequential pattern mining, roughly defined as identifying patterns approximately shared by many sequences. The proposed method works in two steps: first, it clusters the target sequences by similarity; second, it uses multiple alignment to find directly the consensus patterns that are similar to the sequences in each cluster. For this purpose, a novel structure called a weighted sequence is presented to compress the alignment result, and the longest consensus pattern representing each cluster is generated from its weighted sequence. Finally, the effectiveness of the proposed method is verified through a set of experiments.
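
A simplified version of the weighted-sequence idea can be sketched as follows. After the sequences of a cluster are aligned, each column keeps a count of how many sequences contributed each item, and the consensus pattern keeps the items whose count reaches a strength threshold. The simplification to one item per position (written as one character) instead of the itemsets used in sequential pattern mining, the input format, and the `min_strength` value are assumptions for illustration.

```python
from collections import Counter


def weighted_sequence(aligned: list[str]) -> list[Counter]:
    """Per-column item counts for a cluster of aligned sequences ('-' is a gap)."""
    return [Counter(item for item in col if item != "-") for col in zip(*aligned)]


def consensus(aligned: list[str], min_strength: float = 0.5) -> list[str]:
    """Items that occur in at least min_strength of the sequences at their column."""
    n = len(aligned)
    pattern = []
    for counts in weighted_sequence(aligned):
        for item, c in sorted(counts.items()):
            if c / n >= min_strength:
                pattern.append(item)
    return pattern
```

For the aligned cluster `["ab-d", "abcd", "a-ce", "xbcd"]`, every column has one item shared by three of the four sequences, so `consensus` returns `["a", "b", "c", "d"]`; raising `min_strength` to 0.8 removes all of them. The weighted sequence itself stores only one counter per column, which is what makes it a compact summary of the whole alignment.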