• Title/Summary/Keyword: sequence alignment

DNA Sequence Alignment Using a Graph-based Distributed System (그래프 기반 분산 시스템을 이용한 염기 서열 정렬)

  • Lee, Jun-Su;Ahn, Jae-Gyoon;Yeu, Yun-Ku;Roh, Hong-Chan;Park, Sang-Hyun
    • Proceedings of the Korea Information Processing Society Conference / 2013.05a / pp.894-897 / 2013
  • Sequence alignment is one of the most widely used tools in genomics. With recent advances in next-generation sequencing (NGS) technology, data production has grown sharply, and with it the need for high-throughput sequence alignment algorithms. The DNA sequence alignment algorithm proposed in this paper transforms sequence data into a graph and then performs the alignment on Trinity, Microsoft's graph-based in-memory distributed system. The algorithm successfully aligned simulated DNA data on the Trinity system and ran faster as the number of slaves increased, demonstrating its scalability.

Applying Genomic Sequence Alignment Methodology for Source Codes Plagiarism Detection (유전체 서열의 정렬 기법을 이용한 소스 코드 표절 검사)

  • 강은미;황미녕;조환규
    • Journal of KIISE:Computing Practices and Letters / v.9 no.3 / pp.352-367 / 2003
  • The syntactic and semantic characteristics of a computer program can be represented by the keyword sequence extracted from its source code. The similarity and difference between two programs can therefore be determined by comparing their keyword sequences. Methods for measuring the similarity of two sequences have already been studied intensively in bioinformatics for the manipulation of genetic sequences. In this paper, we propose a new method for measuring the similarity of two programs and detecting partial plagiarism by exploiting sequence alignment techniques. To evaluate the proposed method, we experimented with actual program code submitted by 70 students attending a Data Structure course in 2001. The experimental results show that the proposed method is more effective than the fingerprint method, which is the most commonly used approach to plagiarism detection.
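
The idea of comparing programs through their keyword sequences can be sketched as follows. This is a minimal illustration, not the authors' method: it assumes Python source (the abstract does not name the language of the student submissions), uses the standard `tokenize` and `keyword` modules to extract keywords, and scores the two sequences with a plain local alignment whose `match`, `mismatch`, and `gap` values are arbitrary placeholders. The function names are hypothetical.

```python
import io
import keyword
import tokenize


def keyword_sequence(source: str) -> list[str]:
    """Return the language keywords of a Python source string, in order."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [
        tok.string
        for tok in tokens
        if tok.type == tokenize.NAME and keyword.iskeyword(tok.string)
    ]


def local_alignment_score(x, y, match=1, mismatch=-1, gap=-1) -> int:
    """Best local alignment score of two sequences (score only, no traceback).

    The scoring values are illustrative assumptions, not taken from the paper.
    """
    prev = [0] * (len(y) + 1)  # previous row of the dynamic-programming table
    best = 0
    for xi in x:
        cur = [0]
        for j, yj in enumerate(y, 1):
            diag = prev[j - 1] + (match if xi == yj else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            cur.append(max(0, diag, prev[j] + gap, cur[j - 1] + gap))
        best = max(best, max(cur))
        prev = cur
    return best
```

Because identifiers are discarded, renaming variables or functions leaves the keyword sequence, and therefore the score, unchanged.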

A Management Technique for Protein Version Information based on Local Sequence Alignment and Trigger (로컬 서열 정렬과 트리거 기반의 단백질 버전 정보 관리 기법)

  • Jung Kwang-Su;Park Sung-Hee;Ryu Keun-Ho
    • The KIPS Transactions:PartD / v.12D no.1 s.97 / pp.51-62 / 2005
  • Once the function of an amino acid sequence is known, we can infer the function of other amino acid sequences with a similar composition. It is also possible to alter a protein whose function is known into a useful protein by genetic engineering. In this process, an original protein sequence produces various protein sequences with different compositions, so a systematic technique is needed to manage these protein version sequences and their reference data. In this paper, we propose a technique for managing protein version sequences based on local sequence alignment and a technique for managing historical protein reference data using triggers. The method automatically determines the similarity between an original sequence and each version sequence as the version sequences are stored in the database. The technique also reduces the storage space needed for protein sequences. By storing the historical information of a protein and analyzing the changes in its sequence, we expect that new useful proteins and drugs can be discovered from the analysis of version sequences.

Development of a Fast Alignment Method of Micro-Optic Parts Using Multi Dimension Vision and Optical Feedback

  • Han, Seung-Hyun;Kim, Jin-Oh;Park, Joong-Wan;Kim, Jong-Han
    • Institute of Control, Robotics and Systems: Conference Proceedings / 2003.10a / pp.273-277 / 2003
  • A general electronic assembly process consists of a series of geometric alignments and bonding/screwing steps. After assembly, the function is tested in a subsequent inspection process. Assembly of micro-optic devices, however, requires both processes to be performed in the same equipment. Coarse geometric alignment is achieved with vision, and the optical function is then improved by fine motion based on feedback from a tunable laser interferometer. The overall system consists of a precision robot system for 3D assembly, a 3D vision-guided system for geometric alignment, and an optical feedback system with a tunable laser. In this study, we propose a new fast alignment algorithm for micro-optic devices that covers both visual and optical alignment. The main goal is to find the fastest alignment process and algorithms using state-of-the-art technology. We propose a new approach with an optimal sequence of processes, a visual alignment algorithm, and a search algorithm for optimal optical alignment. A system is designed to show the effectiveness and efficiency of the proposed method.

Implementation of Parallel Local Alignment Method for DNA Sequence using Apache Spark (Apache Spark을 이용한 병렬 DNA 시퀀스 지역 정렬 기법 구현)

  • Kim, Bosung;Kim, Jinsu;Choi, Dojin;Kim, Sangsoo;Song, Seokil
    • The Journal of the Korea Contents Association / v.16 no.10 / pp.608-616 / 2016
  • The Smith-Waterman (SW) algorithm is a local alignment algorithm and one of the important operations in DNA sequence analysis. SW finds the optimal local alignment with respect to the scoring system in use, but it requires a long execution time. To address this, several methods for running SW in a distributed and parallel manner have been proposed. ADAM, a distributed and parallel processing framework for DNA sequences, includes a parallel SW. However, ADAM's parallel SW does not take into account that SW is a dynamic programming method, which limits its performance. In this paper, we propose a method to enhance the parallel SW of ADAM. The proposed parallel SW (PSW) runs in two phases. In the first phase, the PSW splits a DNA sequence into a number of partitions and assigns them to multiple nodes; the original Smith-Waterman algorithm is then performed in parallel at each node. In the second phase, the PSW estimates the portions of the sequence that must be recalculated, and the recalculation is performed on those portions in parallel at each node. In the experiments, we compare the proposed PSW with the parallel SW of ADAM to show the superiority of the PSW.
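
For reference, the sequential Smith-Waterman recurrence that the abstract builds on can be sketched as below. This is only the single-node baseline, with a linear gap penalty and illustrative scoring values chosen here as assumptions; it does not reproduce the paper's two-phase partitioning or the ADAM implementation.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -1):
    """Return (best_score, aligned_a, aligned_b) for one optimal local alignment.

    Scoring parameters are illustrative defaults, not values from the paper.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]  # H[i][j]: best score ending at a[i-1], b[j-1]
    best, best_pos = 0, (0, 0)

    # Fill the table; every cell depends on its upper, left, and diagonal neighbours.
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until the score drops to zero.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if H[i][j] == diag:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))
```

With these defaults, `smith_waterman("TTACGTTT", "GGACGGG")` returns `(6, "ACG", "ACG")`. The dependency of each cell on its neighbours is what makes a naive split across nodes lossy at partition boundaries, which is the limitation the abstract attributes to dynamic programming.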

Using multiple sequence alignment to extract daily activity routines of the elderly living alone

  • Lee, Bogyeong;Lee, Hyun-Soo;Park, Moonseo;Ahn, Changbum Ryan;Choi, Nakjung;Kim, Toseung
    • Advances in Computational Design / v.4 no.2 / pp.73-90 / 2019
  • The growth in the number of single-member households is a critical issue worldwide, especially among the elderly. For those living alone, who may be unaware of their health status or routines that could improve their health, a continuous healthcare monitoring system could provide valuable feedback. Assessing the performance adequacy of activities of daily living (ADL) can serve as a measure of an individual's health status; previous research has focused on determining a person's daily activities and extracting the most frequently performed behavioral patterns using camera recordings or wearable sensing techniques. However, existing methods used to extract common patterns of an occupant's activities in the home fail to address the spatio-temporal dimensions of human activities simultaneously. Though multiple sequence alignment (MSA) offers some advantages - such as inherent containment of the spatio-temporal data in sequence format, and rapid identification of hidden patterns - MSA has rarely been used to extract in-home ADL routines. This research proposes a method to extract a household occupant's ADL routines from a cumulative spatio-temporal data log of occupancy collected using a non-intrusive method (i.e., a tomographic motion detection system). The findings from an occupant's 28-day spatio-temporal activity log demonstrate the capacity of the proposed approach to identify routine patterns of an occupant's daily activities and to reveal the order, duration, and frequency of routine activities. Routine ADL patterns identified from the proposed approach are expected to provide a basis for detecting/evaluating abrupt or gradual changes of an occupant's ADL patterns that result from a physical or mental disorder, and can offer valuable information for home automation applications by enabling the prediction of ADL patterns.

A Local Alignment Algorithm using Normalization by Functions (함수에 의한 정규화를 이용한 local alignment 알고리즘)

  • Lee, Sun-Ho;Park, Kun-Soo
    • Journal of KIISE:Computer Systems and Theory / v.34 no.5_6 / pp.187-194 / 2007
  • A local alignment algorithm compares two strings and finds a substring pair with size l and similarity s. To find a pair with both sufficient size and high similarity, existing normalization approaches maximize the ratio of the similarity to the size. In this paper, we introduce normalization by functions, which maximizes f(s)/g(l), where f and g are non-decreasing functions. The functions f and g are determined by experiments comparing DNA sequences. In these experiments, our normalization by functions finds appropriate local alignments. For the previous algorithm, which evaluates similarity using the longest common subsequence, we show that it can also maximize the score normalized by functions, f(s)/g(l), without loss of time.
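
The objective f(s)/g(l) can be made concrete with a brute-force sketch. Everything below is an assumption for illustration: similarity s is taken as the longest common subsequence length, size l as the sum of the two substring lengths, and f and g are passed in by the caller. The exhaustive search over all substring pairs is only practical for very short strings and is not the paper's algorithm.

```python
from typing import Callable, Tuple


def lcs_length(x: str, y: str) -> int:
    """Length of the longest common subsequence of x and y."""
    prev = [0] * (len(y) + 1)
    for cx in x:
        cur = [0]
        for j, cy in enumerate(y, 1):
            cur.append(prev[j - 1] + 1 if cx == cy else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def best_normalized_pair(
    a: str,
    b: str,
    f: Callable[[int], float],
    g: Callable[[int], float],
) -> Tuple[float, str, str]:
    """Exhaustively find the substring pair maximizing f(s) / g(l).

    s = LCS length of the pair, l = total length of the pair (both assumptions).
    g must return a positive value for every l >= 2.
    """
    best = (float("-inf"), "", "")
    for i in range(len(a)):
        for j in range(i + 1, len(a) + 1):
            for k in range(len(b)):
                for m in range(k + 1, len(b) + 1):
                    sub_a, sub_b = a[i:j], b[k:m]
                    s = lcs_length(sub_a, sub_b)
                    size = len(sub_a) + len(sub_b)
                    score = f(s) / g(size)
                    if score > best[0]:  # ties keep the first pair found
                        best = (score, sub_a, sub_b)
    return best
```

For example, with the hypothetical choices `f = lambda s: s` and `g = lambda l: l + 4`, the call `best_normalized_pair("abc", "abc", f, g)` selects the full pair `("abc", "abc")` with score 0.3.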

Global Sequence Homology Detection Using Word Conservation Probability

  • Yang, Jae-Seong;Kim, Dae-Kyum;Kim, Jin-Ho;Kim, Sang-Uk
    • Interdisciplinary Bio Central / v.3 no.4 / pp.14.1-14.9 / 2011
  • Protein homology detection is an important issue in comparative genomics. Because of the exponential growth of sequence databases, fast and efficient homology detection tools are urgently needed. Currently, sequence comparison methods based on local alignment, such as BLAST, are generally used for homology detection because they give a reasonable measure of sequence similarity. However, these methods have drawbacks in measuring overall sequence similarity, especially for eukaryotic genomes, which often contain many insertions and duplications. These methods also provide no explicit model of speciation, so it is difficult to translate their similarity measures into homology detection. Here, we present a novel method based on the Word Conservation Score (WCS) to address the current limitations of homology detection. Instead of counting each amino acid, we adopt the concept of a 'Word' to compare sequences. WCS measures overall sequence similarity by comparing word contents, which is much faster than BLAST comparisons. Furthermore, the evolutionary distance between homologous sequences can be measured by WCS. We therefore expect sequence comparison with WCS to be useful for multiple-species comparisons of large genomes. In performance comparisons on protein structural classifications, our method showed a considerable improvement over BLAST. Our method also found bigger micro-syntenic blocks, which consist of orthologs with conserved gene order. By testing on various datasets, we showed that WCS gives a faster and better overall similarity measure than BLAST.

Online Handwritten Digit Recognition by Smith-Waterman Alignment (Smith-Waterman 정렬 알고리즘을 이용한 온라인 필기체 숫자인식)

  • Mun, Won-Ho;Choi, Yeon-Seok;Lee, Sang-Geol;Cha, Eui-Young
    • Journal of the Korea Society of Computer and Information / v.16 no.9 / pp.27-33 / 2011
  • In this paper, we propose an efficient online handwritten digit recognition method based on the Convex-Concave curve feature, which is extracted from a chain code sequence, using the Smith-Waterman alignment algorithm. The time-sequential signal from mouse movement on the writing pad is described as a sequence of consecutive points on the x-y plane. Preprocessing therefore yields a data set of successive, time-sequential pixel positions. The preprocessed data are used for Convex-Concave curve feature extraction. This feature is scale-, translation-, and rotation-invariant. The extracted feature is fed to a Smith-Waterman alignment algorithm, which classifies it as one of the nine digits. Compared with a backpropagation neural network, Smith-Waterman alignment performs better.
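
The step from a time-ordered point sequence to a chain code can be sketched as below. This is a minimal illustration under stated assumptions: it uses an 8-direction Freeman-style code with 0 pointing along +x and codes increasing counter-clockwise, and it treats the y-axis as pointing up. The abstract does not specify the number of directions or the axis convention, and the Convex-Concave feature extraction itself is not reproduced here.

```python
import math


def chain_code(points):
    """Quantize the moves between consecutive (x, y) points into 8 direction codes.

    Code 0 is +x, 2 is +y, 4 is -x, 6 is -y; odd codes are the diagonals.
    Repeated points (zero-length moves) are skipped.
    """
    codes = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx, dy = x1 - x0, y1 - y0
        if dx == 0 and dy == 0:
            continue  # the pen did not move between these two samples
        angle = math.atan2(dy, dx)  # in [-pi, pi]
        codes.append(round(angle / (math.pi / 4)) % 8)  # nearest 45-degree sector
    return codes
```

Tracing a unit square counter-clockwise, `chain_code([(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)])` gives `[0, 2, 4, 6]`. A code sequence of this kind is the sort of symbolic string that an alignment algorithm can then compare against reference sequences.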