• Title/Summary/Keyword: sequence alignment (서열정렬)

Design of Gene Alignment Program(FastA) Using Carpool and Grouping Schemes (카풀 및 그룹핑 기법을 이용한 유전자 서열 정렬 프로그램(FastA) 설계)

  • 이성준;김재훈;정진원;이원태
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2003.04a
    • /
    • pp.124-126
    • /
    • 2003
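  • Many programs used in bioinformatics search and process vast amounts of data from databases. In such an environment, searching the database anew for every user request lengthens users' waiting time and exceeds system capacity. The carpool and grouping schemes were proposed to address this database-access problem. In this paper, we implement FastA, a gene sequence comparison program, using the carpool and grouping schemes, and measure user response time to confirm that the program's performance can be improved.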

A CNV Detection Algorithm (CNV 영역 검색 알고리즘)

  • Sang-Kyoon Hong;Dong-Wan Hong;Jee-Hee Yoon
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2008.11a
    • /
    • pp.356-359
    • /
    • 2008
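  • Research on copy number variation (CNV) in the human genome has recently attracted attention in bioinformatics. A CNV region is defined as a variant region in which a sequence between 1 kbp and 3 Mbp long is duplicated or deleted. In previous work, we proposed a new CNV detection method that finds CNV regions by aligning reads, the DNA sequence fragments produced by giga sequencing, to a reference sequence. As a follow-up, this paper proposes a new way to handle the problem of repeat regions in DNA sequences and presents a CNV detection algorithm that finds CNV regions by analyzing the occurrence frequency of reads. The proposed algorithm takes regions that are statistically significant under the Gaussian-distributed occurrence-frequency information as CNV candidates, and then extracts the final CNV regions through a refinement step. For performance evaluation, we developed a prototype system and ran simulation experiments. The results show that the proposed method efficiently detects both duplicated and deleted CNV regions, as well as CNV regions of various sizes.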
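
The abstract does not give the algorithm's details, so the following is only a minimal sketch of the general idea it describes: treat per-window read counts as roughly Gaussian, flag windows whose counts deviate significantly, and merge adjacent flagged windows into candidate regions. The function name `cnv_candidates`, the fixed-size windows, the z-score test, and the `z_threshold` and `min_windows` values are all assumptions made for illustration, not the authors' method.

```python
from statistics import mean, stdev

def cnv_candidates(window_counts, z_threshold=2.0, min_windows=2):
    """Return (start, end, kind) runs of windows with unusual read counts.

    window_counts: reads aligned to each fixed-size reference window
    (hypothetical input layout). `end` is exclusive. `kind` is "gain"
    for duplicated-looking runs and "loss" for deleted-looking runs.
    """
    mu = mean(window_counts)
    sigma = stdev(window_counts)
    if sigma == 0:
        return []  # perfectly flat coverage: nothing stands out

    # Label each window by the sign of a significant z-score.
    labels = []
    for count in window_counts:
        z = (count - mu) / sigma
        if z >= z_threshold:
            labels.append("gain")
        elif z <= -z_threshold:
            labels.append("loss")
        else:
            labels.append(None)

    # Merge consecutive windows that carry the same label.
    regions = []
    start = 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            if labels[start] is not None and i - start >= min_windows:
                regions.append((start, i, labels[start]))
            start = i
    return regions
```

A real pipeline would also need the repeat-region handling and the refinement step that the abstract mentions; both are left out here because the abstract does not specify them.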

Sequence based Intrusion Detection using Similarity Matching of the Multiple Sequence Alignments (다중서열정렬의 유사도 매칭을 이용한 순서기반 침입탐지)

  • Kim Yong-Min
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.16 no.1
    • /
    • pp.115-122
    • /
    • 2006
  • Most intrusion detection methods rely on misuse detection, which accumulates known intrusion information and judges behavior data against it to decide whether an attack is under way. However, it is very difficult to detect a new or modified attack using only the collected patterns of attack behavior. Given that anomaly detection in practice has a high false detection rate, a new approach is required for the very large volume of sequence-based intrusion patterns. Beyond measuring the similarity of intrusion patterns, such an approach can improve the chances of detecting known attacks as well as modified and unknown ones. This paper proposes a method that applies the multiple sequence alignment technique to similarity matching of sequence-based intrusion patterns. It enables statistical analysis of sequence patterns and can be implemented easily. The method also reduces the number of detection alerts and false detections for attacks as the sequence size changes.

Next Generation Sequencing and Bioinformatics (차세대 염기서열 분석기법과 생물정보학)

  • Kim, Ki-Bong
    • Journal of Life Science
    • /
    • v.25 no.3
    • /
    • pp.357-367
    • /
    • 2015
  • With the ongoing development of next-generation sequencing (NGS) platforms and advancements in the latest bioinformatics tools at an unprecedented pace, the ultimate goal of sequencing the human genome for less than $1,000 can be feasible in the near future. The rapid technological advances in NGS have brought about increasing demands for statistical methods and bioinformatics tools for the analysis and management of NGS data. Even in the early stages of the commercial availability of NGS platforms, a large number of applications or tools already existed for analyzing, interpreting, and visualizing NGS data. However, the availability of this plethora of NGS data presents a significant challenge for storage, analyses, and data management. Intrinsically, the analysis of NGS data includes the alignment of sequence reads to a reference, base-calling, and/or polymorphism detection, de novo assembly from paired or unpaired reads, structural variant detection, and genome browsing. While the NGS technologies have allowed a massive increase in available raw sequence data, a number of new informatics challenges and difficulties must be addressed to improve the current state and fulfill the promise of genome research. This review aims to provide an overview of major NGS technologies and bioinformatics tools for NGS data analyses.

Development of Contig Assembly Program for Nucleotide Sequencing (염기서열 해독작업을 위한 핵산 단편 조립 프로그램의 개발)

  • 이동훈
    • Korean Journal of Microbiology
    • /
    • v.35 no.2
    • /
    • pp.121-127
    • /
    • 1999
  • An effective computer program for assembling fragments in DNA sequencing has been developed. The program, called SeqEditor (Sequence Editor), runs on personal computers under MS-Windows, the most popular operating system in Korea. It can read several sequence file formats, such as GenBank, FASTA, and ASCII. In SeqEditor, a dynamic programming algorithm is applied to compute the maximal-scoring overlapping alignment between each pair of fragments. A novel feature of the program is that SeqEditor implements interactive operation with a graphical user interface. Performance tests of the program on fragment data from 16S and 18S rDNA sequencing projects produced satisfactory results. This program may be useful to those working on large-scale DNA sequencing projects.
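
The dynamic-programming step named in the abstract can be illustrated with a small sketch of a suffix-prefix (overlap) alignment. This is not SeqEditor's code: the function name `overlap_alignment` and the scoring values (`match`, `mismatch`, `gap`) are assumptions chosen for the example. Skipping a leading part of the first fragment costs nothing, and the best score is read from the last row, so the alignment covers a suffix of `a` and a prefix of `b`.

```python
def overlap_alignment(a, b, match=1, mismatch=-1, gap=-2):
    """Best-scoring alignment of a suffix of `a` with a prefix of `b`.

    Returns (score, overlap_len), where overlap_len is the number of
    leading characters of `b` covered by the overlap.
    Scoring parameters are illustrative assumptions.
    """
    n, m = len(a), len(b)
    # Row 0: no characters of `a` used yet, so a prefix of `b` can only
    # be aligned against gaps.
    prev = [j * gap for j in range(m + 1)]
    for i in range(1, n + 1):
        # cur[0] = 0: skipping the first i characters of `a` is free.
        cur = [0] * (m + 1)
        for j in range(1, m + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(diag, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    # The overlap must end at the last character of `a` (final row),
    # but may stop anywhere in `b`.
    best_j = max(range(m + 1), key=lambda j: prev[j])
    return prev[best_j], best_j
```

For example, `overlap_alignment("ACGTTGCA", "TGCAGGTT")` finds the four-base overlap `TGCA`. Only two rows of the table are kept at a time, so memory grows with the length of `b` rather than with the product of both lengths; recovering the alignment itself would need a traceback or a linear-space method.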

Protein Structure Alignment Based on Maximum of Residue Pair Distance and Similarity Graph (정렬된 잔기 사이의 최대거리와 유사도 그래프에 기반한 단백질 구조 정렬)

  • Kim, Woo-Cheol;Park, Sang-Hyun;Won, Jung-Im
    • Journal of KIISE:Databases
    • /
    • v.34 no.5
    • /
    • pp.396-408
    • /
    • 2007
  • Since the Human Genome Project finished sequencing human DNA, interest in protein function has been increasing. Because protein structures are conserved in divergent evolution, protein functions are determined by structure rather than by amino acid sequence. Therefore, if two protein structures are similar, we can expect them to share biological functions. Much research on protein structure alignment has been carried out so far. However, most methods use RMSD (Root Mean Square Deviation) as the similarity measure, which makes it hard to judge intuitively how similar two protein structures are. In addition, they return only the single result with the highest alignment score, which can hardly satisfy users with different purposes. To overcome these limitations, we propose a novel protein structure alignment algorithm based on MRPD (Maximum of Residue Pair Distance) and SG (Similarity Graph). MRPD is a more intuitive similarity measure that allows fast filtering of unpromising protein pairs, and SG is a compact representation of multiple alignment results that lets users choose the most plausible alignment for their needs, without compromising the time taken to align the protein structures.
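
To make the contrast between the two measures concrete, here is a small sketch that computes both for a set of aligned residue pairs. The abstract does not give the exact MRPD formula, so the reading used here (the largest distance over all aligned pairs) is inferred from the name only. The input layout, a list of coordinate pairs already superposed in a common frame, is also an assumption.

```python
import math

def rmsd(pairs):
    """Root mean square deviation over aligned, superposed residue pairs."""
    squared = [math.dist(p, q) ** 2 for p, q in pairs]
    return math.sqrt(sum(squared) / len(squared))

def mrpd(pairs):
    """Maximum distance over aligned residue pairs (assumed reading of MRPD)."""
    return max(math.dist(p, q) for p, q in pairs)

# Hypothetical coordinates: two pairs lie 1 unit apart, one lies 5 apart.
pairs = [
    ((0.0, 0.0, 0.0), (0.0, 0.0, 1.0)),
    ((1.0, 0.0, 0.0), (1.0, 0.0, 1.0)),
    ((2.0, 0.0, 0.0), (2.0, 0.0, 5.0)),
]
print(rmsd(pairs), mrpd(pairs))
```

Under this reading, a maximum-distance bound guarantees that every aligned pair lies within that distance, whereas an average such as RMSD can hide a single badly placed pair. That may be part of why the authors call MRPD more intuitive and usable as a filter.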

Development of an X-window Program, XFAP, for Assembling Contigs from DNA Fragment Data (DNA 염기 서열로부터 contig 구성을 위한 프로그램 XFAP의 개발)

  • Lee, Byung-Uk;Park, Kie-Jung;Kim, Seung-Moak
    • Korean Journal of Microbiology
    • /
    • v.34 no.1_2
    • /
    • pp.58-63
    • /
    • 1998
  • The fragment assembly problem is to reconstruct DNA sequence contigs from a collection of fragment sequences. We have developed an efficient X-window program, XFAP, for assembling DNA fragments. XFAP uses a dimer frequency comparison method to quickly eliminate pairs of fragments that cannot overlap. This method takes advantage of the difference in dimer frequencies within the minimum acceptable overlap length in each fragment pair. The Hirschberg algorithm is applied to compute the maximal-scoring overlapping alignment in linear space. The performance of XFAP was tested on a set of DNA fragment sequences extracted from long GenBank DNA sequences by a fragmentation program, and it showed a great improvement in execution time, especially as the number of fragments increases.
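
The abstract does not spell out how the dimer frequency comparison works, so the sketch below shows one plausible reading rather than XFAP's actual filter. If a suffix of fragment `a` overlaps a prefix of fragment `b` by at least `min_overlap` bases, then the last `min_overlap` bases of `a` must line up with some window of `b`, and the dimer counts of the two stretches must be close. The helper names, the L1 distance on dimer counts, and the `max_distance` threshold are assumptions; for substitution errors only, one substitution changes the distance by at most 4, so `4 * max_substitutions` is a safe threshold, while insertions and deletions would need a looser bound.

```python
from collections import Counter

def dimer_counts(seq):
    """Count every overlapping two-base substring in `seq`."""
    return Counter(seq[i:i + 2] for i in range(len(seq) - 1))

def dimer_distance(c1, c2):
    """L1 distance between two dimer count tables."""
    return sum(abs(c1[k] - c2[k]) for k in c1.keys() | c2.keys())

def may_overlap(a, b, min_overlap, max_distance):
    """Cheap pre-filter: False means a suffix of `a` cannot overlap a
    prefix of `b` within the assumed error budget, so the costly
    alignment can be skipped. True only means "worth aligning"."""
    if len(a) < min_overlap or len(b) < min_overlap:
        return False
    tail = dimer_counts(a[-min_overlap:])
    for start in range(len(b) - min_overlap + 1):
        window = dimer_counts(b[start:start + min_overlap])
        if dimer_distance(tail, window) <= max_distance:
            return True
    return False
```

This version recounts dimers for every window to stay short; a production filter would update the counts incrementally as the window slides.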

Bio Grid Computing and Biosciences Research Application (바이오그리드 컴퓨팅과 생명과학 연구에의 활용)

  • Kim, Tae-Ho;Kim, Eui-Yong;Youm, Jae-Boum;Kho, Weon-Gyu;Gwak, Heui-Chul;Joo, Hyun
    • Bioinformatics and Biosystems
    • /
    • v.2 no.2
    • /
    • pp.37-45
    • /
    • 2007
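  • Bioinformatics is the discipline of using computers to process vast amounts of biological data and analyze the results, and its use has steadily broadened alongside the rapid growth of IT. Data used in medical and life-science research are typically diverse in kind and very large, so processing them inevitably calls for grid computing built on high-speed networks. Advances in high-speed networking are driving the field of grid computing, which can bind the distributed systems in a computer pool into one and so substitute for a supercomputer. Bioinformatics has likewise begun to use such high-performance distributed computing to process data faster and manage it more efficiently. Grid computing technology can be broadly divided into the development of application programs for data processing and the construction of databases for data management. Bioinformatics research programs in the former category include sequence alignment (MSA) programs such as mpiBLAST and ClustalW-MPI, while projects such as BioSimGrid and Taverna were developed on grid-database technology. This article introduces the grid computing environments developed to date for exploring and studying unknown life phenomena, application programs for biomedical research, and grid-database technology.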

A Database Retrieval Model for Efficient Gene Sequence Alignment (효율적인 유전자 서열 비교를 위한 데이타베이스 검색 모델)

  • 김민준;임성화;김재훈;이원태;정진원
    • Journal of KIISE:Databases
    • /
    • v.31 no.3
    • /
    • pp.243-251
    • /
    • 2004
  • Most bioinformatics programs provide biochemists and biologists with retrieval and analysis services for gene and protein databases. Because these services search the database separately for each user request as it arrives, each search takes a long time, and both server load and response time increase. In this paper, by exploiting the database retrieval patterns of sequence alignment programs in bioinformatics, a grouping method is proposed that shares one database retrieval among many requests. A carpool method is also proposed that reduces response time and increases system scalability by combining newly arriving requests with ongoing ones. The performance of the two proposed schemes is verified by mathematical analysis and simulation.
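
The two schemes can be pictured with a toy scanner. This is a sketch under stated assumptions, not the authors' implementation: the class `CarpoolScanner`, its methods, and the toy scoring function `position_matches` are hypothetical names introduced here. Grouping is modeled by comparing each database entry, read once, against every active request. Carpooling is modeled by letting a request join a scan already in progress and having the scan wrap around until that request has seen every entry.

```python
def position_matches(query, entry):
    """Toy similarity score: number of positions with identical characters."""
    return sum(x == y for x, y in zip(query, entry))

class CarpoolScanner:
    """Circular scan over a database shared by all active requests."""

    def __init__(self, database, score=position_matches):
        if not database:
            raise ValueError("database must not be empty")
        self.database = database
        self.score = score
        self.position = 0    # next entry to read
        self.reads = 0       # database reads performed so far
        self.active = {}     # request id -> [query, entries left, best hit]
        self.finished = {}   # request id -> best (score, entry)

    def submit(self, request_id, query):
        # Carpool: the request joins at the current scan position and
        # still needs to see every entry exactly once.
        self.active[request_id] = [query, len(self.database), None]

    def step(self):
        if not self.active:
            return
        entry = self.database[self.position]
        self.reads += 1      # one read serves the whole group
        for rid in list(self.active):
            query, remaining, best = self.active[rid]
            s = self.score(query, entry)
            if best is None or s > best[0]:
                best = (s, entry)
            remaining -= 1
            if remaining == 0:
                self.finished[rid] = best
                del self.active[rid]
            else:
                self.active[rid] = [query, remaining, best]
        self.position = (self.position + 1) % len(self.database)
```

With a four-entry database, a second request submitted after two steps finishes after six reads in total, compared with the eight reads that two independent scans would need.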

Web-based Research Assistant Tools for Analysis of Microbial Diversity (미생물 다양성 분석을 위한 웹기반의 생물정보도구 개발)

  • Kang, Byeong-Chul;Kim, Hyun-Jin;Park, Jun-Hyung;Park, Hee-Kyung;Kim, Cheol-Min
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.14 no.5
    • /
    • pp.545-550
    • /
    • 2004
  • The study of available genotypes in bacterial communities (biodiversity analysis) is of growing importance in fields such as ecology, environmental technology, and clinical diagnostics. Culture-independent genotyping techniques, especially those that amplify 16S rRNA genes, attempt to overcome some shortcomings of conventional cultivation methods. Biodiversity analysis based on molecular techniques has been laborious, requiring base-calling of chromatograms, trimming of primer sites, correction of strand directions, selection of representative operational taxonomic units (OTUs), and so on. Biologists also wanted to confirm the results of these steps intuitively. To meet these demands, we developed a web application based on Folder-Process-Filter (FPF) modeling, which corresponds to the classical Model-View-Controller model. This model keeps the development and management of the stepwise web interfaces simple and direct. The web application was developed in Perl and CGI on a Linux workstation. It can be freely accessed from http://home.pusan.ac.kr/~genome/tools/rat.htm.