• Title/Summary/Keyword: 서열 (sequence)


Prediction of Protein Secondary Structure Using the Weighted Combination of Homology Information of Protein Sequences (단백질 서열의 상동 관계를 가중 조합한 단백질 이차 구조 예측)

  • Chi, Sang-mun
    • Journal of the Korea Institute of Information and Communication Engineering / v.20 no.9 / pp.1816-1821 / 2016
  • Protein secondary structure is important for studying the evolution, structure, and function of proteins, which play crucial roles in most biological processes. This paper attempts to extract secondary structure information effectively from a large protein structure database in order to predict the secondary structure of a query protein sequence. To find more remote homologs of the query sequence in the protein database, we used PSI-BLAST, which performs gapped iterative searches and uses profiles built from sequences homologous to the query. The secondary structures of the homologous sequences are combined into the prediction with weights that reflect their relative similarity to the query sequence. When the homologous sequences were used together with a neural network predictor, the accuracies were higher than those of current state-of-the-art techniques, reaching a Q3 accuracy of 92.28% and a Q8 accuracy of 88.79%.
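
The weighted combination described in this abstract can be illustrated with a minimal sketch. It is not the paper's method: the per-position weighted vote, the `-` gap convention, the coil fallback, and the example weights are all assumptions made for illustration, and the neural network predictor is not reproduced.

```python
from collections import defaultdict


def combine_secondary_structures(homologs, length):
    """Weighted per-position vote over aligned homolog structures.

    homologs: list of (aligned_ss, weight) pairs. Each aligned_ss has one
    character per query position and uses '-' where the homolog does not
    cover the query (assumed convention).
    """
    prediction = []
    for i in range(length):
        votes = defaultdict(float)
        for aligned_ss, weight in homologs:
            state = aligned_ss[i]
            if state != "-":
                votes[state] += weight
        # Assumed fallback: predict coil ('C') when no homolog covers position i.
        prediction.append(max(votes, key=votes.get) if votes else "C")
    return "".join(prediction)


def q3_accuracy(predicted, observed):
    """Fraction of positions whose three-state label matches."""
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


# Hypothetical homologs with similarity-derived weights.
homologs = [
    ("HHHHCCEEE-", 0.9),
    ("HHHCCCEEEC", 0.6),
    ("-HHHHCCEEC", 0.3),
]
predicted = combine_secondary_structures(homologs, 10)
print(predicted)
```

In this toy input, the most similar homolog outvotes the second one at position 3, so the helix is extended by one residue.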

Next Generation Sequencing and Bioinformatics (차세대 염기서열 분석기법과 생물정보학)

  • Kim, Ki-Bong
    • Journal of Life Science / v.25 no.3 / pp.357-367 / 2015
  • With next-generation sequencing (NGS) platforms and bioinformatics tools advancing at an unprecedented pace, the ultimate goal of sequencing the human genome for less than $1,000 may become feasible in the near future. The rapid technological advances in NGS have brought about increasing demands for statistical methods and bioinformatics tools for the analysis and management of NGS data. Even in the early stages of the commercial availability of NGS platforms, a large number of applications or tools already existed for analyzing, interpreting, and visualizing NGS data. However, the sheer volume of NGS data presents a significant challenge for storage, analysis, and data management. Intrinsically, the analysis of NGS data includes the alignment of sequence reads to a reference, base-calling and/or polymorphism detection, de novo assembly from paired or unpaired reads, structural variant detection, and genome browsing. While NGS technologies have allowed a massive increase in available raw sequence data, a number of new informatics challenges and difficulties must be addressed to improve the current state and fulfill the promise of genome research. This review aims to provide an overview of major NGS technologies and bioinformatics tools for NGS data analyses.

Development of Primer and Probe Design System for Microbial Identification (미생물 동정을 위한 프로브와 프라이머 고안 시스템의 개발)

  • Park, Jun-Hyung;Kang, Byeong-Chul;Park, Hee-Kyung;Jang, Hyun-Jung;Song, Eun-Sil;Lee, Seung-Won;Kim, Hyun-Jin;Kim, Cheol-Min
    • Proceedings of the Korean Society for Bioinformatics Conference / 2004.11a / pp.21-28 / 2004
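  • The genetic information of every organism contains both conserved and polymorphic nucleotide sequences. Polymorphic and conserved sequences can serve, respectively, as genotyping markers for identifying a single species or for identifying several species at once. This paper introduces a system that designs bacterial-specific, genus-specific, and species-specific oligonucleotide probes and primers from target sequences of the 23S rDNA gene, which contains both conserved and polymorphic sequences in most bacterial species. The aim is to detect the presence of bacteria and to distinguish their genus and species, covering bacteria that cause pathogenic infectious diseases, food poisoning, contamination of biopharmaceuticals, and environmental pollution. The probes and primers obtained from the system were validated by PCR to confirm the accuracy of the design results. Using the system reduced the time needed to design probes and primers from several weeks to a few days, and systematic data management improved the accuracy of the results.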


A DNA Sequence Search Algorithm Using Integer Type Transformation (정수형 변환을 이용한 DNA 서열 검색 알고리즘)

  • Yoon, Kyong-Oh;Cho, Sung-Bae
    • Proceedings of the Korean Information Science Society Conference / 2012.06b / pp.357-359 / 2012
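  • Advances in ultra-high-throughput biological sequencing instruments are producing massive amounts of biological information, and the growth of the bio-industry is ushering in an era of personalized medicine based on individual genome information. Analyzing such a large number of sequences requires a great deal of storage and main memory, so supercomputer-class servers and programs that can process large volumes of data quickly are needed. These analyses include exact-match nucleotide sequence search and the alignment and assembly analyses built on it. Existing algorithms and most programs for these tasks treat nucleotide sequences as character strings and pursue efficiency through techniques such as hash index tables, de Bruijn graphs, and the Burrows-Wheeler transform (BWT). This paper proposes an algorithm that converts a nucleotide sequence not into a string but into a single integer per k-mer group, which reduces the storage size to about 28% or more and allows searching to be performed in the converted form. We developed CalcGen, an assembly analysis program, and verified the usefulness and efficiency of the algorithm through experiments. The significance of this work lies in offering another new approach to the efficient analysis, storage, and processing of large volumes of genomic sequence data.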

A Comparison Study for Ordination Methods in Ecology (생태학의 통계적 서열화 방법 비교에 관한 연구)

  • Ko, Hyeon-Seok;Jhun, Myoungshic;Jeong, Hyeong Chul
    • The Korean Journal of Applied Statistics / v.28 no.1 / pp.49-60 / 2015
  • Various ordination methods, such as correspondence analysis and canonical correspondence analysis, are used in community ecology to visualize relationships among species, sites, and environmental variables. Ter Braak (1986), Jackson and Somers (1991), and Parmer (1993) compared ordination methods using eigenvalues and distance graphs. However, these comparisons did not show the relationship between the population and the biplot because they are based only on surveyed data. This paper introduces a measure of how well a biplot reflects population information, so that ordination methods can be compared objectively.

SeqWeB: Sequence Annotation System based on SOA (SeqWeB: SOA 기반의 서열 주해 시스템)

  • Nam, Seong-Hyeuk;Jung, Tae-Sung;Kim, Tae-Kyung;Yoo, Jae-Soo;Cho, Wan-Sup
    • Proceedings of the Korean Information Science Society Conference / 2007.10b / pp.1-6 / 2007
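  • Sequence annotation, which analyzes sequences and predicts their function, is an essential step in elucidating biological phenomena. Annotation is carried out through a complex process that chains multiple applications together. At present, users must choose a suitable application from among many, install it for their operating environment, and learn how to use it. They must also convert input and output data formats in order to link the programs. Automated solutions are being developed to address this, but the programs at each stage are tightly coupled, which reduces flexibility and makes it difficult to extend or change functionality. To overcome the limitations of existing systems, this paper proposes SeqWeB, a sequence annotation system based on SOA (Service Oriented Architecture). SeqWeB exposes the seven applications needed for sequence annotation (Phred, cross_match, RepeatMasker, ICAtools, Phrap, CAP3, Blast) as unit services using web service technology and integrates them using BPM techniques. SeqWeB uses XML for input and output data to improve interoperability between the applications, and its SOA-based integration couples the applications loosely, which makes the system easy to extend and change. It can also provide web-based sequence annotation solutions in a variety of combinations.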


Suffix Tree Constructing Algorithm for Large DNA Sequences Analysis (대용량 DNA서열 처리를 위한 서픽스 트리 생성 알고리즘의 개발)

  • Choi, Hae-Won
    • Journal of Korea Society of Industrial Information Systems / v.15 no.1 / pp.37-46 / 2010
  • A suffix tree is an efficient data structure that exposes the internal structure of a string and allows efficient solutions to a wide range of complex string problems, particularly in computational biology. However, as biological data grows explosively, it becomes impossible to construct suffix trees in main memory, so an efficient technique for constructing them in secondary storage is needed. In this paper, we present a method for constructing a suffix tree on disk for a large set of DNA strings using a new index scheme. We also show a typical application of the on-disk suffix tree.
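
As a rough illustration of what a suffix tree makes possible, the sketch below builds a naive, uncompressed suffix trie in memory and uses it for substring search. It does not reproduce the paper's on-disk construction or its index scheme. The nested-dict layout and the function names are assumptions, and the quadratic memory use of this naive form is exactly the kind of cost that motivates disk-based methods.

```python
def build_suffix_trie(text):
    """Insert every suffix of text into a trie (naive, O(n^2) nodes).

    Each node records the start positions of all suffixes passing through
    it, so a lookup returns every occurrence of a pattern directly.
    """
    root = {"children": {}, "starts": []}
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            node = node["children"].setdefault(ch, {"children": {}, "starts": []})
            node["starts"].append(start)
    return root


def find_occurrences(trie, pattern):
    """Walk the trie along pattern; return start positions of all matches."""
    node = trie
    for ch in pattern:
        node = node["children"].get(ch)
        if node is None:
            return []
    return node["starts"]


trie = build_suffix_trie("GATTACA")
print(find_occurrences(trie, "A"))
print(find_occurrences(trie, "TA"))
print(find_occurrences(trie, "CAT"))
```

A lookup costs time proportional to the pattern length, independent of the text length, which is the property that makes suffix structures attractive for large DNA sets.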

Differences between Species Based on Multiple Sequence Alignment Analysis (다중서열정렬에 기반한 종의 차이)

  • Hyeok-Zu Kwon;Sang-Jin Kim;Geun-Mu Kim
    • The Journal of the Korea institute of electronic communication sciences / v.19 no.2 / pp.467-472 / 2024
  • Multiple sequence alignment (MSA) is a method of collecting and aligning, at once, multiple protein or nucleic acid sequences that perform the same function in various organisms. ClustalW, a representative multiple sequence alignment algorithm, is used through BioPython to compare the degree of alignment column by column. In addition, a web logo and a phylogenetic tree are created to visualize conserved sequences and aid understanding. An example confirms the differences between humans and other species, and applications of BioPython are presented.
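
The column-by-column comparison mentioned in this abstract can be sketched in plain Python. The sketch assumes the alignment has already been produced (for example by ClustalW) and is supplied as equal-length gapped strings. The scoring rule, the function name, and the toy sequences are assumptions for illustration, not the paper's procedure, and no BioPython calls are used.

```python
from collections import Counter


def column_conservation(aligned):
    """Fraction of sequences sharing the most common residue in each column.

    aligned: list of equal-length strings from a precomputed alignment,
    with '-' marking gaps. Gap-dominated columns score 0.0 (assumed rule).
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for col in range(length):
        column = [seq[col] for seq in aligned]
        residue, count = Counter(column).most_common(1)[0]
        scores.append(0.0 if residue == "-" else count / len(aligned))
    return scores


# Hypothetical aligned fragments from three species.
aligned = ["MKT-AY", "MKTLAY", "MRT-AF"]
scores = column_conservation(aligned)
conserved = [i for i, s in enumerate(scores) if s == 1.0]
print(scores)
print(conserved)
```

Columns with a score of 1.0 are fully conserved across the input sequences. Lower scores mark the positions where the species differ.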

Direct detection of hemophilia B F9 gene mutation using multiplex PCR and conformation sensitive gel electrophoresis (Multiplex PCR과 Conformation Sensitive Gel Electrophoresis를 이용한 혈우병B F9 유전자 돌연변이 직접 진단법)

  • Yoo, Ki Young;Kim, Hee Jin;Lee, Kwang Chul
    • Clinical and Experimental Pediatrics / v.53 no.3 / pp.397-407 / 2010
  • Purpose: The F9 gene is known to be the causative gene for hemophilia B, but the detection rate of restriction fragment length polymorphism-based linkage analysis is only 55.6%. Direct DNA sequencing can detect 98% of mutations, but it is very costly. Here, we used multiplex polymerase chain reaction (PCR) and conformation sensitive gel electrophoresis (CSGE) to perform screened DNA sequencing of the F9 gene, and we compared the results with direct sequencing in terms of accuracy, cost, simplicity, and time consumption. Methods: A total of 27 unrelated hemophilia B patients were enrolled. Direct DNA sequencing was performed for all 27 patients by a separate institute, and multiplex PCR-CSGE screened sequencing was done in our laboratory. The results of direct DNA sequencing were used as the reference against which the results of multiplex PCR-CSGE screened sequencing were compared. For patients whose mutation was not detected by either method, multiplex ligation-dependent probe amplification (MLPA) was conducted. Results: With direct sequencing, mutations were identified in 26 patients (96.3%), whereas multiplex PCR-CSGE screened sequencing detected mutations in 23 (85.2%). One patient's mutation was identified by MLPA. A total of 21 different mutations were found among the 27 patients. Conclusion: Multiplex PCR-CSGE screened DNA sequencing detected 88.9% of mutations and reduced costs by 55.7% compared with direct DNA sequencing. However, it was more labor-intensive and time-consuming.

A DNA Sequence Alignment Algorithm Using Quality Information and a Fuzzy Inference Method (품질 정보와 퍼지 추론 기법을 이용한 DNA 염기 서열 배치 알고리즘)

  • Kim, Kwang-Baek
    • Journal of Intelligence and Information Systems / v.13 no.2 / pp.55-68 / 2007
  • DNA sequence alignment algorithms in computational molecular biology have been improved by diverse methods. In this paper, we propose a DNA sequence alignment algorithm that combines quality information with a fuzzy inference method, exploiting characteristics of DNA sequence fragments and a fuzzy logic system, to improve on conventional alignment methods that use DNA sequence quality information. In conventional algorithms, alignment scores are calculated with the Needleman-Wunsch global alignment algorithm, applying the quality information of each DNA fragment. However, errors can arise in the score calculation when the ends of a DNA fragment are of low quality, because the quality information of the whole sequence is used. The proposed method achieves accurate alignment despite low-quality fragment ends by improving the conventional quality-based algorithms. In addition, the mapping score parameters used to calculate alignment scores are adjusted dynamically by the fuzzy logic system according to the lengths of the DNA fragments and the frequency of low-quality bases in them. Experiments on real genome data from NCBI (National Center for Biotechnology Information) showed that the proposed method was more efficient than conventional quality-based algorithms for DNA sequence alignment.
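
A minimal sketch of Needleman-Wunsch global alignment scoring with quality-weighted substitution scores follows. It illustrates only the conventional quality-aware baseline that the abstract describes. The confidence weighting (minimum of the two base qualities divided by `max_quality`), the fixed score parameters, and the function name are assumptions, and the paper's fuzzy-logic adjustment of parameters is not reproduced.

```python
def quality_weighted_nw(seq_a, qual_a, seq_b, qual_b,
                        match=1.0, mismatch=-1.0, gap=-1.0, max_quality=40):
    """Global alignment score where low-quality bases contribute less.

    qual_a and qual_b are per-base quality values (Phred-like, assumed to
    lie in 0..max_quality). Each match or mismatch score is scaled by the
    confidence of the less reliable of the two bases.
    """
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    score = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            confidence = min(qual_a[i - 1], qual_b[j - 1]) / max_quality
            base = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + base * confidence,  # align two bases
                score[i - 1][j] + gap,                    # gap in seq_b
                score[i][j - 1] + gap,                    # gap in seq_a
            )
    return score[-1][-1]


high = [40, 40, 40, 40]
print(quality_weighted_nw("ACGT", high, "ACGT", high))
# A mismatch at a low-quality fragment end is penalized only lightly.
print(quality_weighted_nw("ACGT", high, "ACGA", [40, 40, 40, 4]))
```

With this weighting, a disagreement at a low-quality fragment end costs far less than one between two confident base calls.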
