• Title/Abstract/Keywords: Sequence Databases

226 results (processing time: 0.021 s)

Evaluation of the Redundancy in Decoy Database Generation for Tandem Mass Analysis

  • 이홍란;류단휘;이기욱;황규백
    • 정보과학회 컴퓨팅의 실제 논문지 / Vol. 22, No. 1 / pp. 56-60 / 2016
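  • In tandem mass spectrometry, decoy databases built by rearranging the reference protein sequences of the target database are widely used for reliable peptide identification. However, redundant peptides with identical sequences can exist between the target and decoy databases, or within the decoy database itself, and this makes protein identification harder. Minimizing the redundancy of decoy databases is therefore an important problem. This paper examines how pseudo-shuffling and pseudo-reversing, two methods widely used to generate decoy databases, affect decoy database redundancy. The experiments show that redundancy increases with the size of the target database and with the maximum number of missed cleavage sites allowed during database generation. Under identical conditions, pseudo-reversing always produced decoy databases with lower redundancy than pseudo-shuffling.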
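
The idea of pseudo-reversing and of target/decoy redundancy can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a simplified trypsin rule (cleave after every K or R, ignoring the proline exception and missed cleavages), assumes the common convention that pseudo-reversing reverses each tryptic peptide while keeping its C-terminal residue in place, and measures redundancy as the fraction of distinct decoy peptides that also occur in the target. The function names are hypothetical.

```python
import re

def tryptic_peptides(protein):
    """Split a protein after every K or R (simplified trypsin rule, an assumption)."""
    return re.findall(r"[^KR]*[KR]|[^KR]+", protein)

def pseudo_reverse(protein):
    """Reverse each tryptic peptide but keep its C-terminal K/R in place."""
    parts = []
    for pep in tryptic_peptides(protein):
        if pep[-1] in "KR":
            parts.append(pep[:-1][::-1] + pep[-1])
        else:
            # Trailing peptide without a cleavage residue: reverse it entirely.
            parts.append(pep[::-1])
    return "".join(parts)

def redundancy(targets, decoys):
    """Fraction of distinct decoy peptides that also occur among target peptides."""
    target_peps = {p for prot in targets for p in tryptic_peptides(prot)}
    decoy_peps = {p for prot in decoys for p in tryptic_peptides(prot)}
    if not decoy_peps:
        return 0.0
    return len(target_peps & decoy_peps) / len(decoy_peps)

targets = ["ACDKEFGR", "AAKGGR"]
decoys = [pseudo_reverse(t) for t in targets]
print(decoys, redundancy(targets, decoys))
```

In this toy input, the peptides of the second protein read the same after reversal, so they reappear unchanged in the decoy and count as redundant.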

SFannotation: A Simple and Fast Protein Function Annotation System

  • Yu, Dong Su;Kim, Byung Kwon
    • Genomics & Informatics / Vol. 12, No. 2 / pp. 76-78 / 2014
  • Owing to the generation of vast amounts of sequencing data by using cost-effective, high-throughput sequencing technologies with improved computational approaches, many putative proteins have been discovered after assembly and structural annotation. Putative proteins are typically annotated using a functional annotation system that uses extant databases, but the expansive size of these databases often causes a bottleneck for rapid functional annotation. We developed SFannotation, a simple and fast functional annotation system that rapidly annotates putative proteins against four extant databases, Swiss-Prot, TIGRFAMs, Pfam, and the non-redundant sequence database, by using a best-hit approach with BLASTP and HMMSEARCH.
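
A best-hit selection of the kind mentioned above can be sketched as follows. This is not SFannotation's code. It assumes the input is BLAST tabular output with the default twelve columns (query ID first, subject ID second, bit score last) and defines "best" as the highest bit score per query, which is one common choice; the function name is hypothetical.

```python
def best_hits(tabular_lines):
    """Return {query_id: (subject_id, bitscore)}, keeping the top-scoring hit per query.

    Assumes tab-separated BLAST output with the default 12 columns, where
    column 1 is the query, column 2 the subject, and column 12 the bit score.
    """
    best = {}
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best

example = [
    "q1\tsp|P1\t98.0\t100\t2\t0\t1\t100\t1\t100\t1e-50\t200.0",
    "q1\tsp|P2\t80.0\t100\t20\t0\t1\t100\t1\t100\t1e-20\t90.5",
    "q2\tsp|P3\t70.0\t50\t15\t0\t1\t50\t1\t50\t1e-5\t45.0",
]
print(best_hits(example))
```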

Bioinformatics in Fish: its Present Status and Perspectives with Particular Emphasis on Expressed Sequence Tags

  • Nam, Yoon-Kwon;Kim, Dong-Soo
    • 한국양식학회지 / Vol. 14, No. 1 / pp. 9-16 / 2001
  • Characterization of single-pass cDNA sequences, known as expressed sequence tags (ESTs), has been a fast-growing activity in fish genomics. Despite their relatively short history, fish EST databases (dbESTs) have already begun to play a significant role in bridging the gaps in our knowledge of gene expression in fish genomes. This review provides a brief description of the technology for establishing fish dbESTs, their current status, and the implications of ESTs for aquaculture and fisheries science, with particular emphasis on the discovery of novel genes for transgenic application, the use of polymorphic EST markers in genetic linkage mapping, and the evaluation of signal-responsive gene expression.

Elastic Rule Discovering in Sequence Databases

  • 박상현;김상욱;김만순
    • 산업기술연구 / Vol. 21, No. A / pp. 147-153 / 2001
  • This paper presents techniques for discovering rules with elastic patterns. Elastic patterns are useful for discovering rules from data sequences with different sampling rates. For fast discovery of rules whose heads and bodies are elastic patterns, we construct a suffix tree from succinct forms of data sequences. The suffix tree is a compact representation of rules, and is also used as an index structure for finding rules matched to a target head sequence. When matched rules cannot be found, the concept of rule relaxation is introduced. Using a cluster hierarchy and a relaxation error, we find the least relaxed rules that provide the most specific information on a target head sequence. Performance evaluation through extensive experiments reveals the effectiveness of the proposed approach.
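
The abstract does not define the "succinct form" of a sequence. As a hedged illustration only, the sketch below assumes it means collapsing runs of identical consecutive symbols, so that the same signal sampled at different rates maps to the same form. The function name is hypothetical and the suffix-tree construction is not shown.

```python
from itertools import groupby

def succinct_form(sequence):
    """Collapse each run of identical consecutive symbols into one symbol.

    Assumption: this is what makes sequences with different sampling
    rates comparable; the original abstract does not spell it out.
    """
    return [symbol for symbol, _run in groupby(sequence)]

slow = "aaabbbbcc"   # densely sampled
fast = "abbc"        # sparsely sampled version of the same shape
print(succinct_form(slow), succinct_form(fast))
```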

An Efficient Mining for Closed Frequent Sequences

  • 김형근;황환규
    • 산업기술연구 / Vol. 25, No. A / pp. 163-173 / 2005
  • Recent sequential pattern mining algorithms mine all of the frequent sequences that satisfy a minimum support threshold in a large database. However, when a frequent sequence becomes very long, such mining generates an explosive number of frequent sequences, which is prohibitively expensive in time. In this paper, we propose a novel sequential pattern mining algorithm that uses only closed frequent sequences, which form a small subset of the very large set of frequent sequences. Our algorithm extends a sequence using a depth-first search strategy with effective pruning. Using a bitmap representation of the underlying database, we can obtain closed frequent sequences considerably faster than the currently reported methods.
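
To make the notion of a closed frequent sequence concrete, here is a minimal sketch of a post-hoc closedness filter. It is not the paper's algorithm, which prunes during depth-first search and uses bitmaps. It assumes sequences of single items represented as tuples and a precomputed mapping from frequent sequence to support; a sequence is closed when no longer supersequence has the same support. All names are hypothetical.

```python
def is_subsequence(short, long):
    """True if `short` appears in `long` in order (gaps allowed)."""
    it = iter(long)
    return all(item in it for item in short)

def closed_only(frequent):
    """Keep sequences that have no longer supersequence with equal support."""
    closed = {}
    for seq, sup in frequent.items():
        absorbed = any(
            len(other) > len(seq) and other_sup == sup and is_subsequence(seq, other)
            for other, other_sup in frequent.items()
        )
        if not absorbed:
            closed[seq] = sup
    return closed

frequent = {("a",): 3, ("b",): 2, ("a", "b"): 2}
print(closed_only(frequent))
```

In the example, ("b",) is dropped because ("a", "b") contains it and has the same support, so it carries no extra information.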

Similarity Search in Time Series Databases based on the Normalized Distance

  • 이상준;이석호
    • 한국정보과학회논문지:데이타베이스 / Vol. 31, No. 1 / pp. 23-29 / 2004
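  • This paper proposes a method for searching for similar sequences based on the normalized distance. In applications where the shape of a sequence is the main concern, the normalized distance is a more suitable similarity measure than the plain Lp distance. Existing techniques for processing queries based on the normalized distance include a preprocessing step that computes the mean of each sequence and uses it to shift the sequence vertically. The proposed method exploits the property that the variation between two adjacent elements of a sequence is invariant under normalization: it extracts feature vectors without the vertical-shift preprocessing and indexes them with a spatial access method such as the R-tree. The proposed method can retrieve sequences of similar shape and guarantees no false dismissals. Experiments on real stock data confirm its performance.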
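
The invariance property that the method relies on can be checked with a short sketch. It assumes that normalization here means the vertical shift described in the abstract (subtracting the sequence mean); the feature extraction and R-tree indexing of the paper are not reproduced, and the function names are hypothetical.

```python
def adjacent_differences(seq):
    """Differences between each pair of neighbouring elements."""
    return [b - a for a, b in zip(seq, seq[1:])]

def shift_to_zero_mean(seq):
    """Vertical shift: subtract the mean from every element."""
    mean = sum(seq) / len(seq)
    return [x - mean for x in seq]

prices = [10, 12, 11, 15]
shifted = shift_to_zero_mean(prices)

# The differences are identical before and after the shift,
# so they can be computed without the preprocessing step.
print(adjacent_differences(prices))
print(adjacent_differences(shifted))
```

Because (b - m) - (a - m) = b - a for any shift m, the two difference lists agree, which is why the shift can be skipped when features are built from adjacent variations.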

An Efficient Approach to Mining Maximal Contiguous Frequent Patterns from Large DNA Sequence Databases

  • Karim, Md. Rezaul;Rashid, Md. Mamunur;Jeong, Byeong-Soo;Choi, Ho-Jin
    • Genomics & Informatics / Vol. 10, No. 1 / pp. 51-57 / 2012
  • Mining interesting patterns from DNA sequences is one of the most challenging tasks in bioinformatics and computational biology. Maximal contiguous frequent patterns are preferable for expressing the function and structure of DNA sequences and hence can capture the common data characteristics among related sequences. Biologists are interested in finding frequent orderly arrangements of motifs that are responsible for similar expression of a group of genes. In order to reduce mining time and complexity, however, most existing sequence mining algorithms either focus on finding short DNA sequences or require explicit specification of sequence lengths in advance. The challenge is to find longer sequences without specifying sequence lengths in advance. In this paper, we propose an efficient approach to mining maximal contiguous frequent patterns from large DNA sequence datasets. The experimental results show that our proposed approach is memory-efficient and mines maximal contiguous frequent patterns within a reasonable time.
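
What "maximal contiguous frequent pattern" means can be shown with a deliberately naive sketch. It is not the authors' algorithm and would not scale to large DNA databases, since it enumerates every substring. It assumes support is the number of sequences containing a pattern, and that a frequent pattern is maximal when it is not a substring of any longer frequent pattern. The function names are hypothetical.

```python
from collections import Counter

def frequent_contiguous(seqs, min_support):
    """Count, for every substring, how many sequences contain it; keep the frequent ones."""
    support = Counter()
    for s in seqs:
        # A set, so each substring is counted at most once per sequence.
        subs = {s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1)}
        support.update(subs)
    return {p: c for p, c in support.items() if c >= min_support}

def maximal_patterns(frequent):
    """Keep frequent patterns not contained in any other frequent pattern."""
    return {
        p: c
        for p, c in frequent.items()
        if not any(p != q and p in q for q in frequent)
    }

seqs = ["ACGTA", "TACGT", "GGACG"]
print(maximal_patterns(frequent_contiguous(seqs, min_support=3)))
```

Note that no pattern length is specified in advance; the maximal patterns emerge from the support threshold alone.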

Linkage of the Kanamycin Resistance Gene with the Streptothricin Resistance Gene in Staphylococcus aureus SA2

  • Shin, Chul Kyo;Sung Hwan Im;Woo Koo Kim;Kyung Bo Moon
    • Journal of Microbiology and Biotechnology / Vol. 6, No. 3 / pp. 219-220 / 1996
  • Plasmid pKH2, isolated from the multidrug-resistant Staphylococcus aureus SA2, is 40.98 kb in size and mediates resistance to ampicillin, clindamycin, erythromycin, kanamycin, and streptomycin. The 3.4-kb HindIII fragment conferring kanamycin resistance was cloned from pKH2 into pBluescriptII KS+, and the fragment was partially sequenced. Sequence analysis revealed that the kanamycin resistance gene, which encodes aminoglycoside 3'-phosphotransferase, is linked to the streptothricin resistance gene. However, a nonsense mutation was found in the streptothricin resistance gene, resulting in a truncated streptothricin acetyltransferase. Homology comparison with nucleotide sequence databases revealed that the 3.4-kb HindIII fragment of pKH2 was derived not from S. aureus but from the Gram-negative Campylobacter coli.

In Silico Functional Assessment of Sequence Variations: Predicting Phenotypic Functions of Novel Variations

  • Won, Hong-Hee;Kim, Jong-Won
    • Genomics & Informatics / Vol. 6, No. 4 / pp. 166-172 / 2008
  • A multitude of protein-coding sequence variations (CVs) in the human genome have been revealed as a result of major initiatives, including the Human Variome Project, the 1000 Genomes Project, and the International Cancer Genome Consortium. This naturally has led to debate over how to accurately assess the functional consequences of CVs, because predicting the functional effects of CVs and their relevance to disease phenotypes is becoming increasingly important. This article surveys and compares variation databases and in silico prediction programs that assess the effects of CVs on protein function. We also introduce a combinatorial approach that uses machine learning algorithms to improve prediction performance.

Tissue-specific Expressed Sequence Tags from the Olive flounder, Paralichthys olivaceus

  • Kim, Young-Ok;Lee, Jeong-Ho;Kim, Kyung-Kil;Lee, Jong-yun
    • 한국어업기술학회:학술대회논문집 / 한국어업기술학회 2002년도 추계 수산관련학회 공동학술대회발표요지집 / pp. 181-182 / 2002
  • Expressed sequence tags (ESTs) are generated by single-pass DNA sequencing of clones obtained from cDNA libraries and are a powerful tool in the genetic characterization of organisms, owing in large part to the speed and affordability of generating these sequences. Comparing the sequences obtained with those available in public sequence databases allows putative identification of many genes. (omitted)
