• Title/Summary/Keyword: Sequence database

Evaluation of the Redundancy in Decoy Database Generation for Tandem Mass Analysis (탠덤 질량 분석을 위한 디코이 데이터베이스 생성 방법의 중복성 관점에서의 성능 평가)

  • Li, Honglan;Liu, Duanhui;Lee, Kiwook;Hwang, Kyu-Baek
    • KIISE Transactions on Computing Practices / v.22 no.1 / pp.56-60 / 2016
  • Peptide identification in tandem mass spectrometry is usually done by searching the spectra against target databases consisting of reference protein sequences. To control false discovery rates for high-confidence peptide identification, spectra are also searched against decoy databases constructed by permuting the reference protein sequences. In this case, a peptide with the same sequence can appear in both the target and the decoy database, or the same peptide can have multiple entries in the decoy database. Such redundancy complicates protein identification, so it is important to minimize the number of redundant peptides for accurate protein identification. In this regard, we examined two popular methods for decoy database generation: 'pseudo-shuffling' and 'pseudo-reversing'. We experimented with target databases of varying sizes and investigated the effect of the maximum number of missed cleavage sites allowed in a peptide (MC), which is one of the parameters for target and decoy database generation. In our experiments, the level of redundancy in decoy databases was proportional to the target database size and the value of MC, due to the increase in the number of short peptides (7 to 10 amino acids). Moreover, 'pseudo-reversing' always generated decoy databases with lower levels of redundancy than 'pseudo-shuffling'.
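
The two kinds of redundancy described in this abstract can be illustrated with a minimal Python sketch. It assumes the commonly used definition of pseudo-reversing (reverse every residue of a tryptic peptide except the C-terminal one) and a simplified trypsin rule (cleave after K or R, no proline exception, no missed cleavages). The function names are hypothetical and are not taken from the paper.

```python
import re


def tryptic_peptides(protein):
    """Split a protein after every K or R (simplified trypsin rule, an assumption)."""
    return re.findall(r"[^KR]*[KR]|[^KR]+$", protein)


def pseudo_reverse(peptide):
    """Reverse all residues except the C-terminal one, which stays in place."""
    return peptide[:-1][::-1] + peptide[-1]


def redundancy(targets, decoys):
    """Return (decoy peptides also present in the target set, duplicate decoy entries)."""
    target_set = set(targets)
    unique_decoys = set(decoys)
    shared = sum(1 for d in unique_decoys if d in target_set)
    duplicates = len(decoys) - len(unique_decoys)
    return shared, duplicates


targets = tryptic_peptides("AAKACDEK")          # ['AAK', 'ACDEK']
decoys = [pseudo_reverse(p) for p in targets]   # ['AAK', 'EDCAK']
shared, duplicates = redundancy(targets, decoys)
```

Here the short peptide `AAK` is unchanged by pseudo-reversing, so it appears in both databases, which is consistent with the abstract's remark that short peptides drive redundancy.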

Implementation of Prototype for a Protein Motif Prediction and Update (단백질 모티프 예측 및 갱신 프로토 타입 구현)

  • Noh, Gi-Young;Kim, Wuon-Shik;Lee, Bum-Ju;Lee, Sang-Tae;Ryu, Keun-Ho
    • The KIPS Transactions:PartD / v.11D no.4 / pp.845-854 / 2004
  • Motif databases are used to predict the function and structure of proteins. Their use keeps increasing as protein sequence data grows, and much recent research addresses the integration of motif resources. However, existing motif databases were developed independently, so they return heterogeneous search results. Integrating the databases to resolve this raises further problems: periodic updates, complex query processing, handling of duplicate database entries, and XML support. In this paper, we therefore propose a database resource integration method that addresses these problems, and describe a method for periodically updating the integrated database and for XML transformation. Finally, we evaluate the implementation of our prototype and a case database.

Promoter Prediction using Genetic Algorithm (유전자 알고리즘을 이용한 Promoter 예측)

  • 오민경;김창훈;김기봉;공은배;김승목
    • Proceedings of the Korean Information Science Society Conference / 1999.10b / pp.12-14 / 1999
  • A promoter is a special region of DNA located upstream of the transcription start site, to which RNA polymerase binds with high affinity and from which DNA transcription begins. Combinations of characteristic patterns in the promoters of each group of function-specific or tissue-specific genes are known to regulate specific transcription, so a promoter can reveal some information about its gene. We collected human housekeeping gene promoters from the EPD (Eukaryotic Promoter Database) and the EMBL nucleic acid sequence database, and used a genetic algorithm, which is known as an optimization algorithm, to search for all patterns that occur significantly among them.

A design of a prototype system for automatic robot programming (로보트 자동 프로그래밍을 위한 원형 시스템의 설계)

  • 조혜경;고명삼;이범희
    • 제어로봇시스템학회:학술대회논문집 / 1988.10a / pp.501-506 / 1988
  • This paper describes an experimental system for automatic robot programming, the SNU-ARPS (Seoul National University Automatic Robot Programming System). The SNU-ARPS generates executable robot programs for pick-and-place operations and some simple mechanical assembly tasks through a menu-driven dialog. It is intended to let the user concentrate on the overall operation sequence rather than on the details of robot languages. To convert task specifications into manipulator motions, the SNU-ARPS uses an internal representation of the world. This representation initially consists of a geometric database from a CAD system and is updated at each operation step to reflect changes in the state of the world.

Oligomer Probe Sequence Design System in DNA Chips for Mutation Detection

  • Lee, Kyu-Sang
    • Proceedings of the Korean Society for Bioinformatics Conference / 2001.10a / pp.87-96 / 2001
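  • The Samsung Advanced Institute of Technology is developing DNA chips that detect abnormalities in human genomic DNA and diagnose the associated diseases. This requires selecting oligomers whose hybridization strength changes sensitively in response to a specific change in the base sequence; in other words, the probes with the highest specificity must be chosen. The selection relies on thermodynamic considerations and several physicochemical approximations, and also includes factors that depend on the DNA chip manufacturing process. All production data and the analysis of results are handled on the basis of a database, and automated statistical analysis is used together with optimization methods.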

BioCovi: A Visualization Service for Comparative Genomics Analysis

  • Lee, Jungsul;Park, Daeui;Bhak, Jong
    • Genomics & Informatics / v.3 no.2 / pp.52-54 / 2005
  • Visualization of homology information is an important method for analyzing the evolutionary and functional meanings of genes. With a database containing the model genomes of Homo sapiens, Mus musculus, and Rattus norvegicus, we constructed a web-based comparative analysis tool, BioCovi, to visualize the homology information of mammalian sequences on a very large scale. The user interface has several features: it marks regions whose identity is greater than a specified value, it shows or hides gaps from the result of global sequence alignment, and it inverts the graph when total identity is higher than the specified threshold.

An Adaptive Audio Watermarking using Frequency Masking and Wavelet Transform (Frequency masking과 Wavelet 변환을 이용한 적응형 오디오 워터마킹)

  • 이동인;김순곤
    • Proceedings of the Korea Database Society Conference / 2000.11a / pp.358-363 / 2000
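  • This paper proposes an adaptive watermarking scheme that generates and embeds an amount of audio watermark appropriate to the amount of raw digital audio data, so that the quality of the audio data is kept at a consistent level. The proposed algorithm applies frequency masking, a psychoacoustic model, together with the wavelet transform. The copyright holder's or owner's data is generated using a PN-sequence. A dedicated module controls the amount of watermark generated: it computes an appropriate amount of watermark from the size of the raw data so that audio quality is maintained.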
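
The abstract does not say how the PN-sequence is produced. One common choice, shown here purely as an assumption, is a linear-feedback shift register (LFSR). The sketch below uses a hypothetical 4-stage register with feedback taken from its last two stages, which yields a pseudo-noise sequence of period 15; the function names are illustrative only.

```python
def lfsr_pn_sequence(length, seed=(1, 0, 0, 0)):
    """Generate `length` bits from a 4-stage Fibonacci LFSR (assumed generator)."""
    state = list(seed)
    if not any(state):
        raise ValueError("seed must not be all zeros")
    bits = []
    for _ in range(length):
        bits.append(state[-1])               # output the last stage
        feedback = state[-1] ^ state[-2]     # XOR of the last two stages
        state = [feedback] + state[:-1]      # shift and insert the feedback bit
    return bits


def to_bipolar(bits):
    """Map bits {0, 1} to {-1, +1}, the form usually added to a host signal."""
    return [1 if b else -1 for b in bits]


pn = lfsr_pn_sequence(30)
chips = to_bipolar(pn[:15])
```

Over one period the sequence is nearly balanced (8 ones and 7 zeros), which is the noise-like property that makes such sequences useful as watermark carriers.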

Evaluation of horticultural traits and genetic relationship in melon germplasm (멜론 유전자원의 원예형질 특성 및 유연관계 분석)

  • Jung, Jaemin;Choi, Sunghwan;Oh, Juyeol;Kim, Nahui;Kim, Daeun;Son, Beunggu;Park, Younghoon
    • Journal of Plant Biotechnology / v.42 no.4 / pp.401-408 / 2015
  • Horticultural traits and genetic relationships were evaluated for 83 melon (Cucumis melo L.) cultivars. A total of 36 characteristics of the seedling, leaf, stem, flower, fruit, and seed were surveyed and then subjected to multivariate analysis of variance (MANOVA). Principal component analysis (PCA) showed that 8 principal components, including fruit weight, fruit length, fruit diameter, cotyledon length, seed diameter, and seed length, accounted for 76.3% of the total variance. Cluster analysis of the 83 melon cultivars using the average linkage method resulted in 5 clusters at a coefficient of 0.7. Cluster I consisted of cultivars with high values for fruit-related traits, Cluster II for soluble solid content, and Cluster V for high ripening rate. Genotyping of the 83 cultivars was conducted using 15 expressed sequence tag-simple sequence repeat (EST-SSR) markers from the International Cucurbit Genomics Initiative (ICuGI) database. Analysis of genetic relatedness by UPGMA resulted in 6 clusters. A Mantel test indicated that the correlation between morphological and genetic distance was very low (r = -0.11).

An Efficient Subsequence Matching Method Based on Index Interpolation (인덱스 보간법에 기반한 효율적인 서브시퀀스 매칭 기법)

  • Loh Woong-Kee;Kim Sang-Wook
    • The KIPS Transactions:PartD / v.12D no.3 s.99 / pp.345-354 / 2005
  • Subsequence matching is one of the most important operations in the field of data mining. Existing subsequence matching algorithms use only one index, and their performance degrades as the difference grows between the length of a query sequence and the size of the windows, which are subsequences of the same length extracted from data sequences to construct the index. In this paper, we propose a new subsequence matching method based on index interpolation to overcome this problem. An index interpolation method constructs two or more indexes and performs searching by selecting the most appropriate index among them according to the given query sequence length. We first examine, through preliminary experiments, how performance varies with the difference between the query sequence length and the window size, and formulate a search cost model that reflects the distribution of query sequence lengths from the viewpoint of physical database design. Next, we propose a new subsequence matching method based on index interpolation to improve search performance. We also present an algorithm, based on the search cost formula mentioned above, that constructs optimal indexes for better search performance. Finally, we verify the superiority of the proposed method through a series of experiments using real and synthesized data sets.
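
The index selection step can be sketched in Python as follows. The abstract does not state the selection rule, so this sketch assumes a simple one: among the indexes built for different window sizes, use the one with the largest window size that does not exceed the query length. The function name and the example window sizes are hypothetical.

```python
from bisect import bisect_right


def select_window_size(window_sizes, query_length):
    """Pick the largest indexed window size that is <= query_length (assumed rule)."""
    sizes = sorted(window_sizes)
    pos = bisect_right(sizes, query_length)
    if pos == 0:
        raise ValueError("query is shorter than every indexed window size")
    return sizes[pos - 1]


# Three indexes built with window sizes 64, 256 and 1024 (illustrative values).
chosen = select_window_size([64, 256, 1024], query_length=300)
```

With a single index of window size 64, a query of length 300 would be far from the window size; picking the 256 index narrows that gap, which is the effect the abstract attributes to index interpolation.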

A DNA Index Structure using Frequency and Position Information of Genetic Alphabet (염기문자의 빈도와 위치정보를 이용한 DNA 인덱스구조)

  • Kim Woo-Cheol;Park Sang-Hyun;Won Jung-Im;Kim Sang-Wook;Yoon Jee-Hee
    • Journal of KIISE:Databases / v.32 no.3 / pp.263-275 / 2005
  • In a large DNA database, indexing techniques are widely used for rapid approximate sequence searching. However, most indexing techniques require more space than the original databases and are difficult to integrate seamlessly with a DBMS. In this paper, we suggest a space-efficient, disk-based indexing and query processing algorithm for approximate DNA sequence searching, specifically exact match queries, wildcard match queries, and k-mismatch queries. Our indexing method places a sliding window at every possible location of a DNA sequence and extracts its signature by considering the occurrence frequency of each nucleotide. It then stores the set of signatures using a multi-dimensional index such as an R*-tree. In particular, by assigning a weight to each position of a window, it prevents signatures from being concentrated around a few spots in the index space. Our query processing algorithm converts a query sequence into a multi-dimensional rectangle and searches the index for the signatures that overlap the rectangle. Experiments with real biological data sets revealed that the proposed method is at least three times, twice, and several orders of magnitude faster than the suffix-tree-based method for exact match, wildcard match, and k-mismatch queries, respectively.
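
The signature extraction step can be illustrated with a minimal Python sketch. It assumes a four-dimensional signature (one dimension per nucleotide) in which each position contributes its weight to the dimension of the base found there; the linear weights in the usage example are hypothetical, since the abstract does not give the actual weighting scheme. Storing the signatures in an R*-tree and building query rectangles are not shown.

```python
ALPHABET = "ACGT"


def window_signature(window, weights=None):
    """Sum the position weights per nucleotide; unit weights give plain counts."""
    if weights is None:
        weights = [1.0] * len(window)
    signature = [0.0] * len(ALPHABET)
    for base, weight in zip(window, weights):
        signature[ALPHABET.index(base)] += weight
    return tuple(signature)


def sliding_signatures(sequence, size, weights=None):
    """Extract one signature per window position in the sequence."""
    return [
        window_signature(sequence[i:i + size], weights)
        for i in range(len(sequence) - size + 1)
    ]


plain = sliding_signatures("ACGTA", 4)
weighted = sliding_signatures("ACGTA", 4, weights=[1, 2, 3, 4])
```

With unit weights, the windows `ACGT` and `CGTA` collapse to the same point (1, 1, 1, 1); with position weights they map to (1, 2, 3, 4) and (4, 1, 2, 3), which shows how weighting can spread signatures across the index space.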