• Title/Summary/Keyword: sequence database

Development of the TFSCAN Search Program (TFSCAN 검색 프로그램 TFSCAN의 개발)

  • Lee, Byung-Uk;Park, Kie-Jung;Kim, Ki-Bong;Park, Wan;Park, Yong-Ha
    • Microbiology and Biotechnology Letters / v.24 no.3 / pp.371-375 / 1996
  • TFD is a transcription factor database that consists of short functional DNA sequences, called signals, and their references. SIGNAL SCAN, developed by Dan S. Prestridge, is used to determine which TFD signals may exist in a DNA sequence. That program searches the TFD database with a simple character-string comparison algorithm. We developed TFSCAN to search for signals in an input DNA sequence more efficiently than SIGNAL SCAN. Our algorithm consists of two parts: one constructs an automaton by scanning the sequences of TFD, and the other searches for signals through this automaton. Retrieval of signal-related references is made much faster by an indexing method. TFSCAN is simple to use and its output is easy to read. We developed and installed a TFSCAN input form and a CGI program on the GINet Web server to make TFSCAN available. The automaton-based algorithm drastically reduced computing time. This approach may also apply to recognizing other biological patterns. We are continuing to develop the algorithm to optimize the automaton and to search more sensitively for signals.
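
The abstract does not say which automaton TFSCAN builds. As a rough illustration of the two-part idea (build an automaton from all signals, then scan the input once), the sketch below uses the Aho-Corasick construction, which is an assumption and not necessarily the authors' method. The signal names and patterns are illustrative and are not taken from TFD.

```python
from collections import deque

def build_automaton(signals):
    """Part 1: build an Aho-Corasick automaton from {name: pattern}."""
    goto, fail, out = [{}], [0], [[]]   # transitions, failure links, outputs
    for name, pattern in signals.items():
        state = 0
        for ch in pattern:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append((name, len(pattern)))
    # Breadth-first pass: failure links point to the longest proper suffix
    # of the current prefix that is also a prefix of some signal.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out

def scan(sequence, automaton):
    """Part 2: report (start, name) for every signal occurrence in one pass."""
    goto, fail, out = automaton
    state, hits = 0, []
    for i, ch in enumerate(sequence):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for name, length in out[state]:
            hits.append((i - length + 1, name))
    return hits

# Illustrative signals only (hypothetical names, not TFD entries).
signals = {"TATA_box": "TATAAA", "CAAT_box": "CCAAT", "GC_box": "GGGCGG"}
hits = scan("ACCAATGGGCGGTATAAAG", build_automaton(signals))
```

Because the input is read once regardless of how many signals are loaded, scan time grows with the sequence length plus the number of matches, rather than with sequence length times the number of signals as in a naive string comparison.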

Performance Evaluation of Methods for Time-Series Subsequence Matching Under Time Warping (타임 워핑 하의 시계열 서브시퀀스 매칭 기법의 성능 평가)

  • 김만순;김상욱
    • Proceedings of the Korea Contents Association Conference / 2003.11a / pp.290-297 / 2003
  • A time-series database is a set of data sequences, each of which is a list of changing values corresponding to an object. Subsequence matching under time warping is an operation that finds, in a time-series database, the subsequences whose time warping distance to a given query sequence is below a tolerance. In this paper, we first point out the characteristics of previous methods for time-series sequence matching under time warping, and then discuss approaches for applying them to whole matching as well as subsequence matching. We also perform a quantitative performance evaluation through a series of experiments with real-life data. No prior study in the literature compares the performance of all the previous methods for subsequence matching under time warping, so our results can serve as a reference for their relative performance.
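
To make the definition of subsequence matching under time warping concrete, here is a minimal brute-force sketch. It is a baseline for illustration, not one of the methods evaluated in the paper. The base cost `|x - y|`, the summed path cost, and the `min_len`/`max_len` bounds on candidate length are assumptions.

```python
def dtw(a, b):
    """Time warping distance by dynamic programming (base cost |x - y|, summed)."""
    inf = float("inf")
    prev = [0.0] + [inf] * len(b)           # row for the empty prefix of a
    for x in a:
        cur = [inf]
        for j, y in enumerate(b, start=1):
            # Extend the cheapest of: stretch b, advance both, stretch a.
            cur.append(abs(x - y) + min(prev[j], prev[j - 1], cur[j - 1]))
        prev = cur
    return prev[-1]

def subsequence_matches(data, query, tolerance, min_len, max_len):
    """Return (start, end, distance) for every subsequence data[start:end]
    whose time warping distance to the query is within the tolerance."""
    hits = []
    for start in range(len(data)):
        for length in range(min_len, max_len + 1):
            end = start + length
            if end > len(data):
                break
            d = dtw(data[start:end], query)
            if d <= tolerance:
                hits.append((start, end, d))
    return hits

matches = subsequence_matches([5, 1, 2, 2, 3, 9], [1, 2, 3],
                              tolerance=0, min_len=3, max_len=4)
```

The exhaustive scan computes one quadratic-time distance per candidate subsequence, which is the cost that index-based and lower-bound methods aim to avoid.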

Genomic Organization of Heat Shock Protein Genes of Silkworm Bombyx mori

  • Velu, Dhanikachalam;Ponnuvel, Kangayam M.;Qadri, Sayed M. Hussaini
    • International Journal of Industrial Entomology and Biomaterials / v.15 no.2 / pp.123-130 / 2007
  • The Hsp 20.8 and Hsp 90 cDNA sequences were retrieved from the NCBI database and are 764 bp and 2582 bp long, respectively. The homologous sequences were searched with BLAST against the Bombyx mori genomic DNA database, and two genomic contigs, BAAB01120347 and AADK01011786, showed the highest homology. In B. mori, Hsp 20.8 and Hsp 90 are each encoded by a single gene without introns. Specific primers were used to amplify the Hsp 20.8 gene and the Hsp 90 variable region from genomic DNA by PCR. The products obtained were 216 bp for Hsp 20.8 and 437 bp for Hsp 90. No variation in PCR product size was found among six silkworm races with contrasting thermal tolerance. Multiple sequence alignment of the sequenced Hsp 90 variable-region products from three races showed no differences with respect to thermotolerance; the sequences instead clustered by voltinism. Comparison of the amino acid sequences of B. mori Hsps with those of dipteran and other insect taxa revealed a high percentage of identity that increases with the phylogenetic relatedness of the species. Conserved domains of the B. mori Hsps were predicted: Hsp 20.8 possesses an $\alpha$-crystallin domain, and Hsp 90 holds HATPase and Hsp 90 domains.

A Subsequence Matching Technique that Supports Time Warping Efficiently (타임 워핑을 지원하는 효율적인 서브시퀀스 매칭 기법)

  • Park, Sang-Hyun;Kim, Sang-Wook;Cho, June-Suh;Lee, Hoen-Gil
    • Journal of Industrial Technology / v.21 no.A / pp.167-179 / 2001
  • This paper discusses index-based subsequence matching that supports time warping in large sequence databases. Time warping enables finding sequences with similar patterns even when they are of different lengths. In earlier work, we suggested an efficient method for whole matching under time warping. That method constructs a multidimensional index on a set of feature vectors, which are invariant to time warping, extracted from data sequences. For filtering in feature space, it also applies a lower-bound function that consistently underestimates the time warping distance and satisfies the triangular inequality. In this paper, we incorporate the prefix-querying approach based on sliding windows into the earlier approach. For indexing, we extract a feature vector from every subsequence inside a sliding window and construct a multidimensional index using the feature vectors as indexing attributes. For query processing, we perform a series of index searches using the feature vectors of qualifying query prefixes. Our approach provides effective and scalable subsequence matching even for large databases. We also prove that our approach does not incur false dismissal. To verify the superiority of our method, we perform extensive experiments. The results reveal that our method achieves significant speedup with real-world S&P 500 stock data and with very large synthetic data.

Gene Sequences Clustering for the Prediction of Functional Domain (기능 도메인 예측을 위한 유전자 서열 클러스터링)

  • Han Sang-Il;Lee Sung-Gun;Hou Bo-Kyeng;Byun Yoon-Sup;Hwang Kyu-Suk
    • Journal of Institute of Control, Robotics and Systems / v.12 no.10 / pp.1044-1049 / 2006
  • Multiple sequence alignment is a method for comparing two or more DNA or protein sequences. Most multiple sequence alignment tools rely on pairwise alignment and the Smith-Waterman algorithm to generate an alignment hierarchy. Consequently, with existing multiple alignment methods the runtime increases exponentially as the number of sequences increases. To remedy this problem, we adopted a parallel-processing suffix tree algorithm that can search for common subsequences in a single pass without pairwise alignment. Cross-matching subsequences, which trigger inexact matching, may appear among the common subsequences found, so this paper also proposes a cross-matching masking process. To identify the function of the clusters generated by suffix tree clustering, BLAST and CDD (Conserved Domain Database) searches were combined with the clustering tool. Our clustering and annotation tool consists of constructing a suffix tree, overlapping common subsequences, clustering gene sequences, and annotating gene clusters by BLAST and CDD search. The system was successfully evaluated with 36 gene sequences in the pentose phosphate pathway: it produced 10 clusters, found representative common subsequences, and finally identified functional domains by searching the CDD.
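
The idea of clustering sequences by shared exact subsequences, without pairwise alignment, can be sketched in a few lines. This is a deliberately simplified stand-in: it uses a hash index of fixed-length substrings (k-mers) plus union-find instead of the paper's parallel suffix tree, and it omits the cross-matching masking and annotation steps. The sequence names, sequences, and the value of `k` are hypothetical.

```python
from collections import defaultdict

def cluster_by_shared_kmers(seqs, k):
    """Group sequences that share at least one exact substring of length k.

    seqs maps a sequence name to its string. Sharing is transitive:
    if A shares a k-mer with B and B with C, all three land in one cluster.
    """
    parent = {name: name for name in seqs}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_owner = {}  # k-mer -> name of the first sequence containing it
    for name, s in seqs.items():
        for i in range(len(s) - k + 1):
            kmer = s[i:i + k]
            if kmer in first_owner:
                parent[find(name)] = find(first_owner[kmer])
            else:
                first_owner[kmer] = name

    clusters = defaultdict(set)
    for name in seqs:
        clusters[find(name)].add(name)
    return sorted(sorted(c) for c in clusters.values())

# Hypothetical toy sequences.
genes = {
    "g1": "ATGGCGTACG",
    "g2": "TTGGCGTAAA",
    "g3": "CCCCCCCC",
    "g4": "ACCCCCCT",
}
clusters = cluster_by_shared_kmers(genes, k=6)
```

Each sequence is read once, so the work grows with total sequence length rather than with the number of sequence pairs. A suffix tree serves the same purpose while also supporting common subsequences of arbitrary length.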

Synthesis of Expressive Talking Heads from Speech with Recurrent Neural Network (RNN을 이용한 Expressive Talking Head from Speech의 합성)

  • Sakurai, Ryuhei;Shimba, Taiki;Yamazoe, Hirotake;Lee, Joo-Ho
    • The Journal of Korea Robotics Society / v.13 no.1 / pp.16-25 / 2018
  • A talking head (TH) is an animation of a speaking face generated from text and voice input. In this paper, we propose a method for generating a TH with facial expression and intonation from speech input only. Generating a TH from speech can be regarded as a regression problem from an acoustic feature sequence to a facial code sequence, where a facial code is a low-dimensional vector representation that can efficiently encode and decode a face image. This regression was modeled by a bidirectional RNN and trained on the SAVEE database, a database of frontal speaking-face animations. The proposed method is able to generate a TH with facial expression and intonation by using acoustic features such as MFCC, dynamic elements of MFCC, energy, and F0. In the experiments, the configuration with BLSTM layers as the first and second layers of the bidirectional RNN predicted the face code best. For the evaluation, a questionnaire survey was conducted with 62 persons who watched TH animations generated by the proposed method and by the previous method. As a result, 77% of the respondents answered that the TH generated by the proposed method matches well with the speech.

Gene Rearrangement through 151 bp Repeated Sequence in Rice Chloroplast DNA (벼 엽록체 DNA내의 151 bp 반복염기서열에 의한 유전자 재배열)

  • Nahm, Baek-Hie;Kim, Han-Jip
    • Applied Biological Chemistry / v.36 no.3 / pp.208-214 / 1993
  • To investigate gene rearrangement via short repeated sequences in chloroplast DNA, the pattern of heterologous gene clusters containing the 151 bp repeated sequence was compared across stages of plastid development in rice, and homologous gene clusters from various plant sources were searched for comparative analysis. Southern blot analysis of rice DNA, using the rp12 gene containing the 151 bp repeated sequence as a probe, showed the presence of heterologous gene clusters. These heterologous gene clusters varied with the development of the plastid. The heterologous gene clusters were also observed in all of the rice cultivars used in this work. Finally, comparative analysis of the DNA sequences of the homologous gene clusters from various plants showed evolutionary gene rearrangement via short repeated sequences among plants. These results suggest a possible relationship between plastid development and gene rearrangement through short repeated sequences.

Improvement of protein identification performance by reinterpreting the precursor ion mass tolerance of mass spectrum (질량스펙트럼의 펩타이드 분자량 오차범위 재해석에 의한 단백질 동정의 성능 향상)

  • Gwon, Gyeong-Hun;Kim, Jin-Yeong;Park, Geon-Uk;Lee, Jeong-Hwa;Baek, Yung-Gi;Yu, Jong-Sin
    • Bioinformatics and Biosystems / v.1 no.2 / pp.109-114 / 2006
  • In proteomics research, proteins are digested into peptides by an enzyme, and in the mass spectrometer these peptides break into fragment ions to generate tandem mass spectra. The tandem mass spectral data obtained from the mass spectrometer consist of the molecular weights of the precursor ion and the fragment ions. The precursor ion mass of a tandem mass spectrum is the first value used to select candidate peptides in the database search. We look for the peptide sequences whose molecular weight matches the precursor ion mass of the mass spectrum. Then we choose the one peptide sequence that best matches the fragment ion information. The precursor ion mass of the tandem mass spectrum is compared with the masses of the digested peptides of the protein database within a mass tolerance that users assign according to the accuracy of the mass spectrometer. In this study, we used the reversed sequence database method to analyze the molecular weight distribution of the precursor ions of tandem mass spectra obtained with an FT LTQ mass spectrometer for a human plasma sample. By reinterpreting the precursor ion mass distribution, we could compute the experimental accuracy, and we suggest a method to improve protein identification performance.
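
The conventional candidate-selection step described above, before any reinterpretation of the tolerance, can be sketched as follows. The abstract does not give the authors' procedure, so this shows only the standard precursor-mass filter and the construction of a reversed (decoy) sequence. The ppm unit for the tolerance, the function names, and the example peptides are assumptions. The residue masses are standard monoisotopic values.

```python
# Monoisotopic residue masses in daltons (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added per unmodified peptide

def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER

def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6

def candidates(precursor_mass, peptides, tol_ppm):
    """Keep peptides whose theoretical mass lies within the tolerance."""
    return [p for p in peptides
            if abs(ppm_error(precursor_mass, peptide_mass(p))) <= tol_ppm]

def reversed_decoy(protein):
    """Reversed-sequence decoy entry, as used in target-decoy searches."""
    return protein[::-1]

# Hypothetical example: an observed mass 5 ppm above the mass of "GAS".
observed = peptide_mass("GAS") * (1 + 5e-6)
wide = candidates(observed, ["GAS", "SAG", "GAK", "PEPTIDE"], tol_ppm=10)
narrow = candidates(observed, ["GAS", "SAG", "GAK", "PEPTIDE"], tol_ppm=2)
```

Matches against reversed sequences can only arise by chance, so the spread of their mass errors compared with that of target matches gives an empirical picture of the instrument's real accuracy. This is the kind of distribution the study analyzes.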

System Development for Analysis and Compensation of Column Shortening of Reinforced Concrete Tall Buildings (철근콘크리트 고층건물 기둥의 부등축소량 해석 및 보정을 위한 시스템 개발)

  • 김선영;김진근;김원중
    • Journal of the Korea Concrete Institute / v.14 no.3 / pp.291-298 / 2002
  • Recently, construction of reinforced concrete tall buildings has increased widely with improvements in material quality and design technology. As a result, differential shortening of columns due to elastic, creep, and shrinkage deformations has become an important issue. However, prediction of the inelastic behavior of RC structures has been neglected even though these deformations cause serious problems in partition walls, external cladding, ducts, etc. In this paper, an analysis system for predicting and compensating differential column shortening, considering time-dependent deformations and the construction sequence, is developed using an object-oriented technique. The system considers the construction sequence, especially time-dependent deformation at early ages, and is composed of an input module, a database module, a database store module, an analysis module, and an analysis result generation module. A graphical user interface (GUI) is provided for the user's convenience. After the analysis, output results such as deflections and member forces over time can be viewed in the generation module through the diagrams, tables, and charts supported by the integrated environment.

An Effective Similarity Search Technique supporting Time Warping in Sequence Databases (시퀀스 데이타베이스에서 타임 워핑을 지원하는 효과적인 유살 검색 기법)

  • Kim, Sang-Wook;Park, Sang-Hyun
    • Journal of KIISE:Databases / v.28 no.4 / pp.643-654 / 2001
  • This paper discusses effective processing of similarity search that supports time warping in large sequence databases. Time warping enables finding sequences with similar patterns even when they are of different lengths. Previous methods fail to employ multi-dimensional indexes without false dismissal since the time warping distance does not satisfy the triangular inequality. They have to scan the entire database and thus suffer from serious performance degradation in large databases. Another method, which uses a suffix tree, also shows poor performance due to the large tree size. In this paper, we propose a novel method for similarity search that supports time warping. Our primary goal is to improve search performance in large databases without false dismissal. To attain this goal, we devise a new distance function $D_{tw-lb}$ that consistently underestimates the time warping distance and also satisfies the triangular inequality. $D_{tw-lb}$ uses a 4-tuple feature vector that is extracted from each sequence and is invariant to time warping. For efficient processing, we employ $D_{tw-lb}$ as a distance function. We prove that our method does not incur false dismissal. To verify the superiority of our method, we perform extensive experiments. The results reveal that our method achieves speedups of up to 43 times with real-world S&P 500 stock data and up to 720 times with very large synthetic data.
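
The filter-and-refine idea behind a lower-bound distance can be sketched as below. The abstract does not list the four features or give the formula. The sketch assumes the features are the first, last, greatest, and smallest elements of a sequence and that the lower bound is the largest absolute difference between corresponding features. The linear scan stands in for the multi-dimensional index, and the summed `|x - y|` time warping distance is likewise an assumption.

```python
def features(seq):
    """Assumed 4-tuple: (first, last, greatest, smallest). Each component is
    unchanged when elements of the sequence are repeated (time warping)."""
    return (seq[0], seq[-1], max(seq), min(seq))

def lower_bound(s, q):
    """Largest absolute feature difference; never exceeds dtw(s, q)."""
    return max(abs(x - y) for x, y in zip(features(s), features(q)))

def dtw(a, b):
    """Time warping distance (base cost |x - y|, summed along the path)."""
    inf = float("inf")
    prev = [0.0] + [inf] * len(b)
    for x in a:
        cur = [inf]
        for j, y in enumerate(b, start=1):
            cur.append(abs(x - y) + min(prev[j], prev[j - 1], cur[j - 1]))
        prev = cur
    return prev[-1]

def search(database, query, tolerance):
    """Filter with the cheap lower bound, then refine with the exact distance."""
    results = []
    for seq in database:
        if lower_bound(seq, query) > tolerance:
            continue  # safe to prune: the true distance is at least this large
        if dtw(seq, query) <= tolerance:
            results.append(seq)
    return results

# Deterministic toy database (hypothetical data).
database = [[(i * 7 + j * 3) % 10 for j in range(1 + i % 5)] for i in range(50)]
query = [3, 4, 4, 7]
found = search(database, query, tolerance=3)
```

Pruning never discards a true match because every warping path must pair the two first elements, the two last elements, and each sequence's extreme value with some element of the other sequence. Each feature difference is therefore paid at least once in the exact distance.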
