• Title/Summary/Keyword: Sequence database

An Efficient Mining for Closed Frequent Sequences (효율적인 닫힌 빈발 시퀀스 마이닝)

  • Kim, Hyung-Geun;Whang, Whan-Kyu
    • Journal of Industrial Technology / v.25 no.A / pp.163-173 / 2005
  • Recent sequential pattern mining algorithms mine all frequent sequences that satisfy a minimum support threshold in a large database. However, when frequent sequences become very long, such mining generates an explosive number of frequent sequences, which is prohibitively expensive in time. In this paper, we propose a novel sequential pattern mining algorithm that mines only closed frequent sequences, which form a small subset of the very large set of frequent sequences. Our algorithm extends sequences using a depth-first search strategy with effective pruning. Using a bitmap representation of the underlying database, it obtains closed frequent sequences considerably faster than currently reported methods.
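
The idea of depth-first extension over a bitmap representation can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes sequences of single items (no itemsets), prunes only by the support threshold, and finds closed sequences with a simple post-filter. All function names (`build_bitmaps`, `extend`, `mine_frequent`, `closed_only`) are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Pattern = Tuple[str, ...]


def build_bitmaps(db: Sequence[Sequence[str]]) -> Dict[str, List[int]]:
    """Map each item to one integer bitmap per sequence (bit p set = item at position p)."""
    bitmaps: Dict[str, List[int]] = {}
    for sid, seq in enumerate(db):
        for pos, item in enumerate(seq):
            bitmaps.setdefault(item, [0] * len(db))[sid] |= 1 << pos
    return bitmaps


def support(bitmap: List[int]) -> int:
    """Number of sequences in which the pattern occurs at least once."""
    return sum(1 for bits in bitmap if bits)


def extend(prefix_bm: List[int], item_bm: List[int]) -> List[int]:
    """Keep occurrences of the item strictly after the earliest end of the prefix."""
    out = []
    for p, i in zip(prefix_bm, item_bm):
        if p == 0:
            out.append(0)
            continue
        first = p & -p                       # lowest set bit: earliest end position
        out.append(i & ~((first << 1) - 1))  # clear every bit at or below that position
    return out


def mine_frequent(db: Sequence[Sequence[str]], min_support: int) -> Dict[Pattern, int]:
    """Depth-first enumeration of frequent sequences, pruned by support only."""
    bitmaps = build_bitmaps(db)
    items = sorted(i for i, bm in bitmaps.items() if support(bm) >= min_support)
    frequent: Dict[Pattern, int] = {}

    def dfs(pattern: Pattern, bm: List[int]) -> None:
        frequent[pattern] = support(bm)
        for item in items:
            ext = extend(bm, bitmaps[item])
            if support(ext) >= min_support:
                dfs(pattern + (item,), ext)

    for item in items:
        dfs((item,), bitmaps[item])
    return frequent


def is_subsequence(a: Pattern, b: Pattern) -> bool:
    it = iter(b)
    return all(x in it for x in a)


def closed_only(frequent: Dict[Pattern, int]) -> Dict[Pattern, int]:
    """Drop patterns that have a longer super-sequence with the same support."""
    return {
        p: s for p, s in frequent.items()
        if not any(s == s2 and len(q) > len(p) and is_subsequence(p, q)
                   for q, s2 in frequent.items())
    }


db = [("a", "b", "c"), ("a", "b", "c"), ("a", "b")]
print(closed_only(mine_frequent(db, min_support=2)))
```

In this toy database, seven sequences are frequent but only two are closed, which is the reduction the abstract refers to. A practical miner would check closedness during the search rather than afterwards.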

AN IMPROVED ALGORITHM FOR RNA SECONDARY STRUCTURE PREDICTION

  • Namsrai Oyun-Erdene;Jung Kwang Su;Kim Sunshin;Ryu Keun Ho
    • Proceedings of the KSRS Conference / 2005.10a / pp.280-282 / 2005
  • Ribonucleic acid (RNA) is one of the two types of nucleic acids found in living organisms. An RNA molecule is a long chain of monomers called nucleotides. The sequence of nucleotides of an RNA molecule constitutes its primary structure, and the pattern of pairing between nucleotides determines its secondary structure. Non-coding RNA genes produce transcripts that exert their function without ever producing proteins. Predicting the secondary structure of non-coding RNAs is very important for understanding their functions. We focus on Nussinov's algorithm as a useful technique for predicting RNA secondary structures. We introduce a new traceback matrix and scoring table to improve this algorithm. The improved algorithm performs better than the original.
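
For reference, the baseline the abstract builds on can be sketched as follows. This is the textbook Nussinov base-pair maximization with a stored traceback matrix, not the improved variant the authors propose. The scoring (one point per Watson-Crick or G-U pair) and the `min_loop` parameter are assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# Assumed scoring table: every allowed pair scores 1.
ALLOWED_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(seq: str, min_loop: int = 0) -> Tuple[int, str]:
    """Return (maximum number of base pairs, one optimal structure in dot-bracket form)."""
    n = len(seq)
    score = [[0] * n for _ in range(n)]
    trace: List[List[Optional[tuple]]] = [[None] * n for _ in range(n)]

    # Fill by increasing span so that every sub-interval is solved first.
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best, move = score[i + 1][j], ("skip_i",)       # i left unpaired
            if score[i][j - 1] > best:                       # j left unpaired
                best, move = score[i][j - 1], ("skip_j",)
            if (seq[i], seq[j]) in ALLOWED_PAIRS:            # i pairs with j
                paired = score[i + 1][j - 1] + 1
                if paired > best:
                    best, move = paired, ("pair",)
            for k in range(i + 1, j):                        # bifurcation at k
                split = score[i][k] + score[k + 1][j]
                if split > best:
                    best, move = split, ("split", k)
            score[i][j] = best
            trace[i][j] = move

    # Follow the recorded moves instead of recomputing the recurrence.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if i >= j or trace[i][j] is None:
            continue
        move = trace[i][j]
        if move[0] == "skip_i":
            stack.append((i + 1, j))
        elif move[0] == "skip_j":
            stack.append((i, j - 1))
        elif move[0] == "pair":
            structure[i], structure[j] = "(", ")"
            stack.append((i + 1, j - 1))
        else:
            k = move[1]
            stack.append((i, k))
            stack.append((k + 1, j))

    return (score[0][n - 1] if n else 0), "".join(structure)


pairs, structure = nussinov("GGGAAAUCC")
print(pairs, structure)
```

Recording the chosen move in `trace` makes the traceback linear in the number of visited cells, which is one plausible motivation for a dedicated traceback matrix.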

A Web-based Unified Design Methodology using XML Applications (XML을 이용한 웹기반 정보 관리 통합설계 방법론)

  • 김경수;신현철;장희선
    • Journal of the Korea Society of Computer and Information / v.7 no.4 / pp.157-162 / 2002
  • In this paper, we implement XML and data modeling with a UML tool, in which the class diagram is constructed from the sequence diagram after the use case diagram has been made. For the XML modeling, guidelines are presented for transforming a UML class into an XML document, and an example of deriving an XML DTD from a UML class is shown. Furthermore, through the proposed data modeling, integrated design methods are proposed for transforming a UML class into a relational, object-relational, or object-oriented database schema. Finally, we present the schema for each database system.

Improved spectral line measurements of the SDSS galaxy spectra

  • Oh, Kyu-Seok;Sarzi, Marc;Yi, Suk-Young
    • Bulletin of the Korean Space Science Society / 2009.10a / pp.35.1-35.1 / 2009
  • We have established a database of galaxy spectral line strengths for the SDSS database using an improved line measuring method. Our work covers all SDSS DR7 galaxies within a redshift of 0.2. The absorption line strengths measured by the SDSS pipeline are seriously contaminated by emission filling. Our code, GANDALF (gas and absorption line fitting code), performs more accurate measurements by effectively separating emission lines from absorption lines. A significant improvement has also been made in the velocity dispersion measurement, most notably in late-type galaxies. We have also identified a number of broad line region galaxies that were misclassified as normal galaxies by the SDSS pipeline, and we developed an effective method for measuring their line strengths. The database will be provided with new parameters that indicate the quality of the line strength measurements. In addition, we made galaxy templates for the Hubble sequence. The database will be useful for many fields of galaxy studies, including star formation and AGN activity.

Building Clustered EST database for In Silico Cloning (전산 클로닝을 위한 Clustered EST 데이터베이스 구축)

  • Lee, Jin-Kwan;Choi, Eun-Sun;Ryu, Keun-Ho
    • Proceedings of the Korea Information Processing Society Conference / 2001.10a / pp.105-108 / 2001
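  • EST (Expressed Sequence Tag) data, obtained by cloning and sequencing cDNA (complementary DNA), are widely used in functional genomics research because gene function can be inferred by comparing them with sequence information from various organisms to find similarities, or by searching for functional sites. By clustering EST data by species for plants, and by tissue within a species for animals, it is possible not only to identify genes that are still unknown in a species but also to determine the functions of the proteins that result from gene expression. In this paper, we therefore analyzed the EST data that NCBI provides in flatfile format, modeled it as a relational database, and built that database. In addition, for efficient use of the EST data, we designed and implemented a system that provides the data clustered by tissue for a specific species.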

Identification of Novel Cupredoxin Homologs Using Overlapped Conserved Residues Based Approach

  • Goyal, Amit;Madan, Bharat;Hwang, Kyu-Suk;Lee, Sun-Gu
    • Journal of Microbiology and Biotechnology / v.25 no.1 / pp.127-136 / 2015
  • Cupredoxin-like proteins are mainly copper-binding proteins that conserve a typical rigid Greek-key arrangement consisting of an eight-stranded β-sandwich, even though they share as little as 10-15% sequence similarity. The electron transport function of the Cupredoxins is critical for respiration and photosynthesis, and the proteins have therapeutic potential. Despite their crucial biological functions, the identification of the distant Cupredoxin homologs has been a difficult task due to their low sequence identity. In this study, the overlapped conserved residue (OCR) fingerprint for the Cupredoxin superfamily, which consists of conserved residues in three aspects (i.e., the sequence, structure, and intramolecular interaction), was used to detect the novel Cupredoxin homologs in the NCBI non-redundant protein sequence database. The OCR fingerprint could identify 54 potential Cupredoxin sequences, which were validated by scanning them against the conserved Cupredoxin motif near the Cu-binding site. This study also attempted to model the 3D structures and to predict the functions of the identified potential Cupredoxins. This study suggests that the OCR-based approach can be used efficiently to detect novel homologous proteins with low sequence identity, such as Cupredoxins.

Processing Temporal Aggregate Functions using a Time Point Sequence (시점 시퀀스를 이용한 시간지원 집계의 처리)

  • 권준호;송병호;이석호
    • Journal of KIISE:Databases / v.30 no.4 / pp.372-380 / 2003
  • Temporal databases support time-varying events, so conventional aggregate functions are extended into temporal aggregate functions that are processed with respect to time. The previous approach repeatedly finds the time intervals and computes the result for each interval whenever the target events differ. This paper proposes a method that processes temporal aggregate function queries using a time point sequence. The time point sequence, which stores the start and end times of the events in a temporal database, can be built in advance. It must also be updated when events are inserted into or deleted from the database. Because the time point sequence maintains the time interval information, it is more efficient than the previous approach when temporal aggregate function queries with different target events are issued continuously.
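
A minimal sketch of the idea, under stated assumptions: events carry half-open validity intervals `[start, end)`, the aggregate is COUNT, and the time point sequence is a sorted list of time points annotated with the events that start or end there. The layout and the names `build_time_points` and `temporal_count` are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

TimePoint = Tuple[int, List[int], List[int]]  # (time, ids starting here, ids ending here)


def build_time_points(events: Dict[int, Tuple[int, int]]) -> List[TimePoint]:
    """Build the sorted time point sequence once, ahead of any query."""
    starts, ends = defaultdict(list), defaultdict(list)
    for event_id, (start, end) in events.items():
        starts[start].append(event_id)
        ends[end].append(event_id)
    points = sorted(set(starts) | set(ends))
    return [(t, starts.get(t, []), ends.get(t, [])) for t in points]


def temporal_count(time_points: List[TimePoint], targets: Set[int]) -> List[Tuple[int, int, int]]:
    """COUNT of target events per constant interval, as (start, end, count) tuples."""
    result: List[Tuple[int, int, int]] = []
    active = 0
    for idx in range(len(time_points) - 1):
        t, started, ended = time_points[idx]
        active += sum(1 for e in started if e in targets)
        active -= sum(1 for e in ended if e in targets)
        next_t = time_points[idx + 1][0]
        if active == 0:
            continue
        if result and result[-1][1] == t and result[-1][2] == active:
            # Same count as the adjacent previous interval: merge them.
            result[-1] = (result[-1][0], next_t, active)
        else:
            result.append((t, next_t, active))
    return result


events = {1: (0, 10), 2: (5, 15), 3: (12, 20)}
sequence = build_time_points(events)          # built once
print(temporal_count(sequence, {1, 2, 3}))    # reused for different target sets
print(temporal_count(sequence, {1, 3}))
```

The sequence is built once and each query with a different target set becomes a single pass over it, which is the source of the efficiency the abstract claims for repeated queries. Handling insertions and deletions is omitted here.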

Construction of a full-length cDNA library from Pinus koraiensis and analysis of EST dataset (잣나무(Pinus koraiensis)의 cDNA library 제작 및 EST 분석)

  • Kim, Joon-Ki;Im, Su-Bin;Choi, Sun-Hee;Lee, Jong-Suk;Roh, Mark S.;Lim, Yong-Pyo
    • Korean Journal of Agricultural Science / v.38 no.1 / pp.11-16 / 2011
  • In this study, we report the generation and analysis of a total of 1,211 expressed sequence tags (ESTs) from Pinus koraiensis. A cDNA library was generated from young leaf tissue, and a total of 1,211 cDNAs were partially sequenced. EST and unigene sequence quality was determined by computational filtering, manual review, and BLAST analyses. In all, 857 ESTs were retained after removal of the vector sequence and filtering for a minimum length of 50 nucleotides. A total of 411 unigenes, consisting of 89 contigs and 322 singletons, were identified after assembly. We also identified 77 new microsatellite-containing sequences among the unigenes and classified their structure according to repeat unit. According to a homology search with BLASTX against the NCBI database, 63.1% of ESTs were homologous to sequences with known function and 22.2% of ESTs matched sequences with putative or unknown function. The remaining 14.6% of ESTs showed no significant similarity to any protein sequence in the public database. Gene ontology (GO) classification showed that the most abundant GO terms were transport, nucleotide binding, and plastid, in terms of biological process, molecular function, and cellular component, respectively. The sequence data will be used to characterize the potential roles of new genes in Pinus and will serve as a useful genetic resource.

Prediction of Protein Secondary Structure Using the Weighted Combination of Homology Information of Protein Sequences (단백질 서열의 상동 관계를 가중 조합한 단백질 이차 구조 예측)

  • Chi, Sang-mun
    • Journal of the Korea Institute of Information and Communication Engineering / v.20 no.9 / pp.1816-1821 / 2016
  • Protein secondary structure is important for the study of the evolution, structure, and function of proteins, which play crucial roles in most biological processes. This paper tries to effectively extract protein secondary structure information from a large protein structure database in order to predict the secondary structure of a query protein sequence. To find more remote homologous sequences of a query sequence in the protein database, we used PSI-BLAST, which can perform gapped iterative searches and use profiles consisting of homologous protein sequences of a query protein. The secondary structures of the homologous sequences are combined into the secondary structure prediction with weights according to their relative degree of similarity to the query sequence. When homologous sequences were used together with a neural network predictor, the accuracies were higher than those of current state-of-the-art techniques, achieving a Q3 accuracy of 92.28% and a Q8 accuracy of 88.79%.
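
The weighted combination step can be sketched as a per-residue weighted vote. This is only an illustration under assumptions: the homologs are already aligned to the query, gaps are marked with `-`, the weights are generic similarity scores, and positions without any homolog coverage fall back to coil (`C`). The paper's actual weighting scheme and its neural network predictor are not reproduced, and the names `combine_secondary_structure` and `q_accuracy` are hypothetical.

```python
from typing import List, Tuple


def combine_secondary_structure(
    homologs: List[Tuple[str, float]], length: int, default: str = "C"
) -> str:
    """Per position, pick the state with the largest summed homolog weight.

    homologs: (aligned secondary structure string, similarity weight) pairs,
              each string as long as the query, with '-' where the homolog has a gap.
    """
    prediction = []
    for pos in range(length):
        votes = {}
        for states, weight in homologs:
            state = states[pos]
            if state != "-":
                votes[state] = votes.get(state, 0.0) + weight
        prediction.append(max(votes, key=votes.get) if votes else default)
    return "".join(prediction)


def q_accuracy(predicted: str, observed: str) -> float:
    """Fraction of residues whose predicted state matches the observed one (Q3 or Q8)."""
    matches = sum(1 for p, o in zip(predicted, observed) if p == o)
    return matches / len(observed)


homologs = [("HHHEE-", 0.9), ("HHCEEC", 0.5), ("CCCEE-", 0.3)]
predicted = combine_secondary_structure(homologs, length=6)
print(predicted, q_accuracy(predicted, "HHCEEC"))
```

At the third position the most similar homolog (weight 0.9) outvotes the other two combined (0.8), which shows how the similarity weighting can override a simple majority.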

HorseDB; an Integrated Horse Resource and Web Service (말 데이터베이스 구축)

  • Kim Dae-Soo;Jo Un-Jong;Huh Jae-Won;Choe Eun-Sang;Cho Byung-Wook;Kim Heui-Soo
    • Journal of Life Science / v.16 no.3 s.76 / pp.472-476 / 2006
  • We have built a database server called HorseDB, which contains genome annotation information and biological information for the horse drawn from public database entries. The aim of HorseDB is to integrate biological information and horse genome data on a genome scale using bioinformatic methods. To facilitate the extraction of useful information from the collected horse genome and biological data, we developed a user-friendly interface system, HorseDB; an Integrated Horse Resource and Web Service. The database is organized into general horse information data, sequence annotation data, and a world-wide web analysis program interface. The database also gives users easy access to useful information within horse genomes and supports analyzed information such as sequence alignment and gene annotation results. HorseDB can be accessed at http://www.primate.or.kr./horse.