• Title/Summary/Keyword: DNA sequence database


A Pattern Summary System Using BLAST for Sequence Analysis

  • Choi, Han-Suk;Kim, Dong-Wook;Ryu, Tae-W.
    • Genomics & Informatics / v.4 no.4 / pp.173-181 / 2006
  • Pattern finding is an important task in protein and DNA sequence analysis, and alignment is the most widely used technique for it. BLAST (Basic Local Alignment Search Tool) is one of the most popular bioinformatics tools for exploring available DNA and protein sequence databases. For a large sequence data set that contains various sequence patterns, BLAST may generate a huge output. However, BLAST provides no tool to summarize and analyze the patterns or matched alignments in its output file, and it lacks general, robust parsing tools for extracting the essential information from that output. This paper presents a pattern summary system, a powerful and comprehensive tool for discovering pattern structures in large amounts of BLAST sequence data. The system can identify clusters of patterns, extract the cluster pattern sequences from the BLAST subject database, and display the clusters graphically to show their distribution in the subject database.
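
As a rough illustration of the kind of summary this abstract describes, the Python sketch below groups BLAST hits into clusters by their position on each subject sequence. It assumes the 12-column tabular output of BLAST+ (`-outfmt 6`), which may differ from the output format the authors parsed. The function names, the `max_gap` parameter, and the sample hits are made up for illustration and are not the paper's method.

```python
from collections import defaultdict


def parse_tabular_hits(text):
    """Yield (subject_id, start, end) from BLAST+ tabular output (-outfmt 6)."""
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        sseqid = fields[1]
        sstart, send = int(fields[8]), int(fields[9])
        # Minus-strand hits report sstart > send, so normalize the interval.
        yield sseqid, min(sstart, send), max(sstart, send)


def cluster_hits(hits, max_gap=0):
    """Merge hits on one subject whose intervals overlap or lie within max_gap.

    Returns {subject_id: [(start, end, hit_count), ...]}.
    """
    by_subject = defaultdict(list)
    for sseqid, start, end in hits:
        by_subject[sseqid].append((start, end))

    clusters = {}
    for sseqid, intervals in by_subject.items():
        intervals.sort()
        merged = []
        for start, end in intervals:
            if merged and start <= merged[-1][1] + max_gap:
                first, last, count = merged[-1]
                merged[-1] = (first, max(last, end), count + 1)
            else:
                merged.append((start, end, 1))
        clusters[sseqid] = merged
    return clusters


# Hypothetical hits: qseqid, sseqid, pident, length, mismatch, gapopen,
# qstart, qend, sstart, send, evalue, bitscore.
SAMPLE = "\n".join([
    "q1\tchr1\t98.0\t50\t1\t0\t1\t50\t100\t149\t1e-20\t90.0",
    "q2\tchr1\t95.0\t60\t3\t0\t1\t60\t130\t189\t1e-18\t85.0",
    "q3\tchr1\t99.0\t40\t0\t0\t1\t40\t540\t501\t1e-15\t70.0",
    "q4\tchr2\t97.0\t30\t1\t0\t1\t30\t10\t39\t1e-10\t55.0",
])

for subject, found in cluster_hits(parse_tabular_hits(SAMPLE)).items():
    for start, end, count in found:
        print(f"{subject}\t{start}-{end}\t{count} hit(s)")
```

Raising `max_gap` merges nearby hits that do not strictly overlap; a suitable value depends on the data and is left as a free parameter here.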

Gene Rearrangement through 151 bp Repeated Sequence in Rice Chloroplast DNA

  • Nahm, Baek-Hie;Kim, Han-Jip
    • Applied Biological Chemistry / v.36 no.3 / pp.208-214 / 1993
  • To investigate gene rearrangement via short repeated sequences in chloroplast DNA, the pattern of heterologous gene clusters containing the 151 bp repeated sequence was compared across stages of plastid development in rice, and homologous gene clusters from various plant sources were searched for comparative analysis. Southern blot analysis of rice DNA, using the rpl2 gene containing the 151 bp repeated sequence as a probe, showed the presence of heterologous gene clusters. These heterologous gene clusters varied with plastid development and were observed in all of the rice cultivars used in this work. Finally, comparative analysis of the DNA sequences of the homologous gene clusters from various plants showed evolutionary gene rearrangement via short repeated sequences among plants. These results suggest a possible relationship between plastid development and gene rearrangement through short repeated sequences.


A Review of Extended STR Loci and DNA Database

  • Cho, Yoonjung;Lee, Min Ho;Kim, Su Jin;Park, Ji Hwan;Jung, Ju Yeon
    • Biomedical Science Letters / v.28 no.3 / pp.157-169 / 2022
  • DNA typing is a standard technology in forensic science and plays a significant role in the personal identification of victims and suspects. A short tandem repeat (STR) is a tandemly repeated DNA sequence made up of 2 to 7 bp units at a specific locus. STRs are distributed across the human genome and are polymorphic among individuals. Because polymorphism is the key feature that DNA typing relies on, STR analysis has become the standard technology in forensics. The DNA database (DNA-DB) was first introduced with 4 essential STR markers for forensic applications; the number of STR markers was later expanded from 4 to 13 and then from 13 to 20 to cope with the continuously increasing number of DNA profiles and other needs. Applying the expanded STR markers to the South Korean DNA-DB system benefited low copy number analysis, which has a high likelihood of yielding partial DNA profiles, and contributed especially to theft cases, where touch DNA makes up a large portion of the evidence. Furthermore, STR marker expansion not only contributed to the resolution of cold cases but also increased the kinship index, indicating the potential for improved kinship test accuracy with extended STR markers. Collectively, expansion of the STR loci was considered necessary to keep pace with the continuously increasing number of DNA profiles and to improve the data integrity of the DNA-DB.
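
To make the STR definition above concrete, the following Python sketch counts the longest run of back-to-back copies of a repeat unit in a sequence, which is the quantity that varies between individuals at an STR locus. The function name, the flanking bases, and the example sequence are invented for illustration; real forensic allele calling works from fragment lengths or sequencing reads and is more involved.

```python
import re


def longest_tandem_repeat(sequence, motif):
    """Return the largest number of consecutive copies of motif in sequence."""
    motif = motif.upper()
    if not 2 <= len(motif) <= 7:
        raise ValueError("STR repeat units are 2 to 7 bp long")
    # Each match is one uninterrupted run of the motif.
    runs = re.findall(f"(?:{re.escape(motif)})+", sequence.upper())
    return max((len(run) // len(motif) for run in runs), default=0)


# Made-up locus: two flanks around five copies of the unit GATA.
example = "GGC" + "GATA" * 5 + "TTC"
print(longest_tandem_repeat(example, "GATA"))
```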

Prediction of Rice Embryo Proteins using EST-Databases

  • Woo, Sun-Hee;Cho, Seung-Woo;Kim, Tae-Seon;Chung, Keun-Yook;Cho, Yong-Gu;Kim, Hong-Sig;Song, Beom-Heon;Lee, Chul-Won;Jong, Seung-Keun
    • Korean Journal of Breeding Science / v.40 no.1 / pp.1-7 / 2008
  • An attempt was made to link rice embryo proteins to DNA sequences and to understand their functions. One hundred of the 700 spots detected on the embryo 2-DE gels were microsequenced. Of these, 28% were matched to DNA sequences with known functions, while the functions of the remaining 72% were unknown, as previously reported (Woo et al. 2002). In addition, twenty-four protein spots with 100% homology and nine with over 80% homology were matched to ESTs (expressed sequence tags) after the amino acid sequences of the protein spots were extended by database searches against the rice EST databases available at the NCBI (http://www.ncbi.nlm.nih.gov/) and DDBJ (http://www.ddbj.nig.ac.jp/). The chromosomal locations of some proteins were also obtained from the rice genetic map provided by the Japanese Rice Genome Research Program (http://rgp.dna.affrc.go.jp). The DNA sequence databases reported for rice (Oryza sativa L.), including EST databases, now provide whole or partial gene sequences, and recent advances in protein characterization allow proteins to be linked to DNA sequences in functional analysis. This work shows that proteome analysis could be a useful strategy for linking sequence information to functional genomics.

Identification of 19 Species of Poisonous Plants from Jeju Island and Construction of a Database Using DNA-barcoding

  • Kwon, Eunchae;Kim, Joo-Young;Chang, Miwha;Lee, Minji;Moon, Seohyun;Lee, Won-Hae
    • Korean Journal of Plant Resources / v.35 no.2 / pp.346-361 / 2022
  • Food poisoning accidents caused by poisonous plants occur every year. Because certain poisonous plants are mistaken for edible plants, accurate species identification of poisonous plants is required. For use in forensic cases, DNA barcodes suitable for identifying poisonous plant species and a database that supports accurate species identification are necessary. In this study, 19 poisonous plants native to Jeju Island were identified using seven DNA barcodes (trnH-psbA, trnL-trnF, trnL intron, rbcL, matK, ITS1-ITS4, 18S rRNA) to construct a database containing sequence information and DNA barcode universality. The trnL-trnF and ITS1-ITS4 barcodes were the easiest markers for PCR amplification and sequence retrieval, and the combination of the two enabled identification to a single species in 18 of the 19 plants. Therefore, when an investigation of unknown poisonous plants is requested, the combination of the trnL-trnF and ITS1-ITS4 barcodes is considered the primary marker for species identification. The database of recommended DNA barcodes for each poisonous plant presented in this study will be helpful in plant poisoning cases.

Efficient Indexing for Large DNA Sequence Databases

  • Won, Jung-Im;Yoon, Jee-Hee;Park, Sang-Hyun;Kim, Sang-Wook
    • Journal of KIISE:Databases / v.31 no.6 / pp.650-663 / 2004
  • In molecular biology, DNA sequence searching is one of the most crucial operations. Since DNA databases contain a huge volume of sequences, a fast indexing mechanism is essential for processing DNA sequence searches efficiently. In this paper, we first identify the problems of the suffix tree with respect to storage overhead, search performance, and integration with DBMSs. Then, we propose a new index structure that solves those problems. The proposed index consists of two parts: the primary part represents the trie as bit strings without any pointers, and the secondary part speeds up access to the leaf nodes of the trie that must be visited for post-processing. We also suggest an efficient algorithm based on that index for DNA sequence searching. To verify the superiority of the proposed approach, we conducted a performance evaluation via a series of experiments. The results revealed that the proposed approach, while requiring less storage space, achieves a 13- to 29-fold performance improvement over the suffix tree.
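
For orientation, the sketch below shows the kind of pointer-based suffix structure whose storage overhead motivates this paper. It is a naive suffix trie built from nested Python dictionaries, not the proposed bit-string index, whose encoding the abstract does not specify. The function names and the sample sequence are illustrative assumptions.

```python
def build_suffix_trie(text):
    """Build a naive suffix trie; each node is a dict from symbol to child."""
    root = {}
    for start in range(len(text)):
        node = root
        for symbol in text[start:]:
            node = node.setdefault(symbol, {})
        # "$" is a reserved key that records where this suffix begins.
        node.setdefault("$", []).append(start)
    return root


def find_occurrences(trie, pattern):
    """Return the sorted start positions of pattern in the indexed text."""
    node = trie
    for symbol in pattern:
        if symbol not in node:
            return []
        node = node[symbol]
    # Every suffix stored below this node begins with the pattern.
    positions, stack = [], [node]
    while stack:
        current = stack.pop()
        for key, child in current.items():
            if key == "$":
                positions.extend(child)
            else:
                stack.append(child)
    return sorted(positions)


trie = build_suffix_trie("ACGTACGA")
print(find_occurrences(trie, "ACG"))
```

Because every suffix is stored explicitly, the number of nodes can grow quadratically with the text length, so this form is only workable for toy inputs. That cost is the kind of overhead a compact, pointer-free encoding aims to avoid.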

Bioinformatics for the Korean Functional Genomics Project

  • Kim, Sang-Soo
    • Proceedings of the Korean Society for Bioinformatics Conference / 2000.11a / pp.45-52 / 2000
  • Genomic approaches produce massive amounts of data within a short time period. New high-throughput automatic sequencers can generate over a million nucleotides of sequence information overnight. A typical DNA chip experiment produces tens of thousands of expression measurements, not to mention tens of megabytes of image files. These data must be handled automatically by computer and stored in electronic databases. Thus there is a need for a systematic approach to data collection, processing, and analysis. DNA sequence information is translated into amino acid sequence and analyzed for key motifs related to its biological and/or biochemical function. Functional genomics will play a significant role in identifying novel drug targets and diagnostic markers for serious diseases. As an enabling technology for functional genomics, bioinformatics is in great need worldwide. In Korea, a new functional genomics project has recently been launched; it focuses on identifying genes associated with cancers prevalent in Korea, namely gastric and hepatic cancers. This involves gene discovery by high-throughput sequencing of cancer cDNA libraries, gene expression profiling by DNA microarray and proteomics, and SNP profiling in the Korean patient population. Our bioinformatics team will support all these activities by collecting, processing, and analyzing these data.


The List of Korean Organisms Registered in the NCBI Nucleotide Database for Environmental DNA Research

  • Kwak, Ihn-Sil;Ji, Chang Woo;Kim, Won-Seok;Kong, Dongsoo
    • Korean Journal of Ecology and Environment / v.55 no.4 / pp.352-359 / 2022
  • With recent advances in genetic technology, interest is growing in environmental DNA (eDNA) as a molecular biological approach to studying biodiversity. eDNA has many advantages over traditional methods for studying biological communities in the environment, but it depends heavily on established reference sequence databases. This study comprehensively analyzed, at the genus level, the habitat status and classification of Korean registered taxon groups (phytoplankton, zooplankton, macroinvertebrates, and fish) for the genes mainly used in eDNA research (12S rRNA, 16S rRNA, 18S rRNA, COI, and CYTB). Phytoplankton and zooplankton showed the highest taxon proportion in 18S rRNA, and macroinvertebrates showed the highest proportion in COI in the nucleotide sequence database. In fish, all genes except 18S rRNA showed a high taxon ratio. For the top 20 genera by biological density in each Korean registered taxon group, most phytoplankton were registered in 18S rRNA, and macroinvertebrates had the largest number of COI nucleotide sequences. In addition, nucleotide sequences were confirmed for the top 20 fish genera in 12S rRNA, 16S rRNA, and CYTB. These results provide comprehensive information on the genes suitable for eDNA research in each taxon group.

Construction of EST Database for Comparative Gene Studies of Acanthamoeba

  • Moon, Eun-Kyung;Kim, Joung-Ok;Xuan, Ying-Hua;Yun, Young-Sun;Kang, Se-Won;Lee, Yong-Seok;Ahn, Tae-In;Hong, Yeon-Chul;Chung, Dong-Il;Kong, Hyun-Hee
    • Parasites, Hosts and Diseases / v.47 no.2 / pp.103-107 / 2009
  • The genus Acanthamoeba can cause severe infections in humans, such as granulomatous amebic encephalitis and amebic keratitis. However, little genomic information on Acanthamoeba has been reported. Here, we constructed an Acanthamoeba expressed sequence tag (EST) database (Acanthamoeba EST DB) derived from our 4 kinds of Acanthamoeba cDNA libraries. The Acanthamoeba EST DB contains 3,897 ESTs generated from amebae under various conditions (long-term in vitro culture, mouse brain passage, or encystation), together with Acanthamoeba data downloaded from the National Center for Biotechnology Information (NCBI) and the Taxonomically Broad EST Database (TBestDB). Almost all reported cDNA/genomic sequences of Acanthamoeba are provided through a stand-alone BLAST system with nucleotide (BLAST NT) and amino acid (BLAST AA) sequence databases. In the BLAST results, each gene links to relevant information, including sequence data, gene orthology annotations, relevant references, and a BlastX result. This is the first attempt to construct an Acanthamoeba database with genes expressed under diverse conditions. These data were integrated into a database (http://www.amoeba.or.kr).

IMPLEMENTATION OF SUBSEQUENCE MAPPING METHOD FOR SEQUENTIAL PATTERN MINING

  • Trang, Nguyen Thu;Lee, Bum-Ju;Lee, Heon-Gyu;Ryu, Keun-Ho
    • Proceedings of the KSRS Conference / v.2 / pp.627-630 / 2006
  • Sequential pattern mining addresses the problem of discovering the maximal frequent sequences that exist in a given database. In daily and scientific life, sequential data are available and used everywhere, in forms such as text, weather data, satellite data streams, business transactions, telecommunications records, experimental runs, DNA sequences, and medical record histories. Discovering sequential patterns can help users or scientists predict upcoming activities, interpret recurring phenomena, or extract similarities. To that end, the core of sequential pattern mining is finding the sequences that occur frequently across the data sequences. Besides discovering frequent itemsets, sequential pattern mining requires arranging those itemsets into sequences and discovering which of those sequences are frequent. So before mining sequences, the main task is to check whether one sequence is a subsequence of another sequence in the database. In this paper, we implement the subsequence matching method as a preprocessing step for sequential pattern mining. The matched sequences in our implementation are normalized sequences in the form of number chains. The result produced by this method is an overview of the matching information between the input mapped sequences.

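
The subsequence check that the last abstract describes as the main preprocessing task can be sketched in a few lines of Python. The sketch assumes each sequence is a flat chain of numbers, as the abstract suggests, and treats a subsequence as an order-preserving match with gaps allowed. The function names, the `support` helper, and the toy database are illustrative assumptions rather than the authors' implementation.

```python
def is_subsequence(candidate, sequence):
    """True if candidate's items occur in sequence in the same order."""
    remaining = iter(sequence)
    # `item in remaining` consumes the iterator up to the first match,
    # so each later item can only match further to the right.
    return all(item in remaining for item in candidate)


def support(candidate, database):
    """Fraction of database sequences containing candidate as a subsequence."""
    if not database:
        return 0.0
    matches = sum(is_subsequence(candidate, seq) for seq in database)
    return matches / len(database)


# Toy database of normalized sequences (number chains).
database = [[1, 2, 3], [3, 1], [1, 3, 3], [2, 2]]
print(support([1, 3], database))
```

When sequences are made of itemsets rather than single numbers, the equality test would be replaced by a subset test, with the same left-to-right scan.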