• Title/Summary/Keyword: Biological Sequence Database

Analysis of Partial cDNA Sequence from Human Fetal Liver

  • Kim, Jae-Wha;Song, Jae-Chan;Lee, In-Ae;Lee, Young-Hee;Nam, Myoung-Soo;Hahn, Yoon-Soo;Chung, Jae-Hoon;Choe, In-Seong
    • BMB Reports / v.28 no.5 / pp.402-407 / 1995
  • Single-run partial cDNA sequencing was conducted on 1,592 randomly selected human fetal liver cDNA clones of Korean origin to isolate novel genes related to liver functions. Each partial cDNA sequence was analyzed by comparing it with the GenBank, Protein Information Resource (PIR), and SWISS-PROT Protein Sequence Data Bank databases. Of the 1,592 cDNA clones reported here, 1,433 (90.0% of the total) were informative cDNA sequences. The other 159 clones were identified as DNA sequences that had originated from the cloning vector. Among the 1,433 informative partial cDNA sequences, 851 (59.3%) clones were identical to known human genes. These known genes were classified into 225 different genes. In addition, 340 clones (23.7%) showed various degrees of homology to previously known human genes. Ninety-four (6.6%) clones contained various repeated sequences. Twenty-four (1.7%) partial cDNA sequences had considerable homology to known genes from evolutionarily distant organisms such as yeast, rice, Arabidopsis, mouse, and rat, based on database matches, whereas 124 (8.7%) had no significant matches. Human homologues of functionally characterized genes from other organisms could be classified as candidates for novel human genes with similar functions. Information from the partial cDNA sequences in this study may facilitate the analysis of genes expressed in human fetal liver.
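
The clone counts quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below is only a consistency check on the reported numbers; the category labels are paraphrased for illustration and are not identifiers from the paper.

```python
# Clone counts as reported in the abstract; labels are paraphrased.
total_clones = 1592
vector_derived = 159

categories = {
    "identical to known human genes": 851,
    "homologous to known human genes": 340,
    "repeated sequences": 94,
    "homologous to genes of other organisms": 24,
    "no significant match": 124,
}

# Informative clones are those not derived from the cloning vector.
informative = total_clones - vector_derived

# The five categories should account for every informative clone.
assert sum(categories.values()) == informative

informative_pct = round(100 * informative / total_clones, 1)
print(informative, informative_pct)
```

The five categories add up to the number of informative clones, so the breakdown in the abstract is internally consistent.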

A Protein Sequence Prediction Method by Mining Sequence Data (서열 데이타마이닝을 통한 단백질 서열 예측기법)

  • Cho, Sun-I;Lee, Do-Heon;Cho, Kwang-Hwi;Won, Yong-Gwan;Kim, Byoung-Ki
    • The KIPS Transactions:PartD / v.10D no.2 / pp.261-266 / 2003
  • A protein, which is a linear polymer of amino acids, is one of the most important biomolecules, composing biological structures and regulating biochemical reactions. Since the characteristics and functions of proteins are determined in principle by their amino acid sequences, protein sequence determination is the starting point of protein function study. This paper proposes a protein sequence prediction method based on data mining techniques, which can overcome the limitations of previous biochemical sequencing methods. After multiple proteases are applied to obtain overlapping protein fragments, candidate fragment sequences are identified by comparing fragment mass values with peptide databases. We propose a method that constructs a multi-partite graph and searches for maximal paths to determine the protein sequence by assembling suitable candidate sequences. In addition, experimental results based on the SWISS-PROT database are presented to show the validity of the proposed method.
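
The idea of chaining overlapping fragments from different protease digests can be illustrated with a small path search. This is a simplified sketch, not the paper's multi-partite graph method: it skips the mass-matching step, assumes the candidate fragment sequences are already known, and uses an exhaustive search that is only practical for tiny inputs. The function names, the toy peptide, and the two digests are hypothetical.

```python
def overlap(a, b, min_len=2):
    """Length of the longest suffix of a that is a prefix of b (0 if below min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def assemble(fragments, min_len=2):
    """Return the longest sequence obtainable by chaining overlapping fragments."""
    best = ""

    def extend(current, unused):
        nonlocal best
        if len(current) > len(best):
            best = current
        for i in unused:
            n = overlap(current, fragments[i], min_len)
            if n:
                # Append only the part of the fragment that extends the chain.
                extend(current + fragments[i][n:], unused - {i})

    all_indices = frozenset(range(len(fragments)))
    for start in all_indices:
        extend(fragments[start], all_indices - {start})
    return best


# Hypothetical digests of one toy peptide by two different proteases.
digest_a = ["MK", "TAYIAK", "QR", "QISFVK"]
digest_b = ["MKTAY", "IAKQRQIS", "FVK"]
print(assemble(digest_a + digest_b))
```

Because the two digests cut at different positions, each fragment from one digest bridges a cleavage site of the other, which is what lets the chain span the whole toy peptide.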

Identification and Phylogenetic Analysis of the Human Endogenous Retrovirus HERV-W LTR Family in Placenta cDNA Library

  • Yi, Joo-Mi;Lee, Ji-Won;Shin, Kyung-Mi;Huh, Jae-Won;Lee, Won-Ho;Jang, Kyung-Lib;Kim, Heui-Soo
    • Animal cells and systems / v.5 no.3 / pp.243-246 / 2001
  • Human endogenous retroviral long terminal repeats (LTRs) have been found to be coexpressed with the sequences of nearby genes. It has been suggested that LTR elements have contributed to structural changes or genetic variation in the human genome connected to various diseases and to evolution. Using a cDNA library derived from placenta tissue, we performed PCR amplification and identified five new HERV-W LTR elements. Those LTR elements showed a high degree of sequence similarity (98-99%) with HERV-W LTR (AF072500). A phylogenetic tree obtained by the neighbor-joining method revealed that HERV-W LTR elements could be divided mainly into two groups through evolutionary divergence. The five new HERV-W LTR elements (pla-1, 4, 5, 6, 7) belonged to group I, together with AX000960, AF072504, and AF072506 from the GenBank database. The data suggest that several copies of the HERV-W LTR elements are transcribed in placenta and may contribute to the understanding of biological functions such as human placental morphogenesis.
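
A sequence similarity figure such as 98-99% is typically a percent identity computed over aligned positions. The sketch below shows one common convention (gapped columns are skipped); the abstract does not state which convention the authors used, and the two sequences are made-up placeholders, not the AF072500 sequence or the placental clones.

```python
def percent_identity(a, b):
    """Percent identity of two aligned sequences, ignoring columns with a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # skip gapped columns
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical 50-nt reference and a copy with a single substitution.
reference = "ACGTACGTAC" * 5
clone = reference[:24] + "T" + reference[25:]
print(percent_identity(reference, clone))
```

Pairwise values of this kind can also be converted to distances, which is the input a distance-based method such as neighbor-joining works from.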

Discovering Sequence Association Rules for Protein Structure Prediction (단백질 구조 예측을 위한 서열 연관 규칙 탐사)

  • Kim, Jeong-Ja;Lee, Do-Heon;Baek, Yun-Ju
    • The KIPS Transactions:PartD / v.8D no.5 / pp.553-560 / 2001
  • Bioinformatics is a discipline that supports biological experiment projects by storing and managing data arising from genome research. It can also guide experimental design for genome function prediction and regulation. Among the various approaches to genome research, proteomics has been drawing increasing attention because it deals directly with the final products of genomes, i.e., proteins. This paper proposes a data mining technique to predict the structural characteristics of a given protein group, which are among the dominant factors determining their functions. After explaining associations among amino acid subsequences in the primary structures of proteins, which can provide important clues for determining their secondary or tertiary structures, it defines a sequence association rule to represent the relations between subsequences. It also provides support and confidence measures, newly designed to evaluate the usefulness of sequence association rules. After proposing a method to discover useful sequence association rules from a given protein group, it evaluates the performance of the proposed method with protein sequence data from the SWISS-PROT protein database.
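
Support and confidence for a rule of the form "subsequence A implies subsequence B" can be sketched with the classic association-rule definitions. The abstract says its measures are newly designed, so the code below is an assumption-laden stand-in rather than the paper's formulas; the function name and the toy protein strings are hypothetical.

```python
def rule_stats(sequences, antecedent, consequent):
    """Classic support and confidence for the rule antecedent => consequent.

    support    = fraction of sequences containing both subsequences
    confidence = fraction of antecedent-containing sequences that also
                 contain the consequent
    """
    with_antecedent = [s for s in sequences if antecedent in s]
    with_both = [s for s in with_antecedent if consequent in s]
    support = len(with_both) / len(sequences)
    confidence = len(with_both) / len(with_antecedent) if with_antecedent else 0.0
    return support, confidence


# Hypothetical protein group (not SWISS-PROT entries).
proteins = ["MKVLAAGG", "KVLTTAAG", "GGAAMKVL", "TTTTPPPP"]
support, confidence = rule_stats(proteins, "KVL", "AAG")
print(support, round(confidence, 3))
```

In this toy group, three sequences contain "KVL" and two of those also contain "AAG", so the rule holds in half of all sequences and in two thirds of the sequences where the antecedent occurs.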

Molecular Cloning and Phylogeny of the Human Endogenous Retrovirus HERV-W LTR Family in cDNA Library of Human Fetal Brain (인간 태아의 뇌로부터 만들어진 cDNA library에서 내생 레트로바이러스 HERV-W LTR의 클로닝 및 분자계통분류)

  • 이주민;허재원;신경미;이지원;이영춘;백인호;장경립;김희수
    • Journal of Life Science / v.11 no.4 / pp.379-384 / 2001
  • Long terminal repeats (LTRs) of the human endogenous retrovirus (HERV) have been found to be coexpressed with genes located nearby. It has been suggested that LTR elements have contributed to genetic variation in the human genome connected to various diseases. Recently, the HERV-W family was identified in the cerebrospinal fluid and brains of individuals with schizophrenia. Using a cDNA library derived from human fetal brain, we performed PCR amplification and identified seven new HERV-W LTR elements. Those LTR elements showed a high degree of sequence similarity (98-99%) with HERV-W (AF072500). A phylogenetic tree obtained by the neighbor-joining method revealed that the seven new HERV-W LTR elements (FB-1, 2, 4, 8, 9, 10, 12) were closely related to AX000960, AF072504, and AF072506 from the GenBank database. Our data suggest that several copies of the HERV-W LTR elements are expressed in human fetal brain and may contribute to an understanding of biological functions connected to neuropsychiatric diseases.

Expressed sequence tag analysis of Meretrix lusoria (Veneridae) in Korea (한국산 백합 (Meretrix lusoria) 의 전사체 분석)

  • Kang, Jung-Ha;Jeong, Ji Eun;Kim, Bong Seok;An, Chel-Min;Kang, Hyun-Sook;Kang, Se-Won;Hwang, Hee Ju;Han, Yeon Soo;Chae, Sung-Hwa;Ko, Hyun-Sook;Lee, Jun-Sang;Lee, Yong Seok
    • The Korean Journal of Malacology / v.28 no.4 / pp.377-384 / 2012
  • The importance of biological resources has been gradually increasing, and mollusks have been utilized as main fishery resources in terrestrial ecosystems. However, little is known about genomic and transcriptional analysis in mollusks. This is the first report on the transcriptomic profile of Meretrix lusoria. In this study, we constructed a cDNA library and determined 542 distinct EST sequences composed of 284 singletons and 95 contigs. First, using the BLASTX program, we identified 180 EST sequences with significant hits to protein sequences in the exclusive Mollusks database and 343 EST sequences with significant hits in the NCBI NR database. We also found that 211 putative sequences, identified through a local BLAST (blastx, E < e-10) search against the KOG database, were classified into 16 functional categories. Several immune response-related genes encoding allograft inflammatory factor 1 (AIF-1), B-cell translocation gene 1 (BTG1), C-type lectin A, thioester-containing protein, and 26S proteasome regulatory complex were identified. To determine phylogenetic relationships, we identified partial sequences of four genes (COX1, COX2, 12S rRNA, and NADH dehydrogenase) that significantly matched the mitochondrial genomes of three species: Ml (Meretrix lusoria), Mp (Meretrix petechialis), and Mm (Meretrix meretrix). As a result, we found a slight difference between the sequences of Korean isolates and other known isolates. This study will be useful for developing breeding technology and might also be helpful in establishing a classification system.
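
Counting EST sequences with significant hits comes down to filtering BLAST results by E-value. The sketch below assumes the results were saved in the standard 12-column tabular layout (BLAST+ `-outfmt 6`) and reads "E < e-10" as a cutoff of 1e-10; neither detail is stated in the abstract. The function name, query IDs, and hit rows are made up for illustration.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def significant_queries(handle, max_evalue=1e-10):
    """Return the set of query IDs with at least one hit below the E-value cutoff."""
    hits = set()
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        record = dict(zip(COLUMNS, row))
        if float(record["evalue"]) < max_evalue:
            hits.add(record["qseqid"])
    return hits


# Hypothetical hits: only EST_001 and EST_003 pass the cutoff.
rows = [
    ["EST_001", "prot_A", "85.0", "120", "18", "0", "1", "360", "5", "124", "2e-35", "140"],
    ["EST_002", "prot_B", "40.0", "50", "30", "1", "10", "159", "1", "50", "0.003", "35.0"],
    ["EST_003", "prot_C", "62.5", "80", "30", "0", "4", "243", "20", "99", "5e-11", "70.1"],
]
toy_report = "\n".join("\t".join(r) for r in rows)
print(sorted(significant_queries(io.StringIO(toy_report))))
```

Collecting query IDs in a set means an EST with several qualifying hits is still counted once, which matches the way the abstract reports numbers of EST sequences rather than numbers of hits.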

Rice Proteomics: A Functional Analysis of the Rice Genome and Applications (프로테옴 해석에 의한 벼 게놈 기능해석과 응용)

  • Woo, Sun-Hee;Kim, Hong-Sig;Song, Berm-Heun;Lee, Chul-Won;Park, Young-Mok;Jong, Seung-Keun;Cho, Yong-Gu
    • Journal of Plant Biotechnology / v.30 no.3 / pp.281-291 / 2003
  • In this review, we describe the catalogues of the rice proteome that were constructed in our program and discuss the functional characterization of some of these proteins. Mass spectrometry is the most prevalent technique for rapidly identifying a large number of proteins in proteome analysis. However, the conventional Western blotting/sequencing technique has been used in many laboratories. As a first step to efficiently construct a protein cata-file in proteome analysis of major cereals, we analyzed the N-terminal sequences of 100 rice embryo proteins and 70 wheat spike proteins separated by two-dimensional electrophoresis. Edman degradation revealed the N-terminal peptide sequences of only 31 rice proteins and 47 wheat proteins, suggesting that the rest of the separated protein spots are N-terminally blocked. To efficiently determine the internal sequences of blocked proteins, we developed a modified Cleveland peptide mapping method. Using this method, the internal sequences of all blocked rice proteins (i.e., 69 proteins) were determined. Among these 100 rice proteins, thirty were proteins for which homologous sequences could be identified in the rice genome database. However, the rest of the proteins lacked homologous proteins. This appears to be consistent with the fact that about 45% of total rice cDNA has been deposited in the EMBL database. Also, the major proteins involved in the growth and development of rice can be identified using the proteome approach. Some of these proteins, including a calcium-binding protein that turned out to be calreticulin, a gibberellin-binding protein, which is ribulose-1,5-bisphosphate carboxylase/oxygenase activase in rice, and a leginsulin-binding protein in soybean, have functions in the signal transduction pathway. Proteomics is well suited not only to determining interactions between pairs of proteins but also to identifying multisubunit complexes. Currently, a protein-protein interaction database for plant proteins (http://genome.c.kanazawa-u.ac.jp/Y2H) could be a very useful tool for the plant research community. Also, the information obtained from the plant proteome would be helpful in predicting the functions of unknown proteins and would be useful in plant molecular breeding.

Twenty-five unrecorded bacterial species of the Republic of Korea belonging to the phylum Actinomycetota discovered during surveys in 2021

  • Inhyup Kim;Wan-Taek Im;Kiseong Joh;Myung Kyum Kim;Jung-Hoon Yoon;Wonyong Kim;Taegun Seo
    • Journal of Species Research / v.12 no.3 / pp.229-239 / 2023
  • We isolated and identified 25 unrecorded bacterial species belonging to the phylum Actinomycetota found in the Republic of Korea. Sequence comparison of 16S rRNA was performed using NCBI BLAST and the EzBioCloud database to identify the 25 species, which had a 16S rRNA gene sequence similarity of >98.8% and were designated as unrecorded species in the Republic of Korea. Among the 25 unrecorded bacterial strains, Streptomyces was the most common with nine species, followed by Leifsonia with two species. Isoptericola, Nocardioides, Dermacoccus, Sinomonas, Patulibacter, Marmoricola, Allobranchiibius, Aldersonia, Actinokineospora, Agromyces, Aeromicrobium, Cellulomonas, and Gordonia were also found, with one species each. The 25 unrecorded species were isolated from various environments, such as tidal flats, ferns, soil, pine cones, moss, mud, wetlands, and plants. These isolates were characterized on the basis of their phylogenetic, biochemical, and morphological properties, and species descriptions are provided.

Comparative Genomics Platform and Phylogenetic Analysis of Fungal Laccases and Multi-Copper Oxidases

  • Wu, Jiayao;Choi, Jaeyoung;Asiegbu, Fred O.;Lee, Yong-Hwan
    • Mycobiology / v.48 no.5 / pp.373-382 / 2020
  • Laccases (EC 1.10.3.2), a group of multi-copper oxidases (MCOs), perform multiple biological functions and occur widely across many species. Fungal laccases have been extensively studied for their industrial applications; however, there was no database specifically focused on fungal laccases. To fill this gap, we have developed a comparative genomics platform for laccases and MCOs (http://laccase.riceblast.snu.ac.kr/). Based on protein domain profiles of characterized sequences, 3,571 laccases were predicted from 690 genomes, including 253 fungi. The number of putative laccases and their properties exhibited a dynamic distribution across the taxonomy. A total of 505 laccases from 68 genomes were selected and subjected to phylogenetic analysis. As a result, four clades comprising nine subclades were phylogenetically grouped by their putative functions and analyzed at the sequence level. Our work provides a workbench for putative laccases, focused mainly on the fungal kingdom, as well as a new perspective on the identification and classification of putative laccases and MCOs.

Cloning and Characterization of a Rice cDNA Encoding Glutamate Decarboxylase

  • Oh, Suk-Heung;Choi, Won-Gyu;Lee, In-Tae;Yun, Song-Joong
    • BMB Reports / v.38 no.5 / pp.595-601 / 2005
  • In this study, we isolated a rice (Oryza sativa L.) glutamate decarboxylase (RicGAD) clone from a root cDNA library, using a partial Arabidopsis thaliana GAD gene as a probe. The rice root cDNA library was constructed with mRNA derived from the roots of rice seedlings subjected to phosphorus deprivation. Nucleotide sequence analysis indicated that the RicGAD clone was 1,712 bp long and harbored a complete open reading frame of 505 amino acids. The 505 amino acid sequence deduced from this RicGAD clone exhibited 67.7% and 61.9% identity with OsGAD1 (AB056060) and OsGAD2 (AB056061) in the database, respectively. The 505 amino acid sequence also exhibited 62.9, 64.1, and 64.2% identity to Arabidopsis GAD (U9937), Nicotiana tabacum GAD (AF020425), and Petunia hybrida GAD (L16797), respectively. RicGAD was found to possess a highly conserved tryptophan residue but to lack the lysine cluster at the C-proximal position, as well as other stretches of positively charged residues. The GAD sequence was expressed heterologously using the high copy number plasmid pVUCH. Our activation analysis revealed that the maximal activation of RicGAD occurred in the presence of both $Ca^{2+}$ and calmodulin. The GAD-encoded 56-58 kDa protein was identified via Western blot analysis, using an anti-GAD monoclonal antibody. The results of our RT-PCR analyses revealed that RicGAD is expressed predominantly in rice roots obtained from rice seedlings grown under phosphorus deprivation conditions, and in non-germinated brown rice, which is known to have limited phosphorus bioavailability. These results indicate that RicGAD is a $Ca^{2+}$/calmodulin-dependent enzyme and that RicGAD is expressed primarily under phosphate deprivation conditions.
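
The clone length, open reading frame length, and protein size quoted above can be related with simple arithmetic. The sketch assumes that the complete ORF ends in one stop codon and uses the common rule of thumb of roughly 110 Da per amino acid residue; both are assumptions for illustration and are not stated in the abstract.

```python
clone_length_nt = 1712      # reported clone length in bp
orf_length_aa = 505         # reported ORF length in amino acids

# Coding region: three nucleotides per residue plus an assumed stop codon.
coding_nt = orf_length_aa * 3 + 3

# Whatever remains would be non-coding flanking sequence (5' and 3' combined).
noncoding_nt = clone_length_nt - coding_nt

# Rough mass estimate using about 110 Da per residue (rule of thumb).
approx_mass_kda = orf_length_aa * 110 / 1000

print(coding_nt, noncoding_nt, approx_mass_kda)
```

Under these assumptions the estimated mass is close to the lower end of the 56-58 kDa band reported from the Western blot, and about 194 nt of the clone lie outside the coding region.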