• Title/Abstract/Keywords: sequence database


The List of Korean Organisms Registered in the NCBI Nucleotide Database for Environmental DNA Research (환경유전자 연구를 위한 NCBI Nucleotide 데이터베이스에 등록된 국내 생물 목록 현황)

  • Ihn-Sil Kwak;Chang Woo Ji;Won-Seok Kim;Dongsoo Kong
    • Korean Journal of Ecology and Environment / v.55 no.4 / pp.352-359 / 2022
  • With recent advances in genetic technology, interest is growing in environmental DNA (eDNA) as a molecular approach to studying biodiversity. eDNA has many advantages over traditional methods for surveying biological communities in the environment, but it depends heavily on the established nucleotide sequence database. This study comprehensively analyzed, at the genus level, the habitat status and classification of Korean registered taxon groups (phytoplankton, zooplankton, macroinvertebrates, and fish) for the genes mainly used in eDNA research (12S rRNA, 16S rRNA, 18S rRNA, COI, and CYTB). Phytoplankton and zooplankton showed the highest proportion of taxa in 18S rRNA, and macroinvertebrates showed the highest proportion in COI. In fish, all genes except 18S rRNA showed a high proportion of taxa. Among the top 20 genera by biological density in each Korean registered taxon group, most phytoplankton genera were registered in 18S rRNA, and macroinvertebrates had the largest number of COI nucleotide sequences. In addition, nucleotide sequences were confirmed for the top 20 fish genera in 12S rRNA, 16S rRNA, and CYTB. These results provide comprehensive information on the genes suitable for eDNA research in each taxon group.

Proteome Data Analysis of Hairy Root of Panax ginseng : Use of Expressed Sequence Tag Data of Ginseng for the Protein Identification (인삼 모상근 프로테옴 데이터 분석 : 인삼 EST database와의 통합 분석에 의한 단백질 동정)

  • Kwon, Kyung-Hoon;Kim, Seung-Il;Kim, Kyung-Wook;Kim, Eun-A;Cho, Kun;Kim, Jin-Young;Kim, Young-Hwan;Yang, Deok-Chun;Hur, Cheol-Goo;Yoo, Jong-Shin;Park, Young-Mok
    • Journal of Plant Biotechnology / v.29 no.3 / pp.161-170 / 2002
  • For the hairy root of Panax ginseng, we obtained mass spectra from MALDI/TOF/MS analysis and tandem mass spectra from ESI/Q-TOF/MS analysis. A mass spectrum provides the molecular weights of peptide fragments digested by a protease such as trypsin, whereas a tandem mass spectrum yields the amino acid sequences of the digested peptides. Each amino acid sequence can serve as a query in a BLAST search to identify proteins. For animal or plant specimens whose genome sequences are known, expressed proteins can be identified from mass spectra easily and with high accuracy. For other specimens such as ginseng, however, accurate identification is difficult because not all protein sequences are available yet. Here we compared the mass spectra and the peptide amino acid sequences with the ginseng expressed sequence tag (EST) database. Each matched EST sequence was then used as a query in a BLAST search for protein identification, and alignment with the EST sequences yielded the correct protein information. Of the peptide sequences from ESI/Q-TOF/MS, 90% matched EST sequences. Compared with the 68% of the same sequences that matched the NCBI nr database, the ginseng EST search gave 22 percentage points more matches. In peptide mass fingerprinting from MALDI/TOF/MS, only about 19% of the peptide matches from the nr database (9 proteins of 47 spots) correlated with the ginseng EST database. From these results, we suggest that amino acid sequencing by tandem mass spectrum analysis may be necessary for protein identification in ginseng proteome analysis.

Genetic Variations of Candida glabrata Clinical Isolates from Korea using Multi-locus Sequence Typing (Multi-locus sequence typing을 이용한 한국에서 분리한 Candida glabrata 임상균주의 유전자 유형 분석)

  • Kang, Min Ji;Lee, Kyung Eun;Jin, Hyunwoo
    • Journal of Life Science / v.30 no.2 / pp.122-128 / 2020
  • Although Candida albicans is the major fungal pathogen of candidemia, severe infections by non-albicans Candida (NAC) spp. have been increasing in recent years. Among NAC spp., C. glabrata has emerged as the second most common pathogen. However, few studies have investigated its structure, epidemiology, and basic biology. In the present study, multi-locus sequence typing (MLST) was performed on a total of 102 C. glabrata clinical isolates obtained from various types of clinical specimens. For MLST, six housekeeping genes (FKS, LEU2, NMT1, TRP1, UGP1, and URA3) were amplified and sequenced. The results were analyzed using the C. glabrata database. Across a total of 3,345 base pairs of DNA sequence, 49 variable nucleotide sites were found, and 12 different sequence types (STs) were identified among the 102 clinical isolates. The data also showed that the undetermined ST1 was the predominant ST in Korea. Further, seven undetermined STs (USTs), UST2 through UST8, were classified at specific loci. The data from this study may provide a fundamental database for further studies on C. glabrata, including its epidemiology and evolution. The data may also contribute to the development of novel antifungal agents and diagnostic tests.
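The typing step described in this abstract can be illustrated with a minimal sketch. It assumes that each isolate is reduced to a tuple of allele numbers at the six loci and that profiles missing from a reference table receive a provisional "UST" label; the function names, the reference table, and all data below are hypothetical and are not taken from the paper or from any MLST database.

```python
from collections import Counter

# Loci named in the abstract; the order of the tuple entries follows this list.
LOCI = ("FKS", "LEU2", "NMT1", "TRP1", "UGP1", "URA3")


def variable_sites(aligned):
    """Count alignment columns that contain more than one nucleotide."""
    return sum(1 for column in zip(*aligned) if len(set(column)) > 1)


def assign_types(profiles, known):
    """Map each allele profile to a known ST, or to a provisional UST label."""
    novel = {}
    result = {}
    for isolate, profile in profiles.items():
        if profile in known:
            result[isolate] = known[profile]
        else:
            # First time this unknown profile is seen: give it the next UST number.
            if profile not in novel:
                novel[profile] = f"UST{len(novel) + 1}"
            result[isolate] = novel[profile]
    return result


# Toy data (hypothetical): one known profile and four isolates.
known = {(1, 1, 1, 1, 1, 1): "ST-A"}
profiles = {
    "iso1": (1, 1, 1, 1, 1, 1),
    "iso2": (1, 2, 1, 1, 1, 1),
    "iso3": (1, 2, 1, 1, 1, 1),
    "iso4": (2, 2, 1, 1, 3, 1),
}

types = assign_types(profiles, known)
print(types)
print(Counter(types.values()).most_common(1))  # most frequent type in the toy set
print(variable_sites(["ACGTAC", "ACGTAT", "ACCTAC"]))
```

Counting types with `Counter` mirrors how a predominant ST would be reported, but the actual study relied on the curated C. glabrata database rather than a hand-built table.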

Genotyping of HLA-B by Polymerase Chain Reaction-Sequence Specific Primer (Polymerase Chain Reaction-Sequence Specific Primer를 이용한 HLA-B 유전자의 DNA 다형성 조사)

  • Jang, Soon-Mo
    • Korean Journal of Clinical Laboratory Science / v.39 no.3 / pp.147-150 / 2007
  • Most expressed HLA (human leukocyte antigen) loci exhibit a remarkable degree of allelic polymorphism, which derives from sequence differences predominantly localized to discrete hypervariable regions of the amino-terminal domain of the molecule. In this study, HLA-B genotypes were determined in twenty unrelated Korean students using the PCR-SSP (polymerase chain reaction-sequence specific primer) technique. Several specific primer pairs were used to assign the HLA-B gene (B*4001/4007, B*4901/5001/4501, B*3701, and B*5801). By PCR-SSP, HLA-B*3701 was detected in one subject (5%), HLA-B*5801 in four (20%), HLA-B*4001/4007 in nineteen (95%), and HLA-B*4901/5001/4501 in all twenty. This study shows that the PCR-SSP technique is a relatively simple, fast, and practical tool for determining HLA-B genotypes. Moreover, these genotype frequencies of the HLA-B gene could be useful for database studies before being applied to individual identification and transplantation immunity.


Sequence Validation for the Identification of the White-Rot Fungi Bjerkandera in Public Sequence Databases

  • Jung, Paul Eunil;Fong, Jonathan J.;Park, Myung Soo;Oh, Seung-Yoon;Kim, Changmu;Lim, Young Woon
    • Journal of Microbiology and Biotechnology / v.24 no.10 / pp.1301-1307 / 2014
  • White-rot fungi of the genus Bjerkandera are cosmopolitan and have shown potential for industrial application and bioremediation. When distinguishing morphological characters are no longer present (e.g., cultures or dried specimen fragments), characterizing true sequences of Bjerkandera is crucial for accurate identification and application of the species. To build a framework for molecular identification of Bjerkandera, we carefully identified specimens of B. adusta and B. fumosa from Korea based on morphological characters, followed by sequencing the internal transcribed spacer region and 28S nuclear ribosomal large subunit. The phylogenetic analysis of Korean Bjerkandera specimens showed clear genetic differentiation between the two species. Using this phylogeny as a framework, we examined the identification accuracy of sequences available in GenBank. Analyses revealed that many Bjerkandera sequences in the database are either misidentified or unidentified. This study provides robust reference sequences for sequence-based identification of Bjerkandera, and further demonstrates the presence and dangers of incorrect sequences in GenBank.

DNA Polymorphism Analysis of the HLA-DRB1 Gene Using Polymerase Chain Reaction-Sequence Specific Primer (PCR-SSP) among Korean Subjects

  • Lee, Kyung-Ok;Park, Taek-Kyu;Park, Young-Suk;Oh, Moon-Ju;Kim, Yoon-Jung
    • BMB Reports / v.29 no.1 / pp.45-51 / 1996
  • Most expressed HLA loci exhibit a remarkable degree of allelic polymorphism, which derives from sequence differences predominantly localized to discrete hypervariable regions of the amino-terminal domain of the molecule. In this study, HLA-DRB1 genotypes were determined in eighteen control cell lines and 112 unrelated Koreans using the PCR-SSP (polymerase chain reaction-sequence specific primer) technique. Twenty-nine specific primer pairs were used to assign the DRB1 gene. The results for the control cells correlated well with previously reported data. The heterozygosity and homozygosity of the DRB1 gene were 0.786 and 0.214, respectively. Among a total of 41 different DRB1 alleles and 83 genotypes, the most frequent allele and genotype were DRB1*04 and DRB1*0901/1501, respectively. This study shows that the PCR-SSP technique is a relatively simple, fast, and practical tool for determining HLA-DRB1 genotypes. Moreover, these results (allele and genotype frequencies and heterozygosity of the HLA-DRB1 gene) could be useful for database studies before being applied to individual identification and transplantation immunity.
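As a rough illustration of the summary statistics reported in this abstract, the sketch below computes observed heterozygosity, homozygosity, and allele frequencies from a list of genotypes. It assumes that heterozygosity means the fraction of individuals carrying two different alleles, which the abstract does not state explicitly. The function name and the five toy genotypes are hypothetical and are not the study's data.

```python
from collections import Counter


def genotype_summary(genotypes):
    """Return (heterozygosity, homozygosity, allele frequencies) for diploid genotypes."""
    n = len(genotypes)
    # Observed heterozygosity: share of individuals whose two alleles differ.
    heterozygosity = sum(1 for a, b in genotypes if a != b) / n
    # Each individual contributes two alleles, so frequencies are counts / (2n).
    allele_counts = Counter(allele for pair in genotypes for allele in pair)
    allele_freq = {allele: count / (2 * n) for allele, count in allele_counts.items()}
    return heterozygosity, 1 - heterozygosity, allele_freq


# Toy genotypes (hypothetical), written as pairs of DRB1 allele labels.
genotypes = [
    ("0901", "1501"),
    ("0405", "0405"),
    ("0901", "1501"),
    ("0101", "0803"),
    ("1501", "1501"),
]

het, hom, freq = genotype_summary(genotypes)
print(f"heterozygosity = {het:.3f}, homozygosity = {hom:.3f}")
print(max(freq, key=freq.get), freq)
```

With real data the same function would take one pair per typed individual; ambiguous primer-group results such as combined allele groups would need a decision rule that this sketch does not attempt.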


GTVseq: A Web-based Genotyping Tool for Viral Sequences

  • Shin, Jae-Min;Park, Ho-Eun;Ahn, Yong-Ju;Cho, Doo-Ho;Kim, Ji-Han;Kee, Mee-Kyung;Kim, Sung-Soon;Lee, Joo-Shil;Kim, Sang-Soo
    • Genomics & Informatics / v.6 no.1 / pp.54-58 / 2008
  • Genotyping Tool for Viral SEQuences (GTVseq) provides scientists with genotype information on viral genome sequences, including HIV-1, HIV-2, HBV, HCV, HTLV-1, HTLV-2, poliovirus, enterovirus, flavivirus, Hantavirus, and rotavirus. GTVseq produces alternative and additive genotype information for the query viral sequences based on two different, but related, scoring methods. The genotype information is reported graphically for the reference genotype matches, and each graphical output is linked to the detailed sequence alignments between the query and the matched reference sequences. GTVseq also reports potential 'repeats' and/or 'recombination' sequence regions in a separate window. GTVseq does not completely replace other well-known genotyping tools such as NCBI's virus sequence genotyping tool (http://www.ncbi.nlm.nih.gov/projects/genotyping/formpage.cgi), but it provides additional information useful for confirming or further investigating the genotype(s) of newly isolated viral sequences.

Comparative Analysis of Expressed Sequence Tags from Flammulina velutipes at Different Developmental Stages

  • Joh, Joong-Ho;Kim, Kyung-Yun;Lim, Jong-Hyun;Son, Eun-Suk;Park, Hye-Ran;Park, Young-Jin;Kong, Won-Sik;Yoo, Young-Bok;Lee, Chang-Soo
    • Journal of Microbiology and Biotechnology / v.19 no.8 / pp.774-780 / 2009
  • Flammulina velutipes is a popular edible basidiomycete mushroom found in East Asia and is commonly known as winter mushroom. Mushroom development, which shows dramatic morphological changes in response to different environmental factors, is of scientific and commercial interest. To create a genetic database and isolate genes regulated during mushroom development, cDNA libraries were constructed from three developmental stages of F. velutipes: mycelium, primordium, and fruit body. We generated a total of 5,431 expressed sequence tags (ESTs) from randomly selected clones from the three cDNA libraries. Of these, 3,332 unique genes (unigenes) were identified, consisting of 2,442 (73%) singlets and 890 (27%) contigs. This corresponds to a redundancy of 39%. Using a homology search in the gene ontology database, the EST unigenes were classified into the three categories of molecular function (28%), biological process (29%), and cellular component (6%). Comparative analysis found great variation in the unigene expression pattern among the three unigene sets generated from the cDNA libraries of mycelium, primordium, and fruit body. Between 19% and 34% of the total unigenes were unique to each unigene set, and only 3% were shared among all three unigene sets. The unique and common representation of F. velutipes unigenes across the three cDNA libraries suggests strongly differential gene expression profiles during the different developmental stages of the F. velutipes mushroom.
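The percentages quoted in this abstract can be checked with a few lines of arithmetic. The sketch below assumes that redundancy is defined as one minus the ratio of unigenes to total ESTs; the abstract does not give the formula, but this definition reproduces the reported 39%. Variable names are illustrative.

```python
# Figures quoted in the abstract.
total_ests = 5431
singlets = 2442
contigs = 890

# Unigenes are the singlets plus the contigs.
unigenes = singlets + contigs

# Assumed definition: fraction of ESTs that did not yield a new unigene.
redundancy = 1 - unigenes / total_ests

print(f"unigenes:   {unigenes}")
print(f"singlets:   {singlets / unigenes:.0%}")
print(f"contigs:    {contigs / unigenes:.0%}")
print(f"redundancy: {redundancy:.0%}")
```

Under that assumption the script gives 3,332 unigenes, 73% singlets, 27% contigs, and 39% redundancy, matching the abstract.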

Optimal Design of Thick Composite Wing Structure using Laminate Sequence Database (적층 시퀀스 데이터베이스를 이용한 복합재 날개 구조물의 최적화 설계)

  • Jang, Jun Hwan;Ahn, Sang Ho
    • Composites Research / v.30 no.1 / pp.52-58 / 2017
  • This paper presents an optimum design methodology for a composite wing structure that automatically calculates the safety margin using an optimization framework that integrates failure modes. In particular, the framework makes it possible to optimize the sizing procedure so as to prevent failure modes, which has the greatest effect on reducing the sizing time of a composite structure. The main failure modes considered were first-ply failure, buckling, and the bolted joint stress field, and the margins were calculated so as to minimize the weight. The design variable is a laminate sequence database, and the responses are strain, buckling, and the bolted joint stress field. The objective function is the mass of the wing structure. The results of the buckling analysis were compared using the finite element model to verify the robustness and reliability of Composite Optimizer.

Estimation of Substring Selectivity in Biological Sequence Database (생물학 서열 데이타베이스에서 부분 문자열의 선적도 추정)

  • 배진욱;이석호
    • Journal of KIISE:Databases / v.30 no.2 / pp.168-175 / 2003
  • Until now, substring selectivity has been estimated in two steps: first, a count-suffix tree holding statistical information about substrings is built; second, substring selectivity is estimated from it. However, building a count-suffix tree from biological sequences is practically impossible because the sequences are too long. This paper therefore proposes a novel data structure, the count q-gram tree, consisting of fixed-length substrings. The count q-gram tree retains the exact counts of all substrings whose lengths are at most q; it is built in O(N) time, and its size does not depend on N, the total length of all sequences. This paper also presents an estimation technique, k-MO, which can choose the overlap length of the substrings split from a query string; this choice affects both the accuracy of the selectivity estimate and the query processing time. Experiments show that k-MO estimates selectivity very accurately.
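The idea in this abstract can be sketched in a few lines, under clearly stated assumptions. A flat `Counter` stands in for the count q-gram tree, and the estimator is a generic Markov-style chain over pieces that overlap by k characters. The abstract does not define k-MO, so this is not the paper's algorithm, and the function names are hypothetical.

```python
from collections import Counter


def build_qgram_counts(sequences, q):
    """Count every substring of length 1..q across all sequences."""
    counts = Counter()
    for seq in sequences:
        for length in range(1, q + 1):
            for start in range(len(seq) - length + 1):
                counts[seq[start:start + length]] += 1
    return counts


def estimate_occurrences(query, counts, q, k):
    """Estimate how often `query` occurs by chaining pieces that overlap by k."""
    if not 1 <= k < q:
        raise ValueError("k must satisfy 1 <= k < q")
    if len(query) <= q:
        return float(counts[query])  # short queries are answered exactly
    estimate = float(counts[query[:q]])
    end = q  # number of query characters covered so far
    while end < len(query) and estimate > 0:
        start = end - k
        piece = query[start:start + q]  # may be shorter than q at the tail
        overlap = query[start:end]      # k characters shared with the previous piece
        # Conditional step: occurrences of the piece given its overlapping prefix.
        estimate *= counts[piece] / counts[overlap]
        end = start + len(piece)
    return estimate


counts = build_qgram_counts(["ACGTACGT"], q=3)
print(estimate_occurrences("ACGT", counts, q=3, k=2))  # true count in the toy sequence is 2
print(estimate_occurrences("CG", counts, q=3, k=2))    # exact lookup, no estimation needed
```

Because only substrings of length at most q are stored, the table size is bounded by the alphabet size and q rather than by the total sequence length. A larger k uses more context per step, which tends to help accuracy but requires more pieces per query.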