• Title/Abstract/Keywords: Biological Sequence Database

Search results: 93 items, processing time 0.027 s

Growth Optimization of Delftia sp. for the Odor Control of Organic Waste

  • 권혁구;정준오;추덕성;이장훈
    • Journal of the Korean Society of Environmental Health (한국환경보건학회지)
    • /
    • Vol. 35, No. 5
    • /
    • pp.393-401
    • /
    • 2009
  • We isolated and identified a microorganism with excellent ammonia-oxidizing activity for the biological control of ammonia gas in odor-producing materials from organic composting. The isolated strain was tested for growth characteristics and ammonia elimination efficiency under various conditions of temperature, pH, carbon concentration, and ammonia concentration. The strain was isolated from a culture broth used in a $NO_2$ production test with Griess-Ilosvay reagent. Analysis of the 16S rRNA sequence of the isolated strain using BLAST (Basic Local Alignment Search Tool), confirmed against RDP (Ribosomal Database Project II) and ERRD (The European Ribosomal RNA Database), indicates that the strain is related to Delftia sp. Microbial growth was monitored by measuring turbidity at OD660nm with a UV spectrophotometer (Shimadzu, UVmini-1240), and ammonia concentration was measured with a spectrophotometer (HACH, DR-4000). The optimum culture conditions for growth of the ammonia oxidizer Delftia sp. were $30^{\circ}C$, pH 7, a glucose concentration of 1.00%, and $(NH_4)_2SO_4$ at 0.5 g/l. Ammonia elimination efficiency was over 94% under the same conditions.

Bio Grid Computing and Biosciences Research Application

  • 김태호;김의용;염재범;고원규;곽희철;주현
    • Bioinformatics and Biosystems
    • /
    • Vol. 2, No. 2
    • /
    • pp.37-45
    • /
    • 2007
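  • Bioinformatics is the discipline that uses computers to process vast amounts of biological data and analyze the results, and its range of application is steadily widening in step with the rapid growth of IT. Data used in medical and life science research are typically diverse in kind and very large, so processing them inevitably requires grid computing technology built on high-speed networks. Advances in high-speed networking are driving the field of grid computing, which can bind the systems distributed within a computer pool into one as a substitute for a supercomputer. Recently, the bioinformatics field has likewise been using this advanced high-performance distributed computing technology to speed up data processing and make data management more efficient. Grid computing technology can be broadly divided into the development of application programs for data processing and the construction of databases for data management. Bioinformatics research programs in the former category include sequence alignment programs such as mpiBLAST and ClustalW-MPI, while projects such as BioSimGrid and Taverna were developed on the basis of grid database technology. This article introduces the grid computing environments developed to date for exploring unknown biological phenomena, the application programs for biomedical research, and grid database technology.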

DNA Sequences Compression using Repeat technique and Selective Encryption using modified Huffman's Technique

  • Syed Mahamud Hossein; Debashis De; Pradeep Kumar Das Mohapatra
    • International Journal of Computer Science & Network Security
    • /
    • Vol. 24, No. 8
    • /
    • pp.85-104
    • /
    • 2024
  • DNA (deoxyribonucleic acid) databases are growing tremendously, from millions to billions in a year. Storing and searching these databases therefore requires efficient lossless compression, together with an encryption algorithm for secure communication. Short repeated patterns are a key characteristic of biological sequences. The algorithm searches for exact repeats, substitutes each repeated substring with a corresponding ASCII code, and generates a library file, which compacts the data stream. The data are secured by the ASCII values and by the generated library file, which acts as a signature. Information security is the most challenging question from a communication perspective. Selective encryption is used for security; it is applied to the compressed data, to the library file, or to both files. In selective encryption only a fraction of the message is encrypted while the remainder is left unchanged, which is the central property of such a system. Huffman's algorithm is applied to the output of the first-phase repeat technique, and the level and node positions in the Huffman tree are altered for encryption. The main requirements are minimal storage and computation cost. The time and space complexities of the repeat algorithm are O(N^2) and O(N); those of the Huffman algorithm are O(n log n) and O(n log n). Artificial data of equal length are also tested with the algorithm. This modified Huffman technique reduces the compression rate and ratio. The experimental results show that encrypting only 58% to 100% of the actual file yields more than 99% modification of the file, with a compression rate of 1.97 bits/base.
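
The bits/base figure quoted in the abstract can be illustrated with ordinary Huffman coding. The following is a minimal sketch of the standard algorithm only, not the authors' repeat-substitution step or their modified, encrypting Huffman tree; the function names and the sample sequence are hypothetical.

```python
import heapq
from collections import Counter


def huffman_code_lengths(text):
    """Return {symbol: code length in bits} for a standard Huffman code."""
    freq = Counter(text)
    if len(freq) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(freq)): 1}
    # Heap entries: (weight, unique tie-breaker, {symbol: depth so far}).
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: d + 1 for s, d in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def bits_per_base(seq):
    """Average encoded bits per base under plain Huffman coding."""
    lengths = huffman_code_lengths(seq)
    freq = Counter(seq)
    return sum(freq[s] * lengths[s] for s in freq) / len(seq)


if __name__ == "__main__":
    # Hypothetical skewed sequence: A is frequent, G and T are rare.
    print(bits_per_base("AAAAAAAACCCCGGTT"))
```

A uniform base composition gives exactly 2 bits/base, so symbol-level Huffman coding only helps when the composition is skewed; this is one reason schemes like the one described first replace repeated substrings before entropy coding.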

An assessment of the taxonomic reliability of DNA barcode sequences in publicly available databases

  • Jin, Soyeong;Kim, Kwang Young;Kim, Min-Seok;Park, Chungoo
    • ALGAE
    • /
    • Vol. 35, No. 3
    • /
    • pp.293-301
    • /
    • 2020
  • DNA barcoding has a wide range of applications, such as taxonomic studies that help elucidate cryptic species and phylogenetic relationships, and the analysis of environmental samples for biodiversity monitoring and conservation assessments of species. After DNA barcode sequences are obtained, sequence similarity-based homology analysis is commonly used; that is, the obtained barcode sequences are compared against DNA barcode reference databases. This bioinformatic analysis necessarily implies that the overall quantity and quality of the reference databases must be stringently monitored so as not to impair the accuracy of species identification. With the development of next-generation sequencing techniques, a noticeably large number of DNA barcode sequences have been produced and stored in online databases, but their validity, accuracy, and reliability have not been extensively investigated. In this study, we investigated the amount and types of erroneous barcode sequences deposited in publicly accessible databases. Over 4.1 million sequences were investigated in three large-scale DNA barcode databases (NCBI GenBank, Barcode of Life Data System [BOLD], and Protist Ribosomal Reference database [PR2]) for four major DNA barcodes (cytochrome c oxidase subunit 1 [COI], internal transcribed spacer [ITS], ribulose bisphosphate carboxylase large chain [rbcL], and 18S ribosomal RNA [18S rRNA]); approximately 2% of the barcode sequences were found to be erroneous, and their taxonomic distributions were uneven. Our findings thus provide compelling evidence of data quality problems, along with insufficient and unreliable annotation of taxonomic data, in DNA barcode databases. We therefore suggest that if ambiguous taxa are encountered during barcoding analysis, further validation with other DNA barcode loci or morphological characters should be mandated.

Genome-wide survey and expression analysis of F-box genes in wheat

  • Kim, Dae Yeon;Hong, Min Jeong;Seo, Yong Weon
    • Korean Society of Crop Science: Conference Proceedings
    • /
    • Korean Society of Crop Science, 2017, 9th Asian Crop Science Association conference
    • /
    • pp.141-141
    • /
    • 2017
  • The ubiquitin-proteasome pathway is the major regulatory mechanism for selective protein degradation in a number of cellular processes and involves three steps: (1) ATP-dependent activation of ubiquitin by the E1 enzyme, (2) transfer of activated ubiquitin to E2, and (3) transfer of ubiquitin by the E3 complex to the protein to be degraded. F-box proteins are subunits of the SCF complex and confer specificity for the target substrate to be degraded. F-box proteins regulate many important biological processes such as embryogenesis, floral development, plant growth and development, biotic and abiotic stress responses, hormonal responses, and senescence. However, little is known about the F-box genes in wheat. The draft genome sequence of wheat (IWGSC Reference Sequence v1.0 assembly) was used for a genome-wide survey of the F-box gene family in wheat. The Hidden Markov Model (HMM) profiles of the F-box (PF00646), F-box-like (PF12937), F-box-like 2 (PF13013), FBA (PF04300), FBA_1 (PF07734), FBA_2 (PF07735), FBA_3 (PF08268), and FBD (PF08387) domains were downloaded from the Pfam database and searched against the IWGSC Reference Sequence v1.0 assembly. RNA-seq paired-end libraries from different stages of wheat (seedling, tillering, booting, and 1, 10, 20, and 30 days after flowering [DAF]) were constructed and sequenced on an Illumina HiSeq2000 for expression analysis of F-box protein genes. Basic analyses including Hisat, HTseq, DEseq, gene ontology analysis, and KEGG mapping were conducted to identify differentially expressed genes (DEGs) at the various stages and to annotate them. About 950 F-box domain proteins identified by Pfam were mapped to the wheat reference genome sequence by blastX (e-value < 0.05). Among them, more than 140 putative F-box protein genes were selected using cut-offs of fold change > 2, p-value < 0.01, and FDR < 0.01. Expression profiles of the selected F-box protein genes were shown by heatmap analysis, and the 144 putative F-box protein genes were clustered by expression pattern using average linkage and squared Euclidean distance. This work may provide valuable basic information for further investigation of the protein degradation mechanism of the ubiquitin-proteasome system involving F-box proteins during wheat developmental stages.
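
The selection cut-offs and the clustering distance named in the abstract can be sketched in a few lines. This is an illustrative sketch only: the record layout and gene identifiers are hypothetical, and `fold_change` is assumed to hold the magnitude of change between two stages, since the abstract does not say how up- and down-regulation were handled.

```python
from dataclasses import dataclass


@dataclass
class GeneStat:
    gene_id: str        # hypothetical identifier
    fold_change: float  # assumed: magnitude of change between two stages
    p_value: float
    fdr: float


def select_degs(stats, min_fold=2.0, max_p=0.01, max_fdr=0.01):
    """Keep genes that pass all three cut-offs quoted in the abstract."""
    return [
        g for g in stats
        if g.fold_change > min_fold and g.p_value < max_p and g.fdr < max_fdr
    ]


def squared_euclidean(a, b):
    """Squared Euclidean distance between two equal-length expression profiles."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b))


if __name__ == "__main__":
    # Made-up values, for illustration only.
    stats = [
        GeneStat("gene_a", 3.5, 0.001, 0.004),
        GeneStat("gene_b", 1.4, 0.001, 0.004),  # fails the fold-change cut-off
        GeneStat("gene_c", 5.0, 0.020, 0.050),  # fails the p-value and FDR cut-offs
    ]
    print([g.gene_id for g in select_degs(stats)])
    print(squared_euclidean([1.0, 2.0, 3.0], [1.0, 4.0, 6.0]))
```

In average-linkage clustering, the distance between two clusters is the mean of these pairwise distances over all cross-cluster gene pairs.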

EST Analysis system for panning gene

  • Hur, Cheol-Goo;Lim, So-Hyung;Goh, Sung-Ho;Shin, Min-Su;Cho, Hwan-Gue
    • Korean Society for Bioinformatics: Conference Proceedings
    • /
    • Korean Society for Bioinformatics and Systems Biology, 2000, International Symposium on Bioinformatics
    • /
    • pp.21-22
    • /
    • 2000
  • Expressed sequence tags (ESTs) are partial cDNA segments produced by 5' or 3' single-pass sequencing of cDNA clones; they are error-prone and generated in highly redundant sets. The advancement and expansion of genomics has led biologists to generate huge numbers of ESTs from a variety of organisms (human, microorganisms, and plants), and the cumulative number of ESTs exceeds 5.3 million. As EST data accumulate ever more rapidly, the need grows for EST analysis tools that extract biological meaning from the data. Among these needs, extracting protein sequences or functional motifs from ESTs is important for identifying their function in vivo. For that purpose, precise and accurate identification of the regions containing coding sequences (CDSs) is the first problem to solve, and it helps in extracting and detecting genuine CDSs and protein motifs from EST collections. Although several public tools are available for EST analysis, none accomplishes this objective. Furthermore, they target human or microbial ESTs rather than plant ESTs. Thus, to meet the urgent needs of collaborators working with plant ESTs and to establish an analysis system usable as general-purpose public software, we constructed a pipelined EST analysis system by integrating public software components. The software used is as follows: Phred/Cross-match for quality control and vector screening, NCBI Blast for similarity searching, ICATools for EST clustering, Phrap for EST contig assembly, and BLOCKS/Prosite for protein motif searching. The sample data set used to construct and verify the system consisted of 1,386 ESTs from human intrathymic T-cells, verified using the UniGene and Nr databases of NCBI. CDSs were extracted from the sample data set by comparing the sample data against protein sequence/motif databases, determining the matched protein sequences/motifs that agree with our defined parameters, and extracting the regions that show similarity. In the near future, software for peptide mass spectrometry fingerprint analysis, one of the fields of proteomics, is also expected to be integrated into the system and offered as a service. This pipelined EST analysis system will extend our knowledge of plant ESTs and proteins through the identification of unknown genes.
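
Comparing ESTs against protein sequence or motif databases implies translating each nucleotide read, and because the strand and frame of a single-pass read are unknown, translated searches consider all six reading frames. The sketch below shows only that translation step under the standard genetic code; it is not the authors' pipeline, and the function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq, frame=0):
    """Translate one reading frame; unknown codons (e.g. with N) become 'X'."""
    seq = seq.upper()
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translations(seq):
    """Frames 0-2 of the forward strand, then frames 0-2 of the reverse strand."""
    return [translate(s, f) for s in (seq, reverse_complement(seq)) for f in range(3)]


def longest_open_stretch(protein):
    """Longest run of residues between stop codons ('*') in a translation."""
    return max(protein.split("*"), key=len)


if __name__ == "__main__":
    for frame in six_frame_translations("ATGGCCTAA"):
        print(frame)
```

In a similarity-based approach such as the one described, the frame whose translation matches a database protein marks the candidate coding region; the longest stop-free stretch is only a crude stand-in when no match exists.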

Identification and phylogenetic analysis of the human endogenous retrovirus HERV-W pol in cDNA library of human fetal brain

  • Kim, Heui-Soo;Jeon, Seung-Heui;Yi, Joo-Mi;Kim, Tae-Hyung;Lee, Won-Ho
    • Journal of Life Science
    • /
    • Vol. 13, No. 3
    • /
    • pp.291-297
    • /
    • 2003
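  • The human endogenous retrovirus HERV-W has been implicated with MSRV, which was detected in patients with multiple sclerosis. Using a cDNA library derived from human fetal brain, we identified and analyzed two HERV-W family members (HWP-FB10 and HWP-FB12) by PCR. They showed 89% nucleotide sequence similarity to HERV-W (accession no. AF009668). Analysis of the pol gene at the amino acid sequence level revealed frameshifts and termination codons caused by point mutations or insertions/deletions. A molecular phylogenetic tree of the HERV-W family, constructed using sequence database entries, suggested that HWP-FB10 is very closely related to AC000064, which is derived from human chromosome 7q21-22. We discuss the prospects for determining which neighboring genes these new HERV-W pol family members are linked to and what functions they perform.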

The future of bioinformatics

  • Gribskov, Michael
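    • Korean Society for Bioinformatics: Conference Proceedings
    • /
    • Korean Society for Bioinformatics and Systems Biology, 2003, Proceedings of the 2nd Annual Conference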
    • /
    • pp.1-1
    • /
    • 2003
  • It is clear that computers will play a key role in the biology of the future. Even now, it is virtually impossible to keep track of the key proteins, their names and associated gene names, physical constants (e.g. binding constants, reaction constants, etc.), and known physical and genetic interactions without computational assistance. In this sense, computers act as an auxiliary brain, allowing one to keep track of thousands of complex molecules and their interactions. With the advent of gene expression array technology, many experiments are simply impossible without this computer assistance. In the future, as we seek to integrate the reductionist description of life provided by genomic sequencing into complex and sophisticated models of living systems, computers will play an increasingly important role in both analyzing data and generating experimentally testable hypotheses. The future of bioinformatics is thus being driven by potent technological and scientific forces. On the technological side, new experimental technologies such as microarrays, protein arrays, high-throughput expression and three-dimensional structure determination provide rapidly increasing amounts of detailed experimental information on a genomic scale. On the computational side, faster computers, ubiquitous computing systems, and high-speed networks provide a powerful but rapidly changing environment of potentially immense power. The challenges we face are enormous: How do we create stable data resources when both the science and computational technology change rapidly? How do we integrate and synthesize information from many disparate subdisciplines, each with its own vocabulary and viewpoint? How do we 'liberate' the scientific literature so that it can be incorporated into electronic resources? How do we take advantage of advances in computing and networking to build the international infrastructure needed to support a complete understanding of biological systems? The seeds of the solutions to these problems exist, at least partially, today. These solutions emphasize ubiquitous high-speed computation, database interoperation, federation, and integration, and the development of research networks that capture scientific knowledge rather than just the ABCs of genomic sequence. I will discuss a number of these solutions, with examples from existing resources, as well as areas where solutions do not currently exist, with a view to defining what bioinformatics and biology will look like in the future.

Identification of Candidate Porcine miRNA-302/367 Cluster and Its Function in Somatic Cell Reprogramming

  • Son, Dong-Chan;Hwang, Jae Yeon;Lee, Chang-Kyu
    • Reproductive and Developmental Biology
    • /
    • Vol. 38, No. 2
    • /
    • pp.79-84
    • /
    • 2014
  • MicroRNAs (miRNAs) are small noncoding RNAs of approximately 22 nucleotides that control gene expression at the posttranscriptional level through translational inhibition and destabilization of their target mRNAs. miRNAs are phylogenetically conserved and have been shown to be instrumental in a wide variety of key biological processes, including cell cycle regulation, apoptosis, metabolism, imprinting, and differentiation. A recent paper showed that expression of the miRNA-302/367 cluster, which is abundantly expressed in mouse and human embryonic stem cells (ESCs), can directly and efficiently reprogram mouse and human somatic cells into induced pluripotent stem cells (iPSCs) in the absence of any of the four factors Oct4, Sox2, c-Myc, and Klf4. To apply this efficient method to pigs, we analyzed the porcine genomic sequence containing the predicted porcine miRNA-302/367 cluster using the ENSEMBL database, generated a non-replicative episomal vector system carrying the miRNA-302/367 cluster derived from porcine embryonic fibroblasts (PEF), and attempted to produce porcine iPSCs by transfection of the cluster. Colonies expressing EGFP and forming a compact shape were found, but they could not be established as iPSC lines. Our data show that the pig miRNA-302/367 cluster could not satisfy the requirements for reprogramming PEF to pluripotency. To generate pig iPSC lines with miRNAs, further studies on the role of miRNAs in pluripotency and new transfection trials combined with conventional reprogramming factors are needed.

Classification and Characteristics of Chitin/Chitosan Hydrolases

  • 이한승
    • Journal of Life Science
    • /
    • Vol. 18, No. 11
    • /
    • pp.1617-1624
    • /
    • 2008
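  • Chitin and its deacetylated form, chitosan, are among the most abundant forms of biomass on Earth. Chitin and chitosan show diverse physiological activities, including antimicrobial activity, immune enhancement, and heavy metal adsorption, and are widely applied in the food, pharmaceutical, and environmental industries. Enzymes that hydrolyze chitin/chitosan, along with their tertiary structures and genes, have been reported from all domains of life: bacteria, archaea, and eukaryotes. Carbohydrate-hydrolyzing enzymes are classified in the CAZy (Carbohydrate Active Enzymes) database according to their amino acid sequences; interestingly, chitinases and chitosanases have so far been classified into 14 glycosyl hydrolase (GH) families (GH2, GH5, GH7, GH8, GH18, GH19, GH20, GH46, GH48, GH73, GH75, GH80, GH84, GH85). As one approach to finding new genetic resources, this review discusses the types, structures, and enzymatic characteristics of the chitin/chitosan hydrolases belonging to each GH family, following the recently revised glycosyl hydrolase family classification.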