• Title/Abstract/Keywords: encoded DNA sequences

Search results: 72 (processing time 0.024 s)

Fast Matching Method for DNA Sequences

  • 김진욱;김은상;안융기;박근수
    • 한국정보과학회논문지:시스템및이론 / Vol. 36, No. 4 / pp. 231-238 / 2009
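  • A DNA sequence is the fundamental information that characterizes each species, and comparing DNA sequences across species is an important task. Because DNA sequences are very long and species are numerous, efficient storage matters as much as fast matching in DNA sequence comparison. In other words, a fast string matching method suited to encoded DNA sequences is needed. This paper presents a fast matching algorithm for encoded DNA sequences that requires no decoding during matching. The proposed algorithm uses an encoding that packs four characters into one byte and combines the suffix approach with the multiple pattern matching approach. Experimental results show that the proposed method is about five times faster than AGREP, which is the fastest result among known algorithms.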
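The four-characters-per-byte encoding mentioned in the abstract can be illustrated with a minimal sketch. The 2-bit code assignment (A=00, C=01, G=10, T=11), the left-aligned padding of a short final byte, and the function names `pack` and `unpack` are assumptions made for illustration; the abstract does not specify them, and this is not the paper's matching algorithm itself.

```python
# Hypothetical 2-bit code per base; the paper's actual assignment is not given.
CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BASES = "ACGT"


def pack(seq: str) -> bytes:
    """Pack a DNA string into bytes, four bases per byte (first base in the high bits)."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODE[base]
        # Left-align a final chunk shorter than four bases (pads with zero bits).
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)


def unpack(data: bytes, length: int) -> str:
    """Recover the first `length` bases from packed bytes."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 0b11])
    return "".join(bases[:length])
```

Because a pattern may begin at any of the four base positions inside a byte, matching directly on packed data has to account for four possible alignments of the same pattern, which is presumably one reason a multiple pattern matching technique is relevant here.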

Taxonomy and phylogeny of the genus Cryptomonas (Cryptophyceae, Cryptophyta) from Korea

  • Choi, Bomi;Son, Misun;Kim, Jong Im;Shin, Woongghi
    • ALGAE / Vol. 28, No. 4 / pp. 307-330 / 2013
  • The genus Cryptomonas is easily recognized by its two flagella, greenish-brown color, and swaying behavior. Its species have relatively simple morphology and few diagnostic characters, which makes differentiating between species of the genus a major difficulty. To understand species delineation and phylogenetic relationships among Cryptomonas species, the nuclear-encoded internal transcribed spacer 2 (ITS2), partial large subunit (LSU) and small subunit ribosomal DNA (rDNA), and chloroplast-encoded psbA and LSU rDNA sequences were determined and used for phylogenetic analyses, using Bayesian and maximum likelihood methods. In addition, secondary structures were predicted for the nuclear-encoded ITS2 sequences and were used to determine nine species and four unidentified species from 47 strains. Sequences of helix I, II, and IIIb in the ITS2 secondary structure were very useful for the identification of Cryptomonas species. However, helix IV was the most variable region across species in the alignment. The phylogenetic tree showed that fourteen species were monophyletic. However, some strains of C. obovata had chloroplasts with a pyrenoid while others lacked one, even though this feature is used as a key character in a few species. Therefore, classification systems depending solely on morphological characters are inadequate, and require the use of molecular data.

Cloning and characterization of polyA- RNA transcripts encoded by activated B1-like retrotransposons in mouse erythroleukemia MEL cells exposed to methylation inhibitors

  • Tezias, Sotirios S.;Tsiftsoglou, Asterios S.;Amanatiadou, Elsa P.;Vizirianakis, Ioannis S.
    • BMB Reports / Vol. 45, No. 2 / pp. 126-131 / 2012
  • We have previously identified a DNA silent region located downstream of the 3'-end of the ${\beta}^{major}$ globin gene (designated B1-559) that contains a B1 retrotransposon, consensus binding sites for erythroid specific transcription factors and shares the capacity to act as promoter in hematopoietic cells interacting with ${\beta}$-globin gene LCR sequences in vitro. In this study, we have cloned four new non-polyA RNA transcripts being detected upon blockade of murine erythroleukemia (MEL) cell differentiation to erythroid maturation by methylation inhibitors and demonstrated that two of them share high structural homology with sequences of B1 element found within the B1-559 region. Although it is not clear yet whether and how these RNAs interfere with induction of erythroid maturation, these data provide evidence for the first time showing that methylation inhibitors can activate silent repetitive DNA sequences in MEL cells and may have implications in cancer chemotherapy using demethylating drugs as antineoplastic agents.

Trend and Technology of Gene and Genome Research

  • 이진성;김기환;서동상;강석우;황재삼
    • 한국잠사곤충학회지 / Vol. 42, No. 2 / pp. 126-141 / 2000
  • A major step towards understanding the genetic basis of an organism is the complete sequence determination of all genes in the target genome. The nucleotide sequence encoded in the genome contains the information that specifies the amino acid sequence of every protein and functional RNA molecule. In principle, it will be possible to identify every protein responsible for the structure and function of the body of the target organism. The pattern of expression in different cell types will specify where and when each protein is used. The amino acid sequence of the protein encoded by each gene will be derived from the conceptual translation of the nucleotide sequence. Comparison of these sequences with those of known proteins, whose sequences are stored in databases, will suggest an approximate function for many proteins. This mini review describes the development of new sequencing methods and the optimization of sequencing strategies for whole genome, various cDNA and genomic analysis.

Analysis of Tubulysin Biosynthetic Genes in Archangium gephyra

  • 최주오;박태준;강다운;이정주;김영필;이필구;정재용;조경연
    • 한국미생물·생명공학회지 / Vol. 49, No. 3 / pp. 458-465 / 2021
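  • Tubulysin is a bioactive secondary metabolite of myxobacterial origin that shows strong anticancer activity against a variety of cancer cell lines. In this study, genome analysis of two tubulysin-producing myxobacterial strains, Archangium gephyra MEHO_002 and MEHO_004, revealed a gene cluster presumed to contain the tubulysin biosynthetic genes, and gene inactivation by plasmid insertion confirmed that these genes are directly involved in tubulysin production. The tubulysin biosynthetic gene clusters (tubA~tubF) of A. gephyra MEHO_002 and MEHO_004 were 97% identical to each other in DNA sequence, and the amino acid sequences of the encoded proteins were 97-100% similar. The tubulysin biosynthetic gene clusters of MEHO_002 and MEHO_004 were 86% identical in DNA sequence to that of Cystobacter sp. SBCb004, a known tubulysin-producing myxobacterium. The organization of the gene clusters was the same as that of the SBCb004 tubulysin biosynthetic gene cluster, except that the tubZ gene was absent. The amino acid sequences of the proteins encoded by each gene were 88-97% similar to those of the proteins encoded by the tubulysin biosynthetic genes of Cystobacter sp. SBCb004, and the domain organization of each protein was also identical.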

Characterization of Structural Variations in the Context of 3D Chromatin Structure

  • Kim, Kyukwang;Eom, Junghyun;Jung, Inkyung
    • Molecules and Cells / Vol. 42, No. 7 / pp. 512-522 / 2019
  • Chromosomes located in the nucleus form discrete units of genetic material composed of DNA and protein complexes. The genetic information is encoded in linear DNA sequences, but its interpretation requires an understanding of three-dimensional (3D) structure of the chromosome, in which distant DNA sequences can be juxtaposed by highly condensed chromatin packing in the space of nucleus to precisely control gene expression. Recent technological innovations in exploring higher-order chromatin structure have uncovered organizational principles of the 3D genome and its various biological implications. Very recently, it has been reported that large-scale genomic variations may disrupt higher-order chromatin organization and as a consequence, greatly contribute to disease-specific gene regulation for a range of human diseases. Here, we review recent developments in studying the effect of structural variation in gene regulation, and the detection and the interpretation of structural variations in the context of 3D chromatin structure.

Pepstatin-Insensitive Carboxyl Proteinase: A Biochemical Marker for Late Lysosomes in Amoeba proteus

  • Hae Kyung Kwon;HyeonJung Kim;Tae In Ahn
    • Animal cells and systems / Vol. 3, No. 2 / pp. 221-228 / 1999
  • In order to find a biochemical marker for late lysosomes, we characterized two cDNAs which were cloned by using a monoclonal antibody (mAb) against lysosomes in Amoeba proteus as a probe. The two cDNAs, a 1.3-kb cDNA in pBSK-lys45 and a 1.6-kb cDNA in pBSK-lys60, were found to encode proteins homologous to pepstatin-insensitive carboxyl proteinases (PICPs). E. coli transformed with pBSK-lys45 produced two immunopositive polypeptides (45 and 43 kDa), and the 1274-base cDNA encoded a 44,733-Da protein (Lys45) of 420 amino acids containing one site for a core oligosaccharide. On the other hand, E. coli transformed with pBSK-lys60 produced several polypeptides (64, 54, 45, 41, and 37 kDa) reacting with the mAb. The cDNA contained 1629 bases and encoded a 59,231-Da protein (Lys60) of 530 amino acids containing two sites for asparagine-linked core oligosaccharides. These two cDNAs showed identities of 60.3% in nucleotide sequences and 23.6% in amino acid sequences. Lys45 and Lys60 appeared to share XXEFQK as a common antigenic domain. The amino acid sequence of the Lys45 protein showed 17.4% identity and 40.9% similarity to that of PICP from Pseudomonas sp. 101. On the other hand, Lys60 showed 24.3% identity and 51.9% similarity with human lysosomal PICP in the amino acid sequence. A putative active center for serine protease, GTS*xxxxxFxG, was found to be conserved among PICP homologues. The two PICPs are the first reported enzymatic markers for late lysosomes.

Genetic Organization of the Recombinant Bacillus pasteurii Urease Genes Expressed in Escherichia coli

  • Kim, Sang-Dal;Hausinger, Robert P.
    • Journal of Microbiology and Biotechnology / Vol. 4, No. 2 / pp. 108-112 / 1994
  • The genetic organization of the urease gene cluster from an alkalophilic Bacillus pasteurii was determined by subcloning and Tn5 transposon mutagenesis of a 10.7 kilobasepair cloned fragment. A region of DNA between 5.0 and 6.0 kb in length is necessary for urease activity. In vitro transcription-translation analysis of transposon insertion mutants of the cloned urease genes demonstrated that the major ($M_r$ 67,000) and minor ($M_r$ 20,000) structural peptides of urease are encoded at one end of the urease gene cluster and at least 3 additional polypeptides are encoded by adjacent DNA sequences.

Isolation of $\beta$-Lactamase Inhibitory Protein from Streptomyces exfoliatus SMF19 and Cloning of the Corresponding Gene

  • PARK, HYEON-UNG;KYE JOON LEE
    • Journal of Microbiology and Biotechnology / Vol. 6, No. 6 / pp. 369-374 / 1996
  • The ${\beta}$-lactamase inhibitory protein (BLIP) produced by Streptomyces exfoliatus SMF19 was purified (33 kDa) and the N-terminal amino acid sequence was determined as NH2-ATSVVAWGGNND. A genomic DNA library of S. exfoliatus SMF19 was constructed in pWE15, and recombinants harbouring the corresponding gene were selected by colony hybridization to a mixture of 36-mer oligonucleotides designed from the N-terminal amino acid sequence. The corresponding gene (bliX) was isolated on a 4-kb ApaI fragment of S. exfoliatus SMF19 chromosomal DNA and then sequenced. The bliX gene, consisting of 1,119 bp, encoded a mature protein with a deduced amino acid sequence of 342 residues and also encoded a 40-amino-acid signal sequence. No significant sequence similarity to bliX was found by pairwise comparison using various protein and nucleotide sequences.

Survey on Nucleotide Encoding Techniques and SVM Kernel Design for Human Splice Site Prediction

  • Bari, A.T.M. Golam;Reaz, Mst. Rokeya;Choi, Ho-Jin;Jeong, Byeong-Soo
    • Interdisciplinary Bio Central / Vol. 4, No. 4 / pp. 14.1-14.6 / 2012
  • Splice site prediction in a DNA sequence is a basic search problem for finding exon/intron and intron/exon boundaries. Removing the introns and then joining the exons together forms the mRNA sequence. These sequences are the input of the translation process. It is a necessary step in the central dogma of molecular biology. The main task of splice site prediction is to find the exact GT and AG ended sequences. It then identifies the true and false GT and AG ended sequences among those candidate sequences. In this paper, we survey research works on splice site prediction based on the support vector machine (SVM). The basic difference between these research works lies in the nucleotide encoding technique and SVM kernel selection. Some methods encode the DNA sequence in a sparse way whereas others encode it in a probabilistic manner. The encoded sequences serve as the input of the SVM. The task of the SVM is to classify them using its learning model. The accuracy of classification largely depends on proper kernel selection for sequence data as well as on the selection of kernel parameters. We examine each encoding technique and classify the techniques according to their similarity. Then we discuss kernels and their parameter selection. Our survey paper provides a basic understanding of encoding approaches and proper kernel selection of SVM for splice site prediction.
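As a rough illustration of the candidate search and the sparse encoding mentioned in the abstract, the sketch below locates GT or AG dinucleotides and encodes a sequence with four indicator values per base. The one-hot table `ONE_HOT` and the function names are assumptions for illustration; the surveyed papers may use different sparse schemes and window sizes.

```python
# Hypothetical sparse (one-hot) code: one indicator position per nucleotide.
ONE_HOT = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "T": (0, 0, 0, 1),
}


def candidate_sites(seq: str, dinucleotide: str) -> list[int]:
    """Return start indices of every occurrence of a dinucleotide such as GT or AG."""
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == dinucleotide]


def sparse_encode(seq: str) -> list[int]:
    """Concatenate the four-value indicator vector of each base into one feature vector."""
    vector = []
    for base in seq:
        vector.extend(ONE_HOT[base])
    return vector
```

A feature vector built this way has four entries per base, so a window of n bases around a candidate site yields 4n inputs for a classifier such as an SVM.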