• Title/Summary/Keyword: conserved sequence extraction (보존서열 추출)


Feature selection and frequent pattern analysis in protein motif sequence (모티프 서열에서의 특징추출 및 빈발패턴 분석)

  • Kim, Dae-Sung;Lee, Bum-Ju;Ryu, Keun-Ho
    • Proceedings of the Korea Information Processing Society Conference / 2007.05a / pp.10-13 / 2007
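  • A motif is a region of a protein sequence that has remained highly conserved through evolution. Motifs are used to predict protein function and structure, and to describe the characteristics shared by biologically related proteins. The relationship between motifs and protein sequences is also essential for predicting biological function; this prediction problem can be approached by using motif search to analyze protein sequences through the frequent sequence patterns and structural patterns they contain. In this paper, we search for the secondary-structure features and frequent patterns present in protein sequences and use the extracted information for protein function classification.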
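As a rough illustration of turning frequent patterns into classification features, the sketch below counts contiguous k-mers, keeps those present in at least `min_support` sequences, and encodes each sequence as a 0/1 vector. It assumes plain amino acid k-mers as the pattern type; the paper's secondary-structure features are not reproduced, and the function names and toy sequences are hypothetical.

```python
from __future__ import annotations


def kmers(seq: str, k: int) -> set[str]:
    """Return the distinct contiguous substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def frequent_patterns(seqs: list[str], k: int, min_support: int) -> list[str]:
    """Return k-mers present in at least min_support sequences, sorted for a stable feature order."""
    support: dict[str, int] = {}
    for s in seqs:
        for p in kmers(s, k):  # a set, so each sequence counts a pattern at most once
            support[p] = support.get(p, 0) + 1
    return sorted(p for p, c in support.items() if c >= min_support)


def feature_vectors(seqs: list[str], patterns: list[str]) -> list[list[int]]:
    """Encode each sequence as a 0/1 vector marking which frequent patterns it contains."""
    return [[1 if p in s else 0 for p in patterns] for s in seqs]


seqs = ["MKVLAAGK", "AKVLAGGK", "MKVLTTGK"]  # toy sequences, not data from the paper
patterns = frequent_patterns(seqs, k=3, min_support=2)
print(patterns)                         # ['KVL', 'MKV', 'VLA']
print(feature_vectors(seqs, patterns))  # [[1, 1, 1], [1, 0, 1], [1, 1, 0]]
```

The resulting vectors could then be fed to any classifier; which one the authors used is not stated here.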


Mining Maximal Frequent Contiguous Sequences in Biological Data Sequences (생물학적 데이터 서열들에서 빈번한 최대길이 연속 서열 마이닝)

  • Kang, Tae-Ho;Yoo, Jae-Soo
    • The KIPS Transactions:PartD / v.15D no.2 / pp.155-162 / 2008
  • Biological sequences such as DNA and amino acid sequences typically contain a large number of items, including contiguous sequences that ordinarily consist of hundreds of frequent items. In biological sequence analysis (BSA), searching for frequent contiguous sequences is one of the most important operations. Many studies have addressed efficient sequential pattern mining, and most existing methods are based on the Apriori algorithm. In particular, the PrefixSpan algorithm is one of the most efficient Apriori-based sequential pattern mining schemes. However, because it grows sequential patterns starting from frequent length-1 patterns, it is not suitable for biological datasets with long frequent contiguous sequences. More recently, the MacosVSpan algorithm was proposed, building on the idea of PrefixSpan to significantly reduce its recursive process, but it is still inefficient for mining frequent contiguous sequences from long biological data sequences. In this paper, we propose an efficient method to mine maximal frequent contiguous sequences in large biological data sequences by constructing a fixed-length spanning tree. To verify the superiority of the proposed method, we performed experiments in various environments; the results show that the proposed method is much more efficient than MacosVSpan in terms of retrieval performance.
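To make "maximal frequent contiguous sequence" concrete, the sketch below is a naive brute-force baseline, not the paper's fixed-length spanning tree and not MacosVSpan. It assumes support means the number of input sequences containing a pattern; a pattern is maximal when no longer frequent pattern contains it. The function names and example sequences are hypothetical.

```python
from __future__ import annotations


def substrings(seq: str) -> set[str]:
    """All distinct non-empty contiguous substrings of seq (quadratic in len(seq))."""
    return {seq[i:j] for i in range(len(seq)) for j in range(i + 1, len(seq) + 1)}


def maximal_frequent_contiguous(seqs: list[str], min_support: int) -> set[str]:
    """Patterns contained in >= min_support sequences that no longer frequent pattern contains."""
    support: dict[str, int] = {}
    for s in seqs:
        for sub in substrings(s):  # each sequence adds at most 1 to a pattern's support
            support[sub] = support.get(sub, 0) + 1
    frequent = {p for p, c in support.items() if c >= min_support}
    alphabet = {ch for s in seqs for ch in s}
    # Every substring of a frequent pattern is also frequent, so if p lies inside a longer
    # frequent pattern, some one-character extension of p must be frequent as well.
    return {
        p for p in frequent
        if not any(ch + p in frequent or p + ch in frequent for ch in alphabet)
    }


print(sorted(maximal_frequent_contiguous(["ACGTAC", "TACGTT", "GACGTA"], min_support=3)))
# ['ACGT', 'TA']
```

Enumerating every substring is exactly what becomes infeasible for long biological sequences, which is the motivation the abstract gives for a more efficient structure.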

Selection of next-generation antigen protein for diagnosis of pfhrp2/pfhrp3 gene deleted plasmodium falciparum based on bioinformatics (pfhrp2/pfhrp3 유전자 결여 열대열 말라리아 특이 진단을 위한 생물정보학 기반 차세대 항원 단백질 선정)

  • Seo, Seung Hwan;Lee, Jihoo;Choi, Jae-Won;Kim, Hak Yong
    • Proceedings of the Korea Contents Association Conference / 2016.05a / pp.187-188 / 2016
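  • Rapid diagnostic test kits for falciparum malaria (Plasmodium falciparum, P. falciparum, P. f) use Histidine Rich Protein 2 (PfHRP2) as the P. falciparum-specific protein. However, recent studies have reported P. falciparum parasites lacking the pfhrp2/pfhrp3 genes, mainly in South and Central America. To select, on the basis of bioinformatics, a new P. falciparum-specific antigen protein that could replace PfHRP2, we obtained a list of 5,777 P. falciparum-related proteins from PlasmoDB. We then analyzed their amino acid sequences with NCBI BLAST and extracted 15 proteins with P. falciparum-specific amino acid sequences that are absent both from healthy humans and from the other malaria parasites (P. vivax, P. ovale, P. malariae, P. knowlesi). Using IEDB analysis, we evaluated epitopes, hydrophilicity, beta-turns, accessibility, flexibility and immunogenicity, and selected the top 3 proteins with the highest mean values. Using KEGG pathway and EMBL-EBI, we analyzed the detectability of the 3 selected proteins in blood and the conservation of their amino acid sequences, and finally selected Glutamate-Rich Protein (GLURP). Using AIDA, we predicted the tertiary structure of GLURP from its amino acid sequence and illustrated its structure and its binding to antibodies. Because the selected GLURP enables specific diagnosis even of pfhrp2/pfhrp3-deleted P. falciparum, it is expected to aid the development of next-generation P. falciparum-specific rapid diagnostic test kits.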
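The specificity filter and the "top 3 by mean value" step can be sketched as a set check followed by a sort. This assumes BLAST results have already been reduced to the set of organisms each protein matched, and that all property scores are on a comparable, pre-normalized scale (the abstract does not say how scores were combined). The data structures, protein IDs and scores below are hypothetical.

```python
from __future__ import annotations

from statistics import mean


def select_candidates(
    blast_hits: dict[str, set[str]],
    scores: dict[str, dict[str, float]],
    excluded: set[str],
    top_n: int = 3,
) -> list[str]:
    """Keep proteins with no BLAST hit in any excluded organism, then rank by mean property score."""
    specific = [p for p, organisms in blast_hits.items() if not (organisms & excluded)]
    ranked = sorted(specific, key=lambda p: mean(scores[p].values()), reverse=True)
    return ranked[:top_n]


EXCLUDED = {"Homo sapiens", "P. vivax", "P. ovale", "P. malariae", "P. knowlesi"}
hits = {  # hypothetical BLAST summaries
    "prot_1": {"P. vivax"}, "prot_2": set(), "prot_3": set(), "prot_4": set(), "prot_5": {"Homo sapiens"},
}
scores = {  # hypothetical normalized scores
    "prot_1": {"epitope": 0.99, "flexibility": 0.99},
    "prot_2": {"epitope": 0.4, "flexibility": 0.6},
    "prot_3": {"epitope": 0.8, "flexibility": 1.0},
    "prot_4": {"epitope": 0.7, "flexibility": 0.7},
    "prot_5": {"epitope": 0.95, "flexibility": 0.95},
}
print(select_candidates(hits, scores, EXCLUDED, top_n=2))  # ['prot_3', 'prot_4']
```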


Evaluation of the preservation state of human skeletal remains using real-time PCR (출토 인골 DNA의 real-time PCR 정량에 의한 보존상태 평가 연구 - 부여 오수리 출토 인골을 중심으로 -)

  • Kwon, Eun-Sil;Cho, Eun-Min;Kim, Sue-Hoon;Kang, Soyeong
    • 보존과학연구 / s.32 / pp.171-183 / 2011
  • In this study, molecular genetic analysis was carried out on four human skeletal remains from Osuri, Buyeo. We showed that real-time PCR is the method of choice for assessing the initial number of genuine ancient DNA molecules. Human mitochondrial DNA was quantified by real-time PCR targeting the mitochondrial cytochrome b gene. The histological results indicated good potential for biochemical analysis using biomolecules. The quantitative results ranked the specimens' preservation state in the order BO-04, BO-01, BO-03, BO-02. We also showed that the biochemical and biomolecular results for preservation state were similar. This study should provide useful material for predicting the outcome of biochemical and biological analyses of ancient bone.
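Real-time PCR quantification of this kind commonly uses a standard curve, Ct = slope · log10(N0) + intercept, fitted on a dilution series and then inverted for unknowns. The sketch below assumes that absolute-quantification setup; the abstract does not give the authors' curve, and the dilution series, Ct values and sample names are hypothetical.

```python
from __future__ import annotations

import math


def fit_standard_curve(copies: list[float], ct: list[float]) -> tuple[float, float]:
    """Least-squares fit of Ct = slope * log10(copies) + intercept over a dilution series."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ct) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def amplification_efficiency(slope: float) -> float:
    """E = 10^(-1/slope) - 1; a slope near -3.32 corresponds to roughly 100% efficiency."""
    return 10 ** (-1 / slope) - 1


def initial_copies(ct_value: float, slope: float, intercept: float) -> float:
    """Invert the standard curve to estimate the starting copy number N0 of an unknown."""
    return 10 ** ((ct_value - intercept) / slope)


# Hypothetical dilution series whose Ct values lie exactly on slope -3.5, intercept 40.
standards = [1e2, 1e3, 1e4, 1e5, 1e6]
ct_standards = [33.0, 29.5, 26.0, 22.5, 19.0]
slope, intercept = fit_standard_curve(standards, ct_standards)
unknowns = {"sample_a": 30.0, "sample_b": 26.0, "sample_c": 34.5}  # hypothetical Ct values
ranking = sorted(unknowns, key=lambda s: initial_copies(unknowns[s], slope, intercept), reverse=True)
print(ranking)  # ['sample_b', 'sample_a', 'sample_c']
```

A lower Ct means more starting template, so ranking by estimated copy number is the same as ranking by ascending Ct once a single curve is shared.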


Identification and characteristics of DDX3 gene in the earthworm, Perionyx excavatus (팔딱이 지렁이(Perionyx excavatus) DDX3 유전자의 동정 및 특성)

  • Park, Sang Gil;Bae, Yoon-Hwan;Park, Soon Cheol
    • Journal of the Korea Organic Resources Recycling Association / v.23 no.1 / pp.70-81 / 2015
  • Helicases are proteins that use the chemical energy of NTP binding and hydrolysis to separate the complementary strands of double-stranded nucleic acids into single strands. They participate in various cellular metabolic processes in many organisms. DEAD-box proteins are ATP-dependent RNA helicases that participate in all biochemical steps involving RNA. The DEAD-box 3 (DDX3) gene belongs to the DEAD-box family; it plays an important role in germ cell development during asexual and sexual reproduction in many organisms, including both vertebrates and invertebrates, and participates in stem cell differentiation during regeneration. In this study, to identify and characterize the DDX3 gene in the earthworm Perionyx excavatus, which has a powerful regeneration capacity, total RNA was isolated from the adult head containing the clitellum. The full-length DDX3 gene of P. excavatus, Pe-DDX3, was identified by RT-PCR using the total RNA from the head as a template. Pe-DDX3 encodes a putative protein of 607 amino acids that contains the nine conserved motifs characteristic of the DEAD-box protein family. The presence of these nine motifs was confirmed by comparing the entire amino acid sequence of Pe-DDX3 with those of species from different taxa. Phylogenetic analysis revealed that Pe-DDX3 belongs to the DDX3 (PL10) subgroup of the DEAD-box protein family, and it showed high homology with PL10a and PL10b from P. dumerilii.
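Checking a sequence for conserved motifs can be approximated with regular expressions. The sketch below scans for four of the nine DEAD-box motifs using simplified consensus patterns; these patterns are assumptions for illustration, not the motif definitions used in the paper, and the toy sequence is synthetic rather than Pe-DDX3. The names `DEAD_BOX_MOTIFS` and `scan_motifs` are hypothetical.

```python
from __future__ import annotations

import re

# Simplified consensus patterns ('.' = any residue) for four of the nine DEAD-box motifs.
# These are approximations for illustration only.
DEAD_BOX_MOTIFS = {
    "I": r"A..G.GKT",     # Walker A-like ATP-binding motif
    "II": r"DEAD",        # Asp-Glu-Ala-Asp, which gives the family its name
    "III": r"SAT",
    "VI": r"HR[IV]GR..R",
}


def scan_motifs(protein: str, motifs: dict[str, str] = DEAD_BOX_MOTIFS) -> dict[str, list[int]]:
    """Return the 0-based start positions of each motif pattern in the protein sequence."""
    return {name: [m.start() for m in re.finditer(pattern, protein)] for name, pattern in motifs.items()}


toy = "MKAQSGTGKTLLVDEADRMLSATMPHRIGRTGRK"  # synthetic sequence, not Pe-DDX3
print(scan_motifs(toy))  # {'I': [2], 'II': [13], 'III': [20], 'VI': [25]}
```

Exact regex matching is far stricter than the alignment-based comparison across taxa described in the abstract; it only shows what "contains the motif" means at the sequence level.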

Comparative Study of Soil Bacterial Populations in Human Remains and Soil from Keundokgol Site at Buyeo (부여 큰독골 유적 출토 인골 조직 및 외부 토양의 세균 군집의 비교연구)

  • Kim, Yun-ji;Kim, Sue-hoon;Kwon, Eun-sil;Cho, Eun-min;Kang, So-yeong
    • Korean Journal of Heritage: History & Science / v.47 no.4 / pp.92-105 / 2014
  • The microbial characteristics of bacterial populations were investigated in human remains and in the soil inside the bones from excavated graves no. 4 and no. 5 at the Keundokgol site, Osu-ri, Buyeo. The phylogenetic characteristics of the bacterial populations were analyzed by directly extracting ancient DNA. Based on 16S rDNA sequences, for grave no. 4, 319 sequences from the human remains were classified into 11 phyla and 462 sequences from the soil into 16 phyla. For grave no. 5, 271 sequences from the human remains were classified into 10 phyla and 497 sequences from the soil into 11 phyla. In particular, the Actinobacteria phylogenetic group was dominant in the bacterial populations of both graves, and most of the sequences were classified as uncultured groups. The discovery of a diverse microbial community and of uncultured groups was attributed to the specificity of the samples. In conclusion, the excavated human bones were generally contaminated with soil bacteria from their immediate surroundings. These results contribute to the preservation and management of ancient human bone from archaeological sites.
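Once each 16S rDNA sequence has been assigned a phylum, statements such as "classified into 11 phyla" and "Actinobacteria was dominant" reduce to a tally. The sketch below assumes the taxonomic assignment has already been done by some classifier; the function name and labels are hypothetical, not data from the study.

```python
from __future__ import annotations

from collections import Counter


def summarize_phyla(labels: list[str]) -> tuple[int, dict[str, float], str]:
    """Return (number of phyla, relative abundance per phylum, dominant phylum); labels must be non-empty."""
    counts = Counter(labels)
    total = sum(counts.values())
    abundance = {phylum: n / total for phylum, n in counts.items()}
    dominant = counts.most_common(1)[0][0]
    return len(counts), abundance, dominant


labels = ["Actinobacteria"] * 5 + ["Proteobacteria"] * 3 + ["Firmicutes"] * 2  # hypothetical
n_phyla, abundance, dominant = summarize_phyla(labels)
print(n_phyla, dominant, abundance["Actinobacteria"])  # 3 Actinobacteria 0.5
```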

Molecular Cloning and Nucleotide Sequencing of a DNA Clone Encoding Arginine Decarboxylase in Rice (Oryza sativa L.) (벼의 arginine decarboxylase DNA clone의 재조합 및 염기서열 분석)

  • Hong, Sung-Hoi;Jeung, Ji-Ung;Ok, Sung-Han;Shin, Jeong-Sheop
    • Applied Biological Chemistry / v.39 no.2 / pp.112-117 / 1996
  • Arginine decarboxylase (ADC) is the first enzyme in one of the two pathways of biosynthesis of the diamine putrescine in plants. The genes encoding ADC have previously been cloned from the Escherichia coli, oat and tomato genomes. Two degenerate oligonucleotides (17-mers) corresponding to two conserved regions of ADC were used as primers in a polymerase chain reaction on rice (Oryza sativa L.) genomic DNA, and a fragment of approximately 1.0 kbp was obtained. This amplified PCR product contained an open reading frame spanning 1,022 bp. The PCR product was cloned into a pGEM-derived T vector, and a short 500 bp PstI-digested fragment was subcloned into pGEM-3zf(+/-) vectors to facilitate sequencing. The nucleotide sequence of this PCR product showed about 74% and 70% identity with the corresponding regions of the oat and tomato ADC cDNA sequences, respectively. The predicted amino acid sequence showed 45% and 62% identity with the oat and tomato ADC polypeptide fragments, respectively. Sequence similarities of 34%, 47% and 38% were previously reported between the oat and E. coli, tomato and oat, and tomato and E. coli ADC amino acid sequences, respectively. Therefore, the similarities and identities between rice and oat or tomato are remarkably higher than those in the previous reports. In the regions that are highly conserved in both amino acid sequence and spacing among these three sequences, the rice ADC open reading frame also contains the same regions with striking similarity. RNA blot analysis showed that hnc is expressed as a transcript of approximately 2.5 kbp in rice seedling leaf tissue.
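Percent identity figures like those above depend on how gaps are treated, and definitions vary between tools. The sketch below assumes one common convention, identical residues divided by the alignment columns where neither sequence has a gap; the paper's exact convention is not stated, and the function name and sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Identical residues / columns where neither sequence has a gap, as a percentage."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    if not compared:
        raise ValueError("no gap-free columns to compare")
    matches = sum(x == y for x, y in compared)
    return 100.0 * matches / len(compared)


print(percent_identity("MKV-LA", "MRVQLA"))  # 80.0  (amino acids)
print(percent_identity("ACGT-A", "ACGTTA"))  # 100.0 (nucleotides)
```

The same function applies to nucleotide and amino acid alignments, which is why the abstract can report both kinds of identity for the same fragment.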


Analysis and Verification of Ancient DNA (고대 DNA의 분석과 검증)

  • Jee, Sang-hyun;Seo, Min-seok
    • Korean Journal of Heritage: History & Science / v.40 / pp.387-411 / 2007
  • The analysis of ancient DNA (aDNA) has attracted increasing anthropological, archaeological, biological and public interest. Although this approach is complicated by natural damage to and exogenous contamination of DNA, archaeologists and biologists have used it to address issues such as human evolutionary history, migration and social organization, funeral customs and disease, and even the evolutionary phylogeny of extinct animals. The polymerase chain reaction (PCR) is a powerful technique for analyzing DNA sequences from a small extract of an ancient specimen. However, deamination and fragmentation are common forms of molecular damage in aDNA and cause enzymatic inhibition during PCR amplification. In addition, deamination of a cytosine residue yields a uracil residue in the ancient template, which leads to misincorporation of an adenine residue during PCR. This produces consistent substitutions (cytosine → thymine, guanine → adenine) relative to the original nucleotide sequence. Contamination with exogenous DNA is a major problem in aDNA analysis and can lead to erroneous conclusions. This report presents DNA modification and contamination as the main issues in validating the results of aDNA analysis, and introduces several criteria that researchers in this field have proposed for authenticating aDNA analyses.
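The deamination signature described above can be checked by tallying substitution types between a reference and a sequenced read: an excess of C>T and G>A among all mismatches is consistent with aDNA damage. This sketch assumes an ungapped, equal-length alignment and ignores position along the read, which dedicated damage-profiling tools do take into account; the function names and sequences are hypothetical.

```python
from __future__ import annotations

from collections import Counter

DAMAGE_TYPES = {"C>T", "G>A"}  # expected consequences of cytosine deamination


def substitution_profile(reference: str, read: str) -> Counter:
    """Count reference>read substitutions over an ungapped, equal-length alignment."""
    if len(reference) != len(read):
        raise ValueError("expected equal-length, ungapped sequences")
    return Counter(f"{r}>{q}" for r, q in zip(reference.upper(), read.upper()) if r != q)


def damage_fraction(profile: Counter) -> float:
    """Share of all substitutions that are C>T or G>A (0.0 if there are no substitutions)."""
    total = sum(profile.values())
    return sum(profile[t] for t in DAMAGE_TYPES) / total if total else 0.0


profile = substitution_profile("ACGTCGGA", "ATGTTGAC")
print(dict(profile))             # {'C>T': 2, 'G>A': 1, 'A>C': 1}
print(damage_fraction(profile))  # 0.75
```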

Small CNN-RNN Engraft Model Study for Sequence Pattern Extraction in Protein Function Prediction Problems

  • Lee, Jeung Min;Lee, Hyun
    • Journal of the Korea Society of Computer and Information / v.27 no.8 / pp.49-59 / 2022
  • In this paper, we designed PSCREM, a new enzyme function prediction model, based on a study that compared and evaluated, under identical conditions, CNN and LSTM/GRU models, which as of 2020 were the most widely used deep learning models for predicting protein function and structure from sequences. Sequence evolution information was used to preserve detailed patterns that CNN convolution would miss, and relationship information between functionally significant amino acids was extracted through overlapping RNNs and used as a reference for feature map production. The RNN algorithms used in the small CNN-RNN model are LSTM and GRU, which are usually stacked two to three times with more than 100 units; in this paper, however, small RNNs of 10 and 20 units are overlapped. The model uses the PSSM profile derived from protein sequence data. In experiments, the model achieved 86.4% accuracy in predicting the main class of the enzyme number and 84.4% accuracy down to the sub-subclass level. PSCREM thus identifies unique patterns related to protein function better through overlapped RNNs, and the overlapped RNN is proposed as a novel methodology for protein function and structure prediction.
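"Accuracy up to the sub-subclass" can be read as hierarchical accuracy over EC numbers, which have four dot-separated levels (class, subclass, sub-subclass, serial number). The sketch below assumes a prediction counts as correct at level k when its first k fields match the true EC number; the paper's exact evaluation protocol may differ, and the function name and EC lists are hypothetical.

```python
from __future__ import annotations


def ec_level_accuracy(true_ecs: list[str], pred_ecs: list[str], level: int) -> float:
    """Fraction of predictions whose first `level` EC fields match (1 = main class, 3 = sub-subclass)."""
    if not 1 <= level <= 4:
        raise ValueError("EC numbers have four levels")
    if len(true_ecs) != len(pred_ecs) or not true_ecs:
        raise ValueError("need two non-empty lists of equal length")
    hits = sum(t.split(".")[:level] == p.split(".")[:level] for t, p in zip(true_ecs, pred_ecs))
    return hits / len(true_ecs)


true_ecs = ["1.1.1.1", "2.7.11.1", "3.4.21.4", "1.14.13.39"]  # hypothetical labels
pred_ecs = ["1.1.1.2", "2.7.1.1", "3.4.21.4", "2.1.1.1"]      # hypothetical predictions
print([ec_level_accuracy(true_ecs, pred_ecs, k) for k in range(1, 5)])  # [0.75, 0.75, 0.5, 0.25]
```

Accuracy can only stay level or fall as the required depth increases, which matches the drop from main-class to sub-subclass accuracy reported above.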

Rough Computational Annotation and Hierarchical Conserved Area Viewing Tool for Genomes Using Multiple Relation Graph. (다중 관계 그래프를 이용한 유전체 보존영역의 계층적 시각화와 개략적 전사 annotation 도구)

  • Lee, Do-Hoon
    • Journal of Life Science / v.18 no.4 / pp.565-571 / 2008
  • Due to the rapid development of bioinformatics technologies, various biological data have been produced in silico, and complicated, large-scale biodata are now used to meet researchers' requirements. Developing visualization and annotation tools using these data is still an active issue, although it has been studied for a decade. However, the diversity of users and their varied requirements make it hard to develop a general-purpose tool. In this paper, I propose a novel system, Genome Viewer and Annotation tool (GenoVA), to annotate and visualize multiple genomes using known information and a multiple relation graph. Several multiple alignment tools exist, but they lose conserved areas because of the complexity of their constraints. GenoVA extracts all associated information between every pair of genomes by extending pairwise alignment. Conserved areas with high frequency and high BLAST scores form the block nodes of the relation graph, and the system connects associated block nodes to represent the multiple relation graph. The system also shows known information such as COG, genes and the hierarchical path of each block node. In this way, the system can annotate missed areas and unknown genes by navigating the clustering of particular block nodes. I experimented with ten bacterial genomes, extracting features to visualize and annotate them. GenoVA also supports simple, rough computational annotation of a new genome.
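The idea of connecting associated block nodes and then clustering them can be sketched with a union-find over pairwise alignment hits. This assumes each conserved block is identified by a hypothetical `(genome, start, end)` tuple, that each hit carries a single score, and that hits below `min_score` are dropped; GenoVA's actual block construction and clustering are not specified in the abstract, and `cluster_blocks` is a hypothetical name.

```python
from __future__ import annotations

from collections import defaultdict


def cluster_blocks(alignments, min_score: float) -> list[set]:
    """Union blocks joined by an alignment scoring >= min_score; return the connected components."""
    parent: dict = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for a, b, score in alignments:
        if score >= min_score:  # weak hits contribute neither nodes nor edges
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[rb] = ra

    components = defaultdict(set)
    for node in list(parent):
        components[find(node)].add(node)
    return list(components.values())


alignments = [  # hypothetical (block_a, block_b, score) hits from pairwise alignment
    (("g1", 100, 200), ("g2", 150, 250), 80.0),
    (("g2", 150, 250), ("g3", 10, 110), 60.0),
    (("g1", 500, 600), ("g3", 900, 1000), 20.0),
]
clusters = cluster_blocks(alignments, min_score=50.0)
print(len(clusters))  # 1
```

Transitive grouping is what lets a block in one genome be linked to a block in a third genome even without a direct pairwise hit, which is the kind of association a multiple relation graph captures.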