• Title/Summary/Keyword: Distance between Amino Acid Sequences


Signal Sequence Prediction Based on Hydrophobicity and Substitution Matrix

  • Chi, Sang-Mun
    • Journal of KIISE:Software and Applications / v.34 no.7 / pp.595-602 / 2007
  • This paper proposes a method that discriminates signal peptides and predicts the cleavage sites of secretory proteins cleaved by signal peptidase I. A preprocessing stage uses amino acid hydrophobicity scales to predict the presence of a signal sequence and the cleavage site. This preprocessing improves prediction performance by eliminating non-secretory proteins early in the prediction process. To use a support vector machine effectively for signal sequence prediction, a biologically relevant distance between amino acid sequences is defined using hydrophobicity and a substitution matrix: hydrophobicity can be used to predict the location of an amino acid in a cell, and the substitution matrix represents the evolutionary relationships among amino acids. Using 5-fold cross-validation on the Swiss-Prot release 50 protein database, the proposed method achieved a 98.9% discrimination rate for signal sequences and an 88% correct rate for cleavage site prediction. In comparison tests, the proposed method performed significantly better than other prediction methods.
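
The combined distance can be illustrated with a short Python sketch. It assumes, as one plausible reading of the abstract, that two equal-length residue windows are compared position by position: a squared hydrophobicity difference (Kyte-Doolittle scale) is added to a dissimilarity derived from substitution scores as s(a,a) + s(b,b) - 2s(a,b), and the total feeds a Gaussian-style kernel for the SVM. The weights `w_hyd` and `w_sub`, the `gamma` value, and the four-residue `TOY_SUB` table are hypothetical. A real implementation would load a full matrix such as BLOSUM62, and the paper's exact formula may differ.

```python
import math

# Kyte-Doolittle hydropathy scale (published values for the 20 amino acids).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# HYPOTHETICAL toy substitution scores for four residues only; a real
# implementation would load a complete matrix such as BLOSUM62.
TOY_SUB = {
    ("A", "A"): 4, ("L", "L"): 4, ("I", "I"): 4, ("G", "G"): 6,
    ("A", "L"): -1, ("A", "I"): -1, ("A", "G"): 0,
    ("L", "I"): 2, ("L", "G"): -4, ("I", "G"): -4,
}


def sub_score(a, b):
    """Symmetric lookup in the toy substitution table."""
    return TOY_SUB[(a, b)] if (a, b) in TOY_SUB else TOY_SUB[(b, a)]


def sub_distance(a, b):
    """Convert similarity scores to a dissimilarity: s(a,a) + s(b,b) - 2 s(a,b)."""
    return sub_score(a, a) + sub_score(b, b) - 2 * sub_score(a, b)


def window_distance(x, y, w_hyd=1.0, w_sub=1.0):
    """Sum of per-position hydrophobicity and substitution dissimilarities.

    The weights w_hyd and w_sub are hypothetical tuning parameters.
    """
    if len(x) != len(y):
        raise ValueError("windows must have equal length")
    total = 0.0
    for a, b in zip(x, y):
        total += w_hyd * (KD[a] - KD[b]) ** 2 + w_sub * sub_distance(a, b)
    return total


def rbf_kernel(x, y, gamma=0.1):
    """Gaussian-style kernel on the combined distance (gamma is hypothetical)."""
    return math.exp(-gamma * window_distance(x, y))
```

For example, `window_distance("ALIG", "ALIG")` is 0.0 and `rbf_kernel` of a window with itself is 1.0. Because this distance is not guaranteed to be Euclidean, the kernel is not guaranteed to be positive semidefinite, so `gamma` and the weights would need empirical checking.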

Analysis of Eukaryotic Sequence Patterns Using GenScan

  • Jung, Yong-Gyu;Lim, I-Suel;Cha, Byung-Heun
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.11 no.4 / pp.113-118 / 2011
  • Sequence homology analysis of biological substances involves building databases by sorting and indexing sequences, and it demonstrates the usefulness of informatics. In this paper, the Markov models in the GenScan program are used to convert the patterns of complex eukaryotic protein sequences. Because the complexity of an exact calculation increases exponentially, searching for the minimum distance becomes infeasible. A scoring table for amino acid substitutions is used so that substitutions between similar amino acids receive differentiated scores, and the transition probabilities of a hidden Markov model are applied. Combining BLASTP with Markov models, the approach provides a superior method for translating sequences in homologous sequence analysis and for characterizing secreted protein structure from the sequence translations.
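
The exponential blow-up of an exact search is what dynamic programming over a hidden Markov model avoids. The Python sketch below is a generic Viterbi decoder, not GenScan's generalized HMM; the two states and their transition and emission probabilities are hypothetical values chosen only to illustrate the idea.

```python
import math


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return (log-probability, state path) of the most probable hidden path.

    Dynamic programming keeps only the best path into each state at each
    position, so the work grows as len(obs) * len(states)**2 rather than
    as the len(states)**len(obs) paths an exhaustive search would score.
    """
    # best[s] = (log-prob of the best path ending in state s, that path)
    best = {
        s: (math.log(start_p[s]) + math.log(emit_p[s][obs[0]]), [s])
        for s in states
    }
    for symbol in obs[1:]:
        step = {}
        for s in states:
            # Best predecessor for state s, including the transition cost.
            score, path = max(
                (best[prev][0] + math.log(trans_p[prev][s]), best[prev][1])
                for prev in states
            )
            step[s] = (score + math.log(emit_p[s][symbol]), path + [s])
        best = step
    return max(best.values())


# HYPOTHETICAL two-state model: GC-rich "coding" vs AT-rich "noncoding".
STATES = ("coding", "noncoding")
START = {"coding": 0.5, "noncoding": 0.5}
TRANS = {
    "coding": {"coding": 0.9, "noncoding": 0.1},
    "noncoding": {"coding": 0.1, "noncoding": 0.9},
}
EMIT = {
    "coding": {"A": 0.15, "C": 0.35, "G": 0.35, "T": 0.15},
    "noncoding": {"A": 0.35, "C": 0.15, "G": 0.15, "T": 0.35},
}
```

For example, `viterbi("GGGGGGAAAAAA", STATES, START, TRANS, EMIT)` labels the GC-rich prefix `coding` and the AT-rich suffix `noncoding`, because the emission gain outweighs the cost of a single state switch.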

Molecular Cloning of Cytochrome P450 Family Gene Fragment from Midgut of the Beet Armyworm, Spodoptera exigua

  • Moon, Jae-Yu;Lee, Pyeongjae;Cho, Il-Je;Kim, Iksoo;Lee, Heui-Sam
    • International Journal of Industrial Entomology and Biomaterials / v.4 no.2 / pp.155-162 / 2002
  • The cytochrome P450 (CYP) gene is known to play one of the most important roles in metabolizing exogenous materials. In insects, CYP is known in particular to detoxify toxic materials by adding oxygen to their hydrophobic regions. CYP-dependent metabolism is therefore associated with the adaptation of insects to host plant chemicals, which in turn is known to be one of the driving forces of CYP diversification. In the present study, we cloned seven gene fragments of the CYP4 family from the midgut of the beet armyworm, Spodoptera exigua, by RT-PCR. Sequence analysis showed that each gene fragment contains an open reading frame of ~450 bp encoding ~150 amino acids. The cloned gene fragments contained the typical conserved regions found in the CYP4 family. Pairwise divergence of the deduced amino acid sequences among the seven clones ranged from 0% to 52.86%, resolving five distinct clones. The other two clones were, respectively, identical to or differed by one amino acid from their corresponding clones, although each differed by ten nucleotides. Correlation analysis between GenBank-registered full-length CYP4 sequences and the cloned fragments showed a statistically significant relationship (r² = 0.96085; p < 0.001), suggesting that the partial sequences can serve in place of the full-length sequences. Phylogenetic analysis of the clones together with GenBank-registered insect and mammalian CYP4 family sequences, using parsimony and several distance methods, divided the clones into two groups: clones belonging to CYP4S and the others to CYP4M.
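
The divergence range quoted above is a pairwise percentage of differing residues between deduced amino acid sequences. A minimal Python sketch of that p-distance, assuming the sequences are already aligned and that gap positions are skipped pair by pair; the abstract does not say how gaps were handled, and the toy sequences are hypothetical.

```python
def p_distance(seq1, seq2, gap="-"):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap are skipped (pairwise deletion).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for a, b in zip(seq1, seq2):
        if a == gap or b == gap:
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def percent_divergence_matrix(seqs):
    """Pairwise divergence (%) for a dict of {name: aligned sequence}."""
    names = list(seqs)
    return {
        (n1, n2): 100 * p_distance(seqs[n1], seqs[n2])
        for i, n1 in enumerate(names)
        for n2 in names[i + 1:]
    }
```

Identical deduced sequences give 0%, which is how two of seven clones can collapse into five distinct amino acid sequences even when their nucleotides differ.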

Classification of Viruses Based on the Amino Acid Sequences of Viral Polymerases

  • Nam, Ji-Hyun;Lee, Dong-Hun;Lee, Keon-Myung;Lee, Chan-Hee
    • Korean Journal of Microbiology / v.43 no.4 / pp.285-291 / 2007
  • According to the Baltimore Scheme, viruses are classified into 6 main classes based on their replication and coding strategies. Except for some small DNA viruses, most viruses encode their own polymerases (DNA-dependent DNA, RNA-dependent RNA, and RNA-dependent DNA polymerases), all of which contain 4 common motifs. We undertook a phylogenetic study to establish the relationship between the Baltimore Scheme and viral polymerases. Amino acid sequence data sets of viral polymerases were taken from NCBI GenBank, and a multiple alignment was performed with the CLUSTAL X program. Phylogenetic trees of viral polymerases constructed from the distance matrices were generally consistent with the Baltimore Scheme, with some minor exceptions. Interestingly, negative-strand RNA viruses (Class V) could be further divided into 2 subgroups with segmented and non-segmented genomes. Thus, the Baltimore Scheme for viral taxonomy could be supported by phylogenetic analysis based on the amino acid sequences of viral polymerases.
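
Building a distance matrix from the multiple alignment is the step between CLUSTAL X and tree construction. The abstract does not say which distance correction was used; the Python sketch below assumes the Poisson correction d = -ln(1 - p), one common choice for amino acid data, with gap positions skipped pair by pair.

```python
import math


def poisson_distance(seq1, seq2, gap="-"):
    """Poisson-corrected amino acid distance d = -ln(1 - p)."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a != gap and b != gap]
    if not pairs:
        raise ValueError("no comparable sites")
    p = sum(a != b for a, b in pairs) / len(pairs)
    if p == 0:
        return 0.0
    if p >= 1.0:
        return math.inf  # saturated: too divergent for this correction
    return -math.log(1.0 - p)


def distance_matrix(alignment):
    """Square, symmetric matrix (list of lists) for a list of aligned sequences."""
    n = len(alignment)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = poisson_distance(alignment[i], alignment[j])
            matrix[i][j] = matrix[j][i] = d
    return matrix
```

The correction grows faster than the raw proportion of differences, compensating for multiple substitutions at the same site in distantly related polymerases.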

Differences in isolates of Tomato yellow leaf curl virus in tomato fields located in Daejeon and Chungcheongnam-do between 2017 and 2018

  • Oh, June-Pyo;Choi, Go-Woon;Kim, Jungkyu;Oh, Min-Hee;Kim, Kang-Hee;Park, Jongseok;Domier, Leslie L.;Hammond, John;Lim, Hyoun-Sub
    • Korean Journal of Agricultural Science / v.46 no.3 / pp.507-517 / 2019
  • To follow up on a 2017 survey of tomato virus diseases, samples with virus-like symptoms were collected from the same areas (Buyeo-gun, Chungcheongnam-do, and Daejeon, Korea) in 2018. Whereas mixed infections of Tomato mosaic virus with either Tomato yellow leaf curl virus (TYLCV) or Tomato chlorosis virus were detected in 2017, only TYLCV was detected in symptomatic samples in 2018. TYLCV amplicons of approximately 777 bp covering the coat protein (CP) coding region were cloned from the TYLCV-positive samples. The sequences showed 97.17% to 98.84% nucleotide identity and 98.45% to 99.22% amino acid (aa) identity with the 2017 Buyeo-gun isolate (MG787542), which had the highest aa sequence identity (up to 99.2%) with four 2018 Buyeo-gun sequences (MK521830, MK521833, MK521834, and MK521835). The lowest aa sequence identity (98.45%) was found for a 2018 Daejeon isolate (MK521836); Buyeo-gun and Daejeon are about 45 km apart. Phylogenetic analysis indicated that the CP sequences reported here are most closely related to Korean sequences from Masan (HM130912), Goseong (JN680149), Busan (GQ141873), Boseong (GU325634), and the 2017 isolate TYLCV-N (MG787543) in the 'Japan' cluster of TYLCV isolates, and distinct from the 'China' cluster isolates from Nonsan (GU325632), Jeonju (HM130913), and Jeju (GU325633, HM130914). Our survey data from 2017 and 2018 suggest that TYLCV has become established in Korea and may be spread by whitefly vectors from weed reservoirs within the farm environment.
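
Nucleotide identity being lower than amino acid identity is what synonymous substitutions produce: a codon change that keeps the same amino acid lowers the first figure but not the second. A minimal Python sketch, assuming gap-free, in-frame coding sequences of equal length and the standard genetic code; the two nine-base sequences are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order ("*" = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds):
    """Translate an in-frame coding sequence; a trailing partial codon is ignored."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))


def percent_identity(s1, s2):
    """Identity (%) over two equal-length, gap-free aligned sequences."""
    if len(s1) != len(s2) or not s1:
        raise ValueError("need two non-empty sequences of equal length")
    return 100 * sum(a == b for a, b in zip(s1, s2)) / len(s1)
```

For example, `ATGGCTAAA` and `ATGGCCAAG` share 7 of 9 nucleotides (about 77.78% identity) yet both translate to `MAK`, giving 100% amino acid identity.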

Geographic Variation of Granulilittorina exigua (Littorinidae, Gastropoda) in Korea Based on the Mitochondrial Cytochrome b Gene Sequence

  • Song, Jun-Im;Suh, Jae-Hwa;Kim, Sook-Jung
    • Animal Cells and Systems / v.4 no.3 / pp.267-272 / 2000
  • A partial sequence of the mitochondrial cytochrome b gene was analyzed to investigate genetic variation among 10 geographic populations of Granulilittorina exigua in Korea. A 282-base-pair sequence was determined by the PCR-directed silver sequencing method. Sequences of two species of the genus Littorina retrieved through an NIH BLAST search were used for comparison in determining the geographic variation of the species. The levels of mtDNA sequence difference were 0.00-2.54% within populations and 0.71-4.43% between populations. There were four amino acid differences between representative species of the genera Granulilittorina and Littorina, but none within populations of the genus Granulilittorina. UPGMA and neighbor-joining (N-J) trees constructed from the Tamura-Nei genetic distance matrix divided the Granulilittorina populations into three groups: eastern (with the exception of the Tokdo population), southern, and western regional populations. The degrees of genetic divergence within the populations of each group were p=0.021, p=0.019, and p=0.018, respectively. The divergence between the eastern and southern populations was p=0.032, indicating that they are more closely related to each other than to the western populations (p=0.052). Based on divergence-time estimates, the eastern and southern populations of Granulilittorina exigua in Korea diverged from the western populations about 2.1 MYBP, and the eastern and southern populations diverged from each other about 1.3 MYBP.
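
The UPGMA step can be sketched in Python as follows. The function accepts any symmetric distance matrix (Tamura-Nei distances in the study; it does not compute them), repeatedly joins the closest pair of clusters, and recomputes distances as size-weighted averages. The four-taxon matrix in the usage note is hypothetical, and neighbor-joining is not shown.

```python
def upgma(dist):
    """UPGMA on a dict-of-dicts distance matrix {a: {b: d_ab}}.

    Returns (tree, merges): tree is a nested tuple of leaf names, and
    merges lists (left, right, height) with height = half the joining distance.
    """
    d = {a: dict(row) for a, row in dist.items()}  # working copy
    size = {a: 1 for a in d}
    tree = {a: a for a in d}
    merges = []
    while len(d) > 1:
        # Closest pair of current clusters (each unordered pair seen once).
        a, b = min(
            ((x, y) for x in d for y in d if x < y),
            key=lambda pair: d[pair[0]][pair[1]],
        )
        new = f"({a},{b})"
        merges.append((tree[a], tree[b], d[a][b] / 2))
        # Size-weighted average distance from the new cluster to the others.
        d[new] = {}
        for c in d:
            if c in (a, b, new):
                continue
            dc = (size[a] * d[a][c] + size[b] * d[b][c]) / (size[a] + size[b])
            d[new][c] = dc
            d[c][new] = dc
        size[new] = size[a] + size[b]
        tree[new] = (tree[a], tree[b])
        # Drop the two merged clusters from the working matrix.
        for gone in (a, b):
            del d[gone]
            for row in d.values():
                row.pop(gone, None)
    (only,) = d
    return tree[only], merges
```

With distances A-B = 2, A-C = B-C = 6, and every distance to D = 10, the sketch joins A and B at height 1.0, then C at 3.0, then D at 5.0, giving the tree (((A, B), C), D).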


Global Sequence Homology Detection Using Word Conservation Probability

  • Yang, Jae-Seong;Kim, Dae-Kyum;Kim, Jin-Ho;Kim, Sang-Uk
    • Interdisciplinary Bio Central / v.3 no.4 / pp.14.1-14.9 / 2011
  • Protein homology detection is an important issue in comparative genomics. Because of the exponential growth of sequence databases, fast and efficient homology detection tools are urgently needed. Currently, sequence comparison methods based on local alignment, such as BLAST, are generally used for homology detection because they give a reasonable measure of sequence similarity. However, these methods are poor at measuring overall sequence similarity, especially for eukaryotic genomes, whose sequences often contain many insertions and duplications. These methods also do not provide explicit models of speciation, so it is difficult to interpret their similarity measures in terms of homology. Here, we present a novel method based on the Word Conservation Score (WCS) to address these limitations. Instead of comparing individual amino acids, we adopted the concept of a 'word' to compare sequences. WCS measures overall sequence similarity by comparing word contents, which is much faster than BLAST comparison. Furthermore, WCS can be used to measure the evolutionary distance between homologous sequences. We therefore expect sequence comparison with WCS to be useful for multiple-species comparisons of large genomes. In performance comparisons on protein structural classifications, our method showed a considerable improvement over BLAST. Our method also found larger micro-syntenic blocks consisting of orthologs with conserved gene order. Tests on various datasets showed that WCS gives a faster and better overall similarity measure than BLAST.
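
The abstract does not give the WCS formula, so the Python sketch below substitutes a generic word-content measure: the fraction of overlapping k-letter words (k-mers) shared by two sequences. The choice of k = 3 and the normalization by the smaller word count are assumptions. It illustrates why word-based comparison tolerates insertions: words on either side of an inserted segment still match.

```python
from collections import Counter


def words(seq, k=3):
    """Multiset of overlapping k-letter words (k-mers) in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def word_similarity(seq1, seq2, k=3):
    """Shared-word fraction: size of the multiset intersection divided by
    the word count of the shorter sequence (a stand-in, not the WCS formula)."""
    w1, w2 = words(seq1, k), words(seq2, k)
    shared = sum((w1 & w2).values())
    smaller = min(sum(w1.values()), sum(w2.values()))
    return shared / smaller if smaller else 0.0
```

For example, inserting `GGG` into `MKTAYIAKQR` leaves 6 of its 8 three-letter words intact, so `word_similarity` returns 0.75. Counting words takes time linear in sequence length, rather than the product of the two lengths needed by a full dynamic-programming alignment.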

Prevalence of Tobacco mosaic virus in Iran and Evolutionary Analyses of the Coat Protein Gene

  • Alishiri, Athar;Rakhshandehroo, Farshad;Zamanizadeh, Hamid-Reza;Palukaitis, Peter
    • The Plant Pathology Journal / v.29 no.3 / pp.260-273 / 2013
  • The incidence and distribution of Tobacco mosaic virus (TMV) and related tobamoviruses were determined by enzyme-linked immunosorbent assay on 1,926 symptomatic horticultural crop samples and 107 asymptomatic weed samples collected from 78 highly infected fields in the major horticultural crop-producing areas of 17 provinces throughout Iran. The results were confirmed by host range studies and reverse transcription-polymerase chain reaction. The overall incidence of infection by these viruses in symptomatic plants was 11.3%. The coat protein (CP) gene sequences of a number of isolates were determined and showed high identity (up to 100%) among the Iranian isolates. Phylogenetic analysis of all known TMV CP genes showed three clades based on nucleotide sequences, with all Iranian isolates clustered distinctly in clade II. Analysis using the complete CP amino acid sequence showed one clade with two subgroups, IA and IB, with Iranian isolates in both subgroups. Nucleotide diversity within each subgroup was very low, but higher between the two clades. No correlation was found between genetic distance and the geographical origin or host species of isolation. Statistical analyses suggested negative selection and demonstrated gene flow from isolates in other clades into the Iranian population.
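
Nucleotide diversity within a group is commonly computed as the mean pairwise proportion of differing sites (π), and divergence between groups as the mean over all cross-group pairs. A minimal Python sketch, assuming gap-free aligned sequences and uncorrected p-distances (the study may have used corrected distances); the sequences in the tests are hypothetical.

```python
from itertools import combinations, product


def p_dist(a, b):
    """Proportion of differing sites between two aligned, gap-free sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def nucleotide_diversity(group):
    """Mean pairwise p-distance within one group (pi)."""
    pairs = list(combinations(group, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(p_dist(a, b) for a, b in pairs) / len(pairs)


def between_group_distance(group1, group2):
    """Mean p-distance over all cross-group pairs (often written d_xy)."""
    pairs = list(product(group1, group2))
    if not pairs:
        raise ValueError("both groups must be non-empty")
    return sum(p_dist(a, b) for a, b in pairs) / len(pairs)
```

Low within-group values alongside a higher between-group value correspond to the pattern described above: tight subgroups that are more distant from each other.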

Genetic Characterization of β-lactamase (VPA0477) in Vibrio parahaemolyticus

  • Lee, Nam-Hyung;Song, Hyun-Jung;Park, Chang-Soo;Kim, Hee-Dai;Park, Kwon-Sam
    • Korean Journal of Fisheries and Aquatic Sciences / v.44 no.6 / pp.597-604 / 2011
  • Using 108 strains of Vibrio parahaemolyticus isolated from seawater, we investigated ampicillin-resistance profiles and the genetic characteristics of β-lactamase (VPA0477). All strains except one were resistant to ampicillin. However, the ampicillin-susceptible strain carried the same β-lactamase gene as the resistant strains. We compared β-lactamase promoter region sequences among five strains, including both ampicillin-resistant and -susceptible strains. The susceptible strain carried a nucleotide at position -19 relative to the methionine initiation codon of β-lactamase that was not present in the ampicillin-resistant strains. The genes in the region containing VPA0477 were present in all tested strains, and LA-PCR analysis showed that the distance between VPA0474 and VPA0479 was precisely 5.7 kb in all V. parahaemolyticus samples. Four important structural features conserved in Class A β-lactamases were present in the deduced amino acid sequence of V. parahaemolyticus β-lactamase. Taken together, our study demonstrates that V. parahaemolyticus β-lactamase belongs to the Class A β-lactamase group and that some nucleotides within the promoter region are of particular importance for β-lactamase activity.
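
The abstract does not name the four conserved features. Assuming they are the elements usually cited for Class A β-lactamases (S-x-x-K around the active-site serine, S-D-N, E-x-x-L-N in the omega loop, and K-T/S-G), a Python sketch can scan a deduced protein sequence for them with regular expressions. The toy sequence in the tests is hypothetical, and a real analysis would check motif positions against an alignment, since short patterns such as S-x-x-K can also match by chance.

```python
import re

# ASSUMED motif patterns (not stated in the abstract): the four conserved
# elements commonly cited for Class A beta-lactamases.
CLASS_A_MOTIFS = {
    "S-x-x-K (active-site serine)": r"S..K",
    "S-D-N": r"SDN",
    "E-x-x-L-N (omega loop)": r"E..LN",
    "K-T/S-G": r"K[TS]G",
}


def scan_motifs(protein, motifs=CLASS_A_MOTIFS):
    """Return {motif name: [1-based start positions]} for each pattern."""
    hits = {}
    for name, pattern in motifs.items():
        # A zero-width lookahead also reports overlapping occurrences.
        hits[name] = [m.start() + 1 for m in re.finditer(f"(?={pattern})", protein)]
    return hits


def has_all_motifs(protein, motifs=CLASS_A_MOTIFS):
    """True if every motif occurs at least once in the sequence."""
    return all(scan_motifs(protein, motifs).values())
```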

Protein Structure Alignment Based on Maximum of Residue Pair Distance and Similarity Graph

  • Kim, Woo-Cheol;Park, Sang-Hyun;Won, Jung-Im
    • Journal of KIISE:Databases / v.34 no.5 / pp.396-408 / 2007
  • Since the Human Genome Project completed the sequencing of the human genome, interest in protein function has been increasing. Because protein structures are conserved during divergent evolution, protein function is determined by structure rather than by amino acid sequence. Therefore, if two protein structures are observed to be similar, we can expect them to share biological functions. Much research on protein structure alignment has been performed so far. However, most methods use RMSD (Root Mean Square Deviation) as the similarity measure, which makes it hard to judge the similarity of two protein structures intuitively. In addition, they return only the single result with the highest alignment score, which may not satisfy users with different purposes. To overcome these limitations, we propose a novel protein structure alignment algorithm based on MRPD (Maximum of Residue Pair Distance) and SG (Similarity Graph). MRPD is a more intuitive similarity measure that enables fast filtering of unpromising protein pairs, and SG is a compact representation of multiple alignment results from which users can choose the most plausible alignment for their needs; providing multiple results does not increase the time required to align the protein structures.
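
The difference between RMSD and a maximum residue-pair distance can be shown in a short Python sketch. It assumes the two structures are already superposed and that each aligned residue is represented by one coordinate (for example, its Cα atom). Reading MRPD as the largest distance among aligned pairs is an interpretation of the name, and the `max_dist` filtering threshold is hypothetical.

```python
import math


def residue_distances(coords1, coords2):
    """Euclidean distance for each aligned residue pair (e.g. C-alpha atoms)."""
    if len(coords1) != len(coords2) or not coords1:
        raise ValueError("need two equal-length, non-empty coordinate lists")
    return [math.dist(p, q) for p, q in zip(coords1, coords2)]


def rmsd(coords1, coords2):
    """Root mean square deviation over the aligned pairs."""
    d = residue_distances(coords1, coords2)
    return math.sqrt(sum(x * x for x in d) / len(d))


def max_pair_distance(coords1, coords2):
    """Largest distance among aligned pairs (one reading of MRPD)."""
    return max(residue_distances(coords1, coords2))


def passes_filter(coords1, coords2, max_dist=3.0):
    """Keep a candidate alignment only if every aligned pair is within max_dist."""
    return max_pair_distance(coords1, coords2) <= max_dist
```

With four aligned residues, three matching exactly and one displaced by 4 Å, RMSD is 2.0 Å while the maximum pair distance is 4.0 Å. The maximum states directly that every aligned pair lies within 4 Å, which is the kind of intuitive guarantee that makes threshold-based filtering cheap.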