• Title/Summary/Keyword: DNA Sequence Classification


Classification in Different Genera by Cytochrome Oxidase Subunit I Gene Using CNN-LSTM Hybrid Model

  • Meijing Li;Dongkeun Kim
    • Journal of information and communication convergence engineering, v.21 no.2, pp.159-166, 2023
  • The COI barcode is a sequence of approximately 650 bp at the 5' end of the mitochondrial cytochrome c oxidase subunit I (COI) gene. As an effective deoxyribonucleic acid (DNA) barcode, it is widely used for the taxonomic identification and evolutionary analysis of species. We created a CNN-LSTM hybrid model by combining the gene features partially extracted by a long short-term memory (LSTM) network with the feature maps obtained by a convolutional neural network (CNN). Compared with K-means clustering, support vector machines (SVM), and a single CNN classification model, the CNN-LSTM hybrid model, trained on 278 samples covering 15 genera from two orders, achieved 94% accuracy on a test set of 118 samples. After the training set was augmented with additional samples and with four genera spanning four orders, the classification accuracy on the test set reached 100%. This study also proposes calculating the cosine similarity between the training and test sets to make an initial assessment of the reliability of the predicted results and to discover new species.
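
The cosine-similarity check mentioned in the abstract above can be sketched roughly as follows. The abstract does not say how sequences are turned into vectors, so the k-mer count featurization, the function names, and the idea of taking the maximum similarity to any training sequence are all assumptions made for illustration, not the authors' method.

```python
import math
from collections import Counter


def kmer_counts(seq, k=3):
    """Count overlapping k-mers in a DNA sequence (assumed featurization)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def max_similarity_to_training(test_seq, training_seqs, k=3):
    """Highest cosine similarity between one test sequence and any training sequence."""
    test_vec = kmer_counts(test_seq, k)
    return max(cosine_similarity(test_vec, kmer_counts(s, k)) for s in training_seqs)
```

Under these assumptions, a test sequence whose maximum similarity to the training set is low would be a candidate for closer inspection, since its predicted genus may be unreliable or it may belong to a taxon absent from training. Any cutoff value would have to be chosen empirically.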

Morphological Variation and Partial Mitochondrial Sequence Analysis of Echinoid Species from the Coasts of the East Sea (동해 연안에 서식하는 성게의 형태변이와 미토콘드리아 유전자 분석)

  • Shin, Ji-Hye;Kim, Sung-Gyu;Kim, Young-Dae;Sohn, Young-Chang
    • Journal of Aquaculture, v.21 no.3, pp.139-145, 2008
  • Morphological classification of echinoid species is difficult because of their phenotypic variation. In the present study, we analyzed the morphotypes and partial mitochondrial 12S rDNA sequences of four sea urchin species classified as Pseudocentrotus depressus, Anthocidaris crassispina, Hemicentrotus pulcherrimus, and Strongylocentrotus nudus, together with four unidentified species collected from the coasts of the East Sea. Genomic DNA was extracted from the gonads, and mitochondrial 12S rDNA sequences were amplified by the polymerase chain reaction (PCR). The sequence identities among the four known sea urchin species were 87.4-95.6%. The sequence identities among the four unidentified species were 99.4-99.6%, and these species showed the highest homology to S. intermedius (99.8%). Thus, our phylogenetic tree indicates that the four unidentified species belong to S. intermedius.
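
As a rough illustration of how a pairwise sequence identity such as those reported above can be computed, the sketch below compares two sequences that are assumed to be already aligned to the same length. The function name and the convention of skipping gap columns are assumptions; the abstract does not state which alignment tool or identity definition the authors used.

```python
def percent_identity(seq_a, seq_b):
    """Percent identity of two pre-aligned sequences, ignoring columns with a gap ('-')."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # assumed convention: gap columns are not counted
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared
```

For example, `percent_identity("ACGT", "ACGA")` compares four positions, three of which match, giving 75.0.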

Genetic Variations of Trichophyton rubrum Clinical Isolates from Korea

  • Yoon, Nam-Sup;Kim, Hyunjung;Park, Sung-Bae;Park, Min;Kim, Sunghyun;Kim, Young-Kwon
    • Biomedical Science Letters, v.24 no.3, pp.221-229, 2018
  • Trichophyton rubrum is a well-known pathogenic fungus that causes dermatophytosis and cutaneous mycosis in humans worldwide. However, no sequence type (ST) classification method is available for T. rubrum, and previous studies are lacking. Therefore, molecular biological tools based on DNA sequences are currently used for genotype identification and classification. In the present study, to characterize the genetic diversity and phylogenetic relationships of T. rubrum clinical isolates, five housekeeping genes, actin (ACT), calmodulin (CAL), RNA polymerase II (RPB2), superoxide dismutase 2 (SOD2), and β-tubulin (BT2), were analyzed by multilocus sequence typing (MLST). DNA sequence analysis was also performed to examine the differences between the sequences of Trichophyton strains and the identified genetic variations. Most of the housekeeping gene sequences matched closely. However, genetic variations were found at three positions in the β-tubulin gene: C→G (1766), G→T (1876), and C→A (1886). To confirm the association with T. rubrum inheritance, a phylogenetic tree analysis was performed. The isolates were classified into four clusters, but there was little significant correlation. Even so, MLST analysis is expected to be helpful for determining the genetic variations of T. rubrum once more large-scale data have accumulated. In conclusion, the present study presents the first MLST analysis of T. rubrum in Korea and explores the possibility that MLST could be a useful tool for studying the epidemiology and evolution of T. rubrum through further studies.

Survey on Nucleotide Encoding Techniques and SVM Kernel Design for Human Splice Site Prediction

  • Bari, A.T.M. Golam;Reaz, Mst. Rokeya;Choi, Ho-Jin;Jeong, Byeong-Soo
    • Interdisciplinary Bio Central, v.4 no.4, pp.14.1-14.6, 2012
  • Splice site prediction in DNA sequences is a basic search problem: finding the exon/intron and intron/exon boundaries. Removing the introns and joining the exons together forms the mRNA sequence, which is the input to translation, so splicing is a necessary step in the central dogma of molecular biology. The main task of splice site prediction is to locate the candidate sequences that end in GT and AG and then to distinguish the true sites from the false ones among those candidates. In this paper, we survey research on splice site prediction based on the support vector machine (SVM). These works differ mainly in their nucleotide encoding technique and SVM kernel selection. Some methods encode the DNA sequence sparsely, whereas others encode it probabilistically. The encoded sequences serve as input to the SVM, which classifies them using its learned model. Classification accuracy depends largely on selecting a proper kernel for sequence data and on the choice of kernel parameters. We review each encoding technique and group the techniques by similarity. We then discuss kernels and their parameter selection. This survey provides a basic understanding of encoding approaches and proper SVM kernel selection for splice site prediction.
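
The first two stages described above, locating GT/AG candidates and encoding a sequence window sparsely, might look like the sketch below. The function names, the four-bit one-hot scheme as the "sparse" encoding, and the all-zero vector for ambiguous bases are illustrative assumptions; the surveyed papers may use different encodings, and the SVM step itself is not shown.

```python
# One-hot ("sparse") code per nucleotide; unknown bases map to all zeros (assumption).
ONE_HOT = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "T": (0, 0, 0, 1),
}


def candidate_sites(seq, dinucleotide):
    """Return the start index of every occurrence of a dinucleotide such as GT or AG."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == dinucleotide]


def sparse_encode(window):
    """Flatten a sequence window into a 4-bits-per-base feature vector."""
    encoded = []
    for base in window.upper():
        encoded.extend(ONE_HOT.get(base, (0, 0, 0, 0)))
    return encoded
```

Each candidate position would then be expanded into a fixed-width window around the dinucleotide, encoded, and passed to a classifier that separates true splice sites from false ones.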

Feature Selection with Ensemble Learning for Prostate Cancer Prediction from Gene Expression

  • Abass, Yusuf Aleshinloye;Adeshina, Steve A.
    • International Journal of Computer Science & Network Security, v.21 no.12spc, pp.526-538, 2021
  • Machine learning and deep learning models are emerging techniques for addressing prediction problems in biomedical data analysis. DNA sequence prediction is a critical problem that has attracted a great deal of attention in the biomedical domain. Machine learning and deep learning models have been shown to provide more accurate results than conventional regression-based models. Predicting the gene sequences that lead to cancerous diseases, such as prostate cancer, is crucial, but identifying the most important features in a gene sequence is a challenging task. Extracting the components of the gene sequence that can provide insight into the types of mutation in the gene is of great importance, as it will lead to effective drug design and promote the new concept of personalised medicine. In this work, we extracted the exons from the prostate gene sequences used in the experiment. We built a Deep Neural Network (DNN) model and a Bi-directional Long Short-Term Memory (Bi-LSTM) model using k-mer encoding for the DNA sequence and one-hot encoding for the class label. The models were evaluated using different classification metrics. Our experimental results show that the DNN model achieves a training accuracy of 99 percent and a validation accuracy of 96 percent. The Bi-LSTM model achieves a training accuracy of 95 percent and a validation accuracy of 91 percent.
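
A minimal sketch of the two encodings named above, k-mer encoding for the sequence and one-hot encoding for the class label, is given here. The vocabulary layout, the choice of index 0 for k-mers containing ambiguous bases, and all function and class names are assumptions for illustration; the abstract does not give the value of k or the label set.

```python
from itertools import product


def build_kmer_vocab(k):
    """Map every possible k-mer over ACGT to an integer ID starting at 1 (0 = unknown)."""
    return {"".join(p): i + 1 for i, p in enumerate(product("ACGT", repeat=k))}


def encode_kmers(seq, k, vocab):
    """Turn a DNA sequence into the list of IDs of its overlapping k-mers."""
    seq = seq.upper()
    return [vocab.get(seq[i:i + k], 0) for i in range(len(seq) - k + 1)]


def one_hot_label(label, classes):
    """One-hot encode a class label against an ordered list of class names."""
    return [1 if c == label else 0 for c in classes]
```

The integer sequences produced this way are the kind of input usually fed to an embedding layer in front of a dense or recurrent network, with the one-hot vectors as training targets.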

Promoter classification using genetic algorithm controlled generalized regression neural network

  • Kim, Kun-Ho;Kim, Byun-Gwhan;Kim, Kyung-Nam;Hong, Jin-Han;Park, Sang-Ho
    • Institute of Control, Robotics and Systems (제어로봇시스템학회): Conference Proceedings, 2003.10a, pp.2226-2229, 2003
  • A new method for constructing a classifier is presented. It combines a generalized regression neural network (GRNN) with a genetic algorithm (GA), and the resulting classifier is referred to as a GA-GRNN. The GA controlled the training factors simultaneously; in the GA optimization, the neuron spreads were represented in a chromosome. The proposed optimization method was applied to a data set consisting of four different promoter sequences. The training and test data comprised 115 and 58 sequence patterns, respectively. The range of neuron spreads was experimentally varied from 0.4 to 1.4 in increments of 0.1. The GA-GRNN was compared with a conventional GRNN in terms of classification sensitivity and prediction accuracy. The GA-GRNN significantly improved the total classification sensitivity compared with the conventional GRNN, and it improved the total prediction accuracy by about 10.1%. Overall, the proposed GA-GRNN showed better classification sensitivity and prediction accuracy than the conventional GRNN.
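
To make the role of the neuron spread concrete, the sketch below shows a plain GRNN-style classifier with a single shared spread, using the standard Gaussian kernel weighting. It is not the GA-GRNN of the abstract: the genetic search over per-neuron spreads is not shown, and the function name, the toy data, and the use of one spread for all neurons are assumptions.

```python
import math


def grnn_classify(x, train_x, train_labels, spread):
    """Pick the class with the largest Gaussian-kernel-weighted vote from the training patterns."""
    scores = {}
    for xi, label in zip(train_x, train_labels):
        dist_sq = sum((a - b) ** 2 for a, b in zip(x, xi))
        weight = math.exp(-dist_sq / (2.0 * spread ** 2))
        scores[label] = scores.get(label, 0.0) + weight
    return max(scores, key=scores.get)


# Spread grid matching the range quoted in the abstract: 0.4 to 1.4 in steps of 0.1.
spreads = [round(0.4 + 0.1 * i, 1) for i in range(11)]

# Toy two-class data, purely hypothetical.
train_x = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
train_labels = ["a", "a", "b", "b"]
predictions = [grnn_classify((0.2, 0.3), train_x, train_labels, s) for s in spreads]
```

A small spread makes each training pattern influence only its immediate neighbourhood, while a large spread smooths the decision across patterns, which is why the spread is the natural quantity to tune.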


Interspecific Distinguishability of Veiled Lady Mushrooms (Dictyophora spp.) Based on rDNA-ITS Analysis (rDNA-ITS 분석에 의한 망태버섯속균(Dictyophora spp.)의 종간 구분 가능성)

  • Cheong, Jong-Chun;Lee, Myung-Chul;Kim, Bum-Gi;Park, Dong-Seok;Hong, Sung-Beom;Park, Jeong-Sik
    • The Korean Journal of Mycology, v.32 no.1, pp.1-7, 2004
  • To establish the phylogenetic relationships of Dictyophora spp., the rDNA-ITS regions of 11 strains of veiled lady mushroom collected from various countries were amplified and sequenced. In cleaved amplified polymorphic sequence (CAPS) analysis, the 11 strains were divided into four groups based on the PCR band patterns of each ITS region cleaved by eight different restriction enzymes. The phylogenetic relationship of each group in the CAPS analysis matched well with the previously reported morphological phylogeny, such as 5 strains of D. indusiata, 4 strains of D. echinovolvata, and a strain of Phallus rugulosus. Sequence analysis using the Clustal V method gave a more detailed classification than CAPS analysis. The 5.8S region showed two single-nucleotide substitutions from G to A among the four groups, and the four groups were subdivided by sequence variation in the ITS I and ITS II regions. However, no sequence variation was observed for Phallus rugulosus across the full ITS region. This study further delineates the taxonomic level at which ITS sequences, in comparison with ribosomal gene sequences, are most useful in systematics and other mushroom studies.
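
The band patterns that CAPS analysis compares come from cutting an amplified region at restriction sites. A rough in-silico version of that digest, for a linear fragment and a single enzyme, is sketched below. The function name is hypothetical, and EcoRI (recognition site GAATTC, cutting after the first G) is used only as a familiar example; the abstract does not name the eight enzymes.

```python
def digest_fragment_lengths(seq, site, cut_offset):
    """Lengths of the fragments produced by cutting a linear sequence at every recognition site.

    cut_offset is the number of bases from the start of the site to the cut position.
    """
    seq = seq.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


# EcoRI as an example enzyme: G^AATTC, so the cut falls 1 base into the site.
fragments = digest_fragment_lengths("AAGAATTCCCCGAATTCTT", "GAATTC", 1)
```

Two strains whose amplified regions yield different fragment-length lists for the same enzyme would show different band patterns on a gel, which is the signal used to group strains.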

Sequence Similarity of Nuclear 18S rDNA from Morphologically Different Blades of the Seaweed Porphyra pseudolinearis (Rhodophyta) (긴잎돌김 Porphyra pseudolinearis의 엽체형간 18S rDNA 염기서열 상동성)

  • Jin Long-Guo;KIM Young-Dae;KIM Myung-Sook;JIN Hyung-Joo;CHO Ji-Young;CHOI Jae-Suk;HONG Yong-Ki;KIM Hyung Geun
    • Korean Journal of Fisheries and Aquatic Sciences, v.33 no.6, pp.496-500, 2000
  • Partial fragments of nuclear 18S rDNA from morphologically wide and narrow thalli of the seaweed Porphyra pseudolinearis were amplified and sequenced to compare their DNA homology. The two sequences of 311 base pairs were 100% identical to each other. They showed 97.7% similarity with a wild strain collected at Sodol in Kangwondo, and 99.4% similarity with the Japanese P. pseudolinearis sequence (GenBank accession number AB013185). Thus, the morphological difference between wide and narrow blades might not be a classification criterion at the subspecies level of P. pseudolinearis.


Construction of a full-length cDNA library from Pinus koraiensis and analysis of EST dataset (잣나무(Pinus koraiensis)의 cDNA library 제작 및 EST 분석)

  • Kim, Joon-Ki;Im, Su-Bin;Choi, Sun-Hee;Lee, Jong-Suk;Roh, Mark S.;Lim, Yong-Pyo
    • Korean Journal of Agricultural Science, v.38 no.1, pp.11-16, 2011
  • In this study, we report the generation and analysis of 1,211 expressed sequence tags (ESTs) from Pinus koraiensis. A cDNA library was generated from young leaf tissue, and a total of 1,211 cDNAs were partially sequenced. EST and unigene sequence quality was determined by computational filtering, manual review, and BLAST analyses. In all, 857 ESTs remained after removal of the vector sequence and filtering for a minimum length of 50 nucleotides. After assembly, a total of 411 unigenes, consisting of 89 contigs and 322 singletons, were identified. We also identified 77 new microsatellite-containing sequences among the unigenes and classified their structure according to repeat unit. In a BLASTX homology search against the NCBI database, 63.1% of ESTs were homologous to sequences of known function and 22.2% matched sequences of putative or unknown function. The remaining 14.6% of ESTs showed no significant similarity to any protein sequence in the public database. Gene ontology (GO) classification showed that the most abundant GO terms were transport, nucleotide binding, and plastid in the biological process, molecular function, and cellular component categories, respectively. The sequence data will be used to characterize the potential roles of new genes in Pinus and will provide useful tools as a genetic resource.
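
Finding microsatellite-containing sequences and grouping them by repeat unit, as described above, can be approximated with a regular expression that looks for a short unit repeated in tandem. The sketch below is one simple way to do this; the function name, the maximum unit length of 6, and the minimum of 4 repeats are assumptions, since the abstract does not state the criteria the authors used.

```python
import re


def find_microsatellites(seq, min_repeats=4, max_unit=6):
    """Return (start, unit, repeats) for each tandem repeat of a 1..max_unit base unit."""
    seq = seq.upper()
    # Lazy unit match so the shortest repeating unit is preferred (e.g. "A" over "AA").
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_repeats - 1))
    hits = []
    for match in pattern.finditer(seq):
        unit = match.group(1)
        repeats = len(match.group(0)) // len(unit)
        hits.append((match.start(), unit, repeats))
    return hits
```

The length of the reported unit gives the class of the repeat (1 for mononucleotide, 2 for dinucleotide, and so on), which is the basis for a classification by repeat unit.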

Cloning and Phylogenetic Analysis of Chitin Synthase Gene from Entomopathogenic Fungus, Beauveria brongniartii

  • Nam, Jin-Sik;Lee, Dong-Hun;Park, Ho-Yong;Bae, Kyung-Sook
    • Journal of Microbiology, v.35 no.3, pp.222-227, 1997
  • DNA fragments homologous to the chitin synthase gene were amplified from the genomic DNA of Beauveria brongniartii by PCR using degenerate primers. Cloning and sequencing of the PCR-amplified fragments led to the identification of a gene, designated BbCHS1. Comparison of the deduced amino acid sequence of BbCHS1 with those of other Euascomycetes revealed that BbCHS1 is a gene for class II chitin synthase. In a BLASTP search, the deduced amino acid sequence of BbCHS1 showed the highest similarity, 95.8%, to CHS2 of Metarhizium anisopliae. Phylogenetic analysis of the amino acid sequences confirmed the taxonomic and evolutionary position of B. brongniartii, which was previously derived by traditional fungal classification based on morphological features.
