• Title/Abstract/Keywords: Sequence classification


A model-free soft classification with a functional predictor

  • Lee, Eugene;Shin, Seung Jun
    • Communications for Statistical Applications and Methods / v.26 no.6 / pp.635-644 / 2019
  • Class probability is a fundamental target in classification that contains complete classification information. In this article, we propose a class probability estimation method when the predictor is functional. Motivated by Wang et al. (Biometrika, 95, 149-167, 2007), our estimator is obtained by training a sequence of functional weighted support vector machines (FWSVM) with different weights, which can be justified by the Fisher consistency of the hinge loss. The proposed method can be extended to multiclass classification via the pairwise coupling proposed by Wu et al. (Journal of Machine Learning Research, 5, 975-1005, 2004). The use of FWSVM makes our method model-free as well as computationally efficient, owing to the piecewise linearity of the FWSVM solutions as functions of the weight. Numerical investigations on both synthetic and real data show the advantageous performance of the proposed method.
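
The probability recovery step described in this abstract can be sketched as follows. Under the convention (an assumption here) that positive examples carry weight 1 - pi and negative examples carry weight pi, the weighted hinge loss is Fisher consistent for sign(p(x) - pi), so the fitted sign at a point x flips from +1 to -1 as pi passes p(x). The sketch assumes the signs of the trained classifiers at x are already available; `estimate_class_probability` and its arguments are hypothetical names, and training the FWSVMs themselves is not shown.

```python
def estimate_class_probability(weights, signs):
    """Estimate p(x) from the signs of weighted classifiers at one point x.

    weights: increasing grid of weights pi in (0, 1).
    signs:   signs[i] is +1 or -1, the label predicted at x by the
             classifier trained with weights[i] (computed elsewhere).
    """
    if len(weights) != len(signs):
        raise ValueError("weights and signs must have the same length")
    # Largest weight that still predicts +1 (lower bound on p); 0.0 if none.
    lower = max((w for w, s in zip(weights, signs) if s > 0), default=0.0)
    # Smallest weight that already predicts -1 (upper bound on p); 1.0 if none.
    upper = min((w for w, s in zip(weights, signs) if s <= 0), default=1.0)
    # The midpoint of the two bracketing weights serves as the point estimate.
    # If the signs are not monotone in the weight, this is still the midpoint
    # of the two extremes, which is a simplification made for this sketch.
    return (lower + upper) / 2.0
```

With a grid of nine weights 0.1, ..., 0.9 and a sign change between 0.6 and 0.7, the estimate is the midpoint of those two weights. A finer grid narrows the bracket at the cost of training more classifiers.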

The Optimal Bispectral Feature Vectors and the Fuzzy Classifier for 2D Shape Classification

  • Youngwoon Woo;Soowhan Han;Park, Choong-Shik
    • Proceedings of the Korea Intelligent Information System Society Conference / 2001.01a / pp.421-427 / 2001
  • In this paper, a method for selecting the optimal feature vectors is proposed for the classification of closed 2D shapes using the bispectrum of a contour sequence. The bispectrum based on third-order cumulants is applied to the contour sequences of the images to extract feature vectors for each planar image. These bispectral feature vectors, which are invariant to shape translation, rotation, and scale transformation, can be used to represent two-dimensional planar images, but there is no established criterion for selecting the feature vectors for optimal classification of closed 2D images. A new method is therefore proposed for selecting the optimal bispectral feature vectors based on the variances of the feature vectors. Experimental results are presented using eight different shapes of aircraft images, from five to fifteen bispectral feature vectors, and a weighted mean fuzzy classifier.

DNA Sequence Classification Using a Generalized Regression Neural Network and Random Generator (난수발생기와 일반화된 회귀 신경망을 이용한 DNA 서열 분류)

  • 김성모;김근호;김병환
    • The Transactions of the Korean Institute of Electrical Engineers D / v.53 no.7 / pp.525-530 / 2004
  • A classifier was constructed using a generalized regression neural network (GRNN) and a random generator (RG) and was applied to classify DNA sequences. The three data sets evaluated are eukaryotic and prokaryotic sequences (Data-I), eukaryotic sequences (Data-II), and prokaryotic sequences (Data-III). For each data set, the classifier performance was examined in terms of the total classification sensitivity (TCS), individual classification sensitivity (ICS), total prediction accuracy (TPA), and individual prediction accuracy (IPA). For a given spread, the RG generated a number of sets of spreads for the Gaussian functions in the pattern layer. Compared to the GRNN, the RG-GRNN significantly improved the TCS by more than 50%, 60%, and 40% for Data-I, Data-II, and Data-III, respectively. The RG-GRNN also demonstrated improved TPA for all data types. In conclusion, the proposed RG-GRNN can effectively be used to classify large, multivariable promoter sequences.
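
For readers unfamiliar with the GRNN, its output is a Gaussian-kernel-weighted average of the training targets, with the spread controlling the kernel width. The following is a minimal sketch of that prediction step only; the function name `grnn_predict` is hypothetical, and the abstract does not say how the RG draws its sets of spreads, so that part is not shown.

```python
import math


def grnn_predict(x, train_x, train_y, spread):
    """Return the GRNN output at x: a kernel-weighted mean of train_y."""
    weights = []
    for xi in train_x:
        # Squared Euclidean distance between the query and a pattern unit.
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, xi))
        # Gaussian activation of the pattern layer for this training point.
        weights.append(math.exp(-sq_dist / (2.0 * spread ** 2)))
    total = sum(weights)
    if total == 0.0:
        # Every kernel underflowed; falling back to the plain mean is an
        # assumption made for this sketch, not part of the source method.
        return sum(train_y) / len(train_y)
    return sum(w * y for w, y in zip(weights, train_y)) / total
```

A small spread makes the prediction follow the nearest training point, while a large spread pulls it toward the overall mean, which is why the choice of spread matters for classification performance.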

Industrial Process Monitoring and Fault Diagnosis Based on Temporal Attention Augmented Deep Network

  • Mu, Ke;Luo, Lin;Wang, Qiao;Mao, Fushun
    • Journal of Information Processing Systems / v.17 no.2 / pp.242-252 / 2021
  • Following the intuition that local information at individual time instances is hardly incorporated into the posterior sequence in long short-term memory (LSTM), this paper proposes an attention-augmented mechanism for fault diagnosis of complex chemical process data. Unlike conventional fault diagnosis and classification methods, an attention mechanism layer is introduced to detect and focus on local temporal information. The augmented deep network preserves each local instance's importance and contribution and allows interpretable feature representation and classification simultaneously. Comprehensive comparative analyses demonstrate that the developed model achieves a fault classification rate of 95.49% on average. The results are comparable to those obtained using various other techniques for the Tennessee Eastman benchmark process.

Classification in Different Genera by Cytochrome Oxidase Subunit I Gene Using CNN-LSTM Hybrid Model

  • Meijing Li;Dongkeun Kim
    • Journal of information and communication convergence engineering / v.21 no.2 / pp.159-166 / 2023
  • The COI barcode is a sequence of approximately 650 bp at the 5' terminal of the mitochondrial cytochrome c oxidase subunit I (COI) gene. As an effective deoxyribonucleic acid (DNA) barcode, it is widely used for the taxonomic identification and evolutionary analysis of species. We created a CNN-LSTM hybrid model by combining the gene features partially extracted by the Long Short-Term Memory (LSTM) network with the feature maps obtained by the CNN. Compared to K-Means Clustering, Support Vector Machines (SVM), and a single CNN classification model, after training on 278 samples in a training set that included 15 genera from two orders, the CNN-LSTM hybrid model achieved 94% accuracy on the test set, which contained 118 samples. We augmented the training set samples and four genera into four orders, and the classification accuracy of the test set reached 100%. This study also proposes calculating the cosine similarity between the training and test sets to initially assess the reliability of the predicted results and discover new species.
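
The cosine similarity check mentioned at the end of this abstract can be illustrated with a short sketch. It assumes each sequence has already been turned into a numeric feature vector; how the authors build those vectors is not stated, and the helper names `cosine_similarity` and `max_similarity_to_training` are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Similarity is undefined for a zero vector; 0.0 is used by convention.
        return 0.0
    return dot / (norm_u * norm_v)


def max_similarity_to_training(test_vec, train_vecs):
    """Highest cosine similarity between a test vector and any training vector."""
    return max(cosine_similarity(test_vec, t) for t in train_vecs)
```

A test sample whose best similarity to the training set is low lies far from anything the model has seen, so its predicted genus deserves less trust and it may be a candidate for a taxon absent from the training data.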

A STUDY OF INFERENCE IN CLASSIFIED CATALOGUE (분류목록의 추리성에 관한 연구)

  • Yoo Soyoung
    • Journal of the Korean Society for Library and Information Science / v.14 / pp.3-18 / 1987
  • When library users standing in front of a classified catalogue are not sure exactly what they need, the factors that help them trace the specific subject they need are of prime importance. This study examines what those factors are and how they affect the users' inferential reasoning. Since a classified catalogue reflects the structure of its classification, the logic of the classification system naturally becomes the focus of the study. The study concludes that a classification system that enables library users to apply their reasoning capabilities, that is, one that helps users trace a specific subject even when they are not sure of the exact subject they need, has the following properties. 1. Its components should be valid, that is, based on facts. 2. Its components should be logically arranged when placed in sequence. 3. Its notation should be based on mnemonics. The reason is that the indispensable factors in human inference are: 1. premises that are based on facts, and 2. a logical relationship between the premises and the conclusions drawn from them.

Survey on Nucleotide Encoding Techniques and SVM Kernel Design for Human Splice Site Prediction

  • Bari, A.T.M. Golam;Reaz, Mst. Rokeya;Choi, Ho-Jin;Jeong, Byeong-Soo
    • Interdisciplinary Bio Central / v.4 no.4 / pp.14.1-14.6 / 2012
  • Splice site prediction in DNA sequences is a basic search problem for finding exon/intron and intron/exon boundaries. Removing introns and then joining the exons together forms the mRNA sequence. These sequences are the input of the translation process. It is a necessary step in the central dogma of molecular biology. The main task of splice site prediction is first to find the exact GT- and AG-ended sequences and then to identify the true and false GT- and AG-ended sequences among those candidates. In this paper, we survey research on splice site prediction based on the support vector machine (SVM). The basic differences between these works are the nucleotide encoding technique and the SVM kernel selection. Some methods encode the DNA sequence in a sparse way, whereas others encode it in a probabilistic manner. The encoded sequences serve as the input of the SVM, whose task is to classify them using its learned model. The accuracy of classification largely depends on proper kernel selection for sequence data as well as on the selection of kernel parameters. We examine each encoding technique and classify the techniques according to their similarity. We then discuss kernels and their parameter selection. Our survey provides a basic understanding of encoding approaches and proper SVM kernel selection for splice site prediction.
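
As an illustration of the sparse encoding mentioned in this abstract, the sketch below maps each nucleotide to a four-element indicator vector and concatenates the results. The A, C, G, T ordering and the all-zero vector for ambiguous bases are assumptions of this sketch, and `one_hot_encode` is a hypothetical helper name.

```python
NUCLEOTIDES = "ACGT"  # column order of the indicator vector (assumed)


def one_hot_encode(sequence):
    """Encode a DNA string as a flat list with four 0/1 entries per base."""
    index = {base: i for i, base in enumerate(NUCLEOTIDES)}
    encoded = []
    for base in sequence.upper():
        vector = [0, 0, 0, 0]
        if base in index:
            vector[index[base]] = 1
        # Bases outside A/C/G/T (for example N) stay all-zero in this sketch.
        encoded.extend(vector)
    return encoded
```

A candidate window of length n around a GT or AG dinucleotide becomes a vector of length 4n, which can then be passed to an SVM. Probabilistic encodings replace the 0/1 entries with position-specific frequencies estimated from training data.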

Improvement of Performance of Malware Similarity Analysis by the Sequence Alignment Technique (서열 정렬 기법을 이용한 악성코드 유사도 분석의 성능 개선)

  • Cho, In Kyeom;Im, Eul Gyu
    • KIISE Transactions on Computing Practices / v.21 no.3 / pp.263-268 / 2015
  • Malware variants can be defined as malicious executable files that have similar functions but different structures. To classify such variants, this paper analyzed sequence alignment, a method used in bioinformatics. This method finds the common parts of the malware's API call information. Its performance depends on the length of the API call information; if the sequence is too long, the performance becomes very poor. Therefore, before applying the method, we removed repeated patterns in the API call information to improve the performance of the sequence alignment analysis. Finally, the similarity between malware samples was analyzed using sequence alignment. Experimental results with real malware samples are presented.
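
The two steps described in this abstract, removing repeats and then comparing sequences, can be sketched as follows. This is a simplified stand-in and not the authors' procedure: it collapses only immediately repeated calls (the abstract does not specify which repeated patterns are removed), and it uses the longest common subsequence as a basic form of alignment. All function names are hypothetical.

```python
def collapse_repeats(calls):
    """Drop consecutive duplicates, e.g. read, read, read -> read."""
    collapsed = []
    for call in calls:
        if not collapsed or collapsed[-1] != call:
            collapsed.append(call)
    return collapsed


def lcs_length(a, b):
    """Length of the longest common subsequence, by dynamic programming."""
    previous = [0] * (len(b) + 1)
    for x in a:
        current = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                current.append(previous[j - 1] + 1)
            else:
                current.append(max(previous[j], current[j - 1]))
        previous = current
    return previous[-1]


def similarity(calls_a, calls_b):
    """Similarity in [0, 1] between two API call traces after collapsing."""
    a, b = collapse_repeats(calls_a), collapse_repeats(calls_b)
    if not a or not b:
        return 0.0
    return 2.0 * lcs_length(a, b) / (len(a) + len(b))
```

Because the dynamic program costs time proportional to the product of the two sequence lengths, shortening the traces before alignment reduces the work directly, which matches the motivation given in the abstract.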

Feature Selection with Ensemble Learning for Prostate Cancer Prediction from Gene Expression

  • Abass, Yusuf Aleshinloye;Adeshina, Steve A.
    • International Journal of Computer Science & Network Security / v.21 no.12spc / pp.526-538 / 2021
  • Machine and deep learning-based models are emerging techniques that are being used to address prediction problems in biomedical data analysis. DNA sequence prediction is a critical problem that has attracted a great deal of attention in the biomedical domain. Machine and deep learning-based models have been shown to provide more accurate results than conventional regression-based models. The prediction of the gene sequence that leads to cancerous diseases, such as prostate cancer, is crucial. Identifying the most important features in a gene sequence is a challenging task. Extracting the components of the gene sequence that can provide insight into the types of mutation in the gene is of great importance, as it will lead to effective drug design and the promotion of the new concept of personalised medicine. In this work, we extracted the exons in the prostate gene sequences that were used in the experiment. We built a Deep Neural Network (DNN) and a Bidirectional Long Short-Term Memory (Bi-LSTM) model using a k-mer encoding for the DNA sequence and one-hot encoding for the class label. The models were evaluated using different classification metrics. Our experimental results show that the DNN model offers a training accuracy of 99 percent and a validation accuracy of 96 percent. The Bi-LSTM model has a training accuracy of 95 percent and a validation accuracy of 91 percent.
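
The input preparation named in this abstract, k-mer encoding of the sequence and one-hot encoding of the label, might look like the sketch below. The value k = 3, the overlapping stride of one, and the reservation of index 0 for padding and unseen k-mers are assumptions; the abstract gives none of these details, and the helper names are hypothetical.

```python
def kmers(sequence, k=3):
    """Split a DNA string into overlapping substrings of length k."""
    sequence = sequence.upper()
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def build_vocab(sequences, k=3):
    """Assign each distinct k-mer an integer id, starting at 1."""
    vocab = {}
    for sequence in sequences:
        for kmer in kmers(sequence, k):
            if kmer not in vocab:
                vocab[kmer] = len(vocab) + 1  # 0 is kept for padding/unknown
    return vocab


def encode(sequence, vocab, k=3):
    """Turn a sequence into a list of k-mer ids; unseen k-mers map to 0."""
    return [vocab.get(kmer, 0) for kmer in kmers(sequence, k)]


def one_hot_label(label, classes):
    """Indicator vector with a single 1 at the position of the label."""
    return [1 if c == label else 0 for c in classes]
```

The integer ids are the usual input to an embedding layer in front of a recurrent model such as a Bi-LSTM, while the one-hot labels match a softmax output layer.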

Generation of Finite Inductive, Pseudo Random, Binary Sequences

  • Fisher, Paul;Aljohani, Nawaf;Baek, Jinsuk
    • Journal of Information Processing Systems / v.13 no.6 / pp.1554-1574 / 2017
  • This paper introduces a new type of determining factor for Pseudo Random Strings (PRS). This classification depends upon a mathematical property called Finite Induction (FI). FI is similar to a Markov Model in that it presents a model of the sequence under consideration and determines the generating rules for this sequence. If these rules obey certain criteria, then we call the sequence generating these rules an FI PRS. We also consider the relationship of these kinds of PRS's to Good/deBruijn graphs and Linear Feedback Shift Registers (LFSR). We show that binary sequences from these special graphs have the FI property. We also show how such FI PRS's can be generated without consideration of the Hamiltonian cycles of the Good/deBruijn graphs. The FI PRS's also have maximum Shannon entropy, while sequences from LFSR's do not, nor are such sequences FI random.
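
For context on the LFSR sequences that this abstract contrasts with FI PRS's, here is a minimal sketch of a 4-bit Fibonacci LFSR together with a simple single-bit Shannon entropy estimate. The tap positions, the seed, and the entropy definition are choices made for this illustration; the sketch does not implement Finite Induction and may not match the entropy measure used in the paper.

```python
import math


def lfsr_bits(seed, taps, width, count):
    """Generate `count` output bits from a Fibonacci LFSR.

    seed:  nonzero initial state, stored in the low `width` bits.
    taps:  bit positions of the state that are XORed to form the feedback.
    """
    state = seed
    bits = []
    for _ in range(count):
        bits.append(state & 1)  # output the lowest bit
        feedback = 0
        for tap in taps:
            feedback ^= (state >> tap) & 1
        # Shift right and insert the feedback bit at the top.
        state = (state >> 1) | (feedback << (width - 1))
    return bits


def bit_entropy(bits):
    """Shannon entropy (in bits) of the empirical 0/1 frequencies."""
    p_one = sum(bits) / len(bits)
    entropy = 0.0
    for p in (p_one, 1.0 - p_one):
        if p > 0.0:
            entropy -= p * math.log2(p)
    return entropy


# One full period of a maximal-length 4-bit register (period 2**4 - 1 = 15).
period = lfsr_bits(seed=0b0001, taps=(0, 1), width=4, count=15)
```

Over one period this register emits eight ones and seven zeros, since the all-zero state never occurs, so the single-bit entropy falls slightly below the maximum of one bit per symbol.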