• Title/Summary/Keyword: Sequence prediction

ON EXTREMAL SORT SEQUENCES

  • Yun, Min-Young;Keum, Young-Wook
    • Journal of applied mathematics & informatics / v.9 no.1 / pp.239-252 / 2002
  • A sort sequence $S_n$ is a sequence of all unordered pairs of indices in $I_n = \{1, 2, \ldots, n\}$. With a sort sequence $S_n = (s_1, s_2, \ldots, s_{\binom{n}{2}})$, one can associate a predictive sorting algorithm $A(S_n)$. An execution of the algorithm performs pairwise comparisons of elements in the input set $X$ in the order defined by the sort sequence $S_n$, except that comparisons whose outcomes can be inferred from the results of preceding comparisons are not performed. A sort sequence is said to be extremal if it maximizes a given objective function. First we consider extremal sort sequences with respect to the objective function $\omega(S_n)$, the expected number of active predictions in $S_n$. We study $\omega$-extremal sort sequences in terms of their prediction vectors. Then we consider the objective function $\Omega(S_n)$, the minimum number of active predictions in $S_n$ over all input orderings.
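The predictive sorting procedure described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes distinct input elements, 0-based indices, and a sort sequence that lists every unordered pair once. The names `predictive_sort`, `performed`, and `inferred` are hypothetical, and the abstract does not define "active prediction" precisely, so the sketch only counts comparisons that were skipped because transitivity already settled them.

```python
def predictive_sort(x, sort_sequence):
    """Sort x by visiting index pairs in the given order, skipping
    comparisons whose outcome already follows from earlier ones."""
    n = len(x)
    # less[i][j] is True once x[i] < x[j] is known, directly or by transitivity.
    less = [[False] * n for _ in range(n)]
    performed, inferred = 0, 0
    for i, j in sort_sequence:
        if less[i][j] or less[j][i]:
            inferred += 1          # outcome already known: comparison skipped
            continue
        performed += 1
        lo, hi = (i, j) if x[i] < x[j] else (j, i)
        # Everything known to be at or below lo is now below
        # everything known to be at or above hi.
        below = [k for k in range(n) if k == lo or less[k][lo]]
        above = [k for k in range(n) if k == hi or less[hi][k]]
        for a in below:
            for b in above:
                less[a][b] = True
    # Assumption: all pairs were visited, so the number of elements known
    # to be smaller than x[k] is exactly the rank of x[k].
    order = sorted(range(n), key=lambda k: sum(less[m][k] for m in range(n)))
    return [x[k] for k in order], performed, inferred
```

For the input `[1, 2, 3]` and the sequence `[(0, 1), (1, 2), (0, 2)]`, the last pair is skipped because the first two comparisons already imply its outcome. Reordering the sequence changes how many comparisons can be skipped, which is the quantity the objective functions in the abstract are concerned with.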

Online Selective-Sample Learning of Hidden Markov Models for Sequence Classification

  • Kim, Minyoung
    • International Journal of Fuzzy Logic and Intelligent Systems / v.15 no.3 / pp.145-152 / 2015
  • We consider an online selective-sample learning problem for sequence classification, where the goal is to learn a predictive model using a stream of data samples whose class labels can be selectively queried by the algorithm. Given that there is a limit to the total number of queries permitted, the key issue is choosing the most informative and salient samples for their class labels to be queried. Recently, several aggressive selective-sample algorithms have been proposed under a linear model for static (non-sequential) binary classification. We extend the idea to hidden Markov models for multi-class sequence classification by introducing reasonable measures for the novelty and prediction confidence of the incoming sample with respect to the current model, on which the query decision is based. For several sequence classification datasets/tasks in online learning setups, we demonstrate the effectiveness of the proposed approach.

Prediction of Salinity of Nakdong River Estuary Using Deep Learning Algorithm (LSTM) for Time Series Analysis (시계열 분석 딥러닝 알고리즘을 적용한 낙동강 하굿둑 염분 예측)

  • Woo, Joung Woon;Kim, Yeon Joong;Yoon, Jong Sung
    • Journal of Korean Society of Coastal and Ocean Engineers / v.34 no.4 / pp.128-134 / 2022
  • The Nakdong River estuary is being operated with the goal of expanding the period of seawater inflow every month from this year to 2022 and creating a brackish water area within 15 km upstream of the river bank. In this study, the deep learning algorithm Long Short-Term Memory (LSTM) was applied to predict the salinity at the Nakdong Bridge (about 5 km upstream of the river bank) to support rapid decision making for the target brackish water zone and to prevent salt water damage. Input data were constructed to reflect the temporal and spatial characteristics of the Nakdong River estuary, such as the amount of discharge from Changnyeong and Hamanbo, and an optimal model was constructed in consideration of the hydraulic characteristics of the Nakdong River estuary by adjusting the model according to the sequence length. Prediction accuracy was assessed statistically using the coefficient of determination (R-squared) and the root mean square error (RMSE). The best results were obtained when the sequence length was 12, with an R-squared of 0.997 and an RMSE of 0.122, and in advance prediction the R-squared remained at 0.93 or higher for intervals of up to 12 hours.
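The sequence length mentioned in this abstract can be read as the number of past time steps fed to the model for each prediction. The sketch below shows one common way to build such input windows from a time series, together with the RMSE used for evaluation. The function names `make_windows` and `rmse` and the `lead` parameter are assumptions made for illustration; the abstract does not describe the paper's actual preprocessing.

```python
import math

def make_windows(series, seq_len, lead=1):
    """Pair each run of seq_len consecutive values with the value
    observed `lead` steps after the run ends."""
    inputs, targets = [], []
    for start in range(len(series) - seq_len - lead + 1):
        end = start + seq_len
        inputs.append(series[start:end])
        targets.append(series[end + lead - 1])
    return inputs, targets

def rmse(observed, predicted):
    """Root mean square error between two equally long sequences."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))
```

With hourly data, `make_windows(salinity, seq_len=12, lead=12)` would pair each 12-hour history with the value 12 hours ahead, where `salinity` stands for a hypothetical list of hourly observations.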

Introduction to Gene Prediction Using HMM Algorithm

  • Kim, Keon-Kyun;Park, Eun-Sik
    • Journal of the Korean Data and Information Science Society / v.18 no.2 / pp.489-506 / 2007
  • Gene structure prediction, which predicts the protein-coding regions in a given nucleotide sequence, is the most important process in annotating genes and greatly affects gene analysis and genome annotation. As eukaryotic genes have more complicated structures in DNA sequences than prokaryotic genes, analysis programs for eukaryotic gene structure prediction have more diverse and more complicated computational models. Gene prediction methods for eukaryotic genes include the ab initio method, the similarity-based method, and the ensemble method, each of which uses various algorithms. This paper introduces how to predict genes using the HMM (Hidden Markov Model) algorithm and presents the process of gene prediction with well-known gene prediction programs.
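As a rough illustration of how an HMM can label coding regions, the sketch below runs Viterbi decoding over a two-state toy model. Viterbi is a standard way to find the most likely state path in an HMM, but the abstract does not say which algorithm the paper presents, and every number here is made up: the states `C` (coding) and `N` (non-coding) and all probabilities in `START_P`, `TRANS_P`, and `EMIT_P` are hypothetical and far simpler than any real gene finder.

```python
import math

# Hypothetical toy model: coding regions are G/C-rich, non-coding A/T-rich.
STATES = ["C", "N"]
START_P = {"C": 0.5, "N": 0.5}
TRANS_P = {"C": {"C": 0.9, "N": 0.1}, "N": {"C": 0.1, "N": 0.9}}
EMIT_P = {
    "C": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
    "N": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
}

def viterbi(sequence, states, start_p, trans_p, emit_p):
    """Return the most likely state path for a non-empty sequence,
    working in log space to avoid numerical underflow."""
    scores = [{s: math.log(start_p[s]) + math.log(emit_p[s][sequence[0]])
               for s in states}]
    back = []
    for symbol in sequence[1:]:
        prev, cur, ptr = scores[-1], {}, {}
        for s in states:
            # Best predecessor state for landing in s at this position.
            best = max(states, key=lambda r: prev[r] + math.log(trans_p[r][s]))
            cur[s] = (prev[best] + math.log(trans_p[best][s])
                      + math.log(emit_p[s][symbol]))
            ptr[s] = best
        scores.append(cur)
        back.append(ptr)
    # Trace back from the best final state.
    state = max(states, key=lambda s: scores[-1][s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]
```

Calling `viterbi("AATTGGCC", STATES, START_P, TRANS_P, EMIT_P)` labels the A/T-rich prefix as non-coding and the G/C-rich suffix as coding under this toy model.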

Study on the Demand Prediction for Transportation System Utilizing Data Granulization (Data Granulization을 이용한 수송수요예측에 관한 연구)

  • 이덕규;홍태화;김학배;우광방
    • Proceedings of the KSR Conference / 1998.05a / pp.211-218 / 1998
  • Demand prediction has become an essential means of using finite traffic facilities efficiently and of providing optimized schedules for transportation systems. Demand prediction is one of the critical and complex management schemes for distributing the resources of a transportation service by means of a computer system. The construction of a prediction model is based on data granulization, followed by processing the raw input data and evaluating the predicted output values. A large number of socio-economic parameters are also to be implemented in conventional prediction models, which are based only on a sequence of past data. The proposed prediction models are classified by static and dynamic characteristics, and their performance is evaluated using computer simulation.

Assessment of artificial neural network model for real-time dam inflow prediction (실시간 댐 유입량 예측을 위한 인공신경망 모형의 활용성 평가)

  • Heo, Jae-Yeong;Bae, Deg-Hyo
    • Journal of Korea Water Resources Association / v.54 no.spc1 / pp.1131-1141 / 2021
  • In this study, an artificial neural network model is applied to real-time dam inflow prediction and evaluated for prediction lead times of 1, 3, and 6 hr in dam basins in Korea. For training and testing the model, hourly precipitation and inflow are used as input data according to average annual inflow. The results show that the model performance for lead times of up to 6 hours is acceptable because the NSE is 0.57 to 0.79 or higher. Overall, the predictive performance of the model in dry seasons is weaker than in wet seasons, and this difference in performance increases in larger basins. For the 6-hour prediction lead time, the model performance changes as the sequence length increases. These changes are more pronounced in the dry season than in the wet season, and the prediction performance of the model in the dry season improves as the sequence length increases. Comparison of observed and predicted hydrographs for flood events showed that although the shape of the predicted hydrograph is similar to that of the observed hydrograph, the peak flow tends to be underestimated and the peak time is delayed depending on the prediction lead time.
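The NSE quoted in this abstract is presumably the Nash-Sutcliffe efficiency, a common skill score in hydrology. A minimal sketch of its usual definition follows; the function name `nash_sutcliffe` is illustrative, and the abstract does not state how the paper computed the score.

```python
def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 minus the ratio of the squared model
    error to the variance of the observations around their mean."""
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    spread = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / spread
```

A perfect prediction gives 1, and a model that does no better than always predicting the observed mean gives 0, which helps put values such as 0.57 to 0.79 in context.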

Sequential Analysis of Earth Retaining Structures Using p-y Curves for Subgrade Reaction

  • Kim, Hwang;Cha
    • Geotechnical Engineering / v.12 no.3 / pp.149-164 / 1996
  • The sequential behavior of an earth retaining structure is investigated using soil springs in elasto-plastic soil. A mathematical model that can be used to construct the p-y curves for the subgrade modulus is proposed using a piecewise linear function. The excavation sequence of the retaining wall is analyzed by the beam-column method. The reliability of the developed computer program is verified through a comparison between the predictions and the in-situ measurements. It is concluded that the proposed method simulates the construction sequence well and thus represents a significant improvement in the prediction of deflections of anchored wall excavations. Based on the results, the proposed method can be used effectively to evaluate the relative importance of the parameters employed in a sensitivity analysis.

Structural Analysis of Recombinant Human Preproinsulins by Structure Prediction, Molecular Dynamics, and Protein-Protein Docking

  • Jung, Sung Hun;Kim, Chang-Kyu;Lee, Gunhee;Yoon, Jonghwan;Lee, Minho
    • Genomics & Informatics / v.15 no.4 / pp.142-146 / 2017
  • More effective production of human insulin is important, because insulin is the main medication that is used to treat multiple types of diabetes and because many people are suffering from diabetes. The current system of insulin production is based on recombinant DNA technology, and the expression vector is composed of a preproinsulin sequence that is a fused form of an artificial leader peptide and the native proinsulin. It has been reported that the sequence of the leader peptide affects the production of insulin. To analyze how the leader peptide affects the maturation of insulin structurally, we adapted several in silico simulations using 13 artificial proinsulin sequences. Three-dimensional structures of models were predicted and compared. Although their sequences had few differences, the predicted structures were somewhat different. The structures were refined by molecular dynamics simulation, and the energy of each model was estimated. Then, protein-protein docking between the models and trypsin was carried out to compare how efficiently the protease could access the cleavage sites of the proinsulin models. The results showed some concordance with experimental results that have been reported; so, we expect our analysis will be used to predict the optimized sequence of artificial proinsulin for more effective production.

Backbone 1H, 15N and 13C Resonance Assignment and Secondary Structure Prediction of HP0062 (O24902_HELPY) from Helicobacter pylori

  • Jang, Sun-Bok;Ma, Chao;Park, Sung-Jean;Kwon, Ae-Ran;Lee, Bong-Jin
    • Journal of the Korean Magnetic Resonance Society / v.13 no.2 / pp.117-125 / 2009
  • HP0062 is an 86-residue hypothetical protein from Helicobacter pylori strain 26695. HP0062 was identified as an ESAT-6/WXG100 superfamily protein based on structure and sequence alignment, and it also contains a leucine zipper domain sequence. Here, we report the sequence-specific backbone resonance assignment of HP0062. About 97.7% of all $^{1}H_{N}$, $^{15}N$, $^{13}C_{\alpha}$, $^{13}C_{\beta}$, and $^{13}C{=}O$ resonances were assigned unambiguously. We could predict the secondary structure of HP0062 by analyzing the deviation of the $^{13}C_{\alpha}$ and $^{13}C_{\beta}$ chemical shifts from their respective random coil values. Secondary structure prediction shows that HP0062 consists of two $\alpha$-helices. This study is a prerequisite for determining the solution structure of HP0062 and can be used to study the interaction between HP0062 and DNA and other Helicobacter pylori proteins.

Protein Disorder/Order Region Classification Using EPs-TFP Mining Method (EPs-TFP 마이닝 기법을 이용한 단백질 Disorder/Order 지역 분류)

  • Lee, Heon Gyu;Shin, Yong Ho
    • Journal of Korea Society of Industrial Information Systems / v.17 no.6 / pp.59-72 / 2012
  • Since a protein displays its specific functions when a disorder region of the protein sequence transitions to an order region while provoking a biological reaction, separating disorder regions from order regions in sequence data is necessary for predicting the three-dimensional structure and characteristics of the protein. To classify disorder and order regions efficiently, this paper proposes a classification/prediction method that uses sequence data while obtaining results that are not biased toward a specific characteristic of the protein and improving classification speed. The emerging-pattern-based EPs-TFP method uses only the essential emerging patterns, from which redundant emerging patterns have been removed. This classification method finds the sequence patterns of disorder regions, that is, patterns that appear frequently in disorder regions but relatively infrequently in order regions. We extend the TFP method, which is built on the P-tree and T-tree concepts, into a classification/prediction method in order to improve the performance of the proposed algorithm. We used Disprot 4.9 and CASP 7 data to evaluate the EPs-TFP technique; the results of order/disorder classification show a sensitivity of 73.6, a specificity of 69.51, and an accuracy of 74.2.
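The idea of patterns that are frequent in disorder regions but rare in order regions can be illustrated with a growth-rate filter, the usual criterion for emerging patterns. The sketch below is only a simplified stand-in under stated assumptions: it treats contiguous substrings of length `k` as candidate patterns and scans the sequences directly, and it does not implement the P-tree/T-tree structures or the redundancy removal of EPs-TFP. The names `support`, `emerging_patterns`, and `min_growth` are hypothetical.

```python
def support(pattern, sequences):
    """Fraction of sequences that contain the pattern as a substring."""
    return sum(pattern in s for s in sequences) / len(sequences)

def emerging_patterns(disorder, order, k, min_growth):
    """Return length-k substrings whose support in the disorder set is at
    least min_growth times their support in the order set."""
    candidates = {s[i:i + k] for s in disorder for i in range(len(s) - k + 1)}
    result = {}
    for pattern in sorted(candidates):
        sup_d = support(pattern, disorder)
        sup_o = support(pattern, order)
        # A pattern absent from the order set has unbounded growth rate.
        growth = float("inf") if sup_o == 0 else sup_d / sup_o
        if growth >= min_growth:
            result[pattern] = growth
    return result
```

For example, with the toy inputs `disorder = ["AEEK", "PEES"]` and `order = ["LLVI", "AEIV"]`, the substring `"AE"` has the same support in both sets and is filtered out at `min_growth=2`, while `"EE"` appears only in the disorder set and is kept.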