• Title/Summary/Keyword: sequence-to-sequence model

Search Result 1,626

Anomaly Detection for User Action with Generative Adversarial Networks (적대적 생성 모델을 활용한 사용자 행위 이상 탐지 방법)

  • Choi, Nam woong;Kim, Wooju
    • Journal of Intelligence and Information Systems / v.25 no.3 / pp.43-62 / 2019
  • At one time, the anomaly detection field was dominated by methods that decide whether an abnormality exists based on statistics derived from specific data. This worked because data dimensionality used to be low, so classical statistical methods were effective. However, as data have grown more complex in the era of big data, it has become more difficult to accurately analyze and predict the data generated across industry with conventional methods. Supervised learning algorithms based on SVMs and decision trees were therefore adopted. However, a supervised model can predict test data accurately only when the abnormal and normal classes are roughly balanced, whereas most data generated in industry are class-imbalanced. As a result, predictions from a supervised model are not always valid. To overcome this drawback, many studies now use unsupervised models that are not influenced by class distribution, such as autoencoders or generative adversarial networks (GANs). In this paper, we propose a method to detect anomalies using GANs. AnoGAN, introduced by Schlegl et al. (2017), is a model for detecting anomalies in medical images; it is built from convolutional neural networks and was applied to detection tasks. By contrast, there are far fewer studies of GAN-based anomaly detection for sequence data than for image data. Li et al. (2018) proposed a model based on LSTM, a type of recurrent neural network, to classify anomalies in numerical sequence data, but it has not been applied to categorical sequence data, nor did it use the feature matching method of Salimans et al. (2016).
This suggests that much remains to be explored in anomaly classification of sequence data with GANs. To learn sequence data, the GAN in this study is built from LSTMs: the generator is a two-layer stacked LSTM with 32-dimensional and 64-dimensional hidden layers, and the discriminator is an LSTM with a 64-dimensional hidden layer. Existing work on anomaly detection for sequence data derives anomaly scores from the entropy of the probabilities of the actual data; in this paper, as mentioned earlier, anomaly scores are derived with the feature matching technique instead. In addition, the latent-variable optimization process was designed with an LSTM to improve model performance. The modified GAN achieved higher precision than the autoencoder in all experiments and approximately 7% higher accuracy. The GAN was also more robust than the autoencoder: because it learns the data distribution from real categorical sequence data, it is not affected by a single normal data point, whereas the autoencoder is. In the robustness test, the autoencoder reached 92% accuracy and the GAN 96%; in terms of sensitivity, the autoencoder reached 40% and the GAN 51%. Experiments were also conducted to measure how much performance changes with the structure used to optimize the latent variables, and sensitivity improved by about 1%. These results suggest a new perspective on latent-variable optimization, which had previously received relatively little attention.
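The scoring idea above can be sketched in the AnoGAN style: a latent vector $z$ is fitted so that the generator output $G(z)$ resembles the input $x$, and the anomaly score combines a residual term with a feature-matching term computed on an intermediate discriminator layer $f$, i.e. $A(x) = (1-\lambda)\sum|x - G(z)| + \lambda\sum|f(x) - f(G(z))|$. This is a minimal sketch under loud assumptions: $G$ and $f$ are hypothetical toy linear/tanh maps on vectors instead of the paper's LSTMs on categorical sequences, $z$ is fitted by plain subgradient descent rather than the LSTM-based optimizer the authors describe, and $\lambda = 0.1$ is an arbitrary choice.

```python
import math
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def sign(t):
    return (t > 0) - (t < 0)


def anomaly_score(x, W, A, lam=0.1, steps=500, lr=0.01, seed=0):
    """AnoGAN-style score: (1 - lam) * residual + lam * feature-matching loss.

    Hypothetical toy stand-ins: generator G(z) = W z and discriminator
    feature layer f(y) = tanh(A y). The latent z is fitted to x by
    subgradient descent; the lowest loss reached is returned as the score.
    """
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 0.1) for _ in range(len(W[0]))]
    Wt = transpose(W)
    AW = matmul(A, W)
    AWt = transpose(AW)
    fx = [math.tanh(u) for u in matvec(A, x)]      # features of the real input
    best = float("inf")
    for _ in range(steps):
        gz = matvec(W, z)                          # generated sample G(z)
        t = [math.tanh(u) for u in matvec(AW, z)]  # features f(G(z))
        res = [a - b for a, b in zip(x, gz)]
        fres = [a - b for a, b in zip(fx, t)]
        loss = (1 - lam) * sum(map(abs, res)) + lam * sum(map(abs, fres))
        best = min(best, loss)
        # Subgradients of both L1 terms with respect to z (chain rule through tanh).
        g_r = matvec(Wt, [-sign(r) for r in res])
        g_d = matvec(AWt, [-sign(r) * (1 - ti * ti) for r, ti in zip(fres, t)])
        z = [zi - lr * ((1 - lam) * a + lam * b) for zi, a, b in zip(z, g_r, g_d)]
    return best
```

With `W = [[1, 0], [0, 1], [1, 1]]`, an input such as `[0.5, -0.3, 0.2]` lies in the range of $G$ and scores near zero, while `[2.0, 2.0, -3.0]` keeps a residual of at least 7 for any $z$, so its score stays high however well the latent search works.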

Measurement of missing video frames in NPP control room monitoring system using Kalman filter

  • Mrityunjay Chaubey;Lalit Kumar Singh;Manjari Gupta
    • Nuclear Engineering and Technology / v.55 no.1 / pp.37-44 / 2023
  • Using the Kalman filtering technique, we propose a novel method for estimating missing video frames in the monitoring of activities inside the control room of a nuclear power plant (NPP). The purpose of this study is to reinforce the existing security and safety procedures in the NPP control room. The control room serves as the nervous system of the plant, with instrumentation and control systems used to monitor and control critical plant parameters. Because its safety and security are critical, it must be monitored closely by security cameras in order to assess and reduce the onset of any incidents and accidents that could adversely affect the safety of the NPP. However, continuous monitoring may be interrupted for a variety of technical and administrative reasons. Such an interruption can distort or drop one or more video frames, making it difficult to identify activity during that period, which could endanger overall safety. The proposed Kalman filter model estimates the missing frame pixel by pixel using information from the frame that precedes it and the frame that follows it in the video sequence. The experimental results provide evidence of the algorithm's effectiveness.
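The abstract does not give the state-space model, so the following is only a minimal sketch under stated assumptions: each pixel's intensity follows a scalar random walk (process variance `q`) observed with noise (variance `r`); a forward pass predicts the missing frame from the frames before it, a backward pass predicts it from the frames after it, and the two predictions are fused by inverse-variance weighting. The function names, the model, and the default `q`/`r` values are hypothetical, not the authors' implementation.

```python
def kalman_pass(values, q, r):
    """Scalar Kalman filter for a random-walk state; None marks a missing frame.

    Returns, for every time step, the (mean, variance) prediction made
    before that step's own measurement is used.
    """
    mean, var = None, float("inf")
    preds = []
    for y in values:
        if mean is not None:
            var += q                     # predict: x_t = x_{t-1} + process noise
        preds.append((mean, var))
        if y is not None:
            if mean is None:
                mean, var = y, r         # start from the first observed value
            else:
                k = var / (var + r)      # Kalman gain
                mean += k * (y - mean)   # correct with the measurement
                var *= 1 - k
    return preds


def fuse(a, b):
    """Inverse-variance fusion of forward and backward predictions."""
    (m1, v1), (m2, v2) = a, b
    if m1 is None:
        return m2                        # nothing observed before this frame
    if m2 is None:
        return m1                        # nothing observed after this frame
    return (m1 / v1 + m2 / v2) / (1 / v1 + 1 / v2)


def fill_missing(frames, q=1.0, r=4.0):
    """frames: list of 2-D grey-level lists, or None for a missing frame."""
    ref = next(f for f in frames if f is not None)
    h, w = len(ref), len(ref[0])
    out = [f if f is not None else [[0.0] * w for _ in range(h)] for f in frames]
    for i in range(h):
        for j in range(w):
            series = [None if f is None else float(f[i][j]) for f in frames]
            fwd = kalman_pass(series, q, r)               # uses earlier frames
            bwd = kalman_pass(series[::-1], q, r)[::-1]   # uses later frames
            for t, f in enumerate(frames):
                if f is None:
                    out[t][i][j] = fuse(fwd[t], bwd[t])
    return out
```

For a gap with the same number of observed frames on each side, the two predictions carry equal variance, so the filled pixel lands midway between the neighbouring intensities; unequal histories shift the estimate toward the better-supported side.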

Best Practice on Automatic Toon Image Creation from JSON File of Message Sequence Diagram via Natural Language based Requirement Specifications

  • Hyuntae Kim;Ji Hoon Kong;Hyun Seung Son;R. Young Chul Kim
    • International Journal of Advanced Smart Convergence / v.13 no.1 / pp.99-107 / 2024
  • With AI image generation tools, most general users must write an effective prompt, crafting queries or statements that elicit the desired response (image or result) from the AI model. We, however, are software engineers who focus on software processes, and in the early stages of a process we use informal and formal requirement specifications. Here, we bring a natural language approach into requirement engineering and toon engineering. Most generative AI tools do not produce the same image for the same query, because the same data asset is not reused for the same query. To solve this problem, we use informal requirement engineering and linguistics to create a toon. Specifically, we propose a mechanism that generates a sequence diagram and images by analyzing key objects and attributes in an informal natural language requirement. We identify morphemes and semantic roles by analyzing the natural language with linguistic methods; based on the analysis results, a sequence diagram is generated, and an image is generated from the diagram. We expect the proposed mechanism to produce consistent images by reusing the same image element assets.
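The pipeline above ends in a JSON message sequence diagram from which images are drawn. The sketch below assumes the linguistic analysis has already produced (agent, action, theme, recipient) tuples, and it uses a hypothetical JSON schema and a hypothetical asset table; it illustrates only the idea that a fixed asset lookup yields the same image element for the same participant every time, not the authors' format.

```python
import json

# Hypothetical asset catalogue: the same participant always maps to the same file.
ASSETS = {"user": "assets/user.png", "system": "assets/system.png"}


def to_sequence_json(triples):
    """Build a message sequence diagram (hypothetical schema) from
    (agent, action, theme, recipient) semantic-role tuples."""
    participants = []
    messages = []
    for agent, action, theme, recipient in triples:
        for p in (agent, recipient):
            if p not in participants:
                participants.append(p)   # keep first-appearance order
        messages.append({"from": agent, "to": recipient, "label": f"{action} {theme}"})
    diagram = {
        "participants": [
            {"name": p, "asset": ASSETS.get(p, "assets/default.png")}
            for p in participants
        ],
        "messages": messages,
    }
    return json.dumps(diagram, indent=2)
```

Because nothing in the function is random, the same requirement analysis always serializes to the same JSON and therefore references the same assets, which is the consistency property the abstract aims for.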

Implicit Treatment of Technical Specification and Thermal Hydraulic Parameter Uncertainties in Gaussian Process Model to Estimate Safety Margin

  • Fynan, Douglas A.;Ahn, Kwang-Il
    • Nuclear Engineering and Technology / v.48 no.3 / pp.684-701 / 2016
  • The Gaussian process model (GPM) is a flexible surrogate model that can be used for nonparametric regression in multivariate problems. A unique feature of the GPM is that it automatically provides a prediction variance along with the regression function. In this paper, we estimate the safety margin of a nuclear power plant by performing regression on the output of best-estimate simulations of a large-break loss-of-coolant accident, with sampling of safety system configuration, sequence timing, technical specifications, and thermal hydraulic parameter uncertainties. The key aspect of our approach is that GPM regression is performed only on the dominant input variables, namely the safety injection flow rate and the delay time for the AC-powered pumps to start (which represents sequence-timing uncertainty), providing a predictive model for the peak clad temperature during the reflood phase. Other uncertainties are interpreted as contributors to the measurement noise of the code output and are treated implicitly through the noise variance term of the GPM, providing local uncertainty bounds for the peak clad temperature. We discuss the applicability of this method to reducing the use of conservative assumptions in best estimate plus uncertainty (BEPU) analysis and in Level 1 probabilistic safety assessment (PSA) success criteria definitions while dealing with a large number of uncertainties.
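The approach above models the two dominant inputs explicitly and lets every other uncertainty enter through the noise variance. A minimal sketch of that idea, assuming a zero-mean GP with a squared-exponential kernel and fixed, hand-picked hyperparameters (in practice they would be fitted, for example by maximizing the marginal likelihood); the two input coordinates stand in for a scaled flow rate and pump-start delay, and any data passed in are illustrative only:

```python
import math


def rbf(a, b, length=1.0, signal_var=1.0):
    """Squared-exponential kernel between two input vectors."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return signal_var * math.exp(-0.5 * d2 / length ** 2)


def cholesky(K):
    """Lower-triangular L with L L^T = K (K must be positive definite)."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def solve_lower(L, b):
    """Solve L x = b by forward substitution."""
    x = []
    for i in range(len(L)):
        x.append((b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x


def solve_upper_t(L, b):
    """Solve L^T x = b by back substitution."""
    n = len(L)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_predict(X, y, x_star, noise_var, length=1.0, signal_var=1.0):
    """Zero-mean GP regression; noise_var absorbs the untreated uncertainties.

    Returns (mean, variance of the latent function, predictive variance
    of a new noisy code output). In practice y would be centred first.
    """
    n = len(X)
    K = [[rbf(X[i], X[j], length, signal_var) + (noise_var if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = cholesky(K)
    alpha = solve_upper_t(L, solve_lower(L, y))  # (K + noise_var * I)^-1 y
    k_star = [rbf(xi, x_star, length, signal_var) for xi in X]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(L, k_star)
    var_f = signal_var - sum(vi * vi for vi in v)
    return mean, var_f, var_f + noise_var
```

Here `noise_var` plays the role of the implicit-uncertainty term: raising it widens the predictive band around the peak clad temperature estimate without adding input dimensions.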

Structural Analysis of Recombinant Human Preproinsulins by Structure Prediction, Molecular Dynamics, and Protein-Protein Docking

  • Jung, Sung Hun;Kim, Chang-Kyu;Lee, Gunhee;Yoon, Jonghwan;Lee, Minho
    • Genomics & Informatics / v.15 no.4 / pp.142-146 / 2017
  • More effective production of human insulin is important because insulin is the main medication used to treat multiple types of diabetes and many people suffer from diabetes. Current insulin production is based on recombinant DNA technology, and the expression vector contains a preproinsulin sequence, a fusion of an artificial leader peptide and native proinsulin. The sequence of the leader peptide has been reported to affect insulin production. To analyze structurally how the leader peptide affects insulin maturation, we applied several in silico simulations to 13 artificial proinsulin sequences. The three-dimensional structures of the models were predicted and compared; although the sequences differed only slightly, the predicted structures differed somewhat. The structures were refined by molecular dynamics simulation, and the energy of each model was estimated. Protein-protein docking between each model and trypsin was then carried out to compare how efficiently the protease could access the cleavage sites of the proinsulin models. The results showed some concordance with previously reported experimental results, so we expect our analysis to be useful for predicting an optimized artificial proinsulin sequence for more effective production.
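The docking step asks whether trypsin can reach the cleavage sites. The purely sequence-level part of that question can be sketched with the common simplified trypsin rule (cleavage after Lys or Arg, but not when the next residue is Pro). This ignores structural accessibility, which is exactly what the docking addresses, and the example string below is an arbitrary toy sequence, not one of the 13 proinsulin sequences.

```python
def trypsin_sites(seq):
    """0-based indices i where trypsin would cut between seq[i] and seq[i + 1],
    using the simplified rule: after K or R, unless the next residue is P."""
    s = seq.upper()
    return [i for i in range(len(s) - 1) if s[i] in "KR" and s[i + 1] != "P"]


def digest(seq):
    """Split seq into the fragments produced by cutting at every predicted site."""
    pieces, start = [], 0
    for i in trypsin_sites(seq):
        pieces.append(seq[start:i + 1])
        start = i + 1
    pieces.append(seq[start:])
    return pieces
```

For example, `digest("AKRPGRRA")` returns `["AK", "RPGR", "R", "A"]`: the Arg before Pro is skipped, and each residue of the Arg-Arg pair is a candidate site.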

Protein Ontology: Semantic Data Integration in Proteomics

  • Sidhu, Amandeep S.;Dillon, Tharam S.;Chang, Elizabeth;Sidhu, Baldev S.
    • Proceedings of the Korean Society for Bioinformatics Conference / 2005.09a / pp.388-391 / 2005
  • Describing protein structural and functional conservation requires a common language for data definition. With the common language provided by Protein Ontology, the high level of sequence and functional conservation can be extended to all organisms, since proteins that carry out core biological processes are likely to be orthologues. The structural and functional conservation of these proteins presents both opportunities and challenges. The main opportunity lies in the possibility of automatically transferring protein data annotations from experimentally tractable model organisms to less tractable organisms based on protein sequence similarity. Such information can be used to improve human health or agriculture. The challenge lies in using a common language to transfer protein data annotations among different species. The first step toward meeting this challenge is producing a structured, precisely defined common vocabulary using Protein Ontology. The Protein Ontology described in this paper covers the sequence, structure, and biological roles of protein complexes in any organism.

Modification of cell wall structural carbohydrate in the hybrid poplar expressing Medicago R2R3-MYB transcription factor MtMYB70

  • Kim, Sun Hee;Choi, Young Im;Jin, Hyunjung;Shin, Soo-Jeong;Park, Jong-Sug;Kwon, Mi
    • Journal of Plant Biotechnology / v.42 no.2 / pp.93-103 / 2015
  • We report the isolation, cloning, and characterization of an R2R3-MYB transcription factor gene (MtMYB70) from the model legume Medicago truncatula. MtMYB70 has a 768-bp coding sequence encoding 255 amino acids. Sequence alignment revealed that the MtMYB70 cDNA contains conserved R2R3-type MYB domains and a highly divergent C-terminal region. MtMYB70 has relatively low sequence homology with known R2R3-MYB genes. Phylogenetic analysis placed the R2R3-MYB domain of MtMYB70 closest to PtMYB1, a known activator of lignin biosynthesis. Overexpression of MtMYB70 under the control of the 35S promoter in transgenic poplar did not significantly change total lignin content relative to the control, but it significantly increased glucan content. Therefore, MtMYB70 might play a regulatory role in the biosynthesis of cell wall structural carbohydrates.
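As a quick consistency check on the sequence lengths above (assuming, as is usual for a reported coding sequence, that the 768 bp include the stop codon), the coding sequence divides evenly into codons, with one codon left over for termination:

$$
\frac{768\ \text{bp}}{3\ \text{bp/codon}} = 256\ \text{codons} = 255\ \text{amino-acid codons} + 1\ \text{stop codon}
$$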

Prediction of DO Concentration in Nakdong River Estuary through Case Study Based on Long Short Term Memory Model (Long Short Term Memory 모델 기반 Case Study를 통한 낙동강 하구역의 용존산소농도 예측)

  • Park, Seongsik;Kim, Kyunghoi
    • Journal of Korean Society of Coastal and Ocean Engineers / v.33 no.6 / pp.238-245 / 2021
  • In this study, we carried out a case study to predict the dissolved oxygen (DO) concentration of the Nakdong River estuary with an LSTM model. Treating model parameters and predictors as cases, we aimed to identify the optimal model conditions and appropriate predictors for predicting DO concentration. In the model parameter case study, Epoch = 300 and Sequence length = 1 gave higher accuracy than the other conditions. In the predictor case study, accuracy was highest when DO and temperature were used as predictors, owing to the high correlation between DO concentration and temperature. From these results, we identified appropriate model conditions and predictors for predicting DO concentration in the Nakdong River estuary.
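In the abstract, "sequence length" is the number of past time steps fed to the LSTM per sample, and the authors attribute the best predictor set to correlation. A small sketch of both pieces, assuming each time step is a row of `[DO, temperature]` readings (the example numbers in the note below are invented, not Nakdong River data): the hypothetical helper `make_windows` builds supervised input/target pairs for a given sequence length, and `pearson` computes the Pearson correlation coefficient used to screen a predictor.

```python
import math


def make_windows(rows, seq_len, target_idx):
    """rows: per-time-step feature lists, e.g. [DO, temperature].

    Returns (X, y): each X[k] holds seq_len consecutive rows, and y[k] is
    the target feature at the following time step.
    """
    X, y = [], []
    for k in range(len(rows) - seq_len):
        X.append(rows[k:k + seq_len])
        y.append(rows[k + seq_len][target_idx])
    return X, y


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)
```

With `seq_len=1`, each sample is a single time step, so four rows yield three (input, next-DO) pairs; longer sequence lengths trade fewer samples for more context per sample.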

Chemical Structure Study on Copolyterephthalates Based on Ethylene Glycol and 1, 4-Cyclohexane Dimethanol by High Resolution NMR Analysis (고분해능 NMR 분석법에 의한 에틸렌글리콜과 1, 4-시클로헥산디메탄올의 테레프탈산 공중합체의 화학구조 연구)

  • Yoo, Hee-Yeoul;Kim, Sang-Wook;Okui, Norimasa
    • Applied Chemistry for Engineering / v.4 no.4 / pp.770-775 / 1993
  • The chemical structure of poly(ethylene terephthalate-co-1,4-cyclohexylene dimethylene terephthalate), P(ET-CT), copolyesters was investigated by high-resolution NMR analysis. The copolymer composition and isomer ratio were determined from the methylene resonance peaks, which separated into three peaks corresponding to the ET, trans-CT, and cis-CT units, respectively. The copolymer sequence distribution was evaluated from the resonance peaks of the benzene-ring carbons bonded to the carbonyl groups, which reflect the dyad distribution. According to the statistical model, these copolyesters are almost random copolymers. The copolymer sequence distribution could be simulated, and its average sequence length was calculated using random copolymer statistics.
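The random-copolymer statistics mentioned above can be sketched for a binary chain. Assuming ideal Bernoulli statistics and, as a simplification, treating cis- and trans-CT together as one C unit (E = ET), the dyad fractions are $P_E^2$, $2P_EP_C$, and $P_C^2$; the degree of randomness $B = P_{EC} + P_{CE}$ equals 1 for a random copolymer; and the number-average run length of E units is $1/P_{EC}$. The function names and the sample composition are illustrative assumptions, not the authors' code or data.

```python
import random


def bernoulli_stats(p_e):
    """Ideal random (Bernoulli) copolymer with ET mole fraction p_e.

    Returns the dyad fractions, the mean ET run length, and the mean CT run length.
    """
    p_c = 1.0 - p_e
    dyads = {"EE": p_e * p_e, "EC+CE": 2 * p_e * p_c, "CC": p_c * p_c}
    return dyads, 1.0 / p_c, 1.0 / p_e


def randomness(p_e, f_hetero):
    """Degree of randomness B = P_EC + P_CE from the composition and the
    measured heterodyad fraction f_hetero (EC + CE); B = 1 for a random chain.

    Also returns the number-average ET and CT run lengths 1/P_EC and 1/P_CE.
    """
    p_ec = f_hetero / (2 * p_e)
    p_ce = f_hetero / (2 * (1 - p_e))
    return p_ec + p_ce, 1 / p_ec, 1 / p_ce


def simulate_runs(p_e, n=200_000, seed=1):
    """Monte Carlo Bernoulli chain; returns the observed mean ET run length."""
    rng = random.Random(seed)
    chain = ["E" if rng.random() < p_e else "C" for _ in range(n)]
    runs, length = [], 0
    for unit in chain:
        if unit == "E":
            length += 1
        elif length:
            runs.append(length)          # a C unit closes the current E run
            length = 0
    if length:
        runs.append(length)
    return sum(runs) / len(runs)
```

For a chain with 70 mol% ET, the analytic mean ET run length is $1/0.3 \approx 3.33$ units, and the simulated chain should reproduce it to within sampling noise; a measured $B$ close to 1 is what "almost random" means in these terms.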

THE LUMINOSITY FUNCTION AND INITIAL MASS FUNCTION FOR THE PLEIADES CLUSTER

  • LEE SEE WOO;SUNG HWANKYUNG
    • Journal of The Korean Astronomical Society / v.28 no.1 / pp.45-59 / 1995
  • In the best-observed Pleiades cluster, the luminosity function (LF) and mass function (MF) for main-sequence (MS) stars, extended to $M_V \approx 15.5$ ($V \approx 21$), are very similar to the initial luminosity function (ILF) and initial mass function (IMF) for field stars in the solar neighborhood, showing a bump at $\log m \simeq -0.05$ and a dip at $\log m \simeq -0.12$. This dip is equivalent to the Wielen dip in the LF of field stars. The occurrence of the bump and dip is independent of the adopted mass-luminosity relation (MLR), and their characteristics could be explained by a time-dependent bimodal IMF. The model with this IMF gives a total cluster mass of $\sim 700\,M_\odot$, $\sim 25$ brown dwarfs, and $\sim 3$ white dwarfs if the upper mass limit of white-dwarf progenitors is greater than $4.5\,M_\odot$. The cluster age derived from the LF of the brightest stars is $\sim 8\times10^{7}$ yr, and all stars in the cluster lie along a single-age sequence in the C-M diagram without large dispersion from the sequence.
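A rough consistency check relates the quoted magnitude limits through the distance modulus $V - M_V = 5\log_{10} d - 5$ (with $d$ in pc), and converts the bump and dip positions to masses assuming $m$ is in solar units. This neglects interstellar extinction and uses the rounded values $V \approx 21$ and $M_V \approx 15.5$, so it is only an order-of-magnitude check, not the paper's adopted distance:

$$
\begin{aligned}
d &= 10^{(V - M_V + 5)/5}\ \text{pc} \approx 10^{(21 - 15.5 + 5)/5}\ \text{pc} = 10^{2.1}\ \text{pc} \approx 126\ \text{pc},\\
m_{\text{bump}} &= 10^{-0.05}\,M_\odot \approx 0.89\,M_\odot, \qquad m_{\text{dip}} = 10^{-0.12}\,M_\odot \approx 0.76\,M_\odot.
\end{aligned}
$$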
