• Title/Summary/Keyword: Data Model


Cluster-based Deep One-Class Classification Model for Anomaly Detection

  • Younghwan Kim;Huy Kang Kim
    • Journal of Internet Technology
    • /
    • v.22 no.4
    • /
    • pp.903-911
    • /
    • 2021
  • As cyber-attacks on Cyber-Physical Systems (CPS) become more diverse and sophisticated, it is important to quickly detect malicious behaviors occurring in CPS. Since a CPS can collect sensor data in near real time throughout its process, there have been many attempts to detect anomalous behavior by learning normal behavior, from the perspective of data-driven security. However, since CPS datasets are big data and most of the data are normal, analyzing the data and implementing an anomaly detection model has always been a great challenge. In this paper, we propose and evaluate the Clustered Deep One-Class Classification (CD-OCC) model, which combines a clustering algorithm and a deep learning (DL) model and uses only a normal dataset for anomaly detection. We use an auto-encoder to reduce the dimensionality of the dataset and the K-means clustering algorithm to partition the normal data into the optimal number of clusters. The DL model is trained to predict the clusters of the normal data, yielding logit values as outputs. The derived logit values form a dataset that better represents the normal data, in the sense of knowledge distillation, and are used as inputs to the OCC model. In our experiments, the F1 score of the proposed model reaches 0.93 and 0.83 on the SWaT and HAI datasets, respectively, a significant improvement over other recent detectors such as Com-AE and SVM-RBF.
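The pipeline described in the abstract can be sketched roughly as below. This is a minimal stand-in, not the paper's implementation: PCA replaces the auto-encoder, `predict_proba` outputs stand in for logits, and all sizes and hyperparameters are made up for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA          # stand-in for the auto-encoder
from sklearn.cluster import KMeans
from sklearn.neural_network import MLPClassifier
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_normal = rng.normal(size=(500, 20))          # normal-only training data (synthetic)

# 1) Dimensionality reduction (the paper uses an auto-encoder).
Z = PCA(n_components=5, random_state=0).fit_transform(X_normal)

# 2) Cluster the normal data; cluster IDs become pseudo-labels.
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(Z)

# 3) Train a classifier to predict clusters; its class scores act as the
#    distilled representation fed to the one-class model.
clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500,
                    random_state=0).fit(Z, labels)
logits = clf.predict_proba(Z)                  # proxy for logit outputs

# 4) One-class classifier on the logit features flags anomalies.
occ = OneClassSVM(nu=0.05).fit(logits)
scores = occ.predict(logits)                   # +1 = normal, -1 = anomaly
```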

A study on the construction of a quality prediction model by artificial intelligence through integrated learning of CAE-based data and experimental data in the injection molding process (사출성형공정에서 CAE 기반 품질 데이터와 실험 데이터의 통합 학습을 통한 인공지능 품질 예측 모델 구축에 대한 연구)

  • Lee, Jun-Han;Kim, Jong-Sun
    • Design & Manufacturing
    • /
    • v.15 no.4
    • /
    • pp.24-31
    • /
    • 2021
  • In this study, an artificial neural network model was constructed to convert CAE analysis data into data resembling experimental data. In the analysis and experiments, injection molding data for 50 conditions were acquired through design of experiments and random selection. The injection molding conditions and the weight, height, and diameter of the product derived from the CAE results were used as input parameters for training the conversion model, and the product qualities from the experimental results were used as its output parameters. The conversion model achieved RMSE values of 0.06 g, 0.03 mm, and 0.03 mm for weight, height, and diameter, respectively. As the next step, additional randomly selected conditions were created and CAE analyses were performed; the additional CAE data were then converted into similar experimental data through the conversion model. An artificial neural network model was then constructed to predict the quality of the injection-molded product using the converted data together with the experimental data. The injection molding conditions were used as input parameters for training the prediction model, and the weight, height, and diameter of the product were used as output parameters. In the evaluation of the prediction model, the predicted weight, height, and diameter showed RMSE values of 0.11 g, 0.03 mm, and 0.05 mm, and all satisfied the quality criteria of the target product.
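The two-stage scheme can be illustrated as follows. Everything here is hypothetical (synthetic data, invented shapes and network settings); only the structure mirrors the abstract: a conversion model maps CAE outputs to pseudo-experimental qualities, and a prediction model is trained on real plus converted data.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
n_cond = 50
X_cond = rng.uniform(size=(n_cond, 6))                    # molding conditions
cae_q = rng.uniform(size=(n_cond, 3))                     # CAE weight/height/diameter
exp_q = cae_q + rng.normal(scale=0.02, size=cae_q.shape)  # experimental qualities

# Stage 1: conversion model (conditions + CAE qualities -> experimental qualities).
convert = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=0)
convert.fit(np.hstack([X_cond, cae_q]), exp_q)

# Convert additional CAE-only conditions into pseudo-experimental data.
X_new = rng.uniform(size=(30, 6))
cae_new = rng.uniform(size=(30, 3))
pseudo_exp = convert.predict(np.hstack([X_new, cae_new]))

# Stage 2: quality prediction model (conditions -> qualities), trained on
# real + converted data together.
predict = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=0)
predict.fit(np.vstack([X_cond, X_new]), np.vstack([exp_q, pseudo_exp]))
pred = predict.predict(X_cond)
```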

An Extension of Product Data Model for Calculating Product-level Carbon Footprint (제품수준 탄소배출이력 계산을 위한 제품자료모델 확장)

  • Do, Nam-Chui
    • Korean Journal of Computational Design and Engineering
    • /
    • v.16 no.4
    • /
    • pp.268-276
    • /
    • 2011
  • The product-level carbon footprint (PCF) is a comprehensive and widely accepted metric for sustainable product development. However, since a full PCF study is in general time- and cost-intensive, it is not feasible for the product development team to synchronize the activity with the main product development process. In addition, the current dedicated life cycle assessment (LCA) tools for calculating PCF, separated from the main product data management systems, are limited in providing timely PCF information for design decision-making and for collaboration between design and environmental engineers. This paper examines the possibility of extending the current product data model to support PCF calculation with PDM (Product Data Management) databases. The product data model can represent not only the content of products but also their context or system information, and it can be implemented as a PDM database that satisfies the need for convenient and timely PCF calculations from consistent product data, supporting dynamic design decisions and engineering collaboration.
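The core idea of keeping PCF data inside the product structure can be rendered as a toy bill-of-materials rollup. The classes and numbers below are hypothetical illustrations, not the paper's data model.

```python
from dataclasses import dataclass, field

@dataclass
class Item:
    """A product-structure node extended with a carbon-footprint attribute."""
    name: str
    unit_pcf: float                                 # kg CO2e of this item itself
    children: list = field(default_factory=list)    # (child Item, quantity) pairs

    def total_pcf(self) -> float:
        """Roll up the PCF of this item plus its whole BOM subtree."""
        return self.unit_pcf + sum(q * c.total_pcf() for c, q in self.children)

# Tiny illustrative BOM: a product with one frame holding four bolts.
bolt = Item("bolt", 0.1)
frame = Item("frame", 5.0, [(bolt, 4)])
product = Item("product", 2.0, [(frame, 1)])
total = product.total_pcf()                         # 2.0 + 5.0 + 4 * 0.1
```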

The Possibility of Daily Flow Data Generation from 8-Day Intervals Measured Flow Data for Calibrating Watershed Model (유역모형 구축을 위한 8일간격 유량측정자료의 일유량 확장 가능성)

  • Kim, Sangdan;Kang, Du Kee;Kim, Moon Su;Shin, Hyun Suk
    • Journal of Korean Society on Water Environment
    • /
    • v.23 no.1
    • /
    • pp.64-71
    • /
    • 2007
  • In this study, daily flow data are constructed from the 8-day-interval flow data measured by the Nakdong River Water Environmental Laboratory. A TANK model is used to expand the 8-day-interval flow data into daily flow data. Using sequential quadratic programming, the TANK model is auto-calibrated with daily precipitation and the 8-day-interval flow data. The generated and measured daily surface flow, groundwater flow, and groundwater recharge are shown to be in good agreement. From this result, the method appears to have the potential to provide daily flow data for calibrating a watershed model such as SWAT.
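The calibrate-then-expand idea can be sketched with a toy single-tank model and SciPy's SLSQP (a sequential quadratic programming method). The one-tank structure, coefficients, and synthetic rainfall are all illustrative; the paper uses a full TANK model with real precipitation and flow records.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
rain = rng.gamma(0.5, 5.0, size=120)           # synthetic daily precipitation (mm)

def simulate(params, rain):
    """One-tank water balance: runoff and recharge proportional to storage."""
    a, b = params                              # runoff and recharge coefficients
    s, flows = 10.0, []
    for p in rain:
        s += p                                 # rainfall fills the tank
        q = a * s                              # surface flow out of the tank
        s -= q + b * s                         # outflow plus groundwater recharge
        flows.append(q)
    return np.array(flows)

obs = simulate([0.3, 0.1], rain)               # synthetic "truth" series
idx = np.arange(0, len(rain), 8)               # observations only every 8th day

def loss(params):
    # Fit only against the 8-day-interval samples, as in the study.
    return np.sum((simulate(params, rain)[idx] - obs[idx]) ** 2)

res = minimize(loss, x0=[0.2, 0.2], method="SLSQP",
               bounds=[(0.01, 0.5), (0.01, 0.5)])
daily = simulate(res.x, rain)                  # expanded daily flow series
```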

Product data model for PLM system

  • Li, Yumei;Wan, Li;Xiong, Tifan
    • International Journal of CAD/CAM
    • /
    • v.11 no.1
    • /
    • pp.1-10
    • /
    • 2011
  • Product lifecycle management (PLM) is a new business strategy for an enterprise's product R&D. A PLM system holds and maintains the integrity of the product data produced throughout the entire product lifecycle. There is, therefore, a need to build a safe and effective product data model to support a PLM system. This paper proposes a domain-based product data model for PLM. The domain modeling method is introduced, including the domain concept and its defining standard along the product evolution process. The product data model in every domain is explained, and the mapping rules among these models are discussed. By mapping successively among these models, product data can achieve dynamic evolution and historical traceability in a PLM system.
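One way to picture the domain-mapping idea is the toy sketch below: a record lives in one domain at a time, and a mapping rule carries it to the next domain while recording where it came from. All names and the mapping rule itself are hypothetical, chosen only to show evolution plus traceability.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DomainRecord:
    """Product data as it exists within one domain model."""
    domain: str
    item_id: str
    revision: int

def map_to_next_domain(rec: DomainRecord, next_domain: str,
                       history: list) -> DomainRecord:
    """Toy mapping rule: same item identity, new domain, incremented revision."""
    history.append(rec)                        # keep the trace of prior domains
    return DomainRecord(next_domain, rec.item_id, rec.revision + 1)

history = []
rec = DomainRecord("requirement", "P-100", 0)
rec = map_to_next_domain(rec, "design", history)
rec = map_to_next_domain(rec, "manufacturing", history)
```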


A Hybrid SVM Classifier for Imbalanced Data Sets (불균형 데이터 집합의 분류를 위한 하이브리드 SVM 모델)

  • Lee, Jae Sik;Kwon, Jong Gu
    • Journal of Intelligence and Information Systems
    • /
    • v.19 no.2
    • /
    • pp.125-140
    • /
    • 2013
  • We call a data set in which the number of records belonging to a certain class far outnumbers the number of records belonging to the other class an 'imbalanced data set'. Most classification techniques perform poorly on imbalanced data sets. When we evaluate the performance of a classification technique, we need to measure not only 'accuracy' but also 'sensitivity' and 'specificity'. In a customer churn prediction problem, 'retention' records form the majority class and 'churn' records form the minority class. Sensitivity measures the proportion of actual retentions correctly identified as such; specificity measures the proportion of churns correctly identified as such. The poor performance of classification techniques on imbalanced data sets is due to low specificity. Many previous studies on imbalanced data sets employed an 'oversampling' technique, in which members of the minority class are sampled more heavily than those of the majority class to produce a relatively balanced data set. When a classification model is constructed using this oversampled balanced data set, specificity can be improved but sensitivity decreases. In this research, we developed a hybrid model of support vector machine (SVM), artificial neural network (ANN), and decision tree that improves specificity while maintaining sensitivity; we named it the 'hybrid SVM model'. The construction and prediction process is as follows. By oversampling from the original imbalanced data set, a balanced data set is prepared. SVM_I and ANN_I models are constructed using the imbalanced data set, and an SVM_B model is constructed using the balanced data set. SVM_I is superior in sensitivity and SVM_B is superior in specificity. For a record on which SVM_I and SVM_B make the same prediction, that prediction becomes the final solution. If they make different predictions, the final solution is determined by discrimination rules obtained from the ANN and a decision tree: for such records, a decision tree is constructed using the ANN_I output value as input and actual retention or churn as the target. We obtained the following two discrimination rules: 'IF ANN_I output value < 0.285, THEN final solution = retention' and 'IF ANN_I output value ≥ 0.285, THEN final solution = churn'. The threshold 0.285 is the value optimized for the data used in this research; what we present is the structure or framework of the hybrid SVM model, not a specific threshold, so the threshold in the above rules can be changed to suit the data. To evaluate the performance of the hybrid SVM model, we used the 'churn data set' in the UCI Machine Learning Repository, which consists of 85% retention customers and 15% churn customers. The accuracy of the hybrid SVM model is 91.08%, better than that of SVM_I or SVM_B. The points worth noting are its sensitivity, 95.02%, and specificity, 69.24%: the sensitivity of SVM_I is 94.65%, and the specificity of SVM_B is 67.00%. The hybrid SVM model developed in this research therefore improves the specificity of SVM_B while maintaining the sensitivity of SVM_I.
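The combination rule can be sketched directly. The data below is synthetic, the oversampling is naive duplication, and the threshold is a stand-in for the paper's data-tuned 0.285, so only the decision structure mirrors the abstract.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(3)
# Imbalanced set: 85% "retention" (0), 15% "churn" (1).
X0 = rng.normal(0.0, 1.0, size=(425, 4))
X1 = rng.normal(1.5, 1.0, size=(75, 4))
X = np.vstack([X0, X1])
y = np.array([0] * 425 + [1] * 75)

# Balanced set by oversampling (duplicating) the minority class.
Xb = np.vstack([X0, np.repeat(X1, 425 // 75, axis=0)])
yb = np.array([0] * 425 + [1] * (75 * (425 // 75)))

svm_i = SVC().fit(X, y)                    # trained on imbalanced data
svm_b = SVC().fit(Xb, yb)                  # trained on balanced data
ann_i = MLPClassifier(hidden_layer_sizes=(8,), max_iter=1000,
                      random_state=0).fit(X, y)

p_i, p_b = svm_i.predict(X), svm_b.predict(X)
ann_out = ann_i.predict_proba(X)[:, 1]     # ANN_I output value per record
threshold = 0.5                            # stand-in for the tuned 0.285

# Agreement -> keep the shared prediction; disagreement -> threshold the
# ANN_I output (the discrimination rule).
final = np.where(p_i == p_b, p_i, (ann_out >= threshold).astype(int))
```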

An application to Multivariate Zero-Inflated Poisson Regression Model

  • Kim, Kyung-Moo
    • Journal of the Korean Data and Information Science Society
    • /
    • v.14 no.2
    • /
    • pp.177-186
    • /
    • 2003
  • Zero-inflated Poisson regression is a model for count data with excess zeros. When correlated response variables are of interest, the univariate zero-inflated regression model has to be extended to a multivariate model. In this paper, we study and simulate the multivariate zero-inflated Poisson regression model and apply it to a real example. Regression parameters are estimated by maximum likelihood. We also compare the fit of the multivariate zero-inflated Poisson regression model with that of a decision tree model.
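As a building block, the univariate zero-inflated Poisson likelihood can be fitted by maximum likelihood in a few lines; the multivariate regression extension studied in the paper adds covariates and correlated responses on top of this. The data and starting values here are synthetic.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import poisson

rng = np.random.default_rng(4)
n, pi_true, lam_true = 1000, 0.3, 2.0
# Mixture: with probability pi the count is a structural zero, else Poisson.
zeros = rng.uniform(size=n) < pi_true
y = np.where(zeros, 0, rng.poisson(lam_true, size=n))

def nll(params):
    """Negative log-likelihood of the ZIP model."""
    pi, lam = params
    p0 = pi + (1 - pi) * np.exp(-lam)          # P(Y = 0): structural + Poisson zero
    pk = (1 - pi) * poisson.pmf(y, lam)        # P(Y = k) for observed counts
    return -np.sum(np.log(np.where(y == 0, p0, pk)))

res = minimize(nll, x0=[0.5, 1.0], method="L-BFGS-B",
               bounds=[(1e-6, 1 - 1e-6), (1e-6, None)])
pi_hat, lam_hat = res.x                        # MLEs of the mixing and rate params
```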


Bayesian Typhoon Track Prediction Using Wind Vector Data

  • Han, Minkyu;Lee, Jaeyong
    • Communications for Statistical Applications and Methods
    • /
    • v.22 no.3
    • /
    • pp.241-253
    • /
    • 2015
  • In this paper we predict typhoon tracks using a Bayesian principal component regression model based on wind field data. Data are obtained at each time point, and the Bayesian principal component regression model is applied to predict the track at that time point. For the regression model, we apply a variable selection prior and two kinds of prior distributions, normal and Laplace. We show prediction results based on the Bayesian Model Averaging (BMA) estimator and the Median Probability Model (MPM) estimator. We analyze 8 typhoons in 2006 using data from the previous 6 years (2000-2005) and compare our predictions with those of the moving-nest typhoon model (MTM) proposed by the Korea Meteorological Administration. We posit that it is possible to predict a typhoon track accurately using only a statistical model, without a dynamical model.
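The general shape of principal component regression with a Bayesian linear model can be sketched as below. The dimensions, synthetic "wind field" data, and the `BayesianRidge` prior are illustrative stand-ins, not the paper's variable selection prior or its BMA/MPM estimators.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import BayesianRidge

rng = np.random.default_rng(5)
X = rng.normal(size=(200, 60))                  # flattened wind vectors per time point
beta = rng.normal(size=60)
y = X @ beta + rng.normal(scale=0.5, size=200)  # e.g., next track coordinate

# Project the high-dimensional wind field onto its leading principal components.
pcs = PCA(n_components=10, random_state=0)
scores = pcs.fit_transform(X)

# Bayesian linear regression on the PC scores gives predictions with uncertainty.
model = BayesianRidge().fit(scores, y)
pred, std = model.predict(pcs.transform(X), return_std=True)
```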

Densification Analysis for SiC Powder under Cold Compaction (냉간압축 하에서 실리콘 카바이드 분말의 치밀화해석)

  • Park, Hwan;Kim, Ki-Tae
    • Journal of the Korean Ceramic Society
    • /
    • v.37 no.6
    • /
    • pp.589-595
    • /
    • 2000
  • The densification behavior of SiC powder was investigated under cold compaction. A special form of the Cap model was proposed from experimental data for SiC powder under triaxial compression. To compare with experimental data for SiC powder under cold compaction, the proposed constitutive model was implemented in a finite element program (ABAQUS). Finite element calculations from the Cam-Clay model and the modified Drucker-Prager model were also compared with the experimental data. The agreement between the experimental data and the finite element results obtained from the proposed constitutive model is reasonably good. In die pressing, however, the Cam-Clay model and the modified Drucker-Prager model predict a lower average density for the SiC powder compacts than observed experimentally.
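The shear-failure part of a Drucker-Prager-type model can be evaluated directly for a given stress state, as in the check below. The friction angle and cohesion values are placeholders, not the calibrated SiC parameters, and the Cap surface itself is not reproduced here.

```python
import numpy as np

def drucker_prager_yield(stress, beta_deg, d):
    """Return f = q - p*tan(beta) - d; f >= 0 indicates yielding."""
    s = np.asarray(stress, dtype=float)        # principal stresses (compression < 0)
    p = -s.mean()                              # hydrostatic pressure
    dev = s - s.mean()                         # deviatoric part
    q = np.sqrt(1.5 * np.sum(dev ** 2))        # Mises equivalent stress
    return q - p * np.tan(np.radians(beta_deg)) - d

# Uniaxial compression of 100 MPa against placeholder strength parameters:
f = drucker_prager_yield([-100.0, 0.0, 0.0], beta_deg=30.0, d=20.0)
```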


A Prediction Model Based on Relevance Vector Machine and Granularity Analysis

  • Cho, Young Im
    • International Journal of Fuzzy Logic and Intelligent Systems
    • /
    • v.16 no.3
    • /
    • pp.157-162
    • /
    • 2016
  • In this paper, a yield prediction model based on the relevance vector machine (RVM) and a granular computing model (quotient space theory) is presented. With a granular computing model, massive and complex meteorological data can be analyzed at different layers with different grain sizes, and new meteorological feature data sets can be formed in this way. To forecast the crop yield, a grey model is introduced to label the training sample data sets; it can also be used to compute the tendency yield. An RVM algorithm is introduced as the classification model for meteorological data mining. Experiments on real-world data sets show that this model has an advantage in yield prediction compared with other models.
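The grey-model component for computing a tendency yield can be sketched with a standard GM(1,1) forecast (the RVM classifier itself has no stock implementation in common libraries and is omitted). The yield series below is invented for illustration.

```python
import numpy as np

def gm11_predict(x0, steps=1):
    """Standard GM(1,1) grey forecast of the next `steps` values of x0."""
    x1 = np.cumsum(x0)                          # accumulated generating series
    z = 0.5 * (x1[1:] + x1[:-1])                # background (mean) values
    B = np.column_stack([-z, np.ones(len(z))])
    a, b = np.linalg.lstsq(B, x0[1:], rcond=None)[0]  # grey parameters

    def x1_hat(k):                              # fitted accumulated series
        return (x0[0] - b / a) * np.exp(-a * k) + b / a

    k = len(x0) + np.arange(steps)
    return x1_hat(k) - x1_hat(k - 1)            # de-accumulate to forecasts

# Hypothetical yield series used to compute a tendency (trend) yield:
yields = np.array([102.0, 105.1, 108.5, 111.9, 115.4])
trend = gm11_predict(yields, steps=2)
```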