• Title/Summary/Keyword: Multivariate Data

Abnormality Detection to Non-linear Multivariate Process Using Supervised Learning Methods (지도학습기법을 이용한 비선형 다변량 공정의 비정상 상태 탐지)

  • Son, Young-Tae;Yun, Deok-Kyun
    • IE interfaces / v.24 no.1 / pp.8-14 / 2011
  • Principal Component Analysis (PCA) reduces the dimensionality of the process by creating a new set of variables, principal components (PCs), which attempt to reflect the true underlying process dimension. However, for highly nonlinear processes, this form of monitoring may not be efficient, since the process dimensionality cannot be represented by a small number of PCs. Examples include semiconductor, pharmaceutical, and chemical processes. Nonlinearly correlated process variables can be reduced to a set of nonlinear principal components through the application of Kernel Principal Component Analysis (KPCA). Support Vector Data Description (SVDD), which has roots in supervised learning theory, is a training algorithm based on structural risk minimization. Its control limit does not depend on the distribution, but adapts to the real data. This paper therefore proposes a nonlinear process monitoring technique based on supervised learning methods and KPCA. Simulated examples show that the proposed monitoring chart is more effective than the $T^2$ chart for nonlinear processes.
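
The KPCA step mentioned in this abstract starts from a kernel matrix that is centered in feature space. The following is a minimal sketch of that first step only, assuming an RBF kernel; the kernel choice, the `gamma` value, the sample points, and the function names are illustrative assumptions, not details taken from the paper.

```python
import math

def rbf_kernel_matrix(samples, gamma):
    """Return the n x n RBF kernel matrix K[i][j] = exp(-gamma * ||xi - xj||^2)."""
    def sqdist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return [[math.exp(-gamma * sqdist(a, b)) for b in samples] for a in samples]

def center_kernel_matrix(K):
    """Center K in feature space: Kc = K - 1K - K1 + 1K1, where 1 is the n x n matrix of 1/n."""
    n = len(K)
    row_means = [sum(row) / n for row in K]
    col_means = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    total_mean = sum(row_means) / n
    return [[K[i][j] - row_means[i] - col_means[j] + total_mean
             for j in range(n)] for i in range(n)]

# Illustrative samples only (assumed, not process data from the paper).
samples = [(0.0, 0.0), (1.0, 0.5), (2.0, 2.0), (3.0, 4.5)]
K = rbf_kernel_matrix(samples, gamma=0.5)
Kc = center_kernel_matrix(K)
```

The nonlinear principal components would then come from the eigenvectors of `Kc`; that eigendecomposition is left out of this sketch.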

Prediction of compressive strength of bacteria incorporated geopolymer concrete by using ANN and MARS

  • X., John Britto;Muthuraj, M.P.
    • Structural Engineering and Mechanics / v.70 no.6 / pp.671-681 / 2019
  • This paper examines the applicability of artificial neural networks (ANN) and multivariate adaptive regression splines (MARS) to predict the compressive strength of bacteria-incorporated geopolymer concrete (GPC). The mix is composed of a new bacterial strain, manufactured sand, ground granulated blast furnace slag, silica fume, metakaolin, and fly ash. For all mixtures, the concentration of sodium hydroxide (NaOH) is maintained at 8 molar, the sodium silicate ($Na_2SiO_3$) to NaOH weight ratio at 2.33, the alkaline liquid to binder ratio at 0.35, and the ambient curing temperature at $28^{\circ}C$. In the ANN, a back-propagation training technique was employed to update the weights of each layer based on the error in the network output. The Levenberg-Marquardt algorithm was used for feed-forward back-propagation. The MARS model was developed by establishing a relationship between a set of predictors and the dependent variables. MARS is based on a divide-and-conquer strategy that partitions the training data sets into separate regions, each of which gets its own regression line. Six models based on ANN and MARS were developed to predict the compressive strength of bacteria-incorporated GPC at 1, 3, 7, 28, 56, and 90 days. About 70% of the 84 data sets obtained from experiments were used to develop the models, and the remaining 30% were used for testing. The predicted values from the models are in good agreement with the corresponding experimental values, and the developed models are robust and reliable.
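
The "separate regions, each with its own regression line" idea behind MARS can be sketched with hinge basis functions. This is a minimal illustration, not the model from the paper: the function names, the knot, and the coefficients are hypothetical.

```python
def hinge(x, knot, direction=+1):
    """MARS basis function: max(0, x - knot) for direction +1, max(0, knot - x) for -1."""
    return max(0.0, direction * (x - knot))

def mars_predict(x, intercept, terms):
    """Evaluate an additive MARS-style model.

    terms is a list of (coefficient, knot, direction) tuples.
    """
    return intercept + sum(c * hinge(x, k, d) for c, k, d in terms)

# Hypothetical model with a single knot at x = 7: the slope is 5 to the left
# of the knot and 3 to the right, so each region has its own regression line.
terms = [(3.0, 7.0, +1), (-5.0, 7.0, -1)]

at_knot = mars_predict(7.0, 30.0, terms)
right_of_knot = mars_predict(10.0, 30.0, terms)
left_of_knot = mars_predict(3.0, 30.0, terms)
```

A fitted MARS model would select the knots and coefficients from the training data; here they are fixed by hand to keep the piecewise-linear structure visible.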

An Effective Multivariate Control Framework for Monitoring Cloud Systems Performance

  • Hababeh, Ismail;Thabain, Anton;Alouneh, Sahel
    • KSII Transactions on Internet and Information Systems (TIIS) / v.13 no.1 / pp.86-109 / 2019
  • Cloud computing systems' performance is still a central focus of research for determining optimal resource utilization. Running several existing benchmarks simultaneously serves to acquire performance information from specific cloud system resources. However, the complexity of monitoring the existing performance of computing systems is a challenge that requires an efficient, interactive, user-directed performance-monitoring system. In this paper, we propose an effective multivariate control framework for monitoring cloud systems performance. The proposed framework utilizes the hardware performance metrics of cloud systems, collects and displays the performance measurements as meaningful graphics, stores the graphical information in a database, and provides the data on demand without requiring third-party software. We present performance metrics in terms of CPU usage, RAM availability, number of active cloud machines, and number of running processes on the selected machines, which can be monitored at a high control level by either a cloud service customer or a cloud service provider. The experimental results show that the proposed framework is reliable, scalable, and precise, and thus outperforms its counterparts in the field of cloud performance monitoring.

Prediction of the compressive strength of self-compacting concrete using surrogate models

  • Asteris, Panagiotis G.;Ashrafian, Ali;Rezaie-Balf, Mohammad
    • Computers and Concrete / v.24 no.2 / pp.137-150 / 2019
  • In this paper, surrogate models such as multivariate adaptive regression splines (MARS) and the M5P model tree (M5P MT) have been investigated in order to propose a new formulation for the 28-day compressive strength of self-compacting concrete (SCC) incorporating metakaolin as a supplementary cementitious material. A database of experimental data has been assembled from several published papers, and the data have been used for training and testing. In particular, the data are arranged as seven input parameters covering contents of cement, coarse aggregate to fine aggregate ratio, water, metakaolin, super plasticizer, largest maximum size and binder, and one output parameter, the 28-day compressive strength. The efficiency of the proposed techniques has been demonstrated by means of certain statistical criteria. The findings have been compared to experimental results, and the comparison shows that the MARS and M5P MT approaches predict the compressive strength of SCC incorporating metakaolin with great precision. The sensitivity analysis performed to identify the effective parameters for 28-day compressive strength indicates that cementitious binder content is the most effective variable in the mixture.
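
The abstract does not name its statistical criteria. As a sketch of how predicted and experimental strengths are commonly compared, the following assumes three widely used measures (RMSE, MAE, and the coefficient of determination); the choice of criteria and the function name are assumptions, not taken from the paper.

```python
import math

def regression_metrics(observed, predicted):
    """Return RMSE, MAE, and R^2 for paired observed and predicted values."""
    n = len(observed)
    residuals = [o - p for o, p in zip(observed, predicted)]
    ss_res = sum(r * r for r in residuals)
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(r) for r in residuals) / n,
        "r2": 1.0 - ss_res / ss_tot,
    }

# Illustrative numbers only (not experimental data): every prediction is off by one unit.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 3.0, 4.0, 5.0]
metrics = regression_metrics(observed, predicted)
```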

The Methodological Aspects of Forecasting and the Analysis of Macroeconomic Indicators

  • VYBOROVA, Elena Nikolaevna
    • East Asian Journal of Business Economics (EAJBE) / v.10 no.2 / pp.31-42 / 2022
  • Purpose - The main research goals of macroeconomic analysis are to assess the effectiveness of state regulation, the sustainability of development, and the financial stability of the state. Research design, data, and methodology - The data were analyzed using the methods of multivariate statistics and the Statgraphics software package. For the Russian Federation, data from 1995 to 2021 were analyzed. For the other countries, the data analyzed cover: Belarus, from 2015 to 2021; Kazakhstan, from 19941; Kyrgyzstan, from 2002; Tajikistan, from 2008; Armenia, from 2021; Japan, since 1970; China, since 1950; South Korea, since 1953. Result - The methods of multivariate statistics gave accurate results in forecasting macroeconomic indicators. Most of the trends with accurate results are described by second-degree polynomials. In most of the countries studied, the macroeconomic proportions are broken. Conclusion - In the countries studied, the monetary aggregates have a significant growth rate. The shares with a substantial monetary stock and speed of growth are divided into two groups: those having placements in the real sectors of the economy, and those that have not received the same development result from the growth of the monetary stock.
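
The second-degree polynomial trends mentioned in this abstract can be fitted by ordinary least squares. The sketch below solves the 3x3 normal equations directly; the function name and the series are illustrative assumptions, and the paper itself reports using Statgraphics rather than code like this.

```python
def fit_quadratic(ts, ys):
    """Least-squares fit of y = a + b*t + c*t^2 via the 3x3 normal equations."""
    # Augmented normal-equation matrix [X^T X | X^T y] for the basis (1, t, t^2).
    A = [[sum(t ** (i + j) for t in ts) for j in range(3)]
         + [sum(y * t ** i for t, y in zip(ts, ys))] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(3):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [v - factor * p for v, p in zip(A[r], A[col])]
    return [A[i][3] / A[i][i] for i in range(3)]

# Illustrative series only (not an indicator from the paper): y = 2 + 0.5 t + 0.1 t^2.
ts = [0, 1, 2, 3, 4, 5]
ys = [2 + 0.5 * t + 0.1 * t * t for t in ts]
a, b, c = fit_quadratic(ts, ys)
forecast = a + b * 6 + c * 36  # extrapolate the trend one step ahead, to t = 6
```

With calendar years as the time variable, it is usually safer to shift the index (for example, years since the start of the series) before fitting, because powers of large year values make the normal equations poorly conditioned.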

An Outlier Detection Algorithm and Data Integration Technique for Prediction of Hypertension (고혈압 예측을 위한 이상치 탐지 알고리즘 및 데이터 통합 기법)

  • Khongorzul Dashdondov;Mi-Hye Kim;Mi-Hwa Song
    • Proceedings of the Korea Information Processing Society Conference / 2023.05a / pp.417-419 / 2023
  • Hypertension is one of the leading causes of mortality worldwide. In recent years, the incidence of hypertension has increased dramatically, not only among the elderly but also among young people. In this regard, the use of machine-learning methods to diagnose the causes of hypertension has increased in recent years. In this study, we improved the prediction of hypertension detection using Mahalanobis distance-based multivariate outlier removal using the KNHANES database from the Korean national health data and the COVID-19 dataset from Kaggle. This study was divided into two modules. Initially, the data preprocessing step used merged datasets and decision-tree classifier-based feature selection. The next module applies a predictive analysis step to remove multivariate outliers using the Mahalanobis distance from the experimental dataset and makes a prediction of hypertension. In this study, we compared the accuracy of each classification model. The best results showed that the proposed MAH_RF algorithm had an accuracy of 82.66%. The proposed method can be used not only for hypertension but also for the detection of various diseases such as stroke and cardiovascular disease.
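
The Mahalanobis-distance outlier removal described in this abstract can be sketched for two features as follows. This covers only the outlier-removal step, not the feature selection or the classifier; the function names, the chi-square cutoff at the 0.975 level, and the data points are assumptions made for illustration.

```python
import math

def mahalanobis_sq_2d(points):
    """Squared Mahalanobis distance of each 2-D point from the sample mean."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    det = sxx * syy - sxy * sxy
    distances = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # (dx, dy) S^-1 (dx, dy)^T using the closed-form inverse of the 2x2 covariance.
        distances.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return distances

def remove_outliers(points, threshold):
    """Keep the points whose squared Mahalanobis distance is at most threshold."""
    return [p for p, d in zip(points, mahalanobis_sq_2d(points)) if d <= threshold]

# Assumed cutoff: 0.975 quantile of the chi-square distribution with 2 degrees
# of freedom, which has the closed form -2 * ln(1 - 0.975).
threshold = -2.0 * math.log(1.0 - 0.975)

# Illustrative data only: a 5 x 4 grid of inliers plus one distant point.
points = [(float(x), float(y)) for x in range(5) for y in range(4)] + [(20.0, 18.0)]
cleaned = remove_outliers(points, threshold)
```

Because the mean and covariance here are estimated from data that still contain the outlier, extreme points can mask one another in small samples; robust covariance estimates are a common refinement.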

Multiple imputation and synthetic data (다중대체와 재현자료 작성)

  • Kim, Joungyoun;Park, Min-Jeong
    • The Korean Journal of Applied Statistics / v.32 no.1 / pp.83-97 / 2019
  • As society develops, the dissemination of microdata has increased in response to the diverse analytical needs of users. Analysis of microdata for policy making, academic purposes, and so on is highly desirable in terms of value creation. However, providing microdata whose usefulness is guaranteed carries a risk of exposing personal information. Several methods have been considered to protect personal information while preserving the usefulness of the data. One such method is to generate and use synthetic data. This paper aims to explain synthetic data by exploring the related methodologies and precautions. To this end, we first explain multiple imputation, the Bayesian predictive model, and the Bayesian bootstrap, which are the basic foundations of synthetic data. We then link these concepts to the construction of fully and partially synthetic data. To illustrate the creation of synthetic data, we review a real longitudinal synthetic data example based on sequential regression multivariate imputation.
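
Of the foundations listed in this abstract, the Bayesian bootstrap is the easiest to sketch: instead of resampling records, each draw reweights them with a Dirichlet(1, ..., 1) vector. The function names, seed, and values below are illustrative assumptions, not material from the paper.

```python
import random

def bayesian_bootstrap_weights(n, rng):
    """Draw one Dirichlet(1, ..., 1) weight vector by normalizing Exp(1) draws."""
    draws = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]

def bayesian_bootstrap_means(values, n_draws, seed=0):
    """Posterior draws of the mean under the Bayesian bootstrap."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_draws):
        weights = bayesian_bootstrap_weights(len(values), rng)
        means.append(sum(w * v for w, v in zip(weights, values)))
    return means

values = [2.0, 3.5, 1.0, 4.0, 2.5]  # illustrative values only
draws = bayesian_bootstrap_means(values, n_draws=1000)
```

Each element of `draws` is one plausible value of the mean; the spread of the draws reflects the uncertainty that synthetic-data generation needs to propagate.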

Imputation of Missing SST Observation Data Using Multivariate Bidirectional RNN (다변수 Bidirectional RNN을 이용한 표층수온 결측 데이터 보간)

  • Shin, YongTak;Kim, Dong-Hoon;Kim, Hyeon-Jae;Lim, Chaewook;Woo, Seung-Buhm
    • Journal of Korean Society of Coastal and Ocean Engineers / v.34 no.4 / pp.109-118 / 2022
  • Missing sections of the vertex sea surface temperature observation data were imputed using a Bidirectional Recurrent Neural Network (BiRNN). Among artificial intelligence techniques, Recurrent Neural Networks (RNNs), which are commonly used for time series data, estimate only in the direction of time flow or in the reverse direction toward the missing position, so their estimation performance is poor for long missing sections. In this study, by contrast, estimation performance can be improved even for long-term missing data by estimating in both directions, from before and after the missing section. In addition, by using all available data around the observation point (sea surface temperature, temperature, wind field, atmospheric pressure, humidity), the imputation performance was further improved by estimating the imputed values jointly from the correlations among these variables. For performance verification, a statistical model, Multivariate Imputation by Chained Equations (MICE), a machine learning-based Random Forest model, and an RNN model using Long Short-Term Memory (LSTM) were compared. For imputation of long-term gaps of 7 days, the average accuracy of the BiRNN and statistical models is 70.8% and 61.2%, respectively, and the average error is 0.28 degrees and 0.44 degrees, respectively, so the BiRNN model performs better than the other models. With a temporal decay factor representing the missing pattern applied, the BiRNN technique is judged to have better imputation performance than the existing methods as the missing section becomes longer.

Fuzzy Decision Tree Induction to Obliquely Partitioning a Feature Space (특징공간을 사선 분할하는 퍼지 결정트리 유도)

  • Lee, Woo-Hang;Lee, Keon-Myung
    • Journal of KIISE:Software and Applications / v.29 no.3 / pp.156-166 / 2002
  • Decision tree induction is a useful machine learning approach for extracting classification rules from a set of feature-based examples. According to how they partition the feature space, decision trees are categorized into univariate decision trees and multivariate decision trees. Due to observation error, uncertainty, subjective judgment, and so on, real-world data are prone to contain errors in their feature values. To make decision trees robust against such errors, there have been various attempts to incorporate fuzzy techniques into decision tree construction. Several studies have incorporated fuzzy techniques into univariate decision trees. However, little research of this kind has been done for multivariate decision trees. This paper proposes a fuzzy decision tree induction method that builds fuzzy multivariate decision trees, named fuzzy oblique decision trees. To show the effectiveness of the proposed method, it also presents some experimental results.
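
The difference between univariate and oblique (multivariate) splits, and the effect of making a split fuzzy, can be sketched as follows. The abstract does not give the membership function, so the sigmoid used here is an assumption, as are the function names, the `softness` parameter, and the example weights.

```python
import math

def oblique_split_value(x, weights, threshold):
    """Signed value of the oblique test w . x - threshold (negative: left, positive: right)."""
    return sum(w * xi for w, xi in zip(weights, x)) - threshold

def fuzzy_memberships(x, weights, threshold, softness=1.0):
    """Soft assignment to the (left, right) branches via a sigmoid of the signed value.

    The sigmoid is an assumed membership function, not the one from the paper.
    """
    s = oblique_split_value(x, weights, threshold)
    right = 1.0 / (1.0 + math.exp(-s / softness))
    return 1.0 - right, right

# Oblique split x1 + x2 <= 2; a univariate split is the special case
# where all but one weight is zero, e.g. weights = (1.0, 0.0).
on_boundary = fuzzy_memberships((1.0, 1.0), (1.0, 1.0), 2.0)
far_right = fuzzy_memberships((4.0, 4.0), (1.0, 1.0), 2.0)
```

A crisp tree would send a point lying exactly on the boundary to one side only; the soft memberships let a small measurement error shift weight between branches gradually instead of flipping the decision.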

Estimation of Genetic Variance Components of Body Size Measurements in Hanwoo (Korean Cattle) Using a Multivariate Linear Model

  • Lee, Jung-Jae;Kim, Nae-Soo
    • Journal of Animal Science and Technology / v.52 no.3 / pp.167-174 / 2010
  • The objectives of this study were to quantify the combination values of the principal components and factors calculated from body measurements of Hanwoo (Korean Cattle) and to estimate their heritabilities. Multivariate analysis was used to reduce a large number of variables to a smaller number of new variables and to characterize cattle according to body shape. The analyses were performed using 1,979 cattle at 12 months of age and 936 cattle at 24 months of age. The data for the analyses were obtained from progeny tests performed on Korean Cattle over 6 years, from 2003 to 2008. The phenotypic correlations among these traits were estimated to range from 0.32 to 0.90 at 12 months of age and from 0.21 to 0.82 at 24 months of age. The first principal components (PC1s) indicated a weighted average of overall body measurements, accounting for 99.91% of the total variation for both test periods. The first two PCs had positive coefficients for all body measurements. The major sources of PC, such as chest girth (CG), body length (BL), rump height (RH), and wither height (WH), were similar for both test periods. The heritabilities for PC1, the first factor score (FS1), and the second factor score (FS2) were estimated by the multivariate REML method. The estimated heritabilities for PC1, FS1, and FS2 were 0.33, 0.38, and 0.40, respectively, at 12 months of age and 0.26, 0.76, and 0.58 at 24 months of age. Further studies are needed to determine whether the heritabilities of FS1 and FS2 at 24 months of age were overestimated.
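
The share of total variation attributed to a principal component, as reported for PC1 in this abstract, is the component's eigenvalue divided by the sum of all eigenvalues of the covariance matrix. The sketch below shows this for two variables, where the eigenvalues have a closed form; the function name and the data are illustrative assumptions, not measurements from the study.

```python
import math

def pca_2d_explained_variance(xs, ys):
    """Return the fractions of total variance explained by PC1 and PC2 for two variables."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    trace = sxx + syy
    det = sxx * syy - sxy * sxy
    # Eigenvalues of a symmetric 2x2 matrix: (trace +/- sqrt(trace^2 - 4 det)) / 2.
    disc = math.sqrt(max(trace * trace - 4.0 * det, 0.0))
    lam1 = (trace + disc) / 2.0
    lam2 = (trace - disc) / 2.0
    return lam1 / trace, lam2 / trace

# Illustrative values only: two perfectly proportional measurements,
# so a single component captures all of the variation.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
pc1_share, pc2_share = pca_2d_explained_variance(xs, ys)
```

When the variables are strongly correlated and measured on similar scales, PC1 behaves like a weighted average of them and its share approaches 1.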