• Title/Summary/Keyword: Multivariate Statistical Method

An Outlier Detection Using Autoencoder for Ocean Observation Data (해양 이상 자료 탐지를 위한 오토인코더 활용 기법 최적화 연구)

  • Kim, Hyeon-Jae;Kim, Dong-Hoon;Lim, Chaewook;Shin, Yongtak;Lee, Sang-Chul;Choi, Youngjin;Woo, Seung-Buhm
    • Journal of Korean Society of Coastal and Ocean Engineers / v.33 no.6 / pp.265-274 / 2021
  • Outlier detection in ocean data has traditionally been performed using statistical and distance-based machine learning algorithms. Recently, AI-based methods have received much attention, mainly so-called supervised learning methods that require classification information for the data. Supervised learning demands considerable time and cost because classification information (labels) must be assigned manually to all training data. In this study, an autoencoder based on unsupervised learning was applied to outlier detection to overcome this problem. Two experiments were designed: univariate learning, using only the SST observations from Deokjeok Island, and multivariate learning, using SST, air temperature, wind direction, wind speed, air pressure, and humidity. The data span 25 years, from 1996 to 2020, and pre-processing that reflects the characteristics of ocean data was applied. Outlier detection on real SST data was then attempted with the trained univariate and multivariate autoencoders. To compare model performance, various outlier detection methods were applied to synthetic data with artificially inserted errors. Quantitative evaluation gave multivariate/univariate accuracies of about 96%/91%, respectively, indicating that the multivariate autoencoder had the better outlier detection performance. Outlier detection using an unsupervised autoencoder is expected to see varied use because it can reduce subjective classification errors and the cost and time required for data labeling.
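
The idea the abstract describes, training an autoencoder on mostly normal data and flagging points whose reconstruction error is large, can be sketched in a few lines. This is a minimal illustration on synthetic data, not the authors' implementation: the six stand-in variables, the bottleneck size, and the 99.5% error threshold are all assumptions.

```python
# Minimal sketch of reconstruction-error outlier detection with an
# autoencoder-style model (an MLP trained to reproduce its input).
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 6))               # stand-ins for SST, air temp, wind dir/speed, pressure, humidity
X[::200] += rng.normal(6, 1, size=(25, 6))   # artificially inserted errors

X_std = StandardScaler().fit_transform(X)
ae = MLPRegressor(hidden_layer_sizes=(3,), activation="tanh",
                  max_iter=2000, random_state=0)
ae.fit(X_std, X_std)                         # target = input: unsupervised, no labels needed

recon_err = np.mean((ae.predict(X_std) - X_std) ** 2, axis=1)
threshold = np.quantile(recon_err, 0.995)    # assumed cutoff; tune on validation data
print(f"{(recon_err > threshold).sum()} points flagged as outliers")
```

Because the model never sees labels, the labeling cost the abstract highlights disappears; the price is that the error threshold must be chosen by hand or by validation.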

A Study on Fault Detection of Cycle-based Signals using Wavelet Transform (웨이블릿을 이용한 주기 신호 데이터의 이상 탐지에 관한 연구)

  • Lee, Jae-Hyun;Kim, Ji-Hyun;Hwang, Ji-Bin;Kim, Sung-Shick
    • Journal of the Korea Society for Simulation / v.16 no.4 / pp.13-22 / 2007
  • Fault detection for cycle-based signals is typically performed using statistical approaches. Univariate SPC using a few representative statistics and multivariate methods such as PCA and PLS are the most popular ways to analyze cycle-based signals. However, such approaches are limited when dealing with information-rich cycle-based signals. In this paper, a process fault detection method based on wavelet analysis is proposed. Using the Haar wavelet, coefficients that reflect the process condition well are selected. Next, a Hotelling's $T^2$ chart using the selected coefficients is constructed to assess the process condition. To enhance the overall efficiency of fault detection, two further steps are suggested: a denoising method based on the wavelet transform and a coefficient selection method using variance differences. For performance evaluation, various types of abnormal process conditions are simulated and the proposed algorithm is compared with other methodologies.
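
The pipeline the abstract outlines (Haar coefficients per cycle, coefficient selection by variance, then a Hotelling's $T^2$ statistic) can be sketched as follows. The signals, the selection rule, and the in-control window are invented for illustration and simplify the paper's procedure.

```python
# Sketch: one-level Haar transform per cycle, variance-based coefficient
# selection, and a Hotelling's T^2 statistic on the selected coefficients.
import numpy as np

rng = np.random.default_rng(1)
n_cycles, n = 200, 64
signals = np.sin(np.linspace(0, 2 * np.pi, n)) + 0.1 * rng.normal(size=(n_cycles, n))
signals[-5:] += 0.8 * np.exp(-((np.arange(n) - 40) ** 2) / 20)  # simulated fault bump

# One-level Haar transform: pairwise averages (approximation) and differences (detail)
approx = (signals[:, ::2] + signals[:, 1::2]) / np.sqrt(2)
detail = (signals[:, ::2] - signals[:, 1::2]) / np.sqrt(2)
coeffs = np.hstack([approx, detail])

# Keep the k coefficients with the largest between-cycle variance
# (a crude stand-in for the paper's variance-difference criterion).
k = 8
idx = np.argsort(coeffs.var(axis=0))[-k:]
Z = coeffs[:, idx]

# Hotelling's T^2 against in-control statistics (first 150 cycles assumed normal)
mu = Z[:150].mean(axis=0)
S_inv = np.linalg.inv(np.cov(Z[:150], rowvar=False))
d = Z - mu
T2 = np.einsum("ij,jk,ik->i", d, S_inv, d)
print("max in-control T2:", T2[:150].max().round(1), "| fault-cycle T2:", T2[-5:].round(1))
```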

Severity Measurement Methods and Comparing Hospital Death Rates for Coronary Artery Bypass Graft Surgery (관상동맥우회술의 중증도 측정과 병원 사망률 비교에 관한 연구)

  • Ahn, Hyung-Sik;Shin, Young-Soo;Kwon, Young-Dae
    • Journal of Preventive Medicine and Public Health / v.34 no.3 / pp.244-252 / 2001
  • Objective: Health insurers and policy makers are increasingly examining the hospital mortality rate as an indicator of hospital quality and performance. To be meaningful, a risk adjustment of the death rates must be implemented. This study reviewed 5 severity measurement methods and applied them to the same data set to determine whether judgments regarding severity-adjusted hospital mortality rates were sensitive to the specific severity measure. Methods: The medical records of 584 patients who underwent coronary artery bypass graft surgery in 6 general hospitals during 1996 and 1997 were reviewed by trained nurses. The MedisGroups, Disease Staging, Computerized Severity Index, APACHE III, and KDRG were used to quantify the severity of the patients. The predictive probability of death was calculated for each patient in the sample from a multivariate logistic regression model including the severity score, age, and sex. To evaluate the hospitals' performance, the ratio of the observed number of deaths to the expected number was calculated for each hospital. Results: The overall in-hospital mortality rate was 7.0%, ranging from 2.7% to 15.7% depending on the particular hospital. After the severity adjustment, the mortality rates for each hospital showed little difference across severity measures. The 5 severity measurement methods varied in their statistical performance: all had a higher c statistic and $R^2$ than the model containing only age and sex, and there were some differences in the relative hospital performance evaluations by severity measure. Conclusion: These results suggest that judgments regarding a hospital's performance based on severity-adjusted mortality can be sensitive to the severity measurement method. Although the 5 severity measures concurred on hospital performance more often than would be expected by chance, the assessment of an individual hospital's mortality rate varied with the severity measurement method used.
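
The risk-adjustment step described in Methods, a logistic model for death and an observed-to-expected (O/E) death ratio per hospital, looks roughly like the sketch below. The data are simulated and the severity score is a generic stand-in, not one of the five real measures.

```python
# Sketch of severity-adjusted hospital comparison via an O/E mortality ratio.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n = 584
df = pd.DataFrame({
    "hospital": rng.integers(0, 6, n),
    "severity": rng.normal(0, 1, n),      # stand-in severity score
    "age":      rng.normal(62, 10, n),
    "male":     rng.integers(0, 2, n),
})
logit = -3.0 + 1.2 * df.severity + 0.03 * (df.age - 62)
df["died"] = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

# Multivariate logistic regression: death ~ severity + age + sex
model = LogisticRegression().fit(df[["severity", "age", "male"]], df["died"])
df["p_death"] = model.predict_proba(df[["severity", "age", "male"]])[:, 1]

# Observed deaths / expected deaths per hospital; > 1 means worse than expected
grp = df.groupby("hospital")
print((grp["died"].sum() / grp["p_death"].sum()).round(2))
```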

Estimation and Performance Analysis of Risk Measures using Copula and Extreme Value Theory (코퓰러과 극단치이론을 이용한 위험척도의 추정 및 성과분석)

  • Yeo, Sung-Chil
    • The Korean Journal of Applied Statistics / v.19 no.3 / pp.481-504 / 2006
  • VaR, a tail-related risk measure, is now widely used as a tool for measuring and managing financial risks. For more accurate measurement of VaR, the approach based on extreme value theory has recently drawn particular attention, in preference to the traditional method based on the assumption of a normal distribution. However, most studies of extreme value approaches have treated only the univariate case. In this paper, we discuss portfolio risk measurement by modelling multivariate extreme value distributions, combining copulas with extreme value theory. We also discuss the estimation of ES together with VaR as portfolio risk measures. Finally, we investigate the relative superiority of the EVT-copula approach over the variance-covariance method through back-testing on empirical data.
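
A compressed sketch of such an EVT-copula pipeline appears below: GPD tails fitted to each asset's loss exceedances, a Gaussian copula for the dependence, and Monte Carlo for portfolio VaR and ES. Everything here is an assumption for illustration (Student-t stand-in data, a 90% threshold, equal weights, and rank correlation plugged in directly as the copula parameter); it is not the paper's estimator.

```python
# Sketch of EVT-copula portfolio risk: GPD tails + Gaussian copula + Monte Carlo.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
losses = stats.t.rvs(df=4, size=(2000, 2), random_state=rng)  # fat-tailed asset losses

# 1) Fit a GPD to exceedances over each asset's 90% threshold
u = np.quantile(losses, 0.90, axis=0)
gpd = [stats.genpareto.fit(losses[losses[:, j] > u[j], j] - u[j], floc=0)
       for j in range(2)]

# 2) Gaussian copula, with rank correlation used directly as its parameter
rho = np.corrcoef(stats.rankdata(losses, axis=0).T)[0, 1]
z = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=100_000)
v = stats.norm.cdf(z)                                # copula samples in [0, 1]^2

# 3) Semi-parametric inverse CDF: empirical body, GPD above the threshold
def ppf(p, x, uj, params):
    body = np.quantile(x, np.minimum(p, 0.90))
    tail = uj + stats.genpareto.ppf((p - 0.90) / 0.10, *params)
    return np.where(p <= 0.90, body, tail)

sim = np.column_stack([ppf(v[:, j], losses[:, j], u[j], gpd[j]) for j in range(2)])
port = sim.mean(axis=1)                              # equally weighted portfolio loss
var99 = np.quantile(port, 0.99)
es99 = port[port > var99].mean()                     # ES = mean loss beyond VaR
print(f"99% VaR: {var99:.3f}   99% ES: {es99:.3f}")
```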

Multidimensional scaling of categorical data using the partition method (분할법을 활용한 범주형자료의 다차원척도법)

  • Shin, Sang Min;Chun, Sun-Kyung;Choi, Yong-Seok
    • The Korean Journal of Applied Statistics / v.31 no.1 / pp.67-75 / 2018
  • Multidimensional scaling (MDS) is an exploratory analysis of multivariate data that represents the dissimilarity among objects in a low-dimensional geometric space. However, a general MDS map shows only information about the objects, without any information about the variables. In this study, we used MDS based on the algorithm of Torgerson (Theory and Methods of Scaling, Wiley, 1958) to visualize clusters of objects in categorical data. For this, we convert the given data into a multiple indicator matrix. Additionally, we add the level information of each categorical variable to the MDS map by applying the partition method of Shin et al. (Korean Journal of Applied Statistics, 28, 1171-1180, 2015). The proposed MDS map therefore reveals the similarity among objects as well as associations among the categorical variables.
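
Torgerson's classical MDS on a multiple indicator matrix, the first half of what the abstract describes, is short enough to sketch directly (the partition method for variable levels is not reproduced here). The tiny dataset is invented.

```python
# Sketch: classical (Torgerson) MDS on a multiple indicator matrix
# built from categorical data.
import numpy as np
import pandas as pd

df = pd.DataFrame({"color": ["red", "blue", "red", "green", "blue"],
                   "size":  ["S", "L", "M", "S", "L"]})
G = pd.get_dummies(df).to_numpy(float)       # multiple indicator (disjunctive) matrix

# Squared Euclidean distances between objects in indicator space
sq = ((G[:, None, :] - G[None, :, :]) ** 2).sum(-1)

# Torgerson double-centering: B = -1/2 * J D^2 J, coordinates from the
# leading eigenvectors scaled by the square roots of their eigenvalues
n = sq.shape[0]
J = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * J @ sq @ J
w, V = np.linalg.eigh(B)
order = np.argsort(w)[::-1][:2]              # two largest eigenvalues -> 2-D map
coords = V[:, order] * np.sqrt(np.maximum(w[order], 0))
print(coords.round(3))                       # one row of map coordinates per object
```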

Analysis of Agricultural Characters to Establish the Evaluating Protocol and Standard Assessment for Genetically Modified Peppers (GM 고추의 환경위해성 평가 프로토콜 작성을 위한 농업적 형질 분석)

  • Cho, Dong-Wook;Chung, Kyu-Hwan
    • Journal of Environmental Science International / v.20 no.9 / pp.1183-1190 / 2011
  • This study aimed to establish an evaluating protocol and standard assessment for genetically modified (GM) hot pepper and to identify a proper statistical method for testing the equality of agricultural characters between GM and non-GM pepper lines. GM and non-GM hot pepper lines were cultivated in two GMO fields in the middle region of Korea, and a total of 52 agricultural characters were collected during the growing seasons of four years, 2007 to 2010. Levene's test was conducted to confirm the homogeneity of the raw data before statistical analysis. Two-way ANOVA among the multivariate tests and t-tests were conducted on the 52 agricultural characters to assess the equality between H15 and P2377. In the two-way ANOVA, 16 of 16 plant growth traits, 9 of 18 green fruit traits, and 7 of 18 red fruit traits showed statistical differences among the four years, and 9 of 16 plant growth traits, 4 of 18 green fruit traits, and 3 of 18 red fruit traits showed statistical differences between H15 and P2377. t-tests were also conducted on the same raw data for the 52 characters; only 1 of 16 plant growth traits, 2 of 18 green fruit traits, and 1 of 18 red fruit traits showed differences between H15 and P2377, so it was concluded that there is no statistical difference between H15 and P2377 in terms of agricultural characters. The t-test is thus a proper statistical method for analyzing each trait between a GM line and its control when evaluating agricultural characters.
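
The abstract's per-trait testing scheme, a homogeneity check first and then a two-sample comparison, reduces to a few lines per trait. This sketch uses simulated values rather than the field data, and shows one trait only.

```python
# Sketch: Levene's test for variance homogeneity, then a t-test
# between the GM line (H15) and its non-GM control (P2377) for one trait.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
h15   = rng.normal(10.0, 1.2, 30)   # simulated trait values for H15
p2377 = rng.normal(10.3, 1.1, 30)   # simulated trait values for P2377

lev_stat, lev_p = stats.levene(h15, p2377)
equal_var = lev_p > 0.05            # pooled t-test only if variances look homogeneous
t_stat, t_p = stats.ttest_ind(h15, p2377, equal_var=equal_var)
verdict = "no significant difference" if t_p > 0.05 else "significant difference"
print(f"Levene p={lev_p:.3f}  t={t_stat:.2f}  p={t_p:.3f}  -> {verdict}")
```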

Prognostic Value of Esophageal Resection Line Involvement in a Total Gastrectomy for Gastric Cancer (위전절제술 시 식도측 절제연 암 침윤의 예후적 가치)

  • Kwon, Sung-Joon
    • Journal of Gastric Cancer / v.1 no.3 / pp.168-173 / 2001
  • Purpose: A positive esophageal margin is not infrequently encountered in a total gastrectomy. The aim of this retrospective review was to evaluate whether a positive esophageal margin predisposes a patient to loco-regional recurrence and whether it has an independent impact on long-term survival. Materials and Methods: A retrospective review of 224 total gastrectomies for adenocarcinoma was undertaken. The chi-square test was used to determine the statistical significance of differences, and the Kaplan-Meier method was used to calculate survival rates. Significant differences in the survival rates were assessed using the log-rank test, and independent prognostic significance was evaluated using the Cox regression method. Results: The prevalence of esophageal margin involvement was 3.6% (8/224). Univariate analysis showed that advanced stage (stage III/IV), tumor size (≥5 cm), tumor site (whole or upper one-third of the stomach), macroscopic type (Borrmann type 4), esophageal invasion, esophageal margin involvement, lymphatic invasion, and venous invasion affected survival. Multivariate analysis demonstrated that TNM stage, venous invasion, and esophageal margin involvement were the only significant factors influencing the prognosis. All patients with a positive esophageal margin died with metastasis before local recurrence became a problem. A macroscopic proximal margin of more than 6 cm of esophagus was needed to be free of tumor, excluding one exceptional case that involved 15 cm of esophagus. Conclusion: All of the patients with a positive proximal resection margin after a total gastrectomy had advanced disease with a poor prognosis, but they were not predisposed to anastomotic recurrence. Early detection and extended, but reasonable, surgical resection of curable lesions are mandatory to improve the prognosis.
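
For readers who want to reproduce this style of analysis, the survival workflow in Materials and Methods (Kaplan-Meier estimation, a log-rank test, and Cox regression) maps onto the `lifelines` package as sketched below, on simulated data rather than the study's records.

```python
# Sketch: Kaplan-Meier, log-rank test, and Cox regression with lifelines.
import numpy as np
import pandas as pd
from lifelines import KaplanMeierFitter, CoxPHFitter
from lifelines.statistics import logrank_test

rng = np.random.default_rng(5)
n = 224
df = pd.DataFrame({
    "margin_pos": rng.integers(0, 2, n),   # esophageal margin involvement (simulated)
    "stage34":    rng.integers(0, 2, n),   # advanced stage indicator (simulated)
})
hazard = 0.02 * np.exp(0.9 * df.margin_pos + 0.7 * df.stage34)
df["time"] = rng.exponential(1 / hazard)
df["event"] = (df["time"] < 60).astype(int)      # administrative censoring at 60 months
df.loc[df["event"] == 0, "time"] = 60.0

km = KaplanMeierFitter().fit(df["time"], df["event"])
print("median survival (months):", km.median_survival_time_)

pos, neg = df[df.margin_pos == 1], df[df.margin_pos == 0]
lr = logrank_test(pos["time"], neg["time"], pos["event"], neg["event"])
print(f"log-rank p = {lr.p_value:.4f}")

cox = CoxPHFitter().fit(df, duration_col="time", event_col="event")
cox.print_summary()                              # hazard ratios for both covariates
```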

A Study on the Assessment of Pollution Level of Precipitation at Kangwha, 1992 (江華地域 降水의 汚染度 評價에 關한 硏究)

  • 강공언;강병욱;김희강
    • Journal of Korean Society for Atmospheric Environment / v.11 no.1 / pp.57-68 / 1995
  • Precipitation samples were collected by a wet-only automatic acid precipitation sampler at Kangwha island, on the western coast of Korea, from January to December 1992. pH, electric conductivity, and the concentrations of the major water-soluble ion components $NH_4^+$, $Ca^{2+}$, $K^+$, $Mg^{2+}$, $Na^+$, $NO_3^-$, $SO_4^{2-}$, and $Cl^-$ were measured. The validity of pH as an indicator of the pollution level of precipitation was checked using correlation analysis between pH and the major components and t-tests of chemical composition between acid and non-acid rain; pH proved unsatisfactory for assessing the pollution level, so a more comprehensive method is required. To evaluate the monthly chemical composition of the precipitation samples comprehensively, cluster analysis was chosen from among the various multivariate statistical methods. Clustering the monthly precipitation samples into homogeneous patterns, with the concentrations of the nine major water-soluble ion components as variables, yielded three patterns: a group of months with average ion concentrations, a group with low ion concentrations, and a group with high ion concentrations. The pollution level of precipitation was thus higher in February and lower in May, June, August, and September than in the other months. This analysis method can therefore assess the chemical composition of precipitation regionally as well as monthly.
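
The clustering step the abstract relies on, grouping months by their ion-concentration profiles, can be sketched with a standard hierarchical (Ward) clustering. The monthly concentrations below are invented; only the structure of the computation follows the abstract.

```python
# Sketch: Ward clustering of monthly mean ion concentrations into three patterns.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.stats import zscore

rng = np.random.default_rng(6)
ions = ["NH4+", "Ca2+", "K+", "Mg2+", "Na+", "NO3-", "SO4 2-", "Cl-", "H+"]
monthly = rng.lognormal(mean=1.0, sigma=0.5, size=(12, len(ions)))  # 12 months x 9 ions
monthly[1] *= 2.5   # make February concentration-rich, as in the 1992 result

Z = linkage(zscore(monthly, axis=0), method="ward")   # standardize each ion first
groups = fcluster(Z, t=3, criterion="maxclust")       # three homogeneous patterns
for g in sorted(set(groups)):
    months = [m + 1 for m in range(12) if groups[m] == g]
    print(f"pattern {g}: months {months}")
```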

Hierarchically penalized sparse principal component analysis (계층적 벌점함수를 이용한 주성분분석)

  • Kang, Jongkyeong;Park, Jaeshin;Bang, Sungwan
    • The Korean Journal of Applied Statistics / v.30 no.1 / pp.135-145 / 2017
  • Principal component analysis (PCA) describes the variation of multivariate data in terms of a set of uncorrelated variables. Since each principal component is a linear combination of all variables and the loadings are typically non-zero, it is difficult to interpret the derived principal components. Sparse principal component analysis (SPCA) is a specialized technique using the elastic net penalty function to produce sparse loadings in principal component analysis. When data are structured by groups of variables, it is desirable to select variables in a grouped manner. In this paper, we propose a new PCA method to improve variable selection performance when variables are grouped, which not only selects important groups but also removes unimportant variables within identified groups. To incorporate group information into model fitting, we consider a hierarchical lasso penalty instead of the elastic net penalty in SPCA. Real data analyses demonstrate the performance and usefulness of the proposed method.
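
The contrast the abstract draws between dense PCA loadings and sparse ones is easy to see on toy grouped data. scikit-learn's `SparsePCA` (an l1-penalized variant) stands in for elastic-net SPCA here; the paper's hierarchical lasso penalty for grouped selection is not implemented in this sketch.

```python
# Sketch: dense PCA loadings vs. sparse loadings on grouped variables.
import numpy as np
from sklearn.decomposition import PCA, SparsePCA

rng = np.random.default_rng(7)
# Two latent factors, each driving its own group of five variables
f = rng.normal(size=(300, 2))
X = np.hstack([np.outer(f[:, 0], np.ones(5)),
               np.outer(f[:, 1], np.ones(5))])
X += 0.3 * rng.normal(size=X.shape)

pca = PCA(n_components=2).fit(X)
spca = SparsePCA(n_components=2, alpha=2.0, random_state=0).fit(X)
print("PCA loadings (dense, hard to interpret):")
print(pca.components_.round(2))
print("Sparse loadings (zeros outside each variable group):")
print(spca.components_.round(2))
```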

Performance Evaluation and Forecasting Model for Retail Institutions (유통업체의 부실예측모형 개선에 관한 연구)

  • Kim, Jung-Uk
    • Journal of Distribution Science / v.12 no.11 / pp.77-83 / 2014
  • Purpose - The National Agricultural Cooperative Federation of Korea and the National Fisheries Cooperative Federation of Korea operate both financial and retail businesses. As the cooperatives are public institutions and receive government support, sound management is required of them by the Financial Supervisory Service in Korea. This is mainly managed through CAEL, which was derived from CAMEL. However, the NFFC's business sections, covering finance and retail, are evaluated as one, and the CAEL model's classification is insufficient to evaluate the retail industry. First, the CAEL model lacks discrimination power: although a retail-sector union can receive a high rating under the CAEL model, defaults have often been reported. A default prediction model is therefore needed to supplement the CAEL model. A default prediction model built on subdivided indexes and statistical methods can serve a preventive function by estimating the retail sector's default probability. Second, the finance and retail business sectors must be treated separately, because their businesses have different characteristics. Based on the various management indexes systematically maintained by the National Fisheries Cooperative Federation of Korea, our model predicts retail default and outperforms the CAEL model in failure prediction because it uses various discriminative financial ratios that reflect the retail industry's situation. Research design, data, and methodology - The retail default prediction model was built using logistic analysis. To develop the predictive model, we use the retail financial statements of the NFCF, considering 93 unions in each year from 2006 to 2012 to select reliable management indexes. We also applied statistical power analyses: t-tests, logit analysis, AR (accuracy ratio), and AUROC (area under the receiver operating characteristic curve) analysis. Finally, we show that the multivariate logistic model has excellent discrimination power and a high hit ratio for default prediction, and we evaluate its usefulness. Results - The statistical power analysis using the AR (AUROC) method on the short-term model shows that the logistic model has excellent discrimination power, at 84.6%. Further, the full model's hit ratio for failure prediction is high, at 94%, indicating that the model is temporally stable and useful for evaluating the management status of retail institutions. Conclusions - This model is useful for evaluating the management status of retail union institutions. First, a subdivided CAEL evaluation is required; the existing CAEL evaluation is underdeveloped, and its discrimination power falls short. Second, continuous effort is needed to develop varied and rational management indexes that reflect the characteristics of the retail industry. Extending this study will require the following: first, a complementary default model reflecting size differences; second, for small and medium retail, non-financial information, yielding a hybrid default model that reflects both financial and non-financial information.
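
The paper's evaluation metrics are straightforward to reproduce in outline: fit a logistic model, compute AUROC on held-out unions, and convert to the accuracy ratio via AR = 2·AUROC − 1. The financial ratios below are simulated stand-ins, not the NFCF indexes.

```python
# Sketch: logistic default model scored with AUROC and the accuracy ratio (AR).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(8)
n = 651                                  # about 93 unions x 7 years
X = rng.normal(size=(n, 5))              # stand-ins for selected financial ratios
true_beta = np.array([1.0, 0.8, -0.6, 0.4, 0.0])
p = 1 / (1 + np.exp(-(-2.0 + X @ true_beta)))
y = (rng.random(n) < p).astype(int)      # 1 = default

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          random_state=0, stratify=y)
model = LogisticRegression().fit(X_tr, y_tr)
auroc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
print(f"AUROC = {auroc:.3f}   AR = {2 * auroc - 1:.3f}")
```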