• Title/Abstract/Keywords: Multivariate Statistical Method

Search results: 294 items (processing time: 0.03 s)

An Outlier Detection Using Autoencoder for Ocean Observation Data

  • 김현재;김동훈;임채욱;신용탁;이상철;최영진;우승범
    • 한국해안·해양공학회논문집 / Vol. 33, No. 6 / pp. 265-274 / 2021
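  • Research on detecting anomalous ocean data has long been active, and techniques based on statistical and distance-based machine learning algorithms have been developed. Recently, AI-based anomaly detection for ocean data has attracted considerable attention, and most such techniques rely on supervised learning, in which the correct answers are provided. These methods demand a great deal of time and cost because classification information (labels) must be assigned manually to all of the training data. To overcome this problem, this study used an autoencoder based on unsupervised learning for anomaly detection. To evaluate the autoencoder, two experiments were set up: univariate and multivariate learning. Univariate learning used only water temperature from the Deokjeok-do buoy fixed-point observations provided by the Korea Meteorological Administration, whereas multivariate learning used water temperature together with air temperature, wind direction, wind speed, air pressure, humidity, and other variables. The data cover the 25 years from 1996 to 2020, and preprocessing that reflects the characteristics of ocean and meteorological data was applied to the training data. The trained multivariate and univariate autoencoders were then used to detect anomalies in actual sea surface temperature. To compare model performance, several anomaly detection techniques, including the multivariate and univariate autoencoders, were applied to synthetic data with inserted errors and evaluated quantitatively. The accuracies of the multivariate and univariate autoencoders were about 96% and 91%, respectively, so the multivariate autoencoder showed better anomaly detection performance. Unsupervised anomaly detection using an autoencoder is expected to be widely applicable because it can reduce errors caused by subjective judgment as well as the time and cost required for data labeling.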
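As a rough illustration of the decision step in reconstruction-based anomaly detection, the sketch below flags an observation when the gap between it and its reconstruction exceeds a threshold learned from clean data. It does not reproduce the paper's autoencoder: the reconstructed values are hypothetical numbers standing in for model output, and the mean-plus-`k`-standard-deviations threshold, the function names, and all temperatures are assumptions made for this example.

```python
from statistics import mean, stdev


def error_threshold(clean_errors, k=3.0):
    """Threshold = mean + k * sample std of reconstruction errors on clean data."""
    return mean(clean_errors) + k * stdev(clean_errors)


def flag_anomalies(observed, reconstructed, threshold):
    """Mark a point as anomalous when its absolute reconstruction error exceeds the threshold."""
    return [abs(o - r) > threshold for o, r in zip(observed, reconstructed)]


def accuracy(flags, truth):
    """Fraction of points whose flag matches the known label."""
    return sum(f == t for f, t in zip(flags, truth)) / len(truth)


# Hypothetical water temperatures (deg C) and hypothetical model reconstructions.
train_obs = [12.1, 12.3, 12.2, 12.4, 12.6, 12.5]
train_rec = [12.0, 12.3, 12.3, 12.4, 12.5, 12.6]
train_err = [abs(o - r) for o, r in zip(train_obs, train_rec)]
thr = error_threshold(train_err)

# Synthetic test series with one inserted error (the 19.0 spike).
test_obs = [12.4, 12.5, 19.0, 12.7, 12.6]
test_rec = [12.4, 12.6, 12.6, 12.6, 12.7]
truth = [False, False, True, False, False]

flags = flag_anomalies(test_obs, test_rec, thr)
print(flags, accuracy(flags, truth))
```

Scoring the flags against labels that are known because the errors were inserted deliberately mirrors the synthetic-data evaluation the abstract describes.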

A Study on Fault Detection of Cycle-based Signals using Wavelet Transform

  • 이재현;김지현;황지빈;김성식
    • 한국시뮬레이션학회논문지 / Vol. 16, No. 4 / pp. 13-22 / 2007
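  • For fault detection in processes that produce cycle-based signal data, univariate SPC charts that use representative values and multivariate statistical methods such as PCA and PLS are in use. These methods are limited in their ability to analyze the varied information contained in cycle-based signal data. In this study, the Haar wavelet transform was used to obtain wavelet coefficients that reflect the shape of the cycle signal, and SPC charts were applied to these coefficients to detect process faults. For more efficient fault detection, this paper proposes a wavelet-based denoising technique and a method for selecting important coefficients based on differences in the variance of the Haar wavelet coefficients. The efficiency of the proposed algorithm was confirmed through simulation under a variety of fault conditions.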

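The idea of monitoring Haar wavelet coefficients with control limits can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm: it omits the denoising and the variance-based coefficient selection, applies simple mean ± 3σ limits to every coefficient, and uses a made-up 8-sample cycle shape with simulated noise. All function names are hypothetical.

```python
import math
import random
from statistics import mean, stdev


def haar_dwt(signal):
    """Full orthonormal Haar decomposition of a signal whose length is a power of two.

    Returns [approximation, coarsest detail, ..., finest details].
    """
    approx = list(signal)
    details = []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        detail = [(a - b) / math.sqrt(2) for a, b in pairs]
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
        details = detail + details  # coarser levels go in front
    return approx + details


def control_limits(cycles, k=3.0):
    """Per-coefficient (lower, upper) limits estimated from in-control cycles."""
    rows = [haar_dwt(c) for c in cycles]
    limits = []
    for column in zip(*rows):
        m, s = mean(column), stdev(column)
        limits.append((m - k * s, m + k * s))
    return limits


def out_of_control(cycle, limits):
    """Indices of wavelet coefficients that fall outside their control limits."""
    coeffs = haar_dwt(cycle)
    return [i for i, (c, (lo, hi)) in enumerate(zip(coeffs, limits)) if not lo <= c <= hi]


# Hypothetical in-control cycles: a fixed shape plus small Gaussian noise.
rng = random.Random(0)
base = [0, 1, 2, 3, 3, 2, 1, 0]
normal_cycles = [[v + rng.gauss(0, 0.1) for v in base] for _ in range(30)]
limits = control_limits(normal_cycles)

# Hypothetical fault: the second half of the cycle is shifted upward by 5.
faulty = base[:4] + [v + 5 for v in base[4:]]
flagged = out_of_control(faulty, limits)
print(flagged)
```

Because the Haar transform is orthonormal, a level shift in half of the cycle concentrates in the approximation coefficient and the coarsest detail coefficient, which is why those two are the ones expected to leave their limits here.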

Severity Measurement Methods and Comparing Hospital Death Rates for Coronary Artery Bypass Graft Surgery

  • 안형식;신영수;권영대
    • Journal of Preventive Medicine and Public Health / Vol. 34, No. 3 / pp. 244-252 / 2001
  • Objective: Health insurers and policy makers are increasingly examining the hospital mortality rate as an indicator of hospital quality and performance. To be meaningful, death rates must be risk-adjusted. This study reviewed 5 severity measurement methods and applied them to the same data set to determine whether judgments regarding severity-adjusted hospital mortality rates were sensitive to the specific severity measure. Methods: The medical records of 584 patients who underwent coronary artery bypass graft surgery in 6 general hospitals during 1996 and 1997 were reviewed by trained nurses. MedisGroups, Disease Staging, the Computerized Severity Index, APACHE III, and KDRG were used to quantify patient severity. The predicted probability of death was calculated for each patient in the sample from a multivariate logistic regression model that included the severity score, age, and sex. To evaluate the hospitals' performance, the ratio of the observed number of deaths to the expected number was calculated for each hospital. Results: The overall in-hospital mortality rate was 7.0%, ranging from 2.7% to 15.7% depending on the hospital. After severity adjustment, the mortality rates for each hospital showed little difference according to the severity measure. The 5 severity measurement methods varied in their statistical performance. All had a higher c statistic and $R^2$ than the model containing only age and sex. There was a slight difference in the relative hospital performance evaluation across severity measures. Conclusion: These results suggest that judgments regarding a hospital's performance based on severity-adjusted mortality can be sensitive to the severity measurement method. Although the 5 severity measures agreed on hospital performance more often than would be expected by chance, the assessment of an individual hospital's mortality rate varied with the severity measurement method used.

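The observed-to-expected death ratio described in the Methods can be sketched as below: each patient's predicted probability of death comes from a logistic model, the expected number of deaths for a hospital is the sum of those probabilities, and the ratio compares that sum with the observed count. The coefficients, patient records, and function names are hypothetical and chosen only to make the arithmetic concrete; they are not estimates from the study.

```python
import math


def death_probability(intercept, coefs, features):
    """Logistic model: P(death) = 1 / (1 + exp(-(b0 + b . x)))."""
    z = intercept + sum(b * x for b, x in zip(coefs, features))
    return 1.0 / (1.0 + math.exp(-z))


def observed_to_expected(patients, intercept, coefs):
    """Ratio of observed deaths to the sum of predicted death probabilities."""
    observed = sum(p["died"] for p in patients)
    expected = sum(death_probability(intercept, coefs, p["x"]) for p in patients)
    return observed / expected


# Hypothetical coefficients for (severity score, age in years, sex coded 0/1).
INTERCEPT = -6.0
COEFS = [0.8, 0.04, 0.3]

# Hypothetical patients of one hospital.
hospital_a = [
    {"x": [2, 61, 1], "died": 0},
    {"x": [4, 72, 0], "died": 1},
    {"x": [1, 55, 1], "died": 0},
    {"x": [3, 68, 0], "died": 0},
]

ratio = observed_to_expected(hospital_a, INTERCEPT, COEFS)
print(round(ratio, 2))
```

A ratio above 1 means more deaths than the severity-adjusted model predicts, and a ratio below 1 means fewer.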

Estimation and Performance Analysis of Risk Measures using Copula and Extreme Value Theory

  • 여성칠
    • 응용통계연구 / Vol. 19, No. 3 / pp. 481-504 / 2006
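  • Value at Risk (VaR), a risk measure related to the tail of a distribution, is now widely used as a tool for measuring and managing financial risk. In particular, methods based on extreme value theory have recently attracted attention for estimating VaR more accurately than conventional methods that assume a normal distribution. Most studies to date on estimating VaR with extreme value theory have dealt with the univariate case. This paper combines copulas with extreme value theory to model a multivariate extreme value distribution and addresses portfolio risk measurement. In particular, along with VaR, estimation methods for expected shortfall (ES) as a portfolio risk measure are also discussed. Backtesting on empirical data was used to examine whether the copula and extreme value theory approach discussed in this paper performs relatively better than the conventional variance-covariance method for portfolio risk measurement.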
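For reference, the two risk measures named in the abstract are usually defined as follows for a portfolio loss $L$ with distribution function $F_L$ and confidence level $\alpha$. This is the standard textbook formulation, not notation taken from the paper; the symbols $L$, $F_L$, $\alpha$, $C$, and $F_i$ are assumptions introduced here. The last line is Sklar's theorem, which is what lets a copula $C$ join separately modeled marginal distributions $F_i$ (for instance with tails fitted by extreme value theory) into one multivariate distribution.

$$
\begin{aligned}
\mathrm{VaR}_\alpha(L) &= \inf\{\ell \in \mathbb{R} : F_L(\ell) \ge \alpha\}, \\
\mathrm{ES}_\alpha(L) &= \frac{1}{1-\alpha}\int_\alpha^1 \mathrm{VaR}_u(L)\,du, \\
F(x_1,\dots,x_d) &= C\bigl(F_1(x_1),\dots,F_d(x_d)\bigr).
\end{aligned}
$$

ES averages the losses beyond VaR, so it reflects how severe tail losses are rather than only where the tail begins.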

Multidimensional scaling of categorical data using the partition method

  • 신상민;천선경;최용석
    • 응용통계연구 / Vol. 31, No. 1 / pp. 67-75 / 2018
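  • Multidimensional scaling (MDS) is an exploratory technique for multivariate data that represents dissimilarities among objects geometrically in a low-dimensional space. However, an ordinary MDS plot shows only the similarity information among objects and no information about the variables, which limits interpretation of the plot. In this study, categorical data were converted into a multiple indicator matrix and MDS based on the algorithm of Torgerson (1958) was applied, so that the clustering tendency of the objects and the relative sizes of the clusters were visualized in an MDS plot. The partition method of Shin et al. (2015) was then applied to project information for each level of the categorical variables onto the MDS plot and thereby display additional information. The MDS plot proposed in this study therefore has the advantage of allowing the associations among categorical variables to be explored together with the similarity information among objects.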
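Classical (Torgerson) scaling, the algorithm the abstract cites, is commonly written as a double centering followed by an eigendecomposition. The sketch below uses standard textbook notation, which is assumed here rather than taken from the paper: $D^{(2)}$ is the $n \times n$ matrix of squared dissimilarities, $J$ is the centering matrix, and $k$ is the target dimension.

$$
\begin{aligned}
J &= I_n - \tfrac{1}{n}\mathbf{1}\mathbf{1}^\top, \\
B &= -\tfrac{1}{2}\, J D^{(2)} J, \\
B &= V \Lambda V^\top, \qquad X = V_k \Lambda_k^{1/2}.
\end{aligned}
$$

Here $\Lambda_k$ holds the $k$ largest positive eigenvalues of $B$ and $V_k$ the corresponding eigenvectors; the rows of $X$ are the object coordinates plotted in the MDS map.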

Analysis of Agricultural Characters to Establish the Evaluating Protocol and Standard Assessment for Genetically Modified Peppers

  • 조동욱;정규환
    • 한국환경과학회지 / Vol. 20, No. 9 / pp. 1183-1190 / 2011
  • This study aimed to establish an evaluation protocol and standard assessment for genetically modified (GM) hot pepper and to identify a proper statistical method for analyzing the equality of agricultural characters between GM and non-GM pepper lines. GM and non-GM hot pepper lines were cultivated in two GMO fields in the middle region of Korea, and a total of 52 agricultural characters were collected during the plant growing season for 4 years, from 2007 to 2010. Levene's test was conducted to confirm the homogeneity of the raw data before statistical analysis. Two-way ANOVA in the multivariate tests and the t-test were conducted on the 52 agricultural characters to assess the equality between H15 and P2377. In the two-way ANOVA, 16 out of 16 plant growth traits, 9 out of 18 green fruit traits, and 7 out of 18 red fruit traits showed statistical differences among the 4 years, and 9 out of 16 plant growth traits, 4 out of 18 green fruit traits, and 3 out of 18 red fruit traits showed statistical differences between H15 and P2377. The t-test was also conducted on the same raw data for the 52 agricultural characters. Based on the t-test, only 1 out of 16 plant growth traits, 2 out of 18 green fruit traits, and 1 out of 18 red fruit traits differed between H15 and P2377, so it was concluded that there is no statistical difference between H15 and P2377 in terms of agricultural characters. The t-test is also a proper statistical method for analyzing each trait between a GM line and its control line when evaluating agricultural characters.
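A per-trait comparison of the kind described can be sketched with the pooled-variance two-sample t statistic, which relies on the equal-variance assumption that a homogeneity check such as Levene's test is meant to support. The abstract does not state which t-test variant was used, so the pooled form is an assumption, as are the function name and the measurements below. Only the statistic and its degrees of freedom are computed; turning them into a p-value needs a t-distribution table or a statistics library.

```python
import math
from statistics import mean, variance


def pooled_t_statistic(a, b):
    """Student's two-sample t statistic with pooled variance; returns (t, df)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    std_err = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / std_err, df


# Hypothetical plant-height measurements (cm) for one trait; not data from the study.
gm_line = [61.0, 63.5, 62.2, 64.1, 62.8]
control_line = [60.4, 62.9, 63.0, 61.7, 62.5]

t_stat, df = pooled_t_statistic(gm_line, control_line)
print(round(t_stat, 3), df)
```

Repeating this for each of the 52 characters gives one test per trait, which is the trait-by-trait use of the t-test that the abstract recommends.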

Prognostic Value of Esophageal Resectionline Involvement in a Total Gastrectomy for Gastric Cancer

  • 권성준
    • Journal of Gastric Cancer / Vol. 1, No. 3 / pp. 168-173 / 2001
  • Purpose: A positive esophageal margin is not infrequently encountered in a total gastrectomy. The aim of this retrospective review was to evaluate whether a positive esophageal margin predisposes a patient to loco-regional recurrence and whether it has an independent impact on long-term survival. Materials and Methods: A retrospective review of 224 total gastrectomies for adenocarcinomas was undertaken. The chi-square test was used to determine the statistical significance of differences, and the Kaplan-Meier method was used to calculate survival rates. Significant differences in the survival rates were assessed using the log-rank test, and independent prognostic significance was evaluated using the Cox regression method. Results: The prevalence of esophageal margin involvement was $3.6\%$ (8/224). Univariate analysis showed that advanced stage (stage III/IV), tumor size ($\geq$5 cm), tumor site (whole or upper one-third of the stomach), macroscopic type (Borrmann type 4), esophageal invasion, esophageal margin involvement, lymphatic invasion, and venous invasion affected survival. Multivariate analysis demonstrated that TNM stage, venous invasion, and esophageal margin involvement were the only significant factors influencing the prognosis. All patients with a positive esophageal margin died with metastasis before local recurrence became a problem. A macroscopic proximal distance of more than 6 cm of esophagus was needed to be free of tumors, excluding one exceptional case that involved 15 cm of esophagus. Conclusion: All of the patients with a positive proximal resection margin after a total gastrectomy had advanced disease with a poor prognosis, but they were not predisposed to anastomotic recurrence. Early detection and extended, but reasonable, surgical resection of curable lesions are mandatory to improve the prognosis.

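The Kaplan-Meier method used for the survival rates can be sketched in a few lines: at each distinct death time the running survival probability is multiplied by the fraction of at-risk patients who survive that time, and censored patients leave the risk set without changing the estimate. The follow-up times and the function name below are hypothetical, not data from the study.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate.

    times  : follow-up time for each patient
    events : 1 if the patient died at that time, 0 if censored
    Returns a list of (time, survival probability) at each distinct death time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Collect every patient whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical follow-up in months; event 0 marks a censored patient.
months = [1, 2, 2, 3, 4]
died = [1, 1, 0, 1, 0]

curve = kaplan_meier(months, died)
print(curve)
```

Patients censored at the same time as a death are still counted as at risk for that death, which is the usual convention.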

A Study on the Assessment of Pollution Level of Precipitation at Kangwha, 1992

  • 강공언;강병욱;김희강
    • 한국대기환경학회지 / Vol. 11, No. 1 / pp. 57-68 / 1995
  • Precipitation samples were collected with a wet-only automatic acid precipitation sampler at Kangwha island on the western coast of Korea from January through December 1992. The pH, electrical conductivity, and concentrations of major water-soluble ion components such as $NH_4^+$, $Ca^{2+}$, $K^+$, $Mg^{2+}$, $Na^+$, $NO_3^-$, $SO_4^{2-}$, and $Cl^-$ were measured. The validity of using pH to assess the pollution level of precipitation samples was checked through correlation analysis between pH and the major components and through a t-test of chemical composition between acid rain and non-acid rain; pH proved unsatisfactory as an indicator of pollution level. A more comprehensive method is therefore required. To evaluate the monthly chemical composition of the precipitation samples comprehensively, cluster analysis was chosen from among the various multivariate statistical methods. Cluster analysis was used to separate the monthly precipitation samples into homogeneous patterns, with the concentrations of nine major water-soluble ion components as variables, and three homogeneous patterns were obtained. The first pattern was a group of months with average ion concentrations, the second a group of months with low ion concentrations, and the third a group of months with high ion concentrations. This indicated that the pollution level of precipitation was higher in February and lower in May, June, August, and September than in the other months. This analysis method could therefore be used to evaluate the chemical composition of precipitation regionally as well as monthly.


Hierarchically penalized sparse principal component analysis

  • 강종경;박재신;방성완
    • 응용통계연구 / Vol. 30, No. 1 / pp. 135-145 / 2017
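  • Principal component analysis (PCA) is a representative technique for reducing the dimension of correlated multivariate data and is used in many multivariate analyses. However, because each principal component is a linear combination of all the variables, the results are difficult to interpret. Sparse PCA (SPCA) uses an elastic net type penalty function to produce modified principal components with sparser loadings, but it cannot exploit the group structure of the variables. This study therefore improves on existing SPCA and proposes a new principal component analysis method that, when the data are grouped, selects the significant groups and at the same time removes unnecessary variables within groups. To use the group structure and the within-group variable structure in model fitting, a hierarchical penalty function is considered in place of the elastic net penalty function of sparse PCA. The performance and usefulness of the proposed method are demonstrated through an analysis of real data.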
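The elastic net formulation of SPCA that the abstract refers to is usually stated as the regression-type criterion below, following Zou, Hastie, and Tibshirani. The notation is assumed here rather than taken from the paper: $x_i$ is the $i$-th observation, $A$ and $B = [\beta_1, \dots, \beta_k]$ are $p \times k$ matrices, and $\lambda$ and $\lambda_{1,j}$ are tuning parameters. The hierarchical penalty proposed in the paper replaces the two penalty terms; its exact form is not given in the abstract and is not reproduced here.

$$
(\hat{A}, \hat{B}) = \arg\min_{A,B} \sum_{i=1}^{n} \lVert x_i - A B^\top x_i \rVert_2^2 + \lambda \sum_{j=1}^{k} \lVert \beta_j \rVert_2^2 + \sum_{j=1}^{k} \lambda_{1,j} \lVert \beta_j \rVert_1 \quad \text{subject to } A^\top A = I_k .
$$

The $\ell_1$ term drives individual loadings in $\beta_j$ to exactly zero, which is what makes the resulting components sparse.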

Performance Evaluation and Forecasting Model for Retail Institutions

  • 김정욱
    • 유통과학연구 / Vol. 12, No. 11 / pp. 77-83 / 2014
  • Purpose - The National Agricultural Cooperative Federation of Korea and the National Fisheries Cooperative Federation of Korea operate both financial and retail businesses. Because cooperatives are public institutions and receive government support, the Financial Supervisory Service in Korea requires them to be soundly managed. Soundness is mainly assessed with CAEL, which is adapted from CAMEL. However, NFFC's business section, which manages both the finance and retail businesses, is evaluated as a single unit, and the CAEL model's classification is insufficient for evaluating the retail industry. First, there is the question of CAEL's discrimination power. Although a retail business sector union can receive a high rating under the CAEL model, defaults have often been reported. Therefore, a default prediction model is needed to support the CAEL model. Because the default prediction model uses subdivided indexes and statistical methods, it can serve a preventive function by estimating the retail sector's default probability. Second, the finance and retail business sectors need to be separated, because their businesses have different characteristics. Based on various management indexes that have been systematically managed by the National Fisheries Cooperative Federation of Korea, our model predicts retail default and outperforms the CAEL model in failure prediction because it includes various discriminative financial ratios that reflect the situation of the retail industry. Research design, data, and methodology - The model to predict retail default was built using logistic analysis. To develop the predictive model, we use the retail financial statements of the NFCF. We consider 93 unions each year from 2006 to 2012 to select reliable management indexes. We also applied statistical power analyses, namely the t-test, logit analysis, AR (accuracy ratio), and AUROC (area under the receiver operating characteristic curve) analysis. Finally, through the multivariate logistic model, we show that it has excellent discrimination power and a higher hit ratio for default prediction. We also evaluate its usefulness. Results - The statistical power analysis using the AR (AUROC) method on the short-term model shows that the logistic model has excellent discrimination power, at 84.6%. Further, its hit ratio for failure (prediction) in the total model is higher, at 94%, indicating that it is temporally stable and useful for evaluating the management status of retail institutions. Conclusions - This model is useful for evaluating the management status of retail union institutions. First, the CAEL evaluation needs to be subdivided. The existing CAEL evaluation is underdeveloped, and its discrimination power is low. Second, continued efforts are required to develop varied and rational management indexes. An index reflecting retail industry characteristics needs to be developed. Extending this study will require the following. First, a complementary default model reflecting size differences is needed. Second, for small and medium retail, non-financial information is needed. The result would be a hybrid default model reflecting both financial and non-financial information.
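The two discrimination measures named in the abstract can be computed directly from model scores. The sketch below estimates AUROC as the share of (defaulter, non-defaulter) pairs in which the defaulter receives the higher predicted default probability, counting ties as half, and derives the accuracy ratio through the standard relation AR = 2 × AUROC − 1. The scores, labels, and function names are hypothetical, and the abstract does not say how the study computed these figures.

```python
def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5).

    scores : predicted default probabilities
    labels : 1 for default, 0 for non-default
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


def accuracy_ratio(scores, labels):
    """Accuracy ratio (Gini coefficient) derived from AUROC."""
    return 2 * auroc(scores, labels) - 1


# Hypothetical predicted default probabilities for four unions.
scores = [0.9, 0.8, 0.3, 0.2]
labels = [1, 0, 1, 0]

print(auroc(scores, labels), accuracy_ratio(scores, labels))
```

An AUROC of 0.5 (AR of 0) corresponds to a model with no discrimination power, and an AUROC of 1 (AR of 1) to perfect separation of defaulters from non-defaulters.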