• Title/Summary/Keyword: 통계치 (statistic)


Outlier Detection Using Support Vector Machines (서포트벡터 기계를 이용한 이상치 진단)

  • Seo, Han-Son; Yoon, Min
    • Communications for Statistical Applications and Methods / v.18 no.2 / pp.171-177 / 2011
  • To construct approximation functions for real data, the outliers must be removed from the measured raw data before the model is built. Conventionally, visualization and the maximum residual error have been used for outlier detection, but they often fail to detect outliers for nonlinear functions with multidimensional input. Standard outlier detection methods based on support vector regression (SVR) perform well for nonlinear functions with multidimensional input, but they have practical problems with computational cost and parameter tuning. In this paper we propose a practical SVR-based approach to outlier detection that reduces computation time and defines a suitable outlier threshold. We validate the approach on real data examples.
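The abstract describes flagging outliers from the residuals of a nonlinear approximation. Fitting an SVR requires a quadratic-programming solver, so the stdlib-only sketch below swaps in Gaussian-kernel ridge regression as the approximator and flags points whose residual exceeds `k` robust standard deviations (1.4826 times the MAD). The kernel width, the ridge penalty `lam`, and the cutoff `k = 3` are hypothetical choices, not the authors' threshold rule. Because squared loss is not robust, a gross outlier pulls the fit toward itself and can push its neighbors over the cutoff as well; the bounded influence of SVR's epsilon-insensitive loss is one motivation for SVR-based detection.

```python
import math
import random


def rbf(a, b, width):
    """Gaussian (RBF) kernel between two scalar inputs."""
    return math.exp(-((a - b) ** 2) / (2.0 * width ** 2))


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def flag_outliers(xs, ys, width=0.1, lam=0.1, k=3.0):
    """Kernel ridge fit (stand-in for SVR), then flag |residual - median| > k robust SDs."""
    n = len(xs)
    gram = [[rbf(xs[i], xs[j], width) + (lam if i == j else 0.0) for j in range(n)]
            for i in range(n)]
    coef = solve(gram, ys)  # coef = (K + lam * I)^-1 y
    fitted = [sum(coef[j] * rbf(xs[i], xs[j], width) for j in range(n)) for i in range(n)]
    resid = [y - f for y, f in zip(ys, fitted)]
    med = sorted(resid)[n // 2]
    mad = sorted(abs(r - med) for r in resid)[n // 2]
    scale = 1.4826 * mad  # MAD rescaled to a Gaussian-consistent standard deviation
    return [i for i, r in enumerate(resid) if abs(r - med) > k * scale]


# Toy 1-D data: a sine curve with Gaussian noise and one gross outlier at index 17.
rng = random.Random(0)
xs = [i / 40 for i in range(40)]
ys = [math.sin(2 * math.pi * x) + rng.gauss(0.0, 0.1) for x in xs]
ys[17] += 2.0
print(flag_outliers(xs, ys))
```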

Fast robust variable selection using VIF regression in large datasets (대형 데이터에서 VIF회귀를 이용한 신속 강건 변수선택법)

  • Seo, Han Son
    • The Korean Journal of Applied Statistics / v.31 no.4 / pp.463-473 / 2018
  • Variable selection algorithms for linear regression models on large datasets are considered. Many algorithms have been proposed with a focus on speed and robustness. Among them, variance inflation factor (VIF) regression is fast and accurate because it uses a streamwise regression approach. However, VIF regression is susceptible to outliers because it estimates the model by least squares. For robustness, a criterion based on a weighted estimator has been proposed, as has a robust VIF regression. In this article, a fast and robust variable selection method is proposed that applies VIF regression after detecting and removing potential outliers. A simulation study and the analysis of a dataset compare the proposed method with other methods.
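VIF regression relies on the variance inflation factor of a candidate variable given the variables already in the model, VIF = 1 / (1 - R^2), where R^2 comes from regressing the candidate on the selected set. The sketch below computes only that quantity, by Gram-Schmidt orthogonalization of the centered selected columns; it does not implement the streamwise test or the outlier-removal step of the paper, and the function names are hypothetical. A candidate that is exactly collinear with the selected set would give a zero residual sum of squares, so real code would need to guard that case.

```python
def center(v):
    m = sum(v) / len(v)
    return [a - m for a in v]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def residualize(v, basis):
    """Center v, then remove its projection on each (mutually orthogonal) basis vector."""
    r = center(v)
    for q in basis:
        coef = dot(r, q) / dot(q, q)
        r = [a - coef * b for a, b in zip(r, q)]
    return r


def vif(candidate, selected):
    """VIF = 1 / (1 - R^2) of candidate regressed (with intercept) on the selected columns."""
    basis = []
    for s in selected:
        q = residualize(s, basis)
        if dot(q, q) > 1e-12:  # skip columns (numerically) collinear with earlier ones
            basis.append(q)
    c = center(candidate)
    r = residualize(candidate, basis)
    return dot(c, c) / dot(r, r)  # TSS / RSS equals 1 / (1 - R^2)


x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 5.0]
print(vif(x2, [x1]))  # correlation of x1 and x2 is 0.8, so VIF = 1 / 0.36
```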

Separating Signals and Noises Using Mixture Model and Multiple Testing (혼합모델 및 다중 가설 검정을 이용한 신호와 잡음의 분류)

  • Park, Hae-Sang; Yoo, Si-Won; Jun, Chi-Hyuck
    • The Korean Journal of Applied Statistics / v.22 no.4 / pp.759-770 / 2009
  • The problem of separating signals from noise when the two are randomly mixed in the observations is considered. The noise is assumed to follow a Gaussian distribution and the signal a Gamma distribution, so each observation follows a mixture of Gaussian and Gamma distributions. The parameters of the mixture model are estimated by the EM algorithm. Signals and noise are then classified with a fixed threshold based on multiple testing, using the positive false discovery rate and the Bayes error. The proposed method is applied to real optical emission spectroscopy data for the quantitative analysis of inclusions, and a simulation compares its performance with that of the existing 3-sigma rule.
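Once the mixture parameters have been estimated (the EM step is not shown), each observation can be scored by its posterior probability of being noise. The sketch below assumes hypothetical fitted values (noise N(0, 1), signal Gamma with shape 5 and scale 1, noise proportion 0.9) and labels an observation as signal when that posterior falls below 0.5, which is the Bayes rule under 0-1 loss; the paper's pFDR-based threshold is not reproduced. The 3-sigma rule is included for comparison.

```python
import math


def log_normal_pdf(x, mu, sigma):
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))


def log_gamma_pdf(x, shape, scale):
    """Log density of Gamma(shape, scale), valid for x > 0."""
    return ((shape - 1.0) * math.log(x) - x / scale
            - math.lgamma(shape) - shape * math.log(scale))


def noise_posterior(x, pi0, mu, sigma, shape, scale):
    """P(noise | x) for the mixture pi0 * N(mu, sigma^2) + (1 - pi0) * Gamma(shape, scale)."""
    if x <= 0.0:
        return 1.0  # the Gamma component puts no mass on x <= 0
    l0 = math.log(pi0) + log_normal_pdf(x, mu, sigma)
    l1 = math.log(1.0 - pi0) + log_gamma_pdf(x, shape, scale)
    d = l1 - l0
    return 0.0 if d > 700.0 else 1.0 / (1.0 + math.exp(d))  # guard against exp overflow


# Hypothetical parameters, as if returned by an EM fit.
PARAMS = dict(pi0=0.9, mu=0.0, sigma=1.0, shape=5.0, scale=1.0)


def is_signal(x, threshold=0.5):
    """Bayes rule under 0-1 loss: call x a signal when P(noise | x) < threshold."""
    return noise_posterior(x, **PARAMS) < threshold


def is_signal_3sigma(x):
    return x > PARAMS["mu"] + 3.0 * PARAMS["sigma"]


for x in (0.5, 2.5, 2.8, 8.0):
    print(x, round(noise_posterior(x, **PARAMS), 4), is_signal(x), is_signal_3sigma(x))
```

With these assumed parameters, x = 2.8 has a noise posterior of about 0.31, so the mixture rule labels it a signal while the 3-sigma rule labels it noise.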

A Study on the Stochastic Modeling for Stream Flow Generation (하천유량의 모의발생을 위한 추계학적 모형의 적용에 관한 연구)

  • Lee, Joo-Heon
    • Journal of the Korean Society of Hazard Mitigation / v.1 no.2 s.2 / pp.115-121 / 2001
  • The purpose of synthetically generating monthly river flows from short-term observed data with stochastic models is to provide ample input data for water resources systems whose performance and operating policies must be determined in advance. In this study, a multivariate autoregressive model is applied to generate monthly flows at multiple sites, taking the correlations between sites into account. Model performance was examined by comparing statistics of the historical and generated monthly series, such as the mean, variance, skewness, and correlation coefficients. The results showed that the generated flows were statistically similar to the historical flows.
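For a two-site illustration, one common multisite AR(1) formulation (Matalas-type) writes the standardized flows as X_t = A X_{t-1} + B e_t, with A = M1 M0^-1 and B B^T = M0 - A M1^T, where M0 and M1 are the lag-0 and lag-1 cross-covariance matrices. The abstract does not state which estimator the author used, so treat the sketch below as one plausible instance; it assumes the flows have already been standardized (in practice, month by month) and uses hypothetical moment matrices.

```python
import math
import random


def mat_mul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def transpose(a):
    return [[a[j][i] for j in range(2)] for i in range(2)]


def inv2(m):
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]


def chol2(m):
    """Lower-triangular L with L L^T = m (m symmetric positive definite)."""
    l11 = math.sqrt(m[0][0])
    l21 = m[1][0] / l11
    return [[l11, 0.0], [l21, math.sqrt(m[1][1] - l21 * l21)]]


def fit_mar1(m0, m1):
    """Moment estimates: A = M1 M0^-1 and lower-triangular B with B B^T = M0 - A M1^T.

    m0[i][j] = E[X_t,i X_t,j] (lag 0); m1[i][j] = E[X_t,i X_{t-1},j] (lag 1).
    """
    a = mat_mul(m1, inv2(m0))
    a_m1t = mat_mul(a, transpose(m1))
    bbt = [[m0[i][j] - a_m1t[i][j] for j in range(2)] for i in range(2)]
    return a, chol2(bbt)


def generate(a, b, n, seed=0, burn_in=100):
    """Simulate n standardized two-site values of X_t = A X_{t-1} + B e_t."""
    rng = random.Random(seed)
    x = [0.0, 0.0]
    out = []
    for t in range(n + burn_in):
        e = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
        x = [sum(a[i][k] * x[k] + b[i][k] * e[k] for k in range(2)) for i in range(2)]
        if t >= burn_in:  # discard the start-up transient from x = 0
            out.append(x)
    return out


# Hypothetical lag-0 / lag-1 moments of two standardized monthly-flow series.
A, B = fit_mar1([[1.0, 0.5], [0.5, 1.0]], [[0.6, 0.3], [0.3, 0.6]])
flows = generate(A, B, 20000, seed=1)
print(A, B)
```

Generated standardized values would then be transformed back to flows with each month's mean and standard deviation.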


L-Estimation for the Parameter of the AR(1) Model (AR(1) 모형의 모수에 대한 L-추정법)

  • Han Sang Moon; Jung Byoung Cheal
    • The Korean Journal of Applied Statistics / v.18 no.1 / pp.43-56 / 2005
  • In this study, a robust method for estimating the first-order autocorrelation coefficient of a time series following an AR(1) process with additive outliers (AO) is investigated. We propose an L-type trimmed least squares estimator that uses the preliminary estimator (PE) suggested by Ruppert and Carroll (1980) for the multiple regression model. In addition, Mallows' weight function is used to down-weight outliers in the X direction, giving a bounded-influence PE (BIPE), and the mean squared error (MSE) of various estimators of the autocorrelation coefficient is compared in Monte Carlo experiments. The Monte Carlo results show that the BIPE(LAD) estimator, which uses generalized LAD as the preliminary estimator, performs well relative to the other estimators.
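The sketch below illustrates the idea in stdlib Python: a preliminary slope from a Mallows-weighted LAD fit of z_t on z_{t-1} (weights min(1, c/|z_{t-1}|), which turn the LAD problem into a weighted median of ratios), followed by least squares on the pairs whose preliminary residuals lie between the alpha and 1 - alpha empirical quantiles. The weight form, c = 2, and alpha = 0.1 are hypothetical choices, and the series is assumed to have zero mean and roughly unit scale; this is not the authors' exact BIPE(LAD) estimator.

```python
import random


def mallows_lad_slope(x, y, c=2.0):
    """Preliminary slope for y ~ phi * x (no intercept) by Mallows-weighted LAD.

    Minimizing sum w_i |y_i - phi x_i| with w_i = min(1, c/|x_i|) equals taking the
    weighted median of y_i / x_i with weights min(|x_i|, c).
    """
    pairs = sorted((yi / xi, min(abs(xi), c)) for xi, yi in zip(x, y) if xi != 0)
    half = sum(w for _, w in pairs) / 2.0
    acc = 0.0
    for ratio, w in pairs:
        acc += w
        if acc >= half:
            return ratio
    return pairs[-1][0]


def trimmed_ls_ar1(z, alpha=0.1, c=2.0):
    """Trimmed LS estimate of phi in z_t = phi z_{t-1} + e_t (sketch, zero-mean series)."""
    x, y = z[:-1], z[1:]
    phi0 = mallows_lad_slope(x, y, c)
    resid = [yi - phi0 * xi for xi, yi in zip(x, y)]
    srt = sorted(resid)
    n = len(srt)
    lo, hi = srt[int(alpha * n)], srt[int((1 - alpha) * n) - 1]
    kept = [(xi, yi) for xi, yi, r in zip(x, y, resid) if lo <= r <= hi]
    return sum(xi * yi for xi, yi in kept) / sum(xi * xi for xi, _ in kept)


def ols_ar1(z):
    """Ordinary least squares slope of z_t on z_{t-1}, for comparison."""
    x, y = z[:-1], z[1:]
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


# Simulated AR(1) with phi = 0.5 and additive outliers of size +10 at 2% of time points.
rng = random.Random(42)
clean = [0.0]
for _ in range(1999):
    clean.append(0.5 * clean[-1] + rng.gauss(0.0, 1.0))
observed = [v + 10.0 if t % 50 == 25 else v for t, v in enumerate(clean)]
print(ols_ar1(observed), trimmed_ls_ar1(observed))
```

The additive outliers inflate the denominator of the OLS slope and pull it well below 0.5, while the trimmed estimate discards the contaminated pairs before refitting.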

A Multivariate Model Development For Stream Flow Generation (다변량 모형에 의한 하천유량의 모의 발생)

  • 정상만
    • Water for future / v.24 no.4 / pp.67-72 / 1991
  • Various modeling approaches have been used to study the long-term behavior of streamflow or groundwater storage. In this study, a multivariate AR(1) model is applied to generate monthly flows at one key station with historical records, using the monthly flows of three subordinate stations. Model performance was examined by comparing statistics of the historical and generated monthly series, such as the mean, variance, and skewness. The lag-zero and lag-one correlation coefficients between the two monthly flow series were also compared. The results showed that the generated flows were statistically similar to the historical flows.
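The validation step here compares summary statistics of the historical and generated series. A minimal sketch of those statistics follows: mean, sample variance, moment skewness, and the lag-k correlation between two series (lag 0 and lag 1, as in the abstract). The skewness and lag conventions are assumptions; hydrologic practice sometimes applies small-sample bias corrections that are omitted here, and the function names are hypothetical.

```python
import math


def mean(x):
    return sum(x) / len(x)


def variance(x):
    """Sample variance with n - 1 in the denominator."""
    m = mean(x)
    return sum((v - m) ** 2 for v in x) / (len(x) - 1)


def skewness(x):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (no bias correction)."""
    m = mean(x)
    m2 = sum((v - m) ** 2 for v in x) / len(x)
    m3 = sum((v - m) ** 3 for v in x) / len(x)
    return m3 / m2 ** 1.5


def lag_corr(x, y, lag=0):
    """Pearson correlation between x_t and y_{t-lag} over the overlapping pairs."""
    a, b = x[lag:], y[:len(y) - lag]
    ma, mb = mean(a), mean(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    return cov / math.sqrt(sum((p - ma) ** 2 for p in a) * sum((q - mb) ** 2 for q in b))


def compare(historical, generated):
    """Print side-by-side statistics for a historical and a generated series."""
    for name, f in (("mean", mean), ("variance", variance), ("skewness", skewness)):
        print(f"{name:>9}: historical {f(historical):8.3f}  generated {f(generated):8.3f}")
    h1, g1 = lag_corr(historical, historical, 1), lag_corr(generated, generated, 1)
    print(f"lag-1 r  : historical {h1:8.3f}  generated {g1:8.3f}")
```

For the cross-site check, `lag_corr(site_a, site_b, 0)` and `lag_corr(site_a, site_b, 1)` give the lag-zero and lag-one correlations between two stations.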


Comparing Highway Traffic Noise Emission Levels Using Individual UofL State-Specific Data - Based on Open Space - (루이빌대 개별 State-specific 데이터를 이용한 도로 교통소음 수준 비교 - 오픈공간에서 -)

  • Teak K.; Roswell A. Harris; Louis F. Cohn
    • Transactions of the Korean Society for Noise and Vibration Engineering / v.14 no.4 / pp.276-286 / 2004
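  • The U.S. Federal Highway Administration currently provides prediction models for highway traffic noise analysis (TNM and STAMINA) for use throughout the United States, and many related studies have been carried out; studies comparing model predictions with field measurements have shown that differences exist. This study therefore took the University of Louisville (UofL) regression models, which can serve as core input data for noise prediction models, classified them by vehicle type (small, medium, and large) and by state (Arizona, Colorado, Georgia, Kansas, and Washington), compared and analyzed their differences statistically, and drew conclusions. The results showed that, except for Arizona and Colorado (medium and large vehicles), the individual state-specific datasets differed statistically from one another.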

The 3-hour-interval prediction of ground-level temperature using Dynamic linear models in Seoul area (동적선형모형을 이용한 서울지역 3시간 간격 기온예보)

  • 손건태; 김성덕
    • The Korean Journal of Applied Statistics / v.15 no.2 / pp.213-222 / 2002
  • Ground-level temperature in the Seoul area is predicted at 3-hour intervals up to +45 hours using dynamic linear models (DLM). Numerical model outputs and observations were used as input values for the DLM. A comparison of DLM forecasts with RDAPS forecasts in terms of RMSE shows that the DLM improves prediction accuracy and removes the systematic error of the numerical model outputs.
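The abstract does not give the DLM specification. As a hypothetical minimal example, the sketch below treats the numerical-model forecast error at a fixed lead time as a slowly drifting bias b_t (a local-level DLM: y_t = nwp_t + b_t + v_t, b_t = b_{t-1} + w_t) and updates it with the Kalman filter, so each corrected forecast uses only past observations. The variances `v` and `w`, the prior, and the synthetic data are assumed values; RMSE is computed as in the abstract's comparison.

```python
import math
import random


def dlm_bias_forecasts(nwp, obs, v=0.25, w=0.01, m0=0.0, c0=10.0):
    """Corrected forecasts nwp_t + E[b_t | y_1..y_{t-1}] from a local-level DLM on the bias.

    Observation: y_t = nwp_t + b_t + v_t,  v_t ~ N(0, v)
    State:       b_t = b_{t-1} + w_t,      w_t ~ N(0, w)
    """
    m, c = m0, c0
    corrected = []
    for f, y in zip(nwp, obs):
        r = c + w               # prior variance of b_t
        yhat = f + m            # forecast issued before y_t is observed
        corrected.append(yhat)
        q = r + v               # one-step forecast variance
        k = r / q               # Kalman gain
        m = m + k * (y - yhat)  # posterior mean of b_t
        c = r - k * r           # posterior variance of b_t
    return corrected


def rmse(pred, obs):
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


# Synthetic example: the "numerical model" runs 2 degrees too cold at every step.
rng = random.Random(3)
nwp = [15.0 + 5.0 * math.sin(2 * math.pi * t / 8) for t in range(200)]
obs = [f + 2.0 + rng.gauss(0.0, 0.5) for f in nwp]
corrected = dlm_bias_forecasts(nwp, obs)
print(rmse(nwp, obs), rmse(corrected, obs))
```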

The maximum likelihood estimation and testing of gene frequencies of generalized ABO-like blood group systems (일반화된 ABO-식 혈액형의 유전자 빈도에 대한 최우추정 및 검정)

  • 이준영; 신한풍
    • The Korean Journal of Applied Statistics / v.2 no.1 / pp.35-47 / 1989
  • This article deals with maximum likelihood (ML) estimation, among the methods for estimating the m gene frequencies in generalized ABO-like blood group systems, and with statistical tests for differences in gene frequencies based on these estimators. In particular, the homogeneity testing problem is generalized, which makes it possible to test the homogeneity of the m gene frequencies. Finally, in an example, the ML estimator is compared with the estimators given by the Bernstein method, the adjusted Bernstein method, and the modified Bernstein method, and the tests described above are carried out using orthogonal partitioning.
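The abstract does not spell out the generalized system, so the sketch below assumes one common generalization: k codominant alleles A_1..A_k plus one recessive allele O (k = 2 gives the ordinary ABO system). The ML estimates are computed by gene counting, which is the EM algorithm for this problem: each A_i phenotype is split into A_iA_i and A_iO in proportion p_i : 2r, and allele counts are then divided by 2N. The Bernstein-type estimators and the homogeneity tests are not shown, and the function name is hypothetical.

```python
def abo_like_ml(single, pairs, n_o, tol=1e-12, max_iter=10_000):
    """Gene-counting (EM) ML estimates for k codominant alleles A_1..A_k plus recessive O.

    single[i]    : count of phenotype A_i   (genotype A_iA_i or A_iO)
    pairs[(i, j)]: count of phenotype A_iA_j (i < j)
    n_o          : count of phenotype O     (genotype OO)
    Returns (p, r): allele frequencies of A_1..A_k and of O.
    """
    k = len(single)
    n = sum(single) + sum(pairs.values()) + n_o
    p, r = [1.0 / (k + 1)] * k, 1.0 / (k + 1)
    for _ in range(max_iter):
        a_counts, o_count = [0.0] * k, 2.0 * n_o
        for i, n_i in enumerate(single):
            homo = p[i] / (p[i] + 2.0 * r)     # E-step: P(A_iA_i | phenotype A_i)
            a_counts[i] += n_i * (1.0 + homo)  # 2 copies if A_iA_i, 1 if A_iO
            o_count += n_i * (1.0 - homo)
        for (i, j), n_ij in pairs.items():
            a_counts[i] += n_ij
            a_counts[j] += n_ij
        new_p = [c / (2.0 * n) for c in a_counts]  # M-step: allele counts / 2N
        new_r = o_count / (2.0 * n)
        delta = max(abs(u - v) for u, v in zip(new_p + [new_r], p + [r]))
        p, r = new_p, new_r
        if delta < tol:
            break
    return p, r


# Standard ABO (k = 2): counts equal to the expected counts for
# p_A = 0.3, p_B = 0.1, r = 0.6 with N = 10000 (A, B, AB, O = 4500, 1300, 600, 3600).
p, r = abo_like_ml([4500, 1300], {(0, 1): 600}, 3600)
print(p, r)
```

Because the counts are exactly the expected ones, the iteration should return frequencies very close to 0.3, 0.1, and 0.6.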

A Comparative Analysis of Bagging and Boosting Algorithms in Data Mining (데이터 마이닝에서 배깅과 부스팅 알고리즘 비교 분석)

  • Lee, Yeong-Seop; O, Hyeon-Jeong
    • Proceedings of the Korean Statistical Society Conference / 2003.05a / pp.97-102 / 2003
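  • Among data mining techniques, various ensemble methods have been studied to reduce model variability and build highly accurate classifiers; of these, bagging and boosting are the best known. We applied both methods to a variety of datasets and compared their misclassification rates. We then built a decision tree whose input variables were the characteristics of each dataset and whose target variable was whichever of bagging and boosting had the lower misclassification rate. Analyzing which patterns of data characteristics favor each algorithm showed that boosting suits data with large numbers of observations, input variables, and target variables, whereas bagging suits data in which these numbers are small.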
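To make the comparison concrete, the sketch below implements both ensembles on decision stumps in stdlib Python: bagging as a majority vote of stumps fit to bootstrap samples, and discrete AdaBoost as the boosting variant, and it reports misclassification rates. The base learner, the number of bags and rounds, and the toy dataset are assumptions; the paper's datasets, its base classifiers, and the meta-level decision tree over dataset characteristics are not reproduced.

```python
import math
import random


def fit_stump(X, y, w):
    """Weighted decision stump; returns ((feature, threshold, polarity), weighted error)."""
    best, best_err = None, float("inf")
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        cands = [values[0] - 1.0] + [(a + b) / 2 for a, b in zip(values, values[1:])]
        for t in cands:
            for s in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if (s if row[j] > t else -s) != yi)
                if err < best_err:
                    best, best_err = (j, t, s), err
    return best, best_err


def stump_predict(stump, row):
    j, t, s = stump
    return s if row[j] > t else -s


def adaboost(X, y, rounds=20):
    """Discrete AdaBoost with labels in {-1, +1}."""
    n = len(y)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        stump, err = fit_stump(X, y, w)
        if err >= 0.5:
            break
        alpha = 0.5 * math.log((1 - err) / max(err, 1e-10))
        model.append((alpha, stump))
        if err == 0:
            break  # a perfect stump: further rounds would not change the vote
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for wi, yi, row in zip(w, y, X)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def adaboost_predict(model, row):
    return 1 if sum(a * stump_predict(s, row) for a, s in model) >= 0 else -1


def bagging(X, y, n_bags=51, seed=0):
    """Stumps fit to bootstrap samples (odd n_bags avoids tied votes)."""
    rng = random.Random(seed)
    n = len(y)
    stumps = []
    for _ in range(n_bags):
        idx = [rng.randrange(n) for _ in range(n)]
        stump, _ = fit_stump([X[i] for i in idx], [y[i] for i in idx], [1.0 / n] * n)
        stumps.append(stump)
    return stumps


def bagging_predict(stumps, row):
    return 1 if sum(stump_predict(s, row) for s in stumps) >= 0 else -1


def error_rate(predict, model, X, y):
    return sum(predict(model, row) != yi for row, yi in zip(X, y)) / len(y)


# Toy one-feature data: class -1 for x < 10, class +1 for x >= 10.
X = [[float(i)] for i in range(20)]
y = [-1] * 10 + [1] * 10
print(error_rate(adaboost_predict, adaboost(X, y), X, y),
      error_rate(bagging_predict, bagging(X, y), X, y))
```

Applying the same `error_rate` comparison across many datasets would yield the "which method wins" labels that the study then modeled with a decision tree.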
