• Title/Summary/Keyword: statistics techniques

A survey on unsupervised subspace outlier detection methods for high dimensional data (고차원 자료의 비지도 부분공간 이상치 탐지기법에 대한 요약 연구)

  • Ahn, Jaehyeong;Kwon, Sunghoon
    • The Korean Journal of Applied Statistics / v.34 no.3 / pp.507-521 / 2021
  • Detecting outliers in high-dimensional data poses the challenging problem of screening the variables, since relevant information is often contained in only a few of them. When many irrelevant variables are included in the data, the distances between all observations tend to become similar, which makes the degree of outlierness of all observations alike. Subspace outlier detection overcomes this problem by measuring the degree of outlierness of each observation on relevant subsets of the variables. In this paper, we survey recent subspace outlier detection techniques, classifying them into three major types according to the subspace selection method. We summarize the techniques of each type in terms of how they select the relevant subspaces and how they measure the degree of outlierness. In addition, we introduce some computing tools for implementing subspace outlier detection techniques and present results from a simulation study and a real data analysis.
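
The masking effect described in this abstract can be illustrated with a small simulation. The following is a minimal sketch, not one of the surveyed methods: it assumes a plain k-nearest-neighbour distance as the outlier score, a single planted outlier, and standard normal noise variables. The names `knn_scores` and `contrast` are hypothetical helpers introduced only for this illustration.

```python
import numpy as np

def knn_scores(x, k=5):
    """Outlier score of each row: Euclidean distance to its k-th nearest neighbour."""
    sq = np.sum(x**2, axis=1)
    # Squared pairwise distances via the Gram matrix (avoids an n x n x p array).
    d2 = sq[:, None] + sq[None, :] - 2.0 * (x @ x.T)
    d = np.sqrt(np.maximum(d2, 0.0))
    np.fill_diagonal(d, 0.0)
    # After sorting, column 0 is the zero self-distance, so column k is the k-th neighbour.
    return np.sort(d, axis=1)[:, k]

def contrast(scores):
    """Score of the planted outlier (row 0) relative to the median score."""
    return scores[0] / np.median(scores)

rng = np.random.default_rng(0)
n, n_noise = 200, 1000
relevant = rng.standard_normal((n, 2))
relevant[0] = [6.0, 6.0]                    # outlier visible only in the 2 relevant variables
noise = rng.standard_normal((n, n_noise))   # irrelevant variables (assumed N(0, 1))
full = np.hstack([relevant, noise])

contrast_subspace = contrast(knn_scores(relevant))
contrast_full = contrast(knn_scores(full))
print(f"relevant subspace: {contrast_subspace:.2f}, full space: {contrast_full:.2f}")
```

In the relevant two-variable subspace the planted point scores far above the median, while in the full space its score is close to the median, which is the situation a subspace method is meant to avoid.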

A STUDY ON PROCESS CAPABILITY INDICES FOR NON-NORMAL DATA

  • Kwon Seungsoo;Park Sung H.;Xu Jichao
    • Proceedings of the Korean Society for Quality Management Conference / 1998.11a / pp.159-173 / 1998
  • Process capability indices (PCIs) often require the quality characteristic to be normally distributed. However, if a characteristic is not normally distributed, serious errors can result from normal-based techniques. In this case, we may consider using new PCIs specially designed to be robust to non-normality. In this paper, a newly proposed measure of process capability is introduced and compared with existing PCIs using simulated non-normal data.
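
To make the normality issue concrete, the sketch below compares the standard normal-based indices $C_p$ and $C_{pk}$ with a percentile-based variant on skewed data. It is only an illustration: the percentile-based index (empirical 0.135%, 50% and 99.865% quantiles in place of $\mu \pm 3\sigma$) is a common robust alternative and is not the measure proposed in the paper. The specification limits and the exponential data are assumptions.

```python
import numpy as np

def cp_cpk_normal(x, lsl, usl):
    """Classical Cp and Cpk, which implicitly assume a normal characteristic."""
    mu, sigma = np.mean(x), np.std(x, ddof=1)
    cp = (usl - lsl) / (6.0 * sigma)
    cpk = min(usl - mu, mu - lsl) / (3.0 * sigma)
    return cp, cpk

def cp_cpk_percentile(x, lsl, usl):
    """Percentile-based analogues: quantiles replace mu - 3 sigma, mu, mu + 3 sigma."""
    lo, med, hi = np.quantile(x, [0.00135, 0.5, 0.99865])
    cp = (usl - lsl) / (hi - lo)
    cpk = min((usl - med) / (hi - med), (med - lsl) / (med - lo))
    return cp, cpk

rng = np.random.default_rng(1)
skewed = rng.exponential(scale=1.0, size=200_000)   # assumed non-normal process data
lsl, usl = 0.0, 6.6                                  # assumed specification limits

cp_n, cpk_n = cp_cpk_normal(skewed, lsl, usl)
cp_p, cpk_p = cp_cpk_percentile(skewed, lsl, usl)
print(f"normal-based:     Cp={cp_n:.2f}, Cpk={cpk_n:.2f}")
print(f"percentile-based: Cp={cp_p:.2f}, Cpk={cpk_p:.2f}")
```

For these exponential data the normal-based $C_{pk}$ is much lower than the percentile-based value, even though only a small fraction of the distribution lies beyond the upper limit.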

Nonparametric Estimation of Distribution Function using Bezier Curve

  • Bae, Whasoo;Kim, Ryeongah;Kim, Choongrak
    • Communications for Statistical Applications and Methods / v.21 no.1 / pp.105-114 / 2014
  • In this paper, we suggest an efficient method for estimating the distribution function using the Bezier curve and compare it with existing methods through simulation studies. In addition, we suggest a robust version of the cross-validation criterion for choosing the number of Bezier points and show, based on simulation studies, that the proposed method outperforms the existing methods.

Goodness-of-Fit Tests in Regression via Nonparametric Function Techniques

  • Kim, Jong-Tae;Moon, Gyoung-Ae
    • Journal of the Korean Data and Information Science Society / v.5 no.2 / pp.95-106 / 1994
  • A proposed test statistic is obtained by multiplying the Neumann smooth type statistic discussed by Eubank and Hart (1993) by constant weights in order to observe the effect of the weights. The test performs very well in power studies. Another advantage of this test is that it simultaneously provides an important diagnostic tool that can be used in many cases to determine how the model should be adjusted.

Estimation of the Number of Change-Points with Local Linear Fit

  • Kim, Jong-Tae;Choi, Hey-Mi
    • Journal of the Korean Data and Information Science Society / v.13 no.2 / pp.251-260 / 2002
  • The aim of this paper is to detect the location, the jump size, and the number of change-points in regression functions by using the local linear fit, which is one of the nonparametric regression techniques. The asymptotic properties of the change points and the jump sizes are obtained, along with the corresponding rates of convergence for the change-point estimators.

Sub-gaussian Techniques in Obtaining Laws of Large Numbers in $L^1(R)$

  • Lee, Sung-Ho;Lee, Robert -Taylor
    • Journal of the Korean Statistical Society / v.23 no.1 / pp.39-51 / 1994
  • Some exponential moment inequalities for sub-gaussian random variables are studied in this paper. These inequalities are used to obtain laws of large numbers for random variables and random elements in $L^1(R)$.
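
For orientation, the standard real-valued version of this argument can be written out as follows. This is a textbook sketch and not necessarily the form of the inequalities in the paper; the symbols $\alpha$, $\alpha_i$, $S_n$ and $\varepsilon$ are notation assumed here. A random variable $X$ is sub-gaussian with parameter $\alpha \ge 0$ if

$$E\, e^{tX} \le \exp\!\left(\frac{\alpha^2 t^2}{2}\right) \quad \text{for all } t \in \mathbb{R}.$$

Markov's inequality applied to $e^{tX}$, minimized over $t > 0$ (the minimum is at $t = \varepsilon/\alpha^2$), gives for every $\varepsilon > 0$

$$P(X \ge \varepsilon) \le \inf_{t > 0} \exp\!\left(-t\varepsilon + \frac{\alpha^2 t^2}{2}\right) = \exp\!\left(-\frac{\varepsilon^2}{2\alpha^2}\right).$$

If $X_1, \dots, X_n$ are independent and sub-gaussian with parameters $\alpha_1, \dots, \alpha_n$, then $S_n = \sum_{i=1}^n X_i$ is sub-gaussian with parameter $\left(\sum_{i=1}^n \alpha_i^2\right)^{1/2}$, so

$$P\!\left(\left|\frac{S_n}{n}\right| \ge \varepsilon\right) \le 2\exp\!\left(-\frac{n^2 \varepsilon^2}{2\sum_{i=1}^n \alpha_i^2}\right).$$

When $\alpha_i \le \alpha$ for all $i$, the right-hand side is at most $2\exp\!\left(-n\varepsilon^2/(2\alpha^2)\right)$, which is summable in $n$, and the Borel-Cantelli lemma then yields $S_n/n \to 0$ almost surely.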

Test for the Exponential Distribution Based on Multiply Type-II Censored Samples

  • Kang, Suk-Bok;Lee, Sang-Ki
    • Communications for Statistical Applications and Methods / v.13 no.3 / pp.537-550 / 2006
  • In this paper, we develop three modified empirical distribution function type tests (the modified Cramer-von Mises test, the modified Anderson-Darling test, and the modified Kolmogorov-Smirnov test) for the two-parameter exponential distribution with unknown parameters based on multiply Type-II censored samples. For each test, Monte Carlo techniques are used to generate the critical values. The powers of these tests are also investigated under several alternative distributions.
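
The Monte Carlo step can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure of the paper: it uses a complete (uncensored) sample, the ordinary Kolmogorov-Smirnov statistic, and the maximum likelihood estimates of location and scale, whereas the paper works with multiply Type-II censored samples and modified statistics. Because these estimators are location-scale equivariant, the null distribution of the statistic does not depend on the true parameters, so simulating from the standard exponential is enough. The function names are hypothetical.

```python
import numpy as np

def ks_exponential(x):
    """KS statistic for a two-parameter exponential with both parameters estimated."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    loc = x[0]                              # MLE of the location parameter
    scale = np.mean(x) - loc                # MLE of the scale parameter
    u = 1.0 - np.exp(-(x - loc) / scale)    # fitted CDF at the order statistics
    i = np.arange(1, n + 1)
    return max(np.max(i / n - u), np.max(u - (i - 1) / n))

def mc_critical_value(n, alpha=0.05, n_rep=5000, seed=0):
    """Upper-alpha critical value of the statistic, simulated under the null."""
    rng = np.random.default_rng(seed)
    stats = np.array([ks_exponential(rng.exponential(size=n)) for _ in range(n_rep)])
    return np.quantile(stats, 1.0 - alpha)

crit_20 = mc_critical_value(n=20, n_rep=2000)
print(f"simulated 5% critical value for n=20: {crit_20:.3f}")
```

A sample of size 20 would then be rejected at the 5% level when `ks_exponential(sample)` exceeds `crit_20`. The same loop applies to the Cramer-von Mises and Anderson-Darling statistics once the statistic function is swapped.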

A Comparison of Capabilities of Data Mining Tools

  • Choi, Youn-Seok;Kim, Jong-Geoun;Lee, Jong-Hee
    • Communications for Statistical Applications and Methods / v.8 no.2 / pp.531-541 / 2001
  • In this study, we objectively compare the capabilities of the latest versions of data mining tools and provide useful information to help enterprises and universities choose among them. In particular, we compare SAS/Enterprise Miner 3.0, SPSS/Clementine 5.2, and IBM/Intelligent Miner 6.1, which are well known and readily available.

Testing Goodness-of-Fit for No Effect Models

  • Sungho Lee;Jongtae Kim;GyoungAe Moon
    • Communications for Statistical Applications and Methods / v.5 no.3 / pp.935-944 / 1998
  • This paper investigates the problem of goodness-of-fit tests for the no-effect model. The proposed test statistic $Z_{mn}$ is obtained by applying a constant multiplier to model-free curve estimation techniques. The small- and large-sample properties of $Z_{mn}$ are investigated, and good results from power studies of the proposed test are illustrated.
