• Title/Abstract/Keywords: Skewed data

Search results: 206 items, processing time 0.027 seconds

Nonparametric two sample tests for scale parameters of multivariate distributions

  • Chavan, Atul R;Shirke, Digambar T
    • Communications for Statistical Applications and Methods / Vol. 27, No. 4 / pp.397-412 / 2020
  • In this paper, a notion of data depth is used to propose nonparametric multivariate two-sample tests for a difference between scale parameters. Data depth can be used to measure the centrality or outlyingness of a multivariate data point relative to a data cloud. A difference in the scale parameters shows up as a difference in the depth values of a multivariate data point. By observing this fact on a depth vs depth plot (DD-plot), we propose nonparametric multivariate two-sample tests for scale parameters of multivariate distributions. The p-values of the proposed tests are obtained by using Fisher's permutation approach. The power performance of the proposed tests is reported for a few symmetric and skewed multivariate distributions and compared with that of existing tests. An illustration with real-life data is also provided.
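The abstract does not give the test statistic, so the following is only a minimal sketch of the general idea (depth values plus a permutation p-value). It assumes 2-D data, uses Mahalanobis depth as one possible depth function, and uses a hypothetical statistic `depth_gap` (the mean difference between depths taken relative to each sample). None of these names or choices come from the paper.

```python
import random

def mahalanobis_depth(sample):
    """Return a function z -> Mahalanobis depth of z relative to a 2-D sample."""
    n = len(sample)
    mx = sum(p[0] for p in sample) / n
    my = sum(p[1] for p in sample) / n
    sxx = sum((p[0] - mx) ** 2 for p in sample) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in sample) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in sample) / (n - 1)
    det = sxx * syy - sxy * sxy  # assumed nonzero (non-degenerate sample)

    def depth(z):
        dx, dy = z[0] - mx, z[1] - my
        # Squared Mahalanobis distance using the closed-form 2x2 inverse.
        d2 = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
        return 1.0 / (1.0 + d2)

    return depth

def depth_gap(x, y):
    """Hypothetical statistic: mean over pooled points of D_x(z) - D_y(z)."""
    depth_x, depth_y = mahalanobis_depth(x), mahalanobis_depth(y)
    pooled = list(x) + list(y)
    return sum(depth_x(z) - depth_y(z) for z in pooled) / len(pooled)

def permutation_p_value(x, y, n_perm=200, seed=0):
    """Two-sided permutation p-value for the hypothetical depth_gap statistic."""
    rng = random.Random(seed)
    observed = abs(depth_gap(x, y))
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign group labels at random
        if abs(depth_gap(pooled[:len(x)], pooled[len(x):])) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

When the two samples differ in scale, points far from the center are shallow relative to the tighter sample but not relative to the wider one, so the statistic moves away from zero.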

Comprehensive comparison of normality tests: Empirical study using many different types of data

  • Lee, Chanmi;Park, Suhwi;Jeong, Jaesik
    • Journal of the Korean Data and Information Science Society / Vol. 27, No. 5 / pp.1399-1412 / 2016
  • We compare many normality tests that draw on different sources of information extracted from the given data: Anderson-Darling test, Kolmogorov-Smirnov test, Cramér-von Mises test, Shapiro-Wilk test, Shapiro-Francia test, Lilliefors test, Jarque-Bera test, D'Agostino's D, Doornik-Hansen test, Energy test and Martinez-Iglewicz test. For the purpose of comparison, those tests are applied to various types of data generated from skewed distributions, asymmetric distributions, and distributions with different lengths of support. We then summarize the comparison results in terms of two things: type I error control and power. The selection of the best test depends on the shape of the distribution of the data, implying that there is no test which is the most powerful for all distributions.
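As an illustration of one of the listed tests, here is a minimal sketch of the Jarque-Bera statistic, which combines sample skewness and kurtosis. The p-value uses the asymptotic chi-square distribution with 2 degrees of freedom, whose survival function is exp(-x/2); that approximation can be poor for small samples. The function name is an assumption of this sketch, not taken from the paper.

```python
import math

def jarque_bera(data):
    """Return (JB statistic, asymptotic p-value) for a list of numbers."""
    n = len(data)
    mean = sum(data) / n
    # Central moments with the 1/n convention used by the JB statistic.
    m2 = sum((v - mean) ** 2 for v in data) / n
    m3 = sum((v - mean) ** 3 for v in data) / n
    m4 = sum((v - mean) ** 4 for v in data) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    p_value = math.exp(-jb / 2.0)  # chi-square(2) survival function
    return jb, p_value
```

A strongly skewed sample, such as draws from an exponential distribution, gives a large statistic and a p-value near zero.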

Spatial Prediction Based on the Bayesian Kriging with Box-Cox Transformation

  • Choi, Jung-Soon;Park, Man-Sik
    • Communications for Statistical Applications and Methods / Vol. 16, No. 5 / pp.851-858 / 2009
  • In recent decades, there has been much interest in climate variability because changes in it have dramatic effects on humanity. Precipitation data in particular are measured over space, and their spatial association is complicated, so such a spatial dependency structure should be taken into account when analyzing the data. However, the data sets show severely skewed distributions, which is a problem for linear models. In this paper, we apply the Box-Cox transformation so that the data approximately satisfy normality prior to the analysis, and employ a Bayesian hierarchical framework to investigate the spatial patterns. The data set considered is the monthly average precipitation for the third quarter of 2007, obtained from 347 automated monitoring stations in contiguous South Korea.
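For reference, the Box-Cox transformation mentioned in the abstract can be sketched as below. This covers only the standard one-parameter form for positive data; how the paper chooses the parameter and embeds it in the Bayesian hierarchy is not stated in the abstract. The function names are assumptions of this sketch.

```python
import math

def box_cox(x, lam):
    """Box-Cox transform of a positive value x with parameter lam."""
    if x <= 0:
        raise ValueError("Box-Cox requires strictly positive data")
    if lam == 0:
        return math.log(x)  # limiting case as lam -> 0
    return (x ** lam - 1.0) / lam

def inverse_box_cox(y, lam):
    """Map a transformed value back to the original scale."""
    if lam == 0:
        return math.exp(y)
    return (lam * y + 1.0) ** (1.0 / lam)
```

With `lam = 1` the data are only shifted by one, and with `lam = 0` the transform is the natural logarithm, a common choice for right-skewed quantities such as precipitation.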

ROBUST MEASURES OF LOCATION IN WATER-QUALITY DATA

  • Kim, Kyung-Sub;Kim, Bom-Chul;Kim, Jin-Hong
    • Water Engineering Research / Vol. 3, No. 3 / pp.195-202 / 2002
  • The mean is generally used as a point estimator in water-quality data. Unfortunately, the nonnormal and skewed distributions of the data hinder the direct application of the mean, which is an inappropriate statistic in this case. The use of robust statistics such as L-, M-, and R-estimators is recommended, as they are more efficient. The median (L-estimator), the biweight (M-estimator), and the Hodges-Lehmann method (R-estimator) are briefly introduced and applied in this paper. The actual data analyses show that the median does not guarantee robustness for a small number of data sets, and that robust measures of location, or the arithmetic mean computed without outliers, are highly recommended if the distribution has tails or outliers. Care must be taken in measuring the location because the water quality level within a water body can change depending on the selected point estimator.

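The three estimators named in the abstract can be sketched as follows. This is a minimal illustration and not the paper's procedure: the biweight uses a fixed MAD scale and a tuning constant `c = 6.0`, both of which are assumptions of this sketch, and the sample values are hypothetical.

```python
from statistics import median

def hodges_lehmann(data):
    """Median of all pairwise (Walsh) averages, including each point with itself."""
    n = len(data)
    walsh = [(data[i] + data[j]) / 2.0 for i in range(n) for j in range(i, n)]
    return median(walsh)

def biweight_location(data, c=6.0, tol=1e-9, max_iter=50):
    """Iteratively reweighted Tukey biweight location (MAD scale held fixed)."""
    center = median(data)
    mad = median([abs(v - center) for v in data])
    if mad == 0:
        return center  # no spread around the median; nothing to reweight
    for _ in range(max_iter):
        weights = []
        for v in data:
            u = (v - center) / (c * mad)
            weights.append((1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0)
        total = sum(weights)
        if total == 0:
            break
        new_center = sum(w * v for w, v in zip(weights, data)) / total
        if abs(new_center - center) < tol:
            return new_center
        center = new_center
    return center

# Hypothetical sample with one gross outlier.
sample = [1.0, 2.0, 3.0, 4.0, 100.0]
```

For this sample the arithmetic mean is 22, while the median and the Hodges-Lehmann estimate are both 3 and the biweight settles near 2.5, because the outlier receives zero weight.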

Black Hispanic and Black Non-Hispanic Breast Cancer Survival Data Analysis with Half-normal Model Application

  • Khan, Hafiz Mohammad Rafiqullah;Saxena, Anshul;Vera, Veronica;Abdool-Ghany, Faheema;Gabbidon, Kemesha;Perea, Nancy;Stewart, Tiffanie Shauna-Jeanne;Ramamoorthy, Venkataraghavan
    • Asian Pacific Journal of Cancer Prevention / Vol. 15, No. 21 / pp.9453-9458 / 2014
  • Background: Breast cancer is the second leading cause of cancer death for women in the United States. Differences in survival of breast cancer have been noted among racial and ethnic groups, but the reasons for these disparities remain unclear. This study presents the characteristics and the survival curve of two racial and ethnic groups and evaluates the effects of race on survival times by measuring the lifetime data-based half-normal model. Materials and Methods: The distributions among racial and ethnic groups are compared using female breast cancer patients from nine states in the country, all taken from the National Cancer Institute's Surveillance, Epidemiology, and End Results cancer registry. The main end points observed are: age at diagnosis, survival time in months, and marital status. The right-skewed half-normal statistical probability model is used to show the differences in the survival times between black Hispanic (BH) and black non-Hispanic (BNH) female breast cancer patients. The Kaplan-Meier estimator and the Cox proportional hazard ratio are used to estimate and compare the relative risk of death in two minority groups, BH and BNH. Results: A probability random sample method was used to select representative samples from BNH and BH female breast cancer patients, who were diagnosed during the years of 1973-2009 in the United States. The sample contained 1,000 BNH and 298 BH female breast cancer patients. The median age at diagnosis was 57.75 years among BNH and 54.11 years among BH. The results of the half-normal model showed that the survival times formed positively skewed models with higher variability in BNH compared with BH. The Kaplan-Meier estimate was used to plot the survival curves for cancer patients; this test was positively skewed. The Kaplan-Meier and Cox proportional hazard ratio for survival analysis showed that BNH had a significantly longer survival time as compared to BH, which is consistent with the results of the half-normal model. Conclusions: The findings with the proposed model strategy will assist in the healthcare field to measure future outcomes for BH and BNH, given their past history and conditions. These findings may provide an enhanced and improved outlook for the diagnosis and treatment of breast cancer patients in the United States.
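The Kaplan-Meier estimate referred to in the abstract can be sketched in a few lines. This is a generic product-limit estimator, not the study's analysis; the function name and the input convention (1 for an observed death, 0 for a censored observation) are assumptions of this sketch.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one event.

    times  -- follow-up times (e.g. months)
    events -- 1 if the event was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects sharing the same time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # deaths and censored both leave the risk set
    return curve
```

Censored subjects contribute to the risk set up to their censoring time but never produce a step in the curve.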

Validation Comparison of Credit Rating Models Using Box-Cox Transformation

  • Hong, Chong-Sun;Choi, Jeong-Min
    • Journal of the Korean Data and Information Science Society / Vol. 19, No. 3 / pp.789-800 / 2008
  • Current credit evaluation models based on financial data make use of smoothed estimated default ratios which are transformed from each financial variable. In this work, some problems of the credit evaluation models developed by financial experts are discussed, and we propose improved credit evaluation models based on the stepwise variable selection method and the Box-Cox transformation of data whose distribution is heavily skewed to the right. After comparing goodness-of-fit tests of these models, the validation of the credit evaluation models using statistical methods such as the stepwise variable selection method and the Box-Cox transformation function is explained.


An Estimation of VaR in Stock Markets Using Transformations

  • Yeo, In-Kwon;Jeong, Choo-Mi
    • Journal of the Korean Data and Information Science Society / Vol. 16, No. 3 / pp.567-580 / 2005
  • It is usually assumed that asset returns in the stock market are normally distributed. However, analyses of real data show that the distribution tends to be skewed and to have heavier tails than those of the normal distribution. In this paper, we investigate a method of estimating the value at risk (VaR) of stock returns. The VaR is computed by using the transformation and back-transformation method. The analysis of KOSPI and KOSDAQ data shows that the proposed estimation outperforms estimation under the normality assumption.

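The transformation and back-transformation idea can be sketched as follows: transform the returns, fit a normal distribution on the transformed scale, take the lower quantile there, and map it back. The abstract does not name the transformation, so the Yeo-Johnson family used here is an assumption of this sketch, as are the function names and the fixed parameter `lam` (in practice it would be estimated from the data).

```python
import math
from statistics import NormalDist, mean, stdev

def yeo_johnson(y, lam):
    """Yeo-Johnson power transform; defined for negative and positive y."""
    if y >= 0:
        return math.log1p(y) if lam == 0 else ((y + 1.0) ** lam - 1.0) / lam
    if lam == 2:
        return -math.log1p(-y)
    return -((1.0 - y) ** (2.0 - lam) - 1.0) / (2.0 - lam)

def inverse_yeo_johnson(t, lam):
    """Inverse transform; assumes t lies in the range of yeo_johnson."""
    if t >= 0:
        return math.expm1(t) if lam == 0 else (lam * t + 1.0) ** (1.0 / lam) - 1.0
    if lam == 2:
        return 1.0 - math.exp(-t)
    return 1.0 - (1.0 - (2.0 - lam) * t) ** (1.0 / (2.0 - lam))

def transformed_var(returns, lam, level=0.01):
    """VaR at the given tail level, reported as a positive loss."""
    z = [yeo_johnson(r, lam) for r in returns]
    quantile = NormalDist(mean(z), stdev(z)).inv_cdf(level)
    return -inverse_yeo_johnson(quantile, lam)
```

With `lam = 1` the transform is the identity, so the result reduces to the usual VaR under the normality assumption; other values of `lam` let the back-transformed quantile reflect skewness in the returns.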

EWMA Chart Application Using the Transformation of the Exponential with Individual Observations

  • 지선수
    • 산업경영시스템학회지 / Vol. 22, No. 52 / pp.337-345 / 1999
  • The long-tailed, positively skewed exponential distribution can be made almost symmetric by raising the data to a power. For such data, using traditional Shewhart control limits on an individuals chart is impractical and inconvenient. The transformed data, which are approximately bell-shaped, can be plotted conveniently on the individuals chart and on the exponentially weighted moving average (EWMA) chart. In this paper, we give a method for constructing control charts using modified statistics based on the power-transformed data. A method for selecting the exponent for the individuals chart is evaluated. Considering that smaller weights are assigned to older data as time passes, we also suggest properties of, and methods for choosing, the exponent ($\theta$) and the weighting factor ($\alpha$). Based on simulation results, we recommend the proposed approach as a practical method for the EWMA chart.

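A minimal sketch of the two ingredients described in the abstract, a power transformation with exponent `theta` and an EWMA recursion with weighting factor `alpha`, is given below. The function names and the starting value `start` are assumptions of this sketch, and the paper's own rules for selecting the exponent and the weight are not reproduced.

```python
def power_transform(values, theta):
    """Raise each (nonnegative) observation to the power theta."""
    return [v ** theta for v in values]

def ewma(values, alpha, start):
    """EWMA recursion z_t = alpha * x_t + (1 - alpha) * z_{t-1}, with z_0 = start."""
    z = start
    smoothed = []
    for v in values:
        z = alpha * v + (1.0 - alpha) * z
        smoothed.append(z)
    return smoothed

def sample_skewness(values):
    """Moment-based skewness, used here only to check the effect of the transform."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5
```

A typical use would be `ewma(power_transform(data, theta), alpha, start)`, with `start` set to a target or an in-control mean on the transformed scale. An exponent near 1/3.6 is a commonly cited choice for making exponential data roughly symmetric, but that value is quoted here as an assumption and is not taken from this abstract.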

A spatial heterogeneity mixed model with skew-elliptical distributions

  • Farzammehr, Mohadeseh Alsadat;McLachlan, Geoffrey J.
    • Communications for Statistical Applications and Methods / Vol. 29, No. 3 / pp.373-391 / 2022
  • The distribution of observations in most econometric studies with spatial heterogeneity is skewed. Usually, a single transformation of the data is used to approximate normality and to model the transformed data with a normal assumption. This assumption is, however, not always appropriate because panel data often exhibit non-normal characteristics. In this work, the normality assumption is relaxed in spatial mixed models, allowing for spatial heterogeneity. An inference procedure based on Bayesian mixed modeling is carried out with a multivariate skew-elliptical distribution, which includes the skew-t, skew-normal, Student-t, and normal distributions as special cases. The methodology is illustrated through a simulation study. Following the empirical literature, we fit our models to non-life insurance consumption observed between 1998 and 2002 across a spatial panel of 103 Italian provinces in order to identify its determinants. Analysis of the posterior distributions of some parameters and a comparison of several model comparison criteria indicate that the proposed model is superior to conventional ones.

The Development of a Camera Detection System for the Measurement of Road Traffic Data

  • 김희식;김진만
    • 한국안전학회지 / Vol. 18, No. 4 / pp.23-27 / 2003
  • To improve road transportation safety, road traffic data are monitored by applying an image detection system. Road traffic safety is analysed using image processing techniques. For more accurate measurement, matching real road coordinates to image coordinates is one of the most essential parts of the image detection technique. The road image is skewed on the input screen because the video camera is installed at the roadside. A fast and precise algorithm for the coordinate matching is developed to convert image coordinates into road coordinates.
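The abstract does not describe the coordinate-matching algorithm. One common way to map points from a skewed camera view onto a flat road plane is a planar homography, sketched below as an assumed stand-in; the function name and the matrix values in the usage note are hypothetical.

```python
def image_to_road(h, u, v):
    """Map image pixel (u, v) to road-plane coordinates with a 3x3 homography h.

    h is a list of three rows; in practice it would be estimated from at
    least four image/road point correspondences, which is not shown here.
    """
    w = h[2][0] * u + h[2][1] * v + h[2][2]
    if w == 0:
        raise ValueError("point maps to infinity under this homography")
    x = (h[0][0] * u + h[0][1] * v + h[0][2]) / w
    y = (h[1][0] * u + h[1][1] * v + h[1][2]) / w
    return x, y
```

For example, with the hypothetical matrix `[[1, 0, 0], [0, 1, 0], [0, 0.001, 1]]`, the pixel `(100, 200)` is divided by `w = 1.2`, so rows further down the image are compressed more, which is the kind of perspective skew a roadside camera introduces.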