• Title/Summary/Keyword: outlier's test

Influence in Testing the Equality of Two Covariance Matrices (두개의 공분산 행렬의 동질성 검정에서의 영향치 분석)

  • Myung Geun Kim
    • The Korean Journal of Applied Statistics
    • /
    • v.7 no.2
    • /
    • pp.213-224
    • /
    • 1994
  • A diagnostic method for detecting outliers when testing the equality of two covariance matrices is developed using the influence curve approach. The method generalizes easily to more than two covariance matrices. A sample version of the influence measure for detecting outliers is considered, based on the empirical distribution functions. Its component terms include the well-known test statistic introduced by Wilks for detecting one outlier at a time, together with its generalization to the two-group case.

A study on the Flood Frequency Analyzed in Consideration of Low Outliers. (Low Outliers를 고려한 홍수빈도분석에 관한 연구)

  • 이순혁;홍성표;박명근
    • Magazine of the Korean Society of Agricultural Engineers
    • /
    • v.30 no.4
    • /
    • pp.62-70
    • /
    • 1988
  • This study addresses the unsuitable parameter estimates and uncertain design floods that can arise when low outliers fall well below the trend of the rest of the data. A reasonable design flood was derived by adjusting for low outliers in a flood frequency analysis based on the Log Pearson Type III distribution. Three subwatersheds along the Geum River basin whose annual maximum series include low outliers were selected as study basins. The results are summarized as follows. 1. The Log Pearson Type III distribution was confirmed as a reasonable fit by the $\chi^2$ goodness-of-fit test at the Gong Ju, Gyu Am, and Ok Cheon watersheds along the Geum River basin. 2. Probable flood flows for each watershed were derived from the flood frequency curve with outliers included. 3. A weighted skew coefficient was calculated for each watershed to evaluate the frequency factor needed for the low-outlier modification. 4. At all watersheds, the adjusted frequency curve was confirmed to lie lower than the curve obtained by deleting the low outliers. 5. Final probable flood flows were derived by applying the modification with the modified basic statistics for the three watersheds. 6. Comparing the frequency curve with modification to the one with outliers, the former gives higher probable flood flows for return periods within three years, and lower ones for return periods over three years.

The Mean Reverting Behavior of Inflation in the Philippines

  • CAMBA, Abraham C. Jr.;CAMBA, Aileen L.
    • The Journal of Asian Finance, Economics and Business
    • /
    • v.8 no.10
    • /
    • pp.239-247
    • /
    • 2021
  • Central bank authorities should carefully manage inflation rate uncertainties to achieve economic growth and development not only in the short run but also in the long run. Since inflation is a key macroeconomic variable, a better understanding of its behavior is important. This paper therefore employs unit root tests with breakpoints to examine the mean-reverting behavior of the inflation rate in the Philippines using monthly data from 2002 to 2020. Empirically, the breakpoint unit root tests with innovational and additive outliers favor stationarity, that is, mean-reverting behavior, of inflation in the Philippines. The results of the standard unit root tests (ADF, PP, GLS-Dickey-Fuller, KPSS and NP) also provide strong evidence of mean-reverting processes. The mean-reverting behavior of the inflation rate indicates that monetary policy under the inflation targeting framework has succeeded in reducing chronic inflation persistence in the Philippines. This research thus supports an inflation targeting policy that aims to maintain general price level stability for the Philippine economy's long-term growth and development prospects. The findings remain important for central bankers, not only by giving them a better understanding of the behavior of the inflation rate, but also by helping them formulate and implement policy reforms related to money, credit and banking.

Estimation of Design Rainfall Using 3 Parameter Probability Distributions (3변수 확률분포에 의한 설계강우량 추정)

  • Lee, Soon Hyuk;Maeng, Sung Jin;Ryoo, Kyong Sik
    • Proceedings of the Korea Water Resources Association Conference
    • /
    • 2004.05b
    • /
    • pp.595-598
    • /
    • 2004
  • This research derives design rainfalls using L-moments, after testing the annual maximum daily rainfall data at 38 rainfall stations in Korea for homogeneity, independence, and outliers. To select an appropriate distribution for the annual maximum daily rainfall data at each station, the Generalized Extreme Value (GEV), Generalized Logistic (GLO), Generalized Pareto (GPA), Generalized Normal (GNO) and Pearson Type 3 (PT3) probability distributions were applied, and their suitability was judged using an L-moment ratio diagram and the Kolmogorov-Smirnov (K-S) test. Parameters of the appropriate distributions were estimated from the observed annual maximum daily rainfall and from rainfall simulated using Monte Carlo techniques. Design rainfalls were finally derived from the GEV distribution, which proved more appropriate than the other distributions.
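
The L-moment ratio diagram mentioned in this abstract plots sample L-skewness against sample L-kurtosis for each station. The following is a minimal sketch of how those sample quantities can be computed, assuming the standard unbiased probability-weighted-moment estimator; the abstract does not say which estimator the authors used, and the function names `sample_l_moments` and `l_moment_ratios` are hypothetical.

```python
from math import comb


def sample_l_moments(values):
    """Return the first four sample L-moments (l1, l2, l3, l4).

    Assumption: unbiased probability-weighted moments b_r are used,
    b_r = (1/n) * sum_i C(i, r) / C(n-1, r) * x_(i),
    where x_(i) is the sorted sample indexed from i = 0.
    """
    x = sorted(values)
    n = len(x)
    if n < 4:
        raise ValueError("need at least 4 observations")
    b = [
        sum(comb(i, r) * x[i] for i in range(n)) / (n * comb(n - 1, r))
        for r in range(4)
    ]
    l1 = b[0]
    l2 = 2 * b[1] - b[0]
    l3 = 6 * b[2] - 6 * b[1] + b[0]
    l4 = 20 * b[3] - 30 * b[2] + 12 * b[1] - b[0]
    return l1, l2, l3, l4


def l_moment_ratios(values):
    """Return (L-skewness, L-kurtosis), the two axes of an L-moment ratio diagram."""
    _, l2, l3, l4 = sample_l_moments(values)
    return l3 / l2, l4 / l2
```

Each station's annual maximum series would give one point on the diagram, to be compared with the theoretical curves of candidate distributions such as GEV or GLO.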

Frequency Analysis of Extreme Rainfall Using 3 Parameter Probability Distributions (3변수 확률분포형에 의한 극치강우의 빈도분석)

  • Kim, Byeong-Jun;Maeng, Sung-Jin;Ryoo, Kyong-Sik;Lee, Soon-Hyuk
    • Journal of The Korean Society of Agricultural Engineers
    • /
    • v.46 no.3
    • /
    • pp.31-42
    • /
    • 2004
  • This research derives design rainfalls using L-moments, after testing the annual maximum daily rainfall data at 38 rainfall stations in Korea for homogeneity, independence, and outliers. To select an appropriate distribution for the annual maximum daily rainfall data at each station, the Generalized Extreme Value (GEV), Generalized Logistic (GLO), Generalized Pareto (GPA), Generalized Normal (GNO) and Pearson Type 3 (PT3) probability distributions were applied, and their suitability was judged using an L-moment ratio diagram and the Kolmogorov-Smirnov (K-S) test. Parameters of the appropriate distributions were estimated from the observed annual maximum daily rainfall and from rainfall simulated using Monte Carlo techniques. Design rainfalls were finally derived from the GEV distribution, which proved more appropriate than the other distributions.

Effect of Genetic Correlations on the P Values from Randomization Test and Detection of Significant Gene Groups (유전자 연관성이 랜덤검정 P값과 유의 유전자군의 탐색에 미치는 영향)

  • Yi, Mi-Sung;Song, Hae-Hiang
    • The Korean Journal of Applied Statistics
    • /
    • v.22 no.4
    • /
    • pp.781-792
    • /
    • 2009
  • At an early stage of genomic investigations, a small sample of microarrays is used in gene expression experiments to identify small subsets of candidate genes for further, more accurate investigation. Unlike the statistical methods used for large samples of microarrays, an appropriate method for identifying such small subsets is a randomization test, which provides exact P values. These exact P values from a randomization test on a small sample of microarrays are discrete. The possible existence of differentially expressed genes in the full set of genes can be tested against the null hypothesis of a uniform distribution. Subsets with smaller P values are of prime interest for further investigation, and these outlier cells in the multinomial distribution of P values can be identified by the M test of Fuchs et al. (1980). Genome-wide gene expression levels in microarrays are correlated, yet most statistical methods in microarray analysis assume that genes are independent and ignore possibly correlated expression levels. Using simulation studies, we investigated the effect that correlated gene expression levels could have on the randomization test and M test results, and found that the effects are often not ignorable.
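
To illustrate why exact P values from a randomization test are discrete in small samples, here is a minimal sketch of a two-group exact randomization test. The choice of test statistic (absolute difference in group means) and the function name `exact_randomization_p_value` are assumptions made for illustration; the abstract does not specify the statistic the authors used.

```python
from itertools import combinations


def exact_randomization_p_value(group_a, group_b):
    """Two-sided exact P value for the difference in group means.

    Every way of relabelling the pooled observations into groups of the
    original sizes is enumerated, so the P value is a multiple of
    1 / C(n_a + n_b, n_a).
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    total = sum(pooled)

    def abs_mean_diff(sum_a):
        return abs(sum_a / n_a - (total - sum_a) / n_b)

    observed = abs_mean_diff(sum(group_a))
    as_extreme = 0
    n_splits = 0
    for idx in combinations(range(len(pooled)), n_a):
        n_splits += 1
        stat = abs_mean_diff(sum(pooled[i] for i in idx))
        # Small tolerance guards against floating-point ties.
        if stat >= observed - 1e-12:
            as_extreme += 1
    return as_extreme / n_splits
```

With three arrays per group there are only 20 possible relabellings, so the smallest attainable two-sided P value is 2/20 = 0.1, which is the discreteness the abstract refers to.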

A Heuristic Outlier Filtering Algorithm for Generating Link Travel Time using Taxi GPS Probes in Urban Arterial (링크통행시간 생성을 위한 이상치 제거 알고리즘 개발)

  • Choi, Keechoo;Choi, Yoon-Hyuk
    • KSCE Journal of Civil and Environmental Engineering Research
    • /
    • v.26 no.5D
    • /
    • pp.731-738
    • /
    • 2006
  • Facing congestion, people want traffic information about their routes, especially real-time link travel time (LTT). In this paper, a sequel to the earlier non-taxi-based LTT generation study by Choi et al. (1998), taxi-based GPS probes are used to produce LTT for urban arterials. Taxis are a good deployment mode for GPS probes, although by their nature they introduce noise from passenger boarding and alighting times, which must be accounted for. A heuristic real-time dynamic outlier filtering algorithm for taxi GPS probes has been developed, focusing on urban arterials. An actual traffic survey of dynamic link travel times was conducted using the license plate method on test arterials of the Seoul city transportation network. With the algorithm, an estimated 70% of outliers were filtered and the relative error improved by 73.7%. The filtering algorithm developed here is expected to be usable at other sites with some calibration effort. Some limitations and a future research agenda are also discussed.

Robust inference with order constraint in microarray study

  • Kang, Joonsung
    • Communications for Statistical Applications and Methods
    • /
    • v.25 no.5
    • /
    • pp.559-568
    • /
    • 2018
  • Gene classification can involve complex order-restricted inference. Examining gene expression pattern across groups with order-restriction makes standard statistical inference ineffective and thus, requires different methods. For this problem, Roy's union-intersection principle has some merit. The M-estimator adjusting for outlier arrays in a microarray study produces a robust test statistic with distribution-insensitive clustering of genes. The M-estimator in conjunction with a union-intersection principle provides a nonstandard robust procedure. By exact permutation distribution theory, a conditionally distribution-free test based on the proposed test statistic generates corresponding p-values in a small sample size setup. We apply a false discovery rate (FDR) as a multiple testing procedure to p-values in simulated data and real microarray data. FDR procedure for proposed test statistics controls the FDR at all levels of ${\alpha}$ and ${\pi}_0$ (the proportion of true null); however, the FDR procedure for test statistics based upon normal theory (ANOVA) fails to control FDR.
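
The abstract applies an FDR procedure to the permutation p-values but does not name it. As an illustration only, the sketch below assumes the Benjamini-Hochberg step-up procedure, a common choice for FDR control; the function name `benjamini_hochberg` is hypothetical and this is not necessarily the procedure used in the paper.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where the hypothesis is rejected.

    Step-up rule: find the largest rank k (1-based, p-values sorted
    ascending) with p_(k) <= k * alpha / m, then reject the k hypotheses
    with the smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * alpha / m:
            k_max = rank
    rejected = [False] * m
    for i in order[:k_max]:
        rejected[i] = True
    return rejected
```

Because the rule is step-up, a p-value that misses its own threshold can still be rejected when a larger p-value meets its threshold, as with the pair (0.03, 0.04) at alpha = 0.05.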

Statistical Outliers in Florida Counties at the Presidential Election 2000 (2000년 미국대선 플로리다주의 투표결과 분석)

  • 김현철
    • The Korean Journal of Applied Statistics
    • /
    • v.15 no.1
    • /
    • pp.21-32
    • /
    • 2002
  • We searched for outliers in the vote data of the State of Florida from the 2000 presidential election using multivariate regression analysis. We found several outliers, including Palm Beach County. This implies that, to claim that the "Butterfly Ballot" made Palm Beach an outlier, the number of disqualified double-punched ballots should be analyzed as well as the votes.

Evolutionary Computing Driven Extreme Learning Machine for Objected Oriented Software Aging Prediction

  • Ahamad, Shahanawaj
    • International Journal of Computer Science & Network Security
    • /
    • v.22 no.2
    • /
    • pp.232-240
    • /
    • 2022
  • To fulfill user expectations, the rapid evolution of software techniques and approaches has necessitated reliable and flawless software operations. Aging prediction for software in operation is becoming a basic and unavoidable requirement for ensuring the systems' availability, reliability, and operations. In this paper, an improved evolutionary computing-driven extreme learning scheme (ECD-ELM) is suggested for object-oriented software aging prediction. To perform aging prediction, we employed a variety of metrics, including program size, McCabe complexity metrics, Halstead metrics, runtime failure event metrics, and some unique aging-related metrics (ARM). In our suggested paradigm, OOP software metrics are extracted after pre-processing, which includes outlier detection and normalization. This technique improved our proposed system's ability to deal with instances with unbalanced biases and metrics. Further, different dimensionality reduction and feature selection algorithms, such as principal component analysis (PCA), linear discriminant analysis (LDA), and T-Test analysis, have been applied. We have suggested a single hidden layer multi-feed forward neural network (SL-MFNN) based ELM, where an adaptive genetic algorithm (AGA) is applied to estimate the weight and bias parameters for ELM learning. Unlike the traditional neural network model, the GA-based ELM with LDA feature selection outperformed other aging prediction approaches in terms of prediction accuracy, precision, recall, and F-measure. The results affirm that outlier detection, normalization of imbalanced metrics, LDA-based feature selection, and GA-based ELM together can be a reliable solution for object-oriented software aging prediction.