Search | Korea Science

Park, Cheolyong
- Journal of the Korean Data and Information Science Society, v.27 no.4, pp.855-863, 2016
In this study, a simple diagnostic statistic for determining the size of a random forest is proposed. The method is based on MV (margin of victory), a scaled difference, evaluated at the infinite forest, between the votes for the first and second most popular categories of the current random forest. A negative MV indicates a discrepancy between the current and infinite forests. More precisely, the method is based on the proportion of cases in which -MV is greater than a fixed small positive number (say, 0.03). We derive an appropriate diagnostic statistic for the method and then calculate the distribution of the statistic. A simulation study is performed to compare the method with a recently proposed diagnostic statistic.
https://doi.org/10.7465/jkdi.2016.27.4.855
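
The thresholded proportion described in this abstract can be illustrated with a minimal sketch. It assumes the MV values have already been computed for each case; the abstract does not give the scaling formula, so none is assumed here. The function name `exceedance_proportion` and the sample values are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def exceedance_proportion(mv_values: Sequence[float], threshold: float = 0.03) -> float:
    """Proportion of cases in which -MV is greater than `threshold`.

    `mv_values` holds one (already computed) MV value per case; how MV is
    scaled is left to the paper and is not reproduced here.
    """
    if len(mv_values) == 0:
        raise ValueError("mv_values must contain at least one case")
    # Count the cases whose negated MV exceeds the small positive threshold.
    exceed = sum(1 for mv in mv_values if -mv > threshold)
    return exceed / len(mv_values)


# Hypothetical MV values for four cases; only the second has -MV > 0.03.
proportion = exceedance_proportion([0.5, -0.1, -0.02, 0.2])
```

A small proportion would suggest that the current forest and the infinite forest rarely disagree; how small is small enough is a question the paper's derived distribution addresses, not this sketch.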

Ahn, Chul H.
- Communications for Statistical Applications and Methods, v.7 no.2, pp.403-414, 2000
A diagnostic tool for testing homogeneity of random effects in unbalanced linear mixed models is proposed, based on the score statistic. The finite-sample behavior of the test statistic is examined using Monte Carlo experiments, which assess the chi-square approximation to the distribution of the test statistic under the null hypothesis.

Ahn, Byoung Jin
- Journal of Korean Society for Quality Management, v.19 no.2, pp.40-51, 1991
The $C_B$ statistic, a generalization of Mallows's $C_L$ statistic, is developed to determine the shrinkage parameter. Since not all cases in a data set play an equal role in forming $C_B$, a subdivision of $C_B$ into individual components for each case is developed. This subdivision is useful both as an aid in understanding $C_B$ and as a diagnostic procedure.

강은미
- The Korean Journal of Applied Statistics, v.6 no.1, pp.67-78, 1993
A new diagnostic statistic for detecting outliers and influential observations in linear models is suggested and studied in this paper. The proposed statistic is a weighted sum of two measures: one for detecting outliers and the other for detecting influential observations. The merit of this statistic is that it makes it possible to distinguish outliers from influential observations. A Monte Carlo simulation is carried out to find the probability distribution of this statistic.

Kang, Eun M.;Park, Sung H.
- Journal of Korean Society for Quality Management, v.16 no.2, pp.18-33, 1988
A new diagnostic statistic for detecting outliers and influential observations in linear models is suggested and studied in this paper. The proposed statistic is a weighted sum of two measures: one for detecting outliers and the other for detecting influential observations. The merit of this statistic is that it makes it possible to distinguish outliers from influential observations. The statistic can be used not only for regression models but also for factorial design models. A Monte Carlo simulation study is reported to suggest critical values for detecting outliers and influential observations in simple regression models when the number of observations is 11, 21, 31, 41 or 51.

Kim, Myung Geun
- Communications for Statistical Applications and Methods, v.23 no.3, pp.231-239, 2016
A graphical diagnostic method based on multiple case deletions in a regression context is introduced, using the sampling distribution of the difference between the two least squares estimators computed with and without the deleted cases. Principal components analysis plays a key role in deriving this diagnostic method. Multiple case deletions for the test statistic are also considered for the situation in which a new observation is fitted to a given regression model. The result is useful for detecting influential observations in econometric data analysis, for example in checking whether the consumption pattern at a later time is the same as the one found earlier, as well as for investigating the influence of cases in the usual regression model. An illustrative example is given.
https://doi.org/10.5351/CSAM.2016.23.3.231

Kim, Jong-Tae;Moon, Gyoung-Ae
- Journal of the Korean Data and Information Science Society, v.5 no.2, pp.95-106, 1994
The proposed test statistic is obtained by applying constant weights to the Neumann smooth type statistic discussed by Eubank and Hart (1993), in order to observe the effect of the weights. It performs very well in power studies. Another advantage of this test is that it simultaneously provides an important diagnostic tool that can be used in many cases to determine how the model should be adjusted.

Ahn, Chul-Hwan
- Journal of the Korean Statistical Society, v.19 no.2, pp.171-175, 1990
A diagnostic test for detecting nonconstant variance in mixed linear models based on the score statistic is derived through the technique of model expansion, and compared to the log likelihood ratio test.

Park, Cheolyong
- Journal of the Korean Data and Information Science Society, v.28 no.3, pp.515-524, 2017
In this study, a measure of discrepancy based on MV (margin of victory) is suggested that might be useful in determining the size of a random forest for classification. Here MV is a scaled difference, evaluated at the infinite random forest, between the votes for the two most popular classes of the current random forest. More specifically, max(-MV,0) is proposed as a reasonable measure of discrepancy, since a negative MV value means that the two most popular classes are ordered differently in the current and infinite random forests. We propose an appropriate diagnostic statistic based on this measure that might be useful for determining the random forest size, and then derive its asymptotic distribution. Finally, a simulation study is conducted to compare the finite-sample performance of the proposed statistic with that of other recently proposed diagnostic statistics.
https://doi.org/10.7465/jkdi.2017.28.3.515
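
As a minimal sketch of the per-case discrepancy measure max(-MV,0) named in this abstract, the snippet below maps a list of MV values to their discrepancies. It assumes the MV values are supplied by the caller, since the abstract does not state how MV is scaled. The function name `mv_discrepancy` and the sample values are hypothetical, and the paper's actual diagnostic statistic built on this measure is not reproduced here.

```python
from typing import List, Sequence


def mv_discrepancy(mv_values: Sequence[float]) -> List[float]:
    """Per-case discrepancy max(-MV, 0).

    A case with non-negative MV contributes 0; a case with negative MV
    contributes the magnitude of that MV.
    """
    return [max(-mv, 0.0) for mv in mv_values]


# Hypothetical MV values for four cases; only the negative ones are non-zero.
discrepancies = mv_discrepancy([0.5, -0.1, -0.02, 0.2])
```

Unlike a fixed threshold on -MV, this measure keeps the size of each negative margin, so small and large disagreements are weighted differently.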

Jung, Kang-Mo
- Proceedings of the Korean Data and Information Science Society Conference, 2001.10a, pp.13-16, 2001
We compare methods for detecting observations that have a large influence on the likelihood ratio test statistic for the hypothesis that two sets of variables are uncorrelated with one another. For this purpose we derive results for the deletion diagnostic, the influence function, the standardized influence matrix and the local influence. An illustrative example is given.