A simple diagnostic statistic for determining the size of random forest

Park, Cheolyong

doi:10.7465/jkdi.2016.27.4.855

Journal of the Korean Data and Information Science Society

Volume 27, Issue 4, Pages 855-863, 2016, pISSN 1598-9402

The Korean Data and Information Science Society (한국데이터정보과학회)

Park, Cheolyong (Major in Statistics, Keimyung University)

Received : 2016.06.23
Accepted : 2016.07.19
Published : 2016.07.31

https://doi.org/10.7465/jkdi.2016.27.4.855

Abstract

In this study, we propose a simple diagnostic statistic for determining the size of a random forest. The method is based on the MV (margin of victory), a scaled difference between the infinite-forest votes for the first and second most popular categories of the current random forest. A negative MV indicates a discrepancy between the current and infinite forests. More precisely, the method is based on the proportion of cases for which -MV exceeds a fixed small positive number (say, 0.03). We derive an appropriate diagnostic statistic for the method and calculate its distribution. A simulation study compares the method with a recently proposed diagnostic statistic.
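
The idea of counting cases whose -MV exceeds a small threshold can be illustrated with a minimal simulation sketch. It is not the statistic derived in the paper: the abstract does not give the scaling of MV or the form of the statistic, so the sketch assumes MV is the raw difference in infinite-forest vote proportions, and it assumes those proportions are known (in practice they are not). The function name `discrepancy_proportion` and its parameters `p_inf`, `n_trees`, `eps` and `seed` are hypothetical.

```python
import numpy as np

def discrepancy_proportion(p_inf, n_trees, eps=0.03, seed=0):
    """Proportion of cases with -MV > eps for a simulated forest of n_trees.

    p_inf: (n_cases, n_categories) array of ASSUMED infinite-forest vote
           proportions; each row must sum to 1.
    """
    rng = np.random.default_rng(seed)
    p_inf = np.asarray(p_inf, dtype=float)
    # Simulate the current forest: each case receives n_trees votes drawn
    # according to its infinite-forest vote proportions.
    votes = np.vstack([rng.multinomial(n_trees, p) for p in p_inf])
    # First and second most popular categories of the current forest
    # (ties are broken in favour of the lower category index).
    order = np.argsort(-votes, axis=1, kind="stable")
    first, second = order[:, 0], order[:, 1]
    rows = np.arange(p_inf.shape[0])
    # ASSUMPTION: MV is the unscaled difference in infinite-forest votes
    # between the current first and second categories.
    mv = p_inf[rows, first] - p_inf[rows, second]
    # A case counts as discrepant when -MV exceeds the threshold eps.
    return float(np.mean(-mv > eps))

# Usage: cases with close infinite-forest votes are the ones most likely
# to be ranked differently by a small forest.
p = np.array([[0.52, 0.48], [0.90, 0.10], [0.45, 0.55]])
print(discrepancy_proportion(p, n_trees=25))
print(discrepancy_proportion(p, n_trees=2001))
```

Under these assumptions, the returned proportion should tend to shrink as `n_trees` grows, since the current forest's ranking of categories then agrees with the infinite forest for more cases.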
