A Study on Improving Classification Performance for Manufacturing Process Data with Multicollinearity and Imbalanced Distribution |
Lee, Chae Jin
(LG Home Entertainment Company)
Park, Cheong-Sool (School of Industrial Management Engineering, Korea University)
Kim, Jun Seok (School of Industrial Management Engineering, Korea University)
Baek, Jun-Geol (School of Industrial Management Engineering, Korea University) |
1 | Allison, P., Altman, M., Gill, J., and McDonald, M. P. (2004), Convergence problems in logistic regression, Numerical issues in statistical computing for the social scientist, 238-252. |
2 | Banks, D. L. and Parmigiani, G. (1991), Preanalysis of Superlarge Industrial Datasets, ISDS, Duke University, USA. |
3 | Benjamini, Y. and Hochberg, Y. (1995), Controlling the false discovery rate : A practical and powerful approach to multiple testing, Journal of the Royal Statistical Society : Series B (Methodological), 57, 289-300. |
4 | Boeuf, J. P. (2003), Plasma display panels : physics, recent developments and key issues, Journal of physics D : Applied physics, 36(6), R53. |
5 | Breiman, L., Friedman, J., Olshen, R., and Stone, C. (1984), Classification and Regression Trees, Wadsworth, California, USA. |
6 | Byeon, S. K., Kang, C. W., and Sim, S. B. (2004), Defect Type Prediction Method in Manufacturing Process Using Data Mining Technique, Journal of industrial and systems engineering, 27(2), 10-16. |
7 | Cunningham, S. P., Spanos, C. J., and Voros, K. (1995), Semiconductor yield improvement : results and best practices, IEEE Transactions on Semiconductor Manufacturing, 8(2), 103-109. |
8 | Dudoit, S., Shaffer, J. P., and Boldrick, J. C. (2003), Multiple hypothesis testing in microarray experiments, Statistical Science, 18(1), 71-103. |
9 | Farcomeni, A. (2008), A review of modern multiple hypothesis testing, with particular attention to the false discovery proportion, Statistical Methods in Medical Research, 17(4), 347-388. |
10 | Fernandez, G. (2010), Statistical Data mining using SAS applications, 2nd edition, CRC press, New York, USA. |
11 | Gibbons, J. D. (1993), Nonparametric statistics : An introduction Vol. 90, Sage, California, USA. |
12 | Hall, M. A. (1999), Correlation-based feature selection for machine learning, Ph.D. Thesis, The University of Waikato. |
13 | Hochberg, Y. and Tamhane, A. (1987), Multiple Comparison Procedures, Wiley, New York, USA. |
14 | Jang, Y. S., Kim, J. W., and Hur, J. (2008), Combined application of data imbalance reduction techniques using genetic algorithm, Journal of Intelligence and Information Systems, 14(3), 133-154. |
15 | Jang, W. C. (2013), Multiple testing and its applications in high-dimension, Journal of the Korean data & information science society, 24(5), 1063-1076. |
16 | John, G. H., Kohavi, R., and Pfleger, K. (1994), Irrelevant features and the subset selection Problem, ICML, 94, 121-129. |
17 | Kim, J. H. and Jeong, J. B. (2004), Classification of class-imbalanced data : Effect of over-sampling and under-sampling of training data, The Korean Journal of Applied Statistics, 17(3), 445-457. |
18 | Kubat, M., Holte, R., and Matwin, S. (1997), Learning when negative examples abound, Proceedings of the 9th European Conference on Machine Learning, ECML-97, 146-153. |
19 | Koksal, G., Batmaz, I., and Testik, M. C. (2011), A review of data mining applications for quality improvement in manufacturing industry, Expert Systems with Applications, 38(10), 13448-13467. |
20 | Lemon, S. C., Roy, J., Clark, M. A., Friedmann, P. D., and Rakowski, W. (2003), Classification and regression tree analysis in public health : methodological review and comparison with logistic regression, Annals of Behavioral Medicine, 26(3), 172-181. |
21 | Lin, W. J. and Chen, J. J. (2012), Class-imbalanced classifiers for high-dimensional data, Briefings in bioinformatics, 14(1), 13-26. |
22 | Little, R. J. and Rubin, D. B. (2002), Statistical Analysis with Missing Data, 2nd edition, John Wiley and Sons, New York. |
23 | Park, J. H. and Byun, J. H. (2002), An analysis method of superlarge manufacturing process data using cleaning and graphical analysis, Journal of the Korean Society for Quality Management, 30(2), 72-85. |
24 | Polo, J. L., Berzal, F., and Cubero, J. C. (2006), Taking class importance into account, In Hybrid Information Technology, ICHIT'06. International Conference on, 1, 1-6. |
25 | Pyle, D. (1999), Data preparation for data mining, Morgan Kaufmann, San Francisco, USA. |
26 | Shmueli, G., Patel, N. R., and Bruce, P. C. (2011), Data Mining for Business Intelligence : Concepts, Techniques, and Applications in Microsoft Office Excel with XLMiner, 2nd edition, Wiley, New York, USA. |
27 | Storey, J. D. (2002), A direct approach to false discovery rates, Journal of the Royal Statistical Society : Series B (Statistical Methodology), 64(3). |
28 | Weiss, G. M. and Provost, F. (2001), The effect of class distribution on classifier learning : an empirical study, Technical Report ML-TR-44, Department of Computer Science, Rutgers University. |
29 | Strobl, C., Boulesteix, A. L., Kneib, T., Augustin, T., and Zeileis, A. (2008), Conditional variable importance for random forests, BMC bioinformatics, 9(1), 307. |
30 | Van Hulse, J., Khoshgoftaar, T. M., and Napolitano, A. (2007), Experimental perspectives on learning from imbalanced data, In Proceedings of the 24th international conference on Machine learning, 935-942. |
31 | Zeng, H. and Cheun, T. (2008), Feature selection for clustering high dimensional data, Lecture Notes in Artificial Intelligence, 5351, 913-922. |