http://dx.doi.org/10.7232/JKIIE.2014.40.4.404

Under Sampling for Imbalanced Data using Minor Class based SVM (MCSVM) in Semiconductor Process  

Pak, Sae-Rom (School of Industrial Management Engineering, Korea University)
Kim, Jun Seok (School of Industrial Management Engineering, Korea University)
Park, Cheong-Sool (School of Industrial Management Engineering, Korea University)
Park, Seung Hwan (School of Industrial Management Engineering, Korea University)
Baek, Jun-Geol (School of Industrial Management Engineering, Korea University)
Publication Information
Journal of Korean Institute of Industrial Engineers / v.40, no.4, 2014, pp. 404-414
Abstract
Yield prediction is important for managing semiconductor quality. Much research has applied machine learning algorithms such as the support vector machine (SVM) to predict yield precisely. However, yield prediction with SVM is difficult because the final test procedure in semiconductor manufacturing generates large and extremely imbalanced data. Training an SVM on imbalanced data sometimes yields unnecessary support vectors from the major class because support vectors from the minor class go unselected. As a result, the decision boundary around the target class can be overwhelmed by observations from the major class. For this reason, we propose an under-sampling method based on a minor class based SVM (MCSVM), which overcomes these limitations of the ordinary SVM algorithm. MCSVM constructs a model that fixes some of the minor class data as support vectors, which can serve as good samples representing the nature of the target class. Experimental studies using data sets from UCI and from a real manufacturing process show that the proposed method performs better than existing sampling methods.
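The general idea of under-sampling the major class around the minor class can be illustrated with a small sketch. This is not the MCSVM procedure from the paper, whose details are not given in the abstract. It assumes, purely for illustration, that distance to the nearest minor-class observation is a crude proxy for how likely a major-class observation is to matter for the decision boundary. The function name `undersample_near_minority`, the `ratio` parameter, and the toy data are all hypothetical.

```python
import math


def undersample_near_minority(X, y, minor_label, ratio=1.0):
    """Keep all minor-class rows plus the major-class rows closest to them.

    Illustrative stand-in only: distance to the nearest minor-class point is
    used as a rough proxy for boundary relevance. It is not MCSVM.
    """
    minor = [x for x, label in zip(X, y) if label == minor_label]
    major = [(x, label) for x, label in zip(X, y) if label != minor_label]
    if not minor:
        raise ValueError("no observations with the minor-class label")

    # Number of major-class rows to keep, relative to the minor-class size.
    keep = min(len(major), max(1, round(ratio * len(minor))))

    # Distance from a major-class row to its nearest minor-class row.
    def gap(row):
        return min(math.dist(row, m) for m in minor)

    nearest = sorted(major, key=lambda pair: gap(pair[0]))[:keep]

    X_new = minor + [x for x, _ in nearest]
    y_new = [minor_label] * len(minor) + [label for _, label in nearest]
    return X_new, y_new


# Toy data (made up): 2 "fail" observations (minor) and 6 "pass" (major).
X = [(0.0, 0.0), (0.2, 0.1),              # fail
     (1.0, 1.0), (1.2, 0.9), (3.0, 3.0),  # pass
     (4.0, 4.0), (5.0, 5.0), (6.0, 6.0)]  # pass
y = ["fail", "fail", "pass", "pass", "pass", "pass", "pass", "pass"]

X_bal, y_bal = undersample_near_minority(X, y, minor_label="fail")
print(y_bal.count("fail"), y_bal.count("pass"))
```

With `ratio=1.0` the retained set is balanced: every minor-class observation is kept, and only the major-class observations nearest to them survive. A classifier such as an SVM would then be trained on the reduced set.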
Keywords
Imbalanced Data; Under-Sampling; MCSVM; Support Vectors; Semiconductor Process;