http://dx.doi.org/10.11627/jksie.2022.45.4.086

Anomaly Detection Model Based on Semi-Supervised Learning Using LIME: Focusing on Semiconductor Process  

Kang-Min An (Department of Management Consulting, Graduate School of Hanyang University)
Ju-Eun Shin (Department of Management Consulting, Graduate School of Hanyang University)
Dong Hyun Baek (Division of Business Administration, Hanyang University)
Publication Information
Journal of Korean Society of Industrial and Systems Engineering, Vol. 45, No. 4, 2022, pp. 86-98
Abstract
Recently, many studies have applied machine learning models to semiconductor manufacturing process data to improve quality. However, in semiconductor manufacturing the proportion of good products is far higher than that of defective products, so the resulting class imbalance is a serious problem for machine learning. In addition, because the data contain a very large number of features, it is important to train models on only the most important features in order to improve accuracy and practical usefulness. This study proposes an anomaly detection methodology that performs well despite the class imbalance and high dimensionality of semiconductor process data. The methodology first applies SMOTE (synthetic minority over-sampling technique) and RFECV (recursive feature elimination with cross-validation), and then applies the LIME (Local Interpretable Model-agnostic Explanations) algorithm. It analyzes the classification results of the anomaly classification model, identifies the causes of anomalies, and determines which semiconductor processes require corrective action. The applicability and feasibility of the proposed methodology were confirmed through case applications.
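To make the oversampling step concrete, here is a minimal pure-Python sketch of the SMOTE interpolation idea (Chawla et al., 2002). It is an illustration under stated assumptions, not the authors' implementation: the function name `smote` and its parameters are hypothetical, feature vectors are assumed to be plain lists of floats with no missing values, and each synthetic sample is built by picking a random minority sample, picking one of its `k` nearest minority neighbours, and interpolating between them. In practice a library implementation such as imbalanced-learn's `SMOTE` class would normally be used, followed by the RFECV and LIME steps, which are not shown here.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=None):
    """Hypothetical SMOTE-style oversampler (illustrative sketch only).

    minority: list of feature vectors (lists of floats) from the minority class,
              e.g. defective products; assumed to contain no missing values.
    n_synthetic: number of synthetic minority samples to create.
    k: number of nearest minority neighbours to interpolate towards.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    # Pre-compute the k nearest minority neighbours (Euclidean) of each sample.
    neighbours = []
    for i, x in enumerate(minority):
        dists = sorted(
            (math.dist(x, y), j) for j, y in enumerate(minority) if j != i
        )
        neighbours.append([j for _, j in dists[:k]])

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))   # pick a random minority sample
        j = rng.choice(neighbours[i])      # pick one of its k neighbours
        gap = rng.random()                 # interpolation factor in [0, 1)
        x, y = minority[i], minority[j]
        # The new point lies on the segment between x and its neighbour y.
        synthetic.append([a + gap * (b - a) for a, b in zip(x, y)])
    return synthetic


defects = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_samples = smote(defects, n_synthetic=6, k=2, seed=0)
print(len(new_samples))  # 6
```

Because every synthetic point lies between an existing minority sample and one of its minority neighbours, the new samples fill in the region already occupied by defective products instead of duplicating existing rows, which is what distinguishes SMOTE from simple random oversampling.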
Keywords
Semiconductor Fabrication Process; SMOTE; RFECV; Anomaly Detection; LIME
Citations & Related Records
Times Cited By KSCI: 2