Finding the Optimal Data Classification Method Using LDA and QDA Discriminant Analysis

  • Kim, SeungJae (Department of Convergence, Honam University)
  • Kim, SungHwan (National Program of Excellence in Software center, Chosun University)
  • Received : 2020.10.02
  • Accepted : 2020.11.28
  • Published : 2020.12.31

Abstract

With the recent spread of artificial intelligence (AI) technology, both the use of data and the volume of newly generated data are increasing rapidly. To obtain meaningful results from such data, the first step is to classify the data well. However, if only a single machine learning classification technique is applied, the analysis may suffer from errors such as overfitting. To reduce the problems caused by misclassification, including overfitting, it is necessary to apply several classification techniques, compare their results, and derive the optimal classification. Interpreting the data with only one classification technique leads to weak inference and poor predictions. As a step that precedes data analysis, this study seeks a method for optimally classifying data by examining the data from multiple perspectives and applying several classification techniques, both linear and nonlinear, such as linear discriminant analysis (LDA) and quadratic discriminant analysis (QDA). To obtain reliable and precise statistics from big data analysis, the meaning of each variable and the correlations between variables must be analyzed. If the data are classified in a way that does not match the hypothesis being tested, the results will be unreliable even if the subsequent analysis is carried out well. In other words, before big data analysis, the data must be classified appropriately for the purpose of the analysis. This step must be completed before any results are drawn from the data, and it can serve as a method of optimal data classification.
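The idea of applying more than one classifier and keeping the one that generalizes better can be illustrated with a minimal sketch. It is not the authors' procedure or data: the synthetic two-class sample, the 2-D feature space, and the helper names (`fit`, `score`, `predict`, `accuracy`, `sample`) are all assumptions made for illustration. Both models fit one Gaussian per class. LDA shares a single pooled covariance across classes, which gives a linear boundary. QDA keeps a separate covariance per class, which gives a quadratic boundary. The two are then compared on held-out data.

```python
import math
import random

def mean2(rows):
    """Mean of a list of 2-D points."""
    n = len(rows)
    return (sum(r[0] for r in rows) / n, sum(r[1] for r in rows) / n)

def cov2(rows, mu):
    """Unbiased 2x2 covariance, stored as (s_xx, s_xy, s_yy)."""
    n = len(rows)
    sxx = sum((r[0] - mu[0]) ** 2 for r in rows) / (n - 1)
    sxy = sum((r[0] - mu[0]) * (r[1] - mu[1]) for r in rows) / (n - 1)
    syy = sum((r[1] - mu[1]) ** 2 for r in rows) / (n - 1)
    return (sxx, sxy, syy)

def fit(X, y, shared_cov):
    """Fit Gaussian class models; shared_cov=True is LDA, False is QDA."""
    stats = {}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        mu = mean2(rows)
        stats[c] = (mu, cov2(rows, mu), len(rows))
    pooled = None
    if shared_cov:
        # LDA: pool the class covariances, weighted by degrees of freedom.
        dof = sum(n - 1 for _, _, n in stats.values())
        pooled = tuple(
            sum(cov[k] * (n - 1) for _, cov, n in stats.values()) / dof
            for k in range(3)
        )
    # Each class maps to (mean, covariance, prior).
    return {
        c: (mu, pooled if shared_cov else cov, n / len(y))
        for c, (mu, cov, n) in stats.items()
    }

def score(x, mu, cov, prior):
    """Log of (prior * Gaussian density), up to an additive constant."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = x[0] - mu[0], x[1] - mu[1]
    maha = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.log(prior) - 0.5 * math.log(det) - 0.5 * maha

def predict(model, x):
    """Return the class with the highest discriminant score."""
    return max(model, key=lambda c: score(x, *model[c]))

def accuracy(model, X, y):
    """Fraction of points whose predicted class matches the label."""
    return sum(predict(model, x) == t for x, t in zip(X, y)) / len(y)

def sample(rng, n):
    """Synthetic data (assumed): same class centre, different spread."""
    X, y = [], []
    for label, sigma in ((0, 0.5), (1, 3.0)):
        for _ in range(n):
            X.append((rng.gauss(0, sigma), rng.gauss(0, sigma)))
            y.append(label)
    return X, y

rng = random.Random(0)
X_train, y_train = sample(rng, 200)
X_test, y_test = sample(rng, 200)

# Fit both classifiers on the training split, compare on the held-out split.
lda_acc = accuracy(fit(X_train, y_train, shared_cov=True), X_test, y_test)
qda_acc = accuracy(fit(X_train, y_train, shared_cov=False), X_test, y_test)
best = "QDA" if qda_acc > lda_acc else "LDA"

print(f"LDA held-out accuracy: {lda_acc:.3f}")
print(f"QDA held-out accuracy: {qda_acc:.3f}")
print("Selected classifier:", best)
```

In this synthetic sample the classes differ in spread rather than in location, so a linear boundary has little to separate them with. On data where the classes share a similar covariance, the simpler LDA model may do as well or better, which is why the comparison is made on held-out data and not assumed in advance.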
