Exploring the Performance of Multi-Label Feature Selection for Effective Decision-Making: Focusing on Sentiment Analysis

A Study on Multi-Label Feature Selection Methods for Effective Decision-Making: Focusing on Sentiment Analysis

  • 원종윤 (Graduate School, College of Business, Sungkyunkwan University);
  • 이건창 (College of Business, Sungkyunkwan University)
  • Received : 2022.04.12
  • Accepted : 2022.12.28
  • Published : 2023.02.28

Abstract

Management decision-making based on artificial intelligence (AI) plays an important role in supporting decision-makers, and AI-centered business decision-making is regarded as a driving force for corporate growth. AI built on accurate analysis techniques can help decision-makers make high-quality decisions. This study proposes an effective decision-making method that applies multi-label feature selection. Specifically, we present CFS-BR (Correlation-based Feature Selection based on the Binary Relevance approach), which reduces the dimensionality of high-dimensional data sets. Analyses of sample data and empirical data show that CFS-BR can support efficient decision-making by selecting the best combination of meaningful attributes using the Best-First algorithm. In addition, CFS-BR achieves higher accuracy than previous multi-label feature selection methods, which makes it useful for increasing the effectiveness of decision-making.

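This study explains how applying a multi-label feature selection method, one of several AI techniques, can increase the effectiveness of decision-making in complex business environments. AI-based decision-making systems play an important role in assisting, or standing in for, the choices and judgments of decision-makers. Moreover, AI-centered business decision-making has recently been regarded as a growth engine for companies, and this requires effective decision-making methods. This study therefore proposes CFS-BR (a correlation-based feature selection model based on the binary relevance approach), which selects meaningful attribute values, to support effective decision-making. Analyses of example data and empirical data show that CFS-BR can support efficient decision-making because it selects the best combination of meaningful attributes using the best-first search algorithm, and its higher accuracy compared with existing multi-label feature selection methods indicates that it is useful for making decision-making more effective.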
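
The abstract names three ingredients: a correlation-based merit for feature subsets, a Best-First search over subsets, and a binary relevance decomposition of the multi-label problem. The sketch below shows one way these pieces could fit together; it is not the authors' implementation. Several details are assumptions that the abstract does not state: absolute Pearson correlation is used as the correlation measure, the search stops after `max_stale` consecutive non-improving expansions, and the per-label subsets are merged by taking their union. The function names (`cfs_merit`, `best_first_cfs`, `cfs_br`) and the synthetic data are hypothetical.

```python
import heapq
import numpy as np


def cfs_merit(subset, r_cf, r_ff):
    """CFS merit of a subset: k*mean(r_cf) / sqrt(k + k*(k-1)*mean(r_ff))."""
    k = len(subset)
    if k == 0:
        return 0.0
    idx = list(subset)
    mean_cf = r_cf[idx].mean()
    if k == 1:
        mean_ff = 0.0
    else:
        block = r_ff[np.ix_(idx, idx)]
        # Average of the off-diagonal feature-feature correlations.
        mean_ff = (block.sum() - np.trace(block)) / (k * (k - 1))
    return k * mean_cf / np.sqrt(k + k * (k - 1) * mean_ff)


def best_first_cfs(X, y, max_stale=5):
    """Forward best-first search for the subset with the highest CFS merit.

    Assumes no feature or label column is constant (correlation undefined).
    """
    n_features = X.shape[1]
    r_ff = np.abs(np.corrcoef(X, rowvar=False))
    r_cf = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(n_features)])
    best_subset, best_merit = frozenset(), 0.0
    # heapq is a min-heap, so merits are negated to pop the best subset first.
    frontier = [(0.0, [])]
    visited = {frozenset()}
    stale = 0
    while frontier and stale < max_stale:
        _, current = heapq.heappop(frontier)
        improved = False
        for j in range(n_features):
            child = frozenset(current) | {j}
            if child in visited:
                continue
            visited.add(child)
            merit = cfs_merit(child, r_cf, r_ff)
            heapq.heappush(frontier, (-merit, sorted(child)))
            if merit > best_merit + 1e-12:
                best_subset, best_merit, improved = child, merit, True
        stale = 0 if improved else stale + 1
    return sorted(best_subset), best_merit


def cfs_br(X, Y):
    """Binary relevance: run CFS once per label, then union the subsets."""
    per_label = {}
    for label in range(Y.shape[1]):
        per_label[label], _ = best_first_cfs(X, Y[:, label])
    selected = sorted(set().union(*per_label.values()))
    return selected, per_label


# Synthetic demo: label 0 depends on feature 0, label 1 on feature 3.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
Y = np.column_stack([
    X[:, 0] + 0.1 * rng.normal(size=200) > 0,
    X[:, 3] + 0.1 * rng.normal(size=200) > 0,
]).astype(int)
selected, per_label = cfs_br(X, Y)
print(selected, per_label)
```

The merit rewards subsets whose features correlate with the label while penalizing features that correlate with each other, which is why a redundant feature tends to lower the score of a subset rather than raise it. Taking the union across labels is only one possible aggregation; a stricter rule, such as keeping features chosen for at least two labels, would produce a smaller set.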
