
Analyzing the discriminative characteristic of cover letters using text mining focused on Air Force applicants

Analysis of the Discriminative Characteristics of Air Force Non-Commissioned Officer Applicants' Cover Letters Using Text Mining

  • Kwon, Hyeok (Department of Industrial Engineering, Yonsei University) ;
  • Kim, Wooju (Department of Industrial Engineering, Yonsei University)
  • 권혁 (연세대학교 산업공학과) ;
  • 김우주 (연세대학교 산업공학과)
  • Received : 2021.05.21
  • Accepted : 2021.07.29
  • Published : 2021.09.30

Abstract

The low birth rate and the shortened military service period are raising concerns about the selection of excellent military officers. The Republic of Korea became a low-birth-rate society in 1984 and an aged society in 2018, and it is expected to become a super-aged society in 2025. In addition, the military is shifting from a troop-oriented force to one centered on state-of-the-art weapons, and the service period was reduced in 2018 to ease the burden of military service on young people and to let them take up their roles in society earlier. Some observers note that the application rate for military officers is falling because of shrinking manpower resources and a preference for the shortened mandatory service over service as an officer. This calls for further consideration of policies for securing excellent military officers.

Most related studies have used social science methodologies, whereas this study applies text mining, which is suited to the analysis of large document collections. The study extracts words with discriminative characteristics from the cover letters of Republic of Korea Air Force non-commissioned officer applicants and analyzes their pass/fail polarity. It consists of three steps. First, applications are divided into general and technical fields, and the characteristic words in the cover letters are ranked by the difference in their frequency ratios between the two fields. The greater this difference, the more discriminative the word is defined to be for that field. On this basis, we extract the top 50 discriminative words for the general field and the top 50 for the technical field. Second, the appropriate number of topics for the full set of cover letters is determined with LDA, using the perplexity score and the coherence score. With that number of topics, LDA generates the topics and their word probabilities, and we estimate which topic each discriminative word belongs to. The keyword indicators of the questions are then used as candidate labels, and the candidate that best fits each topic's word distribution is assigned as that topic's label. Third, using L-LDA with the cover letters labeled as pass or fail, we generate topics and probabilities under the pass and fail labels for each field. From these we keep only the discriminative words that belong to labeled topics, and for each such word we compute the difference between its probability under the pass label and its probability under the fail label. A positive difference indicates pass polarity, and a negative difference indicates fail polarity.

This study is the first to examine the characteristics of cover letters written by Republic of Korea Air Force non-commissioned officer applicants rather than by private-sector applicants. Because the methodology applies text mining to many documents instead of relying on surveys or interviews, it can reduce analysis time and increase reliability with respect to the entire population. For the same reason, it is also applicable to other kinds of document collections in the field of military personnel. The results show that L-LDA is more suitable than LDA for extracting the discriminative characteristics of Republic of Korea Air Force non-commissioned officer cover letters, and the study proposes a methodology that combines LDA and L-LDA. Through this analysis of the selection results for Republic of Korea Air Force non-commissioned officers, we aim to provide information that can inform selection and publicity policies and to propose a methodology for research on military manpower acquisition.
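
The first step, ranking words by the difference in their frequency ratios between the general and technical fields, might be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the exact definition of "frequency ratio", so the sketch assumes it is a word's share of all tokens in a field, and it assumes the cover letters are already tokenized. The function names and the toy documents are hypothetical and are not taken from the study.

```python
from collections import Counter


def frequency_ratios(docs):
    """Share of all tokens that each word accounts for in a list of tokenized documents."""
    counts = Counter(token for doc in docs for token in doc)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: n / total for word, n in counts.items()}


def discriminative_words(general_docs, technical_docs, top_n=50):
    """Rank words by the gap between their frequency ratios in the two fields.

    Assumption: "frequency ratio" means share of tokens within a field.
    Returns (top_general, top_technical), each a list of (word, gap) pairs
    with the largest gap first.
    """
    general = frequency_ratios(general_docs)
    technical = frequency_ratios(technical_docs)
    vocab = set(general) | set(technical)
    # Positive gap: relatively more frequent in the general field.
    diff = {w: general.get(w, 0.0) - technical.get(w, 0.0) for w in vocab}
    by_general = sorted(diff.items(), key=lambda kv: kv[1], reverse=True)
    by_technical = sorted(diff.items(), key=lambda kv: kv[1])
    top_general = [(w, d) for w, d in by_general[:top_n] if d > 0]
    top_technical = [(w, -d) for w, d in by_technical[:top_n] if d < 0]
    return top_general, top_technical


# Toy, made-up token lists purely for illustration.
general_docs = [["leadership", "service", "team"], ["service", "team", "team"]]
technical_docs = [["maintenance", "aircraft", "team"], ["aircraft", "aircraft", "service"]]

top_general, top_technical = discriminative_words(general_docs, technical_docs, top_n=2)
print("general:", top_general)
print("technical:", top_technical)
```

With real data, `top_n=50` would correspond to the top 50 words per field mentioned in the abstract; Korean text would first need morphological tokenization, which is outside this sketch.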

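The decline in military manpower caused by the low birth rate, together with a growing preference for enlisted service over officer service following the shortening of the enlisted service period, calls for further examination of policies for securing excellent military officers. Most related studies have relied on methodologies commonly used in the social sciences, whereas this study takes a text mining approach suited to examining large volumes of documents. To this end, the study extracts words with discriminative characteristics from the cover letters of Air Force non-commissioned officer applicants and analyzes their pass/fail polarity. The study consists of three steps. First, the application fields are divided into general and technical fields, and the characteristic words in the cover letters are ranked by the difference in their frequency ratios between fields. The larger the difference in ratio between fields, the more a word is defined as characterizing the corresponding field. Second, these characteristic words are clustered into topics using LDA, and labels are defined on that basis. Third, the pass/fail polarity of these clustered, field-specific words is analyzed using L-LDA. The closer the difference in L-LDA values is to the pass side, the more a word is defined as one frequently used by successful applicants. The results show that L-LDA is more suitable than LDA for extracting the discriminative characteristics of Air Force non-commissioned officer cover letters. In addition, because this methodology applies text mining to large document collections rather than relying on separate written or face-to-face surveys, it can shorten analysis time and improve reliability with respect to the entire population. Accordingly, through this analysis of Air Force non-commissioned officer selection results, we aim to provide information that can be used in selection and publicity systems and to propose a methodology that can be used in research on military manpower acquisition.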
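
The pass/fail polarity in the third step is a difference between two word probabilities, which can be illustrated with a small sketch. It assumes the per-label word probabilities have already been produced by an L-LDA model and are available as plain dictionaries; the function names, the placeholder words, and the probability values are all hypothetical and are not results from the study. Treating a difference of exactly zero as "neutral" is also an assumption, since the abstract only describes the positive and negative cases.

```python
def polarity_scores(pass_probs, fail_probs, words):
    """Pass-label probability minus fail-label probability for each word."""
    return {w: pass_probs.get(w, 0.0) - fail_probs.get(w, 0.0) for w in words}


def polarity_label(score):
    """Positive difference -> pass polarity, negative -> fail polarity."""
    if score > 0:
        return "pass"
    if score < 0:
        return "fail"
    return "neutral"  # assumption: the abstract does not cover the zero case


# Hypothetical word probabilities under each label (not taken from the paper).
pass_probs = {"word_a": 0.030, "word_b": 0.010, "word_c": 0.020}
fail_probs = {"word_a": 0.010, "word_b": 0.025, "word_c": 0.020}

scores = polarity_scores(pass_probs, fail_probs, ["word_a", "word_b", "word_c"])
for word, score in scores.items():
    print(word, round(score, 3), polarity_label(score))
```

Only the words already selected as discriminative and assigned to a labeled topic would be passed in as `words`, which mirrors the filtering described in the abstract.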
