Vulnerability Threat Classification Based on XLNet and ST5-XXL Models

  • Chae-Rim Hong (Department of AI & Bigdata, aSSIST University)
  • Jin-Keun Hong (Div. of Advanced IT, Baekseok University)
  • Received : 2024.06.25
  • Accepted : 2024.07.05
  • Published : 2024.08.31

Abstract

We provide a detailed analysis of the data processing and model training pipeline for vulnerability classification using Transformer-based language models, specifically Sentence-T5 (ST5)-XXL and XLNet. The main purpose of this study is to compare the performance of the two models, identify the strengths and weaknesses of each, and determine the optimal learning rate for efficient and stable model training. We preprocessed the data, constructed and trained the models, and evaluated their performance on datasets with diverse characteristics. The XLNet model performed well at learning rates of 1e-05 and 1e-04 and achieved a significantly lower loss than the ST5-XXL model, indicating that XLNet trains more efficiently on this task. We also confirmed that the learning rate has a significant impact on model performance. These results highlight the usefulness of the ST5-XXL and XLNet models for classifying security vulnerabilities and underscore the importance of setting an appropriate learning rate. Future research should include more comprehensive analyses using diverse datasets and additional models.
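As a minimal illustration of the training setup described above, the sketch below fine-tunes an XLNet classifier at one of the reported learning rates (1e-05) using the Hugging Face Transformers library. The checkpoint name (`xlnet-base-cased`), the number of threat classes, and the sample inputs are assumptions for illustration only; the abstract does not specify them.

```python
import torch
from torch.optim import AdamW
from transformers import XLNetTokenizer, XLNetForSequenceClassification

# Hypothetical inputs; the study's actual dataset and label scheme are not given here.
texts = [
    "Buffer overflow in the parsing routine allows remote code execution.",
    "Improper input validation leads to SQL injection in the login form.",
]
labels = torch.tensor([0, 1])  # assumed class indices for two threat categories

tokenizer = XLNetTokenizer.from_pretrained("xlnet-base-cased")
model = XLNetForSequenceClassification.from_pretrained(
    "xlnet-base-cased", num_labels=4  # num_labels is an assumption
)

# Learning rate 1e-05: one of the two values the study reports as effective.
optimizer = AdamW(model.parameters(), lr=1e-5)

model.train()
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
outputs = model(**batch, labels=labels)  # returns cross-entropy loss for classification
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()
print(f"training loss: {outputs.loss.item():.4f}")
```

The ST5-XXL side of the comparison would follow the same training loop, but with a sentence-embedding encoder (e.g., a Sentence-T5 checkpoint) feeding a classification head, and the learning-rate comparison simply repeats the loop with `lr` set to each candidate value.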

Keywords
