Investigating Predictive Features for Authorship Verification of Arabic Tweets

  • Received : 2022.06.05
  • Published : 2022.06.30


This research investigates techniques for authorship verification of short Arabic texts. Despite the widespread use of Twitter among Arab users, research on authorship verification of short texts has so far focused on languages other than Arabic, such as English, Spanish, and Greek. To the best of the researcher's knowledge, no prior study has addressed authorship verification of Arabic-language tweets. This study explores the impact of stylometric and TF-IDF features extracted from very short texts (Arabic tweets) on user verification. In addition, an analysis examines whether tweet metadata, such as posting time and source, can improve user verification performance. The research is relevant to cybersecurity in Arab countries.
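The abstract does not specify the exact feature set, preprocessing, or classifier used in the study. As a rough illustration only, the two feature families it names could be sketched as follows, assuming character-trigram TF-IDF, a handful of illustrative stylometric counts, and cosine similarity as the comparison measure. All function names here (`char_ngrams`, `tfidf_vectors`, `cosine`, `stylometric_features`) are hypothetical and not taken from the paper.

```python
import math
import re
from collections import Counter


def char_ngrams(text, n=3):
    """Return overlapping character n-grams of a whitespace-normalised text."""
    text = " ".join(text.split())
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def tfidf_vectors(docs, n=3):
    """Build sparse TF-IDF vectors (dicts) over character n-grams.

    Assumption: smoothed IDF, log((1 + N) / (1 + df)) + 1, chosen here for
    illustration; the study's actual weighting is not stated in the abstract.
    """
    counts = [Counter(char_ngrams(d, n)) for d in docs]
    df = Counter(gram for c in counts for gram in c)  # document frequency
    total = len(docs)
    vectors = []
    for c in counts:
        length = sum(c.values()) or 1
        vectors.append({
            gram: (tf / length) * (math.log((1 + total) / (1 + df[gram])) + 1)
            for gram, tf in c.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(gram, 0.0) for gram, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def stylometric_features(text):
    """A few illustrative surface-level style counts for one tweet."""
    words = text.split()
    return {
        "n_chars": len(text),
        "n_words": len(words),
        "avg_word_len": sum(len(w) for w in words) / len(words) if words else 0.0,
        "n_hashtags": sum(w.startswith("#") for w in words),
        "n_mentions": sum(w.startswith("@") for w in words),
        # Characters that are neither word characters nor whitespace
        # (this also counts '#' and '@').
        "n_punct": len(re.findall(r"[^\w\s]", text)),
    }
```

In a verification setting, one plausible use is to compare the vector of a questioned tweet against vectors built from a user's known tweets and accept the claimed authorship when the similarity exceeds a threshold tuned on held-out data. That decision rule is likewise an assumption, not the method reported in the paper.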


