[KSCI] Korea Science Citation Index Service

http://dx.doi.org/10.3743/KOSIM.2022.39.1.091

A Deep Learning-based Depression Trend Analysis of Korean on Social Media

Park, Seojeong (Department of Library and Information Science, Yonsei University)
Lee, Soobin (Department of Library and Information Science, Yonsei University)
Kim, Woo Jung (Department of Psychiatry, Yongin Severance Hospital, Yonsei University College of Medicine)
Song, Min (Department of Library and Information Science, Yonsei University)

Publication Information

Journal of the Korean Society for Information Management / v.39, no.1, 2022, pp. 91-117

Abstract

The number of patients with depression is increasing rapidly every year, both in Korea and worldwide. However, most people with mental illness are not aware that they have it, so they do not receive adequate treatment. Neglected depressive symptoms can lead to suicide, anxiety, and other psychological problems, so early detection and treatment of depression are very important for improving mental health. To address this problem, this study presents a deep learning-based model for classifying depression tendency from Korean social media text. After data were collected from Naver Knowledge iN, Naver Blog, Hidoc, and Twitter, the DSM-5 diagnostic criteria for major depressive disorder were used to annotate documents with classes according to the number of depressive symptoms. TF-IDF analysis and co-occurrence word analysis were then performed to examine the characteristics of each class in the constructed corpus. In addition, word embedding, dictionary-based sentiment analysis, and LDA topic modeling were performed so that the classification model could draw on several kinds of text features: the embedded text, sentiment score, and topic number of each document were calculated and used as features. The highest accuracy, 83.28%, was achieved when depression tendency was classified with the KorBERT algorithm using the embedded text combined with both the sentiment score and the topic of the document. The study is significant in that it builds a better-performing classification model for Korean depression tendency using various text features, and in that early detection of potentially depressed users of Korean online communities could enable rapid treatment and prevention, thereby helping to promote the mental health of Korean society.
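As a rough illustration of how an embedded text, a sentiment score, and a topic number might be merged into one feature vector per document, the sketch below uses plain concatenation. This is an assumption made for illustration only: the abstract does not say how the features were combined, and the function names, the toy lexicon, the three-dimensional embedding, and the one-hot topic encoding are all hypothetical.

```python
from typing import Dict, List, Sequence


def sentiment_score(tokens: Sequence[str], lexicon: Dict[str, int]) -> float:
    """Mean polarity of the tokens found in a sentiment dictionary (0.0 if none match)."""
    hits = [lexicon[t] for t in tokens if t in lexicon]
    return sum(hits) / len(hits) if hits else 0.0


def one_hot(topic: int, num_topics: int) -> List[float]:
    """Encode a topic number as a one-hot vector (an assumed encoding)."""
    if not 0 <= topic < num_topics:
        raise ValueError("topic index out of range")
    return [1.0 if i == topic else 0.0 for i in range(num_topics)]


def combine_features(
    embedding: Sequence[float], score: float, topic: int, num_topics: int
) -> List[float]:
    """Concatenate embedding, sentiment score, and one-hot topic into one vector."""
    return list(embedding) + [score] + one_hot(topic, num_topics)


# Toy values, all hypothetical: a tiny lexicon, a tokenized document,
# a 3-dimensional stand-in for a text embedding, and a topic index.
lexicon = {"sad": -2, "tired": -1, "happy": 2}
tokens = ["i", "feel", "sad", "and", "tired"]
embedding = [0.12, -0.40, 0.33]

features = combine_features(
    embedding, sentiment_score(tokens, lexicon), topic=1, num_topics=3
)
print(features)  # [0.12, -0.4, 0.33, -1.5, 0.0, 1.0, 0.0]
```

In a real pipeline the embedding would come from a pretrained language model and the topic number from a fitted LDA model; the resulting vector would then be passed to a classifier.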

Keywords

topic modeling; deep learning; sentiment analysis; social media

A Deep Learning-based Depression Trend Analysis of Korean on Social Media 딥러닝 기반 소셜미디어 한글 텍스트 우울 경향 분석