Segmenting Chinese Texts into Words for Semantic Network Analysis

  • Danowski, James A. (Department of Communication, University of Illinois at Chicago)
  • Published: 2017.12.31

Abstract

Unlike most written languages, Chinese does not place spaces between words, so a text must first be segmented into words before semantic network analysis can be conducted. This paper describes how to perform Chinese word segmentation with the Stanford Natural Language Processing Group's Stanford Word Segmenter, version 3.8.0, released in June 2017.
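The abstract does not describe the segmenter's internals. The sketch below illustrates what segmentation does and why it must come before semantic network analysis. It is a minimal sketch under stated assumptions. It swaps in greedy forward maximum matching against a small hypothetical dictionary (`TOY_DICT`), which is not the trained model used by the Stanford Word Segmenter. It then counts word pairs that co-occur within a two-word window as network links, which is an illustrative assumption rather than the paper's own procedure. The names `max_match` and `word_pairs` are hypothetical.

```python
from collections import Counter

# Hypothetical toy dictionary (assumption for illustration only); the
# Stanford Word Segmenter relies on a trained statistical model instead.
TOY_DICT = {"北京", "大学", "北京大学", "学生", "喜欢", "自然", "语言", "自然语言", "处理"}


def max_match(text: str, dictionary: set[str], max_len: int = 4) -> list[str]:
    """Greedy forward maximum matching: at each position, take the longest
    dictionary word that starts there, falling back to a single character."""
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


def word_pairs(tokens: list[str], window: int = 2) -> Counter:
    """Count ordered word pairs whose members occur within `window`
    positions of each other; each pair is a candidate network link."""
    pairs = Counter()
    for i, left in enumerate(tokens):
        for j in range(i + 1, min(i + 1 + window, len(tokens))):
            pairs[(left, tokens[j])] += 1
    return pairs


tokens = max_match("北京大学学生喜欢自然语言处理", TOY_DICT)
print(" ".join(tokens))  # prints: 北京大学 学生 喜欢 自然语言 处理
links = word_pairs(tokens)
```

Without segmentation, the pair counts would link individual characters such as 北 and 京, not words. The resulting network would then describe character adjacency rather than relations between concepts.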
