http://dx.doi.org/10.5626/JCSE.2012.6.2.143

Topic Classification for Suicidology  

Read, Jonathon (Language Technology Group, Department of Informatics, University of Oslo)
Velldal, Erik (Language Technology Group, Department of Informatics, University of Oslo)
Øvrelid, Lilja (Language Technology Group, Department of Informatics, University of Oslo)
Publication Information
Journal of Computing Science and Engineering / v.6, no.2, 2012, pp. 143-150
Abstract
Computational techniques for topic classification can support qualitative research by automatically applying labels in preparation for qualitative analyses. This paper presents an evaluation of supervised learning techniques applied to one such use case, namely, that of labeling emotions, instructions and information in suicide notes. We train a collection of one-versus-all binary support vector machine classifiers, using cost-sensitive learning to deal with class imbalance. The features investigated range from a simple bag-of-words and n-grams over stems, to information drawn from syntactic dependency analysis and WordNet synonym sets. The experimental results are complemented by an analysis of systematic errors in both the output of our system and the gold-standard annotations.
Keywords
Affect recognition; Sentiment analysis; Skewed class distribution; Text classification
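
The setup summarized in the abstract (one binary classifier per label, with errors on the rare positive class weighted more heavily) can be illustrated with a minimal sketch. This is not the authors' implementation: it is plain Python using whitespace-tokenized bag-of-words features, a linear SVM without a bias term trained by subgradient descent on the hinge loss, and an assumed cost heuristic (number of negatives divided by number of positives). The toy sentences, the label names, and all function names are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def bag_of_words(sentence):
    """Sparse bag-of-words features: token -> count."""
    return Counter(sentence.lower().split())


def score(weights, feats):
    """Linear decision value w . x (no bias term, for brevity)."""
    return sum(weights.get(tok, 0.0) * val for tok, val in feats.items())


def train_binary_svm(data, pos_cost, epochs=100, eta=0.1, lam=0.01, seed=0):
    """Fit a linear SVM by subgradient descent on the L2-regularised hinge loss.

    data: list of (features, y) pairs with y in {+1, -1}.
    pos_cost: factor by which errors on positive examples are up-weighted.
    """
    rng = random.Random(seed)
    data = list(data)
    weights = {}
    for _ in range(epochs):
        rng.shuffle(data)
        for feats, y in data:
            margin = y * score(weights, feats)
            for tok in weights:  # shrinkage from the regulariser
                weights[tok] *= 1.0 - eta * lam
            if margin < 1.0:  # hinge loss is active for this example
                cost = pos_cost if y > 0 else 1.0
                for tok, val in feats.items():
                    weights[tok] = weights.get(tok, 0.0) + eta * cost * y * val
    return weights


def train_one_vs_all(sentences, label_sets):
    """Train one binary classifier per label (one-versus-all)."""
    feats = [bag_of_words(s) for s in sentences]
    models = {}
    for label in sorted(set().union(*label_sets)):
        ys = [1 if label in labels else -1 for labels in label_sets]
        n_pos = ys.count(1)
        n_neg = len(ys) - n_pos
        # Assumed heuristic: cost ratio = negatives / positives.
        models[label] = train_binary_svm(list(zip(feats, ys)), n_neg / n_pos)
    return models


def predict(models, sentence):
    """Return every label whose classifier gives a positive decision value."""
    feats = bag_of_words(sentence)
    return {label for label, w in models.items() if score(w, feats) > 0}


# Hypothetical toy data; a sentence may carry zero, one, or several labels.
sentences = [
    "please give the books to anna",
    "please send the letters to tom",
    "keys are in a drawer",
    "papers are in a desk",
    "i love you all",
    "i love my family",
    "it rained on monday",
]
label_sets = [
    {"instructions"}, {"instructions"},
    {"information"}, {"information"},
    {"love"}, {"love"},
    set(),
]

models = train_one_vs_all(sentences, label_sets)
print(predict(models, "keys are in a desk"))  # {'information'}
```

Because each label gets its own classifier, a sentence can receive several labels or none, which suits annotation schemes where labels are not mutually exclusive. The cost factor only changes how strongly misclassified positives pull on the weights; richer features such as stems, n-grams, dependency relations, or WordNet synsets would replace `bag_of_words` without changing the training loop.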