A Study on Detective Story Authors' Style Differentiation and Style Structure Based on Text Mining

  • 문석형 (Department of e-Business, Ajou University);
  • 강주영 (Department of e-Business, Ajou University)
  • Received: 2019.06.19
  • Reviewed: 2019.09.17
  • Published: 2019.09.30


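This study was conducted to present, through data analysis, the stylistic differences between Arthur Conan Doyle and Agatha Christie, two authors famous for classic detective fiction, and further to propose an interpretive methodology for stylistic research grounded in text mining. In addition to events and characters, the core elements of detective fiction, the author's grammatical manner of writing was defined as style and analyzed. Two books per author, four in total, were selected, and the text was split into sentences to obtain the data. A sentiment score was assigned to each sentence and sentiment was visualized as the pages progress; topic modeling applied by page made it possible to trace the flow of events in each novel. By constructing a co-occurrence matrix and performing network analysis, changes in the relationships among characters as events unfold could be identified. In addition, all sentences were classified grammatically according to six style types to identify differences in writing between authors and between works. This series of research steps can help readers grasp the context of a whole text on the basis of an understanding of style and, by integrating stylistic studies that were previously conducted separately, can aid understanding of style structure. This understanding can in turn contribute to discovering and specifying style in unstructured data, including online text. As attempts to analyze online text, including new media, in depth are increasing, this work is expected to contribute to more meaningful online text analysis in connection with such studies.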
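The sentence-level sentiment scoring and page-wise aggregation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the sentiment lexicon or how pages were delimited, so the `TOY_LEXICON` values, the regex sentence splitter, and the fixed `sentences_per_page` grouping are all assumptions made for illustration.

```python
import re

# Hypothetical toy lexicon; the study does not name the lexicon it used.
TOY_LEXICON = {"murder": -2, "dead": -2, "fear": -1,
               "calm": 1, "happy": 2, "solved": 2}

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def score_sentence(sentence, lexicon=TOY_LEXICON):
    """Sum the lexicon values of the words found in one sentence."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return sum(lexicon.get(w, 0) for w in words)

def page_sentiment(sentences, sentences_per_page=2, lexicon=TOY_LEXICON):
    """Group sentences into fixed-size pseudo-pages and average their scores."""
    pages = []
    for start in range(0, len(sentences), sentences_per_page):
        chunk = sentences[start:start + sentences_per_page]
        scores = [score_sentence(s, lexicon) for s in chunk]
        pages.append(sum(scores) / len(scores))
    return pages

text = ("The house was calm. Then a murder was discovered! "
        "Fear spread among the guests. At last the case was solved.")
sentences = split_sentences(text)
trajectory = page_sentiment(sentences)
print(trajectory)  # one average sentiment value per pseudo-page
```

Plotting `trajectory` against the page index would give the kind of sentiment curve over page progress that the abstract refers to.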

This study presents the stylistic differences between Arthur Conan Doyle and Agatha Christie, two well-known writers of classical mystery novels, through data analysis, and proposes an interpretive methodology for the study of style based on text mining. We chose mystery novels because the distinctive devices of the classical mystery give the genre strong stylistic characteristics, and we chose Arthur Conan Doyle and Agatha Christie because they are familiar to general readers, which makes the research accessible to people unfamiliar with the field. The primary objective of this study is to identify how these differences appear within the text and to interpret their effects on the reader. Accordingly, in addition to events and characters, which are key elements of mystery novels, the writer's grammatical manner of writing was defined as style and analyzed. Two books were selected for each writer, four in total, and the text was divided into sentences to obtain the data. After a sentiment score was measured and assigned to each sentence, sentiment over the progress of the pages was visualized as a graph, and the flow of events in each novel was identified under eight topics by applying topic modeling page by page. By constructing co-occurrence matrices and performing network analysis, we were able to see visually how relationships between characters changed as events progressed. In addition, all sentences were classified grammatically according to six types of writing style to identify differences between writers and between works. This enabled us to identify not only each author's general grammatical writing style but also the stylistic characteristics the authors exhibit unconsciously, and to interpret the effects of these characteristics on the reader.
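The co-occurrence step described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' procedure: the abstract does not say how characters were detected or which network measures were used, so the substring name matching, the sample sentences, and the weighted-degree measure are all hypothetical choices.

```python
from collections import Counter
from itertools import combinations

def cooccurrence(sentences, characters):
    """Count, for each pair of characters, the sentences that mention both."""
    counts = Counter()
    for sentence in sentences:
        lowered = sentence.lower()
        # Assumption: a character is "present" if the name appears as a substring.
        present = sorted(c for c in characters if c.lower() in lowered)
        for a, b in combinations(present, 2):
            counts[(a, b)] += 1
    return counts

def weighted_degree(counts):
    """Sum edge weights per character, a simple centrality measure."""
    degree = Counter()
    for (a, b), weight in counts.items():
        degree[a] += weight
        degree[b] += weight
    return degree

# Invented sample sentences, used only to exercise the functions.
sentences = [
    "Holmes glanced at Watson.",
    "Watson followed Holmes to the door.",
    "Lestrade greeted Holmes coldly.",
    "Watson said nothing.",
]
characters = ["Holmes", "Watson", "Lestrade"]
edges = cooccurrence(sentences, characters)
degree = weighted_degree(edges)
print(edges)
print(degree.most_common())
```

Running `cooccurrence` separately on successive blocks of pages and comparing the resulting edge weights is one way to observe relationships changing as events progress.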
This series of research steps can help readers understand the context of an entire text on the basis of a defined understanding of style and, by integrating stylistic studies that were previously conducted separately, can aid understanding of style structure. This understanding can also contribute to discovering and specifying the presence of style in unstructured data, including online text. It could help enable more accurate recognition of emotions and delivery of commands on interactive artificial intelligence platforms that currently convert voice into natural language. As attempts to analyze online texts, including new media, from many angles and to discover social phenomena and managerial value are increasing, this study is expected to contribute, in connection with such work, to more meaningful online text analysis and semantic interpretation. However, the data used in this study consist of only two books per author, four in total, which is a limitation because the analysis was not performed on a sufficient quantity of data. Applying stylistic categories developed for Korean text to English text is a further limitation. Restricting the wide range of stylistic characteristics to six types, which narrowed the possible interpretations, is another. In addition, the research analyzed classical mystery novels rather than text in common use today, and it did not compare a wider range of classical mystery writers. Subsequent research will attempt to broaden the range of interpretations by taking into account a wider variety of grammatical systems and stylistic structures, and will apply the approach to the online text in frequent use today to assess its interpretive potential. This is expected to enable the specific structure of style to be interpreted and defined, and to open up a variety of applications.


