Discovering Meaningful Trends in the Inaugural Addresses of United States Presidents Via Text Mining

A Study on Trends in the Inaugural Addresses of U.S. Presidents Using Text Mining (translated Korean title)

  • Cho, Su Gon (Department of Industrial Management Engineering, Korea University)
  • Cho, Jaehee (Business School, Kwangwoon University)
  • Kim, Seoung Bum (Department of Industrial Management Engineering, Korea University)
  • Received : 2015.02.25
  • Accepted : 2015.05.11
  • Published : 2015.10.15

Abstract

Identifying meaningful patterns and trends in large volumes of text data is an important task in many research areas. In the present study, we propose a procedure for finding meaningful trends that combines text mining, cluster analysis, and low-dimensional embedding. To demonstrate the applicability and effectiveness of the proposed procedure, we analyzed the inaugural addresses of the presidents of the United States from 1789 to 2009. The main results show that trends in the national policy agenda can be discovered using clustering and visualization algorithms.
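The abstract describes the procedure only at a high level. As a minimal sketch of its first two stages (text mining and cluster analysis), the Python below turns a few short texts into TF-IDF vectors and groups them with average-linkage hierarchical clustering. Every specific choice here is an assumption, not the authors' method: the tiny stopword list, the absence of stemming, smoothed TF-IDF weighting, cosine distance, average linkage, and the four hypothetical documents in `TOY_DOCS`, which stand in for real addresses. The low-dimensional embedding stage (the reference list points to locally linear embedding) is left out to keep the example free of dependencies.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list, kept tiny for illustration (an assumption).
STOPWORDS = {"the", "of", "and", "to", "a", "in", "our", "we", "is", "for", "will"}

# Hypothetical toy stand-ins for speeches; NOT the actual inaugural addresses.
TOY_DOCS = [
    "We shall defend the constitution and the union of the states.",
    "The union and the constitution must be preserved among the states.",
    "Our economy needs jobs, growth, and new technology for workers.",
    "Growth in jobs and technology will strengthen the economy.",
]


def tokenize(text):
    """Lowercase, keep alphabetic runs, and drop stopwords (no stemming here)."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    token_lists = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: how many documents contain each term.
    df = Counter(term for tokens in token_lists for term in set(tokens))
    vectors = []
    for tokens in token_lists:
        tf = Counter(tokens)
        # Smoothed IDF, so a term present in every document keeps a nonzero weight.
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def average_linkage(vectors, k):
    """Agglomerative clustering: repeatedly merge the two clusters with the
    smallest mean pairwise cosine distance until k clusters remain."""
    def dist(i, j):
        return 1.0 - cosine(vectors[i], vectors[j])

    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = sum(dist(i, j) for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters[b]  # b > a, so deleting b leaves index a intact
        del clusters[b]
    return sorted(sorted(c) for c in clusters)


def top_terms(vectors, members, n=3):
    """Rank terms by their summed TF-IDF weight within one cluster."""
    total = Counter()
    for i in members:
        total.update(vectors[i])
    return [t for t, _ in total.most_common(n)]


if __name__ == "__main__":
    vecs = tfidf_vectors(TOY_DOCS)
    for members in average_linkage(vecs, k=2):
        print(members, top_terms(vecs, members))
```

On the toy texts, `average_linkage(tfidf_vectors(TOY_DOCS), k=2)` separates documents 0 and 1 (constitution and union vocabulary) from documents 2 and 3 (economy and jobs vocabulary); `top_terms` then lists the words that characterize each group, which is the kind of output one would inspect to read a policy theme off a cluster.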

References

  1. Aggarwal, C. C. and Zhai, C. (2012), Mining Text Data, Springer.
  2. Akimoto, M. (2010), Language Change and Variation from Old English to Late Modern English, Peter Lang, New York, USA.
  3. Bird, S. (2006), NLTK: The natural language toolkit, In Proceedings of the COLING/ACL on Interactive Presentation Sessions, 69-72.
  4. Chakraborty, G., Pagolu, M., and Garla, S. (2013), Text Mining and Analysis: Practical Methods, Examples, and Case Studies Using SAS, SAS Institute.
  5. Chen, Y. T. and Chen, M. C. (2011), Using chi-square statistics to measure similarities for text categorization, Expert Systems with Applications, 38, 3085-3090. https://doi.org/10.1016/j.eswa.2010.08.100
  6. Cho, S. G. and Kim, S. B. (2012), Finding Meaningful Pattern of Key Words in IIE Transactions Using Text Mining, Journal of the Korean Institute of Industrial Engineers, 38(1), 67-73. https://doi.org/10.7232/JKIIE.2012.38.1.067
  7. Cho, G. H., Lim, S. Y., and Hur, S. (2014), An Analysis of the Research Methodologies and Techniques in the Industrial Engineering Using Text Mining, Journal of the Korean Institute of Industrial Engineers, 40(1), 52-59. https://doi.org/10.7232/JKIIE.2014.40.1.052
  8. Paice, C. D. (1990), Another Stemmer, ACM SIGIR Forum, 24(3), 56-61. https://doi.org/10.1145/101306.101310
  9. Gillani, S. A. and Ko, A. (2014), Process-based knowledge extraction in a public authority: A text mining approach, In Electronic Government and the Information Systems Perspective, 91-103.
  10. Gordon, A. D. (1999), Classification, Chapman and Hall, New York, USA.
  11. Hartigan, J. A. (1975), Clustering Algorithms, John Wiley and Sons, New York, USA.
  12. Hu, X. and Liu, H. (2012), Text analytics in social media, In Mining Text Data, 385-414.
  13. Huang, A. (2008), Similarity measures for text document clustering, Proceedings of the Sixth New Zealand Computer Science Research Student Conference, 49-56.
  14. Hung, J. L. and Zhang, K. (2012), Examining mobile learning trends 2003-2008: A categorical meta-trend analysis using text mining techniques, Journal of Computing in Higher Education, 24(1), 1-17. https://doi.org/10.1007/s12528-011-9044-9
  15. Jain, A. K. and Dubes, R. C. (1988), Algorithms for clustering data, Prentice-Hall, Inc.
  16. Jivani, A. G. (2011), A comparative study of stemming algorithms, Int. J. Comp. Tech. Appl, 2(6), 1930-1938.
  17. Bamford, J., Cavalieri, S., and Diani, G. (2013), Variation and Change in Spoken and Written Discourse: Perspectives from Corpus Linguistics, John Benjamins Publishing Company, Philadelphia, USA.
  18. Kam, J. S., Kim, M. W., and Hyun, B. H. (2013), A Study on Analysis of Patent Information Based Biotechnology Research Trend and Promising Research Themes, The Korea Society for Innovation Management and Economics, 21(2), 25-56.
  19. Kim, H. Y. (2013), Analysis of an Inaugural Address of Korean Presidents Based on Network, Korea Content Association, 3(2), 67-68.
  20. Kim, H. Y., Kim, H. G., and Kang, B. M. (2012), A Trend Analysis of Cultural Consumption Based on Newspaper Texts, Journal of KIISE: Software and Applications, 39(3), 244-251.
  21. Kim, H. (2014), A Study on Presidential Leadership and Policy Agenda Setting Pattern: A Content Analysis of Korean Presidential Addresses, Journal of Korean Politics, 23(2), 77-102.
  22. Kim, M. and Koo, P. (2013), A Study on Big Data Based Investment Strategy Using Internet Search Trends, Journal of the Korean Operations Research and Management Science Society, 38(4), 53-64. https://doi.org/10.7737/JKORMS.2013.38.4.053
  23. Kim, M., Notkin, D., Grossman, D., and Wilson, G. (2013), Identifying and summarizing systematic code changes via rule inference, IEEE Transactions on Software Engineering, 39, 45-62. https://doi.org/10.1109/TSE.2012.16
  24. Kim, Y., Tian, Y., Jeong, Y., Ryu, J., and Myaeng, S. H. (2009), Automatic discovery of technology trends from patent text, Proceedings of the 2009 ACM Symposium on Applied Computing, 1480-1487.
  25. Lee, Y. J., Seo, J. H., and Choi, J. T. (2014), Fashion Trend Marketing Prediction Analysis Based on Opinion Mining Applying SNS Text Contents, The Journal of Korean Institute of Information Technology, 12(12), 163-170.
  26. Lim, E. T. (2002), Five trends in presidential rhetoric: An analysis of rhetoric from George Washington to Bill Clinton, Presidential Studies Quarterly, 32(2), 328-348. https://doi.org/10.1111/j.0360-4918.2002.00223.x
  27. Liu, B. (2012), Sentiment analysis and opinion mining, Synthesis Lectures on Human Language Technologies, 5(1), 1-167. https://doi.org/10.2200/S00416ED1V01Y201204HLT016
  28. Lovins, J. B. (1968), Development of a stemming algorithm, MIT Information Processing Group, Electronic Systems Laboratory.
  29. Min, K. Y., Kim, H. T., and Ji, Y. G. (2014), A Pilot Study on Applying Text Mining Tools to Analyzing Steel Industry Trends: A Case Study of the Steel Industry for the Company "P", Society for e-Business Studies, 19(3), 51-64.
  30. Pai, M. Y., Chen, M. Y., Chu, H. C., and Chen, Y. M. (2013), Development of a semantic-based content mapping mechanism for information retrieval, Expert Systems with Applications, 40, 2447-2461. https://doi.org/10.1016/j.eswa.2012.10.056
  31. Park, H., Seo, W., Coh, B., Lee, J. and Yoon, J. (2014), Technology Opportunity Discovery Based on Firms' Technologies and Products, Journal of the Korean Institute of Industrial Engineers, 40(5), 442-450. https://doi.org/10.7232/JKIIE.2014.40.5.442
  32. Porter, M. (2001), Snowball: A language for stemming algorithms, http://snowball.tartarus.org/texts/introduction.html.
  33. Porter, M. F. (1980), An algorithm for suffix stripping, Program: Electronic Library and Information Systems, 14(3), 130-137. https://doi.org/10.1108/eb046814
  34. Pramokchon, P. and Piamsa-nga, P. (2014), A feature score for classifying class-imbalanced data, In Computer Science and Engineering Conference (ICSEC), 409-414.
  35. Rajaraman, A. and Ullman, J. D. (2011), Mining of massive datasets, Cambridge University Press.
  36. Rebholz-Schuhmann, D., Kirsch, H., and Couto, F. (2005), Facts from text: Is text mining ready to deliver?, PLoS Biology, 3(2), e65. https://doi.org/10.1371/journal.pbio.0030065
  37. Rousseeuw, P. J. (1987), Silhouettes: A graphical aid to the interpretation and validation of cluster analysis, Journal of Computational and Applied Mathematics, 20, 53-65. https://doi.org/10.1016/0377-0427(87)90125-7
  38. Roweis, S. T. and Saul, L. K. (2000), Nonlinear Dimensionality Reduction by Locally Linear Embedding, Science, 290(5500), 2323-2326.
  39. Saul, L. K. and Roweis, S. T. (2000), An Introduction to Locally Linear Embedding, http://cs.nyu.edu/~roweis/lle/publications.html.
  40. Zhang, J., Kawai, Y., and Kumamoto, T. (2010), A Flexible Re-ranking System Based on Sub-keyword Extraction and Importance Adjustment, IAENG International Journal of Computer Science, 37(3), 1-8.

Cited by

  1. An Analysis of Causes of Marine Incidents at Sea Using Big Data Technique, vol. 24, no. 4, 2018, https://doi.org/10.7837/kosomes.2018.24.4.408