http://dx.doi.org/10.5351/KJAS.2021.34.2.267

Research trends in statistics for domestic and international journal using paper abstract data  

Yang, Jong-Hoon (Department of Applied Statistics, Chung-Ang University)
Kwak, Il-Youp (Department of Applied Statistics, Chung-Ang University)
Publication Information
The Korean Journal of Applied Statistics / v.34, no.2, 2021, pp. 267-278
Abstract
The volume of data keeps growing across government and business, in Korea and abroad, and research on big data is growing in academia accordingly. Statistics is one of the major disciplines in big data research, and as the number of statistics papers increases, it is worthwhile to examine research trends in statistics using the papers themselves as data. In this study, we analyzed what is being studied using abstract data from statistics papers published in Korea and abroad. Domestic and international research trends were analyzed through the frequency of the papers' keywords, and the relationships between keywords were visualized using word embedding. In addition to the author-selected keywords, words identified by TextRank as important in statistics papers were also visualized. Lastly, 10 topics were extracted by applying latent Dirichlet allocation (LDA) to the abstract data. By analyzing each topic, we investigated which research topics are studied frequently and which words figure prominently in them.
Keywords
text mining; word embedding; topic modeling;
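
The keyword-frequency and TextRank steps mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy abstracts, the stopword list, the window size, and the function names `tokenize`, `keyword_frequency`, and `textrank` are all assumptions made for illustration, and the ranking is a simplified, unweighted form of TextRank (PageRank on a word co-occurrence graph).

```python
import re
from collections import Counter, defaultdict

# Hypothetical toy abstracts; the paper's actual corpus is not reproduced here.
ABSTRACTS = [
    "Bayesian regression model for survival data",
    "Deep learning model for image data",
    "Bayesian variable selection in regression model",
]

# Assumed minimal stopword list, for illustration only.
STOPWORDS = {"for", "in", "the", "of", "and", "a"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def keyword_frequency(docs):
    """Count how often each token appears across all documents."""
    counts = Counter()
    for doc in docs:
        counts.update(tokenize(doc))
    return counts


def textrank(docs, window=2, damping=0.85, iterations=50):
    """Score words by PageRank on an undirected co-occurrence graph.

    Two words are linked when they appear within `window` tokens of each
    other in the same document. Edges are unweighted in this sketch.
    """
    neighbors = defaultdict(set)
    for doc in docs:
        tokens = tokenize(doc)
        for i, word in enumerate(tokens):
            for other in tokens[i + 1 : i + 1 + window]:
                if other != word:
                    neighbors[word].add(other)
                    neighbors[other].add(word)

    scores = {w: 1.0 for w in neighbors}
    for _ in range(iterations):
        # Each word receives an equal share of every neighbor's score.
        scores = {
            w: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[w])
            for w in neighbors
        }
    return scores


freq = keyword_frequency(ABSTRACTS)
ranks = textrank(ABSTRACTS)
top_word = max(ranks, key=ranks.get)
```

On this toy corpus, `freq.most_common(1)` gives the most frequent token and `top_word` gives the highest-ranked one. With real abstracts the two lists can differ, since TextRank rewards words that co-occur with many other words rather than words that are merely frequent.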