Text Mining Analysis on the Research Field of the Coastal and Ocean Engineering Based on the SCOPUS Bibliographic Information

해안해양공학 연구 분야의 SCOPUS 서지정보 Text Mining 분석

  • Lee, Gi Seop (Ocean Data Science Section, Korea Institute of Ocean Science & Technology) ;
  • Cho, Hong Yeon (Ocean Data Science Section, Korea Institute of Ocean Science & Technology) ;
  • Han, Jae Rim (Ocean Data Science Section, Korea Institute of Ocean Science & Technology)
  • 이기섭 (한국해양과학기술원 해양자료실) ;
  • 조홍연 (한국해양과학기술원 해양자료실) ;
  • 한재림 (한국해양과학기술원 해양자료실)
  • Received : 2018.01.31
  • Accepted : 2018.02.18
  • Published : 2018.02.28

Abstract

The development and computerization of bibliometrics have led to the accumulation of a vast number of research papers. As a result, it has become practically impossible to review every related paper published worldwide, and correspondingly harder to set and pursue a research direction. Advances in natural language processing, however, have made it easier to analyze trends in published research. In this study, a text mining analysis of the SCOPUS database bibliographic information for the field of coastal and ocean engineering was carried out using the statistical computing language R. As expected, the term 'wave' was overwhelmingly predominant, and the prevalence of the terms 'numerical model', 'numerical simulation', and 'experimental study' confirmed that numerical analysis and hydraulic experiments remain the dominant approaches. In addition, the term 'wave energy', which relates to marine energy, has become prominent in recent publications. The frequencies of and connections between research-topic terms were also quantified, confirming that links from 'wave' to 'height' and 'energy' prevail, and the results suggest that higher-resolution analyses by subfield and time period are feasible in future work.

서지정보학의 발달 및 전산화로 방대한 양의 연구논문들이 축적되고 있다. 이에 따라 전 세계에서 출판되는 관련 분야 논문들을 모두 검토하기는 실질적으로 어려워졌으며, 연구방향을 잡고 추진하는 것도 어려워졌다. 그러나 자연어 처리기법의 발달로 인해 출판된 연구논문들의 경향 분석이 수월해졌다. 여기서는 해안 해양공학 분야의 SCOPUS DB(Data Base) 서지정보 텍스트 마이닝(Text Mining) 분석을 R언어를 이용하여 수행했다. 분석 결과, 예상한 바와 같이 'wave' 용어가 압도적으로 우세하였으며, 'numerical model', 'numerical simulation' 및 'experimental study' 용어로부터 여전히 수치해석 및 수리실험의 우세가 확인되었다. 또한 최근 해양에너지와 관련되는 'wave energy' 용어 사용이 부각되고 있는 것으로 파악되었다. 한편, 해안 해양공학 분야의 연구주제 용어의 빈도와 연결 관계는 'wave -> height, energy' 우세를 정량적으로 확인할 수 있었으며, 향후 세부분야 및 시기별 고해상도 분석 가능성을 제시하였다.
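
The kind of term-frequency and term-connection counting described in the abstract can be illustrated with a minimal sketch. The study itself used R; Python is used here purely for illustration, and this is not the authors' implementation. The sample titles, the stop-word list, and the helper names `tokenize`, `term_counts`, and `bigram_counts` are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical stand-in titles; NOT data from the SCOPUS records used in the study.
TITLES = [
    "Numerical model of wave height in shallow water",
    "Experimental study of wave energy converters",
    "Numerical simulation of wave height near breakwaters",
]

# Tiny illustrative stop-word list (an assumption, not the study's list).
STOP_WORDS = {"of", "in", "the", "a", "an", "and", "near", "on", "for"}


def tokenize(text):
    """Lowercase the text, keep alphabetic runs, and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def term_counts(docs):
    """Count how often each single term occurs across all documents."""
    counts = Counter()
    for doc in docs:
        counts.update(tokenize(doc))
    return counts


def bigram_counts(docs):
    """Count pairs of terms that are adjacent once stop words are removed."""
    counts = Counter()
    for doc in docs:
        tokens = tokenize(doc)
        counts.update(zip(tokens, tokens[1:]))
    return counts


if __name__ == "__main__":
    print(term_counts(TITLES).most_common(1))   # [('wave', 3)]
    print(bigram_counts(TITLES)[("wave", "height")])  # 2
    print(bigram_counts(TITLES)[("wave", "energy")])  # 1
```

On this toy input, the single-term counts surface 'wave' as the most frequent term, and the bigram counts give a simple measure of how often 'wave' is followed by 'height' or 'energy'. A real analysis would run over the full bibliographic export and would likely also normalize plurals before counting.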
