Analysis of the Status of Natural Language Processing Technology Based on Deep Learning (딥러닝 중심의 자연어 처리 기술 현황 분석)
- The Journal of Bigdata / v.6 no.1 / pp.63-81 / 2021
The performance of natural language processing (NLP) is improving rapidly thanks to recent advances in machine learning and deep learning, and its range of applications is expanding as a result. In particular, as demand for the analysis of unstructured text data grows, interest in NLP is growing as well. However, the complexity of natural language preprocessing and of machine learning and deep learning theory still poses a high barrier to using NLP. To give an overall understanding of NLP, this paper examines the main fields of NLP under active research and the current state of the major technologies, centered on machine learning and deep learning, with the aim of providing a foundation for understanding and using NLP more easily. We first investigate how the place of NLP within artificial intelligence (AI) has changed by tracing changes in the taxonomy of AI technology. We then explain the main areas of NLP (language models, text classification, text generation, document summarization, question answering, and machine translation) together with state-of-the-art deep learning models. In addition, we explain the major deep learning models used in NLP and summarize the data sets and evaluation measures used for performance evaluation. We hope that researchers who want to apply NLP in their own fields will be able to understand its overall technical status and main technologies through this paper.
In Environmental Impact Assessment (EIA) documents, the addition and evaluation of health impact items is written up under the hygiene and public health items, and reviewed, only for specific development projects. Since the evaluation manual on the addition and evaluation of health impact items was published in 2011, there has been continued demand for methodological improvements despite partial progress. To propose methodological improvements to the evaluation manual, this technical paper therefore identified detailed improvement requirements from the consultation opinions on hygiene and public health items, and investigated and suggested ways to address them by reviewing the research to date. Improvement requirements related to mitigation plan, post management, effect prediction, assessment, and present-condition investigation appeared in EIA documents for development projects as a whole at frequencies of 93%, 85%, 80%, 74%, and 67%, respectively. In particular, the detailed improvement requirements related to the mitigation plan concerned the direction of its establishment and the management of the development project. Considering the current evaluation manual and the frequency of the improvement requirements, this paper proposed concrete methods or improvement plans for the major methodologies in each classification of hygiene and public health items. Furthermore, it proposed a comprehensive evaluation methodology for deciding whether a project should be implemented, which the current assessment manual does not provide.
As the 4th industrial revolution has accelerated in the maritime industry, both the technical development of maritime autonomous surface ships (MASS) and the development of international regulations have accelerated. In particular, the IMO Maritime Safety Committee (MSC) established a road-map for the development of a non-mandatory goal-based MASS instrument (MASS Code) and started developing the non-mandatory MASS Code at its 105th meeting. Many countries are actively participating in the Correspondence Group on the development of the MASS Code, and detailed requirements for MASS functions in the MASS Code are being developed. Notably, the concept of "Mode of Operation" for MASS functions was mentioned in the Correspondence Group for the first time, and discussions on these modes are expected to begin at the IMO MASS JWG meeting to be held in April 2023. The concept of "Mode of Operation" will be useful in explaining MASS and MASS functions and will continue to be discussed during the development of the MASS Code. This paper reviews the contents of the IMCA M 220 document, which provides guidelines on operating modes, as a benchmark for setting the operating modes of MASS.
The vast volumes of data that are generated during site characterization and associated research for the disposal of high-level radioactive waste require effective data management to properly chronicle and archive this information. The Swedish Nuclear Fuel and Waste Management Company, SKB, established the SICADA database for site selection, evaluation, analysis, and modeling. The German Federal Company for Radioactive Waste Disposal, BGE, established ArbeitsDB, a database and document management system, and the ELO data system to manage data collected according to the Repository Site Selection Act. The U.K. Nuclear Waste Services established the Data Management System to manage any research and survey data pertaining to nuclear waste storage and disposal. The U.S. Department of Energy and Office of Civilian Radioactive Waste Management established the Technical Data Management System for data management and subsequent licensing procedures during site characterization surveys. The presented cases undertaken by these national agencies highlight the importance of data quality management and the scalability of data utilization to ensure effective data management. Korea should also pursue the establishment of both a data management concept for radioactive waste disposal that considers data quality management and scalability from a long-term perspective and an associated data management system.
From January 2020 to October 2021, more than 500,000 academic studies related to COVID-19 (the disease caused by severe acute respiratory syndrome coronavirus 2, SARS-CoV-2) were published. This rapid growth in COVID-19 literature places time and technical constraints on healthcare professionals and policy makers who need to find important research quickly. In this study, we therefore propose a method for extracting useful information from the text of an extensive body of literature using the LDA and Word2vec algorithms. Papers matching the search keywords were extracted from the COVID-19 literature, and their detailed topics were identified. The data came from the CORD-19 data set on Kaggle, a free, weekly updated academic resource prepared by major research groups and the White House in response to the COVID-19 pandemic. The research method has two main parts. First, 41,062 articles were retained after filtering and pre-processing the abstracts of 47,110 academic papers that included full text. For this purpose, the number of COVID-19 publications per year was analyzed through exploratory data analysis in Python, and the top 10 journals with active research were identified. The LDA and Word2vec algorithms were used to derive COVID-19 research topics, and similarity was measured after analyzing related words. Second, papers containing 'vaccine' and 'treatment' were extracted from among the topics derived from all papers, yielding a total of 4,555 papers related to 'vaccine' and 5,971 papers related to 'treatment'. For each collected paper, detailed topics were analyzed using the LDA and Word2vec algorithms, and a clustering method based on PCA dimension reduction was applied to visualize groups of papers with similar themes using the t-SNE algorithm. A noteworthy point from the results of this study is that the topics that were not derived from the topics derived for all papers being researched in relation to COVID-19 (