
Aspect-Based Sentiment Analysis Using BERT: Developing Aspect Category Sentiment Classification Models (BERT를 활용한 속성기반 감성분석: 속성카테고리 감성분류 모델 개발)

  • Park, Hyun-jung;Shin, Kyung-shik
    • Journal of Intelligence and Information Systems / v.26 no.4 / pp.1-25 / 2020
  • Sentiment Analysis (SA) is a Natural Language Processing (NLP) task that analyzes, from written text, the sentiments that consumers or the public feel about an arbitrary object. Aspect-Based Sentiment Analysis (ABSA) is a finer-grained analysis of the sentiment towards each aspect of an object. Because it has more practical business value, ABSA is drawing attention from both academia and industry. Given a review that says "The restaurant is expensive but the food is really fantastic", for example, general SA evaluates the overall sentiment towards the 'restaurant' as 'positive', while ABSA identifies the restaurant's 'price' aspect as 'negative' and its 'food' aspect as 'positive'. ABSA thus enables a more specific and effective marketing strategy. To perform ABSA, it is necessary to identify the aspect terms or aspect categories included in the text and to judge the sentiments towards them. Accordingly, ABSA has four main areas: aspect term extraction, aspect category detection, Aspect Term Sentiment Classification (ATSC), and Aspect Category Sentiment Classification (ACSC). ABSA is usually conducted by extracting aspect terms and then performing ATSC on them, or by extracting aspect categories and then performing ACSC on them. An aspect category is expressed by one or more aspect terms, or inferred indirectly from other words. In the preceding example sentence, 'price' and 'food' are both aspect categories, and the aspect category 'food' is expressed by the aspect term 'food' in the review. If the review sentence included 'pasta', 'steak', or 'grilled chicken special', these could all be aspect terms for the aspect category 'food'. An aspect category referred to by one or more specific aspect terms is called an explicit aspect.

    On the other hand, an aspect category like 'price', which has no specific aspect term but can be inferred indirectly from the sentiment word 'expensive', is called an implicit aspect. Up to this point, 'aspect category' has been used to avoid confusion with 'aspect term'. From here on, we treat 'aspect category' and 'aspect' as the same concept and mostly use 'aspect' for convenience. Note that ATSC analyzes the sentiment towards given aspect terms, so it deals only with explicit aspects, whereas ACSC handles both explicit and implicit aspects.

    This study seeks answers to the following questions, which previous studies overlooked when applying the BERT pre-trained language model to ACSC, and derives superior ACSC models. First, is it more effective to reflect the output vectors of the aspect category tokens than to use only the final output vector of the [CLS] token as the classification vector? Second, is there any performance difference between the QA (Question Answering) and NLI (Natural Language Inference) types of sentence-pair input? Third, does performance differ according to the position of the sentence that includes the aspect category in the QA or NLI type sentence-pair input? To achieve these research objectives, we implemented 12 ACSC models and conducted experiments on 4 English benchmark datasets.

    As a result, we derived ACSC models that outperform existing studies without expanding the training dataset. We also found that reflecting the output vector of the aspect category token is more effective than using only the output vector of the [CLS] token as the classification vector. QA type input generally performs better than NLI type input, and in the QA type the position of the sentence containing the aspect category does not affect performance. There may be some differences depending on the characteristics of the dataset, but with NLI type sentence-pair input, placing the sentence containing the aspect category second seems to perform better. The new methodology used in this study for designing ACSC models could similarly be applied to other tasks such as ATSC.
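
    The sentence-pair configurations compared in this abstract can be illustrated with a small sketch. It is not the authors' code: the function names `build_pair` and `to_bert_text` are hypothetical, and the QA question wording and the NLI pseudo-sentence (the bare aspect name) are assumed templates, since the abstract does not state the exact wording. The sketch only shows how the input type (QA or NLI) and the position of the aspect sentence change the text fed to BERT.

```python
from typing import Tuple


def build_pair(review: str, aspect: str, style: str = "QA",
               aspect_first: bool = False) -> Tuple[str, str]:
    """Return (sentence_a, sentence_b) for a BERT sentence-pair input."""
    if style == "QA":
        # Assumed question template; the abstract does not give the wording.
        aspect_sentence = f"what do you think of the {aspect} ?"
    elif style == "NLI":
        # Assumed pseudo-sentence: just the aspect category name.
        aspect_sentence = aspect
    else:
        raise ValueError(f"unknown style: {style!r}")
    # The third research question concerns this ordering choice.
    if aspect_first:
        return aspect_sentence, review
    return review, aspect_sentence


def to_bert_text(pair: Tuple[str, str]) -> str:
    """Render a pair in the usual '[CLS] A [SEP] B [SEP]' layout."""
    sentence_a, sentence_b = pair
    return f"[CLS] {sentence_a} [SEP] {sentence_b} [SEP]"


review = "The restaurant is expensive but the food is really fantastic"
for style in ("QA", "NLI"):
    for aspect_first in (False, True):
        print(style, aspect_first,
              to_bert_text(build_pair(review, "price", style, aspect_first)))
```

    In practice a tokenizer would add the special tokens itself; the string form here is only meant to make the four input variants visible side by side.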

A study on the classification of research topics based on COVID-19 academic research using Topic modeling (토픽모델링을 활용한 COVID-19 학술 연구 기반 연구 주제 분류에 관한 연구)

  • Yoo, So-yeon;Lim, Gyoo-gun
    • Journal of Intelligence and Information Systems / v.28 no.1 / pp.155-174 / 2022
  • From January 2020 to October 2021, more than 500,000 academic studies related to COVID-19 (the disease caused by SARS-CoV-2, severe acute respiratory syndrome coronavirus 2) were published. The rapid increase in the number of COVID-19 papers makes it difficult, in both time and technical terms, for healthcare professionals and policy makers to find important research quickly. In this study, we therefore propose a method of extracting useful information from the text of an extensive body of literature using the LDA and Word2vec algorithms. Papers related to the search keywords were extracted from the COVID-19 papers, and detailed topics were identified. The data came from the CORD-19 dataset on Kaggle, a free academic resource prepared by major research groups and the White House to respond to the COVID-19 pandemic and updated weekly.

    The research method has two main parts. First, 41,062 articles were collected by filtering and pre-processing the abstracts of 47,110 academic papers that included full text. The number of COVID-19 publications by year was analyzed through exploratory data analysis in Python, and the top 10 journals with active research were identified. LDA and Word2vec were used to derive research topics related to COVID-19, and after related words were analyzed, their similarity was measured. Second, papers containing 'vaccine' and 'treatment' were extracted from among the topics derived from all papers, yielding 4,555 papers related to 'vaccine' and 5,971 papers related to 'treatment'. For each collected set of papers, detailed topics were analyzed using LDA and Word2vec, and clustering after PCA dimension reduction was applied, with the t-SNE algorithm used to visualize groups of papers with similar themes.

    A noteworthy result is that topics that did not appear when topic modeling was applied to all COVID-19 papers did emerge in the topic modeling results for each research topic. For example, topic modeling of the papers related to 'vaccine' extracted a new topic, Topic 05 'neutralizing antibodies'. A neutralizing antibody is an antibody that protects cells from infection when a virus enters the body, and it is said to play an important role in the production of therapeutic agents and in vaccine development. In addition, topic extraction from the papers related to 'treatment' revealed a new topic, Topic 05 'cytokine'. A cytokine storm occurs when the body's immune cells, instead of defending against an attack, attack normal cells. Papers were classified by keyword and topic modeling was performed on each group to find detailed topics, including hidden topics that could not be found across the entire set of papers.

    In this study, we proposed a method of extracting topics from a large body of literature using the LDA algorithm and extracting similar words using Skip-gram, the Word2vec model that predicts surrounding words from a central word. Combining the LDA model with the Word2vec model was intended to improve performance by capturing both the relationship between documents and LDA topics and the relationships among words learned by Word2vec. In addition, we presented a method for intuitively grouping documents: clustering after PCA dimension reduction, with the t-SNE technique used to group documents with similar themes into a structured organization. In a situation where the efforts of many researchers to overcome COVID-19 cannot keep up with the rapid publication of COVID-19 papers, we hope this method will save healthcare professionals and policy makers precious time and effort and help them gain new insights quickly. It is also expected to serve as basic data for researchers exploring new research directions.
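
    As a rough illustration of the Skip-gram idea mentioned in this abstract, the sketch below generates the (center word, context word) training pairs that a Skip-gram model learns from. It is not the authors' pipeline: the function name `skipgram_pairs`, the window size, and the toy sentence are assumptions made for the example, and no actual model is trained.

```python
from typing import List, Tuple


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """List (center, context) pairs for every token within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)                # clip the window at the left edge
        stop = min(len(tokens), i + window + 1)   # clip the window at the right edge
        for j in range(start, stop):
            if j != i:                            # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


# Toy sentence (an assumption, not taken from CORD-19).
tokens = ["neutralizing", "antibodies", "block", "infection"]
for center, context in skipgram_pairs(tokens, window=1):
    print(center, "->", context)
```

    A Skip-gram model is trained so that each center word predicts its context words; words that share many contexts end up with similar vectors, which is what makes a similarity measurement between related words possible.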

  • A Study on Rationalization of National Forest Management in Korea (국유림경영(國有林經營)의 합리화(合理化)에 관(關)한 연구(硏究))

    • Choi, Kyu-Ryun
      • Journal of Korean Society of Forest Science / v.20 no.1 / pp.1-44 / 1973
    • Needless to say, the management of national forest is very important in every country in view of the national mission and management purposes. Korean national forest is also particularly significant in promoting the national economy, given the continuously increasing demand for wood, land conservation, and social welfare. But there is no denying that the leading aim of Korean forest policy has been the conservation of forest resources and the recovery of the land conservation function rather than the improvement of forest productive capacity. Therefore, national forest should be managed as an industry within the chain of the Korean national economy, and an increase in forest productive capacity based on rationalized forest management is urgently needed. Increasing timber production and establishing forests that are good in both quality and quantity will naturally bring many conservation functions and other public benefits.

      Korean national forest was first established in 1908 as a result of the notification of ownership, and in 1911-1924 it was divided into two kinds: indisposable national forest for land conservation, forest management, scientific research, and public welfare, and other national forest to be disposed of. Indisposable forest is mostly under the jurisdiction of the national forest stations (Chungbu, Tongbu, Nambu), and the other national forests are in the custody of the respective cities and provinces or of other government authorities. As of the end of 1971, national forest land was 19.5% (1,297,708 ha) of the total forest land area, but its growing stock was 50.1% ($35,406,079m^3$) of the total forest growing stock, and its timber production was 23.6% ($205,959m^3$) of the annual total timber production in Korea. National forest thus occupies a major part of Korean forestry.

      The author firmly maintains that the success or failure of national forest management determines the rise or fall of forestry in Korea. All functions of forest are important, but timber production is the most important in Korea, which unavoidably imports a large quantity of foreign wood every year (in 1971, $3,756,000m^3$ of foreign wood worth 160,995,000 dollars). Korea therefore urgently needs to improve forest productive capacity in national forest. But because the production period of forestry is long, it is difficult for wood production to meet the rapid increase in demand for wood that accompanies economic development, so national forest management should be rationalized through effective investment and the development of forestry techniques with a long-term view. Korean national forest business faces many difficulties: budget, techniques, a lack of labour due to the outflow of rural labour as the national economy develops, and rising wages and administrative expenses. Even so, the development of national forest depends on adopting forest techniques and management suited to social and economic development.

      From this viewpoint, the writer investigated and analyzed the status of national forest management in Korea to examine its irrational aspects and suggest an improvement plan. The national forestry statistics cited in this study are based on the basic statistics and the forest business statistics as of the end of 1971 published by the Office of Forestry, Republic of Korea; other figures are based on data provided by the national forest stations. The writer proposes the following measures, which seem likely to help improve Korean national forest management.

      1) In the organization of national forest management, more national forest stations should be established for intensive management, and the staff of working plan officials should be strengthened because of the importance of the working plan.
      2) By increasing the staff of protection officials, the forest area assigned to each protection official should be decreased to 1,000-2,000 ha.
      3) Frequent personnel changes of the supervisor of a national forest station (the responsible person on the spot) obstruct the accomplishment of a consistent management plan.
      4) In drafting the working plan for national forest, basic investigations should be carried out carefully, with sufficient expenditure and staff, so that unrealistic working plans are not drafted.
      5) The area of a working-unit should be decreased to less than 2,000 ha on average for intensive management, and the principle of a working-unit in a forest station should be realized as soon as possible.
      6) Reforestation of open land should be completed in a short time using debt from the special fund (a long-term loan); land on which hardwood stands are growing should be converted to conifers to increase productivity per unit area; and at the same time technical utilization methods for hardwood should be developed.
      7) Reforestation expenses should be saved through mechanization and the use of chemicals in reforestation and tree nursery operations, in preparation for the future lack of labour.
      8) In forest protection, forest fire damage is enormous in comparison with foreign countries; accordingly, the prevention system and equipment should be improved, and the minimum necessary budget should be allocated for establishing and maintaining fire-lines.
      9) Manufacture production should be enlarged to systematize protection, processing, and circulation of forest business, which will naturally bring much benefit to rural people.
      10) Establishing and arranging forest road networks and erosion control work are indispensable for the future development of national forest itself and for local development. Therefore, these works should be promoted under general accounting instead of special accounting.
      11) Forest work should be mechanized in order to exploit hinterlands to meet the increased demand for timber and to solve the lack of labour; consequently, the import of forest machines, home production, operator training, and careful administration should be promoted.
      12) The labour situation will grow worse in future. Therefore, countermeasures to retain forest labourers should be considered, with attention to public welfare facilities and works.
      13) Although the balance of income and expenditure grows worse because of economic change, regular expenditure should be fixed. So part of the surplus fund as of the end of 1971 should be set aside as a fund and used for enlarging reforestation and forest road networks (advance investment in national forest).
