Aspect-Based Sentiment Analysis Using BERT: Developing Aspect Category Sentiment Classification Models (BERT를 활용한 속성기반 감성분석: 속성카테고리 감성분류 모델 개발)

  • Park, Hyun-jung; Shin, Kyung-shik
    • Journal of Intelligence and Information Systems / v.26 no.4 / pp.1-25 / 2020
  • Sentiment Analysis (SA) is a Natural Language Processing (NLP) task that analyzes, from written text, the sentiments that consumers or the public feel about an arbitrary object. Aspect-Based Sentiment Analysis (ABSA) is a finer-grained analysis of the sentiment towards each aspect of an object. Because it has more practical business value, ABSA is drawing attention from both academia and industry. Given a review that says "The restaurant is expensive but the food is really fantastic", for example, general SA evaluates the overall sentiment towards the 'restaurant' as 'positive', while ABSA identifies the restaurant's 'price' aspect as 'negative' and its 'food' aspect as 'positive'. Thus, ABSA enables a more specific and effective marketing strategy. To perform ABSA, it is necessary to identify which aspect terms or aspect categories appear in the text and to judge the sentiments towards them. Accordingly, ABSA has four main subtasks: aspect term extraction, aspect category detection, Aspect Term Sentiment Classification (ATSC), and Aspect Category Sentiment Classification (ACSC). ABSA is usually conducted by extracting aspect terms and then performing ATSC to analyze the sentiment for the given aspect terms, or by extracting aspect categories and then performing ACSC to analyze the sentiment for the given aspect categories. Here, an aspect category is expressed by one or more aspect terms, or inferred indirectly from other words. In the preceding example sentence, 'price' and 'food' are both aspect categories, and the aspect category 'food' is expressed by the aspect term 'food' included in the review. If the review sentence included 'pasta', 'steak', or 'grilled chicken special', these could all be aspect terms for the aspect category 'food'. An aspect category referred to by one or more specific aspect terms in this way is called an explicit aspect.
On the other hand, an aspect category like 'price', which has no specific aspect term but can be inferred indirectly from a sentiment word such as 'expensive', is called an implicit aspect. So far, the term 'aspect category' has been used to avoid confusion with 'aspect term'. From here on, we treat 'aspect category' and 'aspect' as the same concept and, for convenience, mostly use the word 'aspect'. Note that ATSC analyzes the sentiment towards given aspect terms, so it deals only with explicit aspects, whereas ACSC handles both explicit and implicit aspects. This study addresses the following questions, overlooked in previous studies, that arise when applying the BERT pre-trained language model to ACSC, and derives superior ACSC models. First, is it more effective to reflect the output vectors of the aspect category tokens than to use only the final output vector of the [CLS] token as the classification vector? Second, is there any performance difference between the QA (Question Answering) and NLI (Natural Language Inference) types of sentence-pair input? Third, in the QA or NLI type sentence-pair input, does the position of the sentence containing the aspect category affect performance? To answer these questions, we implemented 12 ACSC models and conducted experiments on 4 English benchmark datasets. As a result, we derived ACSC models that outperform existing studies without expanding the training dataset. In addition, reflecting the output vector of the aspect category token proved more effective than using only the output vector of the [CLS] token as the classification vector. QA type input also generally performed better than NLI type input, and in the QA type the position of the sentence containing the aspect category had no effect on performance.
There may be some differences depending on the characteristics of the dataset, but when using NLI type sentence-pair input, placing the sentence containing the aspect category second seems to provide better performance. The new methodology for designing the ACSC model used in this study could be similarly applied to other studies such as ATSC.
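
The QA and NLI type sentence-pair inputs described in this abstract can be illustrated with a minimal sketch. The abstract does not give the exact templates, so the question wording, the function names `build_pair` and `to_bert_text`, and the `aspect_first` flag are all assumptions made for illustration. The sketch only builds the text that a BERT tokenizer would receive; it is not the authors' implementation.

```python
from typing import Tuple


def build_pair(review: str, aspect: str, style: str = "QA",
               aspect_first: bool = False) -> Tuple[str, str]:
    """Return (sentence_a, sentence_b) for a BERT sentence-pair input.

    style="QA"  wraps the aspect category in a question (assumed template).
    style="NLI" uses the bare aspect category as a pseudo-sentence (assumed).
    aspect_first controls whether the aspect sentence comes first or second.
    """
    if style == "QA":
        auxiliary = f"what do you think of the {aspect} of it ?"
    elif style == "NLI":
        auxiliary = aspect
    else:
        raise ValueError(f"unknown style: {style!r}")
    return (auxiliary, review) if aspect_first else (review, auxiliary)


def to_bert_text(pair: Tuple[str, str]) -> str:
    """Show the pair with the special tokens BERT places around sentence pairs."""
    sentence_a, sentence_b = pair
    return f"[CLS] {sentence_a} [SEP] {sentence_b} [SEP]"


if __name__ == "__main__":
    review = "The restaurant is expensive but the food is really fantastic"
    # NLI type with the aspect category placed second
    print(to_bert_text(build_pair(review, "price", style="NLI")))
    # QA type with the aspect category placed first
    print(to_bert_text(build_pair(review, "food", style="QA", aspect_first=True)))
```

Switching `style` and `aspect_first` gives the four input configurations that the second and third research questions compare. Whether the classifier reads only the [CLS] output or also the aspect token outputs is a separate choice that this sketch does not cover.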

A study on the classification of research topics based on COVID-19 academic research using Topic modeling (토픽모델링을 활용한 COVID-19 학술 연구 기반 연구 주제 분류에 관한 연구)

  • Yoo, So-yeon; Lim, Gyoo-gun
    • Journal of Intelligence and Information Systems / v.28 no.1 / pp.155-174 / 2022
  • From January 2020 to October 2021, more than 500,000 academic studies related to COVID-19 (the disease caused by severe acute respiratory syndrome coronavirus 2) were published. The rapid increase in the number of COVID-19 papers makes it difficult, in terms of both time and technique, for healthcare professionals and policy makers to find important research quickly. Therefore, in this study, we propose a method of extracting useful information from the text of an extensive body of literature using the LDA and Word2vec algorithms. Papers related to the search keywords were extracted from the COVID-19 papers, and their detailed topics were identified. The data came from the CORD-19 data set on Kaggle, a free academic resource that is updated weekly and was prepared by major research groups and the White House to respond to the COVID-19 pandemic. The research method consists of two main parts. First, 41,062 articles were collected through data filtering and pre-processing of the abstracts of 47,110 academic papers with full text. For this purpose, the number of COVID-19 publications by year was analyzed through exploratory data analysis using a Python program, and the top 10 journals under active research were identified. The LDA and Word2vec algorithms were used to derive research topics related to COVID-19, and, after the related words were analyzed, similarity was measured. Second, papers containing 'vaccine' and 'treatment' were extracted from among the topics derived from all papers, yielding a total of 4,555 papers related to 'vaccine' and 5,971 papers related to 'treatment'. For each set of collected papers, detailed topics were analyzed using the LDA and Word2vec algorithms, and a clustering method based on PCA dimension reduction was applied to visualize groups of papers with similar themes using the t-SNE algorithm. A noteworthy result of this study is that topics that did not emerge from topic modeling over all COVID-19 papers did emerge from the topic modeling results for each research topic. For example, topic modeling of the papers related to 'vaccine' extracted a new topic, Topic 05 'neutralizing antibodies'. A neutralizing antibody is an antibody that protects cells from infection when a virus enters the body, and it is said to play an important role in the production of therapeutic agents and in vaccine development. In addition, extracting topics from the papers related to 'treatment' revealed a new topic, Topic 05 'cytokine'. A cytokine storm occurs when the body's immune cells attack normal cells instead of defending against attack. Hidden topics that could not be found across the entire set of papers were thus uncovered by classifying papers according to keywords and performing topic modeling to find detailed topics. In this study, we proposed a method of extracting topics from a large amount of literature using the LDA algorithm and extracting similar words using Skip-gram, the Word2vec model that predicts surrounding words from the central word. Combining the LDA model and the Word2vec model was intended to improve performance by identifying both the relationship between documents and LDA topics and the relationships captured by Word2vec. In addition, as a clustering method based on PCA dimension reduction, we presented a way to classify documents intuitively by using the t-SNE technique to group documents with similar themes into a structured organization. In a situation where the efforts of many researchers to overcome COVID-19 cannot keep up with the rapid publication of related academic papers, we hope this method will save healthcare professionals and policy makers precious time and effort and help them gain new insights quickly. It is also expected to serve as basic data for researchers exploring new research directions.
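
The Skip-gram idea mentioned in this abstract can be sketched in a few lines: each central word is paired with the words inside a window around it, and a Word2vec model is then trained to predict the second element of each pair from the first. The function name `skipgram_pairs`, the window size, and the sample tokens are hypothetical; the abstract does not state the settings the authors used, and this sketch covers only the pair generation, not training.

```python
from typing import List, Tuple


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Generate (center, context) training pairs as used by Skip-gram.

    For every position i, each token within `window` positions of i
    (excluding i itself) is emitted as a context word of tokens[i].
    """
    pairs: List[Tuple[str, str]] = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


if __name__ == "__main__":
    # Hypothetical pre-processed tokens from a 'vaccine' abstract
    tokens = ["neutralizing", "antibodies", "block", "infection"]
    for center, context in skipgram_pairs(tokens, window=1):
        print(center, "->", context)
```

A larger window yields more pairs per word and tends to capture broader topical similarity, which is closer to how the abstract uses Word2vec alongside LDA topics.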

  • A Study on the Waterscape Formation Techniques of China's Suzhou Classical Garden Based on the Water Inlet and Outlet (수구(水口)를 중심으로 분석한 중국 소주고전원림(蘇州古典園林)의 수경관 연출기법)

    • RHO Jaehyun; LYU Yuan
      • Korean Journal of Heritage: History & Science / v.57 no.3 / pp.116-137 / 2024
    • This study quantitatively explored the interrelationship between water features and surrounding waterscape elements through a literature review and an observational study of nine waterscapes of the Suzhou Classical Gardens in Jiangsu Province, China, which are designated as a UNESCO World Heritage Site. The purpose was to understand the objective characteristics of classical Chinese gardens and to find a basis for their differences from Korean gardens. The average area of water space in the Suzhou gardens was 1,680.7㎡, which accounted for 21.3% of the total garden area, with large variation between gardens. Most of the Suzhou gardens use springs and wells as their water sources. The Surging Waves Pavilion uses surface water, and the Retreat & Reflection Garden uses seasonal water as its water source. The water passages in the Suzhou gardens are divided into water inlets and water outlets (water holes). Of these, one type is a water outlet that merely imitates a functioning outlet to induce a visual effect and focuses on the meaning of the water system; it is judged to have been combined with the trend of Suzhou gardens. In addition, it was confirmed that, semantically, the arrangement of the water inlets and outlets in the Suzhou gardens is based on the traditional 'Gamyeo(堪輿) theory'. Meanwhile, there are five methods for bringing water into the Suzhou gardens: Jiginbeop(直引法), Myeonggeobeop(明渠法), the infiltration method(滲透法), Gwandobeop(管道法), and Chakjeongbeop(鑿井法). The Suzhou Classical Gardens mainly apply the infiltration method and the irrigation method to secure water in the garden, which correspond to the water catchment method(集水法) and the water drawing method(引水法) in the Korean classification.
Among the water inflow techniques used in Korean traditional gardens, methods such as the 'suspension waterfall(懸瀑)', the 'flying waterfall(飛瀑)', and 'gushing water(湧出)' were not found, and the Suzhou gardens are believed to rely mainly on the 'hide with dignity(姿逸)' and 'submerged current(潛流)' techniques. As for water outflow, no technique using a Muneomi, which is applied in traditional Korean gardens, was found. Instead, the seal method, the penetration method, and Gwandobeop were used as water outflow techniques. At the inlets and outlets of the Suzhou gardens, the main static water bodies were lakes, swamps, and dams, while the dynamic water bodies were classified into streams, waterfalls, and springs. The water spaces in three gardens reflect a centrifugal, distributed arrangement, and the water spaces in six places reflect a waterscape effect produced by a centripetal, concentrated arrangement. As water space landscape design techniques, 'Gyeok(隔)' and 'Pa(破)' were mainly applied at the inlet, and 'Eom(掩)' and 'Pa(破)' were mainly applied at the outlet. For example, most bridges were built around the inlet, together with sa(榭), heon(軒), gak(閣), pavilion(亭), and corridor(廊) structures, while the outlet was concealed with a stone wall. Thus, a tendency was detected to embody the Suzhou gardens' idea of water management(理水), expressed in the saying "Though made by human hands, it appears as if created by heaven(雖由人作,宛自天開)." Lastly, an analysis of the degree of concealment and exposure in the visual composition of the inlet and outlet confirmed that the water outlet was exposed only at the Eobijeong and Mountain Villa with Embracing Beauty view points of The Surging Waves Pavilion and was hidden at the other view points.
Based on these results, it is judged that water management in classical Chinese gardens barely matched the 'Hyang-Hyang-Ba-Mi-Bob(向向發微法)' technique applied in Korean traditional gardens, which, from the perspective of the left-orientation theory of Feng Shui, "makes water visible as it comes in, but invisible as it goes out."

