• Title/Summary/Keyword: Training system


KNU Korean Sentiment Lexicon: Bi-LSTM-based Method for Building a Korean Sentiment Lexicon (Bi-LSTM 기반의 한국어 감성사전 구축 방안)

  • Park, Sang-Min;Na, Chul-Won;Choi, Min-Seong;Lee, Da-Hee;On, Byung-Won
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.4
    • /
    • pp.219-240
    • /
    • 2018
  • Sentiment analysis, a text mining technique, extracts the subjective content embedded in text documents, and it is now widely used in many fields. For example, data-driven surveys analyze the subjectivity of text posted by users, and market research analyzes users' review posts to quantify the reputation of a target product. The basic method of sentiment analysis uses a sentiment dictionary (or lexicon), a list of sentiment vocabulary with positive, neutral, or negative semantics. In general, the meaning of many sentiment words is likely to differ across domains. For example, the sentiment word 'sad' indicates a negative meaning in many fields but not necessarily in movies. Accurate sentiment analysis therefore requires a sentiment dictionary built for the given domain. However, building such a lexicon is time-consuming, and many sentiment words are missed unless a general-purpose sentiment lexicon is used. To address this problem, several studies have constructed domain-specific sentiment lexicons based on 'OPEN HANGUL' and 'SentiWordNet', which are general-purpose sentiment lexicons. However, OPEN HANGUL is no longer in service, and SentiWordNet does not work well because of language differences that arise when Korean words are converted into English words. These restrictions limit the use of such general-purpose sentiment lexicons as seed data for building a domain-specific sentiment lexicon. In this article, we construct the 'KNU Korean Sentiment Lexicon (KNU-KSL)', a new general-purpose Korean sentiment dictionary that is more advanced than existing general-purpose lexicons. 
The proposed dictionary, a list of domain-independent sentiment words such as 'thank you', 'worthy', and 'impressed', is designed so that a sentiment dictionary for a target domain can be built quickly. Specifically, its sentiment vocabulary is constructed by analyzing the glosses contained in the Standard Korean Language Dictionary (SKLD) through the following procedure: First, we propose a sentiment classification model based on Bidirectional Long Short-Term Memory (Bi-LSTM). Second, the proposed deep learning model automatically classifies each gloss as either positive or negative. Third, positive words and phrases are extracted from the glosses classified as positive, while negative words and phrases are extracted from the glosses classified as negative. Our experimental results show that the average accuracy of the proposed sentiment classification model is up to 89.45%. In addition, the sentiment dictionary is further extended using various external sources, including SentiWordNet, SenticNet, Emotional Verbs, and Sentiment Lexicon 0603. Furthermore, we add sentiment information for frequently used coined words and emoticons that appear mainly on the Web. The KNU-KSL contains a total of 14,843 sentiment entries, each of which is a 1-gram, a 2-gram, a phrase, or a sentence pattern. Unlike existing sentiment dictionaries, it is composed of words that are not affected by particular domains. The recent trend in sentiment analysis is to use deep learning techniques without sentiment dictionaries, so the importance of developing sentiment dictionaries has gradually declined. However, a recent study shows that the words in a sentiment dictionary can be used as features of deep learning models, resulting in sentiment analysis with higher accuracy (Teng, Z., 2016). This result indicates that a sentiment dictionary is useful not only for sentiment analysis itself but also as a source of features that improve the accuracy of deep learning models. 
The proposed dictionary can be used as basic data for constructing the sentiment lexicon of a particular domain and as features of deep learning models. It is also useful for automatically and quickly building large training sets for deep learning models.
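
The dictionary-based scoring that the abstract calls the basic method of sentiment analysis can be sketched as follows. This is a minimal illustration, not the KNU-KSL implementation: the lexicon entries, their polarity values, and the function name `score_text` are all hypothetical, and whitespace tokenization is an assumption that would not suit real Korean text.

```python
# Toy lexicon keyed by n-gram tuples. Entries and scores are hypothetical,
# chosen for illustration only; they are not taken from KNU-KSL.
TOY_LEXICON = {
    ("thank", "you"): 1,
    ("worthy",): 1,
    ("impressed",): 1,
    ("not", "worthy"): -1,
    ("terrible",): -1,
}


def score_text(text, lexicon=TOY_LEXICON, max_n=2):
    """Sum polarity scores, preferring the longest n-gram match at each position."""
    tokens = text.lower().split()
    total = 0
    i = 0
    while i < len(tokens):
        matched = False
        # Try longer n-grams first so that "not worthy" wins over "worthy".
        for n in range(max_n, 0, -1):
            gram = tuple(tokens[i:i + n])
            if len(gram) == n and gram in lexicon:
                total += lexicon[gram]
                i += n
                matched = True
                break
        if not matched:
            i += 1  # token is not a sentiment word; skip it
    return total
```

A positive total suggests positive sentiment and a negative total suggests negative sentiment. Matching 2-grams before 1-grams reflects the abstract's point that entries can be 1-grams, 2-grams, or longer patterns.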

A Study on the Spatial Structure of Eupchi(邑治) and Landscape Architecture of Provincial Government Office(地方官衙) in the Late Joseon Dynasty through 'Sukchunjeahdo(宿踐諸衙圖)' - Focused on the Youngyuhyun Pyeongan Province and Sincheongun Hwanghae Province - (『숙천제아도(宿踐諸衙圖)』를 통해 본 조선시대 읍치(邑治)의 공간구조와 관아(官衙) 조경 - 평안도 영유현과 황해도 신천군을 중심으로 -)

  • Shin, Sang sup;Lee, Seung yoen
    • Korean Journal of Heritage: History & Science
    • /
    • v.49 no.2
    • /
    • pp.86-103
    • /
    • 2016
  • The illustrated book 'Sukchunjeahdo', left by Han, Pil-gyo (韓弼敎: 1807~1878) in the late Joseon Dynasty, contains pictorial records of government offices, Eupchi, and Feng Shui conditions drawn by the Gyehwa (界畵) method, Sabangjeondomyobeop (四方顚倒描法), and it is a rare historical source that helps in understanding spatial structure and landscape characteristics. Youngyuhyun (永柔縣) and Sincheongun (信川郡), the case sites of this study, show that the Feng Shui site structure and the placement rules for government offices in the Joseon period, such as 3Dan 1Myo (三壇一廟: Sajikdan, Yeodan, Seonghwangdan, Hyanggyo), 3Mun 3Jo (三門三朝: Oeah, Dongheon, Naeah), and Jeonjohuchim (前朝後寢), were applied by setting an upper and lower hierarchy along the north-south central axis. In the circulation system, roads branch out around the marketplace at the town entrance: heading north along the internal road leads to the government office, and going out to the main street leads to the major city. The Baesanimsu (背山臨水: mountain at the back and water in front) site, the pine forest on the back hill, intentionally created low mountains, the town forest, and similar elements showed landscape aesthetics well suited to environmental comfort conditions such as microclimate control, natural disaster prevention, and psychological stability reflecting the color constancy principle. Tower pavilions were built throughout the scenic spots, reflecting the life philosophy and thoughts of contemporaries, such as physical and mental discipline, contentment with the reality of poverty, and a return to nature. For the government office landscape, shielding and buffer planting, landscape planting, and the like 
were considered around the backyards of Gaeksa (客舍), Dongheon (東軒), and Naeah (內衙), and deciduous trees and flowering trees were cultivated as the main species. In the case of Gaeksa, tiled pavilions and pavilions topped with poke weed in a tetragonal pond were introduced to Dongheon and Naeah, and separate pavilions were built for physical and mental discipline and for military training such as archery. The pine forest on the back hill formed the background landscape; zelkova, pear trees, willow trees, old pine trees, lotus, and flowering trees were cultivated as garden plants; and a Feng Shui forest with willow trees as its main species was created for landscape and practical purposes. In addition, various cultural landscape elements were introduced, such as pavilions, ponds serving as fire protection water (square and circular), stone pagodas and stone Buddhas, fountains and wells, monument houses, and flagpoles. In Sincheongun, the town forest (邑藪), Manhagwan (挽河觀), Moonmujeong (文武井), the Sangjangdae (上場岱) and Hajangdae (下場岱) marketplaces, Josanshup (造山藪: Dongseojanglim (東西長林)), Namcheon (南川), and other elements were combined and operated as a community cultural park with the character of a modern urban park. In this context, the government office landscape shows a garden management approach in which square ponds, pavilions, and flowering trees are harmonized around the side pavilion and backyard. An environmental design technique is also at work that is not biased toward aesthetics or ideological and moral philosophy but comprehensively considers functionality (shielding, fire prevention, microclimate control, etc.) and environmental soundness.

Aspect-Based Sentiment Analysis Using BERT: Developing Aspect Category Sentiment Classification Models (BERT를 활용한 속성기반 감성분석: 속성카테고리 감성분류 모델 개발)

  • Park, Hyun-jung;Shin, Kyung-shik
    • Journal of Intelligence and Information Systems
    • /
    • v.26 no.4
    • /
    • pp.1-25
    • /
    • 2020
  • Sentiment Analysis (SA) is a Natural Language Processing (NLP) task that analyzes, from written texts, the sentiments consumers or the public feel about an arbitrary object. Aspect-Based Sentiment Analysis (ABSA) goes further and performs a fine-grained analysis of the sentiments towards each aspect of an object. Because it has more practical value in business terms, ABSA is drawing attention from both academic and industrial organizations. For example, given a review that says "The restaurant is expensive but the food is really fantastic", general SA evaluates the overall sentiment towards the 'restaurant' as 'positive', while ABSA identifies the restaurant's 'price' aspect as 'negative' and its 'food' aspect as 'positive'. Thus, ABSA enables a more specific and effective marketing strategy. To perform ABSA, it is necessary to identify the aspect terms or aspect categories included in the text and to judge the sentiments towards them. Accordingly, there are four main areas in ABSA: aspect term extraction, aspect category detection, Aspect Term Sentiment Classification (ATSC), and Aspect Category Sentiment Classification (ACSC). ABSA is usually conducted by extracting aspect terms and then performing ATSC to analyze sentiments for the given aspect terms, or by extracting aspect categories and then performing ACSC to analyze sentiments for the given aspect categories. Here, an aspect category is expressed in one or more aspect terms or is indirectly inferred from other words. In the preceding example sentence, 'price' and 'food' are both aspect categories, and the aspect category 'food' is expressed by the aspect term 'food' included in the review. If the review sentence includes 'pasta', 'steak', or 'grilled chicken special', these can all be aspect terms for the aspect category 'food'. An aspect category referred to by one or more specific aspect terms in this way is called an explicit aspect. 
On the other hand, an aspect category like 'price', which has no specific aspect term but can be indirectly inferred from the emotional word 'expensive', is called an implicit aspect. So far, the term 'aspect category' has been used to avoid confusion with 'aspect term'. From now on, we treat 'aspect category' and 'aspect' as the same concept and mostly use the word 'aspect' for convenience. Note that ATSC analyzes the sentiment towards given aspect terms, so it deals only with explicit aspects, whereas ACSC handles both explicit and implicit aspects. This study seeks answers to the following questions, which previous studies ignored when applying the BERT pre-trained language model to ACSC, and derives superior ACSC models. First, is it more effective to reflect the output vectors of the aspect category tokens than to use only the final output vector of the [CLS] token as the classification vector? Second, is there any performance difference between the QA (Question Answering) and NLI (Natural Language Inference) types in the sentence-pair configuration of the input data? Third, is there any performance difference according to the position of the sentence containing the aspect category in the QA or NLI type sentence-pair configuration? To achieve these research objectives, we implemented 12 ACSC models and conducted experiments on 4 English benchmark datasets. As a result, we derived ACSC models that outperform existing studies without expanding the training dataset. In addition, we found that it is more effective to reflect the output vector of the aspect category token than to use only the output vector of the [CLS] token as the classification vector. We also found that QA type input generally provides better performance than NLI type input, and that the position of the sentence with the aspect category in the QA type does not affect performance. 
There may be some differences depending on the characteristics of the dataset, but when using NLI type sentence-pair input, placing the sentence containing the aspect category second seems to provide better performance. The new methodology for designing the ACSC model used in this study could be similarly applied to other studies such as ATSC.
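
The QA-type and NLI-type sentence-pair inputs, and the choice of which segment carries the aspect category, can be sketched as below. This is only an illustration under stated assumptions: the abstract does not give the exact templates, so the question wording, the function names `build_pair` and `to_bert_string`, and the parameter names are hypothetical.

```python
def build_pair(review, aspect, style="QA", aspect_first=False):
    """Return (segment_a, segment_b) for a BERT sentence-pair input.

    style="QA" wraps the aspect in a question; style="NLI" uses the bare
    aspect as a pseudo-sentence. Both templates are assumed wordings.
    """
    if style == "QA":
        auxiliary = f"what do you think of the {aspect} ?"
    elif style == "NLI":
        auxiliary = aspect
    else:
        raise ValueError(f"unknown style: {style!r}")
    # aspect_first controls whether the aspect sentence is the first or second segment.
    return (auxiliary, review) if aspect_first else (review, auxiliary)


def to_bert_string(pair):
    """Show the pair with BERT's special tokens, before tokenization."""
    segment_a, segment_b = pair
    return f"[CLS] {segment_a} [SEP] {segment_b} [SEP]"
```

For instance, `to_bert_string(build_pair("The restaurant is expensive", "price", style="NLI"))` places the aspect category second, which is the NLI configuration the abstract reports as tending to perform better.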

A Study on Searching for Export Candidate Countries of the Korean Food and Beverage Industry Using Node2vec Graph Embedding and Light GBM Link Prediction (Node2vec 그래프 임베딩과 Light GBM 링크 예측을 활용한 식음료 산업의 수출 후보국가 탐색 연구)

  • Lee, Jae-Seong;Jun, Seung-Pyo;Seo, Jinny
    • Journal of Intelligence and Information Systems
    • /
    • v.27 no.4
    • /
    • pp.73-95
    • /
    • 2021
  • This study uses the Node2vec graph embedding method and Light GBM link prediction to explore undeveloped export candidate countries for Korea's food and beverage industry. Node2vec improves on the limited representation of a network's structural equivalence, which is known to be relatively weak in existing link prediction methods based on the number of common neighbors. The method is therefore known to show excellent performance in both community detection and structural equivalence of the network. The vectors obtained by embedding the network in this way are generated under the condition of a constant length from an arbitrarily designated starting node. As a result, the sequence of nodes is easy to apply as an input to models for downstream tasks, such as Logistic Regression, Support Vector Machine, and Random Forest. Based on these features of the Node2vec graph embedding method, this study applied it to the international trade information of the Korean food and beverage industry. Through this, we intend to contribute to diversifying Korea's extensive margin within the industry's global value chain relationships. The optimal predictive model derived in this study recorded a precision of 0.95, a recall of 0.79, and an F1 score of 0.86, showing excellent performance. This performance was superior to that of the binary classifier based on Logistic Regression that was set as the baseline model, which recorded a precision of 0.95, a recall of 0.73, and an F1 score of 0.83. In addition, the Light GBM-based optimal prediction model derived in this study performed better than the link prediction model of a previous study, which was set as the benchmark model in this study. 
The predictive model of the previous study recorded a recall of only 0.75, whereas the proposed model achieved a better recall of 0.79. The performance difference between the benchmark model and the model of this study is due to the model learning strategy. In this study, trades were grouped by trade value scale, and prediction models were trained differently for these groups. The specific strategies are (1) randomly masking trades and training the model on all trades without setting specific conditions on trade value, (2) randomly masking some of the trades with an average trade value or higher and training the model, and (3) randomly masking some of the trades in the top 25% or higher by trade value and training the model. The experiments confirmed that the model trained by randomly masking some of the trades with above-average trade value performed best and most stably. Additional investigation found that most of the potential export candidates for Korea derived from this model appeared appropriate. Taken together, this study suggests the practical utility of the link prediction method applying Node2vec and Light GBM. In addition, useful implications were derived for weight update strategies that enable better link prediction during model training. This study also has policy utility because it applies link prediction to trade transactions, which have rarely been addressed in research on link prediction based on graph embedding. The results of this study support a rapid response to changes in the global value chain, such as the recent US-China trade conflict or Japan's export regulations, and we believe they are sufficiently useful as a tool for policy decision-making.
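
The three masking strategies described in the abstract can be sketched as follows. This is a rough illustration under several assumptions, not the authors' code: the edge format, the strategy names, the masking fraction, the rank-based top-25% cut, and the sample trade values are all hypothetical.

```python
import random
from statistics import mean

# Hypothetical trade edges: (exporter, importer, trade_value).
SAMPLE_TRADES = [("KOR", partner, value)
                 for partner, value in zip("ABCDEFGH", range(1, 9))]


def eligible_edges(edges, strategy):
    """Return the edges that a given strategy allows to be masked."""
    values = sorted(value for _, _, value in edges)
    if strategy == "all":             # (1) no condition on trade value
        threshold = values[0]
    elif strategy == "above_mean":    # (2) average trade value or higher
        threshold = mean(values)
    elif strategy == "top_quartile":  # (3) top 25% by trade value (simple rank cut)
        threshold = values[int(0.75 * len(values))]
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return [edge for edge in edges if edge[2] >= threshold]


def mask_edges(edges, strategy, fraction=0.5, seed=0):
    """Split edges into (training edges, masked edges held out as positives)."""
    rng = random.Random(seed)  # seeded so the split is reproducible
    pool = eligible_edges(edges, strategy)
    k = max(1, int(fraction * len(pool)))
    masked = rng.sample(pool, k)
    masked_set = set(masked)
    train = [edge for edge in edges if edge not in masked_set]
    return train, masked
```

In a link prediction setup of this kind, the graph embedding would be learned on the training edges only, and the masked edges would serve as positive examples the classifier must recover; how the study weights or samples negatives is not stated in the abstract.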

Target-Aspect-Sentiment Joint Detection with CNN Auxiliary Loss for Aspect-Based Sentiment Analysis (CNN 보조 손실을 이용한 차원 기반 감성 분석)

  • Jeon, Min Jin;Hwang, Ji Won;Kim, Jong Woo
    • Journal of Intelligence and Information Systems
    • /
    • v.27 no.4
    • /
    • pp.1-22
    • /
    • 2021
  • Aspect-Based Sentiment Analysis (ABSA), which analyzes sentiment based on the aspects that appear in a text, is drawing attention because it can be used in various business industries. ABSA analyzes the sentiment toward each of the multiple aspects a text contains. It is studied in various forms depending on the purpose, such as analyzing all targets or only aspects and sentiments. Here, the aspect refers to a property of a target, and the target refers to the text that causes the sentiment. For example, for restaurant reviews, the aspects could be food taste, food price, quality of service, mood of the restaurant, and so on. Also, in a review that says "The pasta was delicious, but the salad was not," the words "pasta" and "salad," which are directly mentioned in the sentence, are the targets. So far, most ABSA studies have analyzed sentiment based only on aspects or only on targets. However, even for the same aspect or target, sentiment analysis may be inaccurate, for instance when aspects or sentiments are divided or when a sentiment exists without a target. Consider the sentence "Pizza and the salad were good, but the steak was disappointing." Although the aspect of this sentence is limited to "food," conflicting sentiments coexist. In a sentence such as "Shrimp was delicious, but the price was extravagant," the target is "shrimp," but opposite sentiments coexist depending on the aspect. Finally, in a sentence like "The food arrived too late and is cold now," there is no target (NULL), but the sentence conveys a negative sentiment toward the aspect "service." Thus, failing to consider both aspects and targets, when sentiments or aspects are divided or when a sentiment exists without a target, creates a dual dependency problem. 
To address this problem, this research analyzes sentiment by considering both aspects and targets (Target-Aspect-Sentiment Detection, hereafter TASD). This study identified limitations of existing research in the field of TASD: local contexts are not fully captured, and the F1-score drops dramatically depending on the number of epochs and the batch size. The current model excels at capturing the overall context and the relations between words. However, it struggles with phrases in the local context and is relatively slow to train. Therefore, this study tries to improve the model's performance. To this end, we added an auxiliary loss for aspect-sentiment classification by constructing CNN (Convolutional Neural Network) layers in parallel with the existing model. Whereas existing models analyze aspect-sentiment through BERT encoding, Pooler, and Linear layers, this research added CNN layers with adaptive average pooling to the existing model, and training proceeded by adding the additional aspect-sentiment loss to the existing loss. In other words, during training, the auxiliary loss computed through the CNN layers allowed the local context to be captured more closely. After training, the model performs aspect-sentiment analysis through the existing method. To evaluate the performance of this model, two datasets, SemEval-2015 task 12 and SemEval-2016 task 5, were used, and the F1-score increased compared to the existing models. The gap between the F1-scores of the existing models and this study was largest when the batch size was 8 and the number of epochs was 5, at 29 and 45, respectively. Even when the batch size and number of epochs were adjusted, the F1-scores were higher than those of the existing models. In other words, the model can be trained effectively compared to the existing models even when the batch size and number of epochs are small. Therefore, it can be useful in situations where resources are limited. Through this study, aspect-based sentiments can be analyzed more accurately. 
Through various business uses, such as product development or establishing marketing strategies, both consumers and sellers will be able to make efficient decisions. In addition, we believe that the model can be fully trained and used by small businesses that do not have much data, given that it uses a pre-trained model and recorded a relatively high F1-score even with limited resources.
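
The idea of adding an auxiliary loss from a parallel branch to the existing loss during training can be sketched as below. This is a framework-free illustration under assumptions: the abstract does not state how the two losses are weighted, so the `aux_weight` parameter and the function names are hypothetical, and plain lists of logits stand in for the outputs of the main head and the CNN branch.

```python
import math


def cross_entropy(logits, target):
    """Softmax cross-entropy for a single example (logits: list of floats)."""
    peak = max(logits)  # subtract the max for numerical stability
    log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum_exp - logits[target]


def total_loss(main_logits, aux_logits, target, aux_weight=1.0):
    """Training loss = main-head loss + weighted auxiliary-branch loss.

    aux_weight is an assumed hyperparameter; 1.0 means a plain sum.
    """
    main = cross_entropy(main_logits, target)
    auxiliary = cross_entropy(aux_logits, target)
    return main + aux_weight * auxiliary
```

Consistent with the abstract, the auxiliary branch would contribute only to this training loss; at inference time the prediction would come from the main head's logits alone.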

A Study on Rationalization of National Forest Management in Korea (국유림경영(國有林經營)의 합리화(合理化)에 관(關)한 연구(硏究))

  • Choi, Kyu-Ryun
    • Journal of Korean Society of Forest Science
    • /
    • v.20 no.1
    • /
    • pp.1-44
    • /
    • 1973
  • Needless to say, the management of national forests is very important in every country in view of the national mission and management purposes. Korean national forests are also particularly significant for promoting the national economy, given the continuously increasing demand for wood, land conservation, and social welfare. But there is no denying that the leading aim of Korean forest policy has been the conservation of forest resources and the recovery of the land conservation function rather than the improvement of forest productive capacity. Therefore, national forest management should be pursued as an industry within the Korean national economy, and an increase in forest productive capacity based on rationalized forest management is urgently needed. Not only increased timber production but also the establishment of forests that are good in quality and quantity will naturally bring many conservation functions and other public benefits. Korean national forest was first established in 1908 as a result of the notification of ownership, and in 1911-1924 it was divided into two kinds: indisposable national forest for land conservation, forest management, scientific research, and public welfare, and other national forest to be disposed of. Indisposable forest is mostly under the jurisdiction of the national forest stations (Chungbu, Tongbu, Nambu), and the other national forests are in the custody of the respective cities and provinces and of other government authorities. As of the end of 1971, national forest land is 19.5% (1,297,708 ha) of the total forest land area, but its growing stock is 50.1% ($35,406,079m^3$) of the total forest growing stock, and timber production from national forest is 23.6% ($205,959m^3$) of the total annual timber production in Korea. Accordingly, it is an important fact that national forest occupies a major part of Korean forestry. 
The author affirms that the success or failure of national forest management determines the rise or fall of forestry in Korea. All functions of the forest are very important, but the function of timber production is the most important, especially in Korea, which unavoidably imports a large quantity of foreign wood every year (imports of foreign wood in 1971: $3,756,000m^3$, 160,995,000 dollars). Korea therefore urgently needs to improve forest productive capacity in the national forest. However, it is difficult for wood production to meet the rapid increase in demand for wood that accompanies economic development, because the production term of forestry is long, so national forest management should be rationalized through effective investment and the development of forestry techniques with a long-term view. Korean national forest business faces many difficulties in budget and techniques, a lack of labour due to the outflow of rural labour that accompanies national economic development, and increases in wages and administrative expenses; nevertheless, the development of the national forest depends on adopting suitable forest techniques and management adapted to social and economic development. From this viewpoint, the writer has investigated and analyzed the status of national forest management in Korea to examine its irrational aspects and suggest an improvement plan. The national forestry statistics cited in this study are based on the basic statistics and the forest business statistics as of the end of 1971 published by the Office of Forestry, Republic of Korea, and the rest depend on data presented by the national forest stations. The writer proposes the following measures, which seem helpful for improving Korean national forest management. 
1) In the organization of national forest management, more national forest stations should be established for intensive management, and the staff of working plan officials should be strengthened because of the importance of the working plan. 2) By increasing the staff of protection officials, the forest area assigned to each protection official should be decreased to 1,000-2,000 ha. 3) Frequent personnel changes of national forest station supervisors (the persons responsible on the spot) obstruct the accomplishment of a consistent management plan. 4) In drafting the working plan for national forest, basic investigations should be carried out carefully with sufficient expenditure and staff so that unrealistic working plans are not drafted. 5) The area of a working-unit should be decreased to less than 2,000 ha on average for intensive management, and the principle of one working-unit per forest station should be realized as soon as possible. 6) Reforestation of open land should be completed in a short time with debt from the special fund (a long-term loan), land growing hardwood stands should be converted to conifers to increase productivity per unit area, and at the same time technical utilization methods for hardwood should be developed. 7) Reforestation expenses should be saved through mechanization and the use of chemicals in reforestation and tree nursery operations, in preparation for a future lack of labour. 8) In forest protection, forest fire damage is enormous in comparison with foreign countries; accordingly, the prevention system and equipment should be improved, and the minimum necessary budget should be secured for the establishment and maintenance of fire-lines. 9) Manufacture production should be enlarged to systematize the protection, processing, and circulation of forest business, which will naturally bring much benefit to rural people. 
10) The establishment and arrangement of forest road networks and erosion control work are indispensable for the future development of the national forest itself and for local development. Therefore, these works should be promoted under the responsibility of general accounting instead of special accounting. 11) Mechanization of forest work should be realized in order to exploit hinterlands to meet the increased demand for timber and to solve the lack of labour; consequently, the import of forest machines, home production, training for operators, and careful administration should be promoted. 12) The labour situation will grow worse in the future. Therefore, countermeasures to retain forest labourers, with attention to public welfare facilities and works, should be considered. 13) Although the condition of income and expenditure grows worse because of economic change, the regular expenditure should be fixed. So part of the surplus fund, as of the end of 1971, should be set aside as a fund and used for enlarging reforestation and forest road networks (preceding investment in national forest).
