Search | Korea Science

Sentiment Analysis of Korean Reviews Using CNN: Focusing on Morpheme Embedding (CNN을 적용한 한국어 상품평 감성분석: 형태소 임베딩을 중심으로)

Park, Hyun-jung; Song, Min-chae; Shin, Kyung-shik
- Journal of Intelligence and Information Systems / v.24 no.2 / pp.59-83 / 2018
With the growing importance of sentiment analysis for understanding the needs of customers and the public, various deep learning models have been actively applied to English texts. In deep-learning sentiment analysis of English texts, the natural-language sentences in the training and test datasets are usually converted into sequences of word vectors before being fed into the model. Here, word vectors generally refer to vector representations of words obtained by splitting a sentence on space characters. There are several ways to derive word vectors; one is Word2Vec, which was used to produce the 300-dimensional Google word vectors from about 100 billion words of Google News data. These vectors have been widely used in sentiment analysis studies of reviews in domains such as restaurants, movies, laptops, and cameras. Unlike in English, morphemes play an essential role in sentiment analysis and sentence structure analysis in Korean, a typical agglutinative language with well-developed postpositions and endings. A morpheme can be defined as the smallest meaningful unit of a language, and a word consists of one or more morphemes. For example, the word '예쁘고' ('pretty and') consists of the morphemes '예쁘' (adjective) and '고' (connective ending). Given the significance of Korean morphemes, it seems reasonable to adopt the morpheme as the basic unit of Korean sentiment analysis. Therefore, in this study, we use 'morpheme vectors' as input to a deep learning model rather than the 'word vectors' mainly used for English text. A morpheme vector is a vector representation of a morpheme and can be derived by applying an existing word vector derivation mechanism to sentences divided into their constituent morphemes. This raises several questions. What range of POS (Part-Of-Speech) tags is desirable when deriving morpheme vectors to improve the classification accuracy of a deep learning model?
Is it appropriate to apply a typical word vector model, which relies primarily on the surface form of words, to Korean, which has a high proportion of homonyms? Will text preprocessing such as correcting spelling or spacing errors affect classification accuracy, especially when deriving morpheme vectors from Korean product reviews with many grammatical mistakes and variations? We seek empirical answers to these fundamental issues, which are likely to be the first encountered when applying deep learning models to Korean texts. As a starting point, we summarize them as three central research questions. First, which is more effective as the initial input of a deep learning model: morpheme vectors from grammatically correct texts of a domain other than the analysis target, or morpheme vectors from considerably ungrammatical texts of the same domain? Second, what is an appropriate morpheme vector derivation method for Korean with regard to the range of POS tags, homonyms, text preprocessing, and minimum frequency? Third, can deep learning achieve a satisfactory level of classification accuracy in Korean sentiment analysis? To address these questions, we generate various types of morpheme vectors that reflect them and compare the classification accuracy of a non-static CNN (Convolutional Neural Network) model that takes the morpheme vectors as input. For the training and test datasets, 17,260 cosmetics product reviews from Naver Shopping are used. To derive the morpheme vectors, we use data from both the target domain and another domain: about 2 million Naver Shopping cosmetics product reviews, and 520,000 items of Naver News data, arguably the counterpart of Google's News data. The six primary sets of morpheme vectors constructed in this study differ in the following three criteria.
First, they come from two types of data source: Naver News, with high grammatical correctness, and Naver Shopping cosmetics product reviews, with low grammatical correctness. Second, they differ in the degree of data preprocessing: sentence splitting only, or sentence splitting followed by spelling and spacing correction. Third, they differ in the form of input fed into the word vector model: the morphemes alone, or the morphemes with their POS tags attached. The morpheme vectors further vary in the range of POS tags considered, the minimum frequency of included morphemes, and the random initialization range. All morpheme vectors are derived with the CBOW (Continuous Bag-Of-Words) model, using a context window of 5 and a vector dimension of 300. The results suggest that using same-domain text even with lower grammatical correctness, performing spelling and spacing corrections in addition to sentence splitting, and including morphemes of all POS tags, including the unanalyzable category, lead to better classification accuracy. POS tag attachment, devised to handle the high proportion of homonyms in Korean, and the minimum frequency threshold for including a morpheme do not appear to have any definite influence on classification accuracy.
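The input variations described above (POS tag range, whether POS tags are attached, minimum morpheme frequency) and the CBOW setting (context window 5) can be made concrete with a minimal pure-Python sketch. It assumes a morphological analyzer has already produced (morpheme, tag) pairs; the analyzer output, the Sejong-style tag names, and the function names `to_tokens`, `build_vocab`, `cbow_examples` and `cbow_hidden` are hypothetical. The sketch only builds CBOW's training examples and its averaged-context projection; it does not train vectors, and it uses a fixed window, whereas word2vec implementations randomly shrink the window for each target.

```python
from collections import Counter

# Hypothetical analyzer output for two short reviews: lists of (morpheme, tag).
# Sejong-style tags (VA adjective, EC connective ending, EF final ending,
# NNG common noun, JKS subject particle); a real analyzer may segment differently.
REVIEWS = [
    [("예쁘", "VA"), ("고", "EC"), ("좋", "VA"), ("아요", "EF")],
    [("향", "NNG"), ("이", "JKS"), ("좋", "VA"), ("아요", "EF")],
]

def to_tokens(review, attach_pos=False, allowed_tags=None):
    """Turn one analyzed review into word-vector input tokens.

    attach_pos:   emit 'morpheme/TAG' so homonyms with different POS tags get
                  separate vectors; otherwise emit the bare morpheme.
    allowed_tags: optional set of POS tags to keep; None keeps every tag.
    """
    return [
        f"{m}/{t}" if attach_pos else m
        for m, t in review
        if allowed_tags is None or t in allowed_tags
    ]

def build_vocab(corpus, min_count):
    """Keep tokens that occur at least min_count times in the corpus."""
    counts = Counter(tok for sent in corpus for tok in sent)
    return {tok for tok, c in counts.items() if c >= min_count}

def cbow_examples(corpus, vocab, window=5):
    """(context, target) pairs for CBOW: each in-vocabulary token is predicted
    from up to `window` in-vocabulary tokens on either side of it."""
    examples = []
    for sent in corpus:
        kept = [t for t in sent if t in vocab]  # rare tokens dropped first
        for i, target in enumerate(kept):
            context = kept[max(0, i - window):i] + kept[i + 1:i + 1 + window]
            if context:
                examples.append((context, target))
    return examples

def cbow_hidden(context, vectors):
    """CBOW projection layer: the mean of the context tokens' input vectors."""
    dim = len(vectors[context[0]])
    return [sum(vectors[t][k] for t in context) / len(context) for k in range(dim)]

corpus = [to_tokens(r, attach_pos=True) for r in REVIEWS]
vocab = build_vocab(corpus, min_count=2)
print(sorted(vocab))  # ['아요/EF', '좋/VA']
```

In practice the vectors would be trained with a library rather than by hand; in gensim 4.x, for example, `Word2Vec(sentences=corpus, sg=0, window=5, vector_size=300, min_count=...)` selects CBOW with the window and dimension stated in the abstract. The abstract does not say which implementation the authors used.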
https://doi.org/10.13088/jiis.2018.24.2.059
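The classifier in the study above can be pictured as the forward pass of a sentence CNN over morpheme vectors: look up each morpheme's vector, slide filters over windows of consecutive morphemes, max-pool each feature map, and score. This is a minimal sketch under stated assumptions: the filter widths, ReLU, sigmoid output, uniform initialization range of 0.25 for morphemes without a pretrained vector, and the function names are illustrative choices, not details given in the abstract. Training is omitted; in a non-static CNN, backpropagation also updates the morpheme vectors themselves.

```python
import math
import random

def sentence_matrix(tokens, pretrained, dim, init_range=0.25, seed=0):
    """Look up morpheme vectors; tokens missing from `pretrained` get a vector
    drawn uniformly from [-init_range, init_range] (reused if they repeat).
    In a non-static CNN these rows would be trainable parameters."""
    rng = random.Random(seed)
    table = dict(pretrained)
    for tok in tokens:
        if tok not in table:
            table[tok] = [rng.uniform(-init_range, init_range) for _ in range(dim)]
    return [table[tok] for tok in tokens]

def conv_maxpool(sentence, filters):
    """Slide each filter (h rows of d weights) over every window of h
    consecutive morpheme vectors, apply ReLU, then max-over-time pooling."""
    pooled = []
    for f in filters:
        h = len(f)
        if len(sentence) < h:
            raise ValueError("sentence shorter than filter width; pad it first")
        feature_map = [
            max(0.0, sum(w * x
                         for f_row, x_row in zip(f, sentence[s:s + h])
                         for w, x in zip(f_row, x_row)))
            for s in range(len(sentence) - h + 1)
        ]
        pooled.append(max(feature_map))
    return pooled

def positive_probability(sentence, filters, out_weights, out_bias=0.0):
    """Sigmoid output over the pooled features (binary sentiment)."""
    feats = conv_maxpool(sentence, filters)
    z = sum(w * x for w, x in zip(out_weights, feats)) + out_bias
    return 1.0 / (1.0 + math.exp(-z))

# Toy example: d = 2 instead of 300, one filter of width 2 and one of width 3.
pretrained = {"예쁘": [1.0, 0.0], "고": [0.0, 1.0], "좋": [1.0, 1.0]}
sent = sentence_matrix(["예쁘", "고", "좋"], pretrained, dim=2)
filters = [[[1.0, 0.0], [0.0, 1.0]], [[-1.0, 0.0], [0.0, 0.0], [0.0, -1.0]]]
print(conv_maxpool(sent, filters))  # [2.0, 0.0]
```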

A Study of Anomaly Detection for ICT Infrastructure using Conditional Multimodal Autoencoder (ICT 인프라 이상탐지를 위한 조건부 멀티모달 오토인코더에 관한 연구)

Shin, Byungjin; Lee, Jonghoon; Han, Sangjin; Park, Choong-Shik
- Journal of Intelligence and Information Systems / v.27 no.3 / pp.57-73 / 2021
Maintenance and failure prevention through anomaly detection in ICT infrastructure are becoming increasingly important. System monitoring data are multidimensional time series, so handling them requires accounting for the characteristics of both multidimensional data and time series data. For multidimensional data, correlations between variables must be considered, and existing methods such as probability- and linear-model-based and distance-based approaches degrade because of the curse of dimensionality. In addition, time series data are preprocessed with sliding windows and with time series decomposition for autocorrelation analysis. These techniques increase the dimension of the data, so they need to be supplemented. Anomaly detection is a long-established research field: statistical methods and regression analysis were used in its early days, and the application of machine learning and artificial neural networks is now being actively studied. Statistical methods are difficult to apply when data are non-homogeneous, and they do not detect local outliers well. Regression-based methods learn a regression formula based on parametric statistics and detect anomalies by comparing predicted and actual values. Their performance drops when the model is not robust or when the data contain noise or outliers, which restricts them to training data free of noise and outliers. An autoencoder is an artificial neural network trained to reproduce its input as closely as possible at its output. Compared with existing probabilistic and linear models, cluster analysis, and supervised learning, it has many advantages: it can be applied to data that do not satisfy distributional or linearity assumptions.
It can also be trained without labeled data, that is, through unsupervised learning. However, autoencoder-based anomaly detection is limited in identifying local outliers in multidimensional data, and the characteristics of time series data greatly increase the data dimension. In this study, we propose a CMAE (Conditional Multimodal Autoencoder) that improves anomaly detection performance by accounting for local outliers and time series characteristics. First, we applied a Multimodal Autoencoder (MAE) to overcome the limitation in identifying local outliers in multidimensional data. Multimodal models are commonly used to learn different types of input, such as voice and images. The different modals share the autoencoder's bottleneck, through which the model learns the correlations between them. In addition, a CAE (Conditional Autoencoder) was used to learn the characteristics of time series data effectively without increasing the data dimension. Conditional inputs are usually categorical variables, but in this study time was used as the condition in order to learn periodicity. The proposed CMAE model was verified by comparison with a Unimodal Autoencoder (UAE) and a Multimodal Autoencoder (MAE). The reconstruction performance of the autoencoder for 41 variables was examined for the proposed and comparison models. Reconstruction performance differs by variable: for the Memory, Disk, and Network modals, the loss is small in all three autoencoder models, so reconstruction works well. The Process modal showed no significant difference among the three models, and the CPU modal performed best in CMAE. ROC curves were drawn to evaluate the anomaly detection performance of the proposed and comparison models, and AUC, accuracy, precision, recall, and F1-score were compared. On all indicators, performance ranked in the order CMAE, MAE, and UAE.
In particular, the recall of CMAE was 0.9828, which indicates that it detects almost all anomalies. Its accuracy also improved, reaching 87.12%, and its F1-score was 0.8883, which is considered suitable for anomaly detection. In practical terms, the proposed model offers an advantage beyond improved performance. Techniques such as time series decomposition and sliding windows add procedures that must be managed, and the dimensional increase they cause can slow computation at inference time. The proposed model is therefore easy to apply in practice with respect to inference speed and model management.
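The dimensionality argument in the abstract above can be made concrete. With 41 monitoring variables, a sliding window of w steps turns each 41-value observation into a 41·w-value input, whereas a conditional model keeps the per-step input and appends a small time condition. The abstract says only that time was used as the condition to learn periodicity; the sine/cosine time-of-day encoding, the window width of 6, the modal split shown, and all function names below are assumptions for illustration, and the sketch builds inputs only, not the CMAE architecture.

```python
import math

N_VARS = 41  # monitoring variables, as in the abstract

def sliding_windows(series, width):
    """Flatten each run of `width` consecutive steps into one input vector.
    series: T observations of N_VARS values -> T - width + 1 vectors of
    length N_VARS * width."""
    return [
        [v for step in series[t:t + width] for v in step]
        for t in range(len(series) - width + 1)
    ]

def time_condition(hour, minute=0):
    """Cyclic time-of-day encoding, so 23:59 and 00:00 end up close together."""
    angle = 2 * math.pi * (hour * 60 + minute) / (24 * 60)
    return [math.sin(angle), math.cos(angle)]

def conditional_inputs(modals, hour, minute=0):
    """Append the same time condition to every modal's input vector."""
    cond = time_condition(hour, minute)
    return {name: values + cond for name, values in modals.items()}

series = [[float(t)] * N_VARS for t in range(10)]   # 10 dummy time steps
windows = sliding_windows(series, width=6)
print(len(windows), len(windows[0]))                 # 5 246
obs = {"cpu": [0.7, 0.2], "memory": [0.5]}          # hypothetical modal split
print({k: len(v) for k, v in conditional_inputs(obs, 14).items()})
# {'cpu': 4, 'memory': 3}
```

The window version grows linearly with w, while the conditional version adds a fixed two values per modal regardless of how much history the periodicity spans.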
https://doi.org/10.13088/jiis.2021.27.3.057
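The CMAE study above reports accuracy, precision, recall, and F1-score for reconstruction-based detection. The sketch below shows how such figures follow from thresholding reconstruction error. The threshold rule and function names are assumptions; the abstract does not state how the threshold was chosen. The last function inverts F1 = 2PR/(P + R): if the reported recall (0.9828) and F1-score (0.8883) were measured at the same threshold, they imply a precision of roughly 0.81. That value is derived here by arithmetic and is not reported in the abstract.

```python
def detect(errors, threshold):
    """Flag a sample as anomalous when its reconstruction error exceeds threshold."""
    return [e > threshold for e in errors]

def scores(y_true, y_pred):
    """Accuracy, precision, recall and F1, with 'anomaly' as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}

def implied_precision(f1, recall):
    """Solve F1 = 2PR / (P + R) for P, given F1 and R."""
    return f1 * recall / (2 * recall - f1)

# Toy reconstruction errors and ground-truth labels (not data from the paper).
errors = [0.1, 0.9, 0.4, 0.8]
labels = [False, True, True, False]
print(scores(labels, detect(errors, threshold=0.5)))
# {'accuracy': 0.5, 'precision': 0.5, 'recall': 0.5, 'f1': 0.5}
print(round(implied_precision(0.8883, 0.9828), 3))  # 0.81
```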

Broadening the Understanding of Sixteenth-century Real Scenery Landscape Painting: Gyeongpodae Pavilion and Chongseokjeong Pavilion (16세기(十六世紀) 실경산수화(實景山水畫) 이해의 확장 : <경포대도(鏡浦臺圖)>, <총석정도(叢石亭圖)>를 중심으로)

Lee, Soomi
- MISULJARYO - National Museum of Korea Art Journal / v.96 / pp.18-53 / 2019
The paintings Gyeongpodae Pavilion and Chongseokjeong Pavilion were recently donated to the National Museum of Korea and unveiled to the public for the first time at the 2019 special exhibition "Through the Eyes of Joseon Painters: Real Scenery Landscapes of Korea." These two paintings carry significant implications for Joseon art history, because the fact that they were panels of a folding screen produced after a sightseeing tour of the Gwandong region in 1557 broadens our understanding of sixteenth-century landscape painting. This paper explores the art historical meaning of Gyeongpodae Pavilion and Chongseokjeong Pavilion by examining their contents, dating them, analyzing their stylistic characteristics, and comparing them with other works. The background of their production can be found in the colophon of Chongseokjeong Pavilion. According to this text, Sangsanilro, whom this paper presumes to be Park Chung-gan (?~1601), and Hong Yeon (?~?) toured Geumgangsan Mountain (also called Pungaksan Mountain) and the Gwandong region in the spring of 1557, wrote a travelogue, and some time later produced a folding screen depicting several of the famous scenic spots they had visited. Hong Yeon, whose courtesy name was Deokwon, passed the special civil examination in 1551 and is recorded as active until 1584. Park Chung-gan, whose pen name was Namae, reported the treason of Jeong Yeo-rip in 1589. In recognition of this meritorious deed, he was promoted to Deputy Minister of the Ministry of Punishments, awarded the title of first-grade pyeongnan gongsin (meritorious subject who resolved difficulties), and made Lord of Sangsan.
Based on the colophon to Chongseokjeong Pavilion, I suggest that Gyeongpodae Pavilion and Chongseokjeong Pavilion were painted in the late sixteenth century, more specifically after 1557, when Park Chung-gan and Hong Yeon made their sightseeing trip, and after 1571, when Park, who wrote the colophon, was in his fifties or older. The painting style of the landscapes corresponds to that of the late sixteenth century. The colophon further states that Gyeongpodae Pavilion and Chongseokjeong Pavilion were two panels of a folding screen, and Chongseokjeong Pavilion, which bears the colophon, is thought to have been the final panel. The composition of Gyeongpodae Pavilion recalls the one-sided three-layered composition often used in early Joseon landscape paintings in the style of An Gyeon. However, unlike such An Gyeon-style landscapes, Gyeongpodae Pavilion positions and depicts the scenery realistically. Moreover, it employs diverse perspectives, including a diagonal bird's-eye view and a frontal view, to depict effectively the relations among several natural features and the character of the real scenery around Gyeongpodae Pavilion. The shapes of the mountains and the use of moss dots can also be found in Welcoming an Imperial Edict from China and Chinese Envoys at Uisungwan Lodge, painted in 1557 and now housed in the Kyujanggak Institute for Korean Studies at Seoul National University. Furthermore, the "cloud-head" texture strokes and the texture strokes of short lines and dots used in An Gyeon-style paintings are transformed to convey a sense of realism. Whereas the composition of Gyeongpodae Pavilion recalls traditional early Joseon landscape painting, that of Chongseokjeong Pavilion is remarkably unconventional: stone pillars lined up in layers, with the tallest in the center, form a triangle.
A sense of space is created by dividing the painting into three planes (foreground, middle ground, and background), with the stone pillars in the foreground, the Saseonbong Peaks in the middle ground, and Saseonjeong Pavilion on the cliff in the background. The Saseonbong Peaks in the center occupy an overwhelming proportion of the picture plane. However, the vertical stone pillars do not relate organically to one another and appear segmented and flat; the painter of Chongseokjeong Pavilion had not yet developed a three-dimensional or natural perception of space. The white lower and dark upper portions of the stone pillars emphasize their loftiness. The textures and cracks of the dense stone pillars were rendered by first applying light ink to the surfaces and then adding fine lines in dark ink. Here, the tip of the brush is pressed at an oblique angle and pulled down vertically, which represents an early stage in the development of axe-cut texture strokes. The contrast of black and white and the use of vertical texture strokes signal the coming trend toward the Zhe School painting style. Each contour and crack on the stone pillars is distinct, indicating an effort to accentuate their actual characteristics. The birds perched on the stone pillars, the waves, and the foam of breaking waves are all vividly described rather than rendered with merely repetitive brushstrokes. The configuration of natural features in Gyeongpodae Pavilion and Chongseokjeong Pavilion changes in later paintings of the two scenic spots. In Gyeongpodae Pavilion, Jukdo Island is depicted in the foreground, Gyeongpoho Lake in the middle ground, and Gyeongpodae Pavilion and Odaesan Mountain in the background. This composition differs from the typical configuration of eighteenth-century Gyeongpodae Pavilion paintings, which place Gyeongpodae Pavilion in the foreground and the sea in the upper section.
In Chongseokjeong Pavilion, the stone pillars are depicted from a viewpoint on the sea, whereas other paintings depict them while facing upward toward the sea. These changes resulted from the compositional patterns established in Jeong Seon's (1676~1759) and Kim Hong-do's (1745~after 1806) paintings of the Gwandong region. However, the configuration of the sixteenth-century Gyeongpodae Pavilion, which seems to have fallen out of use, reappears in late Joseon folk paintings such as Gyeongpodae Pavilion in Gangneung. Famous scenic spots in the Gwandong region were painted from early on. According to historical records, they were depicted by several painters, including Kim Saeng (711~?) from the Goryeo Dynasty and An Gyeon (act. 15th C.) from the early Joseon period, either on a single scroll or across the panels of a folding screen or the leaves of an album. Although many records mention paintings of sites in the Gwandong region, no examples from this era survive other than the Gyeongpodae Pavilion and Chongseokjeong Pavilion discussed in this paper. These two paintings are thought to be the earliest known works depicting the Gwandong region. Moreover, they hold art historical significance in that they provide information on the tradition of producing folding screens of the Gwandong region. In particular, based on the contents of the colophon written for Chongseokjeong Pavilion, the original folding screen is presumed to have consisted of eight panels, which proves that the convention of painting eight views of Gwandong had been established by the late sixteenth century. All existing works cited as examples of sixteenth-century real scenery landscape painting show only partial elements of real scenery landscape painting, since they were created to depict notable social gatherings or as documentary paintings for practical and/or official purposes.
However, a primary objective of Gyeongpodae Pavilion and Chongseokjeong Pavilion was to portray the ever-changing and striking nature of the real scenery. Moreover, Park Chung-gan wrote a colophon and added a poem expressing his admiration for the scenery he witnessed during his trip and reflecting on the true character of nature. Thus, unlike previously known real scenery landscape paintings, these two are of great significance as examples produced purely for the appreciation of nature. Gyeongpodae Pavilion and Chongseokjeong Pavilion are also noteworthy as the earliest surviving examples of the historical tradition of recording a sightseeing trip in painting accompanied by poetry. Furthermore, and most importantly, they broaden the understanding of Korean real scenery landscape painting by revealing forms, compositions, and perspectives in sixteenth-century real scenery landscape painting that were previously unknown.
https://doi.org/10.22790/artjournal.2019.96.1853

The Gradient Variation of Thermal Environments on the Park Woodland Edge in Summer - A Study of Hadongsongrim and Hamyangsangrim - (여름철 공원 수림지 가장자리의 온열환경 기울기 변화 - 하동송림과 함양상림을 대상으로 -)

Ryu, Nam-Hyong; Lee, Chun-Seok
- Journal of the Korean Institute of Landscape Architecture / v.43 no.6 / pp.73-85 / 2015
This study investigated the extent and magnitude of woodland edge effects on users' thermal environments as a function of distance from the woodland border. Air temperature, relative humidity, wind velocity, MRT, and UTCI were measured over six days between July 31 and August 5, 2015, a period of extremely hot weather, at the south-facing edge of Hadongsongrim (pure Pinus densiflora stands; tree age 100 ± 33 yr; tree height 12.8 ± 2.7 m; canopy closure 75%; N 35°03′34.7″, E 127°44′43.3″; elevation 7~10 m) and the east-facing edge of Hamyangsangrim (Quercus serrata-Carpinus tschonoskii community; tree age 102~125 yr/58~123 yr; tree height: tree layer 18.6 ± 2.3 m, subtree layer 5.9 ± 3.2 m, shrub layer 0.5 ± 0.5 m; herbaceous layer coverage 60%; canopy closure 96%; N 35°31′28.1″, E 127°43′09.8″; elevation 170~180 m) in rural villages of Hadong and Hamyang, Korea. Negative depth values denote positions outside the woodland. The depth of edge influence (DEI) on the maximum air temperature, the minimum relative humidity, and the wind speed at the time of maximum air temperature during the daytime (10:00~17:00) was 12.7 ± 4.9, 15.8 ± 9.8, and 23.8 ± 26.2 m, respectively, in the mature evergreen conifer woodland of Hadongsongrim, and 3.7 ± 2.2, 4.9 ± 4.4, and 2.6 ± 7.8 m, respectively, in the deciduous broadleaf woodland of Hamyangsangrim. The DEI on the maximum 10-minute average MRT and UTCI, based on the radiation from the three-dimensional environment absorbed by the human-biometeorological reference person, during the daytime (10:00~17:00) was 7.1 ± 1.7 and 4.3 ± 4.6 m, respectively, in the relatively sparse woodland of Hadongsongrim.
The corresponding values were 5.8 ± 4.9 and 3.5 ± 4.1 m in the dense, closed woodland of Hamyangsangrim. Edge effects on air temperature, relative humidity, wind speed, MRT, and UTCI in the sparse woodland of Hadongsongrim were less pronounced than those recorded in the dense, closed woodland of Hamyangsangrim. The gradient was less steep for the maximum 10-minute average UTCI, with at least 4.3 ± 4.6 m (Hadongsongrim) and 3.5 ± 4.1 m (Hamyangsangrim) required to stabilize the UTCI in mature woodlands. Therefore, for users' thermal comfort, woodland buffer widths based on UTCI values of 3.5~7.6 m (Hamyangsangrim) and 4.3~8.9 m (Hadongsongrim) on each side of mature woodlands are suggested. For the woodland edge to buffer users' thermal comfort, its structure should have multi-layered canopies and a closed edge.
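The suggested buffer widths appear to follow directly from the UTCI depth-of-edge-influence figures reported above: each range seems to run from the mean DEI to the mean plus one standard deviation (3.5 + 4.1 = 7.6 m; 4.3 + 4.6 = 8.9 m). That reading is inferred from the numbers rather than stated in the abstract; the sketch only reproduces the arithmetic, and the names `UTCI_DEI` and `buffer_range` are hypothetical.

```python
# DEI on the maximum 10-minute average UTCI: (mean, standard deviation) in metres,
# as reported in the abstract.
UTCI_DEI = {"Hamyangsangrim": (3.5, 4.1), "Hadongsongrim": (4.3, 4.6)}

def buffer_range(mean, sd):
    """Assumed rule: buffer width from the mean DEI to the mean plus one SD."""
    return round(mean, 1), round(mean + sd, 1)

for site, (mean, sd) in UTCI_DEI.items():
    low, high = buffer_range(mean, sd)
    print(f"{site}: {low}~{high} m")
# Hamyangsangrim: 3.5~7.6 m
# Hadongsongrim: 4.3~8.9 m
```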
https://doi.org/10.9715/KILA.2015.43.6.073
