Search | Korea Science

Sentiment Analysis of Korean Reviews Using CNN: Focusing on Morpheme Embedding (CNN을 적용한 한국어 상품평 감성분석: 형태소 임베딩을 중심으로)

Park, Hyun-jung;Song, Min-chae;Shin, Kyung-shik
- Journal of Intelligence and Information Systems
- /
- v.24 no.2
- /
- pp.59-83
- /
- 2018
With the increasing importance of sentiment analysis to grasp the needs of customers and the public, various types of deep learning models have been actively applied to English texts. In the sentiment analysis of English texts by deep learning, natural language sentences included in training and test datasets are usually converted into sequences of word vectors before being entered into the deep learning models. In this case, word vectors generally refer to vector representations of words obtained through splitting a sentence by space characters. There are several ways to derive word vectors, one of which is Word2Vec used for producing the 300 dimensional Google word vectors from about 100 billion words of Google News data. They have been widely used in the studies of sentiment analysis of reviews from various fields such as restaurants, movies, laptops, cameras, etc. Unlike English, morpheme plays an essential role in sentiment analysis and sentence structure analysis in Korean, which is a typical agglutinative language with developed postpositions and endings. A morpheme can be defined as the smallest meaningful unit of a language, and a word consists of one or more morphemes. For example, for a word '예쁘고', the morphemes are '예쁘(= adjective)' and '고(=connective ending)'. Reflecting the significance of Korean morphemes, it seems reasonable to adopt the morphemes as a basic unit in Korean sentiment analysis. Therefore, in this study, we use 'morpheme vector' as an input to a deep learning model rather than 'word vector' which is mainly used in English text. The morpheme vector refers to a vector representation for the morpheme and can be derived by applying an existent word vector derivation mechanism to the sentences divided into constituent morphemes. By the way, here come some questions as follows. What is the desirable range of POS(Part-Of-Speech) tags when deriving morpheme vectors for improving the classification accuracy of a deep learning model? Is it proper to apply a typical word vector model which primarily relies on the form of words to Korean with a high homonym ratio? Will the text preprocessing such as correcting spelling or spacing errors affect the classification accuracy, especially when drawing morpheme vectors from Korean product reviews with a lot of grammatical mistakes and variations? We seek to find empirical answers to these fundamental issues, which may be encountered first when applying various deep learning models to Korean texts. As a starting point, we summarized these issues as three central research questions as follows. First, which is better effective, to use morpheme vectors from grammatically correct texts of other domain than the analysis target, or to use morpheme vectors from considerably ungrammatical texts of the same domain, as the initial input of a deep learning model? Second, what is an appropriate morpheme vector derivation method for Korean regarding the range of POS tags, homonym, text preprocessing, minimum frequency? Third, can we get a satisfactory level of classification accuracy when applying deep learning to Korean sentiment analysis? As an approach to these research questions, we generate various types of morpheme vectors reflecting the research questions and then compare the classification accuracy through a non-static CNN(Convolutional Neural Network) model taking in the morpheme vectors. As for training and test datasets, Naver Shopping's 17,260 cosmetics product reviews are used. To derive morpheme vectors, we use data from the same domain as the target one and data from other domain; Naver shopping's about 2 million cosmetics product reviews and 520,000 Naver News data arguably corresponding to Google's News data. The six primary sets of morpheme vectors constructed in this study differ in terms of the following three criteria. First, they come from two types of data source; Naver news of high grammatical correctness and Naver shopping's cosmetics product reviews of low grammatical correctness. Second, they are distinguished in the degree of data preprocessing, namely, only splitting sentences or up to additional spelling and spacing corrections after sentence separation. Third, they vary concerning the form of input fed into a word vector model; whether the morphemes themselves are entered into a word vector model or with their POS tags attached. The morpheme vectors further vary depending on the consideration range of POS tags, the minimum frequency of morphemes included, and the random initialization range. All morpheme vectors are derived through CBOW(Continuous Bag-Of-Words) model with the context window 5 and the vector dimension 300. It seems that utilizing the same domain text even with a lower degree of grammatical correctness, performing spelling and spacing corrections as well as sentence splitting, and incorporating morphemes of any POS tags including incomprehensible category lead to the better classification accuracy. The POS tag attachment, which is devised for the high proportion of homonyms in Korean, and the minimum frequency standard for the morpheme to be included seem not to have any definite influence on the classification accuracy.
https://doi.org/10.13088/jiis.2018.24.2.059 인용 PDF KSCI

The Relationship Between DEA Model-based Eco-Efficiency and Economic Performance (DEA 모형 기반의 에코효율성과 경제적 성과의 연관성)

Kim, Myoung-Jong
- Journal of Environmental Policy
- /
- v.13 no.4
- /
- pp.3-49
- /
- 2014
Growing interest of stakeholders on corporate responsibilities for environment and tightening environmental regulations are highlighting the importance of environmental management more than ever. However, companies' awareness of the importance of environment is still falling behind, and related academic works have not shown consistent conclusions on the relationship between environmental performance and economic performance. One of the reasons is different ways of measuring these two performances. The evaluation scope of economic performance is relatively narrow and the performance can be measured by a unified unit such as price, while the scope of environmental performance is diverse and a wide range of units are used for measuring environmental performances instead of using a single unified unit. Therefore, the results of works can be different depending on the performance indicators selected. In order to resolve this problem, generalized and standardized performance indicators should be developed. In particular, the performance indicators should be able to cover the concepts of both environmental and economic performances because the recent idea of environmental management has expanded to encompass the concept of sustainability. Another reason is that most of the current researches tend to focus on the motive of environmental investments and environmental performance, and do not offer a guideline for an effective implementation strategy for environmental management. For example, a process improvement strategy or a market discrimination strategy can be deployed through comparing the environment competitiveness among the companies in the same or similar industries, so that a virtuous cyclical relationship between environmental and economic performances can be secured. A novel method for measuring eco-efficiency by utilizing Data Envelopment Analysis (DEA), which is able to combine multiple environmental and economic performances, is proposed in this report. Based on the eco-efficiencies, the environmental competitiveness is analyzed and the optimal combination of inputs and outputs are recommended for improving the eco-efficiencies of inefficient firms. Furthermore, the panel analysis is applied to the causal relationship between eco-efficiency and economic performance, and the pooled regression model is used to investigate the relationship between eco-efficiency and economic performance. The four-year eco-efficiencies between 2010 and 2013 of 23 companies are obtained from the DEA analysis; a comparison of efficiencies among 23 companies is carried out in terms of technical efficiency(TE), pure technical efficiency(PTE) and scale efficiency(SE), and then a set of recommendations for optimal combination of inputs and outputs are suggested for the inefficient companies. Furthermore, the experimental results with the panel analysis have demonstrated the causality from eco-efficiency to economic performance. The results of the pooled regression have shown that eco-efficiency positively affect financial perform ances(ROA and ROS) of the companies, as well as firm values(Tobin Q, stock price, and stock returns). This report proposes a novel approach for generating standardized performance indicators obtained from multiple environmental and economic performances, so that it is able to enhance the generality of relevant researches and provide a deep insight into the sustainability of environmental management. Furthermore, using efficiency indicators obtained from the DEA model, the cause of change in eco-efficiency can be investigated and an effective strategy for environmental management can be suggested. Finally, this report can be a motive for environmental management by providing empirical evidence that environmental investments can improve economic performance.
PDF

Self-optimizing feature selection algorithm for enhancing campaign effectiveness (캠페인 효과 제고를 위한 자기 최적화 변수 선택 알고리즘)

Seo, Jeoung-soo;Ahn, Hyunchul
- Journal of Intelligence and Information Systems
- /
- v.26 no.4
- /
- pp.173-198
- /
- 2020
For a long time, many studies have been conducted on predicting the success of campaigns for customers in academia, and prediction models applying various techniques are still being studied. Recently, as campaign channels have been expanded in various ways due to the rapid revitalization of online, various types of campaigns are being carried out by companies at a level that cannot be compared to the past. However, customers tend to perceive it as spam as the fatigue of campaigns due to duplicate exposure increases. Also, from a corporate standpoint, there is a problem that the effectiveness of the campaign itself is decreasing, such as increasing the cost of investing in the campaign, which leads to the low actual campaign success rate. Accordingly, various studies are ongoing to improve the effectiveness of the campaign in practice. This campaign system has the ultimate purpose to increase the success rate of various campaigns by collecting and analyzing various data related to customers and using them for campaigns. In particular, recent attempts to make various predictions related to the response of campaigns using machine learning have been made. It is very important to select appropriate features due to the various features of campaign data. If all of the input data are used in the process of classifying a large amount of data, it takes a lot of learning time as the classification class expands, so the minimum input data set must be extracted and used from the entire data. In addition, when a trained model is generated by using too many features, prediction accuracy may be degraded due to overfitting or correlation between features. Therefore, in order to improve accuracy, a feature selection technique that removes features close to noise should be applied, and feature selection is a necessary process in order to analyze a high-dimensional data set. Among the greedy algorithms, SFS (Sequential Forward Selection), SBS (Sequential Backward Selection), SFFS (Sequential Floating Forward Selection), etc. are widely used as traditional feature selection techniques. It is also true that if there are many risks and many features, there is a limitation in that the performance for classification prediction is poor and it takes a lot of learning time. Therefore, in this study, we propose an improved feature selection algorithm to enhance the effectiveness of the existing campaign. The purpose of this study is to improve the existing SFFS sequential method in the process of searching for feature subsets that are the basis for improving machine learning model performance using statistical characteristics of the data to be processed in the campaign system. Through this, features that have a lot of influence on performance are first derived, features that have a negative effect are removed, and then the sequential method is applied to increase the efficiency for search performance and to apply an improved algorithm to enable generalized prediction. Through this, it was confirmed that the proposed model showed better search and prediction performance than the traditional greed algorithm. Compared with the original data set, greed algorithm, genetic algorithm (GA), and recursive feature elimination (RFE), the campaign success prediction was higher. In addition, when performing campaign success prediction, the improved feature selection algorithm was found to be helpful in analyzing and interpreting the prediction results by providing the importance of the derived features. This is important features such as age, customer rating, and sales, which were previously known statistically. Unlike the previous campaign planners, features such as the combined product name, average 3-month data consumption rate, and the last 3-month wireless data usage were unexpectedly selected as important features for the campaign response, which they rarely used to select campaign targets. It was confirmed that base attributes can also be very important features depending on the type of campaign. Through this, it is possible to analyze and understand the important characteristics of each campaign type.
https://doi.org/10.13088/jiis.2020.26.4.173 인용 PDF KSCI

Assessment Study on Educational Programs for the Gifted Students in Mathematics (영재학급에서의 수학영재프로그램 평가에 관한 연구)

Kim, Jung-Hyun;Whang, Woo-Hyung
- Communications of Mathematical Education
- /
- v.24 no.1
- /
- pp.235-257
- /
- 2010
Contemporary belief is that the creative talented can create new knowledge and lead national development, so lots of countries in the world have interest in Gifted Education. As we well know, U.S.A., England, Russia, Germany, Australia, Israel, and Singapore enforce related laws in Gifted Education to offer Gifted Classes, and our government has also created an Improvement Act in January, 2000 and Enforcement Ordinance for Gifted Improvement Act was also announced in April, 2002. Through this initiation Gifted Education can be possible. Enforcement Ordinance was revised in October, 2008. The main purpose of this revision was to expand the opportunity of Gifted Education to students with special education needs. One of these programs is, the opportunity of Gifted Education to be offered to lots of the Gifted by establishing Special Classes at each school. Also, it is important that the quality of Gifted Education should be combined with the expansion of opportunity for the Gifted. Social opinion is that it will be reckless only to expand the opportunity for the Gifted Education, therefore, assessment on the Teaching and Learning Program for the Gifted is indispensible. In this study, 3 middle schools were selected for the Teaching and Learning Programs in mathematics. Each 1st Grade was reviewed and analyzed through comparative tables between Regular and Gifted Education Programs. Also reviewed was the content of what should be taught, and programs were evaluated on assessment standards which were revised and modified from the present teaching and learning programs in mathematics. Below, research issues were set up to assess the formation of content areas and appropriateness for Teaching and Learning Programs for the Gifted in mathematics. A. Is the formation of special class content areas complying with the 7th national curriculum? 1. Which content areas of regular curriculum is applied in this program? 2. Among Enrichment and Selection in Curriculum for the Gifted, which one is applied in this programs? 3. Are the content areas organized and performed properly? B. Are the Programs for the Gifted appropriate? 1. Are the Educational goals of the Programs aligned with that of Gifted Education in mathematics? 2. Does the content of each program reflect characteristics of mathematical Gifted students and express their mathematical talents? 3. Are Teaching and Learning models and methods diverse enough to express their talents? 4. Can the assessment on each program reflect the Learning goals and content, and enhance Gifted students' thinking ability? The conclusions are as follows: First, the best contents to be taught to the mathematical Gifted were found to be the Numeration, Arithmetic, Geometry, Measurement, Probability, Statistics, Letter and Expression. Also, Enrichment area and Selection area within the curriculum for the Gifted were offered in many ways so that their Giftedness could be fully enhanced. Second, the educational goals of Teaching and Learning Programs for the mathematical Gifted students were in accordance with the directions of mathematical education and philosophy. Also, it reflected that their research ability was successful in reaching the educational goals of improving creativity, thinking ability, problem-solving ability, all of which are required in the set curriculum. In order to accomplish the goals, visualization, symbolization, phasing and exploring strategies were used effectively. Many different of lecturing types, cooperative learning, discovery learning were applied to accomplish the Teaching and Learning model goals. For Teaching and Learning activities, various strategies and models were used to express the students' talents. These activities included experiments, exploration, application, estimation, guess, discussion (conjecture and refutation) reconsideration and so on. There were no mention to the students about evaluation and paper exams. While the program activities were being performed, educational goals and assessment methods were reflected, that is, products, performance assessment, and portfolio were mainly used rather than just paper assessment.
PDF KSCI

Studies on the Properties of Populus Grown in Korea (포플러재(材)의 재질(材質)에 관(關)한 시험(試驗))

Jo, Jae-Myeong;Kang, Sun-Goo;Lee, Yong-Dae;Jung, Hee-Suk;Ahn, Jung-Mo;Shim, Chong-Supp
- Journal of the Korean Wood Science and Technology
- /
- v.10 no.3
- /
- pp.68-87
- /
- 1982
In Korea, this is the situation at moment that the total demand of timber in 1972 is more than 5 million cubic meters. On the other hand, however, the available domestic supply of timber at the same year is only about, 1 million cubic meters. A great unbalancing between demand and supply of timber has been prevailing. To solve this hard problem, it has been necessitiated to build up the forest stocks as early as possible with fast grown species such as poplar. Under circumstances, poplar plantations which have been carryed on government and private have reached to large area of 116,603 hectors from 1962 up to date. It has now be come a principal timber resources in this country, and required the basic study on various properties of wood for it's proper utilization, since it has not been made of any systematic study on the properties of Populus grown in Korea. In order to investigate the properties such as anatomical, physical and mechanical properties of nine different species (P. euramericana Guiner I-214. P. euramericana Guiner I-476, P. deltoides Marsh, P. nigra var. italica (Muchk) Koeme, P. alba L.,P. alba $\times$ glandulosa P. maximowiczii Henry, P. koreana Rehder, P. davidiana Dode) of poplar for their proper use and development of new ways of grading processing and quality improving, this study has been made by the Forest Research Institute.
PDF

Analysis of Research Trends in Journal of Distribution Science (유통과학연구의 연구 동향 분석 : 창간호부터 제8권 제3호까지를 중심으로)

Kim, Young-Min;Kim, Young-Ei;Youn, Myoung-Kil
- Journal of Distribution Science
- /
- v.8 no.4
- /
- pp.5-15
- /
- 2010
This study investigated research trends of JDS that KODISA published and gave implications to elevate quality of scholarly journals. In other words, the study classified scientific system of distribution area to investigate research trends and to compare it with other scholarly journals of distribution and to give implications for higher level of JDS. KODISA published JDS Vol.1 No.1 for the first time in 1999 followed by Vol.8 No.3 in September 2010 to show 109 theses in total. KODISA investigated subjects, research institutions, number of participants, methodology, frequency of theses in both the Korean language and English, frequency of participation of not only the Koreans but also foreigners and use of references, etc. And, the study investigated JDR of KODIA, JKDM(The Journal of Korean Distribution & Management) and JDA that researched distribution, so that it found out development ways. To investigate research trends of JDS that KODISA publishes, main category was made based on the national science and technology standard classification system of MEST (Ministry Of Education, Science And Technology), table of classification of research areas of NRF(National Research Foundation of Korea), research classification system of both KOREADIMA and KLRA(Korea Logistics Research Association) and distribution science and others that KODISA is looking for, and distribution economy area was divided into general distribution, distribution economy, distribution, distribution information and others, and distribution management was divided into distribution management, marketing, MD and purchasing, consumer behavior and others. The findings were as follow: Firstly, main category occupied 47 theses (43.1%) of distribution economy and 62 theses (56.9%) of distribution management among 109 theses in total. Active research area of distribution economy consisted of 14 theses (12.8%) of distribution information and 9 theses (8.3%) of distribution economy to research distribution as well as distribution information positively every year. The distribution management consisted of 25 theses (22.9%) of distribution management and 20 theses (18.3%) of marketing, These days, research on distribution management, marketing, distribution, distribution information and others is increasing. Secondly, researchers published theses as follow: 55 theses (50.5%) by professor by himself or herself, 12 theses (11.0%) of joint research by professors and businesses, Professors/students published 9 theses (8.3%) followed by 5 theses (4.6%) of researchers, 5 theses (4.6%) of businesses, 4 theses (3.7%) of professors, researchers and businesses and 2 theses (1.8%) of students. Professors published theses less, while businesses, research institutions and graduate school students did more continuously. The number of researchers occupied single researcher (43 theses, 39.5%), two researchers (42 theses, 38.5%) and three researchers or more (24 theses, 22.0%). Thirdly, professors published theses the most at most of areas. Researchers of main category of distribution economy consisted of professors (25 theses, 53.2%), professors and businesses (7 theses, 14.9%), professors and businesses (7 theses, 14.9%), professors and researchers (6 theses, 12.8%) and professors and students (3 theses, 6.3%). And, researchers of main category of distribution management consisted of professors (30 theses, 48.4%), professors and businesses (10 theses, 16.1%), and professors and researchers as well as professors and students (6 theses, 9.7%). Researchers of distribution management consisted of professors, professors and businesses, professors and researchers, researchers and businesses, etc to have various types. Professors mainly researched marketing, MD and purchasing, and consumer behavior, etc to demand active participation of businesses and researchers. Fourthly, research methodology was: Literature research occupied 45 theses (41.3%) the most followed by empirical research based on questionnaire survey (44 theses, 40.4%). General distribution, distribution economy, distribution and distribution management, etc mostly adopted literature research, while marketing did empirical research based on questionnaire survey the most. Fifthly, theses in the Korean language occupied 92.7% (101 theses), while those in English did 7.3% (8 theses). No more than one thesis in English was published until 2006, and 7 theses (11.9%) were published after 2007 to increase. The theses in English were published more to be affirmative. Foreigner researcher published one thesis (0.9%) and both Korean researchers and foreigner researchers jointly published two theses (1.8%) to have very much low participation of foreigner researchers. Sixthly, one thesis of JDS had 27.5 references in average that consisted of 11.1 local references and 16.4 foreign references. And, cited times was 0.4 thesis in average to be low. The distribution economy cited 24.2 references in average (9.4 local references and 14.8 foreign references and JDS had 0.6 cited reference. The distribution management had 30.0 references in average (12.1 local references and 17.9 foreign references) and had 0.3 reference of JDS itself. Seventhly, similar type of scholarly journal had theses in the Korean language and English: JDR( Journal of Distribution Research) of KODIA(Korea Distribution Association) published 92 theses in the Korean language (96.8%) and 3 theses in English (3.2%), that is to say, 95 theses in total. JKDM of KOREADIMA published 132 theses in total that consisted of 93 theses in the Korean language (70.5%) and 39 theses in English (29.5%). Since 2008, JKDM has published scholarly journal in English one time every year. JDS published 52 theses in the Korean language (88.1%) and 7 theses in English (11.9%), that is to say, 59 theses in total. Sixthly, similar type of scholarly journals and research methodology were: JDR's research methodology had 65 empirical researches based on questionnaire survey (68.4%), followed by 17 literature researches (17.9%) and 11 quantitative analyses (11.6%). JKDM made use of various kinds of research methodologies to have 60 questionnaire surveys (45.5%), followed by 40 literature researches (30.3%), 21 quantitative analyses (15.9%), 6 system analyses (4.5%) and 5 case studies (3.8%). And, JDS made use of 30 questionnaire surveys (50.8%), followed by 15 literature researches (25.4%), 7 case studies (11.9%) and 6 quantitative analyses (10.2%). Ninthly, similar types of scholarly journals and Korean researchers and foreigner researchers were: JDR published 93 theses (97.8%) by Korean researchers except for 1 thesis by foreigner researcher and 1 thesis by joint research of the Korean researchers and foreigner researchers. And, JKDM had no foreigner research and 13 theses (9.8%) by joint research of the Korean researchers and foreigner researchers to have more foreigner researchers as well as researchers in foreign countries than similar types of scholarly journals had. And, JDS published 56 theses (94.9%) of the Korean researchers, one thesis (1.7%) of foreigner researcher only, and 2 theses (3.4%) of joint research of both the Koreans and foreigners. Tenthly, similar type of scholarly journals and reference had citation: JDR had 42.5 literatures in average that consisted of 10.9 local literatures (25.7%) and 31.6 foreign literatures (74.3%), and cited times accounted for 1.1 thesis to decrease. JKDM cited 10.5 Korean literatures (36.3%) and 18.4 foreign literatures (63.7%), and number of self-cited literature was no more than 1.1. Number of cited times accounted for 2.9 literatures in 2008 and then decreased continuously since then. JDS cited 26,8 references in average that consisted of 10.9 local references (40.7%) and 15.9 foreign references (59.3%), and number of self-cited accounted for 0.2 reference until 2009, and it increased to be 2.1 references in 2010. The author gives implications based on JDS research trends and investigation on similar type of scholarly journals as follow: Firstly, JDS shall actively invite foreign contributors to prepare for SSCI. Secondly, ratio of theses in English shall increase greatly. Thirdly, various kinds of research methodology shall be accepted to elevate quality of scholarly journals. Fourthly, to increase cited times, Google and other web retrievals shall be reinforced to supply scholarly journals to foreign countries more. Local scholarly journals can be worldwide scholarly journal enough to be acknowledged even in foreign countries by improving the implications above.
PDF

Search Result 826, Processing Time 0.025 seconds

Sentiment Analysis of Korean Reviews Using CNN: Focusing on Morpheme Embedding (CNN을 적용한 한국어 상품평 감성분석: 형태소 임베딩을 중심으로)

The Relationship Between DEA Model-based Eco-Efficiency and Economic Performance (DEA 모형 기반의 에코효율성과 경제적 성과의 연관성)

Self-optimizing feature selection algorithm for enhancing campaign effectiveness (캠페인 효과 제고를 위한 자기 최적화 변수 선택 알고리즘)

Assessment Study on Educational Programs for the Gifted Students in Mathematics (영재학급에서의 수학영재프로그램 평가에 관한 연구)

Studies on the Properties of Populus Grown in Korea (포플러재(材)의 재질(材質)에 관(關)한 시험(試驗))

Analysis of Research Trends in Journal of Distribution Science (유통과학연구의 연구 동향 분석 : 창간호부터 제8권 제3호까지를 중심으로)

이메일무단수집거부

이용약관

제 1 장 총칙

제 2 장 이용계약의 체결

제 3 장 계약 당사자의 의무

제 4 장 서비스의 이용

제 5 장 계약 해지 및 이용 제한

제 6 장 손해배상 및 기타사항

Detail Search

Image Search (β)