• Title/Summary/Keyword: verb information (동사정보)

Building a Korean Sentiment Lexicon Using Collective Intelligence (집단지성을 이용한 한글 감성어 사전 구축)

  • An, Jungkook;Kim, Hee-Woong
    • Journal of Intelligence and Information Systems / v.21 no.2 / pp.49-67 / 2015
  • The emergence of big data and social media has set off a big bang of data. Social networking services are used by people around the world and have become a major communication tool for all ages. Over the last decade, as online social networking sites have grown increasingly popular, companies have come to focus on advanced social media analysis for their marketing strategies. Beyond social media analysis, companies are mainly concerned about the propagation of negative opinions on social networking sites such as Facebook and Twitter, as well as on e-commerce sites. Online word of mouth (WOM), such as product ratings, product reviews, and product recommendations, is very influential, and negative opinions have a significant impact on product sales. This trend has drawn researchers' attention to natural language processing tasks such as sentiment analysis. Sentiment analysis, also referred to as opinion mining, is the process of identifying the polarity of subjective information, and it has been applied in various research and practical fields. However, applying natural language processing to Korean (Hangul) is difficult because Korean is an agglutinative language with rich morphology. As a result, Korean natural language processing resources such as sentiment lexicons are scarce, which significantly limits researchers and practitioners who are considering sentiment analysis. Our study builds a Korean sentiment lexicon with collective intelligence and provides an API (Application Programming Interface) service that opens and shares the sentiment lexicon data with the public (www.openhangul.com). For pre-processing, we created a Korean lexicon database of over 517,178 words and classified them into sentiment and non-sentiment words.
To classify them, we first identified stop words, which are likely to play a negative role in sentiment analysis, and excluded them from our sentiment scoring. In general, sentiment words are nouns, adjectives, verbs, and adverbs, as they carry sentiment that is positive, neutral, or negative. Non-sentiment words, on the other hand, are interjections, determiners, numerals, postpositions, etc., as they generally carry no sentiment. To build a reliable sentiment lexicon, we adopted the concept of collective intelligence as a model for crowdsourcing. In addition, the concept of folksonomy was applied in the classification process to support collective intelligence. To compensate for an inherent weakness of folksonomy, we adopted majority rule by building a voting system. Participants, as voters, were offered three options (positive, negative, and neutral), and the voting was conducted on one of the largest social networking sites for college students in Korea. College students in Korea have cast more than 35,000 votes, and we keep the voting system open by maintaining the project as a perpetual study. Moreover, any change in the sentiment score of a word can be an important observation because it lets us track temporal changes in Korean as a natural language. Lastly, our study offers a RESTful, JSON-based API service through a web platform to make the lexicon easier to use for researchers, companies, and developers. Our study makes important contributions to both research and practice. In terms of research, our Korean sentiment lexicon serves as an important resource for Korean natural language processing. In terms of practice, practitioners such as managers and marketers can implement sentiment analysis effectively by using the Korean sentiment lexicon we built.
Moreover, our study sheds new light on the value of folksonomy by combining it with collective intelligence, and we expect it to give new direction and momentum to the development of Korean natural language processing.
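
The majority-rule voting described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `min_votes` threshold, the tie handling, and the sample ballots are all assumptions made for illustration.

```python
from collections import Counter

# The three voting options named in the abstract.
LABELS = ("positive", "neutral", "negative")


def majority_label(votes, min_votes=3):
    """Return the label with the most votes, or None on a tie or too few votes.

    `min_votes` is a hypothetical threshold, not a value from the paper.
    """
    counts = Counter(v for v in votes if v in LABELS)
    if sum(counts.values()) < min_votes:
        return None
    ranked = counts.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # no strict majority winner
    return ranked[0][0]


def build_lexicon(ballots, min_votes=3):
    """Map each word to its majority label, skipping undecided words."""
    lexicon = {}
    for word, votes in ballots.items():
        label = majority_label(votes, min_votes)
        if label is not None:
            lexicon[word] = label
    return lexicon


# Hypothetical ballots: word -> list of individual votes.
ballots = {
    "좋다": ["positive", "positive", "neutral", "positive"],
    "싫다": ["negative", "negative", "negative"],
    "그냥": ["neutral", "positive"],  # too few votes to decide
}
lexicon = build_lexicon(ballots)
```

Because the voting stays open, recomputing the lexicon from the accumulated ballots at different times would be one simple way to observe the score changes the abstract mentions.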

Sentiment analysis on movie review through building modified sentiment dictionary by movie genre (영역별 맞춤형 감성사전 구축을 통한 영화리뷰 감성분석)

  • Lee, Sang Hoon;Cui, Jing;Kim, Jong Woo
    • Journal of Intelligence and Information Systems / v.22 no.2 / pp.97-113 / 2016
  • With the growth of internet data and the rapid development of internet technology, "big data" analysis is actively conducted to analyze enormous volumes of data for various purposes. In recent years especially, many studies have applied text mining techniques to overcome the limitations of existing structured data analysis. Sentiment analysis, one family of text mining techniques, is actively studied as a way to score opinions based on the distribution of word polarity in documents. Sentiment analysis usually relies on a sentiment dictionary that records the positivity and negativity of vocabulary items. As part of this line of work, this study constructs a sentiment dictionary customized to a specific data domain. A common sentiment dictionary used without regard to the characteristics of the data domain cannot reflect contextual expressions used only in that domain. We can therefore expect a modified sentiment dictionary customized to the data domain to improve the performance of sentiment analysis. This study aims to suggest a way to construct a customized dictionary that reflects the characteristics of the data domain. Specifically, movie review data are divided by genre, and genre-customized dictionaries are constructed. The performance of the customized dictionaries in sentiment analysis is compared with that of a common sentiment dictionary. IMDb data are chosen as the subject of analysis, and movie reviews are categorized by genre. Six IMDb genres are selected: 'action', 'animation', 'comedy', 'drama', 'horror', and 'sci-fi'. The five highest-ranking and five lowest-ranking movies per genre are selected as the training data set, and two years of movie data, from September 2012 to June 2014, are collected as the test data set.
Using the SO-PMI (Semantic Orientation from Point-wise Mutual Information) technique, we build a customized sentiment dictionary for each genre and compare prediction accuracy on review ratings. The analysis shows that prediction with the customized dictionaries improves accuracy. The overall performance improvement is 2.82% and is statistically significant. In particular, the customized dictionary for 'sci-fi' yields the highest accuracy improvement among the six genres. Although this study shows the usefulness of customized dictionaries in sentiment analysis, further studies are required to generalize the results. In this study, we consider only adjectives as additional terms in the customized sentiment dictionaries. Other parts of speech, such as verbs and adverbs, could be considered to improve sentiment analysis performance. We also need to apply customized sentiment dictionaries to other domains, such as product reviews.
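
As a rough illustration of SO-PMI, the Python sketch below scores a word by how much more often it co-occurs with positive seed words than with negative ones. It is a simplified sketch, not the authors' setup: co-occurrence is counted at the document level, and the seed words, the smoothing constant, and the sample reviews are assumptions. The score is SO-PMI(w) = PMI(w, pos) - PMI(w, neg), which reduces to log2 of (hits(w, pos) × hits(neg)) / (hits(w, neg) × hits(pos)).

```python
import math


def so_pmi(word, docs, pos_seeds, neg_seeds, smoothing=0.01):
    """Semantic orientation of `word` from document-level co-occurrence.

    A positive result means the word leans toward the positive seeds.
    `smoothing` (an assumed constant) avoids division by zero and log(0).
    """
    pos_seeds, neg_seeds = set(pos_seeds), set(neg_seeds)
    pos_docs = neg_docs = 0   # documents containing any positive / negative seed
    word_pos = word_neg = 0   # documents containing the word together with a seed
    for doc in docs:
        tokens = set(doc.lower().split())
        has_pos = bool(tokens & pos_seeds)
        has_neg = bool(tokens & neg_seeds)
        pos_docs += has_pos
        neg_docs += has_neg
        if word in tokens:
            word_pos += has_pos
            word_neg += has_neg
    numerator = (word_pos + smoothing) * (neg_docs + smoothing)
    denominator = (word_neg + smoothing) * (pos_docs + smoothing)
    return math.log2(numerator / denominator)


# Hypothetical reviews for a single genre.
reviews = [
    "a gripping and excellent story",
    "gripping visuals excellent cast",
    "dull plot and poor acting",
    "poor pacing dull script",
]
gripping_score = so_pmi("gripping", reviews, {"excellent"}, {"poor"})
dull_score = so_pmi("dull", reviews, {"excellent"}, {"poor"})
```

Running the same scoring separately on each genre's training reviews would give one candidate dictionary per genre, with the sign of the score deciding polarity.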

Occurrence and distribution of ALS inhibiting herbicide-resistant weeds in the paddy field of Gyeongnam province (경남지역 ALS 저해 제초제 저항성 논잡초의 발생 및 분포)

  • Lee, Yong Hyun;Shim, Soo Yong;Kim, Jin-Won;Lee, Jeongran;Park, Kee Woong;Lee, Jeung Joo
    • Weed & Turfgrass Science / v.7 no.3 / pp.209-218 / 2018
  • This study investigated the occurrence and distribution of ALS inhibiting herbicide-resistant weeds and estimated the area infested by resistant weeds in the paddy fields of Gyeongnam province, Korea, in 2017 and 2018 using a soil assay method. Compared with the 2012 survey, the infestation ratio of ALS inhibiting herbicide-resistant weeds increased from 1.0% to 66.8%, and the infested area increased from 876 ha to 49,008 ha. The infested area of ALS inhibiting herbicide-resistant weeds was estimated for Ulsan-si (8.4%), Hapcheon-gun (8.3%), Haman-gun (7.9%), Goseong-gun (7.9%), Hadong-gun (7.3%), Jinju-si (7.2%), Changnyeong-gun (7.0%), Gimhae-si (6.4%), Miryang-si (5.5%), Busan-si (4.9%), Uiryeong-gun (4.6%), Namhae-gun (4.3%), Geochang-gun (4.2%), Changwon-si (3.8%), Geoje-si (2.9%), Yangsan-si (1.8%), Sancheong-gun (0.9%) and Tongyeong-si (0.4%); no herbicide-resistant weeds were found in Hamyang-gun. The most dominant ALS inhibiting herbicide-resistant weed in paddy fields was Monochoria vaginalis, followed by Echinochloa oryzicola, Lindernia dubia, Scirpus juncoides, Ludwigia prostrata, Cyperus difformis, Sagittaria trifolia and Rotala indica. ALS inhibiting herbicide-resistant M. vaginalis, L. dubia, and E. oryzoides occurred throughout Gyeongnam province, whereas ALS inhibiting herbicide-resistant S. trifolia and R. indica were found only in Gimhae-si. These results can be used to estimate the population dynamics of ALS inhibiting herbicide-resistant weeds and to guide proper management practices in the paddy fields of Gyeongnam province.

The Difference between the Interpretations of Korean Language Experts and Science Education Experts on the Cognitive Domain of Science Achievement Standards: Focus on 'Explain' (과학과 교육과정 성취기준의 인지적 영역에 대한 국어교육전공자와 과학교육전공자의 해석 차이:설명하기를 중심으로)

  • Song, Eunjeong;Je, Minkyeong;Cha, Kyungmi;Yoo, Junehee
    • Journal of The Korean Association For Science Education / v.37 no.2 / pp.371-382 / 2017
  • The texts in national science curriculum documents are expected to be interpreted as their authors intended. In this study, the science achievement standards in the national curriculum documents were examined by analyzing the differences between the interpretations of Korean language education experts and science education experts. Three Korean language education experts designed and applied their own analysis framework for science curriculum standards, while three science education experts used the TIMSS cognitive domain framework to analyze the achievement standards of the 2009 revised Korean science curriculum. The differences between the two groups' interpretations were analyzed qualitatively through interviews. First, the two groups seemed to attach different meanings to terms such as "explain," "analyze," "define," and "cause and effect." Achievement standards described with general verbs like "explain" were interpreted in various ways. The verb "explain," which appears many times in the science achievement standards, seems to correspond to the "describe" subsections in the framework of the Korean language education experts rather than the "explain" subsections in the framework of the science education experts. Science education experts seemed to focus on prepositional phrases, which indicate the inquiry process, while Korean language education experts seemed to focus on objective phrases. Moreover, the science education experts tended to interpret the achievement standards based on their background knowledge, while the Korean language education experts tended to interpret them based on sentence structure. This study suggests that achievement standards should specifically indicate the levels and scope of the cognitive domain as well as the knowledge domain. Also, integrating achievement standards across the cognitive domains of the Korean language and science subjects should be considered.

KNU Korean Sentiment Lexicon: Bi-LSTM-based Method for Building a Korean Sentiment Lexicon (Bi-LSTM 기반의 한국어 감성사전 구축 방안)

  • Park, Sang-Min;Na, Chul-Won;Choi, Min-Seong;Lee, Da-Hee;On, Byung-Won
    • Journal of Intelligence and Information Systems / v.24 no.4 / pp.219-240 / 2018
  • Sentiment analysis, one of the text mining techniques, is a method for extracting subjective content embedded in text documents. Sentiment analysis methods have recently come into wide use in many fields. For example, data-driven surveys analyze the subjectivity of text data posted by users, and market research analyzes users' review posts to quantify the reputation of a target product. The basic method of sentiment analysis is to use a sentiment dictionary (or lexicon), a list of sentiment vocabulary items with positive, neutral, or negative semantics. In general, the meaning of many sentiment words is likely to differ across domains. For example, the sentiment word 'sad' carries a negative meaning in many fields, but not in the movie domain. To perform accurate sentiment analysis, we need to build a sentiment dictionary for the given domain. However, building such a lexicon is time-consuming, and many sentiment words are left out unless a general-purpose sentiment lexicon is used. To address this problem, several studies have constructed sentiment lexicons suited to a specific domain on the basis of 'OPEN HANGUL' and 'SentiWordNet', which are general-purpose sentiment lexicons. However, OPEN HANGUL is no longer in service, and SentiWordNet does not work well because of language differences that arise when Korean words are converted into English words. These restrictions limit the use of such general-purpose sentiment lexicons as seed data for building a sentiment lexicon for a specific domain. In this article, we construct the 'KNU Korean Sentiment Lexicon (KNU-KSL)', a new general-purpose Korean sentiment dictionary that is more advanced than existing general-purpose lexicons.
The proposed dictionary, a list of domain-independent sentiment words such as 'thank you', 'worthy', and 'impressed', is built so that a sentiment dictionary for a target domain can be constructed quickly. Specifically, it derives sentiment vocabulary by analyzing the glosses contained in the Standard Korean Language Dictionary (SKLD) in the following steps. First, we propose a sentiment classification model based on Bidirectional Long Short-Term Memory (Bi-LSTM). Second, the proposed deep learning model automatically classifies each gloss as positive or negative. Third, positive words and phrases are extracted from the glosses classified as positive, while negative words and phrases are extracted from the glosses classified as negative. Our experimental results show that the average accuracy of the proposed sentiment classification model is up to 89.45%. In addition, the sentiment dictionary is further extended using various external sources, including SentiWordNet, SenticNet, Emotional Verbs, and Sentiment Lexicon 0603. Furthermore, we add sentiment information about frequently used coined words and emoticons that appear mainly on the Web. The KNU-KSL contains a total of 14,843 sentiment vocabulary items, each of which is a 1-gram, a 2-gram, a phrase, or a sentence pattern. Unlike existing sentiment dictionaries, it is composed of words that are not affected by particular domains. The recent trend in sentiment analysis is to use deep learning techniques without sentiment dictionaries, and the importance of developing sentiment dictionaries has gradually declined. However, a recent study shows that the words in a sentiment dictionary can be used as features of deep learning models, resulting in sentiment analysis with higher accuracy (Teng, Z., 2016). This result indicates that a sentiment dictionary is useful not only for sentiment analysis directly but also as a source of features that improve the accuracy of deep learning models.
The proposed dictionary can be used as base data for constructing the sentiment lexicon of a particular domain and as features of deep learning models. It is also useful for automatically and quickly building large training sets for deep learning models.
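
The three-step gloss procedure in this abstract (classify each gloss, then extract words from the positive and negative glosses) can be sketched in Python as follows. This is only an illustration under stated assumptions: the classifier is passed in as a function, and the demo uses a trivial keyword stub rather than a Bi-LSTM; the rule of keeping only tokens that appear in one class, the function names, and the English sample glosses are hypothetical.

```python
def build_lexicon_from_glosses(glosses, classify):
    """Extract polarity words from glosses labeled by `classify`.

    `classify` maps a gloss string to "positive" or "negative". In the paper
    this role is played by a Bi-LSTM model; here it is any callable.
    Assumed extraction rule: keep tokens seen in only one of the two classes.
    """
    pos_tokens, neg_tokens = set(), set()
    for gloss in glosses:
        tokens = set(gloss.lower().split())
        if classify(gloss) == "positive":
            pos_tokens |= tokens
        else:
            neg_tokens |= tokens
    lexicon = {tok: "positive" for tok in pos_tokens - neg_tokens}
    lexicon.update({tok: "negative" for tok in neg_tokens - pos_tokens})
    return lexicon


def stub_classifier(gloss):
    """Stand-in for a trained model: a keyword check, for demonstration only."""
    return "negative" if "sorrow" in gloss else "positive"


# Hypothetical dictionary glosses.
glosses = [
    "a feeling of great happiness",
    "a feeling of deep sorrow",
    "a feeling of great pleasure",
]
gloss_lexicon = build_lexicon_from_glosses(glosses, stub_classifier)
```

Tokens shared by both classes, such as function words, drop out under this rule, which loosely mirrors the goal of keeping domain-independent sentiment words.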