• Title/Summary/Keyword: Tag-in

Search Result 2,585, Processing Time 0.029 seconds

Sentiment Analysis of Korean Reviews Using CNN: Focusing on Morpheme Embedding (CNN을 적용한 한국어 상품평 감성분석: 형태소 임베딩을 중심으로)

  • Park, Hyun-jung;Song, Min-chae;Shin, Kyung-shik
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.2
    • /
    • pp.59-83
    • /
    • 2018
  • With the increasing importance of sentiment analysis to grasp the needs of customers and the public, various types of deep learning models have been actively applied to English texts. In the sentiment analysis of English texts by deep learning, natural language sentences included in training and test datasets are usually converted into sequences of word vectors before being entered into the deep learning models. In this case, word vectors generally refer to vector representations of words obtained through splitting a sentence by space characters. There are several ways to derive word vectors, one of which is Word2Vec used for producing the 300 dimensional Google word vectors from about 100 billion words of Google News data. They have been widely used in the studies of sentiment analysis of reviews from various fields such as restaurants, movies, laptops, cameras, etc. Unlike English, morpheme plays an essential role in sentiment analysis and sentence structure analysis in Korean, which is a typical agglutinative language with developed postpositions and endings. A morpheme can be defined as the smallest meaningful unit of a language, and a word consists of one or more morphemes. For example, for a word '예쁘고', the morphemes are '예쁘(= adjective)' and '고(=connective ending)'. Reflecting the significance of Korean morphemes, it seems reasonable to adopt the morphemes as a basic unit in Korean sentiment analysis. Therefore, in this study, we use 'morpheme vector' as an input to a deep learning model rather than 'word vector' which is mainly used in English text. The morpheme vector refers to a vector representation for the morpheme and can be derived by applying an existent word vector derivation mechanism to the sentences divided into constituent morphemes. By the way, here come some questions as follows. What is the desirable range of POS(Part-Of-Speech) tags when deriving morpheme vectors for improving the classification accuracy of a deep learning model? Is it proper to apply a typical word vector model which primarily relies on the form of words to Korean with a high homonym ratio? Will the text preprocessing such as correcting spelling or spacing errors affect the classification accuracy, especially when drawing morpheme vectors from Korean product reviews with a lot of grammatical mistakes and variations? We seek to find empirical answers to these fundamental issues, which may be encountered first when applying various deep learning models to Korean texts. As a starting point, we summarized these issues as three central research questions as follows. First, which is better effective, to use morpheme vectors from grammatically correct texts of other domain than the analysis target, or to use morpheme vectors from considerably ungrammatical texts of the same domain, as the initial input of a deep learning model? Second, what is an appropriate morpheme vector derivation method for Korean regarding the range of POS tags, homonym, text preprocessing, minimum frequency? Third, can we get a satisfactory level of classification accuracy when applying deep learning to Korean sentiment analysis? As an approach to these research questions, we generate various types of morpheme vectors reflecting the research questions and then compare the classification accuracy through a non-static CNN(Convolutional Neural Network) model taking in the morpheme vectors. As for training and test datasets, Naver Shopping's 17,260 cosmetics product reviews are used. To derive morpheme vectors, we use data from the same domain as the target one and data from other domain; Naver shopping's about 2 million cosmetics product reviews and 520,000 Naver News data arguably corresponding to Google's News data. The six primary sets of morpheme vectors constructed in this study differ in terms of the following three criteria. First, they come from two types of data source; Naver news of high grammatical correctness and Naver shopping's cosmetics product reviews of low grammatical correctness. Second, they are distinguished in the degree of data preprocessing, namely, only splitting sentences or up to additional spelling and spacing corrections after sentence separation. Third, they vary concerning the form of input fed into a word vector model; whether the morphemes themselves are entered into a word vector model or with their POS tags attached. The morpheme vectors further vary depending on the consideration range of POS tags, the minimum frequency of morphemes included, and the random initialization range. All morpheme vectors are derived through CBOW(Continuous Bag-Of-Words) model with the context window 5 and the vector dimension 300. It seems that utilizing the same domain text even with a lower degree of grammatical correctness, performing spelling and spacing corrections as well as sentence splitting, and incorporating morphemes of any POS tags including incomprehensible category lead to the better classification accuracy. The POS tag attachment, which is devised for the high proportion of homonyms in Korean, and the minimum frequency standard for the morpheme to be included seem not to have any definite influence on the classification accuracy.

Stock Identification of Todarodes pacificus in Northwest Pacific (북서태평양에 서식하는 살오징어(Todarodes pacificus) 계군 분석에 대한 고찰)

  • Kim, Jeong-Yun;Moon, Chang-Ho;Yoon, Moon-Geun;Kang, Chang-Keun;Kim, Kyung-Ryul;Na, Taehee;Choy, Eun Jung;Lee, Chung Il
    • The Sea:JOURNAL OF THE KOREAN SOCIETY OF OCEANOGRAPHY
    • /
    • v.17 no.4
    • /
    • pp.292-302
    • /
    • 2012
  • This paper reviews comparison analysis of current and latest application for stock identification methods of Todarodes pacificus, and the pros and cons of each method and consideration of how to compensate for each other. Todarodes pacificus which migrates wide areas in western North Pacific is important fishery resource ecologically and commercially. Todarodes pacificus is also considered as 'biological indicator' of ocean environmental changes. And changes in its short and long term catch and distribution area occur along with environmental changes. For example, while the catch of pollack, a cold water fish, has dramatically decreased until today after the climate regime shift in 1987/1988, the catch of Todarodes pacificus has been dramatically increased. Regarding the decrease in pollack catch, overfishing and climate changes were considered as the main causes, but there has been no definite reason until today. One of the reasons why there is no definite answer is related with no proper analysis about ecological and environmental aspects based on stock identification. Subpopulation is a group sharing the same gene pool through sexual reproduction process within limited boundaries having similar ecological characteristics. Each individual with same stock might be affected by different environment in temporal and spatial during the process of spawning, recruitment and then reproduction. Thereby, accurate stock analysis about the species can play an efficient alternative to comply with effective resource management and rapid changes. Four main stock analysis were applied to Todarodes pacificus: Morphologic Method, Ecological Method, Tagging Method, Genetic Method. Ecological method is studies for analysis of differences in spawning grounds by analysing the individual ecological change, distribution, migration status, parasitic state of parasite, kinds of parasite and parasite infection rate etc. Currently the method has been studying lively can identify the group in the similar environment. However It is difficult to know to identify the same genetic group in each other. Tagging Method is direct method. It can analyse cohort's migration, distribution and location of spawning, but it is very difficult to recapture tagged squids and hard to tag juveniles. Genetic method, which is for useful fishery resource stock analysis has provided the basic information regarding resource management study. Genetic method for stock analysis is determined according to markers' sensitivity and need to select high multiform of genetic markers. For stock identification, isozyme multiform has been used for genetic markers. Recently there is increase in use of makers with high range variability among DNA sequencing like mitochondria, microsatellite. Even the current morphologic method, tagging method and ecological method played important rolls through finding Todarodes pacificus' life cycle, migration route and changes in spawning grounds, it is still difficult to analyze the stock of Todarodes pacificus as those are distributed in difference seas. Lately, by taking advantages of each stock analysis method, more complicated method is being applied. If based on such analysis and genetic method for improvement are played, there will be much advance in management system for the resource fluctuation of Todarodes pacificus.

A STUDY ON THE RELATIONS OF VARIOUS PARTS OF THE PALATE FOR PRIMARY AND PERMANENT DENTITION (유치열과 영구치열의 구개 각부의 관계에 관한 연구)

  • Lee, Yong-Hoon;Yang, Yeon-Mi;Lee, Yong-Hee;Kim, Sang-Hoon;Kim, Jae-Gon;Baik, Byeong-Ju
    • Journal of the korean academy of Pediatric Dentistry
    • /
    • v.31 no.4
    • /
    • pp.569-578
    • /
    • 2004
  • The purpose of this study was to clarify the palatal arch length, width and height in the primary and permanent dentition. Samples were consisted of normal occlusions both in the primary dentition(50 males and 50 females) and in the permanent dentition(50 males and 50 females). With their upper plaster casts were used and through 3-dimensional laser scanning(3D Scanner, DS4060, LDI, U.S.A.), cloud data, polygonization, section curve and loft surface, fit and horizontal plane were based to measure the palatal arch length, width and height(Surfacer 10.0, Imageware, U.S.A.). T-tests were applied for the statistical analyze of the data. The results were as follows : 1. In the measurement values, the values of the male were higher than those of the female except primary anterior palatal height. There were not only statistically significant differences in anterior palatal width(p<0.05) and posterior palatal width(p<0.01) in primary dentition but palatal width(p<0.05), anterior palatal length(p<0.01), middle and posterior palatal length(p<0.05) in permanent dentition between male and female. 2. In the indices of palate, there were statistically significant differences in height-length index(p<0.05) and width-length index(p<0.01) between male and female in primary dentition. In permanent dentition, there was statistically difference between male and female. 3. In the measurement values, posterior palatal width was increased most greatly. Posterior palatal height, anterior palatal width and anterior palatal length were followed by descending order. On the other hand, anterior palatal height and posterior palatal length were decreased. 4. In the indices of palate, the height-length index, the width-length index and posterior height-width index were increased, but the others were decreased.

  • PDF

Stock-Index Invest Model Using News Big Data Opinion Mining (뉴스와 주가 : 빅데이터 감성분석을 통한 지능형 투자의사결정모형)

  • Kim, Yoo-Sin;Kim, Nam-Gyu;Jeong, Seung-Ryul
    • Journal of Intelligence and Information Systems
    • /
    • v.18 no.2
    • /
    • pp.143-156
    • /
    • 2012
  • People easily believe that news and stock index are closely related. They think that securing news before anyone else can help them forecast the stock prices and enjoy great profit, or perhaps capture the investment opportunity. However, it is no easy feat to determine to what extent the two are related, come up with the investment decision based on news, or find out such investment information is valid. If the significance of news and its impact on the stock market are analyzed, it will be possible to extract the information that can assist the investment decisions. The reality however is that the world is inundated with a massive wave of news in real time. And news is not patterned text. This study suggests the stock-index invest model based on "News Big Data" opinion mining that systematically collects, categorizes and analyzes the news and creates investment information. To verify the validity of the model, the relationship between the result of news opinion mining and stock-index was empirically analyzed by using statistics. Steps in the mining that converts news into information for investment decision making, are as follows. First, it is indexing information of news after getting a supply of news from news provider that collects news on real-time basis. Not only contents of news but also various information such as media, time, and news type and so on are collected and classified, and then are reworked as variable from which investment decision making can be inferred. Next step is to derive word that can judge polarity by separating text of news contents into morpheme, and to tag positive/negative polarity of each word by comparing this with sentimental dictionary. Third, positive/negative polarity of news is judged by using indexed classification information and scoring rule, and then final investment decision making information is derived according to daily scoring criteria. For this study, KOSPI index and its fluctuation range has been collected for 63 days that stock market was open during 3 months from July 2011 to September in Korea Exchange, and news data was collected by parsing 766 articles of economic news media M company on web page among article carried on stock information>news>main news of portal site Naver.com. In change of the price index of stocks during 3 months, it rose on 33 days and fell on 30 days, and news contents included 197 news articles before opening of stock market, 385 news articles during the session, 184 news articles after closing of market. Results of mining of collected news contents and of comparison with stock price showed that positive/negative opinion of news contents had significant relation with stock price, and change of the price index of stocks could be better explained in case of applying news opinion by deriving in positive/negative ratio instead of judging between simplified positive and negative opinion. And in order to check whether news had an effect on fluctuation of stock price, or at least went ahead of fluctuation of stock price, in the results that change of stock price was compared only with news happening before opening of stock market, it was verified to be statistically significant as well. In addition, because news contained various type and information such as social, economic, and overseas news, and corporate earnings, the present condition of type of industry, market outlook, the present condition of market and so on, it was expected that influence on stock market or significance of the relation would be different according to the type of news, and therefore each type of news was compared with fluctuation of stock price, and the results showed that market condition, outlook, and overseas news was the most useful to explain fluctuation of news. On the contrary, news about individual company was not statistically significant, but opinion mining value showed tendency opposite to stock price, and the reason can be thought to be the appearance of promotional and planned news for preventing stock price from falling. Finally, multiple regression analysis and logistic regression analysis was carried out in order to derive function of investment decision making on the basis of relation between positive/negative opinion of news and stock price, and the results showed that regression equation using variable of market conditions, outlook, and overseas news before opening of stock market was statistically significant, and classification accuracy of logistic regression accuracy results was shown to be 70.0% in rise of stock price, 78.8% in fall of stock price, and 74.6% on average. This study first analyzed relation between news and stock price through analyzing and quantifying sensitivity of atypical news contents by using opinion mining among big data analysis techniques, and furthermore, proposed and verified smart investment decision making model that could systematically carry out opinion mining and derive and support investment information. This shows that news can be used as variable to predict the price index of stocks for investment, and it is expected the model can be used as real investment support system if it is implemented as system and verified in the future.

NFC-based Smartwork Service Model Design (NFC 기반의 스마트워크 서비스 모델 설계)

  • Park, Arum;Kang, Min Su;Jun, Jungho;Lee, Kyoung Jun
    • Journal of Intelligence and Information Systems
    • /
    • v.19 no.2
    • /
    • pp.157-175
    • /
    • 2013
  • Since Korean government announced 'Smartwork promotion strategy' in 2010, Korean firms and government organizations have started to adopt smartwork. However, the smartwork has been implemented only in a few of large enterprises and government organizations rather than SMEs (small and medium enterprises). In USA, both Yahoo! and Best Buy have stopped their flexible work because of its reported low productivity and job loafing problems. In addition, according to the literature on smartwork, we could draw obstacles of smartwork adoption and categorize them into the three types: institutional, organizational, and technological. The first category of smartwork adoption obstacles, institutional, include the difficulties of smartwork performance evaluation metrics, the lack of readiness of organizational processes, limitation of smartwork types and models, lack of employee participation in smartwork adoption procedure, high cost of building smartwork system, and insufficiency of government support. The second category, organizational, includes limitation of the organization hierarchy, wrong perception of employees and employers, a difficulty in close collaboration, low productivity with remote coworkers, insufficient understanding on remote working, and lack of training about smartwork. The third category, technological, obstacles include security concern of mobile work, lack of specialized solution, and lack of adoption and operation know-how. To overcome the current problems of smartwork in reality and the reported obstacles in literature, we suggest a novel smartwork service model based on NFC(Near Field Communication). This paper suggests NFC-based Smartwork Service Model composed of NFC-based Smartworker networking service and NFC-based Smartwork space management service. NFC-based smartworker networking service is comprised of NFC-based communication/SNS service and NFC-based recruiting/job seeking service. NFC-based communication/SNS Service Model supplements the key shortcomings that existing smartwork service model has. By connecting to existing legacy system of a company through NFC tags and systems, the low productivity and the difficulty of collaboration and attendance management can be overcome since managers can get work processing information, work time information and work space information of employees and employees can do real-time communication with coworkers and get location information of coworkers. Shortly, this service model has features such as affordable system cost, provision of location-based information, and possibility of knowledge accumulation. NFC-based recruiting/job-seeking service provides new value by linking NFC tag service and sharing economy sites. This service model has features such as easiness of service attachment and removal, efficient space-based work provision, easy search of location-based recruiting/job-seeking information, and system flexibility. This service model combines advantages of sharing economy sites with the advantages of NFC. By cooperation with sharing economy sites, the model can provide recruiters with human resource who finds not only long-term works but also short-term works. Additionally, SMEs (Small Medium-sized Enterprises) can easily find job seeker by attaching NFC tags to any spaces at which human resource with qualification may be located. In short, this service model helps efficient human resource distribution by providing location of job hunters and job applicants. NFC-based smartwork space management service can promote smartwork by linking NFC tags attached to the work space and existing smartwork system. This service has features such as low cost, provision of indoor and outdoor location information, and customized service. In particular, this model can help small company adopt smartwork system because it is light-weight system and cost-effective compared to existing smartwork system. This paper proposes the scenarios of the service models, the roles and incentives of the participants, and the comparative analysis. The superiority of NFC-based smartwork service model is shown by comparing and analyzing the new service models and the existing service models. The service model can expand scope of enterprises and organizations that adopt smartwork and expand the scope of employees that take advantages of smartwork.