• Title/Summary/Keyword: Embedded-system

Estimation of GARCH Models and Performance Analysis of Volatility Trading System using Support Vector Regression (Support Vector Regression을 이용한 GARCH 모형의 추정과 투자전략의 성과분석)

  • Kim, Sun Woong;Choi, Heung Sik
    • Journal of Intelligence and Information Systems / v.23 no.2 / pp.107-122 / 2017
  • Volatility in stock market returns is a measure of investment risk. It plays a central role in portfolio optimization, asset pricing, and risk management, as well as in most theoretical financial models. Engle (1982) presented a pioneering paper on stock market volatility that explains the time-varying characteristics embedded in stock market return volatility. His model, Autoregressive Conditional Heteroscedasticity (ARCH), was generalized by Bollerslev (1986) as the GARCH model. Empirical studies have shown that GARCH models describe well the fat-tailed return distributions and the volatility clustering phenomenon appearing in stock prices. The parameters of GARCH models are generally estimated by maximum likelihood estimation (MLE) based on the standard normal density. However, since Black Monday in 1987, stock market prices have become very complex and noisy. Recent studies have started to apply artificial intelligence approaches to estimating the GARCH parameters as a substitute for MLE. This paper presents an SVR-based GARCH process and compares it with the MLE-based GARCH process in estimating the parameters of GARCH models, which are known to forecast stock market volatility well. The kernel functions used in the SVR estimation process are linear, polynomial, and radial. We analyzed the suggested models with the KOSPI 200 Index, which is composed of 200 blue chip stocks listed on the Korea Exchange. We sampled KOSPI 200 daily closing values from 2010 to 2015, giving 1,487 daily observations. We used 1,187 days to train the suggested GARCH models, and the remaining 300 days were used as testing data. First, symmetric and asymmetric GARCH models are estimated by MLE. We forecasted KOSPI 200 Index return volatility, and the MSE metric shows better results for the asymmetric GARCH models such as E-GARCH and GJR-GARCH. This is consistent with the documented non-normal return distribution characteristics of fat tails and leptokurtosis. 
Compared with the MLE estimation process, SVR-based GARCH models outperform the MLE methodology in KOSPI 200 Index return volatility forecasting. The polynomial kernel function shows markedly lower forecasting accuracy. We suggest an Intelligent Volatility Trading System (IVTS) that utilizes the forecasted volatility results. The IVTS entry rules are as follows. If tomorrow's volatility is forecasted to increase, buy volatility today. If tomorrow's volatility is forecasted to decrease, sell volatility today. If the forecasted volatility direction does not change, hold the existing buy or sell position. IVTS is assumed to buy and sell historical volatility values. This is somewhat unrealistic because historical volatility values themselves cannot be traded. However, our simulation results are meaningful because the Korea Exchange introduced a volatility futures contract in November 2014 that traders can trade. The trading systems with SVR-based GARCH models show higher returns than those with MLE-based GARCH models in the testing period. The percentages of profitable trades of the MLE-based GARCH IVTS models range from 47.5% to 50.0%, while those of the SVR-based GARCH IVTS models range from 51.8% to 59.7%. MLE-based symmetric S-GARCH shows a +150.2% return and SVR-based symmetric S-GARCH shows a +526.4% return. MLE-based asymmetric E-GARCH shows a -72% return and SVR-based asymmetric E-GARCH shows a +245.6% return. MLE-based asymmetric GJR-GARCH shows a -98.7% return and SVR-based asymmetric GJR-GARCH shows a +126.3% return. The linear kernel function shows higher trading returns than the radial kernel function. The best performance of the SVR-based IVTS is +526.4% and that of the MLE-based IVTS is +150.2%. The SVR-based GARCH IVTS shows a higher trading frequency. This study has some limitations. Our models are based solely on SVR. Other artificial intelligence models should be explored in search of better performance. We do not consider costs incurred in the trading process, including brokerage commissions and slippage costs. 
IVTS trading performance is unrealistic since we use historical volatility values as trading objects. Accurate forecasting of stock market volatility is essential in real trading as well as in asset pricing models. Further studies on other machine learning-based GARCH models can provide better information for stock market investors.
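
The IVTS entry rules described in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names `garch_variance_path` and `ivts_positions`, the parameter values, and the toy returns are hypothetical, the variance recursion is the textbook symmetric GARCH(1,1) form with zero-mean returns (not the SVR-based estimation), and "increase" or "decrease" is taken to mean a comparison of consecutive volatility forecasts.

```python
def garch_variance_path(returns, omega, alpha, beta):
    """Textbook GARCH(1,1) recursion (assumes zero-mean returns):
    sigma2[t] = omega + alpha * r[t-1]**2 + beta * sigma2[t-1].
    The first value is the sample variance (an assumed starting point).
    The last value is the one-step-ahead forecast. Requires non-empty input."""
    n = len(returns)
    mean = sum(returns) / n
    sigma2 = [sum((r - mean) ** 2 for r in returns) / n]
    for r in returns:
        sigma2.append(omega + alpha * r ** 2 + beta * sigma2[-1])
    return sigma2


def ivts_positions(forecasts):
    """Apply the entry rules to a series of volatility forecasts.
    +1 = long volatility, -1 = short volatility, 0 = no position yet."""
    positions = []
    current = 0
    for today, tomorrow in zip(forecasts, forecasts[1:]):
        if tomorrow > today:      # volatility forecasted to rise: buy
            current = 1
        elif tomorrow < today:    # volatility forecasted to fall: sell
            current = -1
        # otherwise the direction is unchanged: hold the existing position
        positions.append(current)
    return positions


# Hypothetical daily returns and parameters, for illustration only.
toy_returns = [0.01, -0.02, 0.015, -0.03, 0.005]
variances = garch_variance_path(toy_returns, omega=1e-5, alpha=0.1, beta=0.85)
print(ivts_positions(variances))
print(ivts_positions([1.0, 1.2, 1.2, 0.9, 1.1]))  # -> [1, 1, -1, 1]
```

Like the simulation in the abstract, this sketch ignores brokerage commissions and slippage.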

An Empirical Study on Influencing Factors of Switching Intention from Online Shopping to Webrooming (온라인 쇼핑에서 웹루밍으로의 쇼핑전환 의도에 영향을 미치는 요인에 대한 연구)

  • Choi, Hyun-Seung;Yang, Sung-Byung
    • Journal of Intelligence and Information Systems / v.22 no.1 / pp.19-41 / 2016
  • Recently, the proliferation of mobile devices such as smartphones and tablet personal computers and the development of information communication technologies (ICT) have led to a big trend of a shift from single-channel shopping to multi-channel shopping. With the emergence of a "smart" group of consumers who want to shop in more reasonable and convenient ways, the boundaries apparently dividing online and offline shopping have collapsed and blurred more than ever before. Thus, there is now fierce competition between online and offline channels. Ever since the emergence of online shopping, a major type of multi-channel shopping has been "showrooming," where consumers visit offline stores to examine products before buying them online. However, because of the growing use of smart devices and the counterattack of offline retailers represented by omni-channel marketing strategies, one of the latest huge trends of shopping is "webrooming," where consumers visit online stores to examine products before buying them offline. This has become a threat to online retailers. In this situation, although it is very important to examine the influencing factors for switching from online shopping to webrooming, most prior studies have mainly focused on a single- or multi-channel shopping pattern. Therefore, this study thoroughly investigated the influencing factors on customers switching from online shopping to webrooming in terms of both the "search" and "purchase" processes through the application of a push-pull-mooring (PPM) framework. In order to test the research model, 280 individual samples were gathered from undergraduate and graduate students who had actual experience with webrooming. The results of the structural equation model (SEM) test revealed that the "pull" effect is strongest on the webrooming intention rather than the "push" or "mooring" effects. This proves a significant relationship between "attractiveness of webrooming" and "webrooming intention." 
In addition, the results showed that both the "perceived risk of online search" and "perceived risk of online purchase" significantly affect "distrust of online shopping." Similarly, both "perceived benefit of multi-channel search" and "perceived benefit of offline purchase" were found to have significant effects on "attractiveness of webrooming." Furthermore, the results indicated that "online purchase habit" is the only influencing factor that leads to "online shopping lock-in." The theoretical implications of the study are as follows. First, by examining the multi-channel shopping phenomenon from the perspective of "shopping switching" from online shopping to webrooming, this study complements the limits of the "channel switching" perspective, represented by multi-channel freeriding studies that merely focused on customers' channel switching behaviors from one channel to another. While extant studies with a channel switching perspective have focused on only one type of multi-channel shopping, where consumers just move from one particular channel to different channels, a study with a shopping switching perspective has the advantage of comprehensively investigating how consumers choose and navigate among diverse types of single- or multi-channel shopping alternatives. In this study, only limited shopping switching behavior from online shopping to webrooming was examined; however, the results should explain various phenomena in a more comprehensive manner from the perspective of shopping switching. Second, this study extends the scope of application of the push-pull-mooring framework, which is quite commonly used in marketing research to explain consumers' product switching behaviors. Through the application of this framework, it is hoped that more diverse shopping switching behaviors can be examined in future research. This study can serve as a stepping stone for future studies. 
One of the most important practical implications of the study is that it may help single- and multi-channel retailers develop more specific customer strategies by revealing the influencing factors of webrooming intention from online shopping. For example, online single-channel retailers can ease the distrust of online shopping to prevent consumers from churning by reducing the perceived risk in terms of online search and purchase. On the other hand, offline retailers can develop specific strategies to increase the attractiveness of webrooming by letting customers perceive the benefits of multi-channel search or offline purchase. Although this study focused only on customers switching from online shopping to webrooming, the results can be expanded to various types of shopping switching behaviors embedded in single- and multi-channel shopping environments, such as showrooming and mobile shopping.

A Study on Market Size Estimation Method by Product Group Using Word2Vec Algorithm (Word2Vec을 활용한 제품군별 시장규모 추정 방법에 관한 연구)

  • Jung, Ye Lim;Kim, Ji Hui;Yoo, Hyoung Sun
    • Journal of Intelligence and Information Systems / v.26 no.1 / pp.1-21 / 2020
  • With the rapid development of artificial intelligence technology, various techniques have been developed to extract meaningful information from unstructured text data, which constitutes a large portion of big data. Over the past decades, text mining technologies have been utilized in various industries for practical applications. In the field of business intelligence, they have been employed to discover new market and/or technology opportunities and to support rational decision making by business participants. Market information such as market size, market growth rate, and market share is essential for setting companies' business strategies. There has been continuous demand in various fields for product-level market information. However, such information has generally been provided at the industry level or in broad categories based on classification standards, making it difficult to obtain specific and appropriate information. In this regard, we propose a new methodology that can estimate the market sizes of product groups at more detailed levels than previously offered. We applied the Word2Vec algorithm, a neural network-based semantic word embedding model, to enable automatic market size estimation from individual companies' product information in a bottom-up manner. The overall process is as follows: First, the data related to product information is collected, refined, and restructured into a form suitable for applying the Word2Vec model. Next, the preprocessed data is embedded into a vector space by Word2Vec, and the product groups are derived by extracting similar product names based on cosine similarity. Finally, the sales data of the extracted products is summed to estimate the market size of the product groups. As experimental data, product names from Statistics Korea's microdata (345,103 cases) were mapped into a multidimensional vector space by Word2Vec training. 
We performed parameter optimization for training and then applied a vector dimension of 300 and a window size of 15 as the optimized parameters for further experiments. We employed the index words of the Korean Standard Industry Classification (KSIC) as a product name dataset to cluster product groups more efficiently. The product names similar to the KSIC indexes were extracted based on cosine similarity. The market size of the extracted products, treated as one product category, was calculated from individual companies' sales data. The market sizes of 11,654 specific product lines were automatically estimated by the proposed model. For performance verification, the results were compared with the actual market sizes of some items. The Pearson correlation coefficient was 0.513. Our approach has several advantages over previous studies. First, text mining and machine learning techniques were applied to market size estimation for the first time, overcoming the limitations of traditional methods that rely on sampling or require multiple assumptions. In addition, the level of the market category can be adjusted easily and efficiently according to the purpose of information use by changing the cosine similarity threshold. Furthermore, the approach has high potential for practical application since it can meet unmet needs for detailed market size information in the public and private sectors. Specifically, it can be utilized in technology evaluation and technology commercialization support programs conducted by governmental institutions, as well as in business strategy consulting and market analysis reports published by private firms. The limitation of our study is that the presented model needs to be improved in terms of accuracy and reliability. The semantic-based word embedding module can be advanced by giving a proper order to the preprocessed dataset or by combining another algorithm, such as Jaccard similarity, with Word2Vec. 
Also, the methods of product group clustering can be changed to other types of unsupervised machine learning algorithm. Our group is currently working on subsequent studies and we expect that it can further improve the performance of the conceptually proposed basic model in this study.
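
The last two steps of the process described in this abstract (grouping product names by cosine similarity to an index word, then summing their sales) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration: the function names, the two-dimensional toy vectors (the paper reports 300-dimensional Word2Vec embeddings), the product names, the sales figures, and the thresholds are all hypothetical and do not come from the study.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def estimate_market_size(index_vector, product_vectors, sales, threshold):
    """Collect products whose embedding is close to the index word,
    then sum their sales as the group's estimated market size."""
    group = [
        name
        for name, vec in product_vectors.items()
        if cosine_similarity(index_vector, vec) >= threshold
    ]
    return group, sum(sales[name] for name in group)


# Hypothetical embeddings and sales, standing in for trained vectors.
index_vec = [1.0, 0.0]
product_vectors = {
    "led lamp": [0.9, 0.1],
    "led bulb": [0.8, 0.3],
    "rice cooker": [0.0, 1.0],
}
sales = {"led lamp": 100, "led bulb": 50, "rice cooker": 70}

print(estimate_market_size(index_vec, product_vectors, sales, 0.90))
print(estimate_market_size(index_vec, product_vectors, sales, 0.95))
```

Raising the threshold from 0.90 to 0.95 narrows the group from two products to one, which mirrors the abstract's point that the level of the market category can be adjusted by changing the cosine similarity threshold.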

The Relationship between Expression of EGFR, MMP-9, and C-erbB-2 and Survival Time in Resected Non-Small Cell Lung Cancer (수술을 시행한 비소세포 폐암 환자에서 EGFR, MMP-9 및 C-erbB-2의 발현과 환자 생존율과의 관계)

  • Lee, Seung Heon;Jung, Jin Yong;Lee, Kyoung Ju;Lee, Seung Hyeun;Kim, Se Joong;Ha, Eun Sil;Kim, Jeong-Ha;Lee, Eun Joo;Hur, Gyu Young;Jung, Ki Hwan;Jung, Hye Cheol;Lee, Sung Yong;Lee, Sang Yeub;Kim, Je Hyeong;Shin, Chol;Shim, Jae Jeong;In, Kwang Ho;Kang, Kyung Ho;Yoo, Se Hwa;Kim, Chul Hwan
    • Tuberculosis and Respiratory Diseases / v.59 no.3 / pp.286-297 / 2005
  • Background : Non-small cell lung cancer (NSCLC) is a common cause of cancer-related death in North America and Korea, with an overall 5-year survival rate of between 4 and 14%. The TNM staging system is the best prognostic index for operable NSCLC. However, epidermal growth factor receptor (EGFR), matrix metalloproteinase-9 (MMP-9), and C-erbB-2 have all been implicated in the pathogenesis of NSCLC and might provide prognostic information. Methods : Immunohistochemical staining of 81 specimens from resected primary non-small cell lung cancer was evaluated in order to determine the role of the biological markers in NSCLC. Immunohistochemical staining for EGFR, MMP-9, and C-erbB-2 was performed on paraffin-embedded tissue sections to observe the expression pattern according to the pathologic type and surgical staging. The correlations between the expression of each biological marker and the survival time were determined. Results : When positive immunohistochemical staining was defined as an extent area > 20% (more than Grade 2), the positive rates for EGFR, MMP-9, and C-erbB-2 staining were 71.6%, 44.3%, and 24.1% of the 81 patients, respectively. The positive rates of EGFR and MMP-9 staining for NSCLC according to surgical stages I, II, and IIIa were 75.0% and 41.7%, 66.7% and 47.6%, and 76.9% and 46.2%, respectively. The median survival time of the EGFR(-) group, 71.8 months, was significantly longer than that of the EGFR(+) group, 33.5 months (p=0.018, Kaplan-Meier method, log-rank test). The MMP-9(+) group had a shorter median survival time than the MMP-9(-) group, 35.0 and 65.3 months, respectively (p=0.2). The co-expression of EGFR and MMP-9 was associated with a worse prognosis, with a median survival time of 26.9 months compared with 77 months for the group negative for both (p=0.0023). There were no significant differences between the C-erbB-2(+) and C-erbB-2(-) groups. 
Conclusion : In NSCLC, the expression of EGFR might be a prognostic factor, and the co-expression of EGFR and MMP-9 was found to be associated with a poor prognosis. However, C-erbB-2 expression had no prognostic significance.

The Expression of Vascular Endothelial Growth Factor (VEGF) is a Highly Significant Prognostic Factor in Stage IB Carcinoma of the Cervix (병기 IB 자궁경부암에서 혈관내피세포성장인자(VEGF)의 발현이 예후에 미치는 영향)

  • Lee Ik Jae;Park Kyung Ran;Lee Jong Young;Lee Kang Kyoo;Song Ji Sun;Lee Kwang Gil;Cha Dong Soo;Choi Hyun Il
    • Radiation Oncology Journal / v.19 no.4 / pp.335-344 / 2001
  • Purpose : The aim of this study was to clarify the role of VEGF expression as an independent prognostic factor and to identify the patients at high risk for poor prognosis in stage IB cervical cancer. Materials and methods : A total of 118 patients with stage IB cervical cancer who had radical hysterectomy and pelvic lymph node dissection were included in the study. All known high risk factors of the patients were pathologically confirmed from the surgical specimen. Of the 118 patients, n patients were treated with postoperative radiotherapy and/or chemotherapy. VEGF expression was examined using immunohistochemistry in formalin-fixed, paraffin-embedded specimens of post-hysterectomy surgical materials. A semiquantitative analysis was made using a scoring system of 0, +, ++, and +++ for increasing intensity of stain. We classified the patients with scores from 0 to ++ as low VEGF expression and the patients with a score of +++ as high VEGF expression. Results : Of the 118 patients, 35 patients (29.7%) showed high VEGF expression. Strong correlations were found between high VEGF expression and both deep stromal invasion (p=0.01) and positive pelvic nodes (p=0.03). The 5-year overall and disease-free survival rates for all 118 patients were 95.5% and 93.8%. The 5-year overall (p=0.03) and disease-free survival (p<0.001) rates were 98.5% and 100% for low VEGF expression (0, +, and ++) and 85.5% and 79.7% for high VEGF expression, respectively. Pelvic and distant failures for low versus high VEGF expression were 1.2% versus 17.1% (p=0.001) and 0% versus 14.3% (p<0.001), respectively. In a Cox multivariate analysis of survival, high VEGF expression (p=0.02) and bulky mass (p=0.02) were significant prognostic factors for overall survival. High VEGF expression (p=0.002) and bulky mass (p=0.01) were significant prognostic indicators for disease-free survival. 
Conclusion : These results showed that VEGF expression was a highly significant predictor of pelvic and distant failure and the most significant prognostic factor for overall and disease-free survival in patients with stage IB cervical cancer treated with radical surgery. We strongly suggest that immunohistochemistry for VEGF expression be performed in a routine clinical setting in order to identify the patients at high risk for poor prognosis in early stage cervical cancer. Furthermore, postoperative radiotherapy and/or chemotherapy did not reduce pelvic failure and distant metastasis. To improve the cure rate for patients with high VEGF expression in stage IB cervical cancer, antiangiogenic therapy including anti-VEGF Ab may be a new treatment option.

Air Cavity Effects on the Absorbed Dose for 4-, 6- and 10-MV X-ray Beams : Larynx Model (4-, 6-, 10-MV X-선원에서 공기동이 흡수선량에 미치는 효과 : 후두모형)

  • Kim Chang-Seon;Yang Dae-Sik;Kim Chul-Yong;Choi Myung-Sun
    • Radiation Oncology Journal / v.15 no.4 / pp.393-402 / 1997
  • Purpose : When an x-ray beam of small field size is irradiated to a target area containing an air cavity, such as the larynx, an underdosing effect is observed in the region near the interfaces of air and soft tissue. Using a larynx model, an air cavity embedded in tissue-equivalent material, this study is intended to examine parameters, such as beam quality, field size, and cavity size, that affect the dose distribution near the air cavity. Materials and Methods : Three x-ray beams, 4-, 6- and 10-MV, were employed to perform measurements using a 2 cm (width) × L (length in cm, one side of the x-ray field used) × 2 cm (height) air cavity in the simulated larynx. A thin-window parallel-plate chamber connected to an electrometer was used as the dosimetry system. The ratio of the dose at various distances from the cavity-tissue interface to the dose at the same points in a homogeneous phantom (observed/expected ratio, O/E), normalized buildup curves, and the ratio of the distal surface dose to the dose at the maximum buildup depth were examined for various field sizes. Measurement of the cavity size effect was performed by varying the height (Z) of the air cavity with the width kept constant for several field sizes. Results : No underdosing effect was found for the 4-MV beam for fields larger than 5 cm × 5 cm. For both the 6- and 10-MV beams, underdosing of the larynx at the distal surface was seen to occur for small fields, 4 cm × 4 cm and 5 cm × 5 cm. The volume of underdosed tissue increased with beam energy even for similar surface doses. The ratio of distal surface dose to maximum dose changed from 0.95, 0.92, and 0.91 to 0.99 for the 4-, 6-, and 10-MV beams, respectively, as the field size increased from 4 cm × 4 cm to 8 cm × 8 cm. For the 6- and 10-MV beams, the dose at the surface of the cavity was measured to be less than predicted by about two and three percent, respectively, but a decrease was found for the 4-MV beam for the 5 cm × 5 cm field. 
For the 4 cm × L × Z (height in cm) cavity, with the depth varied from 0.0 to 4.8 cm, O/E > 1.0 was observed regardless of the cavity size for any field larger than about 8 cm × 8 cm. Conclusion : The magnitude of underdosing depends on beam energy, field size, and cavity size for the larynx model. Based on the results of the study, caution must be used when a small field of a high-quality x-ray beam is irradiated to regions including air cavities, and especially to the region where the tumor extends to the surface. A low-quality beam, such as a 4-MV x-ray, and larger fields are preferable for reducing the risk of underdosing and local failure. In the case of high-quality beams such as 6- and 10-MV x-rays, however, an additional boost field is recommended to compensate for the underdosed region when a typically used treatment field, 8 cm × 8 cm, is employed.

Image Watermarking for Copyright Protection of Images on Shopping Mall (쇼핑몰 이미지 저작권보호를 위한 영상 워터마킹)

  • Bae, Kyoung-Yul
    • Journal of Intelligence and Information Systems / v.19 no.4 / pp.147-157 / 2013
  • With the advent of a digital environment that can be accessed anytime and anywhere through high-speed networks, the free distribution and use of digital content have become possible. Ironically, this environment gives rise to a variety of copyright infringements, and product images used in online shopping malls are frequently pirated. Whether shopping mall images are creative works is a controversial issue. According to a Supreme Court decision in 2001, advertising photographs of ham products merely reproduce the appearance of the objects and were not recognized as creative expression. However, the photographer's losses were recognized, and damages were estimated from the typical cost of an advertising photo shoot. According to a Seoul District Court precedent in 2003, if the photographer's personality and creativity are present in the selection of the subject, the composition of the set, the direction and amount of light, the camera angle, the shutter speed, the shutter chance, other shooting methods, and the developing and printing process, the work should be protected by copyright law. To receive legal copyright protection, shopping mall images must therefore do more than simply convey the state of the product; effort is required so that the photographer's personality and creativity can be recognized. Accordingly, the cost of making shopping mall images increases, and the need for copyright protection grows. Unlike general pictures such as portraits and landscape photos, the product images of online shopping malls have a very distinctive composition, and therefore general image watermarking techniques cannot satisfy their watermarking requirements. 
Because the background of product images commonly used in shopping malls is white, black, or a grayscale gradient, it is difficult to use this space to embed a watermark, and the area is very sensitive to even a slight change. In this paper, the characteristics of images used in shopping malls are analyzed and a watermarking technology suited to shopping mall images is proposed. The proposed image watermarking technology divides a product image into smaller blocks, transforms the blocks by DCT (Discrete Cosine Transform), and then inserts the watermark information into the image by quantizing the DCT coefficients. Because uniform treatment of the DCT coefficients in quantization causes visual blocking artifacts, the proposed algorithm uses a weighted mask that quantizes finely the coefficients located at block boundaries and coarsely the coefficients located in the center area of the block. This mask improves the subjective visual quality as well as the objective quality of the images. In addition, to improve the security of the algorithm, the blocks in which the watermark is embedded are randomly selected, and a turbo code is used to reduce the BER when extracting the watermark. The PSNR (Peak Signal to Noise Ratio) of shopping mall images watermarked by the proposed algorithm is 40.7~48.5 dB, and the BER (Bit Error Rate) after JPEG compression with QF = 70 is 0. This means that the watermarked image is of high quality and that the algorithm is robust to the JPEG compression generally used at online shopping malls. Also, for a 40% change in size and 40 degrees of rotation, the BER is 0. In general, shopping malls use compressed images with a QF higher than 90. Because pirated images are replicated from the original image, the proposed algorithm can identify copyright infringement in most cases. As the experimental results show, the proposed algorithm is suitable for shopping mall images with simple backgrounds. 
However, future studies should be carried out to enhance the robustness of the proposed algorithm, because some robustness is lost after the mask process.
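
The core idea in this abstract, embedding a watermark bit by quantizing a DCT coefficient of an image block, can be sketched as follows. This is a simplified illustration and not the proposed algorithm: it uses plain parity quantization of a single coefficient in one 8×8 block, and it omits the weighted mask, the random block selection, and the turbo code. The function names, the coefficient position `(2, 1)`, the step size `16.0`, and the toy block are all assumptions.

```python
import math

N = 8  # block size (assumed)


def dct_matrix(n=N):
    """Orthonormal DCT-II basis C, so that coeffs = C * block * C^T."""
    return [
        [
            (math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n))
            * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i in range(n)
        ]
        for k in range(n)
    ]


def matmul(a, b):
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


C = dct_matrix()
CT = [list(row) for row in zip(*C)]  # transpose of C


def dct2(block):
    return matmul(matmul(C, block), CT)


def idct2(coeffs):
    return matmul(matmul(CT, coeffs), C)


def embed_bit(block, bit, pos=(2, 1), step=16.0):
    """Move one DCT coefficient to the nearest multiple of `step`
    whose index parity (even/odd) equals the watermark bit."""
    coeffs = dct2(block)
    u, v = pos
    ratio = coeffs[u][v] / step
    q = round(ratio)
    if q % 2 != bit:
        q += 1 if ratio >= q else -1  # step to the nearer neighbour
    coeffs[u][v] = q * step
    return idct2(coeffs)


def extract_bit(block, pos=(2, 1), step=16.0):
    """Read the bit back from the parity of the quantized coefficient."""
    u, v = pos
    return round(dct2(block)[u][v] / step) % 2


def psnr(original, modified, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-size blocks."""
    n = len(original) * len(original[0])
    mse = sum(
        (a - b) ** 2 for ra, rb in zip(original, modified) for a, b in zip(ra, rb)
    ) / n
    return float("inf") if mse == 0 else 10 * math.log10(peak ** 2 / mse)


# Hypothetical near-uniform block, similar to a plain product background.
block = [[100 + ((i * 8 + j) % 7) for j in range(8)] for i in range(8)]
marked = embed_bit(block, 1)
print(extract_bit(marked), psnr(block, marked))
```

A larger `step` makes the embedded bit survive stronger distortion but lowers the PSNR, which is the trade-off between robustness and visual quality that the abstract discusses.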

Impact of Semantic Characteristics on Perceived Helpfulness of Online Reviews (온라인 상품평의 내용적 특성이 소비자의 인지된 유용성에 미치는 영향)

  • Park, Yoon-Joo;Kim, Kyoung-jae
    • Journal of Intelligence and Information Systems / v.23 no.3 / pp.29-44 / 2017
  • In Internet commerce, consumers are heavily influenced by product reviews written by other users who have already purchased the product. However, as product reviews accumulate, it takes a lot of time and effort for consumers to individually check the massive number of reviews. Moreover, product reviews that are written carelessly actually inconvenience consumers. Thus, many online vendors provide mechanisms to identify the reviews that customers perceive as most helpful (Cao et al. 2011; Mudambi and Schuff 2010). For example, some online retailers, such as Amazon.com and TripAdvisor, allow users to rate the helpfulness of each review and use this feedback to rank and re-order the reviews. However, many reviews have little or no feedback, which makes it hard to identify their helpfulness. Also, because feedback takes time to accumulate, newly authored reviews do not have enough of it. For example, only 20% of the reviews in the Amazon Review Dataset (Mcauley and Leskovec, 2013) have more than 5 reviews (Yan et al, 2014). The purpose of this study is to analyze the factors affecting the usefulness of online product reviews and to derive a forecasting model that selectively provides product reviews that can be helpful to consumers. To do this, we extracted the various linguistic, psychological, and perceptual elements included in product reviews using text-mining techniques and identified the determinants among these elements that affect the usefulness of product reviews. In particular, considering that the characteristics of product reviews and the determinants of usefulness can differ between apparel products (which are experiential products) and electronic products (which are search goods), the characteristics of the product reviews were compared within each product group and the determinants were established for each. This study used 7,498 apparel product reviews and 106,962 electronic product reviews from Amazon.com. 
In order to understand a review text, we first extract linguistic and psychological characteristics from review texts, such as word count and the level of emotional tone and analytical thinking embedded in the review text, using the widely adopted text analysis software LIWC (Linguistic Inquiry and Word Count). We then explore the descriptive statistics of the review text for each category and statistically compare their differences using a t-test. Lastly, we perform regression analysis using the data mining software RapidMiner to find the determinant factors. As a result of comparing and analyzing the product review characteristics of electronic products and apparel products, it was found that reviewers used more words as well as longer sentences when writing product reviews for electronic products. As for the content characteristics, the electronic product reviews included more analytic words, carried more clout, and related to cognitive processes (CogProc) more than the apparel product reviews did, in addition to including many words expressing negative emotions (NegEmo). On the other hand, the apparel product reviews included more personal, authentic, positive emotions (PosEmo) and perceptual processes (Percept) compared to the electronic product reviews. Next, we analyzed the determinants of the usefulness of product reviews in the two product groups. As a result, it was found that, in both product groups, the product reviews perceived as useful had high product ratings from reviewers and contained a larger number of total words, many expressions involving perceptual processes, and fewer negative emotions. In addition, apparel product reviews with a large number of comparative expressions, a low expertise index, and concise content with fewer words in each sentence were perceived to be useful. 
In the case of electronic product reviews, those that were analytical with a high expertise index, along with containing many authentic expressions, cognitive processes, and positive emotions (PosEmo) were perceived to be useful. These findings are expected to help consumers effectively identify useful product reviews in the future.

KNU Korean Sentiment Lexicon: Bi-LSTM-based Method for Building a Korean Sentiment Lexicon (Bi-LSTM 기반의 한국어 감성사전 구축 방안)

  • Park, Sang-Min;Na, Chul-Won;Choi, Min-Seong;Lee, Da-Hee;On, Byung-Won
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.4
    • /
    • pp.219-240
    • /
    • 2018
  • Sentiment analysis, one of the text mining techniques, is a method for extracting subjective content embedded in text documents. Sentiment analysis methods have recently been widely used in many fields. For example, data-driven surveys are based on analyzing the subjectivity of text data posted by users, and market research is conducted by analyzing users' review posts to quantify a target product's reputation among users. The basic method of sentiment analysis is to use a sentiment dictionary (or lexicon), a list of sentiment words with positive, neutral, or negative semantics. In general, the meaning of many sentiment words is likely to differ across domains. For example, the sentiment word 'sad' carries a negative meaning in many domains, but not in the movie domain. To perform accurate sentiment analysis, we need to build a sentiment dictionary for the given domain. However, building such a sentiment lexicon is time-consuming, and many sentiment words are missed unless a general-purpose sentiment lexicon is used. To address this problem, several studies have constructed sentiment lexicons suited to a specific domain based on 'OPEN HANGUL' and 'SentiWordNet', which are general-purpose sentiment lexicons. However, OPEN HANGUL is no longer in service, and SentiWordNet does not work well because of language differences that arise when converting Korean words into English words. These limitations restrict the use of such general-purpose sentiment lexicons as seed data for building a domain-specific sentiment lexicon. In this article, we construct the 'KNU Korean Sentiment Lexicon (KNU-KSL)', a new general-purpose Korean sentiment dictionary that is more advanced than existing general-purpose lexicons. 
The proposed dictionary, a list of domain-independent sentiment words such as 'thank you', 'worthy', and 'impressed', is built so that a sentiment dictionary for a target domain can be constructed quickly. Specifically, its sentiment vocabulary is constructed by analyzing the glosses in the Standard Korean Language Dictionary (SKLD) through the following procedure. First, we propose a sentiment classification model based on Bidirectional Long Short-Term Memory (Bi-LSTM). Second, the proposed deep learning model automatically classifies each gloss as either positive or negative. Third, positive words and phrases are extracted from the glosses classified as positive, while negative words and phrases are extracted from the glosses classified as negative. Our experimental results show that the average accuracy of the proposed sentiment classification model is up to 89.45%. In addition, the sentiment dictionary is further extended using various external sources, including SentiWordNet, SenticNet, Emotional Verbs, and Sentiment Lexicon 0603. Furthermore, we add sentiment information about frequently used coined words and emoticons that appear mainly on the Web. The KNU-KSL contains a total of 14,843 sentiment entries, each of which is a 1-gram, a 2-gram, a phrase, or a sentence pattern. Unlike existing sentiment dictionaries, it is composed of words that are not affected by particular domains. The recent trend in sentiment analysis is to use deep learning techniques without sentiment dictionaries, and the importance of developing sentiment dictionaries has gradually declined. However, one recent study shows that the words in a sentiment dictionary can be used as features of deep learning models, resulting in sentiment analysis with higher accuracy (Teng, Z., 2016). This result indicates that a sentiment dictionary can be used not only for sentiment analysis but also as a source of features that improve the accuracy of deep learning models. 
The proposed dictionary can be used as basic data for constructing the sentiment lexicon of a particular domain and as features of deep learning models. It is also useful for automatically and quickly building large training sets for deep learning models.
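
The third step of the procedure above, extracting sentiment words and phrases from glosses that have already been classified, can be sketched as follows. This is only an illustration under stated assumptions: the `pos`/`neg` labels stand in for the output of the Bi-LSTM classifier, the English glosses are hypothetical, and keeping only 1-grams and 2-grams that occur in a single polarity is a simple filtering rule chosen for the sketch, not the rule used in the paper.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the list of space-joined n-grams in a token list."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_lexicon(labeled_glosses):
    """Collect 1-grams and 2-grams that occur in glosses of one polarity only."""
    counts = {"pos": Counter(), "neg": Counter()}
    for label, gloss in labeled_glosses:
        tokens = gloss.lower().split()
        for n in (1, 2):
            counts[label].update(ngrams(tokens, n))
    # Drop entries seen under both labels; they carry no polarity signal here.
    pos = set(counts["pos"]) - set(counts["neg"])
    neg = set(counts["neg"]) - set(counts["pos"])
    return pos, neg


# Hypothetical glosses with labels assumed to come from a classifier.
glosses = [
    ("pos", "feeling of deep gratitude"),
    ("pos", "worthy of praise"),
    ("neg", "feeling of deep sorrow"),
    ("neg", "causing harm or pain"),
]

pos_words, neg_words = build_lexicon(glosses)
print(sorted(pos_words))
print(sorted(neg_words))
```

Entries shared by both polarities, such as 'deep' in this toy input, are discarded, which loosely mirrors the goal of keeping only words whose sentiment does not depend on context.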