• Title/Summary/Keyword: Word-Prediction


Impact of Semantic Characteristics on Perceived Helpfulness of Online Reviews (온라인 상품평의 내용적 특성이 소비자의 인지된 유용성에 미치는 영향)

  • Park, Yoon-Joo;Kim, Kyoung-jae
    • Journal of Intelligence and Information Systems / v.23 no.3 / pp.29-44 / 2017
  • In Internet commerce, consumers are heavily influenced by product reviews written by other users who have already purchased the product. However, as product reviews accumulate, it takes a lot of time and effort for consumers to check the massive number of reviews individually. Moreover, carelessly written reviews actually inconvenience consumers. Thus many online vendors provide mechanisms to identify reviews that customers perceive as most helpful (Cao et al. 2011; Mudambi and Schuff 2010). For example, some online retailers, such as Amazon.com and TripAdvisor, allow users to rate the helpfulness of each review and use this feedback to rank and re-order reviews. However, many reviews receive only a few helpfulness feedbacks or none at all, which makes it hard to identify their helpfulness. It also takes time to accumulate feedback, so newly authored reviews do not have enough of it. For example, only 20% of the reviews in the Amazon Review Dataset (McAuley and Leskovec, 2013) have more than 5 feedbacks (Yan et al., 2014). The purpose of this study is to analyze the factors affecting the usefulness of online product reviews and to derive a forecasting model that selectively provides product reviews that can be helpful to consumers. To do this, we extracted the various linguistic, psychological, and perceptual elements included in product reviews using text-mining techniques and identified the determinants among these elements that affect the usefulness of product reviews. In particular, considering that the characteristics of product reviews and the determinants of usefulness may differ between apparel products (experiential goods) and electronic products (search goods), the characteristics of the product reviews were compared within each product group and the determinants were established for each. This study used 7,498 apparel product reviews and 106,962 electronic product reviews from Amazon.com. To understand a review text, we first extract linguistic and psychological characteristics, such as word count and the levels of emotional tone and analytical thinking embedded in the text, using the widely adopted text analysis software LIWC (Linguistic Inquiry and Word Count). We then explore the descriptive statistics of the review texts for each category and statistically compare their differences using t-tests. Lastly, we perform regression analysis using the data mining software RapidMiner to identify the determinant factors. Comparing the product review characteristics of electronic products and apparel products, we found that reviewers used more words as well as longer sentences when writing reviews for electronic products. As for content characteristics, electronic product reviews included more analytic words, carried more clout, related more to cognitive processes (CogProc), and contained more words expressing negative emotions (NegEmo) than apparel product reviews. On the other hand, apparel product reviews included more personal, authentic, positive emotions (PosEmo) and perceptual processes (Percept) than electronic product reviews. Next, we analyzed the determinants of the usefulness of the product reviews in the two product groups.
As a result, it was found that, in both product groups, reviews perceived as useful carried high product ratings from the reviewers, contained a larger number of total words and many expressions involving perceptual processes, and included fewer negative emotions. In addition, apparel product reviews with many comparative expressions, a low expertise index, and concise content with fewer words per sentence were perceived to be useful. In the case of electronic product reviews, those that were analytical, had a high expertise index, and contained many authentic expressions, cognitive processes, and positive emotions (PosEmo) were perceived to be useful. These findings are expected to help consumers effectively identify useful product reviews in the future.
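
The pipeline described above (LIWC-style features, per-category comparison, then regression on perceived helpfulness) can be illustrated with a minimal sketch. This is not the authors' RapidMiner workflow; the file and column names (e.g., helpful_votes_ratio) are hypothetical stand-ins for LIWC outputs and Amazon feedback data.

```python
# Minimal sketch (not the authors' RapidMiner workflow): regress review
# helpfulness on LIWC-style text features, separately per product category.
# All file and column names are hypothetical placeholders.
import pandas as pd
import statsmodels.api as sm

reviews = pd.read_csv("amazon_reviews_with_liwc.csv")  # hypothetical LIWC export

features = ["word_count", "analytic", "clout", "authentic",
            "cog_proc", "percept", "pos_emo", "neg_emo", "star_rating"]

for category in ["apparel", "electronics"]:
    sub = reviews[reviews["category"] == category]
    X = sm.add_constant(sub[features])
    y = sub["helpful_votes_ratio"]          # proxy for perceived helpfulness
    model = sm.OLS(y, X).fit()
    print(category, model.params.round(3))  # sign and size of each determinant
```

Comparing the fitted coefficients across the apparel and electronics subsets mirrors the per-category determinant analysis described above.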

Exploring the Role of Preference Heterogeneity and Causal Attribution in Online Ratings Dynamics

  • Chu, Wujin;Roh, Minjung
    • Asia Marketing Journal / v.15 no.4 / pp.61-101 / 2014
  • This study investigates when and how disagreements in online customer ratings prompt more favorable product evaluations. Among the three metrics of volume, valence, and variance that feature in the research on online customer ratings, volume and valence have exhibited consistently positive patterns in their effects on product sales or evaluations (e.g., Dellarocas, Zhang, and Awad 2007; Liu 2006). Ratings variance, or the degree of disagreement among reviewers, however, has shown rather mixed results, with some studies reporting positive effects on product sales (e.g., Clement, Proppe, and Rott 2007) while others report negative effects on product evaluations (e.g., Zhu and Zhang 2010). This study aims to resolve these contradictory findings by introducing preference heterogeneity as a possible moderator and causal attribution as a mediator to account for the moderating effect. The main proposition of this study is that when preference heterogeneity is perceived as high, a disagreement in ratings is attributed more to reviewers' different preferences than to unreliable product quality, which in turn prompts better quality evaluations of a product. Because disagreements mostly result from differences in reviewers' tastes or the low reliability of a product's quality (Mizerski 1982; Sen and Lerman 2007), a greater level of attribution to reviewer tastes can mitigate the negative effect of disagreement on product evaluations. Specifically, if consumers infer that reviewers' heterogeneous preferences result in subjectively different experiences and thereby highly diverse ratings, they would not disregard the overall quality of a product. However, if consumers infer that reviewers' preferences are quite homogeneous and thus the low reliability of the product quality contributes to such disagreements, they would discount the overall product quality. Therefore, consumers would respond more favorably to disagreements in ratings when preference heterogeneity is perceived as high rather than low. This study furthermore extends this prediction to the various levels of average ratings. The heuristic-systematic processing model indicates that engagement in effortful systematic processing occurs only when sufficient motivation is present (Hann et al. 2007; Maheswaran and Chaiken 1991; Martin and Davies 1998). One of the key factors affecting this motivation is the aspiration level of the decision maker: only under conditions that meet or exceed the aspiration level does the decision maker tend to engage in systematic processing (Patzelt and Shepherd 2008; Stephanous and Sage 1987). Therefore, systematic causal attribution processing regarding ratings variance is likely more activated when the average rating is high enough to meet the aspiration level than when it is too low to meet it. Considering that the interaction between ratings variance and preference heterogeneity occurs through the mediation of causal attribution, this greater activation of causal attribution under high versus low average ratings would lead to a more pronounced interaction between ratings variance and preference heterogeneity when average ratings are high. Overall, this study proposes that the interaction between ratings variance and preference heterogeneity is more pronounced when the average rating is high than when it is low. Two laboratory studies lend support to these predictions.
Study 1 reveals that participants exposed to a high-preference heterogeneity book title (i.e., a novel) attributed disagreement in ratings more to reviewers' tastes, and thereby more favorably evaluated books with such ratings, compared to those exposed to a low-preference heterogeneity title (i.e., an English listening practice book). Study 2 then extended these findings to the various levels of average ratings and found that this greater preference for disagreement options under high preference heterogeneity is more pronounced when the average rating is high compared to when it is low. This study makes an important theoretical contribution to the online customer ratings literature by showing that preference heterogeneity serves as a key moderator of the effect of ratings variance on product evaluations and that causal attribution acts as a mediator of this moderation effect. A more comprehensive picture of the interplay among ratings variance, preference heterogeneity, and average ratings is also provided by revealing that the interaction between ratings variance and preference heterogeneity varies as a function of the average rating. In addition, this work provides some significant managerial implications for marketers in terms of how they manage word of mouth. Because a lack of consensus creates some uncertainty and anxiety over the given information, consumers experience a psychological burden regarding their choice of a product when ratings show disagreement. The results of this study offer a way to address this problem. By explicitly clarifying that there are many more differences in tastes among reviewers than expected, marketers can allow consumers to speculate that differing tastes of reviewers rather than an uncertain or poor product quality contribute to such conflicts in ratings. Thus, when fierce disagreements are observed in the WOM arena, marketers are advised to communicate to consumers that diverse, rather than uniform, tastes govern reviews and evaluations of products.
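
As a generic illustration of the moderation structure tested in the two studies (not the authors' actual analysis code), an interaction regression of product evaluations on ratings variance and perceived preference heterogeneity might look like the sketch below; all variable and file names are hypothetical.

```python
# Generic sketch (not the authors' analysis): test whether preference
# heterogeneity moderates the effect of ratings variance on evaluations.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("study1_responses.csv")  # hypothetical lab-study data

# evaluation ~ variance * heterogeneity fits main effects plus their interaction;
# a positive interaction coefficient is consistent with the prediction above.
m = smf.ols("evaluation ~ ratings_variance * preference_heterogeneity", data=df).fit()
print(m.summary().tables[1])
```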


Classification and Performance Evaluation Methods of an Algal Bloom Model (적조모형의 분류 및 성능평가 기법)

  • Cho, Hong-Yeon;Cho, Beom Jun
    • Journal of Korean Society of Coastal and Ocean Engineers / v.26 no.6 / pp.405-412 / 2014
  • A number of algal bloom (red-tide) models have been developed and applied to simulate red-tide growth and decline patterns as interest in phytoplankton blooms has continuously increased. Quantitative error analysis of such models is of great importance, because accurate prediction of red-tide occurrence and transport patterns can be used to set up effective mitigation and counter-measures against damage to the coastal ecosystem, aquaculture, and fisheries. The term "red-tide model", however, is widely used without any clear definition or reference, which makes comparative evaluation of ecological models difficult and confusing. Performance testing of red-tide models based on a suitable classification and appropriate error analysis is highly required, because model structures differ even when the same or similar terms (e.g., red-tide, algal bloom, phytoplankton growth, ecological or ecosystem models) are used. Thus, this study suggests references for model classification and describes the advantages and disadvantages of the models. Processes and methods for performance testing (quantitative error analysis) are recommended for the practical use of red-tide models in coastal seas and are suggested for each stage of the modeling procedure, i.e., the verification, calibration, validation, and application steps. These suggested references and methods can contribute to effective and efficient marine policy decisions and to coastal ecosystem management plans that take the uncertainty of red-tide and/or ecological models into account.
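
For the quantitative error analysis recommended above, the sketch below shows the usual skill measures (bias, RMSE, correlation) that could be reported at the calibration and validation stages; the observation and model values are placeholders, not data from any specific red-tide model.

```python
# Illustrative sketch only: common quantitative error measures for a
# red-tide (algal bloom) model performance test. Arrays are placeholders.
import numpy as np

def error_summary(observed, modeled):
    observed, modeled = np.asarray(observed, float), np.asarray(modeled, float)
    err = modeled - observed
    return {
        "bias": err.mean(),                            # systematic over/under-prediction
        "rmse": np.sqrt((err ** 2).mean()),            # overall error magnitude
        "corr": np.corrcoef(observed, modeled)[0, 1],  # pattern agreement
    }

# e.g., chlorophyll-a concentrations at monitoring stations (placeholder values)
print(error_summary([2.1, 5.4, 9.8, 3.3], [2.5, 4.9, 11.0, 2.8]))
```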

Analysis method of patent document to Forecast Patent Registration (특허 등록 예측을 위한 특허 문서 분석 방법)

  • Koo, Jung-Min;Park, Sang-Sung;Shin, Young-Geun;Jung, Won-Kyo;Jang, Dong-Sik
    • Journal of the Korea Academia-Industrial cooperation Society / v.11 no.4 / pp.1458-1467 / 2010
  • Recently, imitation and infringement of intellectual property rights have been recognized as impediments to a nation's industrial growth. To prevent the huge losses that come from these impediments, many researchers are studying the protection and efficient management of intellectual property in various ways. In particular, prediction of patent registration is a very important part of protecting and asserting intellectual property rights. In this study, we propose a patent document analysis method that uses text mining to predict whether a patent will be registered or rejected. First, the proposed method builds a database from the word frequencies of rejected patent documents. Then, by comparing the built database with other patent documents, a similarity value between each patent document and the database is obtained. We used k-means, a partitioning clustering algorithm, to select the criterion value for patent rejection. As a result, we found that patents similar to rejected patents have a strong possibility of rejection. U.S. patent documents on Bluetooth technology, solar battery technology, and display technology were used as experimental data.
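
A minimal sketch of this idea, interpreting the description above rather than reproducing the authors' implementation: build a word-frequency profile from rejected patent documents, score candidate documents by cosine similarity to that profile, and let k-means split the scores into two groups to set the rejection criterion. File names are hypothetical.

```python
# Sketch under stated assumptions, not the authors' exact pipeline.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.cluster import KMeans

rejected_docs = open("rejected_patents.txt").read().splitlines()    # placeholder corpus
candidate_docs = open("candidate_patents.txt").read().splitlines()

vec = CountVectorizer(stop_words="english")
X_rej = vec.fit_transform(rejected_docs)
profile = np.asarray(X_rej.mean(axis=0))             # aggregate rejected-word profile
X_new = vec.transform(candidate_docs)

scores = cosine_similarity(X_new, profile).ravel()   # similarity to the rejection profile
labels = KMeans(n_clusters=2, n_init=10).fit_predict(scores.reshape(-1, 1))
risky = labels == labels[scores.argmax()]            # cluster containing the highest scores
print("flagged as likely rejected:", risky.sum(), "of", len(candidate_docs))
```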

A Study on Risk Management of Bill of Lading in International Trade Transaction (국제무역거래에서 선하증권의 위험관리에 관한연구)

  • Han, Nak-Hyun
    • THE INTERNATIONAL COMMERCE & LAW REVIEW / v.37 / pp.187-216 / 2008
  • Risk regarding the possibility of loss can be especially problematic. If a loss is certain to occur, it may be planned for in advance and treated as a definite, known expense. It is when there is uncertainty about the occurrence of a loss that risk becomes an important problem. The word risk is often used in connection with insurance, yet no single generally accepted definition of risk exists. Of the many definitions, two distinctive ones are commonly used. One defines risk as the variation in possible outcomes of an event based on chance; that is, the greater the number of different outcomes that may occur, the greater the risk. Another way of expressing this concept is that the greater the variation around an average expected loss, the greater the risk. The second definition treats risk as the uncertainty concerning a possible loss. This definition is useful because it focuses attention on the degree of risk in given situations. The degree of risk is a measure of the accuracy with which the outcome of an event based on chance can be predicted; for our purposes, the more accurate the prediction of the outcome of such an event, the lower the degree of risk. After the sources of risk are identified and measured, a decision can be made as to how the risk should be handled. A pure risk that is not identified does not disappear; the business merely loses the opportunity to consciously decide on the best technique for dealing with that risk. The process used to systematically manage risk exposures is known as risk management. Some use the term risk management only in connection with businesses, and often the term refers only to the management of pure risks. In this sense, the traditional risk management goal has been to minimize the cost of pure risk to the company. But as firms broaden the ways in which they view and manage many different types of risk, the need for new terminology has become apparent. The terms integrated risk management and enterprise risk management reflect the intent to manage all forms of risk, regardless of type. International trade transactions between countries are characterized by globalism, cultural gaps, long distances, and long transaction terms. Such transactions are riskier than domestic transactions, carry specific risks such as foreign exchange risk and political risk, and require various active risk management skills. The risks related to international trade transactions include contract risk, transit risk, and payment risk, among others, and risk management for international trade transactions consists of identifying and measuring these risks. The purpose of this study is to analyze the practical problems and their solutions by examining various cases related to the risk management of the bill of lading in international trade transactions.


A Study on Verification of Equivalence and Effectiveness of Non-Pharmacologic Dementia Prevention and Early Detection Contents : Non-Randomly Equivalent Design

  • Jeong, Hyun-Seok;Kim, Oh-Lyong;Koo, Bon-Hoon;Kim, Ki-Hyun;Kim, Gi-Hwan;Bai, Dai-Seg;Kim, Ji-Yean;Chang, Mun-Seon;Kim, Hye-Geum
    • Journal of Korean Neurosurgical Society / v.65 no.2 / pp.315-324 / 2022
  • Objective : The aim of this study was to verify the equivalence and effectiveness of the tablet-administered Korean Repeatable Battery for the Assessment of Neuropsychological Status (K-RBANS) for the prevention and early detection of dementia. Methods : Data from 88 psychiatry and neurology patient samples were examined to evaluate the equivalence between tablet and paper administrations of the K-RBANS using a non-randomly equivalent group design. We calculated the prediction scores of the tablet-administered K-RBANS based on demographics and covariate-test scores for focal tests using norm samples and tested format effects. In addition, we compared the receiver operating characteristic curves to confirm the effectiveness of the K-RBANS for preventing and detecting dementia. Results : In the analysis of raw scores, line orientation showed a significant difference (t=-2.94, p<0.001), and subtests showed small to large effect sizes (0.04-0.86) between paper- and tablet-administered K-RBANS. To investigate the format effect, we compared the predicted scaled scores of the tablet sample to the scaled scores of the norm sample. Consequently, a small effect size (d≤0.20) was observed in most of the subtests, except word list and story recall, which showed a medium effect size (d=0.21), while picture naming and subtests of delayed memory showed significant differences in the one-sample t-test. In addition, the area under the curve of the total scale index (TSI) (0.827; 95% confidence interval, 0.738-0.916) was higher than that of the five indices, ranging from 0.688 to 0.820. The sensitivity and specificity of TSI were 80% and 76%, respectively. Conclusion : The overall results of this study suggest that the tablet-administered K-RBANS showed significant equivalence to the norm sample, although some subtests showed format effects, and it may be used as a valid tool for the brief screening of patients with neuropsychological disorders in Korea.
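
The two quantities the abstract reports, an effect size for format equivalence and an ROC AUC for screening performance, can be computed as in the sketch below. This is illustrative only; the arrays are random placeholders, not K-RBANS data, and the sign convention for the index is an assumption.

```python
# Illustrative sketch only (not the study's analysis code): a Cohen's d
# for the paper-vs-tablet format effect and an ROC AUC for screening.
import numpy as np
from sklearn.metrics import roc_auc_score

def cohens_d(a, b):
    a, b = np.asarray(a, float), np.asarray(b, float)
    pooled = np.sqrt((a.std(ddof=1) ** 2 + b.std(ddof=1) ** 2) / 2)
    return (a.mean() - b.mean()) / pooled

rng = np.random.default_rng(0)
paper_scores = rng.normal(100, 15, 88)    # placeholder subtest scores
tablet_scores = rng.normal(98, 15, 88)
print("format effect (d):", round(cohens_d(tablet_scores, paper_scores), 2))

labels = rng.integers(0, 2, 88)           # placeholder impaired / not impaired
tsi = rng.normal(90, 12, 88)              # placeholder Total Scale Index
auc = roc_auc_score(labels, -tsi)         # assumes lower TSI means higher impairment risk
print("AUC:", round(auc, 3))
```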

Priority Area Prediction Service for Local Road Packaging Maintenance Using Spatial Big Data (공간 빅데이터를 활용한 지방도 포장보수 우선지역 예측 서비스)

  • Minyoung Lee;Jiwoo Choi;Inyoung Kim;Sujin Son;Inho Choi
    • Journal of Intelligence and Information Systems / v.29 no.3 / pp.79-101 / 2023
  • Local road pavement management in Jeollabuk-do currently relies only on the repair records of the on-site construction companies and is managed only through Microsoft Excel and Word documents. Furthermore, the budget is irregular each year. Accordingly, a systematic maintenance plan for local roads is necessary. In this paper, data related to road damage and the road environment were collected and processed to identify areas likely to suffer road damage, and the effectiveness of the methodology was reviewed through on-site inspection of those areas. According to the Ministry of Land, Infrastructure and Transport, the number of damage cases on general national roads was about 47,000 in 2018 and about 38,000 in 2019, while the number of lawsuits regarding road damage increased from about 93 in 2018 to 119 in 2019. In the case of national roads, the number of damage cases decreased compared to 2018 due to pavement repairs. To prioritize the maintenance of local roads in Jeollabuk-do, maintenance history, local pothole occurrence sites, overlapping construction sections, and emergency maintenance sections were converted into data, which ultimately led to improvements in the maintenance of local roads. Furthermore, spatial data were constructed from various road-related status data and finally processed into a new form that could be used for machine learning and prediction. Using these spatial data, areas requiring pavement maintenance were predicted, and the results were used to establish new budgets and policies for road management.
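
The paper does not specify its exact model, so the following is only a generic sketch of how road-segment features could be turned into a maintenance-priority score with a standard classifier; the feature names and input file are hypothetical.

```python
# Generic sketch (assumed features, not the paper's specification): rank
# road segments by predicted probability of pavement damage.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

segments = pd.read_csv("road_segments.csv")   # one row per local road segment
features = ["traffic_volume", "pavement_age", "repair_count",
            "pothole_reports", "drainage_score"]
X, y = segments[features], segments["damaged_next_year"]

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)
print("holdout accuracy:", clf.score(X_te, y_te))

# predicted probability of damage serves as a maintenance-priority score
segments["priority"] = clf.predict_proba(X)[:, 1]
print(segments.sort_values("priority", ascending=False).head(10)[["priority"]])
```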

VKOSPI Forecasting and Option Trading Application Using SVM (SVM을 이용한 VKOSPI 일 중 변화 예측과 실제 옵션 매매에의 적용)

  • Ra, Yun Seon;Choi, Heung Sik;Kim, Sun Woong
    • Journal of Intelligence and Information Systems / v.22 no.4 / pp.177-192 / 2016
  • Machine learning is a field of artificial intelligence. It refers to an area of computer science concerned with giving machines the ability to perform their own data analysis, decision making, and forecasting. One representative machine learning model is the artificial neural network, a statistical learning algorithm inspired by the neural network structure of biology. Other machine learning models include the decision tree, naive Bayes, and SVM (support vector machine) models. Among these, we use the SVM model in this study because it is mainly used for classification and regression analysis, which fits our study well. The core principle of SVM is to find a reasonable hyperplane that separates different groups in the data space. Given information about the data in any two groups, the SVM model judges to which group new data belong based on the hyperplane obtained from the given data set; thus, the larger the amount of meaningful data, the better the machine's learning ability. In recent years, many financial experts have focused on machine learning, seeing the possibility of combining machine learning with the financial field, where vast amounts of financial data exist. Machine learning techniques have proved to be powerful in describing non-stationary and chaotic stock price dynamics, and many studies have successfully forecast stock prices using machine learning algorithms. Recently, financial companies have begun to provide Robo-Advisor services (a compound of "robot" and "advisor") that can perform various financial tasks through advanced algorithms using rapidly changing, huge amounts of data. A Robo-Advisor's main task is to advise investors according to their personal investment propensity and to manage their portfolios automatically. In this study, we propose a method for forecasting the Korean volatility index, VKOSPI, using the SVM model, one of the machine learning methods, and apply it to real option trading to increase trading performance. VKOSPI is a measure of the future volatility of the KOSPI 200 index based on KOSPI 200 index option prices; it is similar to the VIX index, which is based on S&P 500 option prices in the United States. The Korea Exchange (KRX) calculates and announces the real-time VKOSPI index. VKOSPI behaves like ordinary volatility and affects option prices: VKOSPI and option prices move in the same direction regardless of the option type (call and put options with various strike prices), because when volatility increases, all call and put option premiums increase as the probability of the option being exercised rises. Investors can see in real time how much an option price rises as volatility rises through Vega, the Black-Scholes measure of an option's sensitivity to changes in volatility. Therefore, accurate forecasting of VKOSPI movements is one of the important factors that can generate profit in option trading. In this study, we verified with real option data that accurate forecasts of VKOSPI can yield large profits in real option trading. To the best of our knowledge, there have been no studies on predicting the direction of VKOSPI with machine learning and applying such predictions to actual option trading.
In this study, we predicted daily VKOSPI changes with the SVM model and then took an intraday option strangle position, which profits as option prices fall, only when VKOSPI was expected to decline during the day. We analyzed the results and tested whether the strategy is applicable to real option trading based on the SVM's predictions. The results showed that the prediction accuracy for VKOSPI was 57.83% on average and that the number of position entries was 43.2 times, less than half of the benchmark (100 times); a small number of trades is an indicator of trading efficiency. In addition, the experiment showed that the trading performance was significantly higher than the benchmark.
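
A minimal sketch of the forecasting step (an SVM classifying whether VKOSPI falls intraday, with a position "entered" only on predicted-down days) is shown below. The input features and file are assumptions, not the paper's exact specification, and the strangle payoff itself is omitted.

```python
# Minimal sketch under assumed features: SVM direction prediction for VKOSPI.
import pandas as pd
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

data = pd.read_csv("vkospi_daily.csv")                 # hypothetical daily data
data["target_down"] = (data["vkospi_close"] < data["vkospi_open"]).astype(int)
features = ["vkospi_open", "prev_change", "kospi200_return", "volume"]  # assumed

train, test = data.iloc[:-250], data.iloc[-250:]       # simple time-ordered split
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
model.fit(train[features], train["target_down"])

pred = model.predict(test[features])
accuracy = (pred == test["target_down"]).mean()
entries = pred.sum()                                   # days a short-volatility strangle would be opened
print(f"direction accuracy: {accuracy:.2%}, position entries: {entries}")
```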

Construction of Consumer Confidence index based on Sentiment analysis using News articles (뉴스기사를 이용한 소비자의 경기심리지수 생성)

  • Song, Minchae;Shin, Kyung-shik
    • Journal of Intelligence and Information Systems / v.23 no.3 / pp.1-27 / 2017
  • It is known that the economic sentiment index and macroeconomic indicators are closely related, because economic agents' judgments and forecasts of business conditions affect economic fluctuations. For this reason, consumer sentiment or confidence provides steady fodder for business and is treated as an important piece of economic information. In Korea, the consumer sentiment index is highly relevant to both private consumption and GDP, making it a very important economic indicator for evaluating and forecasting the domestic economic situation. However, despite offering relevant insights into private consumption and GDP, the traditional survey-based approach to measuring consumer confidence has several limits. One weakness is that it takes considerable time to research, collect, and aggregate the data; if urgent issues arise, timely information is not announced until the end of each month. In addition, the survey only contains information derived from questionnaire items, which means it can be difficult to capture the direct effects of newly arising issues. The survey also faces potential declines in response rates and erroneous responses. Therefore, it is necessary to find a way to complement it. For this purpose, we construct and assess an index designed to measure consumer economic sentiment using sentiment analysis. Unlike survey-based measures, our index relies on textual analysis to extract sentiment from economic and financial news articles. In particular, text data such as news articles and SNS posts are timely and cover a wide range of issues; because such sources can quickly capture the economic impact of specific economic issues, they have great potential as economic indicators. There are two main approaches to the automatic extraction of sentiment from text; we apply the lexicon-based approach, using sentiment lexicon dictionaries of words annotated with their semantic orientations. In creating the sentiment lexicon dictionaries, we enter the semantic orientation of individual words manually, though we do not attempt a full linguistic analysis (one that involves analysis of word senses or argument structure); this is a limitation of our research, and further work in that direction remains possible. In this study, we generate a time series index of economic sentiment in the news. The construction of the index consists of three broad steps: (1) collecting a large corpus of economic news articles on the web, (2) applying lexicon-based methods for sentiment analysis of each article to score it in terms of sentiment orientation (positive, negative, or neutral), and (3) constructing an economic sentiment index of consumers by aggregating the monthly time series for each sentiment word. In line with existing scholarly assessments of the relationship between the consumer confidence index and macroeconomic indicators, any new index should be assessed for its usefulness; we do so by comparing the new index with the CSI and other economic indicators. To check the usefulness of the new sentiment-based index, trend and cross-correlation analyses are carried out to examine the relationships and lag structure. Finally, we analyze its forecasting power using one-step-ahead out-of-sample prediction.
In almost all experiments, the news sentiment index correlates strongly with related contemporaneous key indicators. Furthermore, in most cases, news sentiment shocks predict future economic activity; in head-to-head comparisons, the news sentiment measures outperform the survey-based sentiment index (CSI). Policy makers want to understand consumer and public opinion about existing or proposed policies, and such opinions enable the relevant government decision-makers to respond quickly by monitoring various web media, SNS, and news articles. Textual data such as news articles and social network posts (Twitter, Facebook, and blogs) are generated at high speed and cover a wide range of issues; because such sources can quickly capture the economic impact of specific economic issues, they have great potential as economic indicators. Although research using unstructured data in economic analysis is in its early stages, the utilization of such data is expected to increase greatly once its usefulness is confirmed.
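
A toy sketch of the lexicon-based scoring and monthly aggregation described in steps (1)-(3) follows; the dictionary and articles are placeholders (the study builds its own Korean sentiment lexicon manually).

```python
# Toy sketch of a lexicon-based news sentiment index; lexicon and articles
# are placeholders, not the study's data or dictionary.
import pandas as pd

lexicon = {"growth": 1, "recovery": 1, "surplus": 1,
           "recession": -1, "slump": -1, "default": -1}   # toy dictionary

articles = pd.DataFrame({
    "date": pd.to_datetime(["2017-01-03", "2017-01-20", "2017-02-10"]),
    "text": ["exports show recovery and growth",
             "fears of recession and a housing slump",
             "trade surplus widens on strong growth"],
})

def score(text):
    # net sentiment of one article: sum of word orientations found in the lexicon
    return sum(lexicon.get(w, 0) for w in text.lower().split())

articles["sentiment"] = articles["text"].apply(score)
monthly_index = articles.set_index("date")["sentiment"].resample("M").mean()
print(monthly_index)   # monthly time series to compare with the CSI
```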

A Study on Knowledge Entity Extraction Method for Individual Stocks Based on Neural Tensor Network (뉴럴 텐서 네트워크 기반 주식 개별종목 지식개체명 추출 방법에 관한 연구)

  • Yang, Yunseok;Lee, Hyun Jun;Oh, Kyong Joo
    • Journal of Intelligence and Information Systems / v.25 no.2 / pp.25-38 / 2019
  • Selecting high-quality information that meets the interests and needs of users from the overflowing content is becoming ever more important as the amount of generated content continues to grow. In this flood of information, attempts are being made to better reflect user intent in search results, rather than treating an information request as a simple string, and large IT companies such as Google and Microsoft focus on developing knowledge-based technologies, including search engines, that provide users with satisfaction and convenience. In particular, finance is one of the fields in which text data analysis is expected to be useful, because new information is constantly generated and the earlier the information is obtained, the more valuable it is. Automatic knowledge extraction can be effective in areas such as the financial sector, where the information flow is vast and new information continues to emerge. However, automatic knowledge extraction faces several practical difficulties. First, it is difficult to build corpora from different fields with the same algorithm and to extract high-quality triples. Second, it becomes harder for people to produce labeled text data as the extent and scope of knowledge increase and patterns are constantly updated. Third, performance evaluation is difficult due to the characteristics of unsupervised learning. Finally, defining the problem for automatic knowledge extraction is not easy because of the ambiguous conceptual characteristics of knowledge. To overcome the limits described above and improve the semantic performance of searching for stock-related information, this study attempts to extract knowledge entities using a neural tensor network and to evaluate their performance. Unlike other studies, the purpose of this study is to extract knowledge entities related to individual stock items. Various but relatively simple data processing methods are applied in the presented model to solve the problems of previous research and to enhance the effectiveness of the model. This study is significant in three ways. First, it presents a practical and simple automatic knowledge extraction method that can be readily applied. Second, it shows that performance evaluation is possible through a simple problem definition. Finally, it increases the expressiveness of the extracted knowledge by generating input data on a sentence basis without complex morphological analysis. The results of the empirical analysis and an objective performance evaluation method are also presented. In the empirical study conducted to confirm the usefulness of the presented model, experts' reports on 30 individual stocks (the top 30 items by frequency of publication from May 30, 2017 to May 21, 2018) are used. The total number of reports is 5,600; 3,074 reports, which account for about 55% of the total, are designated as the training set, and the remaining 45% as the testing set. Before constructing the model, all reports in the training set are classified by stock, and their entities are extracted using the KKMA named entity recognition tool. For each stock, the top 100 entities by appearance frequency are selected and vectorized using one-hot encoding. Then, using the neural tensor network, one score function per stock is trained.
Thus, when a new entity from the testing set appears, its score can be calculated with every score function, and the stock whose function yields the highest score is predicted as the item related to that entity. To evaluate the presented model, we measure its predictive power and check whether the score functions are well constructed by calculating the hit ratio over all reports in the testing set. As a result of the empirical study, the presented model shows 69.3% hit accuracy on the testing set of 2,526 reports; this hit ratio is meaningfully high despite some constraints on the research. Looking at the prediction performance for each stock, only three stocks, LG ELECTRONICS, KiaMtr, and Mando, show performance far below average; this result may be due to interference from other similar items and the generation of new knowledge. In this paper, we propose a methodology for finding the key entities, or combinations of entities, needed to search for related information in accordance with the user's investment intention. Graph data are generated using only the named entity recognition tool and applied to the neural tensor network without learning a corpus or word vectors for the field. The empirical test confirms the effectiveness of the presented model as described above. However, some limitations remain; most notably, the especially poor performance on only a few stocks shows the need for further research. Finally, through the empirical study, we confirmed that the learning method presented in this study can be used to semantically match new text information with the related stocks.
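
To make the scoring idea concrete, here is a generic neural tensor network scoring layer, simplified and not the paper's exact architecture: one such scorer could be trained per stock, and a new entity assigned to the stock whose scorer returns the highest value. Dimensions and vectors below are arbitrary placeholders.

```python
# Generic NTN-style scoring layer (a simplified illustration, not the
# paper's exact model). One scorer per stock is the assumed usage.
import torch
import torch.nn as nn

class NTNScorer(nn.Module):
    def __init__(self, dim, slices=4):
        super().__init__()
        self.W = nn.Parameter(torch.randn(slices, dim, dim) * 0.01)  # bilinear tensor
        self.V = nn.Linear(2 * dim, slices)                          # standard linear term
        self.u = nn.Linear(slices, 1, bias=False)                    # combine tensor slices

    def forward(self, e1, e2):
        # bilinear term: e1^T W_k e2 for each slice k
        bilinear = torch.einsum("bi,kij,bj->bk", e1, self.W, e2)
        return self.u(torch.tanh(bilinear + self.V(torch.cat([e1, e2], dim=-1))))

# usage: score a one-hot entity vector against a per-stock reference vector
dim = 100                                        # e.g., top-100 entities, one-hot encoded
scorer = NTNScorer(dim)
entity = torch.zeros(1, dim); entity[0, 7] = 1.0       # hypothetical new entity
stock_vec = torch.zeros(1, dim); stock_vec[0, 3] = 1.0  # hypothetical stock representation
print(scorer(entity, stock_vec))                        # higher score = more related
```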