• Title/Summary/Keyword: Logit Function

Development of Sentiment Analysis Model for the hot topic detection of online stock forums (온라인 주식 포럼의 핫토픽 탐지를 위한 감성분석 모형의 개발)

  • Hong, Taeho;Lee, Taewon;Li, Jingjing
    • Journal of Intelligence and Information Systems / v.22 no.1 / pp.187-204 / 2016
  • Document classification based on emotional polarity has become a prominent emerging task owing to the explosion of data on the Web. In the big data age, there are too many information sources to consult when making decisions. For example, when considering travel to a city, a person may search for reviews through a search engine such as Google or on social networking services (SNSs) such as blogs, Twitter, and Facebook. The emotional polarity of positive and negative reviews helps a user decide whether or not to make the trip. Sentiment analysis of customer reviews has become an important research topic as data mining technology is widely accepted for text mining of the Web. Sentiment analysis has been used to classify documents through machine learning techniques such as decision trees, neural networks, and support vector machines (SVMs). It is used to determine the attitude, position, and sensibility of people who write articles about various topics published on the Web. Regardless of their polarity, emotional reviews are very helpful material for analyzing customers' opinions. Sentiment analysis helps reveal what customers really want, almost instantly, through automated text mining techniques. It applies text mining to text on the Web to extract subjective information, and it is used to determine the attitudes or positions of the person who wrote an article and presented an opinion about a particular topic. In this study, we developed a model that selects hot topics from user posts on China's online stock forum by using the k-means algorithm and the self-organizing map (SOM). In addition, we developed a detection model to predict hot topics by using machine learning techniques such as logit, decision trees, and SVM. 
We employed sentiment analysis to develop our model for the selection and detection of hot topics from China's online stock forum. Sentiment analysis calculates a sentiment value for a document by comparing and classifying its words against a polarity sentiment dictionary (positive or negative). The online stock forum was an attractive site because of its information about stock investment. Users post numerous texts about stock movements, analyzing the market in light of government policy announcements, market reports, reports from economic research institutes, and even rumors. We divided the online forum's topics into 21 categories to apply sentiment analysis. One hundred forty-four topics were selected among the 21 categories at online stock forums. The posts were crawled to build a positive and negative text database. After preprocessing the text from March 2013 to February 2015, we ultimately obtained 21,141 posts on 88 topics. An interest index was defined to select the hot topics, and the k-means algorithm and SOM produced equivalent results on this data. We developed decision tree models to detect hot topics with three algorithms: CHAID, CART, and C4.5. The results of CHAID were subpar compared to the others. We also employed SVM to detect the hot topics from negative data. The SVM models were trained with the radial basis function (RBF) kernel, with parameters chosen by grid search. Detecting hot topics with sentiment analysis provides investors with the latest trends and hot topics in the stock forum so that they no longer need to search the vast amounts of information on the Web. Our proposed model is also helpful for rapidly determining customers' signals or attitudes towards government policy and firms' products and services.
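
The dictionary-based scoring described in this abstract can be sketched roughly as follows. This is a minimal illustration only: the word lists, the English tokenization, and the scoring formula (positive minus negative hits, normalized by total hits) are all assumptions, since the abstract does not give the study's actual polarity dictionary or formula.

```python
import re

# Hypothetical toy lexicons; the study's real polarity dictionary is not given.
POSITIVE = {"gain", "rally", "bullish", "profit"}
NEGATIVE = {"loss", "crash", "bearish", "risk"}


def sentiment_value(text):
    """Return (pos - neg) / (pos + neg), or 0.0 when no lexicon word appears."""
    tokens = re.findall(r"[a-z']+", text.lower())
    pos = sum(token in POSITIVE for token in tokens)
    neg = sum(token in NEGATIVE for token in tokens)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


def polarity(text):
    """Map the sentiment value to a coarse polarity label."""
    score = sentiment_value(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

A post's label from `polarity` could then feed a positive/negative text database of the kind the abstract mentions; how the authors actually built theirs is not stated.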

Patent Production and Technological Performance of Korean Firms: The Role of Corporate Innovation Strategies (특허생산과 기술성과: 기업 혁신전략의 역할)

  • Lee, Jukwan;Jung, Jin Hwa
    • Journal of Technology Innovation / v.22 no.1 / pp.149-175 / 2014
  • This study analyzed the effect of corporate innovation strategies on patent production and, ultimately, on technological change and new product development of firms in South Korea. The intent was to derive efficient strategies for enhancing the firms' technological performance. For the empirical analysis, three sources of data were combined: four waves of the Human Capital Corporate Panel Survey (HCCP) data collected by the Korea Research Institute for Vocational Education and Training (KRIVET), corporate financial data obtained from the Korea Information Service (KIS), and corporate patent data provided by the Korean Intellectual Property Office (KIPO). The patent production function was estimated by zero-inflated negative binomial (ZINB) regression. The technological performance function was estimated by two-stage regression, taking into account the endogeneity of patent production. An ordered logit model was applied for the second-stage regression. Empirical results confirmed the critical role of corporate innovation strategies in patent production and in facilitating technological change and new product development. In patent production, the firms' R&D investment and human resources were key determinants. Higher R&D intensity led to more patents, yet with decreasing marginal productivity. A larger stock of registered patents also led to a larger flow of new patents. Firms were more prolific in patent production when they had high-quality personnel, invested intensively in human resource development, and adopted a market-leading or fast-follower strategy rather than a stability strategy. In technological performance, the firms' human resources played a key role in accelerating technological change and new product development. R&D intensity expedited the firms' new product development. Firms adopting a market-leading or fast-follower strategy had an advantage in technological performance over those with a stability strategy. 
Firms prolific in patent production were also advanced in terms of technological change and new product development. However, the nexus between patent production and the technological performance measures weakened substantially when controlling for the endogeneity of patent production. These results suggest that firms need to strengthen the linkage between patent production and technological performance, and adopt strategies that address each firm's capacities and needs.
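
To make the ordered logit step concrete, the sketch below shows how such a model turns a linear index into category probabilities, using the standard cumulative form P(y <= j) = sigmoid(cut_j - index). The function names, the cutpoints, and the index value are hypothetical; the abstract reports no fitted coefficients.

```python
import math


def sigmoid(z):
    """Logistic function, the inverse of the logit."""
    return 1.0 / (1.0 + math.exp(-z))


def ordered_logit_probs(linear_index, cutpoints):
    """Return P(y = j) for j = 0..len(cutpoints).

    Uses P(y <= j) = sigmoid(cutpoints[j] - linear_index); cutpoints must be
    strictly increasing so that every category probability is positive.
    """
    cumulative = [sigmoid(c - linear_index) for c in cutpoints] + [1.0]
    probs, previous = [], 0.0
    for value in cumulative:
        probs.append(value - previous)
        previous = value
    return probs
```

With three ordered outcome levels (two cutpoints), raising the linear index, for example through a higher predicted patent count from the first stage, shifts probability mass toward the highest category.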

The Effect of Objective and Subjective Social Isolation and Interpersonal Conflict Type on the Probability of Cognitive Impairment by Age Group in Old Age (노년기 연령집단별 객관적·주관적 사회적 고립과 대인관계갈등 유형이 인지기능에 미치는 영향)

  • Lee, Sang Chul
    • 한국노년학 / v.38 no.4 / pp.811-835 / 2018
  • Social relations and cognitive function in old age are closely related, and social relations are classified into structural characteristics and qualitative characteristics that reflect cognitive and emotional evaluation. The concept of social isolation has drawn attention in connection with social relations in old age. Social isolation has a multidimensional theoretical structure, divided into an objective dimension, such as social network, household type, and social participation, and a subjective dimension, such as lack of perceived social support and loneliness. There is also a close relationship between cognitive function and interpersonal conflict in old age. In this study, we examined the effects of objective social isolation, which reflects the structural characteristics of social relations, and of subjective social isolation and interpersonal conflict on the occurrence of dementia by age group among the elderly. The data were analyzed by applying a random-effects panel logit model using 1,740 panel data from the first year to the third year of KSHAP. The results of the analysis are summarized as follows. First, cognitive impairment increased sharply with age. Objective and subjective social isolation both showed a U-shaped distribution with an inflection point at 80 years of age. Second, the main effects of objective and subjective social isolation on the probability of cognitive impairment were statistically significant, but the type of interpersonal conflict was not. Third, the results of the two-way interaction effect analysis on the probability of cognitive impairment are as follows. The relationship between subjective social isolation and the probability of cognitive impairment differed significantly according to the level of conflict with the spouse. In addition, as subjective social isolation increased, the probability of cognitive impairment rose more among the oldest-old (over 85) than among the young-old (65 to 74). 
In addition, as the level of conflict with the spouse increases, the probability of cognitive impairment among the oldest-old (aged 85 or older) is drastically lower than that among the young-old (aged 65 to 74). Based on these results, policy and practical implications for reducing cognitive impairment among elderly age groups are suggested, and limitations of the study and suggestions for future research are discussed.
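
For reference, a random-effects panel logit of the kind named in this abstract is usually written as below. The notation is an assumption rather than the paper's own: $y_{it}$ indicates cognitive impairment for person $i$ in year $t$, $x_{it}$ collects the covariates (isolation, conflict, age group, and their interactions), and $u_i$ is a person-specific random intercept, conventionally taken to be normally distributed.

```latex
\[
\operatorname{logit}(p_{it}) = \ln\frac{p_{it}}{1 - p_{it}} = x_{it}^{\top}\beta + u_i,
\qquad p_{it} = \Pr(y_{it} = 1 \mid x_{it}, u_i),
\qquad u_i \sim N(0, \sigma_u^2).
\]
```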

A Stochastic User Equilibrium Transit Assignment Algorithm for Multiple User Classes (다계층을 고려한 대중교통 확률적사용자균형 알고리즘 개발)

  • Yu, Soon-Kyoung;Lim, Kang-Won;Lee, Young-Ihn;Lim, Yong-Taek
    • Journal of Korean Society of Transportation / v.23 no.7 s.85 / pp.165-179 / 2005
  • The objective of this study is to develop a stochastic user equilibrium transit assignment algorithm for multiple user classes that considers the stochastic characteristics and heterogeneous attributes of passengers. Existing transit assignment algorithms are limited in producing realistic results because they assume that all passengers share the same characteristics. Although a group with transit information and a group without it have different trip patterns, past studies could not explain the differences. To overcome these problems, we use the following methods. First, we apply a stochastic transit assignment model to capture differences in perceived travel cost among passengers, and a multiple user class assignment model to capture the heterogeneous characteristics of groups, in order to obtain realistic results. Second, in developing the model we assume that person trips influence the travel cost function. Third, we use a C-logit model to address the IIA (independence of irrelevant alternatives) problem. As the iterations proceed, the assigned trips and equivalent path costs differ by group and by path. The result comes close to stochastic user equilibrium, and convergence is very fast. The algorithm is expected to serve as a useful evaluation tool for transit policies by applying heterogeneous attributes and OD data.
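
As a rough illustration of how a C-logit model relaxes IIA, the sketch below penalizes each path by a commonality factor that grows with its overlap with other paths. It uses one common form of that factor, CF_k = beta * ln(sum_h (L_hk / sqrt(L_h * L_k)) ** gamma); the abstract does not say which form or which parameter values the authors used, so `theta`, `beta`, `gamma`, and the input layout are all assumptions.

```python
import math


def c_logit_probabilities(costs, lengths, overlaps, theta=1.0, beta=1.0, gamma=1.0):
    """Path choice probabilities under a C-logit sketch.

    costs[k]       generalized cost of path k
    lengths[k]     length of path k
    overlaps[h][k] length shared by paths h and k (overlaps[k][k] == lengths[k])
    """
    n = len(costs)
    commonality = []
    for k in range(n):
        similarity = sum(
            (overlaps[h][k] / math.sqrt(lengths[h] * lengths[k])) ** gamma
            for h in range(n)
        )
        commonality.append(beta * math.log(similarity))
    # Overlapping paths get a larger commonality factor and so a smaller weight.
    weights = [math.exp(-theta * costs[k] - commonality[k]) for k in range(n)]
    total = sum(weights)
    return [w / total for w in weights]
```

When no paths overlap, every commonality factor is zero and the result reduces to an ordinary multinomial logit; when two equal-cost paths share most of their length, their combined share falls relative to an independent third path.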

A Time Series Graph based Convolutional Neural Network Model for Effective Input Variable Pattern Learning : Application to the Prediction of Stock Market (효과적인 입력변수 패턴 학습을 위한 시계열 그래프 기반 합성곱 신경망 모형: 주식시장 예측에의 응용)

  • Lee, Mo-Se;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems / v.24 no.1 / pp.167-181 / 2018
  • Over the past decade, deep learning has been in the spotlight among machine learning algorithms. In particular, the CNN (Convolutional Neural Network), known as an effective solution for recognizing and classifying images or voices, has been widely applied to classification and prediction problems. In this study, we investigate how to apply CNN to business problem solving. Specifically, this study proposes applying CNN to stock market prediction, one of the most challenging tasks in machine learning research. As mentioned, CNN is strong at interpreting images. Thus, the model proposed in this study adopts CNN as a binary classifier that predicts stock market direction (upward or downward) by using time series graphs as its inputs. That is, our proposal is to build a machine learning algorithm that mimics experts called 'technical analysts', who examine graphs of past price movements and predict future financial price movements. Our proposed model, named 'CNN-FG (Convolutional Neural Network using Fluctuation Graph)', consists of five steps. In the first step, it divides the dataset into intervals of 5 days. In step 2, it creates time series graphs for the divided dataset. The size of the image in which each graph is drawn is $40 \times 40$ pixels, and the graph of each independent variable is drawn in a different color. In step 3, the model converts the images into matrices. Each image is converted into a combination of three matrices in order to express the color values on the R (red), G (green), and B (blue) scales. In the next step, it splits the dataset of graph images into training and validation datasets. We used 80% of the total dataset as the training dataset and the remaining 20% as the validation dataset. In the final step, CNN classifiers are trained using the images of the training dataset. 
Regarding the parameters of CNN-FG, we adopted two convolution filters ($5 \times 5 \times 6$ and $5 \times 5 \times 9$) in the convolution layer. In the pooling layer, a $2 \times 2$ max pooling filter was used. The numbers of nodes in the two hidden layers were set to 900 and 32, respectively, and the number of nodes in the output layer was set to 2 (one for the prediction of an upward trend and the other for a downward trend). The activation functions for the convolution layer and the hidden layers were set to ReLU (Rectified Linear Unit), and the one for the output layer was set to the Softmax function. To validate our model, CNN-FG, we applied it to the prediction of KOSPI200 for 2,026 days in eight years (from 2009 to 2016). To match the proportions of the two groups in the dependent variable (i.e., tomorrow's stock market movement), we selected 1,950 samples by random sampling. Finally, we built the training dataset using 80% of the total dataset (1,560 samples) and the validation dataset using 20% (390 samples). The independent variables of the experimental dataset included twelve technical indicators popularly used in previous studies. They include Stochastic %K, Stochastic %D, Momentum, ROC (rate of change), LW %R (Larry Williams' %R), A/D oscillator (accumulation/distribution oscillator), OSCP (price oscillator), CCI (commodity channel index), and so on. To confirm the superiority of CNN-FG, we compared its prediction accuracy with that of other classification models. Experimental results showed that CNN-FG outperforms LOGIT (logistic regression), ANN (artificial neural network), and SVM (support vector machine) with statistical significance. These empirical results imply that converting time series business data into graphs and building CNN-based classification models on these graphs can be effective from the perspective of prediction accuracy. 
Thus, this paper sheds light on how to apply deep learning techniques to the domain of business problem solving.
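
The data preparation in steps 1 and 4 of CNN-FG might look something like the sketch below. Several details are assumptions, since the abstract does not specify them: that the 5-day intervals are overlapping sliding windows, that the label compares the next day's close with the last close in the window, and that the 80/20 split is taken in order. The function names are hypothetical, and the graph rendering and CNN training steps are omitted.

```python
def make_samples(closes, window=5):
    """Pair each `window`-day slice with the direction of the following day.

    Label 1 means the next close is above the last close in the window,
    label 0 means it is not (an assumed definition of upward/downward).
    """
    samples = []
    for start in range(len(closes) - window):
        segment = closes[start:start + window]
        next_close = closes[start + window]
        label = 1 if next_close > segment[-1] else 0
        samples.append((segment, label))
    return samples


def split_train_validation(samples, train_percent=80):
    """Split samples in order; integer arithmetic avoids float rounding."""
    cut = len(samples) * train_percent // 100
    return samples[:cut], samples[cut:]
```

Applied to 1,950 samples, `split_train_validation` yields 1,560 training and 390 validation samples, matching the counts reported in the abstract.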