Predicting the Direction of the Stock Index by Using a Domain-Specific Sentiment Dictionary (주가지수 방향성 예측을 위한 주제지향 감성사전 구축 방안)
Journal of Intelligence and Information Systems, v.19 no.1, pp.95-110, 2013
Recently, the amount of unstructured data generated through social media has been increasing rapidly, creating a growing need to collect, store, search, analyze, and visualize this data. Such data cannot be handled appropriately with the traditional methodologies used for analyzing structured data because of its vast volume and unstructured nature. Consequently, many attempts are being made to analyze unstructured data such as text files and log files with various commercial and noncommercial analytical tools. Among the contemporary issues dealt with in the literature on unstructured text analysis, the concepts and techniques of opinion mining have attracted much attention from pioneering researchers and business practitioners. Opinion mining, or sentiment analysis, refers to a series of processes that analyze participants' opinions, sentiments, evaluations, attitudes, and emotions about selected products, services, organizations, social issues, and so on. In other words, many attempts based on opinion mining techniques are being made to resolve complicated issues that could not otherwise be solved by traditional approaches. One of the most representative of these attempts is recent research proposing an intelligent model for predicting the direction of the stock index. This model works mainly on the basis of opinions extracted from an overwhelming number of economic news reports. News content published in various media is a typical example of unstructured text data. Every day, a large volume of new content is created, digitized, and distributed via online or offline channels. Many studies have shown that we make better decisions on political, economic, and social issues by analyzing news and other related information. 
In this sense, we expect to predict the fluctuation of stock markets partly by analyzing the relationship between economic news reports and the pattern of stock prices. So far, most studies in the opinion mining literature, including ours, have used a sentiment dictionary to elicit sentiment polarity or sentiment value from a large number of documents. A sentiment dictionary consists of pairs of selected words and their sentiment values. Sentiment classifiers refer to the dictionary to determine the sentiment polarity of words, of sentences in a document, and of the whole document. However, most traditional approaches share a limitation: they do not consider the flexibility of sentiment polarity. That is, the sentiment polarity or sentiment value of a word is fixed and cannot be changed in a traditional sentiment dictionary. In the real world, however, the sentiment polarity of a word can vary depending on the time, situation, and purpose of the analysis, and can even be contradictory. This flexibility of sentiment polarity motivated the present study. In this paper, we argue that sentiment polarity should be assigned not merely on the basis of the inherent meaning of a word but on the basis of its ad hoc meaning within a particular context. To implement this idea, we present an intelligent investment decision-support model based on opinion mining that scrapes and parses massive volumes of economic news on the web, tags sentiment words, classifies the sentiment polarity of the news, and finally predicts the direction of the next day's stock index. In addition, we apply a domain-specific sentiment dictionary instead of a general-purpose one to classify each piece of news as either positive or negative. For performance evaluation, we conducted intensive experiments and investigated the prediction accuracy of our model. 
For the experiments to predict the direction of the stock index, we gathered and analyzed 1,072 articles about stock markets published by "M" and "E" media between July 2011 and September 2011.
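The pipeline described above (tag sentiment words, classify each article, predict the next day's direction) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the rule for deriving a word's value from next-day index movements, and the majority-vote prediction are all assumptions made for the example.

```python
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase word tokens; a stand-in for a real morphological analyzer."""
    return re.findall(r"[a-z']+", text.lower())

def build_dictionary(labeled_articles, min_count=1):
    """Assumed rule: a word's value is (up - down) / (up + down), counted over
    articles labeled with the next day's index direction ("up" or "down")."""
    up, down = defaultdict(int), defaultdict(int)
    for text, direction in labeled_articles:
        for word in set(tokenize(text)):
            if direction == "up":
                up[word] += 1
            else:
                down[word] += 1
    dictionary = {}
    for word in set(up) | set(down):
        total = up[word] + down[word]
        if total >= min_count:
            dictionary[word] = (up[word] - down[word]) / total
    return dictionary

def score_article(text, dictionary):
    """Mean sentiment value of the dictionary words found in the article."""
    hits = [dictionary[t] for t in tokenize(text) if t in dictionary]
    return sum(hits) / len(hits) if hits else 0.0

def classify(text, dictionary, threshold=0.0):
    """Label an article positive or negative from its score."""
    return "positive" if score_article(text, dictionary) > threshold else "negative"

def predict_direction(articles, dictionary):
    """Assumed rule: majority vote over the day's articles (ties count as down)."""
    votes = Counter(classify(a, dictionary) for a in articles)
    return "up" if votes["positive"] > votes["negative"] else "down"
```

Because the dictionary is built from the target domain's own labeled news, a word's value reflects how it behaves in that context rather than its general meaning, which is the point of the domain-specific dictionary.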
The exhibition industry, as a technology-intensive and eco-friendly industry, contributes to regional and national development and, when combined with the cultural and tourism industries, enhances the regional and national image as well. We therefore need to revitalize the exhibition industry by actively holding exhibition events. However, to attract large audiences, enhancing audience satisfaction and the perceived value of participation should be prioritized, after improving the quality of service within the exhibition hall. As one way to enhance the quality of service, we consider personalized service geared toward each visitor to be necessary. That is, if visitors can avoid the complexity of the exhibition space and are given a service that enables effective management of their time and space, their satisfaction will improve. Such a personalized service identifies each visitor's preferences on the basis of a profile registered online in advance. Based on this information, it provides exhibition-related information suited to the visitor's purpose (booths of interest, the shortest path to those booths, and events) via the visitor's smartphone. It then collects the visitor's reaction information, such as booth visits and event participation prompted by the information offered, together with location information such as the flow of movement and the current position, and uses it to revise the visitor's existing profile. After revising the profile, it extracts the individual's preferences and, based on them, provides recommended booth and event information. In other words, it provides optimal information for each individual by amending the recommendations built on the basic profile according to the reaction information. It thus provides a personalized service that is dynamic and interactive with the audience. 
Through this circular and interactive structure, this paper aims to provide the most suitable information for each visitor. It designs a smartphone application that supports a dynamically updated, interactive personalized service and can provide information about the user's surroundings in real time by locating the user's position through sensing. The proposed application collects the user's context information and has an information-gathering function that collects, via sensing, the user's reactions to information searched for or provided. It also has a function that provides the data the user needs in the exhibition hall. In other words, it offers position-based recommended-booth information and location-based services for the recommended booths, and it includes a service that provides detailed information about the inside of the exhibition using augmented reality, as well as a map of the whole exhibition. It also provides an SNS service that supports information exchange as well as social ties. To provide these services, the application consists of several modules. First, it includes a UNS identity module for sensing and a sensor information gathering module that collects and handles the information perceived through that module. The sensor information gathered in this way is transmitted to the information gathering server. There is also an exhibition information module that interfaces with the user; besides serving as the interface, this module passes the user's reactions to the interesting information collection module. The interesting information collection module transmits the collected information to the information gathering server, which brings together the sensing information and the interest information. When valid information from the information gathering server is sent to the recommend server, the recommend server generates recommendations through inference over the gathered valid information. The recommend server transmits them to the exhibition information process module, which presents them to the user through the interface. 
Through this system, a dynamic and intelligent personalized service is provided to users.
This study investigated consumer intention to use a location-based mobile shopping service (LBMSS) that integrates cognitive and affective responses. Information relevancy was integrated into the pleasure-arousal-dominance (PAD) emotional state model as the conceptual framework of the present study. The results of an online survey of 335 mobile phone users in the U.S. indicated positive effects of arousal and information relevancy on pleasure. In addition, there was a significant relationship between pleasure and intention to use an LBMSS. However, the relationship between dominance and pleasure was not statistically significant. The results of the present study provide insight to retailers and marketers as to which factors they need to consider when implementing location-based mobile shopping services to improve their business performance. Extended Abstract: Location-aware technology has expanded the marketer's reach by reducing the space and time between a consumer's receipt of advertising and purchase, offering real-time information and coupons to consumers in purchasing situations (Dickinger and Kleijnen, 2008; Malhotra and Malhotra, 2009). LBMSS increases the relevancy of SMS marketing by linking advertisements to a user's location (Bamba and Barnes, 2007; Malhotra and Malhotra, 2009). This study investigated consumer intention to use a location-based mobile shopping service (LBMSS) that integrates cognitive and affective responses. The purpose of the study was to examine the relationships among information relevancy and affective variables and their effects on intention to use LBMSS. Thus, information relevancy was integrated into the pleasure-arousal-dominance (PAD) model, generating the following hypotheses. Hypothesis 1. There will be a positive influence of arousal concerning LBMSS on pleasure in regard to LBMSS. Hypothesis 2. There will be a positive influence of dominance in LBMSS on pleasure in regard to LBMSS. Hypothesis 3. 
There will be a positive influence of information relevancy on pleasure in regard to LBMSS. Hypothesis 4. There will be a positive influence of pleasure about LBMSS on intention to use LBMSS. E-mail invitations were sent to a randomly selected sample of three thousand consumers, acquired from an independent marketing research company, who were older than 18 and owned a mobile phone. An online survey was conducted following Dillman's (2000) online survey method, with follow-ups. A total of 335 valid responses were used for the data analysis in the present study. Before answering any questions, the respondents were asked to read a document describing LBMSS. The document included definitions and examples of LBMSS provided by various service providers. They were then exposed to a scenario describing the participant as taking a Saturday shopping trip to a mall and receiving a short message from the mall. The short message included new product information and coupons for same-day use at participating stores. They then completed a questionnaire containing various questions. To assess arousal, dominance, and pleasure, we adapted scales used in previous studies to the context of location-based mobile shopping services, using five items for each construct from Mehrabian and Russell (1974). A total of 15 items were measured on a seven-point bipolar scale. To measure information relevancy, four items were borrowed from Mason et al. (1995). Intention to use LBMSS was captured using two items developed by Blackwell and Miniard (1995) and one item developed by the authors. Data analyses were conducted using SPSS 19.0 and LISREL 8.72. A total of 335 usable responses were obtained after deleting incomplete responses, for a response rate of 11.20%. A little over half of the respondents were male (53.9%), and approximately 60% of respondents were married (57.4%). 
The mean age of the sample was 29.44 years, with a range from 19 to 60 years. In terms of ethnicity, respondents were European American (54.5%), Hispanic American (5.3%), African American (3.6%), and Asian American (2.9%). The respondents were highly educated: close to 62.5% of participants reported holding a college degree or its equivalent, and 14.5% had a graduate degree. The sample represents all income categories: less than $24,999 (10.8%), $25,000-$49,999 (28.34%), $50,000-$74,999 (13.8%), and $75,000 or more (10.23%). The respondents indicated that they were employed in many occupations. Responses came from 42 states in the U.S. To identify the dimensions of the research constructs, exploratory factor analysis (EFA) with varimax rotation was conducted. As indicated in Table 1, the dimensions suggested by the EFA (arousal, dominance, relevancy, pleasure, and intention to use) explained 82.29% of the total variance, with factor loadings ranging from .74 to .89. As a next step, confirmatory factor analysis (CFA) was conducted to validate the dimensions identified by the EFA and to further refine the scale. Table 1 shows the results of the measurement model analysis, which yielded a chi-square of 202.13 with 89 degrees of freedom (p = .002), GFI of .93, AGFI of .89, CFI of .99, and NFI of .98, indicating a good model fit to the data (Bagozzi and Yi, 1998; Hair et al., 1998). As Table 1 shows, reliability was estimated with Cronbach's alpha and composite reliability (CR) for all multi-item scales. All values indicated satisfactory reliability of the multi-item measures for alpha (>.91) and CR (>.80). In addition, we tested the convergent validity of the measure using average variance extracted (AVE), following the recommendations of Fornell and Larcker (1981). 
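For readers unfamiliar with these statistics, the sketch below shows how AVE and CR are commonly computed from standardized factor loadings, and how the Fornell and Larcker comparison for discriminant validity is usually stated. The loadings in the usage note are hypothetical and are not taken from Table 1; the function names are illustrative only.

```python
def average_variance_extracted(loadings):
    """AVE: mean of the squared standardized loadings of one construct."""
    return sum(l * l for l in loadings) / len(loadings)

def composite_reliability(loadings):
    """CR: (sum of loadings)^2 over itself plus the summed error variances,
    assuming standardized loadings so each error variance is 1 - loading^2."""
    total = sum(loadings)
    error = sum(1.0 - l * l for l in loadings)
    return total * total / (total * total + error)

def discriminant_valid(ave_a, ave_b, correlation_ab):
    """Fornell-Larcker check: the variance two constructs share (squared
    correlation) should be smaller than the AVE of each construct."""
    return correlation_ab ** 2 < min(ave_a, ave_b)
```

With hypothetical loadings of 0.8 on three items, `average_variance_extracted` gives 0.64, which is above the commonly cited 0.5 threshold, and `composite_reliability` gives about 0.84.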
The AVE values for the model constructs ranged from .74 to .85, higher than the threshold suggested by Fornell and Larcker (1981). To examine the discriminant validity of the measure, we again followed the recommendations of Fornell and Larcker (1981). The shared variances between constructs were smaller than the AVE of the research constructs, confirming the discriminant validity of the measure. The causal model was tested using LISREL 8.72 with maximum-likelihood estimation. Table 2 shows the results of the hypothesis testing. The results revealed a good overall fit for the proposed model. Chi-square was 342.00 (df = 92, p = .000), NFI was .97, NNFI was .97, GFI was .89, AGFI was .83, and RMSEA was .08. All paths in the proposed model received significant statistical support except H2. The paths from arousal to pleasure (H1:
In recent years, frequent itemset mining that considers the importance of each item has been studied intensively as one of the important issues in the data mining field. According to how they use item importance, such itemset mining approaches are classified as weighted frequent itemset mining, frequent itemset mining using transactional weights, and utility itemset mining. In this paper, we perform an empirical analysis of frequent itemset mining algorithms based on transactional weights. These algorithms compute transactional weights from the weight of each item in large databases, and they discover weighted frequent itemsets on the basis of item frequency and the weight of each transaction. Consequently, database analysis reveals the importance of a given transaction, because the weight of a transaction is higher if it contains many items with high values. We analyze the advantages and disadvantages of the best-known frequent itemset mining algorithms based on transactional weights and also compare their performance. As a representative of frequent itemset mining using transactional weights, WIS introduces the concept and strategies of transactional weights. In addition, there are several state-of-the-art algorithms, WIT-FWIs, WIT-FWIs-MODIFY, and WIT-FWIs-DIFF, for extracting itemsets with weight information. To mine weighted frequent itemsets efficiently, these three algorithms use a special lattice-like data structure called the WIT-tree. The algorithms need no additional database scan after the WIT-tree has been constructed, since each node of the WIT-tree holds item information such as item and transaction IDs. 
In particular, traditional algorithms perform many database scans to mine weighted itemsets, whereas the WIT-tree-based algorithms avoid this overhead by reading the database only once. Additionally, these algorithms generate each new itemset of length N+1 from two different itemsets of length N. To discover new weighted itemsets, WIT-FWIs performs itemset combination using the information of the transactions that contain all the itemsets. WIT-FWIs-MODIFY has a distinctive feature that reduces the operations needed to calculate the frequency of a new itemset. WIT-FWIs-DIFF uses a technique based on the difference of two itemsets. To compare and analyze the performance of the algorithms in various environments, we measure runtime and maximum memory usage on two types of real datasets (dense and sparse). Moreover, a scalability test is conducted to evaluate the stability of each algorithm as the size of the database changes. The results show that WIT-FWIs and WIT-FWIs-MODIFY perform best on the dense dataset, while on the sparse dataset WIT-FWIs-DIFF mines more efficiently than the other algorithms. Compared with the WIT-tree-based algorithms, WIS, which is based on the Apriori technique, is the least efficient because on average it requires far more computation than the others.
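The ideas summarized above can be illustrated with a small sketch. It assumes the definitions commonly used in this line of work: a transaction's weight is the mean weight of its items, and an itemset's weighted support is the summed weight of the transactions containing it divided by the summed weight of all transactions. The code keeps a set of transaction IDs per itemset, builds it in a single pass, and joins two length-N itemsets with a shared prefix into a length-N+1 candidate by intersecting their ID sets. It is a simplified stand-in for a WIT-tree, not a reproduction of WIT-FWIs; the function names and the sample data in the usage note are hypothetical.

```python
def transaction_weights(db, item_weight):
    """Weight of each transaction: mean weight of the items it contains."""
    return {tid: sum(item_weight[i] for i in items) / len(items)
            for tid, items in db.items()}

def mine_weighted_itemsets(db, item_weight, min_ws):
    """Return {itemset tuple: weighted support} for itemsets with ws >= min_ws."""
    tw = transaction_weights(db, item_weight)
    total = sum(tw.values())

    def ws(tids):
        return sum(tw[t] for t in tids) / total

    # Single database pass: record which transactions contain each item.
    tidsets = {}
    for tid, items in db.items():
        for item in items:
            tidsets.setdefault(item, set()).add(tid)

    level = [((item,), tids) for item, tids in sorted(tidsets.items())
             if ws(tids) >= min_ws]
    result = {}
    while level:
        for itemset, tids in level:
            result[itemset] = ws(tids)
        next_level = []
        for a in range(len(level)):
            for b in range(a + 1, len(level)):
                (x, tx), (y, ty) = level[a], level[b]
                if x[:-1] == y[:-1]:          # same prefix: join into length N+1
                    tids = tx & ty            # no further database scan needed
                    if tids and ws(tids) >= min_ws:
                        next_level.append((x + (y[-1],), tids))
        level = next_level
    return result
```

For example, with transactions `{1: {"A", "B"}, 2: {"A", "C"}, 3: {"A", "B", "C"}, 4: {"B"}}`, item weights `{"A": 0.6, "B": 0.2, "C": 0.4}`, and `min_ws=0.5`, the itemsets A, B, C, AB, and AC are kept, while BC and ABC fall below the threshold. A diffset variant would store the IDs in one set but not the other instead of the intersection, which tends to be smaller on dense data.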
The incidence of globally infectious and pathogenic diseases such as H1N1 (swine flu) and Avian Influenza (AI) has recently increased. An infectious disease is a pathogen-caused disease that can be passed from an infected person to a susceptible host. The pathogens of infectious diseases, which include bacteria, spirochetes, rickettsiae, viruses, fungi, and parasites, cause various symptoms such as respiratory disease, gastrointestinal disease, liver disease, and acute febrile illness. They can spread through various routes such as food, water, insects, breathing, and contact with other persons. Most countries around the world now use mathematical models to predict and prepare for the spread of infectious diseases. In modern society, however, infectious diseases spread in a fast and complicated manner because of the rapid development of transportation (both ground and underground), so there is not enough time to predict their spread. A new system that can help prevent the spread of infectious diseases by predicting their pathways therefore needs to be developed. In this study, to address this problem, we develop an integrated monitoring system that can track and predict the pathway of an infectious disease for real-time monitoring and control. The system is based on the conventional mathematical model known as the Susceptible-Infectious-Recovered (SIR) model. The proposed model considers both inter-city and intra-city modes of transportation, including bus, train, car, and airplane, to express interpersonal contact (i.e., migration flow). In addition, real data modified according to the geographical characteristics of Korea are employed to reflect realistic circumstances of possible disease spread in Korea. By controlling the parameters of this model, we can predict where and when vaccination needs to be performed. 
The simulation includes several assumptions and scenarios. Using data from Statistics Korea, five major cities assumed to have the most population migration were chosen: Seoul, Incheon (Incheon International Airport), Gangneung, Pyeongchang, and Wonju. The cities were assumed to be connected in one network, and the infectious disease was assumed to spread only through the specified transportation modes. Daily traffic volume was obtained from the Korean Statistical Information Service (KOSIS), and the population of each city was acquired from Statistics Korea. Data on H1N1 (swine flu) were provided by the Korea Centers for Disease Control and Prevention, and air transport statistics were obtained from the Aeronautical Information Portal System. These data on daily traffic volume, population, H1N1, and air transport were adjusted in consideration of current conditions in Korea and several realistic assumptions and scenarios. With H1N1 first occurring at Incheon International Airport, three scenarios were simulated (no vaccination in any city, vaccination in Seoul, and vaccination in Pyeongchang), and the number of days taken for the number of infected to reach its peak and the proportion of Infectious (I) were compared. According to the simulation, without vaccination the peak was reached soonest in Seoul (37 days) and latest in Pyeongchang (43 days). In terms of the proportion of I, Seoul was the highest and Pyeongchang the lowest. With vaccination in Seoul, the peak was reached soonest in Seoul (37 days) and latest in Pyeongchang (43 days). In terms of the proportion of I, Gangneung was the highest and Pyeongchang the lowest. 
With vaccination in Pyeongchang, the peak was reached soonest in Seoul (37 days) and latest in Pyeongchang (43 days). In terms of the proportion of I, Gangneung was the highest and Pyeongchang the lowest. These results confirm that H1N1, once it first occurs, spreads in proportion to the traffic volume of each city. Because the infection pathway differs with the traffic volume of each city, preventive measures against infectious disease can be devised by tracking and predicting its pathway through the analysis of traffic volume.
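A multi-city SIR model of the kind described above can be sketched as follows. This is a minimal discrete-time illustration under stated assumptions, not the system developed in the study: each city runs its own SIR dynamics with a daily step, `move[i][j]` is an assumed daily fraction of city i's population travelling to city j, vaccination is modelled by moving a fraction of susceptibles to the recovered compartment at the start, and all parameter values in the usage note are hypothetical.

```python
def net_flow(X, move, i):
    """Net daily inflow of compartment X into city i under travel rates `move`."""
    n = len(X)
    inflow = sum(move[j][i] * X[j] for j in range(n) if j != i)
    outflow = sum(move[i][j] for j in range(n) if j != i) * X[i]
    return inflow - outflow

def simulate(pop, infected0, beta, gamma, move, days, vaccinated=None):
    """Return a list of daily (S, I, R) snapshots, one list entry per city."""
    n = len(pop)
    vaccinated = vaccinated or [0.0] * n
    S = [(pop[i] - infected0[i]) * (1.0 - vaccinated[i]) for i in range(n)]
    I = [float(x) for x in infected0]
    R = [(pop[i] - infected0[i]) * vaccinated[i] for i in range(n)]
    history = [(S[:], I[:], R[:])]
    for _ in range(days):
        new_S, new_I, new_R = [], [], []
        for i in range(n):
            N = S[i] + I[i] + R[i]
            infections = beta * S[i] * I[i] / N   # new infections within city i
            recoveries = gamma * I[i]             # new recoveries within city i
            new_S.append(S[i] - infections + net_flow(S, move, i))
            new_I.append(I[i] + infections - recoveries + net_flow(I, move, i))
            new_R.append(R[i] + recoveries + net_flow(R, move, i))
        S, I, R = new_S, new_I, new_R
        history.append((S[:], I[:], R[:]))
    return history

def peak_day(history, city):
    """Day on which the number of infectious people in `city` is largest."""
    series = [I[city] for _, I, _ in history]
    return max(range(len(series)), key=series.__getitem__)
```

Calling `simulate([1_000_000, 200_000], [10, 0], 0.5, 0.2, [[0, 0.001], [0.005, 0]], 200)` seeds the outbreak in the first city; comparing `peak_day` across cities, with and without a `vaccinated` vector, mirrors the scenario comparison reported in the abstract.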
1. Introduction Today the Internet is recognized as an important channel for transactions of products and services. According to data surveyed by the National Statistical Office, online transactions in 2007 totaled 15.7656 trillion won, a 17.1% (2.3060 trillion won) increase over the previous year; of this, B2C transactions increased by 12.0% (10.2258 trillion won). Because the entry barrier of Korea's online market is low, many retailers can easily enter it. As a result, the market has grown larger, but competition has become tougher. In particular, due to the Internet and innovation in IT, the existing market has changed into a perfectly competitive market (Srinivasan, Rolph & Kishore, 2002). In the early years of online business, a moderate price was thought to be the main reason for success, but as competition toughened, firms became aware of the importance of online service quality. If customers are unsure whether a Web site can provide what they want, whether they can use the site, or whether they can trust products they have already bought, they doubt its viability (Parasuraman, Zeithaml & Malhotra, 2005). Customers can directly reserve and issue air tickets regardless of place and time on the Web sites of travel agencies or airlines, but empirical studies of these Web sites are insufficient. This study therefore pursues the following specific objectives. The first is to measure the service quality and service recovery of Web sites for reserving and issuing air tickets. The second is to examine whether online service quality and online service recovery affect overall service quality. The third is to examine the relationship between overall service quality and customer satisfaction, and between customer satisfaction and loyalty intention. 2. 
Theoretical Background 2.1 On-line Service Quality Barnes & Vidgen (2000; 2001a; 2001b; 2002) developed a tool for measuring Web site quality, called WebQual, in four stages. In the first stage, WebQual 1.0 developed measurement items for information quality based on QFD, which were verified with students at a UK business school. In the second stage, WebQual 2.0 was developed for interaction quality and evaluated by customers of online bookshops. In the third stage, WebQual 3.0 consolidated WebQual 1.0 for information quality and WebQual 2.0 for interaction quality. It comprises three quality dimensions (information quality, interaction quality, and site design) and was assessed and confirmed on auction sites (e-bay, Amazon, QXL). Furthermore, drawing on the earlier empirical studies, the authors replaced site quality with usability, judging that usability concerns how customers interact with and perceive Web sites and that the concept is widely used for assessing Web sites. Through this process WebQual 4.0 was developed; it consists of 22 items in three quality dimensions: information quality, interaction quality, and usability. However, because WebQual 4.0 focuses on technical aspects, it is useful for the design of a Web site but not for capturing the pleasant-experience aspect of a Web site. Parasuraman, Zeithaml & Malhotra (2002; 2005) developed measures of online service quality in 2002 and 2005. The 2002 study divided online service quality into five dimensions, but these were not well organized and needed to be restudied entirely. Parasuraman, Zeithaml & Malhotra (2005) therefore reworked the measure of online service quality based on the 2002 study and developed E-S-QUAL. After developing a preliminary measure of online service quality, they surveyed customers who had purchased at amazon.com and walmart.com and reassessed the measure. 
They then finalized E-S-QUAL, which consists of 22 items in four dimensions: efficiency, system availability, fulfillment, and privacy. Efficiency measures ease of access to and use of the site, among other things; system availability measures the correct technical functioning of the site; fulfillment measures the promptness of product delivery and the availability of goods; and privacy measures the degree to which customer data are protected. 2.2 Service Recovery Service firms try to minimize losses by responding promptly to service failures. These responses of service providers to service failure are called service recovery (Kelly & Davis, 1994). Bitner (1990) studied, from the customer's viewpoint, the service-provider behaviors that lead customers to perceive satisfaction or dissatisfaction at the service encounter. According to that study, managing service failure successfully requires accurate recognition of the service problem, an apology, a sufficient explanation of the service failure, and some tangible compensation. Parasuraman, Zeithaml & Malhotra (2005) approached service recovery from the standpoint of how to measure it rather than how to manage it, moved from the offline to the online market, and developed E-RecS-QUAL, a tool for measuring online service recovery. 2.3 Customer Satisfaction Definitions of customer satisfaction can be divided into two viewpoints. The first approaches customer satisfaction as an outcome of consumption. Howard & Sheth (1969) defined satisfaction as 'a cognitive condition feeling being rewarded properly or improperly for their sacrifice.' and Westbrook & Reilly (1983) also defined customer satisfaction/dissatisfaction as 'a psychological reaction to the behavior pattern of shopping and purchasing, the display condition of retail store, outcome of purchased goods and service as well as whole market.' The second approaches customer satisfaction as a process. 
Engel & Blackwell (1982) defined satisfaction as 'an assessment of a consistency in chosen alternative proposal and their belief they had with them.' Tse & Wilton (1988) defined customer satisfaction as 'a customers' reaction to discordance between advance expectation and ex post facto outcome.' That is, from the viewpoint that customer satisfaction is a process, the key element is the process of comparing and assessing what consumers expected against the outcome of consumption. Compared with the outcome-oriented approach, the process-oriented approach has several advantages. Because it deals with the customer's whole consumption experience, it can examine the main processes by measuring, one by one, each factor that plays an essential role at each step. It also makes it possible to examine the perceptual and psychological processes that form customer satisfaction. Because of these advantages, many studies now adopt the process-oriented approach (Yi, 1995). 2.4 Loyalty Intention Loyalty has been studied through behavioral, attitudinal, and composite approaches (Dekimpe et al., 1997). Early studies defined loyalty in behavioral terms; behavioral approaches regard customer loyalty as "a tendency to purchase periodically within a certain period of time at specific retail store." However, because behavioral approaches focus only on the outcome of customer behavior, some have pointed out the limitation that customers' decision-making situations and processes are neglected (Enis & Paul, 1970; Raj, 1982; Lee, 2002). Attitudinal approaches were therefore proposed. Attitudinal approaches hold that loyalty comprises cognitive, emotional, and volitional factors (Oliver, 1997) and define customer loyalty as "friendly behaviors for specific retail stores." However, although attitudinal approaches can explain how customer loyalty forms and changes, they cannot say with certainty whether it will translate into actual purchasing in the future. 
This is a shortcoming of the attitudinal approaches (Oh, 1995). 3. Research Design 3.1 Research Model Based on the objectives of this study, the research model derived is shows, Step 1 and Step 2 are significant, and at Step 3 the mediating variable has a significant effect on the dependent variable, as do the independent variables. To establish the partial mediation effect, the independent variable's explanatory power at Step 3 (standardized coefficient