• Title/Abstract/Keyword: Predictive Power

Online news-based stock price forecasting considering homogeneity in the industrial sector (산업군 내 동질성을 고려한 온라인 뉴스 기반 주가예측)

  • Seong, Nohyoon;Nam, Kihwan
    • Journal of Intelligence and Information Systems / v.24 no.2 / pp.1-19 / 2018
  • Because forecasting stock movements matters both academically and practically, stock price prediction has been studied actively. This research can be classified by whether it uses structured or unstructured data and, in more detail, into technical analysis, fundamental analysis, and media effect analysis. In the big data era, studies that combine stock price prediction with big data are actively underway, and those built on large data sets mainly rely on machine learning techniques. Methods that incorporate media effects have attracted particular attention recently, and studies that analyze online news to forecast stock prices are becoming mainstream. Most previous studies that predict stock prices from online news apply sentiment analysis: they build a separate corpus for each company and construct a dictionary that predicts stock prices by recording how prices responded in the past. Existing studies have therefore examined the impact of online news on individual companies; for example, the stock movements of Samsung Electronics are predicted using only online news about Samsung Electronics. A method that considers influences among highly relevant companies has also been studied recently; for example, the stock movements of Samsung Electronics are predicted using news about Samsung Electronics and a highly related company such as LG Electronics. These previous studies examine the effect that news from a homogeneous industrial sector has on an individual company, and they classify homogeneous industries according to the Global Industry Classification Standard (GICS). In other words, they assume that the industries defined by GICS are homogeneous. However, existing studies are limited in that they do not take into account influential companies with high relevance, nor do they reflect heterogeneity within the same GICS sector. Our examination of various sectors shows that some industrial sectors are not homogeneous groups. To overcome these limitations, we propose a methodology that applies k-means clustering to reflect the heterogeneous effects within an industrial sector that affect stock prices. Multiple Kernel Learning is mainly used to integrate data with different characteristics: it has several kernels, each of which receives different data and makes its own prediction. We used Multiple Kernel Learning to incorporate the effects of the target firm and its relevant firms simultaneously. Each kernel was assigned to predict stock prices from the financial news variables of one group, where the groups are the target firm and the parts of the industrial sector separated by k-means cluster analysis. To show that the proposed methodology is appropriate, experiments were conducted on three years of online news and stock prices. The results are as follows. (1) We confirmed that information on industrial sectors related to the target company is also meaningful for predicting the target company's stock movements, and that the machine learning algorithm has better predictive power when news about relevant companies and news about the target company are considered together. (2) When predicting stock movements, it is important to vary the number of clusters according to the level of homogeneity in the industrial sector. When stock prices within an industrial sector are homogeneous, it is important to use the relational effect at the industry-group level without clustering, or with a small number of clusters. When stock prices within an industry group are heterogeneous, it is important to cluster the firms into groups. This study contributes by demonstrating that firms classified under GICS are heterogeneous and by suggesting that relevance should be defined through machine learning and statistical analysis rather than simply by GICS. It also contributes by demonstrating the efficiency of a prediction model that reflects heterogeneity.
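The clustering step described in this abstract can be illustrated with a minimal k-means sketch. The firm names, the two features per firm, and the choice of k=2 are hypothetical and are not taken from the paper; the abstract does not state which variables were clustered. In the described approach, the news of each resulting cluster would then feed its own kernel in Multiple Kernel Learning.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain k-means; centroids start at the first k points so the run is deterministic."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid (Euclidean distance).
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical features per firm: (mean daily return %, return volatility %).
firms = ["A", "B", "C", "D", "E", "F"]
features = [(0.10, 1.0), (0.90, 3.1), (0.12, 1.1), (0.85, 2.9), (0.08, 0.9), (0.95, 3.0)]

labels, _ = kmeans(features, k=2)
clusters = {}
for firm, lab in zip(firms, labels):
    clusters.setdefault(lab, []).append(firm)
print(clusters)
```

With these made-up features, the sector splits into two sub-groups, which is the situation the abstract calls heterogeneous.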

A Validation Study for the Practical Use of Screening Scale for Potential Drug-use Adolescents(SPDA) (청소년 약물사용 잠재군 선별척도(SPDA) 활용을 위한 타당화 연구)

  • Lee, Ki-Young;Kim, Young-Mi;Im, Hyuk;Park, Mi-Jin;Park, Sun-Hee
    • Korean Journal of Social Welfare / v.57 no.3 / pp.305-335 / 2005
  • This paper reports a validation study of the SPDA (Screening Scale for Potential Drug-use Adolescents), which was created in 2003 and further developed during 2004. SPDA aims to screen adolescents in the early stage of drug use and to help practitioners take a preventive approach with them. A total of 4,307 junior and senior high school students were selected as primary research subjects by stratified and quota sampling. In addition, 305 adolescents on probation were selected as a comparison group and asked to answer the same questionnaire. The reliability of SPDA was 0.914, higher than the previous year's 0.898. Exploratory and confirmatory factor analyses conducted to test construct validity showed that SPDA could be divided into 7 factors and that each factor structure could serve as a proper measurement model with a high level of fit and high factor loadings. Discriminant analysis conducted to test predictive validity confirmed that SPDA classified adolescents well by frequency of drug use, with a hit ratio of 86.6 percent (78.8% and 87.4% for junior and senior high school students, respectively). For the concurrent validity test, the Hare Home Self-Esteem Scale, the Hare School Self-Esteem Scale, and the Zuckerman-Kuhlman Sensation-seeking Scale were used, and all three scales had significant Pearson correlation coefficients with SPDA. The known-groups validity test indicated that SPDA had adequate power to distinguish adolescents on probation from those in school, with a hit ratio of 71.8 percent. The cut-off point for detecting adolescents at high risk of substance use was 77, which corresponds to a T score of approximately 55 (0.5 SD) and satisfies the sensitivity, specificity, and efficiency criteria.
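The cut-off criteria named in this abstract (sensitivity, specificity, hit ratio) and the T-score conversion can be sketched as follows. The scores, the group labels, and the mean and standard deviation are invented for illustration; only the cut-off of 77 and the T score of about 55 come from the abstract. Treating scores at or above the cut-off as flagged is also an assumption.

```python
def screen_metrics(scores, is_user, cutoff):
    """Classification quality when scores at or above `cutoff` are flagged."""
    tp = sum(s >= cutoff and u for s, u in zip(scores, is_user))
    fn = sum(s < cutoff and u for s, u in zip(scores, is_user))
    tn = sum(s < cutoff and not u for s, u in zip(scores, is_user))
    fp = sum(s >= cutoff and not u for s, u in zip(scores, is_user))
    return {
        "sensitivity": tp / (tp + fn),          # share of true users flagged
        "specificity": tn / (tn + fp),          # share of non-users not flagged
        "hit_ratio": (tp + tn) / len(scores),   # share classified correctly overall
    }

def t_score(raw, mean, sd):
    """Standard T score: mean 50, standard deviation 10."""
    return 50 + 10 * (raw - mean) / sd

# Hypothetical sample: eight scale scores and whether each respondent uses drugs.
scores = [60, 70, 76, 77, 80, 90, 95, 72]
is_user = [False, False, False, False, True, True, True, True]

metrics = screen_metrics(scores, is_user, cutoff=77)
print(metrics)
# With an assumed mean of 72 and SD of 10, a raw score of 77 lies 0.5 SD above the mean.
print(t_score(77, mean=72, sd=10))
```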

Impact of Sulfur Dioxide Impurity on Process Design of $CO_2$ Offshore Geological Storage: Evaluation of Physical Property Models and Optimization of Binary Parameter (이산화황 불순물이 이산화탄소 해양 지중저장 공정설계에 미치는 영향 평가: 상태량 모델의 비교 분석 및 이성분 매개변수 최적화)

  • Huh, Cheol;Kang, Seong-Gil;Cho, Mang-Ik
    • Journal of the Korean Society for Marine Environment & Energy / v.13 no.3 / pp.187-197 / 2010
  • Carbon dioxide Capture and Storage (CCS) is regarded as one of the most promising options for responding to climate change. CCS is a three-stage process consisting of the capture of carbon dioxide ($CO_2$), the transport of $CO_2$ to a storage location, and the long-term isolation of $CO_2$ from the atmosphere for the purpose of carbon emission mitigation. Up to now, process design for $CO_2$ marine geological storage has been carried out mainly for pure $CO_2$. Unfortunately, the $CO_2$ mixture captured from power plants and steel-making plants contains many impurities such as $N_2$, $O_2$, Ar, $H_2O$, $SO_2$, and $H_2S$. Even a small amount of impurities can change the thermodynamic properties and thereby significantly affect the compression, purification, transport, and injection processes. To design a reliable $CO_2$ marine geological storage system, it is necessary to analyze the impact of these impurities on the whole CCS process at the initial design stage. The purpose of the present paper is to compare and analyze the relevant physical property models, including the BWRS, PR, PRBM, RKS, and SRK equations of state and the NRTL-RK model, which are crucial tools for numerical process simulation. To evaluate the predictive accuracy of the equations of state for the $CO_2-SO_2$ mixture, we compared numerical calculation results with reference experimental data. In addition, an optimum binary parameter that accounts for the interaction between $CO_2$ and $SO_2$ molecules was suggested based on the mean absolute percent error. In conclusion, we suggest the most reliable physical property model, with an optimized binary parameter, for designing the $CO_2-SO_2$ mixture marine geological storage process.
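Selecting a binary parameter by mean absolute percent error, as this abstract describes, might look like the sketch below. The measured values and the model predictions for each candidate parameter are made-up numbers standing in for experimental data and equation-of-state output; no real property model is evaluated here.

```python
def mape(predicted, measured):
    """Mean absolute percent error between model output and reference data."""
    return 100.0 * sum(abs(p - m) / abs(m) for p, m in zip(predicted, measured)) / len(measured)

# Hypothetical reference data (for example, pressures at three state points).
measured = [10.0, 20.0, 40.0]

# Hypothetical model predictions for three candidate binary interaction parameters.
predictions_by_kij = {
    0.00: [11.0, 22.0, 44.0],
    0.05: [10.2, 20.4, 40.8],
    0.10: [9.0, 18.0, 36.0],
}

errors = {kij: mape(pred, measured) for kij, pred in predictions_by_kij.items()}
best_kij = min(errors, key=errors.get)  # candidate with the lowest error
print(errors, best_kij)
```

In practice the candidate grid would be finer, and each prediction would come from re-running the chosen equation of state with that parameter.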

Classification Algorithm-based Prediction Performance of Order Imbalance Information on Short-Term Stock Price (분류 알고리즘 기반 주문 불균형 정보의 단기 주가 예측 성과)

  • Kim, S.W.
    • Journal of Intelligence and Information Systems / v.28 no.4 / pp.157-177 / 2022
  • Investors trade stocks while closely watching, in real time, the orders submitted by domestic and foreign investors through Limit Order Book information, the so-called price current, provided by securities firms. Is the order information released in the Limit Order Book useful for stock price prediction? This study analyzes whether order imbalance, which appears when investors' buy and sell orders are concentrated on one side during intraday trading, is a significant predictor of whether the stock price will subsequently rise or fall. Using classification algorithms, this study improved the accuracy with which order imbalance information predicts the short-term price trend, that is, whether the day's closing price moves up or down. Day trading strategies are proposed using the price trends predicted by the classification algorithms, and their trading performance is analyzed empirically. Five-minute KOSPI200 Index Futures data were analyzed for 4,564 days from January 19, 2004 to June 30, 2022. The results of the empirical analysis are as follows. First, order imbalance information has a significant impact on current stock prices. Second, order imbalance information observed in the early morning has significant forecasting power for the price trend from the early morning to the market close. Third, the Support Vector Machines algorithm showed the highest prediction accuracy, 54.1%, for the day's closing price trend using order imbalance information. Fourth, order imbalance information measured early in the day had higher prediction accuracy than that measured later in the day. Fifth, the day trading strategies that used the classification algorithms' predictions of price trends performed better than the benchmark trading strategy. Sixth, except for the K-Nearest Neighbor algorithm, all strategies using the classification algorithms earned higher total profits on average than the benchmark strategy. Seventh, the strategies using the predictions of the Logistic Regression, Random Forest, Support Vector Machines, and XGBoost algorithms outperformed the benchmark strategy on the Sharpe Ratio, which evaluates both profitability and risk. This study differs academically from existing studies in that it documents the economic value of total buy and sell order volume information within the Limit Order Book. The empirical results are also valuable to market participants from a trading perspective. Future studies should improve the performance of the trading strategy with more accurate price predictions by extending the approach to deep learning models, which have recently been studied actively for stock price prediction.
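Two quantities in this abstract lend themselves to a short sketch: an order imbalance measure and the Sharpe Ratio. The abstract does not give the paper's exact formula for order imbalance, so the normalized buy-minus-sell definition below is an assumption (one common choice), and the volumes, returns, and the simple sign-based trading signal are hypothetical.

```python
import statistics

def order_imbalance(buy_volume, sell_volume):
    """Assumed definition: net buy volume as a share of total order volume, in [-1, 1]."""
    return (buy_volume - sell_volume) / (buy_volume + sell_volume)

def sharpe_ratio(returns, risk_free=0.0):
    """Mean excess return divided by the sample standard deviation of excess returns."""
    excess = [r - risk_free for r in returns]
    return statistics.mean(excess) / statistics.stdev(excess)

# Hypothetical early-morning order volumes.
oi = order_imbalance(buy_volume=600, sell_volume=400)
signal = 1 if oi > 0 else -1  # hypothetical rule: long on buy pressure, short otherwise

# Hypothetical daily strategy returns.
daily_returns = [0.01, 0.03, 0.02]
print(oi, signal, sharpe_ratio(daily_returns))
```

In the study itself, the direction is predicted by classification algorithms rather than by the sign of the imbalance; the sign rule here only shows where such a prediction would enter a strategy.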

A Longitudinal Validation Study of the Korean Version of PCL-5(Post-traumatic Stress Disorder Checklist for DSM-5) (PCL-5(DSM-5 기준 외상 후 스트레스 장애 체크리스트) 한국판 종단 타당화 연구)

  • Lee, DongHun;Lee, DeokHee;Kim, SungHyun;Jung, DaSong
    • Korean Journal of Culture and Social Issue / v.28 no.2 / pp.187-217 / 2022
  • The aim of this study is to examine the psychometric properties of the Korean version of the Post-traumatic Stress Disorder Checklist for DSM-5 (PCL-5). For this purpose, online surveys were conducted twice at a one-year interval, with data from 1,077 Korean adults at time 1 and 563 Korean adults at time 2. First, in the confirmatory factor analysis comparing the fit of the 1-, 4-, 6-, and 7-factor models, the 4-, 6-, and 7-factor models showed acceptable fit, with the 7-factor model fitting best, followed by the 6- and 4-factor models. Second, the internal consistency, omega coefficient, construct validity, average variance extracted, and test-retest reliability results were all satisfactory. Third, a correlation analysis with the K-PC-PTSD-5 and the sub-factors of the BSI-18 was conducted to check the validity of the Korean version of PCL-5; positive correlations were found with both the K-PC-PTSD-5 and the BSI-18. Fourth, a hierarchical multiple regression was performed to examine whether the Korean version of PCL-5 predicts future PTSD, depression, anxiety, and somatization; the Korean version of PCL-5 measured at time 1 significantly predicted PTSD, depression, anxiety, and somatization symptoms at time 2. Fifth, ROC curve analysis confirmed the discriminant power of PCL-5 for screening PTSD symptom groups, and the best cut-off score was suggested. This longitudinal validation found that the Korean version of PCL-5 is a reliable and valid measure for Korean adults. The examination of predictive validity showed that the Korean version of PCL-5 can predict not only PTSD symptoms but also PTSD-related symptoms such as depression, anxiety, and somatization. This study also differs from previous validation studies of PTSD symptom measures in that it suggests a cut-off score to help differentiate PTSD symptom groups.
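Choosing a cut-off score from an ROC analysis can be sketched as below. The abstract does not say which criterion was used to pick the best cut-off, so maximizing Youden's J (sensitivity plus specificity minus one) is an assumption, and the scores and diagnoses are invented.

```python
def best_cutoff(scores, has_ptsd):
    """Return (cutoff, J) maximizing Youden's J = TPR - FPR over observed scores."""
    positives = sum(has_ptsd)
    negatives = len(has_ptsd) - positives
    best = None
    for cutoff in sorted(set(scores)):
        # Respondents scoring at or above the cut-off are screened as positive.
        tp = sum(s >= cutoff and y for s, y in zip(scores, has_ptsd))
        fp = sum(s >= cutoff and not y for s, y in zip(scores, has_ptsd))
        j = tp / positives - fp / negatives
        if best is None or j > best[1]:
            best = (cutoff, j)
    return best

# Hypothetical checklist totals and diagnostic status for nine respondents.
scores = [10, 20, 28, 36, 33, 41, 45, 50, 62]
has_ptsd = [False, False, False, False, True, True, True, True, True]

cutoff, youden_j = best_cutoff(scores, has_ptsd)
print(cutoff, youden_j)
```

Each candidate cut-off corresponds to one point on the ROC curve, so scanning the observed scores is equivalent to walking along the curve.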

A stratified random sampling design for paddy fields: Optimized stratification and sample allocation for effective spatial modeling and mapping of the impact of climate changes on agricultural system in Korea (농지 공간격자 자료의 층화랜덤샘플링: 농업시스템 기후변화 영향 공간모델링을 위한 국내 농지 최적 층화 및 샘플 수 최적화 연구)

  • Minyoung Lee;Yongeun Kim;Jinsol Hong;Kijong Cho
    • Korean Journal of Environmental Biology / v.39 no.4 / pp.526-535 / 2021
  • Spatial sampling design plays an important role in GIS-based modeling studies because it increases modeling efficiency while reducing the cost of sampling. In the field of agricultural systems, research demand for high-resolution spatial data-based modeling to predict and evaluate climate change impacts is growing rapidly. Accordingly, the need for and importance of spatial sampling design are increasing. The purpose of this study was to design spatial sampling of paddy fields (11,386 grids with 1 km spatial resolution) in Korea for use in agricultural spatial modeling. A stratified random sampling design was developed and applied to the 2030s, 2050s, and 2080s under two RCP scenarios, 4.5 and 8.5. Twenty-five weather and four soil characteristics were used as stratification variables. Stratification and sample allocation were optimized to ensure the minimum sample size under given precision constraints for 16 target variables such as crop yield, greenhouse gas emission, and pest distribution. The precision and accuracy of the sampling were evaluated through sampling simulations based on the coefficient of variation (CV) and relative bias, respectively. As a result, the paddy fields could be optimized in the range of 5 to 21 strata and 46 to 69 samples. Evaluation results showed that target variables were within precision constraints (CV<0.05 except for crop yield) with low bias values (below 3%). These results can contribute to reducing sampling cost and computation time while maintaining high predictive power. The resulting sample is expected to be widely used as a representative sample grid in various agricultural spatial modeling studies.
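The evaluation step in this abstract, repeated sampling simulations scored by CV and relative bias, can be sketched as follows. The three strata, their values, the three samples per stratum, and the 200 repetitions are hypothetical, and the stratification itself is taken as given rather than optimized.

```python
import random
import statistics

def coefficient_of_variation(estimates):
    """Precision: spread of repeated estimates relative to their mean."""
    return statistics.stdev(estimates) / statistics.mean(estimates)

def relative_bias(estimates, true_value):
    """Accuracy: how far the average estimate sits from the true value."""
    return (statistics.mean(estimates) - true_value) / true_value

def stratified_mean(strata, n_per_stratum, rng):
    """Weight each stratum's sample mean by the stratum's share of all units."""
    total = sum(len(s) for s in strata)
    return sum(
        len(s) / total * statistics.mean(rng.sample(s, n))
        for s, n in zip(strata, n_per_stratum)
    )

# Hypothetical population: three strata of grid-cell values (not the paper's data).
strata = [list(range(10, 20)), list(range(50, 70)), list(range(100, 110))]
true_mean = statistics.mean(v for s in strata for v in s)

rng = random.Random(0)  # fixed seed so the simulation is repeatable
estimates = [stratified_mean(strata, [3, 3, 3], rng) for _ in range(200)]
print(coefficient_of_variation(estimates), relative_bias(estimates, true_mean))
```

A design would be accepted when the simulated CV and bias fall inside the chosen constraints, such as the CV<0.05 and bias below 3% reported in the abstract.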

The Effect of Brand Extension of Private Label on Consumer Attitude - a focus on the moderating effect of the perceived fit difference between parent brands and an extended brand - (PL의 브랜드확장이 소비자태도에 미치는 영향에 관한 연구 : 모브랜드 적합도 인식 차이의 조절효과를 중심으로)

  • Kim, Jong-Keun;Kim, Hyang-Mi;Lee, Jong-Ho
    • Journal of Distribution Research / v.16 no.4 / pp.1-27 / 2011
  • Introduction: Sales of private labels (PLs) have been growing in recent years. Globally, PLs have already achieved a 20% share, and between 25 and 50% in most European markets (AC. Nielson, 2005). These products aim to offer quality and prices comparable to national brand (NB) products and have been continuously eroding manufacturers' national brand market share. Stores have also started introducing premium PLs that are of higher quality and more reasonably priced than NBs. Worldwide, many retailers already have a multiple-tier private label architecture. As a consequence, consumers now have a more diverse brand choice in store than ever before. Since premium PLs are priced higher than regular PLs and even, in some cases, above NBs, stores can expect to generate higher profits. Brand extensions and private labels have been extensively studied in the marketing field. However, less attention has been paid to private label extension. Therefore, this research focuses on private label extension using the Multi-Attribute Attitude Model (Fishbein and Ajzen, 1975). In particular, few studies consider the hierarchical effect of the PL's two parent brands: the store brand and the original PL. We assume that the attitude toward each of the two parent brands affects the attitude toward the extended PL. The influence of each parent brand on the extended PL will vary according to the perceived fit between that parent brand and the extended PL. This research focuses on how these two parent brands act as reference points to one another in consumers' choice consideration. Specifically, we seek to understand how store image and attitude toward the original PL affect consumer perceptions of the extended premium PL. How consumers perceive extended premium PLs could give retail managers specific suggestions on whether it is more effective to position the extended premium PL similarly or dissimilarly to the original PL, especially on the quality dimension and congruency with store image. There is an extensive body of research on branding and brand extensions (e.g. Aaker and Keller, 1990) and, more recently, on PLs (e.g. Kumar and Steenkamp, 2007). However, no studies to date look at the upgrading and influence of original PLs and attitude toward the store on the premium PL extension. This research aims to address this gap using the perceived fit difference between parent brands and the extended premium PL as the context. To meet these objectives, we investigate which factors heighten consumers' positive attitude toward premium PL extension.
    Research Model and Hypotheses: When considering the attitude toward the premium PL extension, we expect four factors to have an influence: attitude toward the store; attitude toward the original PL; perceived congruity between the store image and the premium PL; and perceived similarity between the original PL and the premium PL. We expect all of these factors to influence consumer attitude toward premium PL extension. Figure 1 gives the research model and hypotheses.
    Method: Data were collected through an intercept survey of consumers at discount stores. A total of 403 survey responses were obtained (59.8% female, across all age ranges). Respondents were asked to answer a series of questions measured on 7-point Likert-type scales. The survey consisted of questions that measured: trust in the store and the original PL; satisfaction with the store and the original PL; attitudes toward the store, the original PL, and the extended premium PL; the perceived similarity of the original PL and the extended premium PL; and the perceived congruity between the store image and the extended premium PL. Product images with specific explanations of the features of the premium PL, regular PL, and NB were used as the stimuli for the question responses. We developed scales to measure the research constructs. Cronbach's alpha was measured for each construct, with the reliability of all constructs exceeding the .70 standard (Nunnally, 1978).
    Results: To test the hypotheses, path analysis was conducted using LISREL 8.30. The path analysis for verification of the model produced satisfactory results. The model fit indices show acceptable results ($\chi^2=427.00$ (P=0.00), GFI=.90, AGFI=.87, NFI=.91, RMSEA=.062, RMR=.047). With the increasing retailer use of premium PLBs, the intention of this research was to examine how consumers use the original PL and store image as reference points for their attitude toward premium PL extension. Results (see Tables 1 & 2) show that the attitude toward each parent brand (the store and the original PL) influences the attitude toward the extended PL, and that perceived fit moderates these influences. Attitude toward the extended PL was influenced by the relative level of perceived fit.
    Discussion of results and future direction: These results suggest that future strategy for PL extension needs to consider that a positive parent brand attitude is more strongly associated with the attitude toward PL extensions. Specifically, to improve attitude toward PL extension, building and maintaining a positive attitude toward the original PL is necessary. Positioning the premium PL congruently with store image is also important for a positive attitude. To improve this research, the following alternatives should also be considered. To improve the research model's predictive power, more diverse products should be included in the study. Other product attributes, such as design and brand name, should also be included, since we considered only trust and satisfaction as factors that build consumer attitudes.
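The reliability check mentioned in this abstract, Cronbach's alpha against the .70 standard, follows a standard formula that can be sketched in a few lines. The three items and five respondents below are invented 7-point responses, not the survey's data.

```python
import statistics

def cronbach_alpha(items):
    """Cronbach's alpha; `items` holds one list of scores per item, aligned by respondent."""
    k = len(items)
    item_variance_sum = sum(statistics.variance(item) for item in items)
    totals = [sum(responses) for responses in zip(*items)]  # total score per respondent
    return k / (k - 1) * (1 - item_variance_sum / statistics.variance(totals))

# Hypothetical 7-point responses from five respondents to three items of one construct.
items = [
    [5, 6, 4, 7, 3],
    [4, 6, 5, 7, 3],
    [5, 7, 4, 6, 2],
]

alpha = cronbach_alpha(items)
print(round(alpha, 3))
```

Alpha rises when the items vary together, because the variance of the total score then exceeds the sum of the individual item variances.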

Customer Behavior Prediction of Binary Classification Model Using Unstructured Information and Convolution Neural Network: The Case of Online Storefront (비정형 정보와 CNN 기법을 활용한 이진 분류 모델의 고객 행태 예측: 전자상거래 사례를 중심으로)

  • Kim, Seungsoo;Kim, Jongwoo
    • Journal of Intelligence and Information Systems / v.24 no.2 / pp.221-241 / 2018
  • Deep learning has been attracting attention recently. The deep learning technique applied in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) competitions and in AlphaGo is the Convolution Neural Network (CNN). CNN is characterized by dividing the input image into small sections, recognizing partial features, and combining them to recognize the whole. Deep learning technologies are expected to bring many changes to our lives, but until now their applications have been limited largely to image recognition and natural language processing. The use of deep learning techniques for business problems is still at an early research stage. If their performance is proven, they can be applied to traditional business problems such as marketing response prediction, fraudulent transaction detection, and bankruptcy prediction. It is therefore a meaningful experiment to assess whether deep learning technologies can solve business problems, using the case of online shopping companies, which have big data, make customer behavior relatively easy to identify, and offer high utilization value. In online shopping in particular, the competitive environment is changing rapidly and becoming more intense, so analysis of customer behavior for maximizing profit is becoming more and more important. In this study, we propose a 'CNN model of Heterogeneous Information Integration' that uses CNN to improve the predictive power for customer behavior in online shopping enterprises. The proposed model combines structured and unstructured information and learns with a convolution neural network built on a multi-layer perceptron structure. To optimize its performance, we evaluate three architectural components, 'heterogeneous information integration', 'unstructured information vector conversion', and 'multi-layer perceptron design', and confirm the proposed model based on the results. In addition, the target variables for predicting customer behavior are defined as six binary classification problems: re-purchaser, churn, frequent shopper, frequent refund shopper, high amount shopper, and high discount shopper. To verify the usefulness of the proposed model, we conducted experiments using actual transaction, customer, and VOC data from a specific online shopping company in Korea. The data cover 47,947 customers who registered at least one VOC in January 2011 (1 month). The customer profiles of these customers, a total of 19 months of trading data from September 2010 to March 2012, and the VOCs posted during that month are used. The experiment is divided into two stages. In the first stage, we evaluate the three architectural components that affect the performance of the proposed model and select optimal parameters. We then evaluate performance with the proposed model. Experimental results show that the proposed model, which combines structured and unstructured information, is superior to NBC (Naïve Bayes classification), SVM (Support vector machine), and ANN (Artificial neural network). It is therefore significant that the use of unstructured information contributes to predicting customer behavior, and that CNN can be applied to business problems as well as to image recognition and natural language processing problems. The experiments also confirm that CNN is more effective in understanding and interpreting the meaning of context in text VOC data. It is likewise significant that this empirical research, based on the actual data of an e-commerce company, shows that very meaningful information for predicting customer behavior can be extracted from VOC data written directly by customers in text form. Finally, the various experiments suggest that the proposed model provides useful information for future research on parameter selection and its performance.
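The idea of integrating structured features with convolution features from vectorized text can be illustrated with a small pure-Python sketch. The abstract does not specify the architecture in detail, so the one-dimensional convolution with ReLU and max-over-time pooling, the kernels, the token vectors, and the structured features are all hypothetical; a real model would learn the kernels and pass the combined vector to further layers.

```python
def conv1d_max_pool(token_vectors, kernel):
    """Slide `kernel` (width x dim) over the token sequence, apply ReLU, keep the maximum."""
    width = len(kernel)
    activations = []
    for start in range(len(token_vectors) - width + 1):
        window = token_vectors[start:start + width]
        # Dot product between the kernel and the current window of token vectors.
        score = sum(
            w * x
            for kernel_row, token_row in zip(kernel, window)
            for w, x in zip(kernel_row, token_row)
        )
        activations.append(max(0.0, score))  # ReLU
    return max(activations)  # max-over-time pooling

def integrate(structured, token_vectors, kernels):
    """Concatenate structured features with one pooled text feature per kernel."""
    text_features = [conv1d_max_pool(token_vectors, k) for k in kernels]
    return list(structured) + text_features

# Hypothetical inputs: two structured customer features and a three-token VOC text
# already converted to 2-dimensional vectors.
structured = [0.5, 3.0]
tokens = [[1, 0], [0, 1], [1, 1]]
kernels = [
    [[1, 0], [0, 1]],      # responds to the pattern (dim 0, then dim 1)
    [[-1, -1], [-1, -1]],  # never fires on non-negative inputs
]

combined = integrate(structured, tokens, kernels)
print(combined)
```

The combined vector is what a downstream multi-layer perceptron would consume for each of the six binary classification targets.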