• Title/Summary/Keyword: data driven

Stock Price Prediction by Utilizing Category Neutral Terms: Text Mining Approach (카테고리 중립 단어 활용을 통한 주가 예측 방안: 텍스트 마이닝 활용)

  • Lee, Minsik;Lee, Hong Joo
    • Journal of Intelligence and Information Systems / v.23 no.2 / pp.123-138 / 2017
  • Because the stock market is driven by traders' expectations, many studies have tried to predict stock price movements by analyzing various sources of text data. This research covers not only the relationship between text data and stock price fluctuations but also stock trading based on news articles and social media responses. As in other text mining approaches, studies that predict stock price movements have applied classification algorithms to a term-document matrix. Because documents contain many words, it is better to select the words that contribute most when building the matrix. Words with too low a frequency or importance are removed, and words are also selected by measuring how much each one contributes to classifying a document correctly. The basic idea of constructing a term-document matrix has been to collect all the documents to be analyzed and to select and use the words that influence the classification. In this study, we analyze the documents for each individual stock and select the words that are irrelevant to all categories as neutral words. We then extract the words surrounding each selected neutral word and use them to generate the term-document matrix. The approach starts from the idea that stock movement is only weakly related to the presence of the neutral words themselves, whereas the words surrounding them are more likely to affect stock price movements. The generated term-document matrix is then passed to the algorithm that classifies stock price fluctuations. We first removed stop words and selected neutral words for each stock, and then excluded from the selected words those that also appear in news articles for other stocks. 
We collected four months of news articles on the top 10 stocks by market capitalization from an online news portal. The first three months of articles served as training data, and the remaining month of articles was applied to the model to predict the next day's stock price movement. We used SVM, Boosting, and Random Forest to build the models and predict stock price movements. The stock market was open for a total of 80 days during the four months (2016/02/01 ~ 2016/05/31); the first 60 days formed the training set and the remaining 20 days the test set. The word-based algorithm proposed in this study showed better classification performance than the word selection method based on sparsity. This study predicted stock price volatility by collecting and analyzing news articles on the top 10 stocks by market capitalization. We used a classification model based on the term-document matrix to estimate stock price fluctuations, and compared the performance of the existing sparsity-based word extraction method with that of the suggested method of removing words from the term-document matrix. The suggested method differs from the existing word extraction method in that it uses not only the news articles for the corresponding stock but also news about other stocks to determine which words to extract. In other words, it removed not only the words that appeared in both rising and falling cases but also the words that appeared commonly in the news for other stocks. When prediction accuracy was compared, the suggested method showed higher accuracy. One limitation of this study is that stock price prediction was framed as classifying rises and falls, and the experiment was conducted only for the top ten stocks. The 10 stocks used in the experiment do not represent the entire stock market. In addition, it is difficult to show investment performance because stock price fluctuation and rate of return may differ. 
Therefore, further research is needed that uses more stocks and predicts returns through trading simulation.
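
The idea of building the term-document matrix from the words around neutral words can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the function names, the fixed context window, and the assumption that neutral words are already given as a set are all hypothetical, since the abstract does not state how the surrounding words are delimited.

```python
from collections import Counter


def context_terms(tokens, neutral_words, window=2):
    """Collect the tokens within `window` positions of each neutral word.

    The neutral words themselves are dropped. A token that lies near two
    neutral words is collected twice (overlapping windows are not merged).
    The window size is an assumption made for this sketch.
    """
    picked = []
    for i, tok in enumerate(tokens):
        if tok in neutral_words:
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            picked.extend(t for t in tokens[lo:hi] if t not in neutral_words)
    return picked


def term_document_matrix(docs, neutral_words, window=2):
    """Return (vocabulary, matrix) with one row of term counts per document."""
    rows = [Counter(context_terms(doc, neutral_words, window)) for doc in docs]
    vocab = sorted(set().union(*rows))  # union over the Counter keys
    matrix = [[row[term] for term in vocab] for row in rows]
    return vocab, matrix
```

Each row of the resulting matrix could then be used as the feature vector for a classifier such as the SVM, Boosting, or Random Forest models mentioned in the abstract.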

KNU Korean Sentiment Lexicon: Bi-LSTM-based Method for Building a Korean Sentiment Lexicon (Bi-LSTM 기반의 한국어 감성사전 구축 방안)

  • Park, Sang-Min;Na, Chul-Won;Choi, Min-Seong;Lee, Da-Hee;On, Byung-Won
    • Journal of Intelligence and Information Systems / v.24 no.4 / pp.219-240 / 2018
  • Sentiment analysis, which is one of the text mining techniques, is a method for extracting subjective content embedded in text documents. Recently, the sentiment analysis methods have been widely used in many fields. As good examples, data-driven surveys are based on analyzing the subjectivity of text data posted by users and market researches are conducted by analyzing users' review posts to quantify users' reputation on a target product. The basic method of sentiment analysis is to use sentiment dictionary (or lexicon), a list of sentiment vocabularies with positive, neutral, or negative semantics. In general, the meaning of many sentiment words is likely to be different across domains. For example, a sentiment word, 'sad' indicates negative meaning in many fields but a movie. In order to perform accurate sentiment analysis, we need to build the sentiment dictionary for a given domain. However, such a method of building the sentiment lexicon is time-consuming and various sentiment vocabularies are not included without the use of general-purpose sentiment lexicon. In order to address this problem, several studies have been carried out to construct the sentiment lexicon suitable for a specific domain based on 'OPEN HANGUL' and 'SentiWordNet', which are general-purpose sentiment lexicons. However, OPEN HANGUL is no longer being serviced and SentiWordNet does not work well because of language difference in the process of converting Korean word into English word. There are restrictions on the use of such general-purpose sentiment lexicons as seed data for building the sentiment lexicon for a specific domain. In this article, we construct 'KNU Korean Sentiment Lexicon (KNU-KSL)', a new general-purpose Korean sentiment dictionary that is more advanced than existing general-purpose lexicons. 
The proposed dictionary is a list of domain-independent sentiment words such as 'thank you', 'worthy', and 'impressed', and is intended to make it quick to construct a sentiment dictionary for a target domain. Specifically, it builds its sentiment vocabulary by analyzing the glosses in the Standard Korean Language Dictionary (SKLD) with the following procedure. First, we propose a sentiment classification model based on Bidirectional Long Short-Term Memory (Bi-LSTM). Second, the proposed deep learning model automatically classifies each gloss as having either positive or negative meaning. Third, positive words and phrases are extracted from the glosses classified as positive, while negative words and phrases are extracted from the glosses classified as negative. Our experimental results show that the average accuracy of the proposed sentiment classification model is up to 89.45%. In addition, the sentiment dictionary is further extended using various external sources including SentiWordNet, SenticNet, Emotional Verbs, and Sentiment Lexicon 0603. Furthermore, we add sentiment information about frequently used coined words and emoticons that are used mainly on the Web. The KNU-KSL contains a total of 14,843 sentiment vocabulary entries, each of which is a 1-gram, a 2-gram, a phrase, or a sentence pattern. Unlike existing sentiment dictionaries, it is composed of words that are not affected by particular domains. The recent trend in sentiment analysis is to use deep learning techniques without sentiment dictionaries, and the importance of developing sentiment dictionaries has gradually declined. However, one recent study shows that the words in a sentiment dictionary can be used as features of deep learning models, resulting in more accurate sentiment analysis (Teng, Z., 2016). This result indicates that a sentiment dictionary is useful not only for sentiment analysis but also as a source of features that improve the accuracy of deep learning models. 
The proposed dictionary can be used as basic data for constructing the sentiment lexicon of a particular domain and as features of deep learning models. It is also useful for automatically and quickly building large training sets for deep learning models.
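
The basic dictionary-based approach described in this abstract, where text is matched against a list of sentiment entries that may be 1-grams or 2-grams, can be sketched as below. This is a hypothetical illustration, not the KNU-KSL code or data format: the function name, the polarity values, and the 'disappointed' entry are assumptions, while 'thank you', 'worthy', and 'impressed' are the examples named in the abstract.

```python
def score_text(tokens, lexicon, max_n=2):
    """Sum the polarities of lexicon entries found in `tokens`.

    At each position the longest matching n-gram (up to `max_n` tokens)
    is preferred, and matched tokens are not reused for later matches.
    """
    total = 0
    i = 0
    while i < len(tokens):
        for n in range(max_n, 0, -1):
            window = tokens[i:i + n]
            gram = " ".join(window)
            if len(window) == n and gram in lexicon:
                total += lexicon[gram]
                i += n  # skip past the matched entry
                break
        else:
            i += 1  # no entry starts at this position
    return total


# Hypothetical lexicon; the polarity values are assumed for this sketch.
EXAMPLE_LEXICON = {
    "thank you": 1,
    "worthy": 1,
    "impressed": 1,
    "disappointed": -1,
}
```

A positive total suggests positive sentiment and a negative total negative sentiment; a real lexicon would also need morphological analysis for Korean text, which this sketch leaves out.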

Dietary total sugar intake of Koreans: Based on the Korea National Health and Nutrition Examination Survey (KNHANES), 2008-2011 (한국인의 총 당류 섭취실태 평가: 2008~2011년 국민건강영양조사 자료를 이용하여)

  • Lee, Haeng-Shin;Kwon, Sung-Ok;Yon, Miyong;Kim, Dohee;Lee, Jee-Yeon;Nam, Jiwoon;Park, Seung-Joo;Yeon, Jee-Young;Lee, Soon-Kyu;Lee, Hye-Young;Kwon, Oh-Sang;Kim, Cho-Il
    • Journal of Nutrition and Health / v.47 no.4 / pp.268-276 / 2014
  • Purpose: The aim of this study is to estimate total sugar intake and identify major food sources of total sugar intake in the diet of the Korean population. Methods: Dietary intake data of 33,745 subjects aged one year and over from the KNHANES 2008-2011 were used in the analysis. Information on dietary intake was obtained by a one-day 24-hour recall method in KNHANES. A database for total sugar content of foods reported in the KNHANES was established using Release 25 of the U.S. Department of Agriculture National Nutrient Database for Standard Reference, a total sugar database from the Ministry of Food and Drug Safety, and information from nutrition labeling of processed foods. With this database, total sugar intake of each subject was estimated from dietary intake data using SAS. Results: Mean total sugar intake of Koreans was 61.4 g/person/day, corresponding to 12.8% of total daily energy intake. More than half of this amount (35.0 g/day, 7.1% of daily energy intake) was from processed foods. The top five processed food sources of total sugar intake for Koreans were granulated sugar, carbonated beverages, coffee, breads, and fruit and vegetable drinks. Compared to other age groups, adolescents (12 to 18 yrs, 69.6 g/day) and young adults (19 to 29 yrs, 68.4 g/day) had much higher total sugar intake, along with higher beverage intake: beverage-driven sugar amounted to up to 25% of their total sugar intake. Conclusion: This study revealed that more elaborate and customized measures are needed to control the sugar intake of different subpopulation groups, even though the current total sugar intake of Koreans was within the range (10-20% of daily energy intake) recommended by the Dietary Reference Intakes for Koreans. In addition, development of a more reliable database on the total sugar and added sugar content of foods commonly consumed by Koreans is warranted.

Psychophysiologic Responses to Event Imagery in Traffic Accident Related Patients (교통사고관련 환자에서 사건상상에 대한 정신생리반응)

  • Chung, Sang-Keun;Choi, Myong-Su;Hwang, Ik-Keun
    • Sleep Medicine and Psychophysiology / v.8 no.1 / pp.45-51 / 2001
  • Objectives: Experiencing a traffic accident is a psychosocial stressor. Traffic accident-related patients may show psychophysiologic hyperarousal. We therefore examined the differences in psychophysiologic response between patients with and without memory of experiencing a traffic accident. Methods: Twenty-four traffic accident-related patients were divided into two groups according to their memory of the traffic accident. In the psychological assessment, levels of anxiety and depression were evaluated with the State-Trait Anxiety Inventory, Beck's Depression Inventory, and the Hamilton Rating Scales for Anxiety and Depression. Heart rate, electrodermal response (EDR), and electromyographic activity (EMG) were measured with a biofeedback system, and systolic and diastolic blood pressure with an automated vital sign monitor, during baseline, task, and rest periods. We used a script-driven imagery technique as the stressful task. The patients listened to a script describing their own traffic accident experience and were instructed to imagine the event during the task period. The psychological and psychophysiologic data of the two groups were compared statistically. Results: The memory group did not show significantly higher EDR than the no-memory group, but showed a tendency toward higher EDR during the baseline, imagery, and rest periods. The memory group showed significantly lower EMG than the no-memory group during the rest period. However, there were no differences in other psychophysiologic responses between the two groups. Conclusion: Our results showed that the memory group tended to have a higher autonomic arousal level, as reflected in electrodermal response, than the no-memory group. We suggest that physicians need to minimize repetitive imagery of the traffic accident (reexperience) and decrease autonomic hyperarousal when treating traffic accident-related patients.

Research Trend and Futuristic Guideline of Platform-Based Business in Korea (플랫폼 기반 비즈니스에 대한 국내 연구동향 및 미래를 위한 가이드라인)

  • Namn, Su Hyeon
    • Management & Information Systems Review / v.39 no.1 / pp.93-114 / 2020
  • Platform is considered as an alternative strategy to the traditional linear pipeline based business. Moreover, in the 4th industrial revolution period, efficiency driven pipeline business model needs to be changed to platform business. We have such success stories about platform as Apple, Google, Amazon, Uber, and so on. However, for those smaller corporations, it is not easy to find out the transformation strategy. The essence of platform business is to leverage network effect in management. Thus platform based management can be rephrased as network management across the business functions. Research on platform business is popular and related to diverse facets. But few scholars cover what the research trend of the domain is. The main purpose of this paper is to identify the research trend on platform business in Korea. To do that we first propose the analytical model for platform architecture whose components are consumers, suppliers, artifacts, and IT platform system. We conjecture that mapping of the research work on platform to the components of the model will make us understand the hidden domain of platform research. We propose three hypotheses regarding the characteristics of research and one proposition for the transitional path from pipeline to platform business model. The mapping is based on the research articles filtered from the Korea Citation Index, using keyword search. Research papers are searched through the keywords provided by authors using the word of "platform". The filtered articles are summarized in terms of the attributes such as major component of platform considered, platform type, main purpose of the research, and research method. Using the filtered data, we test the hypotheses in exploratory ways. The contribution of our research is as follows: First, based on the findings, scholars can find the areas of research on the domain: areas where research has been matured and territory where future research is actively sought. 
Second, the proposition provided can give business practitioners the guideline for changing their strategy from pipeline to platform oriented. This research needs to be considered as exploratory not inferential since subjective judgments are involved in data collection, classification, and interpretation of research articles.

Predictive Clustering-based Collaborative Filtering Technique for Performance-Stability of Recommendation System (추천 시스템의 성능 안정성을 위한 예측적 군집화 기반 협업 필터링 기법)

  • Lee, O-Joun;You, Eun-Soon
    • Journal of Intelligence and Information Systems / v.21 no.1 / pp.119-142 / 2015
  • With the explosive growth in the volume of information, Internet users are experiencing considerable difficulties in obtaining necessary information online. Against this backdrop, ever-greater importance is being placed on a recommender system that provides information catered to user preferences and tastes in an attempt to address issues associated with information overload. To this end, a number of techniques have been proposed, including content-based filtering (CBF), demographic filtering (DF) and collaborative filtering (CF). Among them, CBF and DF require external information and thus cannot be applied to a variety of domains. CF, on the other hand, is widely used since it is relatively free from the domain constraint. The CF technique is broadly classified into memory-based CF, model-based CF and hybrid CF. Model-based CF addresses the drawbacks of CF by considering the Bayesian model, clustering model or dependency network model. This filtering technique not only improves the sparsity and scalability issues but also boosts predictive performance. However, it involves expensive model-building and results in a tradeoff between performance and scalability. Such tradeoff is attributed to reduced coverage, which is a type of sparsity issues. In addition, expensive model-building may lead to performance instability since changes in the domain environment cannot be immediately incorporated into the model due to high costs involved. Cumulative changes in the domain environment that have failed to be reflected eventually undermine system performance. This study incorporates the Markov model of transition probabilities and the concept of fuzzy clustering with CBCF to propose predictive clustering-based CF (PCCF) that solves the issues of reduced coverage and of unstable performance. The method improves performance instability by tracking the changes in user preferences and bridging the gap between the static model and dynamic users. 
Furthermore, the issue of reduced coverage also improves by expanding the coverage based on transition probabilities and clustering probabilities. The proposed method consists of four processes. First, user preferences are normalized in preference clustering. Second, changes in user preferences are detected from review score entries during preference transition detection. Third, user propensities are normalized using patterns of changes (propensities) in user preferences in propensity clustering. Lastly, the preference prediction model is developed to predict user preferences for items during preference prediction. The proposed method has been validated by testing the robustness of performance instability and scalability-performance tradeoff. The initial test compared and analyzed the performance of individual recommender systems each enabled by IBCF, CBCF, ICFEC and PCCF under an environment where data sparsity had been minimized. The following test adjusted the optimal number of clusters in CBCF, ICFEC and PCCF for a comparative analysis of subsequent changes in the system performance. The test results revealed that the suggested method produced insignificant improvement in performance in comparison with the existing techniques. In addition, it failed to achieve significant improvement in the standard deviation that indicates the degree of data fluctuation. Notwithstanding, it resulted in marked improvement over the existing techniques in terms of range that indicates the level of performance fluctuation. The level of performance fluctuation before and after the model generation improved by 51.31% in the initial test. Then in the following test, there has been 36.05% improvement in the level of performance fluctuation driven by the changes in the number of clusters. This signifies that the proposed method, despite the slight performance improvement, clearly offers better performance stability compared to the existing techniques. 
Further research on this study will be directed toward enhancing the recommendation performance that failed to demonstrate significant improvement over the existing techniques. The future research will consider the introduction of a high-dimensional parameter-free clustering algorithm or deep learning-based model in order to improve performance in recommendations.
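
The Markov model of transition probabilities mentioned in this abstract can be illustrated with a minimal sketch that estimates first-order transition probabilities between preference clusters. This is not the PCCF method itself: the function name and the representation of each user's history as a sequence of cluster labels are assumptions made for illustration, and the fuzzy clustering and prediction steps are omitted.

```python
from collections import defaultdict


def transition_probabilities(sequences):
    """Estimate first-order Markov transition probabilities.

    `sequences` holds one list of cluster labels per user, ordered in time.
    Returns {current: {next: probability}} using relative frequencies of
    consecutive label pairs. Labels that never appear as a "current" state
    (no observed outgoing transition) are absent from the result.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for current, following in zip(seq, seq[1:]):
            counts[current][following] += 1

    probabilities = {}
    for current, following_counts in counts.items():
        total = sum(following_counts.values())
        probabilities[current] = {
            label: count / total for label, count in following_counts.items()
        }
    return probabilities
```

For example, with the histories `["A", "A", "B"]` and `["A", "B", "B"]`, a user currently in cluster A moves to B with estimated probability 2/3 and stays in A with probability 1/3.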

The effect of recapitalization on capital structure decision and corporate value in Korean Firms (한국기업의 자본재조정이 자본구조 의사결정과 기업가치에 미치는 영향분석)

  • Kim, Jooyul;Kim, Dongwook;Kim, Byounggon
    • Journal of the Korea Academia-Industrial cooperation Society / v.18 no.4 / pp.163-174 / 2017
  • This study analyzed how Korean firms' recapitalization affects their capital structure decision and firm value. Recapitalization was categorized into three groups according to the influence of the debt to equity ratio: debt ratio-increasing-recapitalization(capital reduction with refund, cash dividend), debt ratio-unchanging-recapitalization (capital reduction without refund, retirement of repurchased stocks), and debt ratio-decreasing-recapitalization(exercise the rights for convertible bonds, bond with stock warrants, exchangeable bonds and stock options). This article highlights how the relationship between the firms' recapitalization and the capital structure decision driven by the change in debt to equity ratio through the recapitalization should affect the firm value. The whole recapitalization sample used for this analysis comprised 22,814 enterprises listed on the Korea Exchange that were analyzed over the 16-year period from 2000 to 2015. To summarize the results of this Panel Data Analysis, firstly, when a firm executes debt ratio-increasing-recapitalization and debt ratio-decreasing-recapitalization at the period of t-1, the debt to equity ratio, which is increased or decreased, should affect the firm's debt capacity in the same period, then, at the period of t, the firm establishes a leverage policy to readjust the debt to equity ratio the other way around. These adjustments of debt-paying-ability from the leverage policy, including the capital structure decision, finally affect the firm value. Secondly, when a firm implements the debt ratio-unchanging-recapitalization in the period of t-1, the debt to equity ratio, which is neutral, should not affect the firm's capital structure decision. But, the firm value is positively affected by the influence of that recapitalization. 
In conclusion, we find that a firm that carries out recapitalization balances its capital structure toward the optimal level of leverage, and that the capital structure decision positively affects corporate value.

Study of major issues and trends facing ports, using big data news: From 1991 to 2020 (뉴스 빅데이터를 활용한 항만이슈 변화연구 : 1991~2020)

  • Yoon, Hee-Young
    • Journal of Korea Port Economic Association / v.37 no.1 / pp.159-178 / 2021
  • This study analyzed issues and trends related to ports using 86,611 news articles from the 30 years between 1991 and 2020, drawn from BIGKinds, a big data news analysis service. The analysis was based on the keyword analysis, word cloud, and relationship diagram analysis offered by BIGKinds. The results for the last 30 years are summarized as follows. First, during Phase 1 (1991-2000), individual ports such as Busan, Incheon, and Gwangyang tried to strengthen their own competitiveness. During Phase 2 (2001-2010), efforts were made to gain more professional and specialized port management abilities by establishing the Busan Port Authority in 2004, the Incheon Port Authority in 2005, and the Ulsan Port Authority in 2007. During Phase 3 (2011-2020), the promotion of future-oriented, eco-friendly, and smart ports was the major issue. Efforts to reduce particulate matter and pollutants produced by ports were accelerated, and attempts to build smart ports driven by port automation and digitalization also intensified. Lastly, because the maritime sector was severely hit in 2020 by the unexpected shock of the COVID-19 pandemic, a detailed analysis of trends and issues in 2019 and 2020 was made to examine the impact of the pandemic on the maritime industry. It was found that the shipping and port industries experienced more drastic changes than ever while trying to prepare for a post-pandemic era and promote future-oriented ports. This study made policy suggestions by analyzing port-related news articles and trends. Based on its findings, further studies on enhancing the competitiveness of ports and devising a sustainable development strategy are expected to follow through comparative analysis of port issues in different countries, thereby advancing academic research on ports.

Stand-alone Real-time Healthcare Monitoring Driven by Integration of Both Triboelectric and Electro-magnetic Effects (실시간 헬스케어 모니터링의 독립 구동을 위한 접촉대전 발전과 전자기 발전 원리의 융합)

  • Cho, Sumin;Joung, Yoonsu;Kim, Hyeonsu;Park, Minseok;Lee, Donghan;Kam, Dongik;Jang, Sunmin;Ra, Yoonsang;Cha, Kyoung Je;Kim, Hyung Woo;Seo, Kyoung Duck;Choi, Dongwhi
    • Korean Chemical Engineering Research / v.60 no.1 / pp.86-92 / 2022
  • Recently, the bio-healthcare market has been expanding worldwide for various reasons, including the COVID-19 pandemic. Within this market, biometric measurement and analysis technology is expected to bring about future technological innovation and socio-economic ripple effects. Existing systems require a large-capacity battery to drive signal processing, the wireless transmission part, and an operating system. However, limited battery capacity imposes a spatio-temporal limitation on the use of the device. This limitation can interrupt the data required for monitoring the user's health, so it is one of the major obstacles for health care devices. In this study, we report the concept of a standalone healthcare monitoring module, based on both triboelectric and electromagnetic effects, that converts biomechanical energy into suitable electric energy. The proposed system can be operated independently without an external power source. In particular, the wireless foot pressure monitoring system, which uses a rationally designed triboelectric sensor (TES), can recognize the user's walking habits through foot pressure measurement. By applying the triboelectric effect to the contact-separation behavior that occurs during walking, an effective foot pressure sensor was made; the performance of the sensor was verified through its electrical output signal as a function of pressure, and its dynamic behavior was measured through a signal processing circuit using a capacitor. In addition, the biomechanical energy dissipated during walking is harvested as electrical energy by electromagnetic induction and used as a power source for wireless transmission and signal processing. Therefore, the proposed system has great potential to reduce the inconvenience of charging caused by limited battery capacity and to overcome the problem of data disconnection.

ICT Company Profiling Analysis and the Mechanism for Performance Creation Depending on the Type of Government Start-up Support Program (정부창업지원 프로그램 참여에 따른 ICT 기업 프로파일링과 성과창출 메커니즘)

  • Ha, Sangjip;Park, Do-Hyung
    • Journal of Intelligence and Information Systems / v.28 no.3 / pp.237-258 / 2022
  • As the global market environment changes, the domestic ICT industry has a growing influence on the world economy. This industry is regarded as an important driving force in the national economy from a technological and social point of view. In particular, small and medium-sized enterprises (SMEs) in the ICT industry are regarded as essential actors of domestic economic development in terms of company diversity, technology development and job creation. However, since it is small compared to large-sized enterprises, it is difficult for SMEs to survive with a differentiated strategy in an incomplete and rapidly changing environment. Therefore, SMEs must make a lot of efforts to improve their own capabilities, and the government needs to provide the desirable help suitable for corporate internal resources so that they can continue to be competitive. This study classifies the types of ICT SMEs participating in government support programs, and analyzes the relationship between resources and performance creation of each type. The data from the "ICT Small and Medium Enterprises Survey" conducted annually by the Ministry of Science and ICT was used. In the first stage, ICT SMEs were clustered based on common factors according to their experiences with government support programs. Three clusters were meaningfully classified, and each cluster was named "active participation type," "initial support type," and "soloist type." As a second step, this study compared the characteristics of each cluster through profiling analysis for each cluster. The third step carried out in this study was to find out the mechanism of R&D performance creation for each cluster through regression analysis. Different factors affected performance creation for each cluster, and the magnitude of the influence was also different. 
Specifically, for "active participation type", "current manpower", "technology competitiveness", and "R&D investment in the previous year" were found to be important factors in creating R&D performance. "Initial support type" was identified as "whether or not a dedicated R&D organization exists", "R&D investment amount in the previous year", "Ratio of sales to large companies", and "Ratio of vendors supplied to large companies" contributed to the performance. Lastly, in the case of "soloist type", "current workforce" and "future recruitment plan", "technological competitiveness", "R&D investment", "large company sales ratio", and "overseas sales ratio" showed a significant relationship with the performance. This study has practical implications of showing what strategy should be established when supporting SMEs in the future according to the government's participation in the startup program and providing a guide on what kind of support should be provided.