• Title/Summary/Keyword: task classification

The prediction of the stock price movement after IPO using machine learning and text analysis based on TF-IDF (증권신고서의 TF-IDF 텍스트 분석과 기계학습을 이용한 공모주의 상장 이후 주가 등락 예측)

  • Yang, Suyeon;Lee, Chaerok;Won, Jonggwan;Hong, Taeho
    • Journal of Intelligence and Information Systems / v.28 no.2 / pp.237-262 / 2022
  • There has been a growing interest in IPOs (Initial Public Offerings) due to the profitable returns that IPO stocks can offer to investors. However, IPOs can be speculative investments that may involve substantial risk as well because shares tend to be volatile, and the supply of IPO shares is often highly limited. Therefore, it is crucially important that IPO investors are well informed of the issuing firms and the market before deciding whether to invest or not. Unlike institutional investors, individual investors are at a disadvantage since there are few opportunities for individuals to obtain information on the IPOs. In this regard, the purpose of this study is to provide individual investors with the information they may consider when making an IPO investment decision. This study presents a model that uses machine learning and text analysis to predict whether an IPO stock price would move up or down after the first 5 trading days. Our sample includes 691 Korean IPOs from June 2009 to December 2020. The input variables for the prediction are three tone variables created from IPO prospectuses and quantitative variables that are either firm-specific, issue-specific, or market-specific. The three prospectus tone variables indicate the percentage of positive, neutral, and negative sentences in a prospectus, respectively. We considered only the sentences in the Risk Factors section of a prospectus for the tone analysis in this study. All sentences were classified into 'positive', 'neutral', and 'negative' via text analysis using TF-IDF (Term Frequency - Inverse Document Frequency). Measuring the tone of each sentence was conducted by machine learning instead of a lexicon-based approach due to the lack of sentiment dictionaries suitable for Korean text analysis in the context of finance. 
For this reason, the training set was created by randomly selecting 10% of the sentences from each prospectus, and each sentence in the training set was labeled by hand after reading it. Then, based on the training set, a Support Vector Machine model was used to predict the tone of the sentences in the test set. Finally, the machine learning model calculated the percentages of positive, neutral, and negative sentences in each prospectus. To predict the price movement of an IPO stock, four machine learning techniques were applied: Logistic Regression, Random Forest, Support Vector Machine, and Artificial Neural Network. According to the results, models that combine quantitative variables using technical analysis with prospectus tone variables show higher accuracy than models that use only quantitative variables. More specifically, prediction accuracy improved by 1.45 percentage points in the Random Forest model, 4.34 percentage points in the Artificial Neural Network model, and 5.07 percentage points in the Support Vector Machine model. Among the techniques tested, the Artificial Neural Network model using both quantitative variables and prospectus tone variables achieved the highest prediction accuracy, 61.59%. The results indicate that the tone of a prospectus is a significant factor in predicting the price movement of an IPO stock. In addition, the McNemar test was used to check whether the difference between the models was statistically significant. The model using only quantitative variables was compared with the model using both the quantitative variables and the prospectus tone variables, and it was confirmed that predictive performance improved significantly at the 1% significance level.
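
The tone-variable steps described in this abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: sentences are assumed to be already tokenized, the TF-IDF variant shown (relative term frequency times log(N/df)) is one common choice and may differ from the one used in the study, and the tone labels are taken as given rather than produced by a trained Support Vector Machine. The function names `tfidf` and `tone_shares` are hypothetical.

```python
import math
from collections import Counter

TONES = ("positive", "neutral", "negative")

def tfidf(docs):
    """Return one {term: weight} dict per document.

    docs is a list of token lists (one list per sentence).
    Weight = (count / sentence length) * log(N / document frequency);
    this particular weighting is an assumption, not taken from the paper.
    """
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term once per sentence
    vectors = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (cnt / total) * math.log(n / df[term])
            for term, cnt in counts.items()
        })
    return vectors

def tone_shares(labels):
    """Percentage of positive, neutral, and negative sentences."""
    if not labels:
        return {tone: 0.0 for tone in TONES}
    counts = Counter(labels)
    return {tone: 100.0 * counts[tone] / len(labels) for tone in TONES}

# Toy, made-up sentences and labels purely for illustration.
sentences = [["risk", "loss"], ["risk", "growth"]]
vectors = tfidf(sentences)
shares = tone_shares(["negative", "negative", "neutral", "positive"])
print(vectors)
print(shares)
```

In a full pipeline, the TF-IDF vectors would be the classifier's input, and `tone_shares` would be applied to the predicted labels of each prospectus to obtain its three tone variables.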

KB-BERT: Training and Application of Korean Pre-trained Language Model in Financial Domain (KB-BERT: 금융 특화 한국어 사전학습 언어모델과 그 응용)

  • Kim, Donggyu;Lee, Dongwook;Park, Jangwon;Oh, Sungwoo;Kwon, Sungjun;Lee, Inyong;Choi, Dongwon
    • Journal of Intelligence and Information Systems / v.28 no.2 / pp.191-206 / 2022
  • Using a pre-trained language model (PLM) has recently become the de facto approach to achieving state-of-the-art performance on various natural language tasks (called downstream tasks) such as sentiment analysis and question answering. However, like any other machine learning method, a PLM tends to depend on the data distribution seen during training and performs worse on unseen (out-of-distribution) domains. For this reason, there have been many efforts to develop domain-specific PLMs for fields such as the medical and legal industries. In this paper, we discuss the training of a finance domain-specific PLM for the Korean language and its applications. Our finance domain-specific PLM, KB-BERT, is trained on a carefully curated financial corpus that includes domain-specific documents such as financial reports. We provide extensive performance evaluation results on three natural language tasks: topic classification, sentiment analysis, and question answering. Compared to state-of-the-art Korean PLMs such as KoELECTRA and KLUE-RoBERTa, KB-BERT shows comparable performance on general datasets based on common corpora like Wikipedia and news articles. Moreover, KB-BERT outperforms the compared models on finance domain datasets that require finance-specific knowledge to solve the given problems.

A Comparative Study on the Characteristics of Cultural Heritage in China and Vietnam (중국과 베트남의 문화유산 특성 비교 연구)

  • Shin, Hyun-Sil;Jun, Da-Seul
    • Journal of the Korean Institute of Traditional Landscape Architecture / v.40 no.2 / pp.34-43 / 2022
  • This study compared the characteristics of cultural heritage in China and Vietnam, which have developed under mutual geopolitical and cultural influence throughout history, and drew the following conclusions. First, cultural heritage is defined similarly in both countries. In classifying cultural heritage, both countries introduced the legal concept of intangible cultural heritage through UNESCO and are similar with respect to intangible cultural heritage. Second, while China has separate laws for managing tangible and intangible cultural heritage, Vietnam manages both types under a single law. Vietnam introduced the concept of cultural heritage later than China, but its system is highly integrated. Third, both countries grade their cultural heritage, and the grading is applied differently depending on the type of heritage. The designation methods are similar in that both countries use a vertical structure in which designation proceeds through successive steps. By restoring the value of heritage and reinforcing its integrity through this step-by-step review, both countries seek balanced national development through tourism that lets people enjoy heritage and generates economic benefits. Fourth, both countries have a central government agency for cultural heritage management, but local governments have greater authority in China than in Vietnam. In addition, unlike Vietnam, where tangible and intangible cultural heritage are managed by an integrated institution, China has a separate institution in charge of intangible cultural heritage. Fifth, China is establishing a conservation management policy focused on sustainability that harmonizes the protection and utilization of heritage. Vietnam is making efforts to integrate the contents and spirit of the agreement into laws, programs, and projects related to cultural heritage, especially intangible heritage, and into the economy and society as a whole. However, it still depends on the influence of international organizations. Sixth, China and Vietnam are now paying attention to the recently introduced concept of intangible heritage, moving away from a cultural heritage protection policy centered on tangible heritage. In addition, they aim to unite the people through cultural heritage and achieve the nation's unified policy goals. The two countries need to use intangible heritage as an efficient means of preserving local communities or regions. A cultural heritage preservation network should be established for each subject that can integrate the components of intangible heritage into one unit, laying the foundation for the people's enjoyment of heritage. This study is limited in that it is an early-stage comparison of the cultural heritage systems and preservation management status of China and Vietnam; a comparison of cultural heritage policies by heritage type remains a future research task.

Weights for Evaluation items of Conformity index of Bird breeding sites on the West and South coasts of Korea (서·남해 연안성 조류번식지 적합성지수 평가항목 가중치 설정)

  • Kim, Chang-Hyeon;Kim, Won-Bin;Kim, Kyou-Sub;Lee, Chang-Hun
    • Journal of the Korean Institute of Traditional Landscape Architecture / v.41 no.4 / pp.40-48 / 2023
  • This study is part of a foundational research effort aimed at developing a suitability index for bird breeding grounds along the domestic South and West coasts, including islands. Focus Group Interviews (FGI) and Analytic Hierarchy Process (AHP) analyses were conducted. The results are as follows. First, in assessing the suitability of coastal bird breeding sites, 'Natural Value' (0.763) was weighted higher than 'Artificial Value' (0.237). Among the artificial values, all items ranked low except 'Protected Areas', which ensure the continued integrity of breeding spaces. Second, the 25 evaluation items classified in two rounds of FGI were regrouped under higher-level concepts, yielding a final set of 14 items: nine natural values and five artificial values. Third, the mid-level categories ranked in importance as follows: 'Ecological Value' (0.392), 'Topographic Value' (0.251), 'Passive Interference' (0.124), 'Geological Value' (0.120), and 'Active Interference' (0.113). Fourth, the evaluation items ranked in priority as follows: 'Vegetation Distribution' (0.187), 'Area of Mudflats' (0.118), 'Presence or Absence of Mudflats' (0.092), 'Appearance of Natural Enemies' (0.087), 'Protected Areas' (0.08), 'Island Area' (0.069), 'Over-Breeding devastation' (0.064), 'Soil Composition Ratio' (0.056), 'Distance from Land' (0.054), 'Ocean farm area' (0.045), 'Cultivated land area' (0.041), 'Cultivation behavior' (0.038), 'Angle of the Surface' (0.036), and 'Land Use' (0.033). The weights derived in this study can be used for priority evaluation of coastal bird breeding areas. However, the relationship with the habitat suitability specific to individual bird species still needs to be addressed, and spatial analysis incorporating species-specific characteristics is left as a future task.
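
The reported weights can be checked for internal consistency and applied as a simple weighted sum. The sketch below uses only the numbers quoted in the abstract; the scoring function `suitability_score`, the idea of rating each item on a 0 to 1 scale, and the grouping of the mid-level categories into natural and artificial values are assumptions made for illustration and are not described in the abstract.

```python
# Item weights exactly as quoted in the abstract.
ITEM_WEIGHTS = {
    "Vegetation Distribution": 0.187,
    "Area of Mudflats": 0.118,
    "Presence or Absence of Mudflats": 0.092,
    "Appearance of Natural Enemies": 0.087,
    "Protected Areas": 0.08,
    "Island Area": 0.069,
    "Over-Breeding devastation": 0.064,
    "Soil Composition Ratio": 0.056,
    "Distance from Land": 0.054,
    "Ocean farm area": 0.045,
    "Cultivated land area": 0.041,
    "Cultivation behavior": 0.038,
    "Angle of the Surface": 0.036,
    "Land Use": 0.033,
}

# Mid-level weights as quoted. Which categories count as "natural" is an
# assumption here, chosen because the three values add up to 0.763.
MID_WEIGHTS = {
    "Ecological Value": 0.392,
    "Topographic Value": 0.251,
    "Passive Interference": 0.124,
    "Geological Value": 0.120,
    "Active Interference": 0.113,
}
ASSUMED_NATURAL = ("Ecological Value", "Topographic Value", "Geological Value")

def suitability_score(item_scores, weights=ITEM_WEIGHTS):
    """Weighted sum of per-item scores, each assumed to lie in [0, 1]."""
    missing = set(weights) - set(item_scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(weights[name] * item_scores[name] for name in weights)

# Consistency checks on the quoted numbers.
print(round(sum(ITEM_WEIGHTS.values()), 3))
print(round(sum(MID_WEIGHTS[name] for name in ASSUMED_NATURAL), 3))

# Hypothetical site that scores 0.5 on every item.
print(suitability_score({name: 0.5 for name in ITEM_WEIGHTS}))
```

Because the item weights sum to one, a site's score stays on the same 0 to 1 scale as the individual item ratings.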

Transfer Learning using Multiple ConvNet Layers Activation Features with Principal Component Analysis for Image Classification (전이학습 기반 다중 컨볼류션 신경망 레이어의 활성화 특징과 주성분 분석을 이용한 이미지 분류 방법)

  • Byambajav, Batkhuu;Alikhanov, Jumabek;Fang, Yang;Ko, Seunghyun;Jo, Geun Sik
    • Journal of Intelligence and Information Systems / v.24 no.1 / pp.205-225 / 2018
  • A Convolutional Neural Network (ConvNet) is a class of powerful deep neural network that can analyze and learn hierarchies of visual features. An early neural network of this kind, the Neocognitron, was introduced in the 1980s. At that time, neural networks were not widely used in industry or academia because large-scale datasets were scarce and computational power was low. A few decades later, in 2012, Krizhevsky made a breakthrough in the ILSVRC-12 visual recognition competition using a Convolutional Neural Network, which revived interest in neural networks. The success of ConvNets rests on two main factors: the emergence of advanced hardware (GPUs) capable of sufficient parallel computation, and the availability of large-scale datasets such as ImageNet (ILSVRC) for training. Unfortunately, many new domains are bottlenecked by these factors. For most domains, gathering a large-scale dataset to train a ConvNet is difficult and requires considerable effort. Moreover, even with a large-scale dataset, training a ConvNet from scratch is expensive and time-consuming. Both obstacles can be addressed with transfer learning, a method for transferring knowledge from a source domain to a new domain. There are two major transfer learning approaches: using the ConvNet as a fixed feature extractor, and fine-tuning the ConvNet on a new dataset. In the first, a pre-trained ConvNet (for example, one trained on ImageNet) computes feed-forward activations for an image, and activation features are extracted from specific layers. In the second, the ConvNet classifier is replaced and retrained on the new dataset, and the weights of the pre-trained network are then fine-tuned with backpropagation. In this paper, we focus only on using multiple ConvNet layers as a fixed feature extractor. However, applying high-dimensional features extracted directly from multiple ConvNet layers remains a challenging problem. We observe that features extracted from different ConvNet layers capture different characteristics of the image, which means that a better representation could be obtained by finding the optimal combination of multiple ConvNet layers. Based on this observation, we propose to employ multiple ConvNet layer representations for transfer learning instead of a single ConvNet layer representation. Our primary pipeline has three steps. First, images from the target task are fed forward through a pre-trained AlexNet, and the activation features from its three fully connected layers are extracted. Second, the activation features of the three layers are concatenated to obtain a multiple-layer representation that carries more information about an image. When the three fully connected layer features are concatenated, the resulting image representation has 9192 (4096+4096+1000) dimensions. However, features extracted from multiple ConvNet layers are redundant and noisy since they are extracted from the same ConvNet. Thus, in the third step, we use Principal Component Analysis (PCA) to select salient features before the training phase. With salient features, the classifier can classify images more accurately, and the performance of transfer learning improves. To evaluate the proposed method, experiments were conducted on three standard datasets (Caltech-256, VOC07, and SUN397) to compare multiple ConvNet layer representations against single ConvNet layer representations, using PCA for feature selection and dimension reduction. Our experiments demonstrated the importance of feature selection for multiple ConvNet layer representations. Moreover, our proposed approach achieved 75.6% accuracy compared to 73.9% achieved by the FC7 layer on the Caltech-256 dataset, 73.1% compared to 69.2% achieved by the FC8 layer on the VOC07 dataset, and 52.2% compared to 48.7% achieved by the FC7 layer on the SUN397 dataset. We also showed that our proposed approach achieved superior performance, with accuracy improvements of 2.8%, 2.1%, and 3.1% on the Caltech-256, VOC07, and SUN397 datasets, respectively, compared to existing work.
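
The concatenate-then-reduce steps of this pipeline can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' code: the activation vectors are made-up numbers rather than AlexNet outputs, the layer names `fc6`, `fc7`, and `fc8` follow the usual AlexNet naming, and PCA is done here with power iteration and deflation in plain Python, which is only practical at toy sizes (a numerical library would be used for 9192-dimensional features). The helper names are hypothetical.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def concat_layers(fc6, fc7, fc8):
    """Join the per-layer activation vectors of one image into one vector."""
    return list(fc6) + list(fc7) + list(fc8)

def fit_pca(rows, k, iters=100, seed=0):
    """Return (means, components) with up to k principal directions."""
    rng = random.Random(seed)
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    resid = [[r[j] - means[j] for j in range(d)] for r in rows]
    components = []
    for _ in range(k):
        if sum(dot(r, r) for r in resid) < 1e-12:
            break  # no variance left to explain
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for _ in range(iters):
            # Apply X^T X to v without building the d x d matrix.
            scores = [dot(r, v) for r in resid]
            w = [sum(s * r[j] for s, r in zip(scores, resid)) for j in range(d)]
            norm = math.sqrt(dot(w, w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        components.append(v)
        # Deflate: remove the variance along the direction just found.
        resid = [[x - dot(r, v) * vj for x, vj in zip(r, v)] for r in resid]
    return means, components

def transform(rows, means, components):
    """Project rows onto the fitted components."""
    centered = [[x - m for x, m in zip(r, means)] for r in rows]
    return [[dot(r, c) for c in components] for r in centered]

# Toy "images": three tiny layers whose values are strongly correlated.
images = [concat_layers([x], [2.0 * x], [0.0]) for x in (-2.0, -1.0, 0.0, 1.0, 2.0)]
means, comps = fit_pca(images, k=1)
reduced = transform(images, means, comps)
print(len(images[0]), "->", len(reduced[0]))
```

The reduced vectors would then be passed to the classifier. In practice the means and components should be fitted on training images only and reused for test images.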