• Title/Summary/Keyword: Sparse data

Stock Price Prediction by Utilizing Category Neutral Terms: Text Mining Approach (카테고리 중립 단어 활용을 통한 주가 예측 방안: 텍스트 마이닝 활용)

  • Lee, Minsik;Lee, Hong Joo
    • Journal of Intelligence and Information Systems / v.23 no.2 / pp.123-138 / 2017
  • Since the stock market is driven by traders' expectations, many studies have tried to predict stock price movements by analyzing various sources of text data. This research has examined not only the relationship between text data and stock price fluctuations but also stock trading based on news articles and social media responses. Like other text mining approaches, studies that predict stock price movements typically construct a term-document matrix and apply classification algorithms to it. Because documents contain many words, it is better to build the term-document matrix from the words that contribute most to classification. Words with very low frequency or importance are removed, and words can also be selected by measuring how much each one contributes to classifying documents correctly. The conventional way to construct a term-document matrix is to collect all the documents to be analyzed and keep the words that influence the classification. In this study, we instead analyze the documents for each individual stock and select words that are unrelated to every category (rise or fall) as neutral words. We then extract the words surrounding each selected neutral word and use them to build the term-document matrix. The idea is that the presence of a neutral word itself is only weakly related to stock movement, whereas the words around it are more likely to affect stock price movements. The resulting term-document matrix is then fed to an algorithm that classifies stock price fluctuations. Specifically, we first removed stop words and selected neutral words for each stock, and then excluded from the selected words those that also appear in news articles about other stocks.
Using an online news portal, we collected four months of news articles on the top 10 stocks by market capitalization. The stock market was open for a total of 80 days during this period (2016/02/01 to 2016/05/31); the news from the first 60 days (about three months) was used as the training set and the news from the remaining 20 days (about one month) as the test set, with each day's news used to predict the next day's stock price movement. We built the prediction models with SVM, Boosting, and Random Forest. We compared the existing sparsity-based word extraction method with the proposed method of removing words from the term-document matrix. The proposed method differs from the existing one in that it uses not only the news articles about the corresponding stock but also news about other stocks to decide which words to extract: it removes not only the words that appear in news for both rising and falling days but also the words that commonly appear in news about other stocks. In terms of prediction accuracy, the proposed neutral-word-based method outperformed the sparsity-based word selection method. This study has limitations: stock price prediction was framed as a rise/fall classification, and the experiment covered only the top 10 stocks, which do not represent the entire stock market. In addition, because the direction of a price change and the rate of return can differ, the results do not directly demonstrate investment performance.
Future work should therefore use more stocks and predict returns through trading simulation.
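The neutral-word pipeline described in this abstract can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not give the exact neutral-word criterion, so `neutral_words` assumes a word is neutral when it appears in both rise and fall articles at nearly equal document rates (hypothetical parameters `tol` and `min_rate`), `exclude_common` assumes a hypothetical `common_rate` threshold for "common in other stocks' news", and `window` is a hypothetical context size. `sparsity_filter` is a simplified stand-in for the sparsity-based baseline.

```python
from collections import Counter
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical input format: each news article is (tokens, label), where the
# label is "up" or "down" for the next day's price movement of one stock.
Doc = Tuple[List[str], str]


def doc_rates(docs: Sequence[Doc], label: str) -> Dict[str, float]:
    """Fraction of documents with `label` that contain each word."""
    subset = [set(tokens) for tokens, lab in docs if lab == label]
    counts = Counter(w for words in subset for w in words)
    n = max(len(subset), 1)
    return {w: c / n for w, c in counts.items()}


def neutral_words(docs: Sequence[Doc], tol: float = 0.1,
                  min_rate: float = 0.5) -> Set[str]:
    """Assumed criterion: frequent in both classes at nearly equal rates."""
    up, down = doc_rates(docs, "up"), doc_rates(docs, "down")
    return {w for w in up.keys() & down.keys()
            if min(up[w], down[w]) >= min_rate and abs(up[w] - down[w]) <= tol}


def exclude_common(neutral: Set[str], other_docs: Sequence[Doc],
                   common_rate: float = 0.5) -> Set[str]:
    """Drop neutral words that are common in news about other stocks."""
    n = max(len(other_docs), 1)
    df = Counter(w for tokens, _ in other_docs for w in set(tokens))
    return {w for w in neutral if df[w] / n < common_rate}


def context_terms(tokens: List[str], anchors: Set[str],
                  window: int = 2) -> List[str]:
    """Words within `window` positions of an anchor; anchors themselves are skipped."""
    out: List[str] = []
    for i, tok in enumerate(tokens):
        if tok in anchors:
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            out.extend(t for t in tokens[lo:hi] if t not in anchors)
    return out


def document_term_matrix(docs: Sequence[Doc], anchors: Set[str],
                         window: int = 2) -> Tuple[List[str], List[List[int]]]:
    """Rows are documents, columns are context terms (raw counts)."""
    contexts = [Counter(context_terms(tokens, anchors, window)) for tokens, _ in docs]
    vocab = sorted(set().union(*contexts))
    return vocab, [[c[t] for t in vocab] for c in contexts]


def sparsity_filter(docs: Sequence[Doc], max_sparsity: float = 0.9) -> Set[str]:
    """Baseline: keep words absent from at most `max_sparsity` of documents."""
    n = max(len(docs), 1)
    df = Counter(w for tokens, _ in docs for w in set(tokens))
    return {w for w, c in df.items() if 1 - c / n <= max_sparsity}


# Toy example (made-up sentences, not data from the study).
train = [("samsung shares rose after strong chip demand".split(), "up"),
         ("samsung shares fell after weak chip demand".split(), "down")]
other = [("hyundai shares rose after demand".split(), "up")]

anchors = exclude_common(neutral_words(train), other)
vocab, X = document_term_matrix(train, anchors, window=1)
print(sorted(anchors))  # ['chip', 'samsung']
print(vocab)            # ['demand', 'shares', 'strong', 'weak']
print(X)                # [[1, 1, 1, 0], [1, 1, 0, 1]]
```

In this toy run, "shares", "after", and "demand" are neutral for the stock but are dropped because they are common in the other stock's news, so only the words around "samsung" and "chip" become features. The matrix `X` and the labels could then be passed to classifiers such as scikit-learn's `SVC`, `AdaBoostClassifier`, or `RandomForestClassifier`, training on the earlier trading days and testing on the later ones.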

Recent Research for the Seismic Activities and Crustal Velocity Structure (국내 지진활동 및 지각구조 연구동향)

  • Kim, Sung-Kyun;Jun, Myung-Soon;Jeon, Jeong-Soo
    • Economic and Environmental Geology / v.39 no.4 s.179 / pp.369-384 / 2006
  • The Korean Peninsula, located in the southeastern part of the Eurasian Plate, lies in an intraplate region. Compared with interplate earthquakes, intraplate earthquakes are characterized by low, infrequent seismicity and a sparse, irregular distribution of epicenters. To evaluate seismic activity in an intraplate region accurately, long-term seismic data, including historical earthquake data, must be archived. Fortunately, historical earthquake records spanning about 2,000 years are available for the Korean Peninsula. Analysis of these historical and instrumental earthquake data shows that seismic activity was very high in the 16th-18th centuries and that the Yellow Sea area is more active than the East Sea area. Because north-eastern China also showed high seismic activity in the 16th-18th centuries, the seismic activity of the two regions is inferred to be closely related. The epicenters also show a general SE-NW trend. On the Korean Peninsula, the first seismic station was installed at Incheon in 1905, and 5 additional stations were installed by 1943. No seismic stations operated from 1945 to 1962, but a World-Wide Standardized Seismograph was installed in Seoul in 1963. In 1990, the Korea Meteorological Administration (KMA) established a centralized, real-time modern seismic network consisting of 12 stations. Since then, many institutes have expanded their own seismic networks on the Korean Peninsula. KMA currently operates 35 velocity-type seismic stations and 75 accelerometer stations, and the Korea Institute of Geoscience and Mineral Resources operates 32 and 16 stations of these types, respectively. The Korea Institute of Nuclear Safety and the Korea Electric Power Research Institute operate 4 and 13 stations, respectively, comprising both velocity-type seismometers and accelerometers. The focal mechanisms of 27 intraplate earthquakes that have occurred in and around the Korean Peninsula since 1936 were analyzed to understand the regional stress orientation and tectonics.
These are the largest earthquakes in the region this century and may represent the characteristics of earthquakes in this region. Their focal mechanisms show predominantly strike-slip faulting with a small thrust component. The average P-axis is nearly horizontal and trends ENE-WSW. North-eastern China is very similar to the Korean Peninsula: strike-slip faulting is dominant and the average P-axis is nearly horizontal, trending ENE-WSW. In the eastern part of the East Sea, by contrast, thrust faulting is dominant and the average P-axis is horizontal, trending ESE-WNW. This indicates that earthquake mechanisms in the far east of the Eurasian Plate are controlled not only by the Pacific Plate subducting from the east but also by the indenting Indian Plate. A crustal velocity model is essential for determining the hypocenters of local earthquakes. However, the crustal model in and around the Korean Peninsula remains unclear because sufficient seismic data have not yet been accumulated. To address this problem, reflection and refraction seismic surveys and seismic wave analysis have been applied together along two long cross-sections traversing the southern Korean Peninsula since 2002. These surveys should be continued.
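Averaging P-axis trends needs some care because they are axial: an axis trending 70 degrees is the same axis as one trending 250 degrees, so a plain arithmetic mean of azimuths can point in the wrong direction. The abstract does not state how the average P-axis was computed; the sketch below is an assumption that uses one standard directional-statistics approach (double each angle, take the vector mean, halve the result). It handles trend only and ignores plunge, and `mean_axial_trend` is a hypothetical helper name.

```python
import math
from typing import Sequence


def mean_axial_trend(trends_deg: Sequence[float]) -> float:
    """Mean orientation of axial data such as P-axis trends, in [0, 180) degrees.

    Each angle is doubled so that opposite azimuths (t and t + 180) map to the
    same direction, the doubled angles are averaged as unit vectors, and the
    mean direction is halved back into the axial range.
    """
    c = sum(math.cos(math.radians(2 * t)) for t in trends_deg)
    s = sum(math.sin(math.radians(2 * t)) for t in trends_deg)
    return (math.degrees(math.atan2(s, c)) / 2) % 180


# Made-up trends, not data from the study: 70 and 250 describe the same axis.
print(mean_axial_trend([70, 250]))  # about 70 (roughly ENE-WSW)
print(sum([70, 250]) / 2)           # 160.0: a naive mean gives the wrong axis
```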

Short-term Results of Hematopoietic Stem Cell Transplantation for Children with Myelodysplastic Syndrome (소아 골수이형성 증후군에서 조혈모세포이식의 단기간 결과 분석)

  • Lee, Jin;Kim, Soh Yeon;Cho, Bin;Jang, Pil Sang;Chung, Nak Gyun;Kim, Hack Ki
    • Clinical and Experimental Pediatrics / v.45 no.3 / pp.370-375 / 2002
  • Purpose: In most cases, myelodysplastic syndrome (MDS) transforms into a more aggressive state or into acute myelogenous leukemia, and its prognosis is very poor. Hematopoietic stem cell transplantation (HSCT) is believed to be the only curative treatment for MDS, but data in children are very sparse. In this report, we analyzed the short-term outcome of HSCT in childhood MDS. Methods: Ten children with MDS (CMMoL 5, RAEB 3, RAEB-t 2) underwent HSCT (HLA-matched sibling transplantation 4, HLA-matched unrelated transplantation 4, cord blood transplantation 1, HLA-mismatched familial transplantation 1) between November 1995 and January 2001 at St. Mary's Hospital. The median follow-up duration was 11 months. Results: Engraftment was successful in all cases, and 8 patients are alive without disease. Three patients developed veno-occlusive disease (VOD), which resolved without complications. Four cases of grade II and 1 case of grade III acute graft-versus-host disease (GVHD) were observed and were well controlled with treatment. Three patients relapsed after transplantation; 1 of them is alive without disease after cytoreduction with allogeneic stem cell rescue, and 2 died of relapse. Conclusion: HSCT is a curative strategy for MDS, and the survival rate in children is relatively higher than that in adults. However, more studies are clearly needed because of the small number of patients and the short follow-up.