Developing Stock Pattern Searching System using Sequence Alignment Algorithm

서열 정렬 알고리즘을 이용한 주가 패턴 탐색 시스템 개발

  • 김형준 (Department of Computer Engineering, Pusan National University);
  • 조환규 (Department of Computer Engineering, Pusan National University)
  • Received : 2010.06.01
  • Accepted : 2010.10.07
  • Published : 2010.12.15

Abstract

There are many methods for analyzing patterns in time series data. Although stock data is a time series, there are few studies on stock pattern analysis and prediction. One reason is the belief that stock prices change randomly: if they do, no scientific method can predict them. In this paper, we measure the degree of randomness of stock prices using Kolmogorov complexity, and we show that there is a strong correlation between that degree and the accuracy of stock price prediction with our semi-global alignment method. We transform the stock price data into quantized string sequences and then measure the randomness of stock prices as the Kolmogorov complexity of those strings. For our experiments and to evaluate our methodology, we use 690 KOSPI stock data items covering 28 years. When the Kolmogorov complexity is high, the stock price cannot be predicted. When it is low, the stock price can be predicted, but the prediction ratio for rises and falls in price, the measure of most interest to investors, is at most 12% for short-term predictions and 54% for long-term predictions.
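The quantization and randomness measurement described in the abstract can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the alphabet, the quantization threshold, or the complexity estimator, so the three-symbol alphabet (`U`, `D`, `S`), the 1% threshold, and the function names `quantize` and `compression_ratio` are hypothetical. Kolmogorov complexity itself is not computable, so the sketch uses the length of a zlib-compressed string as a common stand-in, which may differ from the estimator used in the paper.

```python
import zlib


def quantize(prices, threshold=0.01):
    """Turn a price series into a string of symbols.

    Assumed alphabet (not from the paper): 'U' for a rise above
    `threshold`, 'D' for a fall below -`threshold`, 'S' otherwise.
    """
    symbols = []
    for prev, curr in zip(prices, prices[1:]):
        change = (curr - prev) / prev  # relative change between consecutive prices
        if change > threshold:
            symbols.append("U")
        elif change < -threshold:
            symbols.append("D")
        else:
            symbols.append("S")
    return "".join(symbols)


def compression_ratio(sequence):
    """Compressed size divided by raw size, as a rough randomness proxy.

    A lower ratio means the string is more regular (more compressible).
    """
    if not sequence:
        raise ValueError("sequence must not be empty")
    raw = sequence.encode("ascii")
    return len(zlib.compress(raw, 9)) / len(raw)
```

Under this proxy, a strictly periodic symbol string compresses far better than a string of independently drawn symbols, which mirrors the distinction the abstract draws between low-complexity and high-complexity price series.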

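Techniques for analyzing patterns in time series data have advanced considerably. For the stock market, however, pattern analysis and prediction have received little study even though the data is a time series, and prediction accuracy is very low. The reason is that if stock price movements are inherently random, no scientific method can predict them. In this study, we measure the degree of randomness in stock price movements using Kolmogorov complexity and show that this degree of randomness is strongly correlated with the accuracy of the stock price prediction obtained with the semi-global alignment proposed in this paper. To this end, we convert the rises and falls of the stock price index into quantized strings and measure the randomness of the price movements from the Kolmogorov complexity of those strings. We collected 690 KOSPI stock data items covering 28 years and used them as experimental data to evaluate the proposed method. The results confirm that when the Kolmogorov complexity is high, price movements are difficult to predict. When the Kolmogorov complexity is low, stock movements can be predicted, but among the three kinds of prediction rate, the rise/fall prediction rate that most interests investors cannot exceed 12% for short-term prediction and converges to 54% for long-term prediction.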
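The semi-global alignment named in the abstract can be illustrated with a small dynamic-programming sketch. It assumes the textbook form of semi-global alignment, in which the whole query pattern must be aligned while unmatched symbols before and after the matched region of the longer sequence cost nothing. The scoring values, the function names `semi_global_align` and `predict_next`, and the way a prediction is read off the best match are hypothetical and are not taken from the paper.

```python
def semi_global_align(query, text, match=1, mismatch=-1, gap=-1):
    """Align all of `query` against the best-matching region of `text`.

    Returns (score, end), where `end` is the index in `text` just past
    the matched region. Scoring values are illustrative assumptions.
    """
    n, m = len(query), len(text)
    # Row 0 is all zeros: skipping a prefix of `text` is free.
    prev = [0] * (m + 1)
    for i in range(1, n + 1):
        # Column 0: query symbols aligned to nothing are penalised as gaps.
        curr = [prev[0] + gap] + [0] * m
        for j in range(1, m + 1):
            step = match if query[i - 1] == text[j - 1] else mismatch
            curr[j] = max(prev[j - 1] + step,  # align the two symbols
                          prev[j] + gap,       # gap in text
                          curr[j - 1] + gap)   # gap in query
        prev = curr
    # Skipping a suffix of `text` is free: take the best cell in the last row.
    end = max(range(m + 1), key=lambda j: prev[j])
    return prev[end], end


def predict_next(history, pattern_len):
    """Hypothetical predictor: find where the most recent pattern best
    matches earlier history and return the symbol that followed it."""
    if not 0 < pattern_len < len(history):
        raise ValueError("pattern_len must be between 1 and len(history) - 1")
    query = history[-pattern_len:]
    past = history[:-pattern_len]
    _, end = semi_global_align(query, past)
    return history[end]
```

For example, with the symbol history `"SDSUUDSSUUD"` and a pattern length of 3, the recent pattern `"UUD"` matches the earlier occurrence exactly, and the sketch returns the symbol that followed that occurrence.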
