Blockchain Based Financial Portfolio Management Using A3C

Kim, Ju-Bong;Heo, Joo-Seong;Lim, Hyun-Kyo;Kwon, Do-Hyung;Han, Youn-Hee;

doi:10.3745/KTCCS.2019.8.1.17

KIPS Transactions on Computer and Communication Systems (정보처리학회논문지:컴퓨터 및 통신 시스템)

Volume 8, Issue 1, pp. 17-28, 2019
pISSN 2287-5891, eISSN 2734-049X

Korea Information Processing Society (한국정보처리학회)

A3C를 활용한 블록체인 기반 금융 자산 포트폴리오 관리

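Kim, Ju-Bong (School of Computer Science and Engineering, Korea University of Technology and Education);
Heo, Joo-Seong (School of Computer Science and Engineering, Korea University of Technology and Education);
Lim, Hyun-Kyo (Interdisciplinary Program in Creative Engineering, ICT Convergence, Korea University of Technology and Education);
Kwon, Do-Hyung (Interdisciplinary Program in Creative Engineering, ICT Convergence, Korea University of Technology and Education);
Han, Youn-Hee (School of Computer Science and Engineering, Korea University of Technology and Education)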

Received : 2018.12.04
Accepted : 2018.12.13
Published : 2019.01.31

https://doi.org/10.3745/KTCCS.2019.8.1.17


Abstract

In financial investment management, the strategy of selecting and combining several financial assets to diversify an investment is called portfolio management. In recent years, blockchain-based financial assets, namely cryptocurrencies, have been listed and traded on several well-known exchanges, and investors need an efficient portfolio management approach to earn steady returns on them. Meanwhile, deep learning has shown remarkable results in many fields, and research on applying deep reinforcement learning algorithms to portfolio management has begun. Building on a previously published deep reinforcement learning based portfolio investment strategy, this paper proposes an efficient financial portfolio management method that applies Asynchronous Advantage Actor-Critic (A3C), a representative asynchronous deep reinforcement learning algorithm. Because the conventional cross-entropy function cannot be applied directly when A3C is used for portfolio management, we modify it to fit the portfolio investment setting. Finally, we compare the proposed model, a Deterministic Policy Gradient based A3C model (A3C-DPG), with an existing reinforcement learning based cryptocurrency portfolio investment algorithm, and the evaluation shows that the proposed model performs better.
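To make the portfolio management setting in the abstract concrete, the following is a minimal sketch of how a sequence of weight vectors determines portfolio value. It is an illustration under stated assumptions, not the paper's implementation: the function names, the softmax mapping from scores to weights, and the sample prices are all hypothetical, and transaction costs are ignored.

```python
import math


def softmax(scores):
    """Map raw scores to non-negative weights that sum to 1 (assumed weighting scheme)."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def portfolio_value(initial_value, weights_per_period, relatives_per_period):
    """Compound the portfolio value over successive trading periods.

    Each period multiplies the value by the weighted sum of price relatives
    (closing price divided by opening price). Transaction costs are ignored.
    """
    value = initial_value
    for weights, relatives in zip(weights_per_period, relatives_per_period):
        value *= sum(w * y for w, y in zip(weights, relatives))
    return value


# Hypothetical example: cash plus two assets over two periods.
weights = [softmax([0.0, 0.0, 0.0]), softmax([0.0, 1.0, 2.0])]
relatives = [[1.0, 1.10, 0.90], [1.0, 1.05, 1.02]]
final_value = portfolio_value(1.0, weights, relatives)
print(f"final portfolio value: {final_value:.4f}")
```

In a reinforcement learning formulation of this kind, the agent's action at each step would be the weight vector, and the reward would typically be derived from the resulting change in portfolio value.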

Keywords

Fig. 1. A3C Model Architecture

Fig. 2. Structure of Dataset X_t for Window Size n

Fig. 3. Portfolio Management Process Example

Fig. 4. Weight Distribution Ratio of Back-Test #3-2

Table 1. Example of Abnormal Data in the Collected BTC Data

Table 2. Data Ranges for Back-Test

Table 3. The Final PVVR Comparison between Models A3C-DPG, DPG, and Random

Table 4. The TE and IR Values of A3C-DPG Model based on DPG Model
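The caption of Table 4 refers to TE and IR, which we take to mean tracking error and information ratio measured against the DPG model as a benchmark. The sketch below uses the common textbook definitions (sample standard deviation of active returns, and mean active return divided by that deviation); the paper's exact formulas are not shown here and may differ. Function names and sample returns are hypothetical.

```python
import statistics


def active_returns(model_returns, benchmark_returns):
    """Per-period difference between model and benchmark returns."""
    return [m - b for m, b in zip(model_returns, benchmark_returns)]


def tracking_error(model_returns, benchmark_returns):
    """Sample standard deviation of the active returns (assumed definition)."""
    return statistics.stdev(active_returns(model_returns, benchmark_returns))


def information_ratio(model_returns, benchmark_returns):
    """Mean active return divided by the tracking error (assumed definition)."""
    active = active_returns(model_returns, benchmark_returns)
    return statistics.mean(active) / statistics.stdev(active)


# Hypothetical per-period returns for a model and its benchmark.
model = [0.02, 0.01, 0.03, 0.00]
benchmark = [0.01, 0.01, 0.01, 0.01]

te = tracking_error(model, benchmark)
ir = information_ratio(model, benchmark)
print(f"TE = {te:.5f}, IR = {ir:.4f}")
```

Under these definitions, a positive IR indicates that the model outperformed the benchmark on average, scaled by how consistently it did so.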
