Improvement of RocksDB Performance via Large-Scale Parameter Analysis and Optimization

  • Received : 2020.12.30
  • Accepted : 2021.04.26
  • Published : 2022.06.30

Abstract

Database systems typically expose many parameters that database administrators and users must configure. RocksDB achieves fast write performance by using a log-structured merge-tree (LSM-tree), and many of its parameters are associated with write amplification and space amplification. Write amplification degrades database performance, and space amplification increases storage consumption because unwanted data remains stored. Previous studies have shown that tuning database parameters can yield significant performance improvements. However, tuning multiple parameters is laborious because of the large number of possible configuration combinations. To address this problem, we used a random forest to select the parameters that most affect the performance of RocksDB. We then used analysis of variance to analyze the effects of the selected parameters on write and space amplification, and a genetic algorithm to obtain optimized values for the major parameters. The experimental results indicate an insignificant reduction (-5.64%) in the execution time when using these optimized values; however, write amplification, space amplification, and data processing rates improved considerably by 20.65%, 54.50%, and 89.68%, respectively, compared with the default settings.
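
The final step of the approach described above, searching for good parameter values with a genetic algorithm, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the four option names are RocksDB options, but the candidate values, the baseline configuration, the population settings, and especially `toy_cost` are hypothetical. The abstract does not state the fitness function that was actually used; in practice the cost would presumably come from benchmark runs that measure write amplification, space amplification, and throughput.

```python
import random

# Candidate values per option. The option names exist in RocksDB; the value
# grids are illustrative assumptions, not taken from the paper.
SEARCH_SPACE = {
    "write_buffer_size": [16 << 20, 32 << 20, 64 << 20, 128 << 20],
    "max_write_buffer_number": [2, 3, 4, 6],
    "level0_file_num_compaction_trigger": [2, 4, 8, 16],
    "max_bytes_for_level_multiplier": [4, 8, 10, 16],
}

# Hypothetical starting configuration that seeds the first generation.
BASELINE = {
    "write_buffer_size": 64 << 20,
    "max_write_buffer_number": 2,
    "level0_file_num_compaction_trigger": 4,
    "max_bytes_for_level_multiplier": 10,
}

# Arbitrary target indices, used only so the toy cost has a known minimum.
_TARGET_INDEX = {
    "write_buffer_size": 3,
    "max_write_buffer_number": 2,
    "level0_file_num_compaction_trigger": 1,
    "max_bytes_for_level_multiplier": 0,
}


def toy_cost(config):
    """Stand-in for a real benchmark measurement; lower is better."""
    return sum(
        abs(SEARCH_SPACE[name].index(value) - _TARGET_INDEX[name])
        for name, value in config.items()
    )


def crossover(a, b, rng):
    """Uniform crossover: each option is inherited from either parent."""
    return {k: (a[k] if rng.random() < 0.5 else b[k]) for k in SEARCH_SPACE}


def mutate(config, rng, rate=0.2):
    """Resample each option from its grid with probability `rate`."""
    return {
        k: (rng.choice(SEARCH_SPACE[k]) if rng.random() < rate else v)
        for k, v in config.items()
    }


def optimize(cost, generations=30, population_size=12, seed=0):
    rng = random.Random(seed)
    # First generation: the baseline plus random configurations.
    population = [dict(BASELINE)] + [
        {k: rng.choice(v) for k, v in SEARCH_SPACE.items()}
        for _ in range(population_size - 1)
    ]
    for _ in range(generations):
        population.sort(key=cost)
        # Elitism: the fitter half survives unchanged, so the best cost
        # found so far can never get worse between generations.
        elite = population[: population_size // 2]
        children = [
            mutate(crossover(*rng.sample(elite, 2), rng), rng)
            for _ in range(population_size - len(elite))
        ]
        population = elite + children
    return min(population, key=cost)


best = optimize(toy_cost)
print(best, toy_cost(best))
```

Because the baseline is part of the first generation and the fitter half always survives, the returned configuration is never worse than the baseline under the chosen cost. Replacing `toy_cost` with a function that runs a benchmark and combines the measured metrics would turn this sketch into a usable, if slow, tuner.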

Acknowledgement

This work was supported by an Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. IITP-2017-0-00477, SW starlab - Research and development of the high performance in-memory distributed DBMS based on flash memory storage in IoT environment) and by the Korea Ministry of Land, Infrastructure and Transport (MOLIT) under the "Innovative Talent Education Program for Smart City".
