http://dx.doi.org/10.3745/JIPS.04.0244

Improvement of RocksDB Performance via Large-Scale Parameter Analysis and Optimization  

Jin, Huijun (Dept. of Computer Science, Yonsei University)
Choi, Won Gi (Korea Electronics Technology Institute (KETI))
Choi, Jonghwan (Dept. of Computer Science, Yonsei University)
Sung, Hanseung (Tmax Tibero R&D center)
Park, Sanghyun (Dept. of Computer Science, Yonsei University)
Publication Information
Journal of Information Processing Systems / v.18, no.3, 2022, pp. 374-388
Abstract
Database systems usually have many parameters that must be configured by database administrators and users. RocksDB achieves fast write performance using a log-structured merge-tree (LSM-tree). It exposes many parameters associated with write amplification and space amplification. Write amplification degrades database performance, and space amplification increases storage consumption because unwanted data is kept on disk. Previous studies have shown that tuning database parameters can yield significant performance improvements. However, tuning multiple parameters is laborious because of the large number of potential configuration combinations. To address this problem, we selected the important parameters that affect the performance of RocksDB using random forest. We then analyzed the effects of the selected parameters on write and space amplification using analysis of variance. Finally, we used a genetic algorithm to obtain optimized values of the major parameters. The experimental results indicate an insignificant reduction (-5.64%) in the execution time when using these optimized values; however, write amplification, space amplification, and data processing rate improved considerably, by 20.65%, 54.50%, and 89.68%, respectively, relative to the default settings.
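The genetic-algorithm step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the parameter names and ranges in `SEARCH_SPACE`, the "ideal" point, and the `toy_cost` function are all hypothetical stand-ins. A real tuner would replace `toy_cost` with a benchmark run that measures write and space amplification for each candidate configuration.

```python
import random

# Hypothetical search space. The names resemble RocksDB options, but the
# ranges are illustrative assumptions, not values taken from the paper.
SEARCH_SPACE = {
    "write_buffer_size_mb": (16, 512),
    "max_write_buffer_number": (2, 8),
    "level0_file_num_compaction_trigger": (2, 16),
}

# Made-up optimum used only by the toy cost function below.
IDEAL = {
    "write_buffer_size_mb": 128,
    "max_write_buffer_number": 4,
    "level0_file_num_compaction_trigger": 8,
}


def toy_cost(config):
    """Stand-in for a benchmark run (lower is better).

    Sums the squared, range-normalized distance of each parameter
    from the made-up IDEAL point.
    """
    cost = 0.0
    for name, (lo, hi) in SEARCH_SPACE.items():
        cost += ((config[name] - IDEAL[name]) / (hi - lo)) ** 2
    return cost


def random_config(rng):
    """Draw one configuration uniformly from the search space."""
    return {n: rng.randint(lo, hi) for n, (lo, hi) in SEARCH_SPACE.items()}


def crossover(a, b, rng):
    """Uniform crossover: each parameter comes from either parent."""
    return {n: (a[n] if rng.random() < 0.5 else b[n]) for n in SEARCH_SPACE}


def mutate(config, rng, rate=0.2):
    """Resample each parameter with probability `rate`."""
    child = dict(config)
    for n, (lo, hi) in SEARCH_SPACE.items():
        if rng.random() < rate:
            child[n] = rng.randint(lo, hi)
    return child


def genetic_search(generations=40, population_size=30, elite=5, seed=0):
    """Elitist genetic algorithm: keep the best `elite` configurations
    each generation and refill the population with their mutated offspring."""
    rng = random.Random(seed)
    population = [random_config(rng) for _ in range(population_size)]
    for _ in range(generations):
        population.sort(key=toy_cost)
        parents = population[:elite]
        children = []
        while len(children) < population_size - elite:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = parents + children
    return min(population, key=toy_cost)


if __name__ == "__main__":
    best = genetic_search()
    print(best, toy_cost(best))
```

Because the elite configurations are carried over unchanged, the best cost never gets worse from one generation to the next. With a real benchmark as the cost function, each evaluation is expensive, so the population size and generation count would need to be chosen with the available time budget in mind.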
Keywords
Database; Genetic Algorithm; Log-Structured Merge-Tree; Optimization; Random Forest; Space Amplification; Write Amplification;