http://dx.doi.org/10.9708/jksci.2021.26.09.001

A Tool for On-the-fly Repairing of Atomicity Violation in GPU Program Execution  

Lee, Keonpyo (Dept. of AI Convergence Engineering, Gyeongsang National University)
Lee, Seongjin (Dept. of AI Convergence Engineering, Gyeongsang National University)
Jun, Yong-Kee (Dept. of Aerospace Software Engineering, Gyeongsang National University)
Abstract
In this paper, we propose a tool called ARCAV (Automatic Recovery of CUDA Atomicity Violation) that automatically repairs atomicity violations in GPU (Graphics Processing Unit) programs. ARCAV monitors information about every barrier and memory so that actual memory writes occur at the end of the barrier region, or so that the program executes the barrier region again. Existing methods only detect atomicity violations in GPU programs and do not repair them, because GPU programs generally do not support the lock and sleep instructions that are necessary for repairing atomicity violations. ARCAV is designed for the GPU execution model. It detects and repairs four patterns of atomicity violations that represent real-world cases. Moreover, ARCAV is independent of memory hierarchy and thread configuration. Our experiments show that the performance of ARCAV is stable regardless of the number of threads or blocks. The overhead of ARCAV is evaluated using four real-world kernels, and its slowdown is, on average, 2.1x of the native execution time.
Keywords
Concurrent program; GPU program; Concurrency error; Atomicity violation; On-the-fly repairing