Scalable Race Visualization for Debugging Message-Passing Programs

Scalable Visualization of Races for Debugging Message-Passing Programs

  • Published : 2005.08.01

Abstract

Detecting unaffected races is important for debugging message-passing programs effectively, because races can influence whether other races occur. The previous technique for detecting unaffected races halts the execution of each process at the receive event of the race that occurs first in that process. However, this technique does not guarantee that all of the detected races are unaffected, because halting the execution of processes disconnects some chains of affects-relations among those races. In this paper, we improve the second-pass algorithm of the previous technique so that it produces information about the affects-relations of the races that occur first in each process. We then visualize the affects-relations among the races detected in each process. This visualization is effective for visually detecting unaffected races because it simplifies the affects-relations among the races that occur first in each process.

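In a message-passing program, any message race can influence whether other races occur, so detecting unaffected races is important for effective debugging. The existing technique for detecting such races efficiently halts execution at the receive event of the race that occurs first in each process and detects the racing messages. However, halting the execution of processes disconnects the affects-relations that exist among the races, so the technique cannot guarantee that every detected race is unaffected. This paper adds, to the existing algorithm for the second execution, an algorithm that generates affects-relation information for the race that occurs first in each process, and proposes a technique for effectively visualizing the relations among the detected races. By showing the affects-relations formed among the first races to occur in each process, this visualization is effective for visually detecting unaffected races.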
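The idea of picking out unaffected races from affects-relation information can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes the affects-relations have already been collected as plain (a, b) pairs meaning "race a affects race b", and the race labels and the function name `unaffected_races` are hypothetical.

```python
from collections import defaultdict


def unaffected_races(races, affects):
    """Return the races that no other race affects, in input order.

    races:   hypothetical race labels, e.g. the first race of each process
    affects: (a, b) pairs, assumed to mean "race a affects race b"
    """
    affected_by = defaultdict(set)
    for a, b in affects:
        if a != b:  # a race is not counted as affecting itself
            affected_by[b].add(a)
    # A race is unaffected when no other race points at it.
    return [r for r in races if not affected_by[r]]


# Hypothetical example: r0 affects r1, and r1 affects r2.
races = ["r0", "r1", "r2", "r3"]
affects = [("r0", "r1"), ("r1", "r2")]
print(unaffected_races(races, affects))  # ['r0', 'r3']
```

In a visualization, these would be the nodes with no incoming affects-edge, which is why drawing the relations among the first races makes unaffected races easy to spot.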
