Race State Transition for Detecting Unaffected Race Conditions in Message-Passing Programs

메시지전달 프로그램의 영향받지 않은 경합조건 탐지를 위한 경합상태 전이기법

  • Published: 2006.08.01

Abstract

Detecting unaffected race conditions is important for debugging message-passing programs effectively, because a message race can determine whether other races occur. The previous technique for detecting unaffected races efficiently halts each process at the receive event of its first race and detects the messages racing toward that event. However, this technique does not guarantee that every detected race is unaffected, because halting the processes breaks chains of affects-relations among the races. In this paper, we present a novel technique that manages the state of each detected race by examining whether every received message is affected, until the execution terminates. Because it maintains the affects-relations among the races throughout the execution of the program, our technique guarantees that the races it detects are unaffected, and it detects them efficiently.

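Because any message race that occurs in a message-passing program can influence whether other races occur, detecting unaffected races is important for effective debugging. The existing technique for detecting such races efficiently halts execution at the receive event of the race that occurs first in each process and detects the racing messages. However, halting the processes severs the affects-relations that exist among the races, so the technique cannot guarantee that every detected race is unaffected. This paper proposes a new technique that transitions the state of each detected race, until the program terminates, according to whether the messages it receives are affected. Because the technique detects races and maintains the affects-relations among them until the program terminates, it efficiently detects only the unaffected races.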
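
The state management described in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the names `RaceState`, `Message`, `Process`, and the `racing` argument are hypothetical, and the sketch assumes an oracle that already knows whether a received message races toward the first race of the receiving process (a real detector would typically decide this with vector clocks). Each message piggybacks one flag saying whether some race happened before its send, and each process keeps running and updates its race state on every receive instead of halting.

```python
from dataclasses import dataclass
from enum import Enum


class RaceState(Enum):
    INITIAL = "initial"        # no race detected in this process yet
    UNAFFECTED = "unaffected"  # race detected, nothing seen so far affects it
    AFFECTED = "affected"      # race detected and affected by another race


@dataclass(frozen=True)
class Message:
    sender: int
    affected: bool  # piggybacked flag: a race happened before this send


@dataclass
class Process:
    pid: int
    state: RaceState = RaceState.INITIAL
    tainted: bool = False  # an affected message has been received here

    def send(self) -> Message:
        # A send is affected if this process already has a race of its own
        # or has received an affected message earlier.
        affected = self.tainted or self.state is not RaceState.INITIAL
        return Message(self.pid, affected)

    def receive(self, msg: Message, racing: bool) -> None:
        # `racing` is an assumed oracle: True if msg races toward the
        # first race of this process.
        if racing:
            if self.state is RaceState.INITIAL:
                # First racing receive: the race starts out affected if
                # anything affected reached this process up to now.
                hit = self.tainted or msg.affected
                self.state = RaceState.AFFECTED if hit else RaceState.UNAFFECTED
            elif self.state is RaceState.UNAFFECTED and msg.affected:
                # A later racing message that is itself affected makes
                # the race affected.
                self.state = RaceState.AFFECTED
        if msg.affected:
            self.tainted = True


def unaffected_races(processes):
    """Return the ids of processes whose race is still unaffected."""
    return [p.pid for p in processes if p.state is RaceState.UNAFFECTED]


def demo():
    p0, p1, p2 = (Process(i) for i in range(3))
    p0.receive(p2.send(), racing=True)  # race in process 0
    p1.receive(p2.send(), racing=True)  # race in process 1
    p1.receive(p0.send(), racing=True)  # sent after the race in process 0
    procs = [p0, p1, p2]
    return [p.state for p in procs], unaffected_races(procs)
```

In `demo`, the race in process 0 stays unaffected, while the race in process 1 moves from unaffected to affected once a racing message sent after the race in process 0 arrives. This is the kind of late information that a process halted at its first race would miss.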
