
Enhancing Dependability of Systems by Exploiting Storage Class Memory  

Kim, Hyo-Jeen (Department of Computer Engineering, Hongik University)
Noh, Sam-H. (Department of Information and Computer Engineering, Hongik University)
Abstract
In this paper, we adopt Storage Class Memory (SCM), a next-generation non-volatile RAM technology, as part of main memory alongside DRAM, and examine the resulting SCM+DRAM main memory system from a dependability perspective. Our system provides instant system on/off without bootstrapping, dynamic selection of whether each process is persistent or non-persistent, and fast recovery from power and/or software failure. Unlike checkpointing, our approach does not incur heavy runtime overhead or recovery delay. Furthermore, because it is fully transparent to applications, it is readily applicable to real-world environments. As a proof of concept, we implemented the system on a commodity Linux 2.6.21 kernel. We verify that persistence-enabled processes resume execution immediately across a system off/on cycle without any loss of state or data. We therefore conclude that our system can improve availability and reliability.
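The idea of per-process persistence on a hybrid SCM+DRAM main memory can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the authors' kernel implementation: the names `HybridMemorySystem`, `set_persistent`, and `power_cycle` are hypothetical, and the model assumes that all pages of a persistent process reside in SCM while all pages of a non-persistent process reside in DRAM.

```python
from dataclasses import dataclass, field
from enum import Enum


class Region(Enum):
    DRAM = "dram"  # volatile: contents are lost at power-off
    SCM = "scm"    # non-volatile: contents survive power-off


@dataclass
class Process:
    pid: int
    persistent: bool                           # hypothetical per-process flag
    pages: dict = field(default_factory=dict)  # page number -> data


class HybridMemorySystem:
    """Toy model: each process's pages live entirely in one region."""

    def __init__(self):
        self.procs = {}

    def spawn(self, pid, persistent=False):
        self.procs[pid] = Process(pid, persistent)

    def region_of(self, pid):
        # Assumption: persistence decides which memory backs the process.
        return Region.SCM if self.procs[pid].persistent else Region.DRAM

    def set_persistent(self, pid, persistent):
        # Dynamic selection: flipping the flag stands in for migrating
        # the process's pages between DRAM and SCM.
        self.procs[pid].persistent = persistent

    def write(self, pid, page, data):
        self.procs[pid].pages[page] = data

    def power_cycle(self):
        # Power off, then on: only SCM-backed processes keep their state,
        # and they are available again without any restore step.
        self.procs = {pid: p for pid, p in self.procs.items() if p.persistent}


def demo():
    system = HybridMemorySystem()
    system.spawn(1, persistent=True)   # e.g. a long-running service
    system.spawn(2)                    # an ordinary, non-persistent process
    system.write(1, 0, "service state")
    system.write(2, 0, "scratch data")
    system.power_cycle()
    return sorted(system.procs)        # pids that survived the power cycle
```

In this model, `demo()` returns only the pid of the persistent process, and its page contents are unchanged after the power cycle. A checkpointing scheme would instead periodically copy state to stable storage and reload it on recovery, which is the overhead and delay the abstract contrasts against.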
Keywords
Storage Class Memory; next-generation non-volatile RAM; dependability; process persistence; failure recovery; instant system on/off