A Data-driven Approach for Computational Simulation: Trend, Requirement and Technology

  • Received : 2017.08.29
  • Accepted : 2017.11.27
  • Published : 2018.02.28

Abstract

With the emergence of the Open Science and Big Data paradigms, the need for data sharing and collaboration is also growing in the computational science field. In this paper, we analyzed data-driven research cases in computational science by field: materials design, bioinformatics, and high energy physics. We also studied the characteristics of computational science data and the associated data management issues. Managing computational science data effectively requires data quality management, increased data reliability, flexibility to support a variety of data types, and tools for analysis and linkage to the computing infrastructure. In addition, we analyzed trends in platform technology for efficient sharing and management of computational science data. The main contributions of this paper are a review of computational science data repositories and related platform technologies, an analysis of the characteristics of computational science data and the problems of managing it, and a set of design considerations for building a future computational science data platform.

Cited by

  1. EDISON‐DATA: A flexible and extensible platform for processing and analysis of computational science data vol.49, pp.10, 2018, https://doi.org/10.1002/spe.2732