Comparative Study of Evaluating the Trustworthiness of Data Based on Data Provenance

  • Gurjar, Kuldeep (Dept. of Computer Science, Kangwon National University)
  • Moon, Yang-Sae (Dept. of Computer Science, Kangwon National University)
  • Received : 2014.12.03
  • Accepted : 2016.01.19
  • Published : 2016.06.30

Abstract

Because ever more data is being exchanged and critical decisions increasingly depend on it, the receiving end must be able to ensure that the data is trustworthy in order to obtain reliable results. Data provenance, the derivation history of data, is a useful tool for evaluating the trustworthiness of data, and various frameworks have been proposed that evaluate trustworthiness on the basis of provenance. In this paper, we briefly review the history of these frameworks and present an overview of several prominent state-of-the-art evaluation frameworks. Moreover, we provide a comparative analysis of two key frameworks by evaluating various aspects of each in an execution environment. Our analysis points to several open research issues and clarifies the functionality of the frameworks used to evaluate the trustworthiness of data.
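The abstract describes provenance-based trust evaluation only at a high level. As a rough illustration of the general idea, and not the model of either framework compared in the paper, the sketch below assumes a hypothetical scheme: each raw data item inherits the trust score of its source, and each derived item receives the trust of its weakest input scaled by a trust value for the process that produced it. The `Item` class, the `trust` function, and all scores are illustrative assumptions, and the provenance graph is assumed to be acyclic.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Item:
    """A data item plus its provenance (hypothetical model for illustration)."""
    name: str
    source_trust: Optional[float] = None  # trust of the originating source, for raw items
    process_trust: float = 1.0            # trust in the step that derived this item
    inputs: List[Item] = field(default_factory=list)  # items this one was derived from


def trust(item: Item, memo: Optional[Dict[int, float]] = None) -> float:
    """Score an item by walking its provenance graph down to the raw sources.

    Raw items (no inputs) take their source trust, or 0.0 if it is unknown.
    Derived items take process_trust times the lowest trust among their inputs.
    Results are memoized per object so shared ancestors are scored only once.
    Assumes the provenance graph is acyclic.
    """
    if memo is None:
        memo = {}
    key = id(item)
    if key in memo:
        return memo[key]
    if not item.inputs:
        score = item.source_trust if item.source_trust is not None else 0.0
    else:
        score = item.process_trust * min(trust(i, memo) for i in item.inputs)
    memo[key] = score
    return score


# Example: an average computed from two sensors of differing reliability.
sensor_a = Item("sensor_a", source_trust=0.9)
sensor_b = Item("sensor_b", source_trust=0.6)
avg = Item("avg", process_trust=0.95, inputs=[sensor_a, sensor_b])
print(round(trust(avg), 3))  # 0.57
```

Taking the minimum means a derived value is never scored as more trustworthy than its least reliable input. Published frameworks may also weigh factors such as agreement or conflict among independent sources, which this sketch deliberately omits.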

References

  1. D. P. Lanter, "Design of a lineage-based meta-data base for GIS," Cartography and Geographic Information Systems, vol. 18, no. 4, pp. 255-261, 1990.
  2. T. R. Smith, J. Su, D. Agrawal, and A. El Abbadi, "Database and modeling systems for the earth sciences," IEEE Special Issue on Databases, vol. 16, no. 1, pp. 33-37, 1993.
  3. P. D. Eagan and S. J. Ventura, "Enhancing value of environmental data: data lineage reporting," Journal of Environmental Engineering, vol. 119, no. 1, pp. 5-16, 1993. https://doi.org/10.1061/(ASCE)0733-9372(1993)119:1(5)
  4. A. Woodruff and M. Stonebraker, "Supporting fine-grained data lineage in database visualization environment," in Proceedings of the 13th Conference on Data Engineering, Birmingham, UK, 1997, pp. 91-102.
  5. P. Buneman, S. Khanna, and W. C. Tan, "Why and where: a characterization of data provenance," in Proceedings of the 8th International Conference on Database Theory (ICDT), London, UK, 2001, pp. 316-330.
  6. P. S. Wadhwa and P. Kamalapur, "Customized metadata solution for a data warehouse: a success story," White paper, Wipro Technologies, Bangalore, India, 2003.
  7. L. Moreau and P. Missier, "PROV-DM: the PROV data model," World Wide Web Consortium, Technical Report, 2013.
  8. Y. L. Simmhan, B. Plale, and D. Gannon, "A survey of data provenance in e-science," ACM SIGMOD Record, vol. 34, no. 3, pp. 31-36, 2005. https://doi.org/10.1145/1084805.1084812
  9. R. Bose and J. Frew, "Lineage retrieval for scientific data processing: a survey," ACM Computing Surveys, vol. 37, no. 1, pp. 1-28, 2005. https://doi.org/10.1145/1057977.1057978
  10. N. Elkin, "How America searches: health and wellness," Opinion Research Corporation, Survey Report, 2008.
  11. S. T. Moturu and H. Liu, "Quantifying the trustworthiness of social media content," Distributed and Parallel Databases, vol. 29, no. 3, pp. 239-260, 2011. https://doi.org/10.1007/s10619-010-7077-0
  12. C. Dai, H. S. Lim, E. Bertino, and Y. S. Moon, "Assessing the trustworthiness of location data based on provenance," in Proceedings of the 17th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, Seattle, WA, 2009, pp. 276-285.
  13. H. S. Lim, Y. S. Moon, and E. Bertino, "Provenance-based trustworthiness assessment in sensor networks," in Proceedings of the 7th International Workshop on Data Management for Sensor Networks, Singapore, 2010, pp. 2-7.
  14. M. Kuehnhausen, V. S. Frost, and G. J. Minden, "Framework for assessing the trustworthiness of cloud resources," in Proceedings of the IEEE International Conference on Cognitive Methods in Situation Awareness and Decision Support (CogSIMA), New Orleans, LA, 2012, pp. 142-145.
  15. K. J. Biba, "Integrity considerations for secure computer systems," MITRE Corp., Bedford, MA, Report No. TR-3153, 1977.
  16. D. Clark and D. Wilson, "A comparison of commercial and military computer security policies," in Proceedings of the IEEE Symposium on Security and Privacy, Oakland, CA, 1987.
  17. M. Bellare and P. Rogaway, "The exact security of digital signatures: how to sign with RSA and Rabin," in Proceedings of the International Conference on Theory and Application of Cryptographic Techniques, Saragossa, Spain, 1996, pp. 399-416.
  18. O. Goldreich, "The foundation of modern cryptography," in Modern Cryptography, Probabilistic Proofs and Pseudorandomness. Heidelberg: Springer, 1999, pp. 1-37.
  19. J. M. Juran, Juran on Leadership for Quality: An Executive Handbook. New York, NY: Free Press, 1989.
  20. The Office of Management and Budget, "Federal collection of information," [Online]. Available: http://www.whitehouse.gov/omb/inforeg_infocoll.
  21. C. Batini and M. Scannapieco, Data Quality: Concepts, Methodologies and Techniques. Heidelberg: Springer, 2006.
  22. C. Batini, C. Cappiello, C. Francalanci, and A. Maurino, "Methodologies for data quality assessment and improvement," ACM Computing Surveys, vol. 41, no. 3, pp. 16-52, 2009.
  23. P. Resnick, R. Zeckhauser, E. Friedman, and K. Kuwabara, "Reputation systems," Communications of the ACM, vol. 43, no. 12, pp. 45-48, 2000.
  24. P. Resnick and R. Zeckhauser, "Trust among strangers in internet transactions: empirical analysis of eBay's reputation system," in The Economics of the Internet and E-commerce. Amsterdam: Elsevier Science, 2002, pp. 127-157.
  25. R. Levien, "Attack Resistant Trust Metrics," Ph.D. dissertation, University of California, Berkeley, CA, 2004.
  26. S. Abiteboul, P. Kanellakis, and G. Grahne, "On the representation and querying of sets of possible worlds," in Proceedings of the ACM SIGMOD International Conference on Management of Data, San Francisco, CA, 1987, pp. 34-48.
  27. M. R. Garey and D. S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness. San Francisco, CA: W. H. Freeman, 1979.
  28. D. Barbara, H. Garcia-Molina, and D. Porter, "The management of probabilistic data," IEEE Transactions on Knowledge and Data Engineering, vol. 4, no. 5, pp. 487-502, 1992. https://doi.org/10.1109/69.166990
  29. N. Fuhr, "A probabilistic framework for vague queries and imprecise information in databases," in Proceedings of the 16th International Conference on Very Large Data Bases, Brisbane, Australia, 1990, pp. 696-707.
  30. E. D. Ragan, A. Endert, J. Sanyal, and J. Chen, "Characterizing provenance in visualization and data analysis: an organizational framework of provenance types and purposes," IEEE Transactions on Visualization & Computer Graphics, vol. 22, no. 1, pp. 31-40, 2016. https://doi.org/10.1109/TVCG.2015.2467551
  31. J. Widom, "Trio: a system for integrated management of data, accuracy, and lineage," in Proceedings of the 2nd International Conference on Innovative Data Systems Research, Asilomar, CA, 2005, pp. 262-276.
  32. N. N. Vijayakumar and B. Plale, "Towards low overhead provenance tracking in near real time stream filtering," in Proceedings of the International Provenance and Annotation Workshop, Chicago, IL, 2006, pp. 46-54.
  33. A. D. Sarma, M. Theobald, and J. Widom, "Exploiting lineage for confidence computation in uncertain and probabilistic databases," in Proceedings of the 24th International Conference on Data Engineering, Cancun, Mexico, 2008, pp. 1023-1032.
  34. X. Yin, J. Han, and P. S. Yu, "Truth discovery with multiple conflicting information providers on the web," IEEE Transactions on Knowledge and Data Engineering, vol. 20, no. 6, pp. 796-808, 2008. https://doi.org/10.1109/TKDE.2007.190745
  35. M. Gupta, Y. Sun, and J. Han, "Trust analysis with clustering," in Proceedings of the 20th International Conference Companion on World Wide Web, Hyderabad, India, 2011, pp. 53-54.
  36. E. Bertino and H. S. Lim, "Assuring data trustworthiness: concepts and research challenges," in Secure Data Management. Heidelberg: Springer, 2010, pp. 1-12.
  37. M. Blount, "Century: automated aspects of patient care," in Proceedings of the 13th IEEE International Conference on Embedded and Real-Time Computing Systems and Applications, Daegu, Korea, 2007, pp. 504-509.
  38. A. Misra, M. Blount, A. Kementsietsidis, D. M. Sow, and M. Wang, "Advances and challenges for scalable provenance in stream processing systems," in Proceedings of the 2nd International Provenance and Annotation Workshop, Salt Lake City, UT, 2008, pp. 253-265.
  39. J. E. G. Malaverri, A. Santanche, and C. B. Medeiros, "A provenance based approach to evaluate data quality in eScience," International Journal of Metadata, Semantics and Ontologies, vol. 9, no. 1, pp. 15-28, 2014. https://doi.org/10.1504/IJMSO.2014.059127
  40. Y. W. Cheah and B. Plale, "Provenance quality assessment methodology and framework," Journal of Data and Information Quality, vol. 5, no. 3, article no. 9, 2015.
  41. E. Bertino, C. Dai, H. S. Lim, and D. Lin, "High-assurance integrity techniques for databases," in Proceedings of the 25th British National Conference on Databases, Cardiff, UK, 2008, pp. 244-256.
  42. C. Dai, D. Lin, E. Bertino, and M. Kantarcioglu, "An approach to evaluate data trustworthiness based on data provenance," in Secure Data Management. Heidelberg: Springer, 2008, pp. 82-98.
  43. Princeton Survey Research Associates International, Leap of Faith: Using the Internet Despite the Dangers. Yonkers, NY: Consumer Reports WebWatch, 2005.
  44. C. Batini, C. Cappiello, C. Francalanci, and A. Maurino, "Methodologies for data quality assessment and improvement," ACM Computing Surveys, vol. 41, no. 3, pp. 16-52, 2009.
  45. X. L. Dong, L. Berti-Equille, and D. Srivastava, "Truth discovery and copying detection in a dynamic world," Proceedings of the VLDB Endowment, vol. 2, no. 1, pp. 562-573, 2009.
  46. J. Pasternack and D. Roth, "Knowing what to believe (when you already know something)," in Proceedings of the 23rd International Conference on Computational Linguistics, Beijing, China, 2010, pp. 877-885.

Cited by

  1. Design and implementation of a Bloom filter-based data deduplication algorithm for efficient data management, ISSN 1868-5145, 2018, https://doi.org/10.1007/s12652-018-0893-1