Semantic Computing for Big Data: Approaches, Tools, and Emerging Directions (2011-2014)

  • Received : 2014.04.09
  • Accepted : 2014.06.09
  • Published : 2014.06.27

Abstract

The term "big data" has recently gained widespread attention in the field of information technology (IT). One of the key challenges in making use of big data lies in finding ways to uncover relevant and valuable information. The high volume, velocity, and variety of big data hinder the use of solutions designed for smaller datasets, which rely on the manual interpretation of data. Semantic computing technologies have been proposed as a means of dealing with these issues and, with the advent of linked data in recent years, have become central to mainstream semantic computing. This paper attempts to uncover the state-of-the-art semantics-based approaches and tools that can be leveraged to enrich and enhance today's big data. It reviews the latest literature, covering 61 studies from 2011 to 2014. In addition, it highlights the key challenges that semantic approaches need to address in the near future. For instance, this paper presents cutting-edge approaches to ontology engineering, ontology evolution, searching and filtering relevant information, extracting and reasoning, distributed (web-scale) reasoning, and representing big data. It also makes recommendations that may encourage researchers to explore more deeply the applications of semantic technology that could improve the processing of big data. The findings of this study contribute to the existing body of basic knowledge on semantics and computational issues related to big data, and may trigger further research in the field. Our analysis shows that more effort is needed in proposing new approaches, and that tools must be created to support researchers and practitioners in realizing the true power of semantic computing and solving the crucial issues of big data.

Keywords

1. Introduction

The term “big data” was coined to represent the large amount and many types of digital data that we use today, including documents, images, videos, audio, and websites. Semantics-based approaches are considered useful means of dealing with very large-scale data such as big data. In order to explore this topic, it is first necessary to more clearly describe the concept of big data.

1.1 Big Data

The term “big data” has not yet been defined by IEEE, and it is not included in the online IEEE dictionary [8]. However, a number of definitions are presented in other popular sources, such as the following:

Generally, big data has three main characteristics: volume, velocity, and variety. However, [1] adds one more characteristic: value. We agree with the use of the value characteristic, and include it in our body of relevant and valuable information.

It has become obvious that new capabilities and technologies are needed to capture and analyze big data. The McKinsey Global Institute estimates that data volume is growing by 40% per year, and will grow to 44 times its initial size between 2009 and 2020. According to [1], volume is not the only characteristic that matters. In fact, there are four key characteristics that define big data (Fig. 1):

Fig. 1. Four key characteristics of big data

Oracle classifies value as an essential characteristic for processing big data, which we partially agree with. In our opinion, value should not just be considered to involve economic value, but also include the meaningful information hidden in big data.

1.2 Significance of Big Data

The significance of big data is clear. The following statistics show the predicted data growth trends, in terms of volume.

Fig. 2 (a). Universal data growth

Data growth: IDC estimates that the digital universe will grow 44-fold from 2009 to 2020. IBM estimates that data and content are growing at a compound annual growth rate of 64% or more (1 zettabyte = 1 trillion gigabytes). Source: IDC Digital Universe Study.

Fig. 2 (b). Data growth in China

Fig. 2 (c). Data growth in India

Fig. 2 (d). Data growth in US

Fig. 2 (e). Data growth in Western Europe
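
As a quick consistency check on the growth figures quoted in Sections 1.1 and 1.2 (40% per year, and 44 times from 2009 to 2020), the compound-growth arithmetic can be sketched as follows. This is only an illustration: the function names are hypothetical, and treating 2009 to 2020 as 11 annual compounding periods is an assumption rather than something stated in the cited sources.

```python
def growth_factor(annual_rate: float, years: int) -> float:
    """Total growth multiple after compounding `annual_rate` for `years` years."""
    return (1.0 + annual_rate) ** years


def implied_annual_rate(total_factor: float, years: int) -> float:
    """Constant annual rate that produces `total_factor` over `years` years."""
    return total_factor ** (1.0 / years) - 1.0


# Assumption: 2009 to 2020 is treated as 11 annual compounding periods.
YEARS = 2020 - 2009

if __name__ == "__main__":
    factor = growth_factor(0.40, YEARS)
    rate = implied_annual_rate(44.0, YEARS)
    print(f"40% per year for {YEARS} years gives a factor of {factor:.1f}")
    print(f"A 44-fold increase over {YEARS} years implies {rate:.1%} per year")
```

Under these assumptions, 40% annual growth compounds to roughly 40 times the initial volume, and a 44-fold increase corresponds to an annual rate of roughly 41%, so the two estimates are broadly compatible.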

 

2. Significance of Semantic Computing

The term “semantic web” has been around for more than a decade. Its origins trace back to a 2001 Scientific American article by Tim Berners-Lee, who is known as the inventor of the World Wide Web, and co-authors James Hendler and Ora Lassila [25]. The article presents a futuristic view of the web, in which data is linked in a meaningful fashion. To realize this concept, two main streams of work have emerged: ontologies, and the reasoning over and filtering of data. The three main semantic web standards that currently exist are the Resource Description Framework (RDF), SPARQL (SPARQL Protocol and RDF Query Language), and OWL (Web Ontology Language). The goal of these standards is to present end-users with the information that they want at a particular time. The World Wide Web Consortium (W3C), the international standards organization leading the semantic web effort, has carried out many case studies that show how organizations are currently using semantic technologies in a variety of areas [24]. The W3C envisions the semantic web as an extension of the current web rather than a replacement for it. As discussed above, one of the main streams of effort in the semantic web concerns ontologies, and includes ontology engineering, evolution, matching, mapping, and merging. The other main stream, which involves filtering and reasoning, includes the areas of content filtering, collaborative filtering, hybrid filtering, and reasoning.
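
To make the idea of data "linked in a meaningful fashion" more concrete, the following minimal sketch represents RDF-style statements as subject-predicate-object triples and answers a SPARQL-like triple pattern over them. It uses plain Python rather than an actual RDF library, and the `ex:` identifiers, the `TRIPLES` data, and the `match` function are all hypothetical names introduced only for illustration.

```python
from typing import Iterable, Iterator, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical toy graph; the "ex:" names are illustrative, not a real vocabulary.
TRIPLES = [
    ("ex:TimBernersLee", "rdf:type", "ex:Person"),
    ("ex:TimBernersLee", "ex:invented", "ex:WorldWideWeb"),
    ("ex:WorldWideWeb", "rdf:type", "ex:InformationSystem"),
    ("ex:W3C", "ex:leads", "ex:SemanticWebEffort"),
]


def match(
    triples: Iterable[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield triples that fit the pattern; None plays the role of a query variable."""
    for triple in triples:
        if all(want is None or want == got for want, got in zip((s, p, o), triple)):
            yield triple


if __name__ == "__main__":
    # Rough analogue of: SELECT ?s ?o WHERE { ?s rdf:type ?o }
    for subject, _, obj in match(TRIPLES, p="rdf:type"):
        print(subject, "is a", obj)
```

A real deployment would typically rely on an RDF store and a SPARQL engine; the sketch only shows why a uniform triple model makes such pattern queries straightforward.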

 

3. Existing Challenges in Semantic Approaches to Big Data

The challenges involved in dealing with big data include access [9], capture, storage [3], search, sharing, transfer, analysis [4], and visualization, with respect to volume, velocity, variety, and value. However, the focus of our research efforts is on the following three areas:

Fig. 3. A depiction of linked data (Source: [11])

 

4. Existing Approaches to the Semantic Computing of Big Data

A number of approaches have been proposed for semantically dealing with the issues of big data. These proposed approaches work at different levels, and include areas such as ontology engineering, ontology evolution, reasoning, matching and representing big data. Some details of these approaches are as follows:

The following tables (Table 1, Table 2, Table 3, and Table 4) present summaries of semantic-based approaches that attempt to deal with big data. Table 1 presents our findings on semantics filtering and reasoning approaches proposed to deal with big data.

Table 1. Semantics filtering and reasoning approaches for big data

Table 2. Ontology evolution approaches for big data

Table 3. Ontology-based approaches for representing big data

Table 4. Ontology engineering approaches for big data

Table 2 presents our findings on ontology evolution approaches that have been proposed for dealing with big data. It shows that only a few approaches have been proposed for ontology evolution in big data.

Table 3 presents our findings on ontology-based approaches for representing big data. There are only two proposed approaches in this area.

Table 4 presents our findings on ontology engineering approaches to dealing with big data. It shows that there are numerous proposed approaches in this area.
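
To illustrate what "reasoning" means in this context, the sketch below forward-chains two RDFS-style rules (subclass transitivity and type inheritance) over a small set of triples until no new facts appear. It is a single-machine toy under stated assumptions, not the method of any surveyed approach: the `ex:` data and the `infer` function are hypothetical, and web-scale reasoners distribute this kind of rule application across many machines.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical toy ontology and instance data, for illustration only.
FACTS = [
    ("ex:Tweet", "rdfs:subClassOf", "ex:Document"),
    ("ex:Document", "rdfs:subClassOf", "ex:DigitalData"),
    ("ex:t1", "rdf:type", "ex:Tweet"),
]


def infer(triples: Iterable[Triple]) -> Set[Triple]:
    """Apply two RDFS-style rules repeatedly until a fixpoint is reached."""
    closed = set(triples)
    while True:
        subclass_pairs = [(s, o) for s, p, o in closed if p == "rdfs:subClassOf"]
        derived = set()
        for sub, sup in subclass_pairs:
            # Rule 1: subClassOf is transitive.
            for other_sub, other_sup in subclass_pairs:
                if sup == other_sub:
                    derived.add((sub, "rdfs:subClassOf", other_sup))
            # Rule 2: an instance of a class is an instance of its superclass.
            for s, p, o in closed:
                if p == "rdf:type" and o == sub:
                    derived.add((s, "rdf:type", sup))
        if derived <= closed:
            return closed  # nothing new was derived
        closed |= derived


if __name__ == "__main__":
    for triple in sorted(infer(FACTS) - set(FACTS)):
        print("inferred:", triple)
```

The repeated full scans keep the sketch short but scale poorly, which is one reason distributed (web-scale) reasoning is treated as a research topic in its own right.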

 

5. Existing Semantics-Based Tools for Big Data

Table 5 presents a summary of semantics-based tools for dealing with big data. Our research only discovered a limited number of such tools. A number of non-commercial tools are still ongoing projects, and many focus on specific domains. For example, SINA [29] focuses on the medical domain.

Table 5. Semantic/ontology-based tools for big data

 

6. Emerging Directions and Future Challenges for Semantics-Based Computing on Big Data

 

7. Conclusion

In this paper, we have attempted to answer questions on areas such as growth trends in data, how semantic computing can help to process huge amounts of data and uncover valuable information within it, what semantic-based approaches and tools are available for processing big data, and what the future looks like, in terms of the level of efficiency required to semantically process big data. We have determined that a semantics-based strategy is a suitable approach to dealing with big data issues. It is worth mentioning that this study examined the latest literature available, from 2011 to 2014. A total of 61 research papers were studied to explore the state of the art in this area. However, this review found that new solutions are needed to extract value, given the volume, variety, and velocity of big data. The use of semantic computing approaches to big data could enable end-users to consume information that is relevant to them. This study also offers recommendations that may encourage researchers to more deeply explore the ways in which semantic technology can improve the processing of big data. Our research may pave the way for the development of better basic knowledge on the semantic and computational issues of big data, and can act as a foundation for further studies within the field.

References

  1. Dijcks, Jean Pierre. "Oracle: Big data for the enterprise," Oracle White Paper, 2012.
  2. Zikopoulos, Paul, and Chris Eaton. Understanding big data: Analytics for enterprise class hadoop and streaming data. McGraw-Hill Osborne Media, 2011.
  3. Red Hat Enterprise Linux: THE FIVE MUST-HAVES OF BIG DATA STORAGE, White Paper, 2013.
  4. Ashley Vance, "Start-Up Goes After Big Data With Hadoop Helper," New York Times Blog, 2010.
  5. Gartner Research: https://www.gartner.com/it-glossary/big-data/ Accessed on 03 April 2014.
  6. http://www-01.ibm.com/software/data/bigdata/ , accessed on 4 April 2014.
  7. http://hadoop.apache.org/ , accessed on 4 April 2014.
  8. http://dictionary.ieee.org/ , accessed on 4 April 2014.
  9. D. Calvanese, Martin Giese, Peter Haase, Ian Horrocks, T. Hubauer, Y. Ioannidis, Ernesto Jimenez-Ruiz, E. Kharlamov, H. Kllapi, J. Kluwer, Manolis Koubarakis, S. Lamparter, R. Moller, C. Neuenstadt, T. Nordtveit, O. Ozcep, M. Rodriguez-Muro, M. Roshchin, F. Savo, Michael Schmidt, Ahmet Soylu, Arild Waaler, and Dmitriy Zheleznyakov, "Optique: OBDA Solution for Big Data," in The Semantic Web: ESWC 2013 Satellite Events, pp. 293-295, 2013.
  10. http://googleblog.blogspot.co.uk/2012/05/introducing-knowledge-graph-things-not.html, accessed on 4 April 2014.
  11. Richard Cyganiak and Anja Jentzsch. "Linking Open Data cloud diagram," http://lod-cloud.net, accessed on 4 April 2014.
  12. Horrocks, Ian, Thomas Hubauer, Ernesto Jimenez-Ruiz, Evgeny Kharlamov, Manolis Koubarakis, Ralf Moller, Konstantina Bereta, Christian Neuenstadt, Ozgur Ozcep, Mikhail Roshchin, Panayiotis Smeros, and Dmitriy Zheleznyakov. "Addressing Streaming and Historical Data in OBDA Systems: Optique's Approach (Statement of Interest)," in Proc. of Workshop on Knowledge Discovery and Data Mining Meets Linked Open Data (Know@LOD), 2013.
  13. Kharlamov, Evgeny, Ernesto Jimenez-Ruiz, Dmitriy Zheleznyakov, Dimitris Bilidas, Martin Giese, Peter Haase, Ian Horrocks, Herald Kllapi, Manolis Koubarakis, Ozgur Ozcep, Mariano Rodriguez-Muro, Riccardo Rosati, Michael Schmidt, Rudolf Schlatte, Ahmet Soylu, and Arild Waaler. "Optique: Towards OBDA Systems for Industry," The Semantic Web: ESWC 2013 Satellite Events, pp. 125-140, 2013.
  14. Lambrix, Patrick, and Rajaram Kaliyaperumal. "A session-based approach for aligning large ontologies," The Semantic Web: Semantics and Big Data, pp. 46-60, 2013.
  15. Huang, Bert, Angelika Kimmig, Lise Getoor, and Jennifer Golbeck. "A flexible framework for probabilistic models of social trust," Social Computing, Behavioral-Cultural Modeling and Prediction, pp. 265-273, 2013.
  16. http://www.globalgraphics.com/technology/knowledge-management/, accessed on 4 April 2014.
  17. John Gantz, David Reinsel, Richard Lee, "The digital universe in 2020: Big data, Bigger Digital Shadows and the Biggest Growth in Far East China," IDC Country Brief, Sponsored by EMC, 2013.
  18. John Gantz, David Reinsel, Marshall Amaldas, "The digital universe in 2020: Big data, Bigger Digital Shadows and the Biggest Growth in Far East India," IDC Country Brief, Sponsored by EMC, 2013.
  19. John Gantz, David Reinsel, "The digital universe in 2020: Big data, Bigger Digital Shadows and the Biggest Growth in USA," IDC Country Brief, Sponsored by EMC, 2013.
  20. John Gantz, David Reinsel and Carla Arend, "The digital universe in 2020: Big data, Bigger Digital Shadows and the Biggest Growth in Europe," IDC Country Brief, Sponsored by EMC, 2013.
  21. Kouji Kozaki, "Ontology engineering for big data," in Proc. of Ontology and Semantic Web for Big Data Workshop in the ICSEC 2013, September 4, 2013, Bangkok, Thailand.
  22. http://www.ontology.com/, accessed on 6 April 2014.
  23. www.apple.com/ios/siri, accessed on 4 April 2014.
  24. http://www.w3.org/2001/sw/sweo/public/UseCases/, accessed on 4 April 2014.
  25. Berners-Lee, Tim, James Hendler, and Ora Lassila. "The Semantic Web. A new form of Web content that is meaningful to computers will unleash a revolution of new possibilities," Scientific American, vol. 284, no. 5, pp. 1-5, 2001.
  26. Schatzle, Alexander, Martin Przyjaciel-Zablocki, and Georg Lausen. "PigSPARQL: Mapping SPARQL to Pig Latin," in Proc. of the International Workshop on Semantic Web Information Management, p. 4, ACM, 2011.
  27. Urbani, Jacopo. "On web-scale reasoning," Ph.D. dissertation, Comput. Sci. Dept., Vrije Universiteit, Amsterdam, Netherlands, 2013.
  28. http://treo.deri.ie, accessed on 3 April, 2014.
  29. http://sina-linkeddata.aksw.org/, accessed on 5 April 2014.
  30. Szekely, Pedro, Craig A. Knoblock, Fengyu Yang, Xuming Zhu, Eleanor E. Fink, Rachel Allen, and Georgina Goodlander. "Connecting the Smithsonian American Art Museum to the Linked Data Cloud," The Semantic Web: Semantics and Big Data, pp. 593-607, 2013.
  31. Konrath, Mathias, Thomas Gottron, Steffen Staab, and Ansgar Scherp. "Schemex-efficient construction of a data catalogue by stream-based indexing of linked data," Web Semantics: Science, Services and Agents on the World Wide Web, vol.16, pp.52-58, 2012. https://doi.org/10.1016/j.websem.2012.06.002
  32. Gottron, Thomas, Ansgar Scherp, Bastian Krayer, and Arne Peters. "LODatio: A Schema-Based Retrieval System for Linked Open Data at Web-Scale," The Semantic Web: ESWC 2013 Satellite Events, pp. 142-146, 2013.
  33. https://developers.facebook.com/docs/opengraph accessed on 3 April, 2014.
  34. Soylu, Ahmet, Martin Giese, Ernesto Jimenez-Ruiz, Evgeny Kharlamov, Dmitry Zheleznyakov, and Ian Horrocks. "OptiqueVQS: towards an ontology-based visual query system for big data," in Proc. of the Fifth International Conference on Management of Emergent Digital EcoSystems, ACM, pp. 119-126, 2013.
  35. Hoppe, Anett, C. Nicolle, and A. Roxin. "Automatic ontology-based user profile learning from heterogeneous web resources in a big data context," Proceedings of the VLDB Endowment, vol. 6, no. 12, pp. 1428-1433, 2013.
  36. Haase, Peter, Ian Horrocks, Dag Hovland, Thomas Hubauer, Ernesto Jimenez-Ruiz, Evgeny Kharlamov, Johan Kluwer, Christoph Pinkel et al. "Optique System: Towards Ontology and Mapping Management in OBDA Solutions," in Proc. of Second International Workshop on Debugging Ontologies and Ontology Mappings-WoDOOM13, p. 21, 2013.
  37. Janowicz, Krzysztof. "Observation-Driven Geo-Ontology Engineering," Transactions in GIS, vol. 16, no. 3, pp. 351-374, 2012. https://doi.org/10.1111/j.1467-9671.2012.01342.x
  38. Jeansoulin, Robert. "Big data: how geo-information helped shape the future of data engineering," AutoCarto Six Retrospective, pp.190-201, 2013.
  39. Bizer, Christian, Peter Boncz, Michael L. Brodie, and Orri Erling. "The meaningful use of big data: four perspectives--four challenges," ACM SIGMOD Record, vol.40, no. 4, pp.56-60, 2012. https://doi.org/10.1145/2094114.2094129
  40. Berkovich, Simon, and Duoduo Liao, "On clusterization of big data streams," in Proc. of the 3rd International Conference on Computing for Geospatial Research and Applications, p. 26, ACM, 2012.
  41. McPherson, Jeffrey D., Ian R. Grosse, Sundar Krishnamurty, Jack C. Wileden, Elizabeth R. Dumont, and Michael A. Berthaume, "Integrating Biological and Engineering Ontologies," in Proc. of ASME 2013 International Design Engineering Technical Conferences and Computers and Information in Engineering Conference, pp. V02BT02A022-V02BT02A022. American Society of Mechanical Engineers, 2013.
  42. Rysavy, Steven J., Dennis Bromley, and Valerie Daggett. "DIVE: A Graph-Based Visual-Analytics Framework for Big Data," Computer Graphics and Applications, IEEE, vol. 34, no. 2, pp. 26-37, 2014.
  43. Shiri, Ali, "Linked Data Meets Big Data: A Knowledge Organization Systems Perspective," Advances in Classification Research Online, vol.24, no. 1, 2014.
  44. Margara, Alessandro, Jacopo Urbani, Frank van Harmelen, and Henri Bal. "Streaming the Web: Reasoning over Dynamic Data," Web Semantics: Science, Services and Agents on the World Wide Web, vol.25, pp. 24-44, 2014. https://doi.org/10.1016/j.websem.2014.02.001
  45. Nguyen, Trinh Hoang, Vimala Nunavath, and Andreas Prinz, "Big Data Metadata Management in Smart Grids," In Big Data and Internet of Things: A Roadmap for Smart Environments, pp. 189-214, 2014.
  46. Chen, Hsinchun, Roger HL Chiang, and Veda C. Storey, "Business Intelligence and Analytics: From Big Data to Big Impact," MIS Quarterly, vol. 36, no. 4, 2012.
  47. Papageorgiou, Apostolos, Mischa Schmidt, Jaeseung Song, and Nobuharu Kami. "Smart M2M Data Filtering Using Domain-Specific Thresholds in Domain-Agnostic Platforms," In Big Data (BigData Congress), 2013 IEEE International Congress on, pp. 286-293. IEEE, 2013.
  48. Bennett, Mike, "The financial industry business ontology: Best practice for big data," Journal of Banking Regulation, vol. 14, no. 3, pp. 255-268, 2013. https://doi.org/10.1057/jbr.2013.13
  49. Lee, Chun-Hsiang, David Birch, Chao Wu, Dilshan Silva, Orestis Tsinalis, Yang Li, Shulin Yan, Moustafa Ghanem, and Yike Guo, "Building a generic platform for big sensor data application," in Proc. of Big Data, 2013 IEEE International Conference on, pp. 94-102. IEEE, 2013.
  50. Heath, Tom, and Christian Bizer, "Linked data: Evolving the web into a global data space," Synthesis lectures on the semantic web: theory and technology, vol.1, no. 1, pp.1-136, 2011. https://doi.org/10.2200/S00334ED1V01Y201102WBE001
  51. Whetzel, Patricia L., Natalya F. Noy, Nigam H. Shah, Paul R. Alexander, Csongor Nyulas, Tania Tudorache, and Mark A. Musen, "BioPortal: enhanced functionality via new Web services from the National Center for Biomedical Ontology to access and use ontologies in software applications," Nucleic acids research vol.39, no. suppl 2, pp. W541-W545, 2011. https://doi.org/10.1093/nar/gkr469
  52. Shvaiko, Pavel, and Jerome Euzenat, "Ontology matching: state of the art and future challenges," Knowledge and Data Engineering, IEEE Transactions on vol.25, no.1, pp.158-176, 2013.
  53. Kitchin, Rob, "Big data and human geography Opportunities, challenges and risks," Dialogues in Human Geography, vol.3, no. 3, pp. 262-267, 2013. https://doi.org/10.1177/2043820613513388
  54. Calvanese, Diego, Ian Horrocks, Ernesto Jimenez-Ruiz, Evgeny Kharlamov, Michael Meier, Mariano Rodriguez-Muro, and Dmitriy Zheleznyakov, "On Rewriting and Answering Queries in OBDA Systems for Big Data (Short Paper)," in Proc. of OWL Experiences and Directions Workshop (OWLED), 2013.
  55. http://www.oracle.com/technetwork/database/bigdata-appliance/overview/index.html, accessed on 4 April 2014.
  56. http://www.emc.com/leadership/digital-universe/index.htm, accessed on 5 April 2014.
  57. Karnstedt, Marcel, Kai-Uwe Sattler, and Manfred Hauswirth, "Scalable distributed indexing and query processing over Linked Data," Web Semantics: Science, Services and Agents on the World Wide Web, vol. 10, pp. 3-32, 2012. https://doi.org/10.1016/j.websem.2011.11.010
  58. Moro, Andrea, Hong Li, Sebastian Krause, Feiyu Xu, Roberto Navigli, and Hans Uszkoreit. "Semantic rule filtering for web-scale relation extraction," The Semantic Web-ISWC 2013, pp. 347-362. Springer Berlin Heidelberg, 2013.
  59. Fatos Xhafa and Leonard Barolli, "Semantics, intelligent processing and services for big data," Journal of Future Generation Computer Systems, vol.37, pp. 201-202, 2014. https://doi.org/10.1016/j.future.2014.02.004
  60. Urbani, Jacopo, Spyros Kotoulas, Jason Maassen, Frank Van Harmelen, and Henri Bal, "WebPIE: A Web-scale parallel inference engine using MapReduce," Web Semantics: Science, Services and Agents on the World Wide Web, vol. 10, pp. 59-75, 2012. https://doi.org/10.1016/j.websem.2011.05.004
  61. Ruben Verborgh, Miel Vander Sande, Pieter Colpaert, Sam Coppens, Erik Mannens, and Rik Van de Walle, "Web-Scale Querying through Linked Data Fragments," Workshop on Linked Data on the Web (LDOW2014) Seoul, South Korea, 2014.
