Semantic Process Retrieval with Similarity Algorithms

Lee, Hong-Joo; Klein, Mark

Asia Pacific Journal of Information Systems

Volume 18, Issue 1 / Pages 79-96 / 2008 / 2288-5404 (pISSN) / 2288-6818 (eISSN)

The Korea Society of Management Information Systems (한국경영정보학회)

Korean title (translated): A Method for Semantic Process Retrieval Using Similarity Algorithms

Lee, Hong-Joo (School of Business Administration, The Catholic University of Korea);
Klein, Mark (MIT Center for Collective Intelligence)

Published : 2008.03.31

Abstract

One role of Semantic Web services is to execute dynamic intra-organizational services, including the integration and interoperation of business processes. Because different organizations design their processes differently, retrieving similar semantic business processes is necessary to support inter-organizational collaboration. Most approaches to finding services that have certain features and support certain business processes have relied on some form of logical reasoning and exact matching. This paper presents our approach of using imprecise matching to expand the results of an exact matching engine when querying the OWL (Web Ontology Language) version of the MIT Process Handbook. The MIT Process Handbook is an electronic repository of best-practice business processes. It is intended to help people (1) redesign organizational processes, (2) invent new processes, and (3) share ideas about organizational practices. To use the MIT Process Handbook for process retrieval experiments, we had to export it into an OWL-based format. We model the Process Handbook meta-model in OWL and export the processes in the Handbook as instances of that meta-model. Next, we need a sizable number of queries and their corresponding correct answers in the Process Handbook. Many previous studies devised artificial data sets composed of randomly generated numbers without real meaning, and used subjective ratings for the correct answers and for similarity values between processes. To generate a semantics-preserving test data set, we use mutation operators to create 20 variants of each target process that are syntactically different but semantically equivalent. These variants represent the correct answers for the target process. We devise diverse similarity algorithms based on the values of process attributes and the structures of business processes.
We build our approaches on simple text-retrieval similarity algorithms such as TF-IDF and Levenshtein edit distance, and use a tree edit distance measure because semantic processes appear to have a graph structure. We also design similarity algorithms that consider the similarity of process structure, such as part processes, goals, and exceptions. Since we can identify relationships between a semantic process and its subcomponents, this information can be used to calculate similarities between processes. Dice's coefficient and Jaccard similarity measures are used to calculate the proportion of overlap between processes in diverse ways. We perform retrieval experiments to compare the performance of the devised similarity algorithms. We measure retrieval performance in terms of precision, recall, and F-measure, the harmonic mean of precision and recall. The tree edit distance shows the poorest performance on all measures. TF-IDF and the method combining TF-IDF with Levenshtein edit distance perform better than the other devised methods. These two measures focus on similarity between the names and descriptions of processes. In addition, we calculate a rank correlation coefficient, Kendall's tau-b, between the number of process mutations and the ranking of similarity values among the mutation sets. In this experiment, similarity measures based on process structure, such as Dice's, Jaccard, and their derivatives, show greater coefficients than measures based on the values of process attributes. However, the Lev-TFIDF-JaccardAll measure, which considers process structure and attribute values together, performs reasonably well in both experiments. For retrieving semantic processes, it therefore appears better to consider diverse aspects of process similarity, such as process structure and the values of process attributes.
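As a rough illustration of the building blocks named above, the following Python sketch implements Levenshtein edit distance on strings and the Jaccard and Dice overlap measures on sets. It is not the paper's implementation: the function names and the example part-process sets are hypothetical, and the paper's actual measures (for example Lev-TFIDF-JaccardAll) combine such components in ways the abstract does not specify.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def jaccard(x: set, y: set) -> float:
    """Size of the intersection divided by size of the union."""
    if not x and not y:
        return 1.0  # convention chosen here: two empty sets are identical
    return len(x & y) / len(x | y)


def dice(x: set, y: set) -> float:
    """Twice the intersection size divided by the sum of the set sizes."""
    if not x and not y:
        return 1.0
    return 2 * len(x & y) / (len(x) + len(y))


# Hypothetical part-process sets for two variants of a "sell" process.
sell_a = {"identify customer", "take order", "deliver product", "receive payment"}
sell_b = {"identify customer", "take order", "ship product", "receive payment"}

print(levenshtein("kitten", "sitting"))  # 3
print(jaccard(sell_a, sell_b))           # 0.6
print(dice(sell_a, sell_b))              # 0.75
```

The two sets share three of five distinct part processes, so Jaccard gives 3/5 and Dice gives 6/8; Dice always scores partial overlap at least as high as Jaccard.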
We generate semantic process data and a data set for retrieval experiments from the MIT Process Handbook repository. We propose imprecise query algorithms that expand the retrieval results of an exact matching engine, such as one based on SPARQL, and compare the retrieval performance of the similarity algorithms. As limitations and future work, we need to perform experiments with data sets from other domains. In addition, since diverse measures yield many similarity values, we may find better ways to identify relevant processes by applying these values simultaneously.
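The idea of expanding exact-match results with imprecise matching, and then scoring the outcome, might be sketched in Python as follows. Everything here is an assumption made for illustration: the process names, the token-level Jaccard similarity, the threshold, and the function names are hypothetical and do not reproduce the paper's algorithms or data.

```python
def token_jaccard(a: str, b: str) -> float:
    """Jaccard overlap between the word sets of two process names."""
    x, y = set(a.lower().split()), set(b.lower().split())
    return len(x & y) / len(x | y) if (x or y) else 1.0


def expand_results(exact_hits, candidates, query, similarity, threshold):
    """Exact hits first, then other candidates scoring >= threshold, best first."""
    scored = [(similarity(query, c), c) for c in candidates if c not in exact_hits]
    scored = [(s, c) for s, c in scored if s >= threshold]
    scored.sort(key=lambda sc: sc[0], reverse=True)
    return list(exact_hits) + [c for _, c in scored]


def precision_recall_f(retrieved, relevant):
    """Precision, recall, and their harmonic mean (F-measure)."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    total = precision + recall
    f = 2 * precision * recall / total if total else 0.0
    return precision, recall, f


# Hypothetical repository, query, and ground truth.
query = "sell product online"
candidates = ["sell product online", "sell product in store",
              "sell service online", "vend goods via web", "hire employee"]
exact_hits = ["sell product online"]  # what an exact matching engine would return
relevant = {"sell product online", "sell product in store",
            "sell service online", "vend goods via web"}

expanded = expand_results(exact_hits, candidates, query, token_jaccard, threshold=0.3)
print(precision_recall_f(exact_hits, relevant))  # exact matching only
print(precision_recall_f(expanded, relevant))    # after imprecise expansion
```

In this toy setting, expansion raises recall from 0.25 to 0.75 without lowering precision, while a relevant process that shares no words with the query is still missed. That gap is the kind of case where structure-based measures could complement text-based ones.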
