http://dx.doi.org/10.3745/JIPS.2014.10.1.145

Probabilistic Models for Local Patterns Analysis  

Salim, Khiat (Signal, System and Data Laboratory (LSSD), Computer Sciences Department, Computer Sciences and Mathematics Faculty, University of Sciences and Technology)
Hafida, Belbachir (Signal, System and Data Laboratory (LSSD), Computer Sciences Department, Computer Sciences and Mathematics Faculty, University of Sciences and Technology)
Ahmed, Rahal Sid (Signal, System and Data Laboratory (LSSD), Computer Sciences Department, Computer Sciences and Mathematics Faculty, University of Sciences and Technology)
Publication Information
Journal of Information Processing Systems / v.10, no.1, 2014, pp. 145-161
Abstract
Many large organizations now have multiple data sources (MDSs) distributed over the different branches of an interstate company. Local pattern analysis has become an effective strategy for MDS mining in national and international organizations. It consists of mining each dataset separately to obtain its frequent patterns, which are then forwarded to a central site for global pattern analysis. Various synthesizing models [2,3,4,5,6,7,8,26] have been proposed to build global patterns from the forwarded patterns. Ideally, the rules synthesized from the forwarded patterns should closely match the mono-mining results (i.e., the results that would be obtained if all of the databases were merged and mined together). However, when a pattern is present at a site but fails to satisfy the minimum support threshold, it is not allowed to take part in the synthesizing process. This process can therefore lose interesting patterns that could help the decision maker make the right decision. For such situations, we propose applying a probabilistic model in the synthesizing process. An adequate choice of probabilistic model can improve the quality of the discovered patterns. In this paper, we perform a comprehensive study of various probabilistic models that can be applied in the synthesizing process, and we select one of them and improve it to ameliorate the synthesizing results. Finally, we present experiments on a public database to evaluate the efficiency of the proposed synthesizing method.
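The loss described above can be illustrated with a small hypothetical example. The sketch below assumes a size-weighted synthesizing rule in which a site that did not forward a pattern contributes zero support. As the probabilistic alternative it uses the simple independence model (the product of single-item supports), not the maximum entropy or inclusion-exclusion models studied in the paper, and it assumes, hypothetically, that every site also forwards all of its 1-itemset supports. The function names (`local_supports`, `make_report`, `synthesize`) and the two toy sites are illustrative only.

```python
from itertools import combinations
from math import prod


def local_supports(transactions, max_size=2):
    """Relative support of every itemset of size 1..max_size that occurs in the transactions."""
    counts = {}
    for t in transactions:
        for k in range(1, max_size + 1):
            for combo in combinations(sorted(set(t)), k):
                key = frozenset(combo)
                counts[key] = counts.get(key, 0) + 1
    n = len(transactions)
    return {x: c / n for x, c in counts.items()}


def make_report(transactions, minsupp):
    """What one site forwards: its size, its frequent itemsets, and (hypothetically) all 1-itemset supports."""
    supp = local_supports(transactions)
    return {
        "n": len(transactions),
        "frequent": {x: s for x, s in supp.items() if s >= minsupp},
        "items": {next(iter(x)): s for x, s in supp.items() if len(x) == 1},
    }


def synthesize(reports, itemset, minsupp, estimate=False):
    """Size-weighted global support of `itemset` built from the forwarded reports.

    Without `estimate`, a site that did not forward the itemset contributes 0.
    With `estimate`, its support is approximated by the independence model
    (product of item supports), capped at minsupp because the itemset was
    not frequent at that site.
    """
    total = sum(r["n"] for r in reports)
    weighted = 0.0
    for r in reports:
        s = r["frequent"].get(itemset)
        if s is None:
            s = min(prod(r["items"].get(i, 0.0) for i in itemset), minsupp) if estimate else 0.0
        weighted += r["n"] * s
    return weighted / total


MINSUPP = 0.4
site1 = [["a", "b"], ["a", "b"], ["a", "c"], ["b", "c"]]
site2 = [["a", "b"], ["a", "c"], ["a"], ["b"]]
reports = [make_report(t, MINSUPP) for t in (site1, site2)]
ab = frozenset({"a", "b"})

mono = local_supports(site1 + site2)[ab]               # all data mined together
plain = synthesize(reports, ab, MINSUPP)               # site 2 did not forward {a, b}
est = synthesize(reports, ab, MINSUPP, estimate=True)  # site 2 estimated as 0.75 * 0.5
print(mono, plain, est)  # 0.375 0.25 0.4375
```

In this toy case {a, b} has support 0.25 at site 2, below the 0.4 threshold, so the plain synthesis reports 0.25 against a mono-mining support of 0.375. The independence estimate overshoots to 0.4375 but halves the error. Models that exploit more of the forwarded information, such as the maximum entropy model, are meant to narrow this gap further.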
Keywords
Global Pattern; Maximum Entropy Method; Non-derivable Itemset; Itemset Inclusion-exclusion Model
References
1 Animesh Adhikari and P. R. Rao, << Synthesizing heavy association rules from different real data sources >>, Elsevier Science Direct, 2008.
2 Animesh Adhikari, Pralhad Ramachandrarao, and Witold Pedrycz, << Developing Multi-databases Mining Applications >>, chapter 3, "Mining Multiple Large Databases", Springer-Verlag London Limited, 2010.
3 D. Pavlov, H. Mannila, and P. Smyth, << Beyond independence: Probabilistic models for query approximation on binary transaction data >>, Technical Report UCI-ICS-TR-01-09, Information and Computer Science, University of California, Irvine, 2001.
4 V. Poosala and Y. Ioannidis, << Selectivity estimation without the attribute value independence assumption >>, In Proceedings of the 23rd International Conference on Very Large Data Bases (VLDB'97), pages 486-495. San Francisco, CA: Morgan Kaufmann Publishers, 1997.
5 J. N. Darroch and D. Ratcliff, << Generalized iterative scaling for log-linear models >>, Annals of Mathematical Statistics, 43:1470-1480, 1972.
6 Toon Calders and Bart Goethals, << Quick Inclusion-Exclusion >>, University of Antwerp, Belgium. Springer-Verlag Berlin Heidelberg 2006.
7 Unil Yun, << Efficient mining of weighted interesting patterns with a strong weight and/or support affinity >>, Information Sciences, 177:3477-3499, Elsevier, 2007.
8 S. Jaroszewicz, D. Simovici, and I. Rosenberg, << An inclusion-exclusion result for Boolean polynomial and its applications in data mining >>, In Proceedings of the Discrete Mathematics in DM Workshop, SIAM DM Conference, Washington, D.C., 2002.
9 Szymon Jaroszewicz and Dan A. Simovici, << Support Approximation Using Bonferroni Inequalities >>, Department of Computer Science, University of Massachusetts at Boston, USA.
10 D. Pavlov, H. Mannila, and P. Smyth, << Probabilistic models for query approximation with large Sparse Binary Data Set >>, Technical Report No. 00-07, Department of Information and Computer Science, University of California, Irvine, February 2000.
11 A.L. Berger, S.A. Della Pietra, and V.J. Della Pietra, << A maximum entropy approach to natural language processing >>, Computational Linguistics, 22(1):39-72, 1996.
12 J. Pearl, << Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference >>, Morgan Kaufmann Publishers Inc., 1988.
13 F. Jelinek, << Statistical Methods for Speech Recognition >>, MIT Press, 1998.
14 H. Mannila and H. Toivonen, << Multiple uses of frequent sets and condensed representations >>, In Proc. KDD Int. Conf. Knowledge Discovery in Databases, 1996.
15 H. Mannila, D. Pavlov, and P. Smyth, <>, In Proceedings of Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD'99), pages 357-361. New York, NY: ACM Press, 1999.
16 C. K. Chow and C. N. Liu, <>, IEEE Transactions on Information Theory, IT-14(3):462-467, 1968.
17 Rakesh Agrawal and Ramakrishnan Srikant, << Fast Algorithms for Mining Association Rules >>, In VLDB'94, pages 487-499.
18 Jiawei Han, Jian Pei, Yiwen Yin, and Runying Mao, <>, Data Mining and Knowledge Discovery, 8:53-87, 2000.
19 N. Pasquier, Y. Bastide, R. Taouil, and L. Lakhal, <>, lattices. Information Systems, pages 25-46, 1999.
20 S. Zhang, X. Wu, and C. Zhang, << Multi-databases mining >>, IEEE Computational Intelligence Bulletin, 2:5-13, 2003.
21 Thirunavukkarasu Ramkumar, Rengaramanujam Srinivasan, << Modified algorithms for synthesizing high-frequency rules from different data sources>>, Springer-Verlag London Limited 2008.
22 Shichao Zhang, Xiaofang You, Zhi Jin, and Xindong Wu, <>, 2009.
23 Thirunavukkarasu Ramkumar and Rengaramanujam Srinivasan, << The Effect of Correction Factor in Synthesizing Global Rules in a Multi-databases Mining Scenario >>, Journal of Computer Science, no. 6(3)/2009, Suceava.
24 Xindong Wu and Shichao Zhang, <>, IEEE Transactions on Knowledge and Data Engineering, 15(2):353-367, 2003.
25 Thirunavukkarasu Ramkumar, Rengaramanujam Srinivasan, <>, April, 2010.
26 Shichao Zhang and Mohammed J. Zaki, << Mining Multiple Data Sources: Local Pattern Analysis >>, Data Mining and Knowledge Discovery, 12(2-3):121-125, 2006.
27 Souleyman Chaib and Salim Miloudi, <>, Master's thesis, University of Mohamed Boudiaf, Oran, Algeria, 2011.