Novel Push-Front Fibonacci Windows Model for Finding Emerging Patterns with Better Completeness and Accuracy

  • Akhriza, Tubagus Mohammad (Communication and Information System Engineering, Shanghai Jiao Tong University, Pradnya Paramita School of Informatics Management and Computer)
  • Ma, Yinghua (School of Information Security Engineering, Shanghai Jiao Tong University)
  • Li, Jianhua (School of Information Security Engineering, Shanghai Jiao Tong University)
  • Received : 2017.03.08
  • Accepted : 2017.07.19
  • Published : 2018.02.01

Abstract

To find emerging patterns (EPs) in streaming transaction data, the stream is first divided into time windows, each containing a number of transactions. Itemsets are generated from the transactions in each window, and the emergence of each itemset is then evaluated between two windows. The tilted-time windows model (TTWM) assumes that people need support data with finer accuracy from the most recent windows and accept coarser accuracy from older ones. An array with a limited number of elements is therefore used to maintain all support data, condensing old windows by merging them into a single element. The number of windows that each element can accommodate is modeled by a particular number sequence. However, as new data arrive in the stream, the current array-updating mechanisms leave many null elements in the array, which causes data incompleteness and inaccuracy. Two models derived from TTWM, the logarithmic TTWM and the Fibonacci windows model, inherit the same problems. This article proposes a novel push-front Fibonacci windows model as a solution, and experiments demonstrate that it finds more EPs than the other models.
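As a rough illustration of the window-to-window evaluation described in the abstract, the sketch below counts itemset supports in two windows and flags itemsets whose support grows by at least a chosen factor. It assumes the commonly used growth-rate definition of an emerging pattern (support in the newer window divided by support in the older one). The abstract does not state the exact measure used in the article, so the function names, the `min_growth` threshold, the `max_size` limit, and the toy transactions are all hypothetical. The tilted-time array and its updating mechanism are not modeled here.

```python
from collections import Counter
from itertools import combinations


def itemset_supports(window, max_size=2):
    """Relative support of every itemset (up to max_size items) in one window."""
    counts = Counter()
    for transaction in window:
        items = sorted(set(transaction))  # canonical order, no duplicate items
        for size in range(1, max_size + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    n = len(window)
    return {itemset: c / n for itemset, c in counts.items()}


def growth_rate(old_support, new_support):
    """Assumed measure: ratio of new to old support (infinite if newly appearing)."""
    if old_support == 0:
        return float("inf") if new_support > 0 else 0.0
    return new_support / old_support


def emerging_patterns(old_window, new_window, min_growth=2.0, max_size=2):
    """Itemsets whose growth rate from old_window to new_window is >= min_growth."""
    old = itemset_supports(old_window, max_size)
    new = itemset_supports(new_window, max_size)
    result = {}
    for itemset, s_new in new.items():
        rate = growth_rate(old.get(itemset, 0.0), s_new)
        if rate >= min_growth:
            result[itemset] = rate
    return result


# Toy data (hypothetical): two windows of four transactions each.
old_window = [["a", "b"], ["a", "c"], ["b", "c"], ["a", "b"]]
new_window = [["a", "c"], ["a", "c"], ["c", "d"], ["a", "c"]]
eps = emerging_patterns(old_window, new_window)
```

In this toy data the itemset `("a", "c")` rises from support 0.25 to 0.75 and is reported, while `("a",)` keeps the same support in both windows and is not. When old windows are merged into a single array element, the supports available for such a comparison become coarser, which is the accuracy trade-off the abstract refers to.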
