Distributed Incremental Approximate Frequent Itemset Mining Using MapReduce

  • Received : 2023.05.05
  • Published : 2023.05.30

Abstract

Traditional data mining methods typically assume that the data is small, centralized, memory-resident, and static. This assumption no longer holds: datasets grow rapidly and continuously, and there is an increasing need for mining algorithms that can manage such data efficiently. In this setting, data mining must be carried out in a distributed environment, and Frequent Itemset Mining (FIM) is no exception. This creates the need for an efficient incremental mining algorithm. We propose Distributed Incremental Approximate Frequent Itemset Mining (DIAFIM), an incremental FIM algorithm that runs in a distributed, parallel MapReduce environment. The key contribution of this research is an incremental mining algorithm that works in this distributed, parallel MapReduce environment.
