• Title/Summary/Keyword: Distributed Data Mining

Clustering Algorithm using the DFP-Tree based on the MapReduce (맵리듀스 기반 DFP-Tree를 이용한 클러스터링 알고리즘)

  • Seo, Young-Won;Kim, Chang-soo
    • Journal of Internet Computing and Services / v.16 no.6 / pp.23-30 / 2015
  • As big data has become a major issue, many applications that operate on the results of data analysis have been developed. Typical examples are product recommendation services in e-commerce systems, search services in search engines, and friend-list recommendation in social network services. In this paper, we propose a decision frequent pattern tree (DFP-tree) that combines the original frequent pattern tree, an existing data mining technique that mines similar patterns appearing in a data set, with the decision tree from computer science theory. The decision frequent pattern tree algorithm addresses a weakness of the frequent pattern tree, which generates so many patterns that the data become hard to analyze. We also propose a model of the algorithm for the MapReduce framework, a programming model that supports operation in a distributed environment.
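
The abstract gives no implementation details, so the following is only a minimal sketch of the first step that frequent pattern tree methods share: counting item supports in a map/reduce style. It is not the authors' DFP-Tree algorithm. The function names, the toy transactions, and the `min_support` threshold are assumptions made for illustration, and plain Python functions stand in for a real MapReduce cluster.

```python
from collections import Counter
from functools import reduce


def map_phase(transactions):
    """Map: count item occurrences within one data partition."""
    counts = Counter()
    for transaction in transactions:
        counts.update(set(transaction))  # count each item once per transaction
    return counts


def reduce_phase(partial_a, partial_b):
    """Reduce: merge the partial counts of two partitions."""
    return partial_a + partial_b


def frequent_items(partitions, min_support):
    """Return the items whose global support count reaches min_support."""
    totals = reduce(reduce_phase, map(map_phase, partitions), Counter())
    return {item: n for item, n in totals.items() if n >= min_support}


# Hypothetical toy data, split into two partitions as a cluster would do.
partitions = [
    [["milk", "bread"], ["milk", "eggs"]],
    [["bread", "milk"], ["eggs"]],
]
result = frequent_items(partitions, min_support=2)
```

Only the items that survive this pass would be inserted into a frequent pattern tree, which is what keeps the tree compact.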

Predicting Arab Consumers' Preferences on the Korean Contents Distribution

  • Park, Young-Eun;Chaffar, Soumaya;Kim, Myoung-Sook;Ko, Hye-Young
    • Journal of Distribution Science / v.15 no.4 / pp.33-40 / 2017
  • Purpose - This study analyzes patterns in the preferences of consumers in Arab countries for Korean content, using the social media platform Facebook, now that Korean entertainment content is distributed in the global marketplace. We then focus on developing a predictive model using a data mining technique. Research design, data and methodology - To understand the growth in preference for Korean content in Arab countries, we collected data from two popular Facebook pages: 'Korean movies and drama' and 'K-pop'. We then adopted a data-driven approach based on data mining techniques. Results - The results indicate that the number of likes for K-pop will increase in all North African and Middle Eastern countries. For Korean movies and drama, however, the number is decreasing in Algeria, Egypt and Morocco, with Tunisia the exception. The number of likes for Korean movies and drama will also decrease in Saudi Arabia and the United Arab Emirates, which is not the case for Iraq. Conclusions - This study notes that K-contents such as drama, movies and music are sometimes a gateway to a wider interest in Korean culture, food and brands. Moreover, it offers significant implications for developing predictive models to forecast the consumption of and preferences for Korean content.

Towards a Deep Analysis of High School Students' Outcomes

  • Barila, Adina;Danubianu, Mirela;Paraschiv, Andrei Marcel
    • International Journal of Computer Science & Network Security / v.21 no.6 / pp.71-76 / 2021
  • Education is one of the pillars of sustainable development. For this reason, the discovery of useful information as education adapts to new challenges is treated with care. This paper presents the start of a process of exploring, through data mining methods, the results obtained by Romanian students at the Baccalaureate (the Romanian high school graduation) exam, with the aim of an in-depth analysis to find and remedy some of the causes of unsatisfactory results. Specifically, a set of public data was collected from the website of the Ministry of Education, and several classification methods were tested on it to find the most efficient modeling algorithm. This is the first time that this type of data has been subjected to such analysis.

Adaptive Frequent Pattern Algorithm using CAWFP-Tree based on RHadoop Platform (RHadoop 플랫폼기반 CAWFP-Tree를 이용한 적응 빈발 패턴 알고리즘)

  • Park, In-Kyu
    • Journal of Digital Convergence / v.15 no.6 / pp.229-236 / 2017
  • An efficient frequent pattern algorithm is essential for mining association rules and for many other mining tasks, with applications spread over a very broad spectrum. Models for mining patterns have been proposed that use an FP-tree to store compressed information about frequent patterns. In this paper, we propose a centroid frequent pattern growth algorithm, called "CAWFP-Growth", that enhances the FP-Growth algorithm by computing the center of the weights and frequencies of the itemsets. Because the conventional constraint of maximum weighted support is not needed to maintain the downward closure property, the algorithm is more likely to reduce the search time and the information loss of the frequent patterns. The experimental results show that, by using the centroid of the items, the proposed algorithm achieves better performance than other algorithms without sacrificing accuracy or increasing processing time. A MapReduce framework model is provided to handle large amounts of data in a pseudo-distributed computing environment. In addition, modeling of the proposed algorithm in fully distributed mode is still required.
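
To illustrate why weighted pattern mining needs special care with the downward closure property, here is a minimal sketch. It uses a common textbook definition of weighted support (mean item weight times support), which is an assumption and not necessarily the measure used by CAWFP-Growth. The transactions and weights are hypothetical.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def weighted_support(itemset, transactions, weights):
    """Mean item weight of the itemset multiplied by its support."""
    mean_weight = sum(weights[i] for i in itemset) / len(itemset)
    return mean_weight * support(itemset, transactions)


# Hypothetical toy data: item "a" is common but carries a low weight.
transactions = [{"a", "b"}, {"a", "b"}, {"a", "b"}, {"a"}]
weights = {"a": 0.2, "b": 1.0}

ws_a = weighted_support({"a"}, transactions, weights)
ws_ab = weighted_support({"a", "b"}, transactions, weights)
```

Here the superset {a, b} scores higher than its subset {a}, so a threshold between the two values would keep the superset while pruning the subset. Plain support never behaves this way, which is why weighted methods add an extra constraint or, as the abstract describes, a different construction.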

Performance Optimization of Big Data Center Processing System - Big Data Analysis Algorithm Based on Location Awareness

  • Zhao, Wen-Xuan;Min, Byung-Won
    • International Journal of Contents / v.17 no.3 / pp.74-83 / 2021
  • A location-aware algorithm is proposed in this study to optimize the performance of distributed systems that process big data and suffer from low data reliability and application performance. Compared with previous algorithms, the location-aware data block placement algorithm uses data block placement and node data recovery strategies to improve data application performance and reliability. Simulation and actual cluster tests showed that the location-aware placement algorithm proposed in this study could greatly improve data reliability and shorten the application processing time of I/O interfaces in real time.

Big Numeric Data Classification Using Grid-based Bayesian Inference in the MapReduce Framework

  • Kim, Young Joon;Lee, Keon Myung
    • International Journal of Fuzzy Logic and Intelligent Systems / v.14 no.4 / pp.313-321 / 2014
  • In the current era of data-intensive services, the handling of big data is a crucial issue that affects almost every discipline and industry. In this study, we propose a classification method for large volumes of numeric data, which is implemented in a distributed programming framework, i.e., MapReduce. The proposed method partitions the data space into a grid structure and it then models the probability distributions of classes for grid cells by collecting sufficient statistics using distributed MapReduce tasks. The class labeling of new data is achieved by k-nearest neighbor classification based on Bayesian inference.
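
The grid idea can be sketched as follows, under stated assumptions: per-cell class counts serve as the sufficient statistics, plain Python functions stand in for MapReduce tasks, and a new point is labeled from its own cell only. This is a simplification of the k-nearest neighbor step the abstract mentions. `CELL_SIZE`, the function names, and the data are hypothetical.

```python
from collections import Counter, defaultdict

CELL_SIZE = 1.0  # hypothetical grid resolution


def cell_of(point):
    """Map a numeric point to the index tuple of its grid cell."""
    return tuple(int(x // CELL_SIZE) for x in point)


def map_counts(partition):
    """Map: per-cell class counts for one data partition."""
    counts = defaultdict(Counter)
    for point, label in partition:
        counts[cell_of(point)][label] += 1
    return counts


def merge_counts(partials):
    """Reduce: sum the per-cell class counts over all partitions."""
    total = defaultdict(Counter)
    for partial in partials:
        for cell, counter in partial.items():
            total[cell].update(counter)
    return total


def classify(point, cell_counts):
    """Return the class with the highest count in the point's cell.

    That count is proportional to P(class) * P(cell | class), so this is
    the Bayes decision for the cell. Returns None if the cell is empty.
    """
    counter = cell_counts.get(cell_of(point))
    if not counter:
        return None
    return counter.most_common(1)[0][0]


# Hypothetical labeled points, split into two partitions.
partitions = [
    [((0.2, 0.3), "A"), ((0.7, 0.1), "A"), ((2.5, 2.5), "B")],
    [((0.4, 0.9), "B"), ((0.1, 0.5), "A"), ((2.1, 2.9), "B")],
]
model = merge_counts(map_counts(p) for p in partitions)
```

Because the counts are additive, each partition can be summarized independently and merged afterwards, which is what makes the approach fit a distributed framework.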

Dynamic Elasticities Between Financial Performance and Determinants of Mining and Extractive Companies in Jordan

  • Yusop, Nora Yusma;Alhyari, Jad Alkareem;Bekhet, Hussain Ali
    • The Journal of Asian Finance, Economics and Business / v.8 no.7 / pp.433-446 / 2021
  • This study aims to identify the elasticities and causalities between financial performance and its determinants for the mining and extractive companies listed on Jordan's stock market over the 2005-2018 period. The conceptual framework is based on the Resource-Based View theory, and Arbitrage Pricing theory is used to describe the relationship between the external environment and the financial performance of the companies. A profitability ratio (return on assets) is used as a proxy for financial performance. Meanwhile, company characteristics, macroeconomic variables, and non-economic factors are used as independent factors. The data are a panel data set for mining and extractive companies over the above period. Fully Modified Ordinary Least Squares (FMOLS), Dynamic Ordinary Least Squares (DOLS), and Pooled Mean Group (PMG) methods are applied. The empirical findings indicated that company size, sales growth, financial leverage, liquidity, and GDP growth were the critical determinants of mining and extractive companies' financial performance on the Amman Stock Exchange. Thus, the findings suggest that company characteristics and GDP growth mainly drive financial performance. Moreover, the findings reveal that a bidirectional causal elasticity exists between GDP and financial leverage and return on assets (ROA). Sound financial performance can be obtained by paying more attention to GDP growth and firms' characteristics.

Research on Security Threats Emerging from Blockchain-based Services

  • Yoo, Soonduck
    • International Journal of Internet, Broadcasting and Communication / v.13 no.4 / pp.1-10 / 2021
  • The purpose of the study is to contribute to the positive development of blockchain technology by providing data to examine security vulnerabilities and threats to blockchain-based services and to review countermeasures. The findings of this study are as follows. Threats to the security of blockchain-based services can be classified into application security threats, smart contract security threats, and network (P2P) security threats. First, application security threats include wallet theft (e-wallet stealing), double spending (double payment attack), and cryptojacking (mining malware infection). Second, smart contract security threats are divided into reentrancy attacks, replay attacks, and balance increasing attacks. Third, network (P2P) security threats are divided into the 51% control attack, Sybil attack, balance attack, eclipse attack (spread false information attack), selfish mining (selfish mining monopoly), block withholding attack, DDoS attack (distributed denial-of-service attack) and DNS/BGP hijacks. Through this study, the future plans of the blockchain technology-based ecosystem can be discussed by understanding the functional characteristics of transparency and partial privacy that can be obtained within the blockchain. The study also supports effective responses to various security threats.

The study of a full cycle semi-automated business process re-engineering: A comprehensive framework

  • Lee, Sanghwa;Sutrisnowati, Riska A.;Won, Seokrae;Woo, Jong Seong;Bae, Hyerim
    • Journal of the Korea Society of Computer and Information / v.23 no.11 / pp.103-109 / 2018
  • This paper presents an idea and a framework for automating full-cycle business process management and re-engineering by integrating traditional business process management systems, process mining, data mining, machine learning, and simulation. We build our framework on a cloud-based platform so that various data sources can be incorporated. We design our systems to be extensible so that they benefit not only BPM practitioners but also researchers, who can use the framework as a test bed without the complication of system integration. The two main contributions of this study are the automation of the redesign phase and of the selection of a baseline process model for deployment. In the redesign phase, we handle both the analysis of the existing process model and a what-if analysis of how to improve the process at the same time. Additionally, business process improvement can be applied on a case-by-case basis, which requires a lot of trial and error and a large amount of data. In selecting the baseline process model, we need to compare many probable routes of business execution and calculate the most efficient one with respect to production cost and execution time. We also discuss the challenges and limitations of the framework, including the system's adoptability, technical difficulties, and human factors.

Study of the Activation Plan for Rural Tourism of the Jeollabuk-do Using Big Data Analysis (빅데이터 분석을 통한 농촌관광 실태와 활성화 방안 연구: 전라북도를 중심으로)

  • Park, Ro Un;Lee, Ki Hoon
    • The Korean Journal of Community Living Science / v.27 no.spc / pp.665-679 / 2016
  • This study examined the main factors for activating rural tourism in Jeollabuk-do using big data analysis. The tourism big data were gathered from public open data sources and social network services (SNS), and three analysis tools were used: opinion mining, text mining, and social network analysis (SNA). The opinion mining and text mining analyses identified the key local contents of the 14 areas of Jeollabuk-do and customers' evaluations of rural tourism. Social network analysis detected the relationships between the contents and determined their importance. The results showed that each location in Jeollabuk-do had its own specific contents attracting visitors and that the number of contents affected the number of tourists. In addition, the number of visitors might be large when a location's tourism contents were strongly correlated with the other contents. Hence, strong connections among contents are a key point for activating rural tourism. Social network analysis divided the contents into several clusters and derived the eigenvector centralities of the content nodes, which indicate their importance in the network. Tourism was active when the nodes with high eigenvector centrality were distributed evenly across the clusters; however, the results were the opposite when those nodes were concentrated in a few clusters. This study suggests an action plan for expanding rural tourism that develops valuable contents and connects the content clusters properly.
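
Eigenvector centrality, the measure used above to rank content nodes, can be sketched with power iteration. This is a minimal illustration on a hypothetical four-node undirected graph, not the study's data or tooling. It assumes a connected graph that contains an odd cycle, so that the iteration converges.

```python
def eigenvector_centrality(adjacency, iterations=100):
    """Power iteration on an undirected graph given as {node: set_of_neighbors}."""
    scores = {node: 1.0 for node in adjacency}
    for _ in range(iterations):
        # Each node's new score is the sum of its neighbors' current scores.
        new = {n: sum(scores[m] for m in adjacency[n]) for n in adjacency}
        norm = sum(v * v for v in new.values()) ** 0.5
        scores = {n: v / norm for n, v in new.items()}
    return scores


# Hypothetical content network: a triangle a-b-c with d attached to c.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
centrality = eigenvector_centrality(graph)
most_central = max(centrality, key=centrality.get)
```

A node scores highly when it is connected to other high-scoring nodes, so `c`, which links the triangle to `d`, ranks first in this toy graph.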