• Title/Summary/Keyword: Mining

Search Result 6,583, Processing Time 0.036 seconds

Distributed Incremental Approximate Frequent Itemset Mining Using MapReduce

  • Mohsin Shaikh;Irfan Ali Tunio;Syed Muhammad Shehram Shah;Fareesa Khan Sohu;Abdul Aziz;Ahmad Ali
    • International Journal of Computer Science & Network Security
    • /
    • v.23 no.5
    • /
    • pp.207-211
    • /
    • 2023
  • Traditional methods for datamining typically assume that the data is small, centralized, memory resident and static. But this assumption is no longer acceptable, because datasets are growing very fast hence becoming huge from time to time. There is fast growing need to manage data with efficient mining algorithms. In such a scenario it is inevitable to carry out data mining in a distributed environment and Frequent Itemset Mining (FIM) is no exception. Thus, the need of an efficient incremental mining algorithm arises. We propose the Distributed Incremental Approximate Frequent Itemset Mining (DIAFIM) which is an incremental FIM algorithm and works on the distributed parallel MapReduce environment. The key contribution of this research is devising an incremental mining algorithm that works on the distributed parallel MapReduce environment.

Enhanced Hybrid Privacy Preserving Data Mining Technique

  • Kundeti Naga Prasanthi;M V P Chandra Sekhara Rao;Ch Sudha Sree;P Seshu Babu
    • International Journal of Computer Science & Network Security
    • /
    • v.23 no.6
    • /
    • pp.99-106
    • /
    • 2023
  • Now a days, large volumes of data is accumulating in every field due to increase in capacity of storage devices. These large volumes of data can be applied with data mining for finding useful patterns which can be used for business growth, improving services, improving health conditions etc. Data from different sources can be combined before applying data mining. The data thus gathered can be misused for identity theft, fake credit/debit card transactions, etc. To overcome this, data mining techniques which provide privacy are required. There are several privacy preserving data mining techniques available in literature like randomization, perturbation, anonymization etc. This paper proposes an Enhanced Hybrid Privacy Preserving Data Mining(EHPPDM) technique. The proposed technique provides more privacy of data than existing techniques while providing better classification accuracy. The experimental results show that classification accuracies have increased using EHPPDM technique.

A Study of Web Usage Mining for eCRM

  • Hyuncheol Kang;Jung, Byoung-Cheol
    • Communications for Statistical Applications and Methods
    • /
    • v.8 no.3
    • /
    • pp.831-840
    • /
    • 2001
  • In this study, We introduce the process of web usage mining, which has lately attracted considerable attention with the fast diffusion of world wide web, and explain the web log data, which Is the main subject of web usage mining. Also, we illustrate some real examples of analysis for web log data and look into practical application of web usage mining for eCRM.

  • PDF

Introduction of Profile of Foreign Mining Company, Yamana Gold, in Argentina (아르헨티나에서 외국광산기업, 야마나 골드, 개요소개)

  • Lee, Han-Yeang
    • The Journal of the Petrological Society of Korea
    • /
    • v.18 no.4
    • /
    • pp.371-379
    • /
    • 2009
  • A famous foreign mining company in Argentina, Yamana Gold, its profile including company history, current and future mining projects, and production are introduced in this paper for the Korean mining companies those are sincerely looking for reliable collaborative partners to deliver the practical company informations.

Globalization in mining. Global, regional, local mining review. Comparative analysis with Kazakhstan mining

  • Bukayeva, Aliya
    • Asia-Pacific Journal of Business Venturing and Entrepreneurship
    • /
    • v.5 no.1
    • /
    • pp.81-91
    • /
    • 2010
  • The article contains comparative analysis of global, regional, local mining review in comparison with the Republic of Kazakhstan. At the article is considered the condition, production and consumption raw materials in the world. For Kazakhstan this branch is one of the most important, which is defining not only the level of the economic development of the country, but also its economical safety, export potential, opportunities for further development. The article represents practical interest for students, masters, doctors, and experts of the branch.

  • PDF

Receiver Operating Characteristic Analysis by Data Mining

  • Rhee Seong-Won;Lee Jea-Young
    • Proceedings of the Korean Statistical Society Conference
    • /
    • 2001.11a
    • /
    • pp.195-197
    • /
    • 2001
  • Data Mining is used to discover patterns and relationships in huge amounts of data. Researchers in many different fields have shown great interest in data mining analysis. Using the classification technique of data mining analysis, the available model for Receiver Operating Characteristic(ROC) method is presented. We present that this may help analyze result of data mining techniques.

  • PDF

Introduction of Profile of Foreign Mining Company, Barric Gold, in Argentina (아르헨티나에서 외국광산기업, 바릭골드, 개요소개)

  • Lee, Han-Yeang
    • The Journal of the Petrological Society of Korea
    • /
    • v.18 no.3
    • /
    • pp.269-278
    • /
    • 2009
  • A famous foreign mining company in Argentina, Barrick Gold, its profile including company history, current and future mining projects, and production are introduced in this paper for the Korean mining companies those are sincerely looking for reliable collaborative partners to deliver the practical company informations.

A Study on the Effect of Using Sentiment Lexicon in Opinion Classification (오피니언 분류의 감성사전 활용효과에 대한 연구)

  • Kim, Seungwoo;Kim, Namgyu
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.1
    • /
    • pp.133-148
    • /
    • 2014
  • Recently, with the advent of various information channels, the number of has continued to grow. The main cause of this phenomenon can be found in the significant increase of unstructured data, as the use of smart devices enables users to create data in the form of text, audio, images, and video. In various types of unstructured data, the user's opinion and a variety of information is clearly expressed in text data such as news, reports, papers, and various articles. Thus, active attempts have been made to create new value by analyzing these texts. The representative techniques used in text analysis are text mining and opinion mining. These share certain important characteristics; for example, they not only use text documents as input data, but also use many natural language processing techniques such as filtering and parsing. Therefore, opinion mining is usually recognized as a sub-concept of text mining, or, in many cases, the two terms are used interchangeably in the literature. Suppose that the purpose of a certain classification analysis is to predict a positive or negative opinion contained in some documents. If we focus on the classification process, the analysis can be regarded as a traditional text mining case. However, if we observe that the target of the analysis is a positive or negative opinion, the analysis can be regarded as a typical example of opinion mining. In other words, two methods (i.e., text mining and opinion mining) are available for opinion classification. Thus, in order to distinguish between the two, a precise definition of each method is needed. In this paper, we found that it is very difficult to distinguish between the two methods clearly with respect to the purpose of analysis and the type of results. We conclude that the most definitive criterion to distinguish text mining from opinion mining is whether an analysis utilizes any kind of sentiment lexicon. We first established two prediction models, one based on opinion mining and the other on text mining. Next, we compared the main processes used by the two prediction models. Finally, we compared their prediction accuracy. We then analyzed 2,000 movie reviews. The results revealed that the prediction model based on opinion mining showed higher average prediction accuracy compared to the text mining model. Moreover, in the lift chart generated by the opinion mining based model, the prediction accuracy for the documents with strong certainty was higher than that for the documents with weak certainty. Most of all, opinion mining has a meaningful advantage in that it can reduce learning time dramatically, because a sentiment lexicon generated once can be reused in a similar application domain. Additionally, the classification results can be clearly explained by using a sentiment lexicon. This study has two limitations. First, the results of the experiments cannot be generalized, mainly because the experiment is limited to a small number of movie reviews. Additionally, various parameters in the parsing and filtering steps of the text mining may have affected the accuracy of the prediction models. However, this research contributes a performance and comparison of text mining analysis and opinion mining analysis for opinion classification. In future research, a more precise evaluation of the two methods should be made through intensive experiments.