• Title/Abstract/Keywords: utility mining

52 search results (processing time: 0.021 s)

Performance Analysis of Top-K High Utility Pattern Mining Methods

  • 양흥모;윤은일;김철홍
    • 인터넷정보학회논문지 / Vol. 16, No. 6 / pp.89-95 / 2015
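  • Traditional frequent pattern mining identifies valid patterns whose frequency in a database is no less than a user-defined minimum threshold. Setting an appropriate threshold is not easy because it requires prior knowledge of the domain. Because thresholds therefore do not allow precise control over the mining results, a pattern mining method that does not rely on domain knowledge became necessary. Top-K frequent pattern mining was proposed to solve this problem; it mines the top K important patterns without a threshold setting. By applying it, users can find the patterns from the most frequent down to the K-th most frequent, regardless of the database. Although top-K frequent pattern mining needs no threshold, it does not consider item quantities within transactions or the differing importance of items in the database, so it fails to meet the requirements of many real-world applications. High utility pattern mining was proposed to account for the characteristics of non-binary databases that include item importance, but it requires a minimum threshold. Recently, top-K high utility pattern mining was developed to mine high utility patterns without a threshold setting, which lets users mine the desired number of patterns without prior knowledge. This paper analyzes algorithms for top-K high utility pattern mining. Through a performance analysis of state-of-the-art algorithms, it discusses possible improvements and directions for future development.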
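As a rough illustration of the utility notion described in this abstract (item quantity times item importance), the following Python sketch enumerates every itemset by brute force and keeps the K with the highest utility. It is not any of the algorithms analyzed in the paper; the toy transactions, the `profit` table, and the function names are hypothetical.

```python
from itertools import combinations

# Hypothetical toy data: each transaction maps an item to its purchased quantity.
transactions = [
    {"a": 1, "b": 2},
    {"a": 2, "c": 1},
    {"b": 1, "c": 3},
]
# Hypothetical unit profit (item importance) per item.
profit = {"a": 5, "b": 2, "c": 1}


def utility(itemset, transactions, profit):
    """Sum quantity * profit over every transaction that contains the whole itemset."""
    total = 0
    for t in transactions:
        if all(item in t for item in itemset):
            total += sum(t[item] * profit[item] for item in itemset)
    return total


def top_k_high_utility(transactions, profit, k):
    """Return the k itemsets with the highest utility (exhaustive search, toy sizes only)."""
    items = sorted({item for t in transactions for item in t})
    scored = []
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            u = utility(itemset, transactions, profit)
            if u > 0:
                scored.append((itemset, u))
    # Highest utility first; ties broken by itemset order for determinism.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:k]


print(top_k_high_utility(transactions, profit, 3))
```

In this toy data the itemset `("a", "c")` has a higher utility than its subset `("c",)`, which shows why the support-based pruning used in frequent pattern mining does not carry over directly to utility.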

High Utility Itemset Mining over Uncertain Datasets Based on a Quantum Genetic Algorithm

  • Wang, Ju;Liu, Fuxian;Jin, Chunjie
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 12, No. 8 / pp.3606-3629 / 2018
  • High potential utility itemsets (HPUIs) have significant influence in a variety of areas, such as retail marketing, web click analysis, and biological gene analysis. In this paper, we propose an algorithm called HPUIM-QGA (Mining high potential utility itemsets based on a quantum genetic algorithm) to mine HPUIs over uncertain datasets based on a quantum genetic algorithm (QGA). The proposed algorithm handles the lack of a downward closure property by developing an upper bound of the potential utility (UBPU), which prunes unpromising itemsets at an early stage, and it handles combinatorial explosion by introducing a QGA, which finds optimal solutions quickly and requires only a few parameters. Furthermore, a pruning strategy is designed to avoid the meaningless and redundant itemsets that are generated during the evolution process of the QGA. To evaluate HPUIM-QGA, a substantial number of experiments are performed on runtime, memory usage, the discovered itemsets, and convergence, using real-life and synthetic datasets. The results show that the proposed algorithm is reasonable and acceptable for mining meaningful HPUIs from uncertain datasets.

Deep Learning Framework with Convolutional Sequential Semantic Embedding for Mining High-Utility Itemsets and Top-N Recommendations

  • Siva S;Shilpa Chaudhari
    • Journal of information and communication convergence engineering / Vol. 22, No. 1 / pp.44-55 / 2024
  • High-utility itemset mining (HUIM) is a prominent technology that supports real-time enterprise decisions in areas such as supply chain management, customer segmentation, and business analytics. However, classical support-driven Apriori solutions are limited and cannot meet real-time enterprise demands, especially for large amounts of input data. This study introduces a model for top-N high-utility itemset mining in real-time enterprise applications. Unlike traditional Apriori-based solutions, the proposed model, a multilayer perceptron driven by convolutional sequential embedding metrics and cosine similarity, leverages global and contextual features, including semantic attributes, for enhanced top-N recommendations over sequential transactions. MATLAB-based simulations of the model on diverse datasets yielded a precision of 0.5632, a mean absolute error (MAE) of 0.7610, a hit rate (HR)@K of 0.5720, and a normalized discounted cumulative gain (NDCG)@K of 0.4268. The average MAE across different datasets and latent dimensions was 0.608. Additionally, the model achieved cumulative accuracy and precision of 97.94% and 97.04%, respectively, surpassing existing state-of-the-art models. This affirms the robustness and effectiveness of the proposed model in real-time enterprise scenarios.
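The ranking metrics named in this abstract, HR@K and NDCG@K, are commonly computed as in the Python sketch below. It assumes binary relevance and a single ranked list; the paper's exact evaluation protocol is not given in the abstract, and the function names and sample data are hypothetical.

```python
import math


def hit_rate_at_k(ranked, relevant, k):
    """1.0 if any relevant item appears among the first k recommendations, else 0.0."""
    return 1.0 if any(item in relevant for item in ranked[:k]) else 0.0


def ndcg_at_k(ranked, relevant, k):
    """Normalized discounted cumulative gain with binary relevance."""
    # Position 0 is rank 1, so the discount is log2(rank + 1) = log2(pos + 2).
    dcg = sum(
        1.0 / math.log2(pos + 2)
        for pos, item in enumerate(ranked[:k])
        if item in relevant
    )
    # Ideal ranking places every relevant item (up to k of them) at the top.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


ranked = ["x", "y", "z"]  # hypothetical recommendation list, best first
relevant = {"y"}          # hypothetical held-out item
print(hit_rate_at_k(ranked, relevant, 2), ndcg_at_k(ranked, relevant, 3))
```

Averaging these per-user values over all test users gives the dataset-level HR@K and NDCG@K.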

A Hybrid K-anonymity Data Relocation Technique for Privacy Preserved Data Mining in Cloud Computing

  • S.Aldeen, Yousra Abdul Alsahib;Salleh, Mazleena
    • 인터넷정보학회논문지 / Vol. 17, No. 5 / pp.51-58 / 2016
  • The unprecedented power of cloud computing (CC), which enables free sharing of confidential data records for further analysis and mining, has given rise to various security threats. Strong cyberspace security and mitigation of adversarial attacks during data mining have therefore become essential. Privacy-preserving data mining has emerged as a precise and efficient solution, in which various algorithms are developed to anonymize the data to be mined. Despite the wide use of the generalized K-anonymity approach, its protection and truthfulness remain limited to a tiny output space with unacceptable utility loss. To overcome this limitation, we propose a hybrid K-anonymity data relocation algorithm that combines L-diversity and $(\alpha, k)$-anonymity. Data relocation, a tradeoff between truthfulness and utility, acts as a control input parameter. The performance of each K-anonymity iteration is measured for data relocation. Data rows are grouped into small sets of indistinguishable tuples to create anonymizations of finer granularity with an assured privacy standard. Experimental results demonstrate considerable utility enhancement for a relatively small number of group relocations.
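The two privacy properties this abstract builds on can be checked with a few lines of Python. The sketch below only tests whether an already generalized table is K-anonymous and L-diverse; it does not implement the hybrid data relocation algorithm itself, and the column names and records are hypothetical.

```python
from collections import defaultdict


def equivalence_classes(rows, quasi_ids):
    """Group rows that share the same values on all quasi-identifier columns."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[q] for q in quasi_ids)].append(row)
    return groups


def is_k_anonymous(rows, quasi_ids, k):
    """Every equivalence class must contain at least k indistinguishable rows."""
    return all(len(g) >= k for g in equivalence_classes(rows, quasi_ids).values())


def is_l_diverse(rows, quasi_ids, sensitive, l):
    """Every equivalence class must contain at least l distinct sensitive values."""
    return all(
        len({row[sensitive] for row in g}) >= l
        for g in equivalence_classes(rows, quasi_ids).values()
    )


# Hypothetical generalized records: age and zip are the quasi-identifiers.
rows = [
    {"age": "20-29", "zip": "130**", "disease": "flu"},
    {"age": "20-29", "zip": "130**", "disease": "cold"},
    {"age": "30-39", "zip": "148**", "disease": "flu"},
    {"age": "30-39", "zip": "148**", "disease": "flu"},
]
print(is_k_anonymous(rows, ["age", "zip"], 2))           # both groups have 2 rows
print(is_l_diverse(rows, ["age", "zip"], "disease", 2))  # second group has only "flu"
```

The second group shows why K-anonymity alone can be insufficient: its rows are indistinguishable, yet every one of them reveals the same sensitive value.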

Privacy Preserving Data Mining Methods and Metrics Analysis

  • 홍은주;홍도원;서창호
    • 디지털융복합연구 / Vol. 16, No. 10 / pp.445-452 / 2018
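  • In a world where everything in daily life is being turned into data, the volume of data is growing exponentially. This data is processed into new data through collection and analysis, and the new data is used for a variety of purposes in many fields, including hospitals, finance, and business. However, because the original data contains sensitive information about individuals, there is a risk of exposing personal privacy during collection and analysis. Privacy-preserving data mining (PPDM) is one solution: it extracts useful information from data while preserving privacy. This paper surveys PPDM and analyzes various metrics for evaluating the privacy and utility of data.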

PubMine: An Ontology-Based Text Mining System for Deducing Relationships among Biological Entities

  • Kim, Tae-Kyung;Oh, Jeong-Su;Ko, Gun-Hwan;Cho, Wan-Sup;Hou, Bo-Kyeng;Lee, Sang-Hyuk
    • Interdisciplinary Bio Central / Vol. 3, No. 2 / pp.7.1-7.6 / 2011
  • Background: Published manuscripts are the main source of biological knowledge. Since manual examination is almost impossible due to the huge volume of literature data (approximately 19 million abstracts in PubMed), intelligent text mining systems are of great utility for knowledge discovery. However, most current text mining tools have limited applicability because they i) provide abstract-based rather than sentence-based search, ii) use ontology terms improperly or not at all, iii) are designed for specific subjects, or iv) respond too slowly for web services and real-time applications. Results: We introduce an advanced text mining system called PubMine that supports intelligent knowledge discovery based on diverse bio-ontologies. PubMine improves query accuracy and flexibility with advanced search capabilities: fuzzy search, wildcard search, proximity search, range search, and Boolean combinations of these. Furthermore, PubMine allows users to extract multi-dimensional relationships between genes, diseases, and chemical compounds by using OLAP (On-Line Analytical Processing) techniques. The HUGO gene symbols and the MeSH ontology for diseases, chemical compounds, and anatomy are included in the current version of PubMine, which is freely available at http://pubmine.kobic.re.kr. Conclusions: PubMine is a unique bio-text mining system that provides flexible searches and analysis of biological entity relationships. We believe that PubMine will serve as a key bioinformatics utility because its rapid response enables web services for the community and its flexibility accommodates general ontologies.

Comparison of Performance Measures for Credit-Card Delinquents Classification Models: Measured by Hit Ratio vs. by Utility

  • 정석훈;서용무
    • Journal of Information Technology Applications and Management / Vol. 15, No. 4 / pp.21-36 / 2008
  • As the great disturbance caused by credit card abuse in Korea stabilizes, credit card companies need to interpret credit-card delinquent classification models from the viewpoint of profit. However, the hit ratio, which has been used as a measure of the goodness of classification models, only tells us how correctly they classify rather than how much profit can be obtained by using them. In this research, we developed a new utility-based measure from the viewpoint of profit and used it to analyze two classification models (neural network and decision tree models). We found that the hit ratio of the neural network model is higher than that of the decision tree model, but the utility value of the decision tree model is higher than that of the neural network model. This experiment shows the importance of a utility-based measure for credit-card delinquent classification models. We expect this new measure to contribute to increasing the profits of credit card companies.
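The contrast between hit ratio and a profit-oriented measure can be reproduced on toy data. The Python sketch below uses a hypothetical payoff table and two hypothetical prediction vectors; it is not the utility measure defined in the paper, only an illustration of how the two rankings can disagree.

```python
# Labels: 1 = delinquent, 0 = good customer. Prediction 1 means the card is refused.
# Hypothetical payoff per (actual, predicted) pair, in arbitrary currency units.
PAYOFF = {
    (1, 1): 0,     # delinquent correctly refused: no gain, no loss
    (1, 0): -100,  # delinquent approved: loss from default
    (0, 0): 10,    # good customer approved: profit earned
    (0, 1): 0,     # good customer refused: profit forgone
}


def hit_ratio(actual, predicted):
    """Fraction of cases classified correctly."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def total_utility(actual, predicted):
    """Sum of payoffs over all classification decisions."""
    return sum(PAYOFF[(a, p)] for a, p in zip(actual, predicted))


actual = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
model_a = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]  # hypothetical: never flags anyone
model_b = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]  # hypothetical: cautious, over-flags

for name, pred in (("A", model_a), ("B", model_b)):
    print(name, hit_ratio(actual, pred), total_utility(actual, pred))
```

Under these assumed payoffs, model A classifies more cases correctly but loses money on the two missed delinquents, while model B has a lower hit ratio and a positive total utility.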

A New Approach to Web Data Mining Based on Cloud Computing

  • Zhu, Wenzheng;Lee, Changhoon
    • Journal of Computing Science and Engineering / Vol. 8, No. 4 / pp.181-186 / 2014
  • Web data mining aims at discovering useful knowledge from various Web resources. There is a growing trend among companies, organizations, and individuals alike of gathering information through Web data mining and using that information in their best interest. In science, cloud computing is a synonym for distributed computing over a network; it relies on the sharing of resources to achieve coherence and economies of scale, similar to a utility delivered over a network, and it means the ability to run a program or application on many connected computers at the same time. In this paper, we propose a new system framework based on the Hadoop platform to collect useful information from Web resources. The framework is based on the Map/Reduce programming model of cloud computing. We also propose a new data mining algorithm to be used in this framework. Finally, we demonstrate the feasibility of this approach through a simulation experiment.

On the underground imaging using borehole camera

  • Jeong Yun-Young;Nakagawa Hideaki;Shimada Hideki;Matsui Kikuo;Kim JaeDong
    • 한국지구물리탐사학회:학술대회논문집 / 한국지구물리탐사학회 2003 Proceedings of the international symposium on the fusion technology / pp.52-59 / 2003
  • In primary research, real information about the geologic characteristics of a rock mass can be obtained directly only through image analysis of the borehole wall and of the core recovered from a borehole drilled in the rock mass. Monitoring apparatus with multi-functional utility has been implemented and applied under in-situ conditions to determine the geologic condition of a target area. However, such apparatus is very expensive to deploy given the risk of loss during monitoring, and moving it to the designated position is laborious. This paper presents underground imaging from borehole information obtained by a borehole camera that is simple to use and inexpensive enough for investigating the characteristics of the borehole wall. Monitoring was carried out in an open-pit mine located in the northeastern part of Fukuoka Prefecture, Japan, and the three-dimensional imaging of geological discontinuities is discussed in relation to the field conditions.

General Set Covering for Feature Selection in Data Mining

  • Ma, Zhengyu;Ryoo, Hong Seo
    • Management Science and Financial Engineering / Vol. 18, No. 2 / pp.13-17 / 2012
  • Set covering has widely been accepted as a staple tool for feature selection in data mining. We present a generalized version of this classical combinatorial optimization model to make it better suited for the purpose and propose a surrogate relaxation-based procedure for its meta-heuristic solution. Mathematically and also numerically with experiments on 25 set covering instances, we demonstrate the utility of the proposed model and the proposed solution method.
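For context, the classical set covering view of feature selection can be sketched as follows: each pair of one positive and one negative sample must be separated by at least one selected feature. The Python sketch below solves that classical model with a plain greedy heuristic; it is neither the generalized model nor the surrogate relaxation-based procedure proposed in the paper, and the binary sample data is hypothetical.

```python
def greedy_feature_cover(positives, negatives):
    """Greedily pick features until every (positive, negative) pair differs on one of them."""
    n_features = len(positives[0])
    # Elements to cover: all index pairs (p, q) of a positive and a negative sample.
    uncovered = {(p, q) for p in range(len(positives)) for q in range(len(negatives))}
    # Feature f covers a pair when the two samples take different values on f.
    covers = {
        f: {(p, q) for (p, q) in uncovered if positives[p][f] != negatives[q][f]}
        for f in range(n_features)
    }
    chosen = []
    while uncovered:
        # Pick the feature separating the most still-uncovered pairs (lowest index on ties).
        best = max(range(n_features), key=lambda f: (len(covers[f] & uncovered), -f))
        gain = covers[best] & uncovered
        if not gain:
            raise ValueError("a positive and a negative sample agree on every feature")
        chosen.append(best)
        uncovered -= gain
    return chosen


# Hypothetical binary samples: feature 0 alone separates the two classes.
positives = [(1, 0, 1), (1, 1, 0)]
negatives = [(0, 0, 1), (0, 1, 0)]
print(greedy_feature_cover(positives, negatives))
```

The greedy rule gives no optimality guarantee beyond the usual logarithmic approximation bound for set cover, which is one reason exact or relaxation-based methods are studied for this problem.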