• Title/Abstract/Keyword: data-mining method

Search Result 1,369, Processing Time 0.03 seconds

Encoding of XML Elements for Mining Association Rules

  • Hu Gongzhu;Liu Yan;Huang Qiong
    • The Journal of Information Systems / v.14 no.3 / pp.37-47 / 2005
  • Association rule mining finds associations among data items that appear together in transactions or business activities. To date, algorithms for association rule mining, as for other data mining tasks, have mostly been applied to relational databases. As XML is adopted as the universal format for data storage and exchange, mining associations from XML data has become an area of attention for researchers and developers. The challenge is that the semi-structured format of XML is not directly suitable for traditional data mining algorithms and tools. In this paper we present a method for encoding XML tree nodes. The method stores the XML data in a Value Table and a Transaction Table that can be easily accessed via indexing. The hierarchical relationships of the original XML tree structure are embedded in the encoding. We applied this method to association rule mining of XML data that may have missing data.

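The abstract above does not spell out the encoding itself, so the following is only a minimal sketch of the general idea. It assumes a Dewey-style positional code (such as `1.2.1`) as a stand-in for the paper's encoding, and the function names `build_value_table`, `build_transaction_table`, and `is_ancestor` are hypothetical. It shows how a code can carry the tree hierarchy, and how an empty element simply contributes no item to its transaction.

```python
import xml.etree.ElementTree as ET

def build_value_table(root):
    """Map a Dewey-style code (e.g. '1.2.1') to each element's (tag, text).

    Assumption: this positional code is illustrative, not the paper's encoding.
    """
    value_table = {}

    def walk(node, code):
        value_table[code] = (node.tag, (node.text or "").strip())
        for i, child in enumerate(node, start=1):
            walk(child, f"{code}.{i}")

    walk(root, "1")
    return value_table

def is_ancestor(a, b):
    """The hierarchy is recoverable from the codes alone."""
    return b.startswith(a + ".")

def build_transaction_table(value_table):
    """Treat each child of the root as one transaction of 'tag=text' items."""
    table = {}
    for code, (tag, text) in value_table.items():
        parts = code.split(".")
        if len(parts) == 2:
            table.setdefault(code, set())
        elif len(parts) > 2 and text:  # empty (missing) values add no item
            table.setdefault(".".join(parts[:2]), set()).add(f"{tag}={text}")
    return table

doc = ET.fromstring(
    "<orders>"
    "<order><item>milk</item><item>bread</item></order>"
    "<order><item>milk</item><note></note></order>"
    "</orders>"
)
values = build_value_table(doc)
transactions = build_transaction_table(values)
```

The resulting transaction table can be passed to any standard itemset miner, since each transaction is now a flat set of items.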

Feature Selection Methodology in Quality Data Mining

  • Soo, Nam-Ho;Halim, Yulius
    • Proceedings of the Korean Operations and Management Science Society Conference / 2004.05a / pp.698-701 / 2004
  • In much of the literature, data mining has been used to exploit data warehouses and data collections. Its biggest applications are in marketing and research, mainly because the data available in these fields is usually large in volume. Data mining can also be extended to the production process. While the objects of study in marketing data mining are customers and products, data mining in the production field addresses the so-called 4M1E: man, machine, materials, method (recipe), and environment. All of these elements are important to the production process, which determines the quality of the product. Because the final aim of data mining in the production field is production quality, it is commonly known as quality data mining. As the variables studied in quality data mining can number in the hundreds or more, it can take a long time to extract information from the data warehouse. A feature selection methodology is proposed to help the analysis achieve the best performance in a relatively short time. The use of readily available, simple statistical tools in this method can speed up the mining.

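The abstract above does not name the statistical tools used, so the sketch below assumes one common choice: ranking each process variable by the absolute Pearson correlation with a quality measure and keeping the top `k`. The names `pearson` and `select_features` and the sample data are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)

def select_features(columns, target, k):
    """Keep the k variables most correlated (in absolute value) with the target."""
    scores = {name: abs(pearson(vals, target)) for name, vals in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical production variables and a quality measurement per lot.
columns = {
    "temperature": [1, 2, 3, 4],
    "humidity": [5, 1, 4, 2],
    "line_id": [7, 7, 7, 7],
}
quality = [2, 4, 6, 8]
selected = select_features(columns, quality, k=2)
```

A filter of this kind is cheap because it scores each variable once, which is what makes it attractive when there are hundreds of candidates.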

TEMPORAL CLASSIFICATION METHOD FOR FORECASTING LOAD PATTERNS FROM AMR DATA

  • Lee, Heon-Gyu;Shin, Jin-Ho;Ryu, Keun-Ho
    • Proceedings of the KSRS Conference / 2007.10a / pp.594-597 / 2007
  • We present a novel mid- and long-term power load prediction method that uses temporal pattern mining on AMR (Automatic Meter Reading) data. Because power load patterns vary over time and differ markedly by hour, day, week, and so on, traditional data mining alone yields uninformative results. Prior research on data mining for electric load pattern analysis has focused on cluster analysis and classification methods. However, despite the usefulness of rules that include a temporal dimension and the fact that AMR data has temporal attributes, these methods were limited to static pattern extraction and did not consider temporal attributes. We therefore propose a new classification method for predicting power load patterns. The main tasks are clustering and temporal classification. Cluster analysis is used to create load pattern classes and a representative load profile for each class. The classification method then uses the representative load profiles to build a classifier that can assign different load patterns to the existing classes. The proposed classification method is calendar-based temporal mining, which discovers electric load patterns at multiple time granularities. Lastly, we show that the proposed method, applied to AMR data, discovers more interesting patterns.

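The two-stage idea described above (representative profiles per class, then assignment of new load patterns to those classes) can be sketched as follows. This is a simplified illustration that assumes the class memberships are already known and uses a nearest-profile rule; it does not reproduce the paper's calendar-based temporal classifier. The function names and sample readings are hypothetical.

```python
def representative_profiles(profiles_by_class):
    """Average the load profiles of each class, time slot by time slot."""
    reps = {}
    for label, profiles in profiles_by_class.items():
        n = len(profiles)
        reps[label] = [sum(slot) / n for slot in zip(*profiles)]
    return reps

def classify(profile, reps):
    """Assign a profile to the class with the nearest representative profile."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(reps, key=lambda label: squared_distance(profile, reps[label]))

# Hypothetical four-slot load profiles for two classes.
profiles_by_class = {
    "daytime": [[1, 5, 5, 1], [1, 7, 5, 1]],
    "night": [[6, 1, 1, 6], [4, 1, 1, 6]],
}
reps = representative_profiles(profiles_by_class)
label = classify([2, 5, 4, 1], reps)
```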

A Methodology for Searching Frequent Pattern Using Graph-Mining Technique (그래프마이닝을 활용한 빈발 패턴 탐색에 관한 연구)

  • Hong, June Seok
    • Journal of Information Technology Applications and Management / v.26 no.1 / pp.65-75 / 2019
  • As the use of the XML-based semantic web increases in the field of data management, many studies have attempted to extract useful information from data stored in ontologies using association rule mining. Ontology data has the advantage that data can be expressed freely, because it has a flexible and scalable structure, unlike a conventional database with a predefined structure. On the other hand, this makes it difficult to find frequent patterns with a uniform analysis method. The goal of this study is to provide a basis for extracting useful knowledge from an ontology by searching for frequently occurring subgraph patterns, applying transaction-based graph mining techniques to the ontology schema graph data and instance graph data that constitute the ontology. To overcome the structural limitations of existing ontology mining, the proposed frequent pattern search methodology adopts a graph mining approach, applying an iterative node chunking method to find frequent patterns in the graph data structure of the ontology. We expect the suggested methodology to play an important role in knowledge extraction.
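To make "transaction-based" support concrete, here is a deliberately small sketch that counts only single-edge patterns, where each graph is one transaction and a pattern is counted at most once per graph. It is an assumption-laden simplification: it does not implement the iterative node chunking described above, and the triple labels and the name `frequent_edge_patterns` are hypothetical.

```python
from collections import Counter

def frequent_edge_patterns(graphs, min_support):
    """Return labeled edges that occur in at least min_support graphs.

    Each graph is a list of (subject_label, predicate, object_label) triples.
    """
    counts = Counter()
    for edges in graphs:
        counts.update(set(edges))  # count a pattern once per graph transaction
    return {edge: n for edge, n in counts.items() if n >= min_support}

graphs = [
    [("Person", "worksFor", "Company"), ("Person", "knows", "Person")],
    [("Person", "worksFor", "Company"), ("Company", "locatedIn", "City")],
    [("Person", "worksFor", "Company"), ("Person", "knows", "Person"),
     ("Person", "knows", "Person")],
]
frequent = frequent_edge_patterns(graphs, min_support=2)
```

Larger subgraph patterns would be grown from these frequent edges, which is where a full graph mining method does most of its work.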

Mining Spatio-Temporal Patterns in Trajectory Data

  • Kang, Ju-Young;Yong, Hwan-Seung
    • Journal of Information Processing Systems / v.6 no.4 / pp.521-536 / 2010
  • Spatio-temporal patterns extracted from historical trajectories of moving objects reveal important knowledge about movement behavior for high-quality location-based services (LBS). Existing approaches transform trajectories into sequences of location symbols and derive frequent subsequences by applying conventional sequential pattern mining algorithms. However, spatio-temporal correlations may be lost due to inappropriate approximations of spatial and temporal properties. In this paper, we address the problem of mining spatio-temporal patterns from trajectory data. An inefficient description of temporal information decreases both the mining efficiency and the interpretability of the patterns. We provide a formal statement of an efficient representation of spatio-temporal movements and propose a new approach to discovering spatio-temporal patterns in trajectory data. The proposed method first finds meaningful spatio-temporal regions and then extracts frequent spatio-temporal patterns from the sequences of these regions using a prefix-projection approach. Experiments show that the proposed method improves mining performance and derives more intuitive patterns.
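The prefix-projection step mentioned above can be illustrated with a minimal PrefixSpan-style sketch over sequences of region symbols. It assumes that each trajectory has already been reduced to a sequence of single region labels, which leaves out the paper's region discovery and temporal handling. The function name and the sample sequences are hypothetical.

```python
def prefix_projection_mine(sequences, min_support):
    """Return (pattern, support) pairs for frequent sequential patterns."""
    results = []

    def mine(prefix, projected):
        # Count, per item, how many projected sequences contain it.
        counts = {}
        for seq in projected:
            for item in set(seq):
                counts[item] = counts.get(item, 0) + 1
        for item, support in sorted(counts.items()):
            if support < min_support:
                continue
            pattern = prefix + [item]
            results.append((pattern, support))
            # Project: keep the suffix after the first occurrence of the item.
            suffixes = [seq[seq.index(item) + 1:] for seq in projected if item in seq]
            mine(pattern, suffixes)

    mine([], [list(s) for s in sequences])
    return results

# Hypothetical trajectories expressed as sequences of region symbols.
trajectories = [["A", "B", "C"], ["A", "C"], ["B", "C"]]
patterns = {tuple(p): s for p, s in prefix_projection_mine(trajectories, 2)}
```

Because each recursive call works only on the projected suffixes, the search space shrinks as patterns grow.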

Dummy Data Insert Scheme for Privacy Preserving Frequent Itemset Mining in Data Stream (데이터 스트림 빈발항목 마이닝의 프라이버시 보호를 위한 더미 데이터 삽입 기법)

  • Jung, Jay Yeol;Kim, Kee Sung;Jeong, Ik Rae
    • Journal of the Korea Institute of Information Security & Cryptology / v.23 no.3 / pp.383-393 / 2013
  • Data stream mining is a technique for obtaining useful information by analyzing data generated in real time. Within data stream mining, frequent itemset mining finds frequent itemsets while data is being transmitted, and these itemsets are used for pattern analysis and marketing in various fields. Existing frequent itemset mining techniques have a problem: a malicious attacker who sniffs the data can learn the data provider's real-time information. This problem can be solved by inserting dummy data, so that an attacker cannot distinguish the original data from the transmitted data. In this paper, we propose a method for privacy-preserving frequent itemset mining based on dummy data insertion. In addition, the proposed method is computationally efficient because it does not require encryption or other mathematical operations.

Data Mining for Uncertain Data Based on Difference Degree of Concept Lattice

  • Qian Wang;Shi Dong;Hamad Naeem
    • Journal of Information Processing Systems / v.20 no.3 / pp.317-327 / 2024
  • With the rapid development of database technology and the widespread application of database management systems, databases are growing ever larger. Data mining technology has already been widely applied in scientific research, financial investment, marketing, insurance, healthcare, and other fields. We discuss data mining technology and analyze its open problems, which show that research on new data mining methods is of considerable importance. Some previous studies did not consider the differences between attributes, leading to redundancy when constructing concept lattices. This paper proposes a new method of uncertain data mining based on a concept lattice with a connotation difference degree (c_diff). The method defines two rules. The construction of a concept lattice can be accelerated by excluding attributes with poor discriminative power from the process. A new technique for calculating c_diff is also introduced; it does not scan the full database on each layer, which reduces the number of database scans. The experimental results show that the proposed method saves considerable time and improves the accuracy of data mining compared with the U-Apriori algorithm.
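For context on the U-Apriori baseline named above: in uncertain data, each item carries an existence probability, and an itemset is commonly scored by its expected support, the sum over transactions of the product of its items' probabilities. The sketch below shows only that quantity, not the c_diff method proposed in the paper. The function name and sample transactions are hypothetical.

```python
from math import prod

def expected_support(itemset, transactions):
    """Sum, over transactions, the product of the itemset's item probabilities.

    Each transaction maps an item to its existence probability; a transaction
    that lacks any item of the itemset contributes zero.
    """
    total = 0.0
    for t in transactions:
        if all(item in t for item in itemset):
            total += prod(t[item] for item in itemset)
    return total

transactions = [
    {"a": 0.9, "b": 0.5},
    {"a": 0.4},
    {"a": 1.0, "b": 0.2},
]
support_ab = expected_support({"a", "b"}, transactions)
support_a = expected_support({"a"}, transactions)
```

An itemset is then treated as frequent when its expected support reaches a chosen threshold, and computing it naively needs a database scan per candidate level, which is the cost the abstract refers to.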

Receiver Operating Characteristic Analysis by Data Mining

  • Rhee Seong-Won;Lee Jea-Young
    • Proceedings of the Korean Statistical Society Conference / 2001.11a / pp.195-197 / 2001
  • Data mining is used to discover patterns and relationships in huge amounts of data. Researchers in many different fields have shown great interest in data mining analysis. Using the classification technique of data mining analysis, we present a model for the Receiver Operating Characteristic (ROC) method. We suggest that this may help in analyzing the results of data mining techniques.

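As a reminder of what an ROC analysis of a classifier involves, the sketch below sweeps a decision threshold over classifier scores, records the (false positive rate, true positive rate) points, and integrates them with the trapezoidal rule. It is a generic illustration rather than the model from the paper, it assumes both classes are present in the labels, and the names and sample scores are hypothetical.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) points, starting at (0, 0), for descending thresholds."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

labels = [1, 1, 0, 1, 0]            # 1 = positive class
scores = [0.9, 0.8, 0.7, 0.6, 0.2]  # classifier scores for the same cases
points = roc_points(labels, scores)
area = auc(points)
```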

Data Mining Application in Inbound Call Center

  • Lee, Hyun-Woo
    • Journal of the Korean Data and Information Science Society / v.17 no.2 / pp.335-344 / 2006
  • The purpose of this paper is to apply data mining methods to inbound call center optimization. Data mining analysis is used to predict the degree of difficulty of a consultation. The call center then uses the predicted degree of difficulty and the customer grade to route each call to a consultation unit with matching skills, with the aim of maximizing call center efficiency.


IMPLEMENTATION OF SUBSEQUENCE MAPPING METHOD FOR SEQUENTIAL PATTERN MINING

  • Trang, Nguyen Thu;Lee, Bum-Ju;Lee, Heon-Gyu;Ryu, Keun-Ho
    • Proceedings of the KSRS Conference / v.2 / pp.627-630 / 2006
  • Sequential pattern mining addresses the problem of discovering the maximal frequent sequences in a given database. In daily and scientific life, sequential data is available and used everywhere, in forms such as text, weather data, satellite data streams, business transactions, telecommunications records, experimental runs, DNA sequences, and medical record histories. Discovering sequential patterns can help users and scientists predict upcoming activities, interpret recurring phenomena, or extract similarities. To that end, the core of sequential pattern mining is finding the sequences that are contained frequently across all data sequences. Besides discovering frequent itemsets, sequential pattern mining requires arranging those itemsets into sequences and discovering which of those sequences are frequent. So before mining sequences, the main task is to check whether one sequence is a subsequence of another sequence in the database. In this paper, we implement the subsequence matching method as a preprocessing step for sequential pattern mining. The matched sequences in our implementation are normalized sequences in the form of number chains. The result of this method is a report of the matching information between the input mapped sequences.

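The subsequence check described above can be sketched with the usual definition from sequential pattern mining: a candidate is a subsequence of a data sequence if its itemsets can be matched, in order, to supersets among the data sequence's itemsets. The sketch assumes items are already normalized to integers, and it is a generic illustration rather than the paper's implementation. The names and sample data are hypothetical.

```python
def is_subsequence(candidate, sequence):
    """True if each candidate itemset is contained, in order, in a later itemset.

    Greedy earliest matching is sufficient for this containment test.
    """
    pos = 0
    for itemset in candidate:
        while pos < len(sequence) and not set(itemset) <= set(sequence[pos]):
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1  # the next candidate itemset must match a strictly later element
    return True

def support(candidate, database):
    """Number of data sequences that contain the candidate as a subsequence."""
    return sum(1 for sequence in database if is_subsequence(candidate, sequence))

database = [
    [{7}, {3, 8}, {9}, {4, 5, 6}],
    [{3}, {4, 5}],
    [{3, 5}],
]
count = support([{3}, {4, 5}], database)
```

Note that `[{3}, {5}]` is not a subsequence of `[{3, 5}]`, because the two itemsets must match different, ordered elements of the data sequence.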