A Study on the Effect of Using Sentiment Lexicon in Opinion Classification (오피니언 분류의 감성사전 활용효과에 대한 연구)

  • Kim, Seungwoo;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.20 no.1 / pp.133-148 / 2014
  • Recently, with the advent of various information channels, the volume of available data has continued to grow. The main cause of this phenomenon is the significant increase in unstructured data, as smart devices enable users to create data in the form of text, audio, images, and video. Among the various types of unstructured data, users' opinions and a variety of information are clearly expressed in text data such as news, reports, papers, and articles. Thus, active attempts have been made to create new value by analyzing these texts. The representative techniques used in text analysis are text mining and opinion mining. These share certain important characteristics; for example, both use text documents as input data and both rely on natural language processing techniques such as filtering and parsing. Therefore, opinion mining is usually regarded as a sub-concept of text mining, and in many cases the two terms are used interchangeably in the literature. Suppose that the purpose of a classification analysis is to predict whether the opinion contained in a document is positive or negative. If we focus on the classification process, the analysis can be regarded as a traditional text mining case. However, if we observe that the target of the analysis is a positive or negative opinion, the analysis can be regarded as a typical example of opinion mining. In other words, two methods (i.e., text mining and opinion mining) are available for opinion classification. Thus, in order to distinguish between the two, a precise definition of each method is needed. In this paper, we found that it is very difficult to distinguish between the two methods clearly with respect to the purpose of analysis and the type of results. We conclude that the most definitive criterion for distinguishing text mining from opinion mining is whether an analysis utilizes any kind of sentiment lexicon.
We first established two prediction models, one based on opinion mining and the other on text mining. Next, we compared the main processes used by the two prediction models. Finally, we compared their prediction accuracy on 2,000 movie reviews. The results revealed that the prediction model based on opinion mining showed higher average prediction accuracy than the text mining model. Moreover, in the lift chart generated by the opinion mining based model, the prediction accuracy for documents with strong certainty was higher than that for documents with weak certainty. Above all, opinion mining has a meaningful advantage in that it can reduce learning time dramatically, because a sentiment lexicon generated once can be reused in a similar application domain. Additionally, the classification results can be clearly explained by using a sentiment lexicon. This study has two limitations. First, the results of the experiments cannot be generalized, mainly because the experiment is limited to a small number of movie reviews. Second, various parameters in the parsing and filtering steps of text mining may have affected the accuracy of the prediction models. Nevertheless, this research contributes a performance comparison of text mining analysis and opinion mining analysis for opinion classification. In future research, a more precise evaluation of the two methods should be made through intensive experiments.
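
The role the abstract assigns to the sentiment lexicon can be illustrated with a minimal sketch. The lexicon entries, scores, and function names below are hypothetical and are not taken from the paper; the point is only that a lexicon built once can score new documents without retraining, and that each decision can be traced back to the matched words.

```python
import re

# Hypothetical toy lexicon (word -> polarity score); not from the paper.
SENTIMENT_LEXICON = {
    "great": 1.0, "excellent": 1.0, "enjoyable": 0.5,
    "boring": -1.0, "terrible": -1.0, "weak": -0.5,
}

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())

def lexicon_score(text, lexicon=SENTIMENT_LEXICON):
    """Sum the polarity scores of all tokens found in the lexicon."""
    return sum(lexicon.get(tok, 0.0) for tok in tokenize(text))

def classify(text, lexicon=SENTIMENT_LEXICON):
    """Label a document by the sign of its lexicon score."""
    score = lexicon_score(text, lexicon)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

def explain(text, lexicon=SENTIMENT_LEXICON):
    """List the (token, score) pairs that contributed to the decision."""
    return [(tok, lexicon[tok]) for tok in tokenize(text) if tok in lexicon]
```

A text mining baseline would instead learn term weights from labeled reviews, which is where the learning-time difference mentioned in the abstract would arise.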

Design and Implementation of a Spatial Data Mining System (공간 데이터 마이닝 시스템의 설계 및 구현)

  • Bae, Duck-Ho;Baek, Ji-Haeng;Oh, Hyun-Kyo;Song, Ju-Won;Kim, Sang-Wook;Choi, Myoung-Hoi;Jo, Hyeon-Ju
    • Journal of Korea Spatial Information System Society / v.11 no.2 / pp.119-132 / 2009
  • Owing to GIS technology, a vast volume of spatial data has been accumulated, creating a need for spatial data mining techniques. In this paper, we propose a new spatial data mining system named SD-Miner. SD-Miner consists of three parts: a graphical user interface for inputs and outputs, a data mining module that provides the spatial mining functionalities, and a data storage model that stores and manages spatial as well as non-spatial data by using a DBMS. In particular, the data mining module provides major data mining functionalities such as spatial clustering, spatial classification, spatial characterization, and spatio-temporal association rule mining. SD-Miner has the following characteristics: (1) it lets users perform non-spatial as well as spatial data mining intuitively and effectively; (2) it provides spatial data mining functions in the form of libraries, so that applications can use them conveniently; (3) it takes mining parameters in the form of database tables to increase flexibility. To verify the practicality of SD-Miner, we present meaningful results obtained by performing spatial data mining on real-world spatial data.

In situ investigations into mining-induced overburden failures in close multiple-seam longwall mining: A case study

  • Ning, Jianguo;Wang, Jun;Tan, Yunliang;Zhang, Lisheng;Bu, Tengteng
    • Geomechanics and Engineering / v.12 no.4 / pp.657-673 / 2017
  • Preventing water seepage and inrush into mines where close multiple-seam longwall mining is practiced is a challenging issue in the coal-rich Ordos region, China. To better protect surface (or ground) water and safely extract coal from seams beneath an aquifer, it is necessary to determine the height of the mining-induced fractured zone in the overburden strata. In situ investigations were carried out in panels 20107 (seam No. $2-2^{upper}$) and 20307 (seam No. $2-2^{middle}$) in the Gaojialiang colliery, Shendong Coalfield, China. Longwall mining-induced strata movement and overburden failure were monitored in boreholes using digital panoramic imaging and a deep hole multi-position extensometer. Our results indicate that after mining of the 20107 working face, the overburden of the failure zone can be divided into seven rock groups. The first group lies above the immediate roof (12.9 m above the top of the coal seam), and falls into the gob after the mining. The strata of the second group to the fifth group form the fractured zone (12.9-102.04 m above the coal seam) and the continuous deformation zone extends from the fifth group to the ground surface. After mining Panel 20307, a gap forms between the fifth rock group and the continuous deformation zone, widening rapidly. Then, the lower portion of the continuous deformation zone cracks and collapses into the fractured zone, extending the height of the failure zone to 87.1 m. Based on field data, a statistical formula for predicting the maximum height of overburden failure induced by close multiple seam mining is presented.

Temporal Data Mining Framework (시간 데이타마이닝 프레임워크)

  • Lee, Jun-Uk;Lee, Yong-Jun;Ryu, Geun-Ho
    • The KIPS Transactions:PartD / v.9D no.3 / pp.365-380 / 2002
  • Temporal data mining, the incorporation of temporal semantics into existing data mining techniques, refers to a set of techniques for discovering implicit and useful temporal knowledge from large quantities of temporal data. Temporal knowledge, expressible in the form of rules, is knowledge with temporal semantics and relationships, such as cyclic patterns, calendric patterns, and trends. There are many examples of temporal data from which useful temporal knowledge can be discovered, including patient histories, purchaser histories, and web logs. Many studies on data mining have been pursued, and some of them have addressed temporal data mining issues for discovering temporal knowledge from temporal data, such as sequential patterns, similar time sequences, and cyclic and temporal association rules. However, all of these works treated the data in a database at best as data series in chronological order and did not consider the temporal semantics and temporal relationships contained in the data. To solve this problem, we propose a theoretical framework for temporal data mining. This paper surveys the work to date and explores the issues involved in temporal data mining. We then define a model for temporal data mining, suggest an SQL-like mining language able to express temporal mining tasks, and present the architecture of a temporal mining system.

Analysis of Economic Development Based on Environment Resources in the Mining Sector

  • NAZIR, Munawir;MURDIFIN, Imaduddin;PUTRA, Aditya Halim Perdana Kusuma;HAMZAH, Nasir;MURFAT, Moch Zulkifli
    • The Journal of Asian Finance, Economics and Business / v.7 no.6 / pp.133-143 / 2020
  • The purpose of this study is to investigate the regional economic potential of the mining sector of North Morowali, Central Sulawesi, Indonesia, and to formulate pro-business regional development management that aims to create synergy between the local government and mining sector entrepreneurs. This study uses a descriptive qualitative approach, drawing on primary data from focus group discussions (FGD) and secondary data from the statistical bureau of North Morowali, Indonesia. The analysis uses SWOT analysis to determine the economic potential of North Morowali and the Location Quotient (LQ) to analyze the economic potential of the mining sector. The research period covers one year (2018-2019) in North Morowali, Indonesia. All the mining products have considerable potential as a source of financing in North Morowali, but this mining potential has not been fully exploited. The absence of regulations, of facilities such as road access, and of adequate land and sea transportation makes it difficult to optimize and access mining products comprehensively. As a new region in Central Sulawesi, North Morowali needs more effort and attention from the government as an area with great potential in the mining sector.
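
For reference, the Location Quotient named in the abstract is commonly defined as below. The abstract does not state which variable (output, value added, or employment) or which reference area the authors used, so the symbols here are assumptions made for illustration.

```latex
% Standard Location Quotient for sector i (symbols are illustrative assumptions):
%   e_i : value of sector i in the region (e.g. North Morowali)
%   e   : total value of all sectors in the region
%   E_i : value of sector i in the reference area (e.g. the province or nation)
%   E   : total value of all sectors in the reference area
\[
  LQ_i = \frac{e_i / e}{E_i / E}
\]
% LQ_i > 1 is usually read as the region being relatively specialized in sector i
% (a "base" sector); LQ_i < 1 as a non-base sector.
```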

Sequential Pattern Mining with Optimization Calling MapReduce Function on MapReduce Framework (맵리듀스 프레임웍 상에서 맵리듀스 함수 호출을 최적화하는 순차 패턴 마이닝 기법)

  • Kim, Jin-Hyun;Shim, Kyu-Seok
    • The KIPS Transactions:PartD / v.18D no.2 / pp.81-88 / 2011
  • Sequential pattern mining, which finds frequent patterns appearing in a given set of sequences, is an important data mining problem with broad applications. For example, it can find web access patterns, customers' purchase patterns, and DNA sequences related to specific diseases. In this paper, we develop sequential pattern mining algorithms using the MapReduce framework. Our algorithms distribute input data to several machines and find frequent sequential patterns in parallel. Using synthetic data sets, we conducted a comprehensive performance study while varying several parameters. Our experimental results show that our algorithms achieve linear speedup as the number of machines increases.
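
The general idea of counting sequential patterns with map and reduce steps can be sketched as follows. This is not the authors' algorithm: it is a single-process simulation restricted to length-2 patterns, and all function names are hypothetical. In a real MapReduce job, the map calls would run on different machines over separate partitions of the input.

```python
from collections import defaultdict
from itertools import combinations

def map_phase(sequence):
    """Emit (pattern, 1) once for each distinct ordered pair in one sequence."""
    # combinations() preserves input order, so (x, y) means x occurs before y.
    patterns = set(combinations(sequence, 2))
    return [(pattern, 1) for pattern in patterns]

def shuffle(mapped):
    """Group emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    return groups

def reduce_phase(pattern, counts, min_support):
    """Sum the counts for one pattern; keep it only if it is frequent."""
    total = sum(counts)
    return (pattern, total) if total >= min_support else None

def frequent_pairs(sequences, min_support):
    """Run map, shuffle, and reduce in sequence and collect frequent pairs."""
    mapped = [kv for seq in sequences for kv in map_phase(seq)]
    result = {}
    for pattern, counts in shuffle(mapped).items():
        reduced = reduce_phase(pattern, counts, min_support)
        if reduced is not None:
            result[reduced[0]] = reduced[1]
    return result
```

For example, `frequent_pairs([["a", "b", "c"], ["a", "c"], ["b", "a", "c"]], 2)` keeps the pairs that occur in at least two of the three sequences.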

IMPLEMENTATION OF SUBSEQUENCE MAPPING METHOD FOR SEQUENTIAL PATTERN MINING

  • Trang, Nguyen Thu;Lee, Bum-Ju;Lee, Heon-Gyu;Ryu, Keun-Ho
    • Proceedings of the KSRS Conference / v.2 / pp.627-630 / 2006
  • Sequential pattern mining addresses the problem of discovering the maximal frequent sequences in a given database. In daily and scientific life, sequential data are available and used everywhere in forms such as text, weather data, satellite data streams, business transactions, telecommunications records, experimental runs, DNA sequences, and histories of medical records. Discovering sequential patterns can help users and scientists predict upcoming activities, interpret recurring phenomena, or extract similarities. To that end, the core of sequential pattern mining is finding the sequences that are contained frequently in the data sequences. Besides the discovery of frequent itemsets, sequential pattern mining requires arranging those itemsets into sequences and discovering which of those sequences are frequent. Therefore, before mining sequences, the main task is checking whether one sequence is a subsequence of another sequence in the database. In this paper, we implement the subsequence matching method as a preprocessing step for sequential pattern mining. The sequences matched in our implementation are normalized sequences in the form of number chains. The result produced by this method is a summary of the matching information between the input mapped sequences.
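
The subsequence check described in the abstract can be sketched as below. The representation (a sequence as a list of itemsets of integers, echoing the "number chain" normalization) and the function names are assumptions for illustration; the paper's own implementation is not given in the abstract.

```python
def is_subsequence(candidate, sequence):
    """Return True if every itemset of `candidate` is contained, in order,
    in a distinct itemset of `sequence` (both are lists of sets of ints)."""
    pos = 0
    for itemset in candidate:
        # Advance to the earliest remaining element that contains this itemset.
        while pos < len(sequence) and not set(itemset) <= set(sequence[pos]):
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1  # The next candidate itemset must match a later element.
    return True

def support_count(candidate, database):
    """Count how many data sequences in the database contain the candidate."""
    return sum(is_subsequence(candidate, seq) for seq in database)
```

Matching each itemset at the earliest possible position is sufficient here, because an earlier match never rules out a later one.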

Design and Implementation of Opinion Mining System based on Association Model (연관성 모델에 기반한 오피년마이닝 시스템의 설계 및 구현)

  • Kim, Keun-Hyung
    • Journal of the Korea Institute of Information and Communication Engineering / v.15 no.1 / pp.133-140 / 2011
  • For both customers and companies, it is very important to analyze online customer reviews, which are short documents containing opinions or experiences about products or services, because customers can obtain useful information and companies can establish good marketing strategies. In this paper, we propose an association model for opinion mining that can analyze customer opinions posted on the web. The association model modifies the association rule mining model used in data mining so that association mining techniques can be applied efficiently and effectively to opinion mining. We designed and implemented an opinion mining system based on the modified association model and a grouping idea that enables it to generate more significant rules.
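
The standard association rule measures that the proposed model builds on can be sketched as follows. The abstract does not describe how the model modifies them, so this shows only plain support and confidence, treating each review as a set of terms; the function names and the sample terms are hypothetical.

```python
def support(itemset, transactions):
    """Fraction of transactions (sets of terms) that contain every item."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimate P(consequent | antecedent) from the transactions."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0  # The rule never applies, so report zero confidence.
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / base
```

With reviews encoded as term sets, a rule such as `{"battery"} -> {"good"}` would link a product feature to an opinion word, and its confidence indicates how often that opinion accompanies the feature.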