• Title/Abstract/Keywords: Data mining analysis

2,174 search results; processing time 0.039 seconds

FEROM: Feature Extraction and Refinement for Opinion Mining

  • Jeong, Ha-Na;Shin, Dong-Wook;Choi, Joong-Min
    • ETRI Journal / Vol. 33, No. 5 / pp. 720-730 / 2011
  • Opinion mining involves the analysis of customer opinions using product reviews and provides meaningful information including the polarity of the opinions. In opinion mining, feature extraction is important since the customers do not normally express their product opinions holistically but separately according to its individual features. However, previous research on feature-based opinion mining has not had good results due to drawbacks, such as selecting a feature considering only syntactical grammar information or treating features with similar meanings as different. To solve these problems, this paper proposes an enhanced feature extraction and refinement method called FEROM that effectively extracts correct features from review data by exploiting both grammatical properties and semantic characteristics of feature words and refines the features by recognizing and merging similar ones. A series of experiments performed on actual online review data demonstrated that FEROM is highly effective at extracting and refining features for analyzing customer review data and eventually contributes to accurate and functional opinion mining.

Performance Analysis of Perturbation-based Privacy Preserving Techniques: An Experimental Perspective

  • Ritu Ratra;Preeti Gulia;Nasib Singh Gill
    • International Journal of Computer Science & Network Security / Vol. 23, No. 10 / pp. 81-88 / 2023
  • Enormous amounts of data are now produced every second. These data often contain private information from sources such as media platforms, banking, finance, healthcare, and criminal records. Data mining searches and analyzes massive volumes of data to find usable information. Protecting personal data during data mining has become difficult, and privacy-preserving data mining (PPDM) is used for this purpose. Data perturbation is one of several PPDM techniques: datasets are perturbed to protect personal information while addressing both data accuracy and data privacy. This paper explores and compares several perturbation strategies that may be used to protect data privacy. The experiment used two perturbation techniques, one based on random projection and one on principal component analysis: Improved Random Projection Perturbation (IRPP) and Enhanced Principal Component Analysis based Technique (EPCAT). The Naive Bayes classification algorithm is used as the data mining method, and the techniques are assessed by the precision, run time, and accuracy of the experimental results. The random projection-based technique (IRPP) is found to be the best perturbation method under Naive Bayes classification for both the cardiovascular and hypothyroid datasets.
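
As a rough illustration of the general idea behind projection-based perturbation, the sketch below applies a plain Gaussian random projection to a few records. It is not the IRPP or EPCAT technique from the paper, whose details the abstract does not give; the function names, the seed, and the sample records are made-up assumptions for illustration only.

```python
import math
import random


def random_projection(rows, k, seed=0):
    """Perturb each d-dimensional row by projecting it onto k random directions."""
    rng = random.Random(seed)
    d = len(rows[0])
    # Gaussian matrix scaled by 1/sqrt(k), so squared distances are preserved in expectation.
    proj = [[rng.gauss(0.0, 1.0) / math.sqrt(k) for _ in range(k)] for _ in range(d)]
    return [[sum(row[i] * proj[i][j] for i in range(d)) for j in range(k)] for row in rows]


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Made-up numeric records (hypothetical values, not taken from the paper's datasets).
records = [
    [63.0, 145.0, 233.0, 150.0],
    [37.0, 130.0, 250.0, 187.0],
    [41.0, 130.0, 204.0, 172.0],
]
perturbed = random_projection(records, k=3)

# Compare a pairwise distance before and after perturbation.
print(distance(records[0], records[1]), distance(perturbed[0], perturbed[1]))
```

The released values differ from the originals, while pairwise distances are only approximately retained, so a distance-sensitive classifier can still be trained on the perturbed data.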

Analysis of Simultaneous Activities on the Time Use Survey Using Data Mining

  • 남기성;김희재
    • 한국데이터정보과학회:학술대회논문집 / 한국데이터정보과학회 2003 Spring Conference / pp. 159-170 / 2003
  • This paper analyzes simultaneous activities in the time use survey conducted by the Korea National Statistical Office, using the association rule technique of data mining. The National Statistical Office's 1999 survey considered a general analysis of simultaneous activities. With association rules, however, we can find the ratio of particular activities performed at the same time, as well as the probability that another activity is performed when one particular activity is carried out. Using association rules, more advanced and analytical sociological studies become possible.
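
The two quantities mentioned in this abstract correspond to the standard support and confidence measures of association rules. The sketch below computes them on a handful of made-up time-use diaries; the activity names and the data are hypothetical and are not taken from the survey.

```python
def support(diaries, activities):
    """Fraction of diaries in which all the given activities occur together."""
    wanted = set(activities)
    return sum(1 for diary in diaries if wanted <= diary) / len(diaries)


def confidence(diaries, antecedent, consequent):
    """Estimated probability of the consequent activities given the antecedent ones."""
    both = set(antecedent) | set(consequent)
    return support(diaries, both) / support(diaries, antecedent)


# Each set lists activities recorded in the same time slot (made-up data).
diaries = [
    {"eating", "watching TV"},
    {"eating", "talking"},
    {"eating", "watching TV", "talking"},
    {"housework", "listening to radio"},
]

print(support(diaries, {"eating", "watching TV"}))
print(confidence(diaries, {"eating"}, {"watching TV"}))
```

Here support answers "how often do these activities occur together?" and confidence answers "given one activity, how likely is the other?".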

데이터마이닝을 이용한 관측적 침하해석의 신뢰성 연구 (A Study on the Reliability of Observational Settlement Analysis Using Data Mining)

  • 우철웅;장병욱
    • 한국농공학회지 / Vol. 45, No. 6 / pp. 183-193 / 2003
  • Most construction works on soft ground adopt instrumentation to manage the settlement and stability of the embankment. Rapid progress in information technology and digital data acquisition for soft ground instrumentation has led to a fast-growing amount of data. Although valuable information about the behaviour of the soft ground may be hidden in the data, most of it is used only for managing settlement and stability. One of the critical issues in soft ground instrumentation is long-term settlement prediction, for which several observational settlement analysis methods are used. However, the reliability of the analysis results remains unclear. Knowledge could be discovered from a large volume of experience with observational settlement analysis. In this article, we present a database for storing settlement records and a data mining procedure. A large volume of knowledge about observational settlement prediction was collected from the database by applying a filtering algorithm and a knowledge discovery algorithm. Statistical analysis revealed that the reliability of observational settlement analysis depends on stay duration and the estimated degree of consolidation.

환자의 프로세스 로그 정보를 이용한 진단 분석 (Diagnosis Analysis of Patient Process Log Data)

  • 배준수
    • 산업경영시스템학회지 / Vol. 42, No. 4 / pp. 126-134 / 2019
  • Now that big data is available almost everywhere, it can be used to find useful information for improving design and operation through analysis methods such as data mining. In particular, if we have event log data that records the execution history of an organization, with fields such as case_id, event_time, event (activity), and performer, we can apply process mining to discover the organization's main process model. Once the main process has been found, it can be used to improve the current working environment. In this paper we develop a new method, based on process mining, for finding the final diagnosis of a patient who needs several procedures (medical tests and examinations) to be diagnosed. Some patients can be diagnosed with only one procedure, but others are very difficult to diagnose and need several procedures before the exact disease name is found. We used 2 million procedure log records, which include 397 thousand patients who underwent two or more procedures before a final disease was identified. These multi-procedure patients are not the frequent case, but they are critical for preventing wrong diagnoses. From these multi-procedure patients, four procedures were discovered to form the main process model in the hospital. Using this main process model, we can understand the sequence of procedures in the hospital and, furthermore, the relationship between diagnoses and the corresponding procedures.
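
One very simple way to get a first view of a "main process" from such an event log is to count how often each ordered sequence of activities (trace variant) occurs. The sketch below does this for a made-up log with the case_id, event_time, and event fields named in the abstract; it is an illustrative assumption, not the method developed in the paper, and the procedure names are hypothetical.

```python
from collections import Counter, defaultdict


def trace_variants(event_log):
    """Group events by case_id, order them by event_time, and count each activity sequence."""
    cases = defaultdict(list)
    for case_id, event_time, event in event_log:
        cases[case_id].append((event_time, event))
    return Counter(
        tuple(event for _, event in sorted(events)) for events in cases.values()
    )


# Rows are (case_id, event_time, event); all values are made up.
event_log = [
    ("p1", 1, "blood test"), ("p1", 2, "x-ray"),
    ("p2", 1, "blood test"), ("p2", 3, "x-ray"),
    ("p3", 2, "x-ray"), ("p3", 1, "ultrasound"),
]

variants = trace_variants(event_log)
main_variant, count = variants.most_common(1)[0]
print(main_variant, count)
```

The most frequent variant gives the dominant order of procedures across patients; dedicated process mining tools build richer models than this frequency count.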

Analysis and critical estimation of top-ten mineral-raw products mining and export in the Republic of Kazakhstan since Independence in 1991. Priorities of Development. Strategic planning of the East Kazakhstan mining enterprises development

  • Bukayeva, A.D.
    • 벤처창업연구 / Vol. 4, No. 2 / pp. 21-58 / 2009
  • The purpose of this study is to develop scientific-theoretical and practical recommendations for improving the strategic planning of enterprise development in the mining and gold mining sector. The methodology of the research is based on economic theory developed by domestic and foreign scholars. The following methods were applied in processing, generalising, and writing up the materials of the master's thesis: observation, comparison, analysis and synthesis, induction and deduction, statistical grouping, average and relative values, and the systems approach. The theoretical and practical significance of the research is that its results provide a basis for establishing an effective system of strategic planning for the long-term sustainable development of gold mining enterprises, reducing the risk of adopting inefficient strategic decisions. I would like to express many thanks to the NGO "Semey- My Home" and "EastGeoResources" LLP for their help and support in the data collection and data analysis stages of my research from 2006.

ENERGY EFFICIENT BUILDING DESIGN THROUGH DATA MINING APPROACH

  • Hyunjoo Kim;Wooyoung Kim
    • 국제학술발표논문집 / The 3rd International Conference on Construction Engineering and Project Management / pp. 601-605 / 2009
  • The objective of this research is to develop a knowledge discovery framework which can help project teams discover useful patterns to improve energy efficient building design. This paper utilizes the technology of data mining to automatically extract concepts, interrelationships and patterns of interest from a large dataset. By applying data mining technology to the analysis of energy efficient building designs one can identify valid, useful, and previously unknown patterns of energy simulation modeling.

SENSOR DATA MINING TECHNIQUES AND MIDDLEWARE STRUCTURE FOR USN ENVIRONMENT

  • Jin, Cheng-Hao;Lee, Yong-Mi;Kim, Hi-Seok;Pok, Gou-Chol;Ryu, Keun-Ho
    • 대한원격탐사학회:학술대회논문집 / 대한원격탐사학회 2007, Proceedings of ISRS 2007 / pp. 353-356 / 2007
  • With advances in sensor technology, current research on the pertinent techniques is actively directed toward enabling USN computing services. In many sensor network applications, the incoming data are by nature high-speed, continuous, real-time, and unbounded. Because of these characteristics, a finite-sized buffer may in some instances be unable to hold all the incoming data, which leads to inevitable data loss, and the requirement for fast processing makes a thorough investigation of the data impossible. Beyond potential data loss, incoming data in raw form may exhibit a high degree of complexity that defeats simple query or alerting services for capturing and extracting useful information. Furthermore, because traditional mining techniques were developed to handle fixed, static historical data, they are not directly applicable to analyzing sensor data. In this paper, we (1) describe how three mining techniques (sensor data outlier analysis, sensor pattern analysis, and sensor data prediction analysis) suit the USN middleware structure, with their application to stream data in an ocean environment, and (2) propose a middleware structure for the USN environment that is adapted to these mining techniques. This middleware structure includes sensor nodes, a sensor network common interface, a sensor data processor, a sensor query processor, a database, a sensor data mining engine, a user interface, and so on.

데이터 스트림에서 개방 데이터 마이닝 기반의 빈발항목 탐색 (Finding Frequent Itemsets based on Open Data Mining in Data Streams)

  • 장중혁;이원석
    • 정보처리학회논문지D / Vol. 10D, No. 3 / pp. 447-458 / 2003
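  • Conventional data mining methods basically assume that the data set targeted for knowledge discovery is clearly defined before the mining task begins. This assumption is valid when the goal of data mining is to extract the information embedded in a fixed, specific data set. Conventional methods also require considerable processing time to obtain mining results for a large data set. Consequently, they are nearly impossible to apply in a real-time processing environment, where new transactions are continuously added to a data stream and up-to-date mining results that reflect the added transactions are expected as quickly as possible. To meet this need, this paper proposes a new data mining concept, open data mining. In open data mining, the mining result for previously generated transactions is updated as each new transaction occurs, so the mining result for the extended set of all transactions can be obtained quickly. To implement this method effectively, delayed insertion of newly appearing items must be carried out together with pruning of insignificant items among those that appeared in the earlier data set. The proposed algorithm is verified through a series of experiments designed to identify its characteristics.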
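
The abstract names three ingredients: incremental updating of counts per transaction, delayed insertion of newly appearing entries, and pruning of insignificant ones. The sketch below is a heavily simplified illustration of those ideas for single items and item pairs only; it is not the algorithm proposed in the paper, and the class name `OpenMiner`, its thresholds, and the sample transactions are all hypothetical.

```python
from itertools import combinations


class OpenMiner:
    """Keeps running counts that are updated as each transaction arrives."""

    def __init__(self, insert_support=0.3, prune_support=0.1):
        self.n = 0             # transactions seen so far
        self.item_counts = {}  # item -> occurrence count
        self.pair_counts = {}  # (item, item) -> count since monitoring of the pair began
        self.insert_support = insert_support
        self.prune_support = prune_support

    def add(self, transaction):
        self.n += 1
        items = sorted(set(transaction))
        for item in items:
            self.item_counts[item] = self.item_counts.get(item, 0) + 1
        for pair in combinations(items, 2):
            if pair in self.pair_counts:
                self.pair_counts[pair] += 1
            elif all(self.item_counts[i] / self.n >= self.insert_support for i in pair):
                # Delayed insertion: monitor a pair only once both of its items look frequent.
                self.pair_counts[pair] = 1
        # Pruning: stop monitoring pairs whose observed support has become insignificant.
        self.pair_counts = {
            pair: count
            for pair, count in self.pair_counts.items()
            if count / self.n >= self.prune_support
        }

    def frequent_items(self, min_support):
        """Items whose support over all transactions seen so far meets min_support."""
        return {i for i, c in self.item_counts.items() if c / self.n >= min_support}


miner = OpenMiner()
for transaction in [["a", "b"], ["a", "b", "c"], ["a"], ["b", "d"]]:
    miner.add(transaction)
print(miner.frequent_items(0.5), miner.pair_counts)
```

Because a pair is counted only from the moment it is first monitored, its count can underestimate the true support; that trade-off keeps memory bounded while results stay available after every transaction.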

실해역 시험 데이터를 이용한 파일럿 채광로봇 엄빌리컬 케이블의 축진동 해석 (Axial Vibration Analysis of Umbilical Cable with Pilot Mining Robot using Sea Test Data)

  • 민천홍;여태경;홍섭;김형우;최종수;윤석민;김진호
    • 한국해양공학회지 / Vol. 29, No. 2 / pp. 128-134 / 2015
  • Axial vibration analysis is very important for a deep-seabed mining system. In this study, an axial vibration analysis was carried out to estimate the natural frequencies and tensions of the umbilical cable using experimental data obtained from the first pre-pilot mining test. The axial vibrations of the umbilical cable with a pilot mining robot at the bottom end were analytically determined. The range of the added mass coefficients of the pilot mining robot is estimated by comparing the experimental and analytical data. The natural frequencies and maximum tensions are calculated using four estimated added mass coefficients.