• Title/Summary/Keyword: data preprocessing

Influence of Data Preprocessing

  • Zhu, Changming;Gao, Daqi
    • Journal of Computing Science and Engineering / v.10 no.2 / pp.51-57 / 2016
  • In this paper, we study the influence of data preprocessing. We conclude that different preprocessing methods lead to different classification performance. Moreover, not all data preprocessing methods are necessary, and we give a criterion for determining which preprocessing is necessary and which is effective. Experiments on several real-world data sets validate that different data preprocessing methods produce different effects. Furthermore, experiments with several algorithms under different preprocessing methods confirm that preprocessing has a great influence on the performance of a classifier.

An Effective Smart Greenhouse Data Preprocessing System for Autonomous Machine Learning (자율 기계 학습을 위한 효과적인 스마트 온실 데이터 전처리 시스템)

  • Jongtae Lim;RETITI DIOP EMANE Christopher;Yuna Kim;Jeonghyun Baek;Jaesoo Yoo
    • Smart Media Journal / v.12 no.1 / pp.47-53 / 2023
  • Recently, there has been active research on smart farms, which create new value by combining information and communication technology (ICT) with agriculture. For domestic smart farm technology to reach the productivity of advanced agricultural countries, automated decision-making using machine learning is necessary. However, current smart greenhouse data collection technologies in our country are not sufficient for big data analysis or machine learning. In this paper, we design and implement a smart greenhouse data preprocessing system for autonomous machine learning. The proposed system applies various preprocessing techniques to the target data, evaluates the performance of each technique, and stores the optimal preprocessing technique for each data set. The stored optimal techniques are then used to preprocess newly collected data.

Improvement of A Preprocessing of Archived Traffic Data Collected by Expressway Vehicle Detection System (고속도로 차량검지기 이력자료 활용을 위한 전처리과정 개선)

  • Lee, Hwan-Pil;NamKoong, Seong;Kim, Soo-Hee;Kim, Jin
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.12 no.1 / pp.15-27 / 2013
  • Information collected by vehicle detectors has mainly been used as real-time data. Recently, making use of archived traffic data has become increasingly important. Against this background, this research was conducted to improve the preprocessing applied to archived traffic data. The purpose of the improved preprocessing was to make the traffic data reflect actual transportation phenomena. In the evaluation, the improved preprocessing produced values closer to the actual values than the existing preprocessing did.

Electric Load Forecasting using Data Preprocessing and Fuzzy Logic System (데이터 전처리와 퍼지 논리 시스템을 이용한 전력 부하 예측)

  • Bang, Young-Keun;Lee, Chul-Heui
    • The Transactions of The Korean Institute of Electrical Engineers / v.66 no.12 / pp.1751-1758 / 2017
  • This paper presents a fuzzy logic system with data preprocessing for building an accurate electric power load prediction system. The fuzzy logic system handles the hidden characteristics of nonlinear data well. The data preprocessing transforms the original data to provide more information about its characteristics. Thus, the combination of the two methods can predict the given data more accurately. The former uses a TSK fuzzy logic system to apply a linguistic rule base and a linear regression model, while the latter uses the linear interpolation method. Finally, electric power load data from four regions in Taiwan are used to evaluate the performance of the proposed prediction system.
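
The abstract above names linear interpolation as the preprocessing step but does not say how it is applied. As a minimal sketch, assuming it is used to fill missing readings in a load series, the idea could look like the following. The function name `interpolate_missing` and the choice to fill leading and trailing gaps with the nearest known value are illustrative assumptions, not the authors' method.

```python
from typing import List, Optional


def interpolate_missing(series: List[Optional[float]]) -> List[float]:
    """Fill None entries by linear interpolation between known neighbors.

    Assumption for this sketch: gaps at the start or end of the series
    are filled with the nearest known value.
    """
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("series has no known values")

    filled = list(series)

    # Leading and trailing gaps: copy the nearest known value.
    for i in range(known[0]):
        filled[i] = series[known[0]]
    for i in range(known[-1] + 1, len(series)):
        filled[i] = series[known[-1]]

    # Interior gaps: straight line between the two surrounding known points.
    for left, right in zip(known, known[1:]):
        step = (series[right] - series[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = series[left] + step * (i - left)

    return filled


if __name__ == "__main__":
    hourly_load = [10.0, None, None, 40.0]  # made-up values for illustration
    print(interpolate_missing(hourly_load))
```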

STATISTICALLY PREPROCESSED DATA BASED PARAMETRIC COST MODEL FOR BUILDING PROJECTS

  • Sae-Hyun Ji;Moonseo Park;Hyun-Soo Lee
    • International conference on construction engineering and project management / 2009.05a / pp.417-424 / 2009
  • For a construction project to progress smoothly, effective cost estimation is vital, particularly in the conceptual and schematic design stages. In these early phases, although initial estimates are highly sensitive to changes in project scope, owners require accurate forecasts that reflect the information they supply. Thus, cost estimators need effective estimation strategies. In practice, parametric cost estimates, which use historical cost data, are the most commonly used method in these initial phases (Karshenas 1984, Kirkham 2007). Hence, compiling historical data on the parameters that govern cost variance is a prime requirement. However, a preliminary data mining (data preprocessing) step that removes internal errors and abnormal values is needed before compilation. To address this issue, this research proposes a statistical methodology for data preprocessing and verifies that data preprocessing improves the accuracy and stability of estimates. Moreover, Statistically Preprocessed data Based Parametric (SPBP) cost models are developed from multiple regression equations, and their effectiveness is verified against conventional cost models.

A Study on the Data Mining Preprocessing Tool For Efficient Database Marketing (효율적인 데이터베이스 마케팅을 위한 데이터마이닝 전처리도구에 관한 연구)

  • Lee, Jun-Seok
    • Journal of Digital Convergence / v.12 no.11 / pp.257-264 / 2014
  • This paper presents the construction of a data mining preprocessing tool for efficient database marketing. We compare and evaluate frequently used data mining tools based on their methods for accessing local and remote databases and on how they exchange information resources between different computers. The data mining tools whose preprocessing is evaluated are Answer Tree, Clementine, Enterprise Miner, Kensington, and Weka. We propose a design principle for an efficient data preprocessing system for data mining on distributed networks. This system is based on Java technology, including EJB (Enterprise Java Beans) and XML (eXtensible Markup Language).

Prediction of the price for stock index futures using integrated artificial intelligence techniques with categorical preprocessing

  • Kim, Kyoung-jae;Han, Ingoo
    • Proceedings of the Korean Operations and Management Science Society Conference / 1997.10a / pp.105-108 / 1997
  • Previous studies of stock market prediction using artificial intelligence techniques, such as artificial neural networks and case-based reasoning, have focused mainly on spot market prediction. Korea launched trading in the index futures market (KOSPI 200) on May 3, 1996, and more people have since been attracted to this market. Thus, this research aims to predict the daily up/down direction of the price of KOSPI 200 index futures to meet this recent surge of interest. The forecasting methodologies employed in this research are the integration of a genetic algorithm and an artificial neural network (GAANN) and the integration of a genetic algorithm and case-based reasoning (GACBR). The genetic algorithm was mainly used to select relevant input variables. This study adopts categorical data preprocessing based on expert knowledge as well as traditional data preprocessing. The experimental results of each forecasting method with each data preprocessing method are compared and statistically tested. The artificial neural network and case-based reasoning methods with the best performance are integrated. Out-of-the Model Integration and In-Model Integration are presented as the integration methodologies. The research outcomes are as follows. First, genetic algorithms are a useful and effective method for selecting input variables for AI techniques. Second, the experiments with categorical data preprocessing significantly outperform those with traditional data preprocessing in forecasting the up/down direction of the index futures price. Third, the integration of a genetic algorithm and case-based reasoning (GACBR) outperforms the integration of a genetic algorithm and an artificial neural network (GAANN). Fourth, the integrations of genetic algorithm, case-based reasoning, and artificial neural network (GAANN-GACBR, GACBRNN and GANNCBR) provide worse results than GACBR.

Big Data Preprocessing for Predicting Box Office Success (영화 흥행 실적 예측을 위한 빅데이터 전처리)

  • Jun, Hee-Gook;Hyun, Geun-Soo;Lim, Kyung-Bin;Lee, Woo-Hyun;Kim, Hyoung-Joo
    • KIISE Transactions on Computing Practices / v.20 no.12 / pp.615-622 / 2014
  • The Korean film market has rapidly achieved an international scale, and this has led to a need for decision-making based on analytical methods that are more precise and appropriate. In this modern era, a highly advanced information environment can provide an overwhelming amount of data that is generated in real time, and this data must be properly handled and analyzed in order to extract useful information. In particular, the preprocessing of large data, which is the most time-consuming step, should be done in a reasonable amount of time. In this paper, we investigated a big data preprocessing method for predicting movie box office success. We analyzed the movie data characteristics for specialized preprocessing methods, and used the Hadoop MapReduce framework. The experimental results showed that the preprocessing methods using big data techniques are more effective than existing methods.

A Nonparametric Approach for Noisy Point Data Preprocessing

  • Xi, Yongjian;Duan, Ye;Zhao, Hongkai
    • International Journal of CAD/CAM / v.9 no.1 / pp.31-36 / 2010
  • 3D point data acquired from laser scans or stereo vision can be quite noisy. A preprocessing step is often needed before a surface reconstruction algorithm can be applied. In this paper, we propose a nonparametric approach for noisy point data preprocessing. In particular, we propose an anisotropic kernel-based nonparametric density estimation method for outlier removal, and a hill-climbing line search approach for projecting data points onto the real surface boundary. Our approach is simple, robust and efficient. We demonstrate our method on both real and synthetic point datasets.
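
To illustrate the general idea of density-based outlier removal mentioned in the abstract above, here is a minimal sketch. It is a simplification rather than the paper's method: it uses an isotropic Gaussian kernel instead of the anisotropic kernel the authors describe, and the names `kernel_density` and `remove_outliers`, the `bandwidth`, and the `min_density` threshold are all hypothetical choices made for this example.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def kernel_density(points: Sequence[Point], bandwidth: float) -> List[float]:
    """Unnormalized leave-one-out Gaussian kernel density at each point.

    Simplification: the kernel is isotropic (same bandwidth in every
    direction), unlike the anisotropic kernel named in the abstract.
    """
    inv = 1.0 / (2.0 * bandwidth * bandwidth)
    densities = []
    for i, p in enumerate(points):
        total = 0.0
        for j, q in enumerate(points):
            if i == j:
                continue  # skip the point itself
            dist_sq = sum((a - b) ** 2 for a, b in zip(p, q))
            total += math.exp(-dist_sq * inv)
        densities.append(total)
    return densities


def remove_outliers(
    points: Sequence[Point], bandwidth: float, min_density: float
) -> List[Point]:
    """Keep only points whose estimated density reaches min_density."""
    densities = kernel_density(points, bandwidth)
    return [p for p, d in zip(points, densities) if d >= min_density]


if __name__ == "__main__":
    # Four points near the origin plus one isolated point (made-up data).
    cloud = [
        (0.0, 0.0, 0.0),
        (0.1, 0.0, 0.0),
        (0.0, 0.1, 0.0),
        (0.1, 0.1, 0.0),
        (5.0, 5.0, 5.0),
    ]
    print(remove_outliers(cloud, bandwidth=0.2, min_density=0.5))
```

The pairwise loop is quadratic in the number of points, so a real point cloud would need a spatial index; the sketch only shows the thresholding idea.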

PREPROCESSING OF THE GPS RAW DATA FOR THE PRECISION ORBIT DETERMINATION BY DGPS TECHNIQUE (DGPS 방식에 의한 위성의 정밀궤도 결정을 위한 GPS 원시 자료 전처리)

  • 문보연;이정숙;이병선;김재훈;박은서;윤재철;노경민;최규홍
    • Journal of Astronomy and Space Sciences / v.19 no.2 / pp.163-172 / 2002
  • This article investigates the problem of data preprocessing for the precision orbit determination (POD) of a low earth orbit satellite using GPS raw data. Several data preprocessing algorithms have been developed to edit the GPS data automatically, such as outlier deletion, cycle slip identification and correction, and time tag error correction. The GPS data are edited precisely to ensure the accuracy of POD. Some data preprocessing methods are restricted by the rate at which the pseudorange and carrier phase measurements are collected. This study considers how preprocessing efficiency varies with the rate, the quality of the receiver, and the altitude of the satellite's orbit. We also propose suitable methods, according to the rate, for single frequency and dual frequency receivers.