• Title/Summary/Keyword: Data Preprocessing


Research on Data Preprocessing Techniques for Efficient Decision-Making in Food Import Procedures (식품 수입 절차에서의 효율적 의사결정을 위한 데이터 전처리 기술에 관한 연구)

  • Jae-Hyeong Park;Yong-Uk Song;Ju-Young Kang
    • The Journal of Bigdata
    • /
    • v.8 no.1
    • /
    • pp.61-71
    • /
    • 2023
  • With the development of data-driven decision-making and sophisticated big data processing techniques, there is a growing demand for information on how to process data. However, recent studies mention data preprocessing only as a means to achieve a result. Therefore, in this study, we aimed to describe the data processing pipeline in detail, including data preprocessing. In particular, we share the context and domain knowledge to aid understanding of the research.

Analyzing Preprocessing for Correcting Lighting Effects in Hyperspectral Images (초분광영상의 조명효과 보정 전처리기법 분석)

  • Yeong-Sun Song
    • Journal of the Korean Society of Industry Convergence
    • /
    • v.26 no.5
    • /
    • pp.785-792
    • /
    • 2023
  • Because hyperspectral imaging provides detailed spectral information across a broad range of wavelengths, it can be utilized in numerous applications, including environmental monitoring, food quality inspection, medical diagnosis, material identification, art authentication, and crime scene analysis. However, hyperspectral images often contain various types of distortions due to the environmental conditions during image acquisition, and these distortions must be properly removed through data preprocessing. In this study, a preprocessing method was investigated to effectively correct the distortion caused by artificial light sources used in indoor hyperspectral imaging. For this purpose, a halogen-tungsten artificial light source was installed indoors, and hyperspectral images were acquired. The acquired images were then corrected for distortion using preprocessing methods that do not require complex auxiliary equipment. After the corrections were made, the results were analyzed. According to the analysis, a statistical transformation technique that uses the mean and standard deviation relative to a reference signal was found to be the most effective in correcting distortions caused by artificial light sources.
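
The abstract does not give the formula for its mean and standard deviation transformation, so the following is only a minimal sketch of one common reading: each measured spectrum is standardized and then rescaled to the mean and standard deviation of a reference signal. The function name `match_to_reference` and the sample values are hypothetical and are not taken from the paper.

```python
from statistics import mean, pstdev


def match_to_reference(spectrum, reference):
    """Rescale `spectrum` so its mean and standard deviation match `reference`.

    Assumption: this is one plausible form of a mean/std statistical
    transformation; the paper's exact method is not stated in the abstract.
    """
    mu_s, sd_s = mean(spectrum), pstdev(spectrum)
    mu_r, sd_r = mean(reference), pstdev(reference)
    if sd_s == 0:
        raise ValueError("spectrum has zero variance and cannot be rescaled")
    # Standardize the spectrum, then map it onto the reference statistics.
    return [(x - mu_s) / sd_s * sd_r + mu_r for x in spectrum]


# Hypothetical band values for one pixel and for a reference signal.
measured = [0.2, 0.4, 0.9, 0.5]
reference = [0.1, 0.3, 0.6, 0.4]
corrected = match_to_reference(measured, reference)
```

After the transformation, the corrected spectrum keeps its shape but shares the reference signal's mean and spread, which is what removes a global offset and gain introduced by the light source under this assumption.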

Preprocessing of Transmitted Spectrum Data for Development of a Robust Non-destructive Sugar Prediction Model of Intact Fruits (과실의 비파괴 당도 예측 모델의 성능향상을 위한 투과스펙트럼의 전처리)

  • Noh, Sang-Ha;Ryu, Dong-Soo
    • Journal of the Korean Society for Nondestructive Testing
    • /
    • v.22 no.4
    • /
    • pp.361-368
    • /
    • 2002
  • The aim of this study was to investigate the effect of preprocessing the transmitted energy spectrum data on the development of a robust model to predict the sugar content of intact apples. The spectrum data were measured from 120 Fuji apple samples conveyed at a speed of 2 apples per second. Computer algorithms for preprocessing methods such as MSC, SNV, first derivative, OSC, and their combinations were developed and applied to the raw spectrum data set. The results indicated that the correlation coefficients between the transmitted energy values at each wavelength and the sugar contents of the apples were significantly improved by preprocessing, MSC and SNV in particular, compared with no preprocessing. The SEPs of the prediction models differed greatly depending on the preprocessing method applied to the raw spectrum data, with the largest being 1.265%brix and the smallest 0.507%brix. This result means that an appropriate preprocessing method matching the characteristics of the spectrum data set should be found or developed to minimize prediction errors. It was observed that MSC and SNV are closely related to prediction accuracy, that OSC is related to the number of PLS factors, and that the first derivative decreased the prediction accuracy. A robust calibration model could be developed by the combined preprocessing of MSC and OSC, which gave SEP=0.507%brix, bias=0.0327, and R²=0.8823.
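
As an illustration of two of the methods named above, here is a minimal sketch of the textbook forms of SNV (standard normal variate) and MSC (multiplicative scatter correction). It is not the authors' implementation, and the sample spectra are made up for the example.

```python
from statistics import mean, stdev


def snv(spectrum):
    """Standard normal variate: center and scale one spectrum by its own statistics."""
    mu, sd = mean(spectrum), stdev(spectrum)
    return [(x - mu) / sd for x in spectrum]


def msc(spectra, reference=None):
    """Multiplicative scatter correction.

    Each spectrum is regressed on a reference spectrum (the mean spectrum by
    default) as x = intercept + slope * reference, then corrected to
    (x - intercept) / slope.
    """
    if reference is None:
        reference = [mean(col) for col in zip(*spectra)]
    ref_mean = mean(reference)
    sxx = sum((r - ref_mean) ** 2 for r in reference)
    corrected = []
    for s in spectra:
        s_mean = mean(s)
        slope = sum((r - ref_mean) * (x - s_mean) for r, x in zip(reference, s)) / sxx
        intercept = s_mean - slope * ref_mean
        corrected.append([(x - intercept) / slope for x in s])
    return corrected


# Hypothetical example: two spectra that differ from a base shape only by gain and offset.
base = [1.0, 2.0, 4.0, 3.0]
spectra = [[2.0 * b + 0.5 for b in base], [0.5 * b - 0.2 for b in base]]
msc_out = msc(spectra, reference=base)
snv_out = snv(spectra[0])
```

With these inputs, MSC maps both spectra back onto the base shape, while SNV gives each spectrum zero mean and unit sample standard deviation without needing a reference.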

Effectiveness of Normalization Pre-Processing of Big Data to the Machine Learning Performance (빅데이터의 정규화 전처리과정이 기계학습의 성능에 미치는 영향)

  • Jo, Jun-Mo
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.14 no.3
    • /
    • pp.547-552
    • /
    • 2019
  • Recently, the massive growth in the scale of data has become a major issue in Big Data. Furthermore, because Big Data also serves as the input to machine learning, it should be normalized during preprocessing to achieve high machine learning performance. The performance varies with many factors, such as the scope of the columns in the Big Data and the normalization preprocessing method. In this paper, various normalization preprocessing methods and column scopes are applied to an SVM (Support Vector Machine) as the machine learning method in order to find an efficient environment for normalization preprocessing. The machine learning experiment was programmed in Python in Jupyter Notebook.
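
The abstract does not list which normalization methods or column scopes were tested. The sketch below assumes two common choices, min-max scaling and z-score standardization, and applies them only to a selected subset of columns. The helper names and the sample rows are hypothetical.

```python
from statistics import mean, pstdev


def min_max(column):
    """Scale a column linearly into the range [0, 1]."""
    lo, hi = min(column), max(column)
    span = hi - lo
    return [0.0 if span == 0 else (x - lo) / span for x in column]


def z_score(column):
    """Standardize a column to zero mean and unit (population) standard deviation."""
    mu, sd = mean(column), pstdev(column)
    return [0.0 if sd == 0 else (x - mu) / sd for x in column]


def normalize_columns(rows, columns, method):
    """Apply `method` only to the column indices in `columns` (the column scope)."""
    cols = [list(c) for c in zip(*rows)]
    for j in columns:
        cols[j] = method(cols[j])
    return [list(r) for r in zip(*cols)]


# Hypothetical data set: column 0 is left untouched, column 1 is rescaled.
rows = [[1.0, 100.0], [2.0, 200.0], [3.0, 400.0]]
scaled = normalize_columns(rows, columns=[1], method=min_max)
```

The normalized rows would then be passed to an SVM implementation of the reader's choice. Which library the paper used is not stated in the abstract.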

Design of a real-time image preprocessing system with linescan camera interface (라인스캔 카메라 인터페이스를 갖는 실시간 영상 전처리 시스템의 설계)

  • Lyou, Kyeong;Kim, Kyeong-Min;Park, Gwi-Tae
    • Journal of Institute of Control, Robotics and Systems
    • /
    • v.3 no.6
    • /
    • pp.626-631
    • /
    • 1997
  • This paper presents the design of a real-time image preprocessing system. The preprocessing system performs mask operations and thresholding operations in hardware at the camera output signal rate. The preprocessing system consists of the preprocessing board and the main processing board. The preprocessing board includes a preprocessing unit with a $5\times5$ mask processor and a LUT, and can perform mask and threshold operations in real time. To acquire high-resolution image input data ($20485\times n$), the preprocessing board has a linescan camera interface. The main processing board includes the image processor unit and the main processor unit. The image processor unit is equipped with TI's TMS320C32 DSP and can run image processing algorithms at high speed. The main processor unit controls the operation of the whole system. The proposed system is faster than a conventional CPU-based system.
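
The paper implements the mask and threshold operations in hardware. The following is only a software sketch of what a $5\times5$ mask operation followed by a lookup-table (LUT) threshold computes. The box mask, the divisor, the threshold level, and the decision to leave border pixels at zero are all assumptions made for illustration.

```python
def apply_mask(image, mask, divisor=1):
    """Slide a square mask over the image (correlation form, no kernel flip).

    Border pixels, where the mask would fall outside the image, are left at 0.
    """
    k = len(mask) // 2
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(k, h - k):
        for x in range(k, w - k):
            acc = 0
            for dy in range(-k, k + 1):
                for dx in range(-k, k + 1):
                    acc += mask[dy + k][dx + k] * image[y + dy][x + dx]
            out[y][x] = acc // divisor
    return out


def threshold_lut(level):
    """Build a 256-entry lookup table that maps 8-bit values to 0 or 255."""
    return [255 if v >= level else 0 for v in range(256)]


def apply_lut(image, lut):
    """Replace every pixel by its LUT entry, clamping values to the 0..255 range."""
    return [[lut[min(255, max(0, p))] for p in row] for row in image]


# Hypothetical 7x7 image of constant brightness and a 5x5 averaging (box) mask.
image = [[100] * 7 for _ in range(7)]
box = [[1] * 5 for _ in range(5)]
smoothed = apply_mask(image, box, divisor=25)
binary = apply_lut(smoothed, threshold_lut(50))
```

A LUT makes thresholding a single table read per pixel, which is one reason it suits a hardware pipeline that must keep pace with the camera output.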


Design of Multiple Model Fuzzy Predictors using Data Preprocessing and its Application (데이터 전처리를 이용한 다중 모델 퍼지 예측기의 설계 및 응용)

  • Bang, Young-Keun;Lee, Chul-Heui
    • The Transactions of The Korean Institute of Electrical Engineers
    • /
    • v.58 no.1
    • /
    • pp.173-180
    • /
    • 2009
  • It is difficult to predict non-stationary or chaotic time series that include drift and/or non-linearity as well as uncertainty. To address this, we propose an effective prediction method that adopts data preprocessing and multiple-model TS fuzzy predictors combined with a model selection mechanism. In the data preprocessing procedure, the candidates for the optimal difference interval are determined based on correlation analysis, and the corresponding difference data sets are generated and used as predictor inputs instead of the original data, because the difference data can stabilize the statistical characteristics of those time series and better reveal their implicit properties. Then, TS fuzzy predictors are constructed for a multiple-model bank, where the k-means clustering algorithm is used for fuzzy partitioning of the input space, and the least squares method is applied to identify the parameters of the fuzzy rules. Among the predictors in the model bank, the one that best minimizes the performance index is selected and used for prediction thereafter. Finally, an error compensation procedure based on correlation analysis is added to improve the prediction accuracy. Computer simulations are performed to verify the effectiveness of the proposed method.
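
The preprocessing step described above can be sketched as follows. The abstract does not say how the correlation analysis ranks candidate intervals, so this sketch assumes that the lags with the largest absolute autocorrelation are taken as candidates. The function names and the sample series are hypothetical.

```python
from statistics import mean


def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at the given positive lag."""
    mu = mean(series)
    denom = sum((x - mu) ** 2 for x in series)
    num = sum((series[t] - mu) * (series[t - lag] - mu) for t in range(lag, len(series)))
    return num / denom


def candidate_intervals(series, max_lag, top_k=2):
    """Return the `top_k` lags with the largest absolute autocorrelation.

    Assumption: ranking by absolute autocorrelation stands in for the paper's
    correlation analysis, which the abstract does not specify.
    """
    lags = range(1, max_lag + 1)
    ranked = sorted(lags, key=lambda d: abs(autocorrelation(series, d)), reverse=True)
    return sorted(ranked[:top_k])


def difference(series, interval):
    """Difference data y[t] - y[t - interval], used in place of the raw series."""
    return [series[t] - series[t - interval] for t in range(interval, len(series))]


# Hypothetical periodic series with period 4.
series = [0, 1, 0, -1] * 6
candidates = candidate_intervals(series, max_lag=4, top_k=2)
diff_sets = {d: difference(series, d) for d in candidates}
```

Each difference data set would then feed its own predictor in the model bank, and the predictor with the best performance index would be kept.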

A Data Preprocessing Framework for Improving Estimation Accuracy of Battery Remaining Time in Mobile Smart Devices (모바일 스마트 장치 배터리의 잔여 시간 예측 향상을 위한 데이터 전처리 프레임워크)

  • Tak, Sungwoo
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.24 no.4
    • /
    • pp.536-545
    • /
    • 2020
  • When general statistical regression methods are applied to predict the remaining battery time of a mobile smart device, their estimation accuracy becomes poor as the deviations in battery usage time per battery level grow larger. To improve the estimation accuracy of general statistical regression methods, a preprocessing task is required to refine measured raw data that have large deviations in battery usage time per battery level. In this paper, we propose a data preprocessing framework that preprocesses raw measured battery consumption data and converts them into refined battery consumption data. The numerical results obtained from experiments with the proposed data preprocessing framework confirmed that, given the refined battery consumption data, general statistical regression methods estimated the remaining battery time accurately.

An Implementation of Preprocessing for Interior Point Methods for Linear Programming (내부점 방법을 위한 사전처리의 구현)

  • 성명기;임성묵;박순달
    • Journal of the Korean Operations Research and Management Science Society
    • /
    • v.24 no.1
    • /
    • pp.1-11
    • /
    • 1999
  • We classified preprocessing methods into (1) analytic methods, (2) methods for removing implied free variables, (3) methods using pivot or elementary row operations, (4) methods for removing linearly dependent rows and columns, and (5) methods for dense columns. We noted some considerations that require attention when preprocessing methods are applied to interior point methods for linear programming. We proposed an efficient order of preprocessing methods and data structures. We also noted the recovery process for dual solutions. We implemented the proposed preprocessing methods and tested them with 28 large-scale problems from NETLIB. We compared the results with those of the preprocessing routines of HOPDM, BPDPM, and CPLEX.


Optimized Polynomial Neural Network Classifier Designed with the Aid of Space Search Simultaneous Tuning Strategy and Data Preprocessing Techniques

  • Huang, Wei;Oh, Sung-Kwun
    • Journal of Electrical Engineering and Technology
    • /
    • v.12 no.2
    • /
    • pp.911-917
    • /
    • 2017
  • There are generally three issues to consider when developing neural network classifiers: 1) the discriminant function; 2) the large number of parameters in the design of the classifier; and 3) high-dimensional training data. From this viewpoint, we propose a space search optimized polynomial neural network classifier (PNNC) with the aid of data preprocessing techniques and a simultaneous tuning strategy, which is a balanced optimization strategy used in the design of the PNNC when running space search optimization. Unlike the conventional probabilistic neural network classifier, the proposed neural network classifier adopts two types of polynomials for developing discriminant functions. The overall optimization of the PNNC is realized through so-called structure optimization and parameter optimization using the simultaneous tuning strategy. The space search optimization algorithm is used as the optimization vehicle to help implement both structure and parameter optimization in the construction of the PNNC. Furthermore, principal component analysis and linear discriminant analysis are selected as the data preprocessing techniques for the PNNC. Experimental results show that the proposed neural network classifier achieves better classification accuracy than some other well-known classifiers.

Performance Analysis of Preprocessing Algorithm in Container Terminal and Suggestion for Optimum Selection (컨테이너 터미널의 선처리 알고리즘 성능분석과 최적선택 제안)

  • Park, Young-Kyu
    • Journal of Distribution Science
    • /
    • v.16 no.12
    • /
    • pp.95-104
    • /
    • 2018
  • Purpose - Efforts to improve container terminal productivity continue in order to gain the upper hand in competition between container terminals. Export containers arrive at the container terminal randomly and are carried into the terminal yard in order of arrival. On the other hand, containers are carried out of the yard in an order based on container weight, not in order of arrival. Because the carry-in order and the carry-out order differ, rehandling may occur, which reduces the performance of the container terminal. To reduce the number of rehandlings, containers can be moved in advance when they arrive, which is called preprocessing. This paper proposes an effective preprocessing algorithm and analyzes the factors that affect the productivity of container terminals. It also provides a way to choose the best preprocessing factors for a variety of situations. Research design, data, and methodology - Simulations are performed to analyze the impact of the factors affecting the performance of the preprocessing algorithm presented in this paper. The simulations are performed for two types of bays: 12 stacks with 8 tiers, and 8 stacks with 6 tiers. Results - The results of the analysis of the factors that affect the performance of the preprocessing algorithm were as follows. (1) As the LMF increased, the number of preprocessing operations increased and the number of rehandlings decreased. (2) The LML effect was greatest when the LML changed from 0 to 1, and the effect decreased when it rose above 1. (3) The sum of the number of preprocessing operations and the number of rehandlings first decreased and then increased as the LMF increased. (4) A decrease in NCI caused the containers to become more grouped, and thus the performance improved. (5) EFS had a positive effect.
Conclusion - In this paper, a preprocessing algorithm was proposed, and simulations showed that the best preprocessing factors can be chosen for a variety of situations. Further research is needed on the following topic: improving container performance by connecting preprocessing with remarshalling.