• Title/Summary/Keyword: Data Preprocessing

Two Dimensional Inter-symbol Interference Compensation for Holographic Data Storage (홀로그래픽 데이터 저장 장치를 위한 2차원 인접 심볼간 간섭 보상)

  • Jeong, Seongkwon;Lee, Jaejin
    • Journal of the Institute of Electronics and Information Engineers / v.52 no.6 / pp.10-14 / 2015
  • In holographic data storage (HDS) systems, data are recorded and read page by page in the volume of the storage medium. Two-dimensional, page-oriented processing with charge-coupled devices can increase both transfer rate and storage capacity. Unlike conventional storage devices, however, HDS suffers from two-dimensional intersymbol interference (ISI). In this paper, we propose a preprocessing method that reduces ISI before the read data are passed to the detector. The method is advantageous when combined with preprocessing for reducing misalignment errors and with modulation codes.

Text mining-based Data Preprocessing and Accident Type Analysis for Construction Accident Analysis (건설사고 분석을 위한 텍스트 마이닝 기반 데이터 전처리 및 사고유형 분석)

  • Yoon, Young Geun;Lee, Jae Yun;Oh, Tae Keun
    • Journal of the Korean Society of Safety / v.37 no.2 / pp.18-27 / 2022
  • Construction accidents are difficult to prevent because several different types of activities occur simultaneously. The current method of accident analysis only indicates the number of occurrences for one or two variables, and accidents have not decreased as a result of safety measures that focus solely on individual variables. Even when accident data are analyzed to establish appropriate safety measures, it is difficult to derive significant results because of the large number of data variables, elements, and qualitative records. In this study, to simplify the analysis and approach this complex problem logically, data preprocessing techniques such as latent class cluster analysis (LCCA) and predictor importance were used to discover the most influential variables. Finally, the correlation was analyzed using an alluvial flow diagram consisting of seven variables and fourteen elements based on accident data. The alluvial diagram analysis using the reduced variables and elements enabled accident trends to be grouped into four categories. The findings of this study demonstrate that complex and diverse construction accident data can yield relevant analysis results, assisting in the prevention of accidents.

A Concordance Study of the Preprocessing Orders in Microarray Data (마이크로어레이 자료의 사전 처리 순서에 따른 검색의 일치도 분석)

  • Kim, Sang-Cheol;Lee, Jae-Hwi;Kim, Byung-Soo
    • The Korean Journal of Applied Statistics / v.22 no.3 / pp.585-594 / 2009
  • In microarray experiments, preprocessing converts the processed images of raw data into data suitable for statistical analysis. Microarray preprocessing consists of image filtering, imputation, and normalization. Several normalization and imputation methods have been studied, but the order in which these two steps should be applied has not. This study examines how the order of the preprocessing steps affects the identification of differentially expressed genes (DEG), using two-dye cDNA microarray data from colon cancer and gastric cancer. That is, we compare which combinations of imputation and normalization steps can detect DEG. We used two imputation methods (K-nearest neighbor and Bayesian principal component analysis) and three normalization methods (global, within-print-tip group, and variance stabilization), giving 12 preprocessing orders in total. We measured the concordance of the DEG identified from the datasets to which the 12 preprocessing orders were applied. When variance stabilization was used as the normalization method, the sensitivity of DEG detection showed only a small variation.
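The count of 12 preprocessing orders can be reproduced by enumeration, assuming it arises as 2 imputation methods × 3 normalization methods × 2 step orders (the abstract does not spell out this breakdown). The sketch below only lists the pipelines; the function name and tuple layout are hypothetical, and no imputation or normalization is actually performed.

```python
from itertools import product

# Method names come from the abstract; everything else here is illustrative.
IMPUTATION = ["k-nearest neighbor", "Bayesian PCA"]
NORMALIZATION = ["global", "within-print-tip group", "variance stabilization"]
ORDERS = ["impute first", "normalize first"]


def preprocessing_pipelines():
    """Return every pipeline as a tuple of (step kind, method) pairs in execution order."""
    pipelines = []
    for imp, norm, order in product(IMPUTATION, NORMALIZATION, ORDERS):
        steps = [("imputation", imp), ("normalization", norm)]
        if order == "normalize first":
            steps.reverse()  # run normalization before imputation
        pipelines.append(tuple(steps))
    return pipelines


pipelines = preprocessing_pipelines()
for steps in pipelines:
    print(" -> ".join(f"{kind}: {method}" for kind, method in steps))
print(len(pipelines))  # 12
```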

Prediction of Distillation Column Temperature Using Machine Learning and Data Preprocessing (머신 러닝과 데이터 전처리를 활용한 증류탑 온도 예측)

  • Lee, Yechan;Choi, Yeongryeol;Cho, Hyungtae;Kim, Junghwan
    • Korean Chemical Engineering Research / v.59 no.2 / pp.191-199 / 2021
  • A distillation column, a key unit in chemical processes, separates the desired product from a mixture by exploiting differences in boiling points. Because distillation consumes a large amount of energy, its operation needs to be optimized and predicted. The target process of this study is difficult to operate efficiently because the feed composition varies with the supplier. To address this problem, a data-driven model can be developed to predict operating conditions. However, data preprocessing is essential for improving the predictive performance of the model because the raw data contain outliers and noise. In this study, after optimizing predictive models based on long short-term memory (LSTM) and random forest (RF), we used a low-pass filter and a one-class support vector machine for data preprocessing and compared predictive performance according to the method and range of the preprocessing. The performance of the predictive models and the effect of the preprocessing were compared using R² and RMSE. For LSTM, R² increased by 23.5%, from 0.791 to 0.977, and RMSE decreased by 78.0%, from 0.132 to 0.029. For RF, R² increased by 22.3%, from 0.767 to 0.938, and RMSE decreased by 64.3%, from 0.140 to 0.050.
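The reported percentages are consistent with relative changes with respect to the value before preprocessing. A minimal check, using only the numbers quoted in the abstract (the helper name `relative_change_percent` is hypothetical):

```python
def relative_change_percent(before, after):
    """Relative change from `before` to `after`, in percent of `before`."""
    return (after - before) / before * 100.0


# (model, metric): (value before preprocessing, value after), from the abstract.
reported = {
    ("LSTM", "R2"): (0.791, 0.977),
    ("LSTM", "RMSE"): (0.132, 0.029),
    ("RF", "R2"): (0.767, 0.938),
    ("RF", "RMSE"): (0.140, 0.050),
}

changes = {
    key: round(relative_change_percent(before, after), 1)
    for key, (before, after) in reported.items()
}
for (model, metric), change in changes.items():
    print(f"{model} {metric}: {change:+.1f}%")
```

A positive value is an increase (R²) and a negative value is a decrease (RMSE).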

Properties of a bearing-only target tracking filter (방위각 정보만을 이용한 표적추적 필터의 특성연구)

  • 허남수;김인환;황창선;이만형
    • 제어로봇시스템학회:학술대회논문집 / 1990.10a / pp.789-793 / 1990
  • A preprocessing technique for measured bearing data is presented to improve target estimation accuracy in bearing-only target motion analysis (TMA). Computer simulations are performed to compare it with the extended Kalman filter. The simulations show that the target estimation filter with preprocessing is both stable and robust to bearing measurement noise.

A study on Data Preprocessing for Developing Remaining Useful Life Predictions based on Stochastic Degradation Models Using Air Craft Engine Data (항공엔진 열화데이터 기반 잔여수명 예측력 향상을 위한 데이터 전처리 방법 연구)

  • Yoon, Yeon Ah;Jung, Jin Hyeong;Lim, Jun Hyoung;Chang, Tai-Woo;Kim, Yong Soo
    • Journal of Korean Society of Industrial and Systems Engineering / v.43 no.2 / pp.48-55 / 2020
  • Recently, prognostics and health management (PHM) studies have been conducted to diagnose failures and predict the life of aircraft engine parts using sensor data. PHM is a framework that provides individualized solutions for managing system health. This study predicted the remaining useful life (RUL) of aeroengines using sensor-collected degradation data provided by the IEEE 2008 PHM Conference Challenge. The dataset contains sensor data for 218 engines, which have initial wear and production deviations. It was difficult to determine the characteristics of the engine parts because system and domain-specific information was not provided. Each engine has a different number of cycles, making it difficult to use time series models. Therefore, the analysis was performed using machine learning algorithms rather than statistical time series models. The machine learning algorithms used were random forest, gradient boosted trees, and XGBoost. A sliding window was applied to develop RUL predictions. We compared model performance before and after applying the sliding window and proposed a data preprocessing method for developing RUL predictions. The models were evaluated by R-squared scores and root mean square error (RMSE). The XGBoost model with the random split method and sliding window preprocessing showed the best predictive performance.
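The abstract does not give the window length or the labeling rule, so the following is only a sketch of sliding window preprocessing under two assumptions: each engine is recorded until failure, and the RUL label of a window is the number of cycles remaining after the window's last cycle. The function name `sliding_windows` and its parameters are hypothetical.

```python
def sliding_windows(series, window, step=1):
    """Split one engine's readings into overlapping windows, each paired with an RUL label.

    Assumes `series` runs to failure, so the last index is the failure cycle.
    """
    samples = []
    last_cycle = len(series) - 1
    for end in range(window - 1, len(series), step):
        features = series[end - window + 1 : end + 1]
        rul = last_cycle - end  # cycles left after this window ends
        samples.append((features, rul))
    return samples


# One sensor channel of a single (made-up) engine with five cycles.
samples = sliding_windows([10, 11, 12, 13, 14], window=3)
for features, rul in samples:
    print(features, rul)
```

Because every window has the same length, engines with different numbers of cycles yield samples of identical shape, which is what makes them usable as inputs to tree-based learners.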

A Preprocessing Algorithm for Efficient Lossless Compression of Gray Scale Images

  • Kim, Sun-Ja;Hwang, Doh-Yeun;Yoo, Gi-Hyoung;You, Kang-Soo;Kwak, Hoon-Sung
    • 제어로봇시스템학회:학술대회논문집 / 2005.06a / pp.2485-2489 / 2005
  • This paper introduces a new preprocessing scheme that replaces the original data of gray scale images with specially ordered data so that lossless compression performance can be improved. As a preprocessing technique for maximizing the performance of an entropy encoder, the proposed method converts the input image data into a more compressible form. Before a stream of the input image is encoded, the proposed preprocessor counts co-occurrence frequencies for neighboring pixel pairs. It then replaces each pair of adjacent gray values with ordered numbers based on these co-occurrence frequencies. When the ordered image is compressed with an entropy encoder, a higher compression rate can be expected because the statistical features of the input image are enhanced. We show that the lossless compression rate increased by up to 37.85% when preprocessed image data were compressed instead of non-preprocessed data, using entropy encoders such as Huffman and arithmetic encoders.
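The abstract does not specify how the ordered numbers are assigned, so the sketch below shows one plausible reading rather than the authors' algorithm: each pixel is replaced by the rank of its value among the values that follow its left neighbor, with ranks ordered by descending co-occurrence frequency. Frequent pairs then map to small numbers, which skews the symbol distribution in favor of an entropy encoder. All function names are hypothetical, and rows are assumed to be non-empty.

```python
from collections import Counter, defaultdict


def build_rank_tables(image):
    """For each left-neighbor value, list right-neighbor values by descending frequency."""
    counts = defaultdict(Counter)
    for row in image:
        for left, right in zip(row, row[1:]):
            counts[left][right] += 1
    # Ties are broken by gray value so the table is deterministic.
    return {
        left: sorted(counter, key=lambda v: (-counter[v], v))
        for left, counter in counts.items()
    }


def encode(image, tables):
    """Keep the first pixel of each row; replace the rest with frequency ranks."""
    encoded = []
    for row in image:
        new_row = [row[0]]
        for left, right in zip(row, row[1:]):
            new_row.append(tables[left].index(right))
        encoded.append(new_row)
    return encoded


def decode(encoded, tables):
    """Invert `encode` using the same rank tables, so the scheme stays lossless."""
    decoded = []
    for row in encoded:
        restored = [row[0]]
        for rank in row[1:]:
            restored.append(tables[restored[-1]][rank])
        decoded.append(restored)
    return decoded


image = [[5, 5, 5, 7], [5, 5, 7, 7]]
tables = build_rank_tables(image)
encoded = encode(image, tables)
print(encoded)  # [[5, 0, 0, 1], [5, 0, 1, 0]]
```

In this reading the decoder needs the rank tables as side information, so any compression gain has to outweigh the cost of storing them.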

Classification Accuracy Improvement for Decision Tree (의사결정트리의 분류 정확도 향상)

  • Rezene, Mehari Marta;Park, Sanghyun
    • Proceedings of the Korea Information Processing Society Conference / 2017.04a / pp.787-790 / 2017
  • Data quality is a main issue in classification problems; in general, noisy instances in the training dataset prevent robust classification performance. Such instances may cause the generated decision tree to overfit and its accuracy to decrease. Decision trees are useful, efficient, and commonly used for solving various real-world classification problems in data mining. In this paper, we introduce a preprocessing technique to improve the classification accuracy of the C4.5 decision tree algorithm. In the proposed preprocessing method, we apply the naive Bayes classifier to remove noisy instances from the training dataset. We applied the proposed method to a real e-commerce sales dataset to compare its performance with the existing C4.5 decision tree classifier. In the experiments, the proposed method improved classification accuracy by 8.5% and 14.32% using the training dataset and 10-fold cross-validation, respectively.
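The abstract does not say how an instance is judged noisy. A common rule, assumed here, is to drop training instances whose label disagrees with the naive Bayes prediction. The sketch uses a small categorical naive Bayes with add-one smoothing written from scratch; the toy data and all function names are hypothetical, and the decision tree step itself is omitted.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Fit a categorical naive Bayes model by counting classes and feature values."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)  # (feature index, class) -> value counts
    values = defaultdict(set)              # feature index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            feature_counts[(i, label)][value] += 1
            values[i].add(value)
    return class_counts, feature_counts, values


def predict(model, row):
    """Return the class with the highest log posterior under add-one smoothing."""
    class_counts, feature_counts, values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in sorted(class_counts.items()):
        score = math.log(count / total)
        for i, value in enumerate(row):
            seen = feature_counts[(i, label)][value]
            score += math.log((seen + 1) / (count + len(values[i])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def filter_noisy(rows, labels):
    """Keep only the instances whose label matches the naive Bayes prediction."""
    model = train_naive_bayes(rows, labels)
    return [(row, y) for row, y in zip(rows, labels) if predict(model, row) == y]


rows = [("sunny", "high")] * 3 + [("rainy", "low")] * 3 + [("sunny", "high")]
labels = ["yes"] * 3 + ["no"] * 3 + ["no"]  # the last label contradicts the pattern
kept = filter_noisy(rows, labels)
print(len(kept))  # 6
```

The filtered list `kept` would then be passed to the decision tree learner in place of the original training set.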

A comparative study of Depth Preprocessing Method for 3D Data Service Based on Depth Image Based Rendering over T-DMB (지상파 DMB에서의 깊이 영상 기반 렌더링 기반의 3차원 서비스를 위한 깊이 영상 전처리 기술의 비교 연구)

  • Oh, Young-Jin;Jung, Kwang-Hee;Kim, Joong-Kyu;Lee, Gwang-Soon;Lee, Hyun;Hur, Nam-Ho;Kim, Jin-Woong
    • Proceedings of the IEEK Conference / 2008.06a / pp.815-816 / 2008
  • In this paper, we evaluate depth image preprocessing for a 3D data service based on depth image based rendering (DIBR) over T-DMB. We evaluate two depth image preprocessing methods: Gaussian smoothing and adaptive smoothing. The results show that adaptive smoothing is more suitable for images with sharp depth transitions.
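To illustrate why sharp depth transitions matter, the sketch below applies plain Gaussian smoothing to one row of a depth image. The kernel radius, sigma, and border clamping are assumptions, and the adaptive method is not shown because the abstract does not define it.

```python
import math


def gaussian_kernel(radius, sigma):
    """Return normalized 1D Gaussian weights for offsets -radius..radius."""
    weights = [math.exp(-(k * k) / (2 * sigma * sigma)) for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth_row(depth_row, radius=2, sigma=1.0):
    """Convolve one depth row with a Gaussian kernel, clamping indices at the borders."""
    kernel = gaussian_kernel(radius, sigma)
    n = len(depth_row)
    smoothed = []
    for i in range(n):
        acc = 0.0
        for k, w in zip(range(-radius, radius + 1), kernel):
            j = min(max(i + k, 0), n - 1)  # repeat the edge pixel outside the row
            acc += w * depth_row[j]
        smoothed.append(acc)
    return smoothed


# A made-up depth row with a sharp step between background (0) and foreground (100).
row = [0] * 5 + [100] * 5
smoothed = smooth_row(row)
print([round(v, 1) for v in smoothed])
```

The step between the two depth levels is spread over several pixels, which is the kind of blurring that a uniform Gaussian filter introduces at object boundaries.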

Data Preprocessing for Predicting Sarcopenia Based on Machine Learning (기계학습 기반 근감소증 예측을 위한 데이터 전처리 기법)

  • Yoon Choi;Yourim Yoon
    • The Journal of the Convergence on Culture Technology / v.9 no.3 / pp.737-744 / 2023
  • Sarcopenia is an increasingly common disease among the elderly that has recently received attention. Although the causes of sarcopenia are diverse, aging, dietary habits, and lack of exercise are among the major factors. Because the causes are diverse, it is important to develop strategies for prevention and treatment. However, predicting sarcopenia accurately is difficult due to the variety of factors involved. Machine learning can significantly improve the accuracy and convenience of predicting sarcopenia. However, since lifestyle and biological data are vast, using the data without preprocessing may be inappropriate in terms of time complexity and accuracy. This paper reviews recent literature on sarcopenia and its causes, focusing accordingly on preprocessing the data to be used for machine-learning-based sarcopenia prediction.