• Title/Summary/Keyword: 데이터 전처리 (data preprocessing)

Search results: 1,144 (processing time: 0.039 seconds)

Post-processing of Input Data for Improving CGH Hologram (Preprocessing of Input Data for Improving CGH Holograms)

  • Gil, Jong-In;Jeong, Da-Un;Kim, Man-Bae
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2010.07a / pp.231-233 / 2010
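  • Depth data is acquired from CG or real-world images and is widely used in stereoscopic imaging. Examples include improving the 3D quality of 2D images and enhancing the depth perception of stereoscopic images. In line with this trend, this paper proposes an image processing technique that improves CGH holograms by preprocessing the input data used to generate them. We show that preprocessing the input data improves the image quality of the generated hologram, and we demonstrate the advantages of the proposed method through experiments.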

A Real-time Context Integration System for Multimodal Sensor Networks using XML (A Multimodal Sensor-Based Real-Time Context Integration System Using XML)

  • Yang, Sung-Ihk;Hong, Jin-Hyuk;Cho, Sung-Bae
    • HCI Society of Korea: Conference Proceedings / 2008.02a / pp.141-146 / 2008
  • As interest in ubiquitous environments grows, there is much research on services for these environments. A key issue for such services is interpreting the user's context using many kinds of sensors, such as PDAs, GPS, and accelerometers. Low-level raw data measured by sensors such as accelerometers is hard to use directly. To provide real-time services, it is therefore important to preprocess the data and interpret it into context in real time. This paper describes a context integration system that can integrate these sensors, including sensors that produce raw data such as accelerometers and physiological sensors, and that defines context interpretation rules in XML. Because the rules are written in XML, the proposed system reduces programming work when a sensor is added to the sensor network or a context interpretation rule is modified. Using this system, we implemented a real-time data monitoring system that plots numeric data as graphs and helps the user validate the data and the results of the preprocessing phase. The system also supports external services and applications that use the user's context.
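
A minimal Python sketch of XML-defined context rules applied to preprocessed sensor data. The paper does not publish its rule schema, so the following are hypothetical and for illustration only:
- the rule layout (`rule` elements with `context`, `sensor`, `min`, and `max` attributes);
- the `accel_magnitude` reading and its thresholds;
- the moving-average smoothing step.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule file: schema, sensor name and thresholds are assumptions.
RULES_XML = """
<rules>
  <rule context="running" sensor="accel_magnitude" min="1.8"/>
  <rule context="walking" sensor="accel_magnitude" min="1.2" max="1.8"/>
  <rule context="resting" sensor="accel_magnitude" max="1.2"/>
</rules>
"""

def moving_average(samples, window):
    """Preprocess raw samples with a trailing moving average (one value per full window)."""
    return [sum(samples[i:i + window]) / window for i in range(len(samples) - window + 1)]

def load_rules(xml_text):
    """Parse <rule> elements; a missing bound leaves the range open on that side."""
    rules = []
    for el in ET.fromstring(xml_text.strip()).iter("rule"):
        rules.append({
            "context": el.get("context"),
            "sensor": el.get("sensor"),
            "min": float(el.get("min", "-inf")),
            "max": float(el.get("max", "inf")),
        })
    return rules

def interpret(rules, readings):
    """Return the first context whose [min, max) range contains the sensor's value."""
    for rule in rules:
        value = readings.get(rule["sensor"])
        if value is not None and rule["min"] <= value < rule["max"]:
            return rule["context"]
    return "unknown"

rules = load_rules(RULES_XML)
smoothed = moving_average([1.9, 2.1, 2.0, 2.2], window=2)
print(interpret(rules, {"accel_magnitude": smoothed[-1]}))  # running
```

In this sketch, adding or changing a context means editing a `<rule>` element, not the Python code. This mirrors the abstract's claim that XML rules reduce programming work.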

A Pre-processing Study to Solve the Problem of Rare Class Classification of Network Traffic Data (A Preprocessing Study for Solving the Rare-Class Classification Problem of Network Traffic Data)

  • Ryu, Kyung Joon;Shin, DongIl;Shin, DongKyoo;Park, JeongChan;Kim, JinGoog
    • KIPS Transactions on Software and Data Engineering / v.9 no.12 / pp.411-418 / 2020
  • In information security, intrusion detection systems (IDS) are normally classified into two categories: signature-based IDS and anomaly-based IDS. Many anomaly-based IDS studies use machine learning algorithms to analyze network traffic data generated in cyberspace. In this paper, we study preprocessing methods to overcome the performance degradation caused by rare classes. We reconstructed the data set based on rare and semi-rare classes and evaluated the classification performance of a machine learning algorithm on the result. After reconstructing the data into three different sets, wrapper and filter feature selection methods are applied in sequence. Each data set is scaled with a quantile scaler. A deep neural network model is used for training and validation. The evaluation results are compared using true positive and false negative values. We obtained improved classification performance on all three data sets.
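
A minimal, standard-library sketch of two of the preprocessing steps named above. It shows grouping classes into rare, semi-rare, and common tiers, and an empirical quantile scaling that maps each feature value to its rank position in [0, 1]. The tier thresholds (`rare_max`, `semi_rare_max`) and the class labels are hypothetical. The paper's actual thresholds, feature selection, and DNN are not reproduced here.

```python
from bisect import bisect_left, bisect_right
from collections import Counter

def split_by_rarity(labels, rare_max=0.01, semi_rare_max=0.05):
    """Assign each class a tier from its share of samples (thresholds are assumptions)."""
    counts = Counter(labels)
    total = len(labels)
    tiers = {}
    for cls, count in counts.items():
        share = count / total
        if share <= rare_max:
            tiers[cls] = "rare"
        elif share <= semi_rare_max:
            tiers[cls] = "semi_rare"
        else:
            tiers[cls] = "common"
    return tiers

def quantile_scale(values):
    """Map each value to its empirical quantile in [0, 1]; tied values share the middle rank."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 1:
        return [0.5]
    scaled = []
    for v in values:
        lo = bisect_left(ordered, v)
        hi = bisect_right(ordered, v) - 1
        scaled.append(((lo + hi) / 2) / (n - 1))
    return scaled

labels = ["normal"] * 95 + ["dos"] * 4 + ["u2r"] * 1   # hypothetical traffic labels
print(split_by_rarity(labels))
print(quantile_scale([10, 20, 20, 40]))
```

Quantile scaling depends only on ranks, so heavy-tailed traffic features such as byte counts end up on the same [0, 1] scale without being dominated by extreme values.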

A Comparative Analysis of the Pre-Processing in the Kaggle Titanic Competition

  • Tai-Sung, Hur;Suyoung, Bang
    • Journal of the Korea Society of Computer and Information / v.28 no.3 / pp.17-24 / 2023
  • Kaggle presents data science challenges for participants to solve. Using 'Titanic - Machine Learning from Disaster', one of its representative competitions, we examine how data preprocessing and model construction affect prediction accuracy and score. We selected seven top-ranked, high-scoring solutions, excluding those that use redundant models or ensemble techniques, and compared and analyzed their characteristics. Most of the preprocessing approaches had unique, differentiated characteristics. Even where the preprocessing was almost the same, scores differed depending on the type of model. This comparative analysis is expected to help Kaggle participants and data science beginners understand the characteristics and analysis flow of the preprocessing methods used by top-scoring participants.
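
For readers new to the competition, here is a minimal standard-library sketch of preprocessing steps that are common in public Titanic notebooks:
- extracting a title from `Name`;
- building `FamilySize` from `SibSp` and `Parch`;
- encoding `Sex`;
- filling missing `Age` with the median age of passengers who share the same title.

These are widely used community techniques, not necessarily the steps of any of the seven solutions analyzed in the paper. The `Title`, `FamilySize`, and `SexCode` column names are illustrative.

```python
import re
from statistics import median

TITLE_RE = re.compile(r",\s*([^.]+)\.")  # "Braund, Mr. Owen Harris" -> "Mr"

def extract_title(name):
    match = TITLE_RE.search(name)
    return match.group(1).strip() if match else "Unknown"

def preprocess(rows):
    """Add Title, FamilySize and SexCode, then impute missing Age by per-title median."""
    for row in rows:
        row["Title"] = extract_title(row["Name"])
        row["FamilySize"] = row["SibSp"] + row["Parch"] + 1  # siblings/spouses + parents/children + self
        row["SexCode"] = 1 if row["Sex"] == "female" else 0
    ages_by_title = {}
    for row in rows:
        if row["Age"] is not None:
            ages_by_title.setdefault(row["Title"], []).append(row["Age"])
    overall = median(row["Age"] for row in rows if row["Age"] is not None)
    for row in rows:
        if row["Age"] is None:
            known = ages_by_title.get(row["Title"])
            row["Age"] = median(known) if known else overall  # fall back to the global median
    return rows

rows = preprocess([
    {"Name": "Braund, Mr. Owen Harris", "Sex": "male", "Age": 22.0, "SibSp": 1, "Parch": 0},
    {"Name": "Cumings, Mrs. John Bradley", "Sex": "female", "Age": 38.0, "SibSp": 1, "Parch": 0},
    {"Name": "Allen, Mr. William Henry", "Sex": "male", "Age": 35.0, "SibSp": 0, "Parch": 0},
    {"Name": "Moran, Mr. James", "Sex": "male", "Age": None, "SibSp": 0, "Parch": 0},
])
print(rows[3]["Title"], rows[3]["Age"])  # Mr 28.5
```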

A Design of Image Preprocessing Subsystem for COMS (Design of an Image Data Preprocessing System for the Communication, Ocean and Meteorological Satellite)

  • Seo Seok-Bae;Koo In-Hoi;Ahn Sang-Il;Kim Eun-Kyou
    • Proceedings of the KSRS Conference / 2006.03a / pp.390-393 / 2006
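  • This paper describes the design process and preliminary design results of the image data preprocessing system (IMPS, IMage Preprocessing Subsystem). IMPS processes data from the Communication, Ocean and Meteorological Satellite (COMS), which is currently under development.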

Personalized Service Based on Context Awareness through User Emotional Perception in Mobile Environment (Personalized Service through Context-Awareness-Based User Emotion Recognition in the Mobile Environment)

  • Kwon, Il-Kyoung;Lee, Sang-Yong
    • Journal of Digital Convergence / v.10 no.2 / pp.287-292 / 2012
  • This paper studies user-personalized services based on emotion perception. These services require location-based sensing-data preprocessing and emotion-data preprocessing techniques in order to build and preprocess the user's emotion data in the V-A (valence-arousal) emotion model. For this purpose, a granular context tree and string-matching-based emotion pattern matching are used. In addition, the paper studies a context-aware, personalized recommendation technique that uses probabilistic reasoning.
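
A minimal sketch of string-matching-based emotion pattern matching. It quantizes each (valence, arousal) sample into one symbol, joins the samples into a string, and then searches that string for stored pattern substrings. The quadrant symbols, the pattern library `PATTERNS`, and its labels are all hypothetical. The abstract does not describe its encoding or patterns, and the granular context tree is not modeled here.

```python
def va_symbol(valence, arousal):
    """Map a point on the valence-arousal plane to a quadrant symbol (assumed labelling)."""
    if valence >= 0:
        return "E" if arousal >= 0 else "C"  # excited / calm
    return "T" if arousal >= 0 else "D"      # tense / depressed

def encode(samples):
    """Turn a time series of (valence, arousal) pairs into a symbol string."""
    return "".join(va_symbol(v, a) for v, a in samples)

# Hypothetical pattern library: symbol substring -> emotional-state label.
PATTERNS = {"TTT": "sustained stress", "DD": "low mood", "EE": "upbeat"}

def match_patterns(sequence, patterns=PATTERNS):
    """Return the labels of every stored pattern that occurs as a substring."""
    return [label for pattern, label in patterns.items() if pattern in sequence]

seq = encode([(0.5, 0.6), (-0.4, 0.7), (-0.2, 0.9), (-0.6, 0.3)])
print(seq, match_patterns(seq))  # ETTT ['sustained stress']
```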

A Comparison of Ensemble Methods Combining Resampling Techniques for Class Imbalanced Data (A Comparative Study of Classification Models for Imbalanced Data Using Data Preprocessing and Ensemble Techniques)

  • Lee, Hee-Jae;Lee, Sungim
    • The Korean Journal of Applied Statistics / v.27 no.3 / pp.357-371 / 2014
  • Many studies address imbalanced data, in which the class distribution is highly skewed. Previous studies handle such data with resampling techniques that correct the skewness of the class distribution in each sampled subset through under-sampling, over-sampling such as SMOTE, or hybrid sampling. Ensemble methods have also been used to alleviate the class imbalance problem. In this paper, we compare about a dozen algorithms that combine ensemble methods with resampling techniques. The comparison uses simulated data sets generated by the Backbone model, in which the imbalance rate can be controlled. Results on various real imbalanced data sets are also presented to compare the effectiveness of the algorithms. When the minority class makes up less than 10% of the data, we strongly recommend combining resampling techniques with ensemble methods. We also find that each ensemble method has a well-matched sampling technique. Bagging or random forest ensembles combined with random under-sampling tend to perform well, whereas boosting ensembles appear to perform better with over-sampling. All ensemble methods combined with SMOTE perform well in most situations.
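
A minimal standard-library sketch of the two resampling families compared above: random under-sampling of the majority class and basic SMOTE over-sampling. SMOTE interpolates between a minority sample and one of its k nearest minority neighbors. This is an illustration of the techniques under simple assumptions (numeric feature tuples, one majority label), not the paper's experimental code. The ensemble learners it would feed are omitted.

```python
import math
import random

def random_undersample(X, y, majority, seed=0):
    """Keep every non-majority sample plus an equal-sized random subset of the majority class."""
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label != majority]
    majority_idx = [i for i, label in enumerate(y) if label == majority]
    kept = rng.sample(majority_idx, min(len(majority_idx), len(minority_idx)))
    idx = sorted(minority_idx + kept)
    return [X[i] for i in idx], [y[i] for i in idx]

def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points, each on the segment between a random minority
    sample and one of its k nearest minority neighbours (basic SMOTE)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic

X = [(float(i), 0.0) for i in range(10)]
y = ["maj"] * 8 + ["min"] * 2
Xr, yr = random_undersample(X, y, majority="maj")
new_points = smote([(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)], n_new=10, k=2)
print(len(yr), len(new_points))
```

Under-sampling discards majority information, while SMOTE adds synthetic minority points inside the minority region. This difference is one plausible reason why different ensemble methods pair better with different samplers.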

A Sparse Data Preprocessing Using Support Vector Regression (Preprocessing of Sparse Data Using Support Vector Regression)

  • Jun, Sung-Hae;Park, Jung-Eun;Oh, Kyung-Whan
    • Journal of the Korean Institute of Intelligent Systems / v.14 no.6 / pp.789-792 / 2004
  • In many fields, such as web mining, bioinformatics, and statistical data analysis, missing values appear in diverse forms. These values make the training data sparse. Missing values are usually replaced with predicted values based on the mean or mode. More advanced imputation methods, such as the conditional mean, tree methods, and Markov chain Monte Carlo algorithms, can also be used. However, the predictive accuracy of general imputation models decreases as the ratio of missing values in the training data increases. Moreover, the number of available imputations becomes limited as the missing ratio increases. To address this problem, we propose preprocessing missing values with statistical learning theory, specifically Vapnik's support vector regression (SVR). The proposed method can be applied to sparse training data. We verified the performance of our model using data sets from the UCI Machine Learning Repository.
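
A minimal sketch of SVR-based imputation in pure Python. The example is deliberately simplified: it uses a linear SVR fitted by full-batch subgradient descent on the epsilon-insensitive loss, not the kernel SVR solver a full implementation would use. It trains on fully observed rows and predicts the missing entries of one target column. The hyperparameters (`epsilon`, `lam`, `lr`, `epochs`) and the function names are illustrative assumptions, not values from the paper.

```python
def predict(w, b, x):
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def fit_linear_svr(X, y, epsilon=0.1, lam=1e-3, lr=0.05, epochs=2000):
    """Minimise lam/2*||w||^2 + mean(max(0, |y - f(x)| - epsilon)) by subgradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 regulariser
        grad_b = 0.0
        for xi, yi in zip(X, y):
            residual = yi - predict(w, b, xi)
            if abs(residual) > epsilon:  # only points outside the epsilon-tube contribute
                sign = -1.0 if residual > 0 else 1.0  # d(loss)/d(prediction)
                grad_w = [g + sign * xj / n for g, xj in zip(grad_w, xi)]
                grad_b += sign / n
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def svr_impute(rows, target):
    """Return a copy of rows with None in column `target` filled by SVR predictions."""
    features = [j for j in range(len(rows[0])) if j != target]
    train = [r for r in rows if all(v is not None for v in r)]
    w, b = fit_linear_svr([[r[j] for j in features] for r in train],
                          [r[target] for r in train])
    out = []
    for r in rows:
        r = list(r)
        if r[target] is None and all(r[j] is not None for j in features):
            r[target] = predict(w, b, [r[j] for j in features])
        out.append(r)
    return out

rows = [[0.0, 1.0], [1.0, 3.0], [2.0, 5.0], [3.0, 7.0], [4.0, 9.0], [2.5, None]]
print(svr_impute(rows, target=1)[-1])
```

Only residuals larger than `epsilon` move the fit, which is the property that distinguishes SVR from least-squares regression.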

Prediction of Distillation Column Temperature Using Machine Learning and Data Preprocessing

  • Lee, Yechan;Choi, Yeongryeol;Cho, Hyungtae;Kim, Junghwan
    • Korean Chemical Engineering Research / v.59 no.2 / pp.191-199 / 2021
  • A distillation column, a main facility in chemical processes, separates the desired product from a mixture by exploiting differences in boiling points. Because distillation consumes much energy, its operation needs to be optimized and predicted. The target process of this study is difficult to operate efficiently because the feed composition varies depending on the supplier. To deal with this problem, a data-driven model can be developed to predict operating conditions. However, the raw data contain outliers and noise, so data preprocessing is essential to improve the model's predictive performance. In this study, we first optimized predictive models based on long short-term memory (LSTM) and random forest (RF). We then applied a low-pass filter and a one-class support vector machine for data preprocessing and compared predictive performance across preprocessing methods and ranges. The performance of the predictive models and the effect of preprocessing are compared using R² and RMSE. For LSTM, R² increased by 23.5%, from 0.791 to 0.977, and RMSE decreased by 78.0%, from 0.132 to 0.029. For RF, R² increased by 22.3%, from 0.767 to 0.938, and RMSE decreased by 64.3%, from 0.140 to 0.050.
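
A minimal standard-library sketch of the evaluation arithmetic and of one low-pass filter. It defines a first-order low-pass filter (exponential smoothing), R², RMSE, and the relative change used to report the improvements above. The filter form and its `alpha` parameter are assumptions, since the abstract does not specify the filter design. The one-class SVM outlier step is not shown.

```python
import math

def low_pass(signal, alpha=0.2):
    """First-order IIR low-pass filter: y[t] = alpha * x[t] + (1 - alpha) * y[t-1]."""
    out = []
    prev = signal[0]
    for x in signal:
        prev = alpha * x + (1 - alpha) * prev
        out.append(prev)
    return out

def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2(y_true, y_pred):
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

def pct_change(before, after):
    """Relative change in percent; negative values are decreases."""
    return 100 * (after - before) / before

# Reproduce the reported LSTM changes from the R² and RMSE values in the abstract.
print(round(pct_change(0.791, 0.977), 1), round(pct_change(0.132, 0.029), 1))  # 23.5 -78.0
```

The same `pct_change` applied to the RF figures gives roughly 22.3% and -64.3%, consistent with the reported values.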

Diagnostic Classification Based on Nonlinear Representation and Filtering of Process Measurement Data (Classification-Based Diagnosis Using Nonlinear Representation and Preprocessing of Process Measurement Data)

  • Cho, Hyun-Woo
    • Journal of the Korea Academia-Industrial cooperation Society / v.16 no.5 / pp.3000-3005 / 2015
  • Reliable monitoring and diagnosis of industrial processes is very important for quality and safety. The goal of fault diagnosis is to find the process variables responsible for specific process abnormalities. This work presents a classification-based diagnostic scheme built on a nonlinear representation of process data. A nonlinear kernel technique reduces the size of the data considered and provides an efficient, reliable representation of the measurement data. As a filtering stage, preprocessing removes unwanted parts of the data, which improves performance. In a case study of an industrial batch process, the scheme outperformed other methods, and the nonlinear representation and filtering improved diagnosis performance.
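
A minimal sketch of a kernel-based classifier combined with a filtering stage. It assigns a sample to the class whose mean in the RBF-kernel feature space is nearest, and it includes a moving-average filter for smoothing raw measurements. This kernel nearest-centroid rule is a stand-in chosen for brevity; it is not the paper's kernel representation or classifier. The class names, training points, and `gamma` are hypothetical.

```python
import math

def moving_average(series, window):
    """Filtering stage: trailing moving average to suppress measurement noise."""
    return [sum(series[i:i + window]) / window for i in range(len(series) - window + 1)]

def rbf(a, b, gamma=0.5):
    """RBF kernel k(a, b) = exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))

def fit_kernel_centroids(samples_by_class, gamma=0.5):
    """Precompute (1/n^2) * sum_ij k(x_i, x_j), i.e. ||m_c||^2 in feature space, per class."""
    model = {}
    for label, pts in samples_by_class.items():
        n = len(pts)
        self_term = sum(rbf(p, q, gamma) for p in pts for q in pts) / (n * n)
        model[label] = (pts, self_term)
    return model

def classify(model, x, gamma=0.5):
    """Nearest class mean in feature space:
    ||phi(x) - m_c||^2 = k(x, x) - (2/n) sum_i k(x, x_i) + ||m_c||^2."""
    best, best_dist = None, math.inf
    for label, (pts, self_term) in model.items():
        cross = sum(rbf(x, p, gamma) for p in pts) / len(pts)
        dist = rbf(x, x, gamma) - 2 * cross + self_term
        if dist < best_dist:
            best, best_dist = label, dist
    return best

training = {  # hypothetical two-variable process measurements
    "normal": [(0.0, 0.0), (0.1, 0.2), (-0.1, 0.0)],
    "fault": [(3.0, 3.0), (3.2, 2.9), (2.8, 3.1)],
}
model = fit_kernel_centroids(training)
print(classify(model, (0.2, 0.1)), classify(model, (2.9, 3.2)))  # normal fault
```

The same `gamma` must be used for fitting and classification, because the cached self-term depends on it.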