• Title/Summary/Keyword: data pre-processing

Pre-processing Method of Raw Data Based on Ontology for Machine Learning (머신러닝을 위한 온톨로지 기반의 Raw Data 전처리 기법)

  • Hwang, Chi-Gon;Yoon, Chang-Pyo
    • Journal of the Korea Institute of Information and Communication Engineering / v.24 no.5 / pp.600-608 / 2020
  • Machine learning constructs an objective function from training data and checks that function against test data to predict results for newly generated data. In machine learning, input data are normalized during preprocessing. Numerical data are standardized using the mean and standard deviation of the input data. Nominal (non-numerical) data are converted into one-hot codes. However, this preprocessing alone cannot solve every problem in the data. For this reason, this paper proposes a method that uses an ontology to normalize input data. The test data are received signal strength indicator (RSSI) values of Wi-Fi devices collected from a mobile device. Because these data contain noise and heterogeneity, those problems are resolved through the ontology.
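The two baseline preprocessing steps named in this abstract (standardization of numerical data and one-hot coding of nominal data) can be sketched as follows. This is a minimal illustration, not the authors' ontology-based method; the RSSI readings, the access-point names, and the choice of the population standard deviation are assumptions made for the example.

```python
from statistics import mean, pstdev

def standardize(values):
    """Z-score: subtract the mean, divide by the population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values identical: there is no spread to scale by.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

def one_hot(labels):
    """Map each nominal label to a 0/1 vector containing a single 1."""
    categories = sorted(set(labels))
    index = {c: i for i, c in enumerate(categories)}
    vectors = [
        [1 if index[label] == i else 0 for i in range(len(categories))]
        for label in labels
    ]
    return categories, vectors

# Hypothetical RSSI readings (dBm) and access-point names, for illustration only.
rssi = [-40.0, -50.0, -60.0]
aps = ["ap_a", "ap_b", "ap_a"]

z = standardize(rssi)          # zero mean, unit population variance
cats, encoded = one_hot(aps)   # one vector per reading
```

Neither step removes noisy readings or reconciles heterogeneous device names, which is the gap the abstract says the ontology is meant to address.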

Interactive Chinese Character Distance Learning System on the WWW (WWW에서 대화형 원격 한자학습 시스템)

  • Gang, Jong-Gyu;Park, Sang-U;Kim, Hyeon-Suk;Kim, Gye-Hwan;Jin, Seong-Il
    • The Transactions of the Korea Information Processing Society / v.4 no.3 / pp.698-708 / 1997
  • To construct distance learning servers and provide their services over the WWW (World Wide Web), multimedia data must be transmitted and processed in real time rather than processed after downloading. To fulfill these requirements, we developed a real-time processing module for distance education that can process multimedia data in AVI and WAV formats in distributed environments. We in turn developed a real-time WWW server that can provide real-time services for hypertext and motion picture data by adding the real-time processing module to the MuX framework and integrating them with the WWW. Finally, we developed a distance learning system for real-time interactive Chinese character learning, based on the results of the previous steps.

Improvement of Environmental Sounds Recognition by Post Processing (후처리를 이용한 환경음 인식 성능 개선)

  • Park, Jun-Qyu;Baek, Seong-Joon
    • The Journal of the Korea Contents Association / v.10 no.7 / pp.31-39 / 2010
  • In this study, we prepared real environmental sound data sets arising from people's movement, comprising 9 different environment types. The environmental sounds are pre-processed with pre-emphasis and a Hamming window, and features extracted as MFCCs (Mel-Frequency Cepstral Coefficients) are then used in the classification experiments. A GMM (Gaussian Mixture Model) classifier without post-processing tends to yield abruptly changing classification results because it does not consider the results of neighboring frames. We therefore propose post-processing methods that suppress abruptly changing classification results by taking the probability or the rank of the neighboring frames into account. According to the experimental results, the method using the probability of neighboring frames improves recognition performance by more than 10% compared with the method without post-processing.
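One way to picture post-processing with neighboring-frame probabilities is to average each frame's class probabilities with those of its neighbors before choosing a class. The sketch below assumes a simple unweighted moving average; the abstract does not give the exact rule, and the window size, function names, and per-frame probabilities are hypothetical.

```python
def argmax_frames(frame_probs):
    """Frame-by-frame decision with no post-processing."""
    return [max(range(len(p)), key=lambda c: p[c]) for p in frame_probs]

def smooth_frames(frame_probs, half_width=1):
    """Average class probabilities over neighboring frames, then decide per frame."""
    n = len(frame_probs)
    decisions = []
    for t in range(n):
        # Window of frames around t, clipped at the sequence boundaries.
        lo = max(0, t - half_width)
        hi = min(n, t + half_width + 1)
        window = frame_probs[lo:hi]
        n_classes = len(frame_probs[t])
        avg = [sum(p[c] for p in window) / len(window) for c in range(n_classes)]
        decisions.append(max(range(n_classes), key=lambda c: avg[c]))
    return decisions

# Hypothetical per-frame probabilities for two classes; frame 2 is an isolated outlier.
probs = [[0.9, 0.1], [0.8, 0.2], [0.4, 0.6], [0.9, 0.1], [0.7, 0.3]]

raw = argmax_frames(probs)                     # the label flips at frame 2
smoothed = smooth_frames(probs, half_width=1)  # the isolated flip is suppressed
```

A wider window suppresses more isolated flips but also delays the response to a genuine change of environment.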

A Case Study of Basic Data Science Education using Public Big Data Collection and Spreadsheets for Teacher Education (교사교육을 위한 공공 빅데이터 수집 및 스프레드시트 활용 기초 데이터과학 교육 사례 연구)

  • Hur, Kyeong
    • Journal of The Korean Association of Information Education / v.25 no.3 / pp.459-469 / 2021
  • This paper presents a case study of basic data science practice education for field teachers and pre-service teachers. Spreadsheet software was used as the data collection and analysis tool. The training then covered statistics for data processing, predictive hypotheses, and verification of predictive models. In addition, an educational case was proposed for collecting and processing thousands of public big data records and verifying population prediction hypotheses and a prediction model. A 34-hour, 17-week curriculum using a spreadsheet tool was presented with this basic data science content. As a tool for data collection, processing, and analysis, spreadsheets, unlike Python, do not carry the burden of learning programming languages and data structures, and they have the advantage of letting learners visually study the theory of processing and analyzing qualitative and quantitative data. As a result of this educational case study, three predictive hypothesis test cases were presented and analyzed. First, quantitative public data were collected to verify a hypothesis predicting a difference in mean values between groups of the population. Second, qualitative public data were collected to verify a hypothesis predicting an association within the qualitative data of the population. Third, quantitative public data were collected to verify a regression prediction model based on a hypothesis predicting a correlation within the quantitative data of the population. Finally, the effectiveness of this educational case for data science education was analyzed through a satisfaction survey of pre-service and field teachers.
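For readers who want to check the third test case outside a spreadsheet, the correlation and least-squares regression line can be computed directly. The course itself used spreadsheets, so the sketch below is only an equivalent calculation; the data points are hypothetical.

```python
from statistics import mean

def linear_fit(xs, ys):
    """Return slope, intercept and Pearson correlation of a least-squares line."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / (sxx * syy) ** 0.5
    return slope, intercept, r

# Hypothetical quantitative data lying exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept, r = linear_fit(xs, ys)
```

The same quantities correspond to the slope, intercept, and correlation functions that common spreadsheet tools provide.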

Point Cloud Classification Method for Mountainous Area (산악지역 점군자료 분류기법 연구)

  • Choi, Yun-Woong;Lee, Geun-Sang;Cho, Gi-Sung
    • Proceedings of the Korean Society of Surveying, Geodesy, Photogrammetry, and Cartography Conference / 2010.04a / pp.387-388 / 2010
  • There is not yet a generalized and systematic method of data pre-processing for point cloud classification, even though many previous studies have proposed filters such as the local maxima filter, the morphology filter, and the slope-based filter. The main focus of this study is to present a method for classifying bare ground information from LiDAR data for mountainous areas.

Development of High Fidelity Supersonic Flow Air Data Processing Algorithm (고 신뢰도 초고속 공기 유동 데이터 처리 알고리즘 개발)

  • Choi, Jong-Ho;Yoon, Hyun-Gull
    • Journal of the Korean Society of Propulsion Engineers / v.14 no.2 / pp.54-62 / 2010
  • This paper describes the development of a high-fidelity air data processing algorithm that can be applied to an air data system for a high-speed aerial vehicle. Unlike previous air data systems, the present algorithm uses several pre-determined pressure data obtained with a computational fluid dynamics approach, without using total pressures, while providing sufficient sensor redundancy and fault detection ability. The algorithm was verified using the commercial software Matlab and Simulink.

Introduction of Acquisition System, Processing System and Distributing Service for Geostationary Ocean Color Imager (GOCI) Data (정지궤도 해색탑재체(GOCI) 데이터의 수신.처리 시스템과 배포 서비스)

  • Yang, Chan-Su;Bae, Sang-Soo;Han, Hee-Jeong;Ahn, Yu-Hwan;Ryu, Joo-Hyung;Han, Tai-Hyun;Yoo, Hong-Rhyong
    • Korean Journal of Remote Sensing / v.26 no.2 / pp.263-275 / 2010
  • KOSC (Korea Ocean Satellite Center), the primary operational organization for GOCI (Geostationary Ocean Color Imager), was established in KORDI (Korea Ocean Research & Development Institute). For a stable distribution service of GOCI data, the following systems were installed at KOSC: GOCI Data Acquisition System, Image Pre-processing System, GOCI Data Processing System, GOCI Data Distribution System, Data Management System, Total Management & Control System, and External Data Exchange System. In accordance with its distribution policy, KOSC distributes GOCI data to users 8 times at 1-hour intervals during the daytime, in near-real time. Finally, we introduce the KOSC website, where users can search, request, and download GOCI data.

A Prediction Model for Low Cycle and High Cycle Fatigue Lives of Pre-strained Fe-18Mn TWIP Steel (Fe-18Mn TWIP강의 Pre-strain에 따른 저주기 및 고주기 피로 수명 예측 모델)

  • Kim, Y.W.;Lee, C.S.
    • Transactions of Materials Processing / v.19 no.1 / pp.11-16 / 2010
  • The influence of pre-strain on the low-cycle fatigue behavior of Fe-18Mn-0.05Al-0.6C TWIP steel was studied by conducting axial strain-controlled tests. As-received plates were deformed by rolling with reduction ratios of 10 and 30%, respectively. A triangular waveform with a constant frequency of 1 Hz was employed for the low-cycle fatigue tests at total strain amplitudes in the range of $\pm 0.4$ to $\pm 0.6$ pct. The results showed that low-cycle fatigue life was strongly dependent on the amount of pre-strain as well as the strain amplitude. As the amount of pre-strain increased, the number of reversals to failure decreased significantly at high strain amplitudes, but the effect was negligible at low strain amplitudes. A new model for predicting the fatigue life of a pre-strained body is suggested by adding $\Delta E_{\text{pre-strain}}$ to the energy-based fatigue damage parameter. In addition, high-cycle fatigue lives predicted from the low-cycle fatigue data agreed well with the experimental ones.

Survey on the use of pre-processed food materials in school foodservices in the Kyunggi area (경기지역 학교급식소에서 전처리 식재료의 이용에 대한 실태 조사 및 중요도·수행도 평가)

  • Lee, Seung-Mi;Lee, Seung-Joo
    • Korean journal of food and cookery science / v.22 no.5 s.95 / pp.553-564 / 2006
  • This study was conducted to investigate the use and acceptability of pre-processed food materials in school foodservice. Self-administered questionnaires were collected from 81 schools in the Kyunggi area. Statistical data analysis was completed using the SPSS v. 10.0 program. Eighty-one school dietitians from 31 elementary, 31 middle, and 19 high schools participated in the survey. Most of the subjects (over 95%) understood that it is necessary to use pre-processed foods, and they considered food hygiene the most important factor. The percentages of school foodservices that purchased and used pre-processed foods were: 82.7% for cabbage, 86.4% for onion, 72.8% for carrot, 97% for garlic, 82.7% for potato, and over 90% for meat and fish. Dietitians were most satisfied with the performance of ‘trash reduction’ and ‘saving cooking time’ when using pre-processed food materials. ‘Appearance’, ‘freshness’, ‘hygiene’, ‘nutrition’, and ‘specialty of the food-processing company’ were the aspects of most concern when purchasing and using pre-processed food materials.

Imputation of Medical Data Using Subspace Condition Order Degree Polynomials

  • Silachan, Klaokanlaya;Tantatsanawong, Panjai
    • Journal of Information Processing Systems / v.10 no.3 / pp.395-411 / 2014
  • Temporal medical data is often collected during patient treatments that require personal analysis. Each observation recorded in the temporal medical data is associated with measurements and treatment times. A major problem in the analysis of temporal medical data is missing values, which are caused, for example, by patients dropping out of a study before completion. Therefore, the imputation of missing data is an important step during pre-processing and can provide useful information before the data is mined. For each patient and each variable, this imputation replaces the missing data with a value drawn from an estimated distribution of that variable. In this paper, we propose a new method, called Newton's finite divided difference polynomial interpolation with condition order degree, for dealing with missing values in temporal medical data related to obesity. We compared the new imputation method with three existing subspace estimation techniques: the k-nearest neighbor, local least squares, and natural cubic spline approaches. The performance of each approach was then evaluated using the normalized root mean square error and statistical significance test results. The experimental results demonstrate that the proposed method provides the best fit with the smallest error and is more accurate than the other methods.
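The building block named in this abstract, Newton's divided difference polynomial interpolation, can be sketched as below. This shows only the textbook interpolation applied to one missing observation; the paper's "condition order degree" rule for choosing the polynomial degree is not described in the abstract and is not implemented here. The visit times and measurements are hypothetical.

```python
def divided_differences(xs, ys):
    """Return Newton coefficients f[x0], f[x0,x1], ..., f[x0..xn]."""
    coef = list(ys)
    n = len(xs)
    for j in range(1, n):
        # Update in place from the end so lower-order values are still available.
        for i in range(n - 1, j - 1, -1):
            coef[i] = (coef[i] - coef[i - 1]) / (xs[i] - xs[i - j])
    return coef

def newton_eval(xs, coef, x):
    """Evaluate the Newton-form polynomial at x by nested multiplication."""
    result = coef[-1]
    for i in range(len(coef) - 2, -1, -1):
        result = result * (x - xs[i]) + coef[i]
    return result

# Hypothetical visits at t = 0, 1, 3 with a missing measurement at t = 2.
times = [0.0, 1.0, 3.0]
values = [1.0, 2.0, 10.0]   # these points lie on y = t**2 + 1

coef = divided_differences(times, values)
imputed = newton_eval(times, coef, 2.0)
```

With three known points the interpolant is a quadratic, so the imputed value follows the curve through the observed visits rather than a straight line between the two nearest ones.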