• Title/Summary/Keyword: Resampling (리샘플링)

Search results: 110

A comparison of imputation methods using nonlinear models (비선형 모델을 이용한 결측 대체 방법 비교)

  • Kim, Hyein;Song, Juwon
    • The Korean Journal of Applied Statistics / v.32 no.4 / pp.543-559 / 2019
  • Data often include missing values for various reasons. If the missing-data mechanism is not MCAR, analysis based only on fully observed cases may cause estimation bias and decrease the precision of estimates, since partially observed cases are excluded. Missing values cause more serious problems especially when data include many variables. Many imputation techniques have been suggested to overcome this difficulty. However, imputation methods using parametric models may not fit real data that do not satisfy the model assumptions. In this study, we review imputation methods using nonlinear models such as kernel, resampling, and spline methods, which are robust to model assumptions. In addition, we suggest utilizing imputation classes to improve imputation accuracy, and adding random errors to correctly estimate the variance of the estimates in nonlinear imputation models. The performances of imputation methods using nonlinear models are compared under various simulated data settings. Simulation results indicate that performance differs as the data settings change, but imputation based on kernel regression or the penalized spline performs better in most situations. Utilizing imputation classes or adding random errors improves the performance of imputation methods using nonlinear models. (A minimal kernel-imputation sketch follows below.)
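
As a concrete illustration of kernel-regression imputation with added random errors, here is a minimal Python sketch on synthetic data. The Nadaraya-Watson estimator, Gaussian kernel, bandwidth, and residual-bootstrap noise are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def kernel_impute(x, y, bandwidth=0.5, add_noise=True):
    """Impute missing y via Nadaraya-Watson kernel regression on x.

    Optionally adds a randomly drawn observed residual to each imputed
    value so the variance of the completed data is not understated.
    """
    y = y.astype(float)
    obs = ~np.isnan(y)
    x_obs, y_obs = x[obs], y[obs]

    # Gaussian-kernel weighted average of the observed responses
    def nw_predict(x0):
        w = np.exp(-0.5 * ((x0 - x_obs) / bandwidth) ** 2)
        return np.sum(w * y_obs) / np.sum(w)

    fitted_obs = np.array([nw_predict(xi) for xi in x_obs])
    residuals = y_obs - fitted_obs

    for i in np.where(~obs)[0]:
        y[i] = nw_predict(x[i])
        if add_noise:                      # random-error addition
            y[i] += rng.choice(residuals)  # bootstrap an observed residual
    return y

# toy demo: nonlinear truth with ~30% missingness
x = rng.uniform(0, 2 * np.pi, 200)
y_true = np.sin(x) + rng.normal(0, 0.2, 200)
y = y_true.copy()
y[rng.random(200) < 0.3] = np.nan
completed = kernel_impute(x, y)
```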

A Study on Default Prediction Model: Focusing on The Imbalance Problem of Default Data (부도 예측 모형 연구: 부도 데이터의 불균형 문제를 중심으로)

  • Jinsoo Park;Kangbae Lee;Yongbok Cho
    • Information Systems Review / v.26 no.2 / pp.169-183 / 2024
  • This study summarizes strategies for addressing the imbalance problem in observed default data that must be considered when constructing a default model, and compares the performance improvements obtained with data resampling techniques and default-threshold adjustments. Empirical analysis indicates that as the level of imbalance resolution in the data increases and the default threshold of the model decreases, the recall of the model improves; conversely, as the level of imbalance resolution decreases and the default threshold increases, the precision of the model improves. Additionally, focusing solely on either recall or precision when addressing the imbalance problem causes the other evaluation metric to decrease significantly due to their trade-off relationship. This study differs from most previous research by focusing on the relationship between strategies for the imbalance problem of default data and the resulting default-model performance. Moreover, it confirms that to enhance the practical usability of a default model, different imbalance-handling strategies should be applied depending on the main purpose of the model, and that the Fβ score should be utilized as a performance evaluation metric. (A threshold-sweep sketch follows below.)
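
The recall/precision trade-off described above can be reproduced with a small sketch: random oversampling of the minority class plus a sweep of the default threshold, scored with an Fβ metric. The data, classifier, and thresholds are illustrative stand-ins, not the study's setup.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import fbeta_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

# imbalanced toy data: ~3% "defaults"
X, y = make_classification(n_samples=5000, weights=[0.97], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# simple random oversampling of the minority (default) class
Xd, yd = X_tr[y_tr == 1], y_tr[y_tr == 1]
Xd_up, yd_up = resample(Xd, yd, n_samples=(y_tr == 0).sum(), random_state=0)
X_bal = np.vstack([X_tr[y_tr == 0], Xd_up])
y_bal = np.concatenate([y_tr[y_tr == 0], yd_up])

clf = LogisticRegression(max_iter=1000).fit(X_bal, y_bal)
proba = clf.predict_proba(X_te)[:, 1]

# lowering the default threshold raises recall; raising it raises precision
for thr in (0.3, 0.5, 0.7):
    pred = (proba >= thr).astype(int)
    print(f"thr={thr:.1f}  recall={recall_score(y_te, pred):.3f}  "
          f"precision={precision_score(y_te, pred):.3f}  "
          f"F2={fbeta_score(y_te, pred, beta=2):.3f}")
```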

Monitoring of Groundwater quality according to groundwater use for agriculture (농업용 지하수 사용에 따른 지하수질 모니터링 평가)

  • Ha, Kyoochul;Ko, Kyung-Seok;Lee, Eunhee;Kim, Sunghyun;Park, Changhui;Kim, Gyoo-Bum
    • Proceedings of the Korea Water Resources Association Conference / 2020.06a / pp.30-30 / 2020
  • This study was conducted to evaluate seasonal changes in groundwater quality in an area where groundwater is used intensively as agricultural water (for rice farming) during summer. The study area covers 2.83 km² (283.3 ha), including parts of Yanggok-ri and Singok-ri, Hongseong-gun, Chungcheongnam-do. To characterize the spatial distribution and temporal variation of groundwater quality, 21 groundwater wells were surveyed and analyzed twice in 2019 (July and October). Temperature (T), pH, dissolved oxygen (DO), electrical conductivity (EC), and redox potential (Eh) were measured in the field, while major cations and trace elements (Ca, Mg, Na, K, Si, Sr), anions (F, Cl, Br, NO2, NO3, PO4, SO4), alkalinity, dissolved organic carbon (DOC), and dissolved organic matter (DOM) were analyzed in the laboratory. The survey showed that 14-15 wells (67-71%) were classified as the Ca-HCO3 type, followed by 4-5 wells (19-24%) as the Ca-Cl type. Shallow wells generally showed higher concentrations of most components (TDS, Ca, Mg, Na, K, Cl, SO4, HCO3, DOC) than deeper wells. Principal component analysis (PCA) and hierarchical cluster analysis (HCA), both multivariate statistical methods, were applied to the water-quality data; the first three principal components (eigenvalues) explained 88.3% of the total variance (PC1 54.0%, PC2 14.2%, PC3 12.3%). PC1 was mainly influenced by Ca, Mg, Na, K, Cl, SO4, and DOC, while PC2 was influenced by HCO3, NO3, and DO. The hierarchical cluster analysis divided the groundwater of the study area into two major groups, apart from the Na-Cl type well C-3. The multivariate results were consistent with the water-quality characteristics indicated by the hydrogeochemistry, isotopes, and dissolved organic matter. No significant water-quality changes were observed between the two survey periods except at a few wells, and groundwater levels recovered quickly after groundwater use. (A PCA/HCA sketch follows below.)

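A minimal sketch of the PCA and Ward-linkage HCA workflow described above, run on synthetic stand-in values (not the measured hydrochemistry), assuming scikit-learn and SciPy:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# stand-in matrix: 21 wells x 8 hydrochemical variables
# (Ca, Mg, Na, K, Cl, SO4, HCO3, DOC) -- synthetic, not the study data
X = rng.lognormal(mean=2.0, sigma=0.5, size=(21, 8))

Z = StandardScaler().fit_transform(X)   # standardize before PCA

pca = PCA()
scores = pca.fit_transform(Z)
print("explained variance ratio:", pca.explained_variance_ratio_[:3])
print("PC1 loadings:", pca.components_[0])  # dominant variables on PC1

# Ward hierarchical clustering, cut into two groups
link = linkage(Z, method="ward")
groups = fcluster(link, t=2, criterion="maxclust")
print("cluster label per well:", groups)
```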

Spectral Characteristics of Sea Surface Height in the East Sea from Topex/Poseidon Altimeter Data (Topex/Poseidon에서 관측된 동해 해수면의 주기특성 연구)

  • 황종선;민경덕;이준우;원중선;김정우
    • Economic and Environmental Geology / v.34 no.4 / pp.375-383 / 2001
  • We extracted sea surface heights (SSH) from the Topex/Poseidon (T/P) radar altimeter data to compare with the SSH estimated from in-situ tide gauges (T/G) at the Ulleungdo, Pohang, and Sockcho/Mucko sites. Selection criteria such as wet/dry troposphere, ionosphere, and ocean tide were used to estimate accurate SSH. For time series analysis, the one-hour interval tide gauge SSHs were resampled at the 10-day interval of the satellite SSHs. The ocean tide model applied in the altimeter data processing showed periodic aliasings of 175.5, 87.8, 62.1, 58.5, 49.5, and 46.0 days, and hence 200-day filtering was applied to reduce these spectral noises. Wavenumber correlation analysis was also applied to extract common components between the two SSHs, which enhanced the correlation coefficient (CC) dramatically. The original CCs between the satellite and tide gauge SSHs are 0.46, 0.26, and 0.15, respectively. Ulleungdo shows the largest CC because the site is far from the coast, resulting in the minimum error in the satellite observations. The CCs increased to 0.59, 0.30, and 0.30, respectively, after 200-day filtering, and to 0.69, 0.63, and 0.59 after removing inversely correlative components using wavenumber correlation analysis. The CCs were greatly increased, by 87, 227, and 460%, when the wavenumber correlation analysis was followed by 200-day filtering, resulting in final CCs of 0.86, 0.85, and 0.84, respectively. The best SSHs were estimated when the two methods were applied to the original data. The low-pass filtered T/P SSHs were found to be well correlated with the T/G SSHs from tide gauges, and the best correlation results were found when we applied both low-pass filtering and spectral correlation analysis to the original SSHs. (A resampling and low-pass filtering sketch follows below.)

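The resampling and low-pass filtering steps can be sketched as follows. The three-year synthetic series, the single 62.1-day alias component, and the fourth-order Butterworth design are illustrative choices; the paper does not specify its filter implementation.

```python
import numpy as np
import pandas as pd
from scipy.signal import butter, filtfilt

rng = np.random.default_rng(0)
# hourly tide-gauge record: annual cycle + one tidal-alias component + noise
t = pd.date_range("1995-01-01", periods=24 * 365 * 3, freq="h")
days = np.arange(t.size) / 24.0
ssh = (10 * np.sin(2 * np.pi * days / 365.25)
       + 3 * np.sin(2 * np.pi * days / 62.1)   # aliased tidal period
       + rng.normal(0, 1, t.size))
series = pd.Series(ssh, index=t)

# resample the 1-hour record to the ~10-day satellite repeat interval
ssh_10d = series.resample("10D").mean()

# 200-day low-pass filter to suppress the aliased tidal periods
fs = 1 / 10.0                       # samples per day after resampling
cutoff = 1 / 200.0                  # cycles per day
b, a = butter(4, cutoff / (fs / 2))  # normalized cutoff frequency
ssh_lp = filtfilt(b, a, ssh_10d.to_numpy())
```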

Implementation of a Self Controlled Mobile Robot with Intelligence to Recognize Obstacles (장애물 인식 지능을 갖춘 자율 이동로봇의 구현)

  • 류한성;최중경
    • Journal of the Institute of Electronics Engineers of Korea SP / v.40 no.5 / pp.312-321 / 2003
  • In this paper, we implement a robot with the ability to recognize obstacles and move automatically to a destination. We present two results: a hardware implementation of an image processing board and a software implementation of a visual feedback algorithm for a self-controlled robot. In the first part, the mobile robot depends on commands from a control board that performs the image processing. We have studied this self-controlled mobile robot system, equipped with a CCD camera, for a long time. The robot system consists of an image processing board implemented with DSPs, a stepping motor, and a CCD camera. We propose an algorithm in which commands are delivered for the robot to move along the planned path. The distance the robot is supposed to move is calculated from the absolute coordinate and the coordinate of the target spot. The image signal acquired by the CCD camera mounted on the robot is captured at every sampling time so that the robot can automatically avoid obstacles and finally reach the destination. The image processing board consists of a DSP (TMS320VC33), ADV611, SAA7111, ADV7176A, CPLD (EPM7256ATC144), and SRAM memories. In the second part, the visual feedback control has two types of vision algorithms: obstacle avoidance and path planning. The first algorithm works on cells, parts of the image divided by blob analysis. Image preprocessing is performed to improve the input image and consists of filtering, edge detection, NOR converting, and thresholding; the major image processing includes labeling, segmentation, and pixel density calculation. In the second algorithm, after an image frame goes through preprocessing (edge detection, converting, thresholding), the histogram is measured vertically (in the y-axis direction). The binary histogram of the image then shows waveforms with only black-and-white variations. Here we use the fact that, since obstacles appear as wall-like sectional diagrams, there is no variation in the histogram. The intensities of the line histogram are measured vertically at intervals of 20 pixels, so we can find uniform and nonuniform regions of the waveforms and define a period of uniform waveform as an obstacle region. The algorithm is very useful for the robot to move while avoiding obstacles. (A histogram-scan sketch follows below.)
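
A small sketch of the vertical-histogram obstacle test described above, assuming a pre-thresholded binary frame. The 20-pixel interval follows the abstract; the tolerance and toy image are illustrative.

```python
import numpy as np

def obstacle_columns(binary_img, step=20, tol=5):
    """Scan a thresholded frame and flag likely obstacle regions.

    Measures the vertical (y-axis) histogram every `step` pixels; runs of
    nearly constant histogram values (no variation) are treated as the
    wall-like sectional diagrams of obstacles.
    """
    h, w = binary_img.shape
    cols = np.arange(0, w, step)
    hist = binary_img[:, cols].sum(axis=0)   # filled pixels per sampled column

    flags = []
    for i in range(1, len(hist)):
        uniform = abs(int(hist[i]) - int(hist[i - 1])) <= tol
        flags.append((cols[i], uniform))     # uniform -> candidate obstacle
    return flags

# toy frame: a "wall" occupying columns 100-199
img = np.zeros((240, 320), dtype=np.uint8)
img[60:180, 100:200] = 1
for x, is_obstacle in obstacle_columns(img):
    if is_obstacle and img[:, x].sum() > 0:
        print(f"obstacle region near column {x}")
```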

Precise Rectification of Misaligned Stereo Images for 3D Image Generation (입체영상 제작을 위한 비정렬 스테레오 영상의 정밀편위수정)

  • Kim, Jae-In;Kim, Tae-Jung
    • Journal of Broadcast Engineering / v.17 no.2 / pp.411-421 / 2012
  • The stagnant growth of the 3D market due to a shortage of 3D movie content is encouraging the development of techniques for production cost reduction. Eliminating the vertical disparity generated during image acquisition requires the heaviest time and effort in the whole stereoscopic film-making process; this matter is directly related to competitiveness in the market and is treated as a very important task. The removal of vertical disparity, i.e. image rectification, has long been studied in the photogrammetry field. While computer vision methods focus on fast processing and automation, photogrammetry methods focus on accuracy and precision; however, photogrammetric approaches have not been tried for 3D film-making. In this paper, we propose a photogrammetry-based rectification algorithm that eliminates vertical disparity precisely by reconstructing the geometric relationship at the time of shooting. The proposed algorithm was evaluated against two existing computer vision algorithms, testing epipolar constraint satisfaction, epipolar line accuracy, and the vertical disparity of the resulting images. As a result, the proposed algorithm showed better performance than the other algorithms in terms of accuracy and precision, and also proved robust to position errors in the tie-points. (A sketch of an uncalibrated computer-vision baseline follows below.)
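
For context, here is a sketch in the style of the uncalibrated computer-vision rectification the paper compares against (not the proposed photogrammetric algorithm), assuming OpenCV and Nx2 float32 tie-point arrays:

```python
import cv2
import numpy as np

def rectify_uncalibrated(pts1, pts2, img_size):
    """Rectify a misaligned stereo pair from matched tie-points alone.

    pts1, pts2: Nx2 float32 arrays of corresponding tie-points;
    img_size: (width, height) of the images.
    Returns the two rectifying homographies and the mean residual
    vertical disparity of the tie-points after rectification.
    """
    # fundamental matrix from RANSAC-filtered correspondences
    F, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, 1.0, 0.99)
    if F is None:
        raise RuntimeError("fundamental matrix estimation failed")

    # homographies that map both images onto a common rectified plane
    ok, H1, H2 = cv2.stereoRectifyUncalibrated(pts1, pts2, F, img_size)
    if not ok:
        raise RuntimeError("rectification failed")

    # residual vertical disparity of the warped tie-points
    p1 = cv2.perspectiveTransform(pts1.reshape(-1, 1, 2), H1).reshape(-1, 2)
    p2 = cv2.perspectiveTransform(pts2.reshape(-1, 1, 2), H2).reshape(-1, 2)
    return H1, H2, float(np.mean(np.abs(p1[:, 1] - p2[:, 1])))
```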

A Discrete Model of Conveyor Systems for FMS (FMS를 위한 Conveyor System의 이산구조 모델링)

  • Sin, Ok-Geun
    • The Transactions of the Korea Information Processing Society / v.3 no.6 / pp.1397-1406 / 1996
  • In this paper, we propose a discrete model of conveyor systems, which are frequently used in flexible manufacturing systems to transfer work-in-process (WIP) between manipulators. In cases where the time required for transferring WIPs between manipulators is greater than that of the manufacturing itself, as in many flexible assembly lines, a quantitative model of the transfer system is needed to analyze the behavior and productivity of the whole manufacturing system. The proposed model is based on the assumptions that the length of any unit conveyor component is an integer multiple of the length of a pallet and that the transfer speed of the conveyor is constant. Under these assumptions, the observation moments and the length of the conveyor can be quantized. Hence, the state of a conveyor can be represented by two kinds of Boolean variables: one representing the presence of a pallet on each quantized conveyor cell and the other representing the mobility of that pallet. The whole conveyor system can then be modeled as a network composed of branches and knots based on these two Boolean variables. The proposed modeling method was tested with various conveyor system configurations and showed that the model can be adopted successfully for the simulation of transfer systems and the piloting of manufacturing processes. (A one-step simulation sketch follows below.)

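A one-dimensional sketch of the quantized Boolean model: each cell is one pallet length, and a pallet's mobility is determined by the occupancy of the cell ahead. The branch-and-knot network is omitted; this shows a single conveyor segment only.

```python
import numpy as np

def step(presence, exit_free=True):
    """Advance a quantized conveyor by one time unit.

    presence[i] is True when a pallet occupies cell i. A pallet moves one
    cell forward only when the next cell is (or becomes) empty -- its
    mobility bit -- matching the model's two Boolean state variables.
    """
    n = len(presence)
    nxt = presence.copy()
    if presence[n - 1] and exit_free:
        nxt[n - 1] = False                    # pallet leaves the conveyor
    # scan from the head so a contiguous block advances together
    for i in range(n - 2, -1, -1):
        if presence[i] and not nxt[i + 1]:
            nxt[i], nxt[i + 1] = False, True  # pallet advances one cell
    return nxt

# demo: 8 pallet-length cells, three pallets, exit blocked then released
state = np.array([1, 1, 0, 1, 0, 0, 0, 0], dtype=bool)
for k in range(5):
    state = step(state, exit_free=(k >= 2))
    print(k, state.astype(int))
```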

A research on the emotion classification and precision improvement of EEG(Electroencephalogram) data using machine learning algorithm (기계학습 알고리즘에 기반한 뇌파 데이터의 감정분류 및 정확도 향상에 관한 연구)

  • Lee, Hyunju;Shin, Dongil;Shin, Dongkyoo
    • Journal of Internet Computing and Services / v.20 no.5 / pp.27-36 / 2019
  • In this study, experiments on emotion classification, analysis, and accuracy improvement of EEG data were conducted using the DEAP (Database for Emotion Analysis using Physiological signals) dataset. Data from 32 EEG channels measured on 32 subjects were used. In the pre-processing step, the EEG data were sampled at 256 Hz, and the Theta, Slow-alpha, Alpha, Beta, and Gamma frequency bands were extracted using a Finite Impulse Response filter. After the extracted data were classified through time-frequency transform, they were purified through Independent Component Analysis to delete artifacts. The purified data were converted into CSV format for the machine learning experiments, and the Arousal-Valence plane was used as the criterion for emotion classification. The emotions were categorized into three sections, 'Positive', 'Negative', and 'Neutral', the last denoting a tranquil emotional condition. Data of the 'Neutral' condition were classified using the Cz (Central zero) channel configured as the reference channel. To enhance accuracy, the experiments used attributes selected by an ASC (Attribute Selected Classifier). In the 'Arousal' dimension, the accuracy of this study's experiments was 32.48% higher than Koelstra's results, and with ASC the accuracy in 'Valence' was 8.13% higher than Liu's results. In the Random Forest Classifier experiment adopting ASC to improve accuracy, an accuracy 2.68% higher than the overall mean of existing research was confirmed. (An FIR band-extraction sketch follows below.)
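
The FIR band extraction at 256 Hz can be sketched as below. The band edges are common EEG conventions and the filter length is an illustrative choice; the paper does not list its exact filter design.

```python
import numpy as np
from scipy.signal import filtfilt, firwin

FS = 256  # sampling rate used in the pre-processing step (Hz)
BANDS = {  # band edges are common conventions; the exact cut-offs may differ
    "theta": (4, 8), "slow_alpha": (8, 10), "alpha": (8, 12),
    "beta": (12, 30), "gamma": (30, 45),
}

def extract_bands(eeg, fs=FS, numtaps=257):
    """Split one EEG channel into bands with a linear-phase FIR filter."""
    out = {}
    for name, (lo, hi) in BANDS.items():
        taps = firwin(numtaps, [lo, hi], pass_zero=False, fs=fs)  # band-pass
        out[name] = filtfilt(taps, [1.0], eeg)                    # zero-phase
    return out

# demo on a synthetic 10 s channel containing alpha (10 Hz) + beta (20 Hz)
t = np.arange(0, 10, 1 / FS)
signal = np.sin(2 * np.pi * 10 * t) + 0.5 * np.sin(2 * np.pi * 20 * t)
bands = extract_bands(signal)
print({k: float(np.std(v)) for k, v in bands.items()})
```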

A Classification Model for Customs Clearance Inspection Results of Imported Aquatic Products Using Machine Learning Techniques (머신러닝 기법을 활용한 수입 수산물 통관검사결과 분류 모델)

  • Ji Seong Eom;Lee Kyung Hee;Wan-Sup Cho
    • The Journal of Bigdata / v.8 no.1 / pp.157-165 / 2023
  • Seafood is a major source of protein in many countries, and its consumption is increasing. In Korea, consumption of seafood is increasing while the self-sufficiency rate is decreasing, and the importance of safety management grows as the amount of imported seafood increases. Hundreds of species of aquatic products are imported into Korea from over 110 countries, and there is a limit to relying only on the experience of inspectors for their safety management. Based on the data, we develop a model that predicts the customs inspection results of imported aquatic products: a machine learning classification model that determines the non-conformity of aquatic products when an import declaration is submitted. The nonconformity rate of customs inspections of imported marine products is less than 1%, so the data are highly imbalanced. Therefore, sampling methods that can compensate for this characteristic were comparatively studied, and a preprocessing method that keeps the classification result interpretable was applied. Among various machine learning-based classification models, Random Forest and XGBoost showed good performance. The model that predicts both conformity and non-conformity well is the basic Random Forest model with ADASYN and one-hot encoding applied, achieving an accuracy of 99.88%, precision of 99.87%, recall of 99.89%, and AUC of 99.88%. XGBoost is the most stable model, with all indicators exceeding 90% regardless of oversampling and encoding type. (An ADASYN + one-hot sketch follows below.)
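
A minimal sketch of the best-performing combination reported above (one-hot encoding, ADASYN oversampling, Random Forest), assuming the imbalanced-learn package and synthetic stand-in data; the categorical column and its values are illustrative, not the declaration schema.

```python
import numpy as np
import pandas as pd
from imblearn.over_sampling import ADASYN
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, roc_auc_score
from sklearn.model_selection import train_test_split

# numeric stand-in features plus one categorical column to show the encoding
Xn, y = make_classification(n_samples=2000, weights=[0.99], flip_y=0,
                            random_state=0)            # ~1% nonconforming
df = pd.DataFrame(Xn, columns=[f"f{i}" for i in range(Xn.shape[1])])
df["origin_country"] = np.random.default_rng(0).choice(
    ["CN", "VN", "NO"], len(df))                       # hypothetical column

X = pd.get_dummies(df, columns=["origin_country"])     # one-hot encoding
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# ADASYN synthesizes minority samples where the class boundary is hardest
X_bal, y_bal = ADASYN(random_state=0).fit_resample(X_tr, y_tr)

clf = RandomForestClassifier(random_state=0).fit(X_bal, y_bal)
print(classification_report(y_te, clf.predict(X_te)))
print("AUC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
```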

Examination of Aggregate Quality Using Image Processing Based on Deep-Learning (딥러닝 기반 영상처리를 이용한 골재 품질 검사)

  • Kim, Seong Kyu;Choi, Woo Bin;Lee, Jong Se;Lee, Won Gok;Choi, Gun Oh;Bae, You Suk
    • KIPS Transactions on Software and Data Engineering / v.11 no.6 / pp.255-266 / 2022
  • The quality control of coarse aggregate, a main ingredient of concrete, is currently carried out by the SPC (Statistical Process Control) method through sampling. We build toward a smart factory for manufacturing innovation by replacing the current sieve analysis with image-based inspection of coarse aggregates, using images acquired through a camera. First, the obtained images were preprocessed, and HED (Holistically-Nested Edge Detection), an edge filter learned by deep learning, segmented each object. After each aggregate was analyzed by image-processing the segmentation result, the fineness modulus and the aggregate shape rate were determined from the analysis. The quality of the aggregate obtained through the video was examined by calculating the fineness modulus and aggregate shape rate, and the algorithm agreed with sieve analysis to better than 90% accuracy. Furthermore, the aggregate shape rate cannot be examined by conventional methods, but the method in this paper also allows its measurement; verified against the lengths of models, it showed a difference of ±4.5%. In measuring the length of the aggregate, the algorithm result and the actual length showed a ±6% difference. Analyzing actual three-dimensional objects from two-dimensional video introduced deviations from the actual data, which requires further research. (A fineness-modulus sketch follows below.)
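
For reference, fineness modulus is conventionally the sum of the cumulative percentages retained on a standard sieve stack divided by 100. The sketch below computes it from illustrative (not measured) retained percentages and includes a hypothetical shape-rate helper, since the paper does not give its exact shape-rate formula.

```python
import numpy as np

# illustrative coarse-aggregate sieve stack (mm) and percent retained;
# these numbers are examples, not the paper's measurements
sieves_mm    = [40, 25, 20, 10, 5, 2.5]
pct_retained = [ 0,  5, 30, 40, 20,  5]   # sums to 100

cum_retained = np.cumsum(pct_retained)     # cumulative % retained per sieve
fineness_modulus = cum_retained.sum() / 100.0
print(f"fineness modulus = {fineness_modulus:.2f}")

# hypothetical shape-rate helper: mean minor/major axis ratio of the
# particles segmented by HED (axis lengths in mm, illustrative values)
def shape_rate(major_axes, minor_axes):
    return float(np.mean(np.asarray(minor_axes) / np.asarray(major_axes)))

print(f"mean shape rate = {shape_rate([12.0, 8.0, 15.0], [6.0, 6.4, 9.0]):.2f}")
```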