• Title/Summary/Keyword: Outlier

A TBM data-based ground prediction using deep neural network (심층 신경망을 이용한 TBM 데이터 기반의 굴착 지반 예측 연구)

  • Kim, Tae-Hwan;Kwak, No-Sang;Kim, Taek Kon;Jung, Sabum;Ko, Tae Young
    • Journal of Korean Tunnelling and Underground Space Association / v.23 no.1 / pp.13-24 / 2021
  • Tunnel boring machines (TBMs) are widely used for tunnel excavation in hard rock and soft ground. In TBM-based tunneling, one of the main challenges is to drive the machine optimally under varying geological conditions, which can substantially reduce costs by shortening the total operation time. Drilling investigations are generally conducted to survey the ground before TBM tunneling. However, it is difficult to give operators precise ground information over the whole tunnel path, because such investigations yield only sparse, irregularly spaced samples around the path. To overcome this issue, this study proposes a geological type classification system that uses TBM operating data recorded at a 5 s sampling interval. We first categorized the geological conditions (limited here to granite) into three geological types: rock, soil, and mixed. We then applied preprocessing steps including outlier rejection, normalization, and input feature extraction. A deep neural network (DNN) with 6 hidden layers was adopted to classify the geological types from the TBM operating data. The classification system was evaluated using 10-fold cross-validation, and the average classification accuracy was 75.4% (over a total of 388,639 samples). Although the accuracy still needs to be improved, the experimental results show that geological classification based on TBM operating data could be used in real environments to complement sparse ground information.
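
The 10-fold cross-validation mentioned in this abstract can be sketched in plain Python. This is only an illustration of the evaluation scheme, not the authors' pipeline: the function names, the shuffling seed, and the `fit`/`predict` callables are all hypothetical placeholders for whatever classifier is being evaluated.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread the remainder so that fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size

def cross_validated_accuracy(features, labels, fit, predict, k=10):
    """Average per-fold accuracy; `fit` and `predict` are hypothetical callables."""
    accuracies = []
    for train_idx, test_idx in k_fold_indices(len(labels), k):
        model = fit([features[i] for i in train_idx], [labels[i] for i in train_idx])
        hits = sum(predict(model, features[i]) == labels[i] for i in test_idx)
        accuracies.append(hits / len(test_idx))
    return sum(accuracies) / len(accuracies)
```

Each sample is used for testing exactly once, so the averaged accuracy reflects performance on data the model did not see during training in that fold.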

Comparative Analysis of Anomaly Detection Models using AE and Suggestion of Criteria for Determining Outliers

  • Kang, Gun-Ha;Sohn, Jung-Mo;Sim, Gun-Wu
    • Journal of the Korea Society of Computer and Information / v.26 no.8 / pp.23-30 / 2021
  • In this study, we present a comparative analysis of major autoencoder (AE)-based anomaly detection methods for quality determination in manufacturing processes and propose a new anomaly discrimination criterion. At manufacturing sites, anomalous instances are few and their types vary greatly. These properties degrade the performance of AI-based anomaly detection models that are trained on both normal and anomalous data, and obtaining additional data to improve performance costs considerable time and money. To address this problem, AE-based models such as AE and VAE, which perform anomaly detection using only normal data, are being studied. In this work, using Convolutional AE, VAE, and Dilated VAE models, we selected statistics on residual images, MSE, and information entropy as outlier discrimination criteria and compared the performance of each model. In particular, the range value applied to the Convolutional AE model showed the best performance, with an AUC PRC of 0.9570, an F1 score of 0.8812, an AUC ROC of 0.9548, and an accuracy of 87.60%. This is an accuracy improvement of about 20 percentage points over MSE, which has frequently been used as the criterion for determining outliers, and confirms that model performance can be improved by the choice of outlier criterion.
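
To illustrate why the choice of criterion can matter, the sketch below compares an MSE score with a range score computed on a residual image. It assumes the residual is the pixel-wise absolute difference between an input and its reconstruction and that "range" means maximum minus minimum of that residual; the abstract does not spell out either definition, and the images here are flattened lists of made-up values.

```python
def residual(image, reconstruction):
    """Pixel-wise absolute difference between an input and its reconstruction."""
    return [abs(x - r) for x, r in zip(image, reconstruction)]

def mse_score(image, reconstruction):
    """Mean squared reconstruction error over all pixels."""
    res = residual(image, reconstruction)
    return sum(v * v for v in res) / len(res)

def range_score(image, reconstruction):
    """Spread of the residual (assumed here to be max minus min)."""
    res = residual(image, reconstruction)
    return max(res) - min(res)

def is_anomalous(score, threshold):
    """Flag a sample whose score exceeds a threshold chosen on normal data."""
    return score > threshold

# A single defective pixel in an otherwise perfectly reconstructed image.
image = [0.5] * 100
image[42] = 1.0
reconstruction = [0.5] * 100
print(mse_score(image, reconstruction), range_score(image, reconstruction))
```

In this toy case the defect is averaged away in the MSE (0.25 spread over 100 pixels) but fully preserved in the range, which is one plausible reason a range-type statistic could separate small local defects better.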

A Real-time Correction of the Underestimation Noise for GK2A Daily NDVI (GK2A 일단위 NDVI의 과소추정 노이즈 실시간 보정)

  • Lee, Soo-Jin;Youn, Youjeong;Sohn, Eunha;Kim, Mija;Lee, Yangwon
    • Korean Journal of Remote Sensing / v.38 no.6_1 / pp.1301-1314 / 2022
  • The Normalized Difference Vegetation Index (NDVI) is used as an indicator of vegetation condition on the land surface in applications such as land cover, crop yield, agricultural drought, soil moisture, and forest disaster monitoring. However, satellite optical sensors operating in the visible and infrared bands cannot see through clouds, so the NDVI of a cloud pixel is not a valid value for the land surface. This study proposes a real-time correction of the underestimation noise in GEO-KOMPSAT-2A (GK2A) daily NDVI and verifies its feasibility through quantitative comparison with Moderate Resolution Imaging Spectroradiometer (MODIS) NDVI and qualitative interpretation of time-series changes. The underestimation noise was effectively corrected by a time-series correction that considers vegetation phenology, outlier removal using long-term climatology, and gap filling using rigorous statistical methods. After correction, the correlation with MODIS NDVI was higher and the difference was lower, a 32.7% improvement over the original NDVI product. With some modification, the proposed method can be extended to other satellite products.

Analysis of the Optimal Window Size of Hampel Filter for Calibration of Real-time Water Level in Agricultural Reservoirs (농업용저수지의 실시간 수위 보정을 위한 Hampel Filter의 최적 Window Size 분석)

  • Joo, Dong-Hyuk;Na, Ra;Kim, Ha-Young;Choi, Gyu-Hoon;Kwon, Jae-Hwan;Yoo, Seung-Hwan
    • Journal of The Korean Society of Agricultural Engineers / v.64 no.3 / pp.9-24 / 2022
  • A vast amount of hydrologic data is now accumulated in real time by automatic water level gauges in agricultural reservoirs, and the number of false and missing data points is increasing accordingly. Applicable and reliable quality control of hydrological data must be secured for efficient agricultural water management, including water supply calculation and disaster management. Considering the irregularities in hydrological data caused by irrigation water use and rainfall patterns, the Korea Rural Community Corporation currently applies the Hampel filter as its water level data quality control method. This method uses the window size as its key parameter: if the window is too large, the data may be distorted, and if it is too small, many outliers are not removed, which reduces the reliability of the corrected data. The optimal window size therefore needs to be selected for each reservoir. To ensure reliability, we compared the corrected data with the daily water level of the RIMS (Rural Infrastructure Management System) data using the RMSE (Root Mean Square Error) and NSE (Nash-Sutcliffe model efficiency coefficient), and with the automatic outlier detection standards used by the Ministry of Environment. To select the optimal window size, we used classification performance indices from the error matrix together with rainfall data for the irrigation period; the optimal value was 3 h. An efficient automatic calibration technique for reservoirs can reduce the manpower and time required for manual calibration, and is expected to improve the reliability of water level data and the value of water resources.
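
A minimal Hampel filter is sketched below to show the role of the window size. It follows the common textbook form (sliding median, median absolute deviation scaled by 1.4826, a 3-sigma threshold) and is not the Korea Rural Community Corporation's implementation; the parameter names and the sample water levels are assumptions. Note that `half_window` counts samples, so converting the 3 h window reported above into a sample count depends on the logging interval, which the abstract does not state.

```python
from statistics import median

def hampel_filter(values, half_window=3, n_sigmas=3.0):
    """Return (filtered_values, outlier_indices) using a sliding-median rule."""
    scale = 1.4826  # makes the MAD comparable to a standard deviation for Gaussian data
    filtered = list(values)
    outliers = []
    for i in range(len(values)):
        lo = max(0, i - half_window)
        hi = min(len(values), i + half_window + 1)
        window = values[lo:hi]  # always taken from the original, unfiltered series
        med = median(window)
        mad = median([abs(v - med) for v in window])
        if abs(values[i] - med) > n_sigmas * scale * mad:
            filtered[i] = med  # replace the flagged point with the local median
            outliers.append(i)
    return filtered, outliers

# Slowly rising water level with one spurious spike (made-up values, in metres).
levels = [10.0, 10.1, 10.2, 10.3, 15.0, 10.5, 10.6, 10.7, 10.8, 10.9]
cleaned, flagged = hampel_filter(levels, half_window=3)
print(flagged, cleaned[4])
```

A wider window smooths over genuine rapid changes such as rainfall-driven rises, while a narrower one lets the spike dominate its own neighbourhood, which is the trade-off the abstract describes.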

Unveiling the Potential: Exploring NIRv Peak as an Accurate Estimator of Crop Yield at the County Level (군·시도 수준에서의 작물 수확량 추정: 옥수수와 콩에 대한 근적외선 반사율 지수(NIRv) 최댓값의 잠재력 해석)

  • Daewon Kim;Ryoungseob Kwon
    • Korean Journal of Agricultural and Forest Meteorology / v.25 no.3 / pp.182-196 / 2023
  • Accurate and timely estimation of crop yields is crucial for various purposes, including global food security planning and agricultural policy development. Remote sensing techniques, particularly using vegetation indices (VIs), have shown promise in monitoring and predicting crop conditions. However, traditional VIs such as the normalized difference vegetation index (NDVI) and enhanced vegetation index (EVI) have limitations in capturing rapid changes in vegetation photosynthesis and may not accurately represent crop productivity. An alternative vegetation index, the near-infrared reflectance of vegetation (NIRv), has been proposed as a better predictor of crop yield due to its strong correlation with gross primary productivity (GPP) and its ability to untangle confounding effects in canopies. In this study, we investigated the potential of NIRv in estimating crop yield, specifically for corn and soybean crops in major crop-producing regions in 14 states of the United States. Our results demonstrated a significant correlation between the peak value of NIRv and crop yield/area for both corn and soybean. The correlation was slightly stronger for soybean than for corn. Moreover, most of the target states exhibited a notable relationship between NIRv peak and yield, with consistent slopes across different states. Furthermore, we observed a distinct pattern in the yearly data, where most values were closely clustered together. However, the year 2012 stood out as an outlier in several states, suggesting unique crop conditions during that period. Based on the established relationships between NIRv peak and yield, we predicted crop yield data for 2022 and evaluated the accuracy of the predictions using the Root Mean Square Percentage Error (RMSPE). Our findings indicate the potential of NIRv peak in estimating crop yield at the county level, with varying accuracy across different counties.
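
The quantities named in this abstract can be written out as a short sketch. NIRv is computed here with its usual definition (NDVI multiplied by near-infrared reflectance) and RMSPE as the root of the mean squared relative error, expressed in percent; the reflectance and yield values are made up, and the abstract does not confirm that the authors used exactly these forms.

```python
import math

def ndvi(nir, red):
    """Normalized difference vegetation index from NIR and red reflectance."""
    return (nir - red) / (nir + red)

def nirv(nir, red):
    """Near-infrared reflectance of vegetation: NDVI times NIR reflectance."""
    return ndvi(nir, red) * nir

def peak_nirv(nir_series, red_series):
    """Seasonal peak: the largest NIRv value in a time series."""
    return max(nirv(n, r) for n, r in zip(nir_series, red_series))

def rmspe(observed, predicted):
    """Root mean square percentage error, in percent."""
    ratios = [((o - p) / o) ** 2 for o, p in zip(observed, predicted)]
    return 100.0 * math.sqrt(sum(ratios) / len(ratios))

# Hypothetical reflectances over a season and hypothetical yields.
print(peak_nirv([0.30, 0.45, 0.50, 0.40], [0.10, 0.08, 0.10, 0.12]))
print(rmspe([100.0, 200.0], [110.0, 180.0]))
```

Because RMSPE is relative to the observed yield, it allows counties with very different yield levels to be compared on the same scale.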

Evaluation of Reference Intervals of Some Selected Chemistry Parameters using Bootstrap Technique in Dogs (Bootstrap 기법을 이용한 개의 혈청검사 일부 항목의 참고범위 평가)

  • Kim, Eu-Tteum;Pak, Son-Il
    • Journal of Veterinary Clinics / v.24 no.4 / pp.509-513 / 2007
  • Parametric and nonparametric methods coupled with a bootstrap simulation technique were used to reevaluate previously defined reference intervals of serum chemistry parameters. A population-based study was performed on 100 clinically healthy dogs retrieved from the medical records of Kangwon National University Animal Hospital during 2005-2006. Data were from 52 males and 48 females (1 to 8 years old, 2.2-5.8 kg of body weight). The chemistry parameters examined were blood urea nitrogen (BUN) (mg/dl), cholesterol (mg/dl), calcium (mg/dl), aspartate aminotransferase (AST) (U/L), alanine aminotransferase (ALT) (U/L), alkaline phosphatase (ALP) (U/L), and total protein (g/dl), all measured with an Ektachem DT 60 analyzer (Johnson & Johnson). All parameters except calcium had highly skewed distributions. Outliers were common, particularly in the enzyme parameters, where they accounted for 5-9% of the samples, compared with only 1-2% for the remaining parameters. Regardless of the distribution type of each analyte, nonparametric methods gave better estimates for use in clinical chemistry than parametric methods. The means and reference intervals estimated by nonparametric bootstrap methods for BUN, cholesterol, calcium, AST, ALT, ALP, and total protein were 14.7 (7.0-24.2), 227.3 (120.7-480.8), 10.9 (8.1-12.5), 25.4 (11.8-66.6), 25.5 (11.7-68.9), 87.7 (31.1-240.8), and 6.8 (5.6-8.2), respectively. This study indicates that bootstrap methods can be a useful statistical tool for establishing population-based reference intervals of serum chemistry parameters, since many laboratory values do not conform to a normal distribution. In addition, the results highlight distribution-related variation in the confidence intervals of the analytical parameters.
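
A nonparametric bootstrap estimate of a reference interval can be sketched as follows. The sketch assumes the conventional 95% reference interval (2.5th to 97.5th percentiles), 1,000 resamples, and linear interpolation between order statistics; none of these settings are stated in the abstract, and the function names are hypothetical.

```python
import random

def percentile(sorted_values, q):
    """Linearly interpolated percentile (q in [0, 100]) of pre-sorted data."""
    pos = (len(sorted_values) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] * (1 - frac) + sorted_values[upper] * frac

def bootstrap_reference_interval(values, n_boot=1000, seed=0):
    """Average the 2.5th and 97.5th percentiles over bootstrap resamples."""
    rng = random.Random(seed)
    lows, highs = [], []
    for _ in range(n_boot):
        # Resample with replacement, keeping the original sample size.
        sample = sorted(rng.choices(values, k=len(values)))
        lows.append(percentile(sample, 2.5))
        highs.append(percentile(sample, 97.5))
    return sum(lows) / n_boot, sum(highs) / n_boot
```

Because the limits come from empirical percentiles rather than from a fitted normal curve, skewed analytes such as the enzyme parameters do not need to be transformed first.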

Recent Trends in Blooming Dates of Spring Flowers and the Observed Disturbance in 2014 (최근의 봄꽃 개화 추이와 2014년 개화시기의 혼란)

  • Lee, Ho-Seung;Kim, Jin-Hee;Yun, Jin I.
    • Korean Journal of Agricultural and Forest Meteorology / v.16 no.4 / pp.396-402 / 2014
  • The spring season in Korea features a dynamic landscape in which flowers such as magnolias, azaleas, forsythias, cherry blossoms, and royal azaleas bloom one after another. In 2014, however, both the south-north differences in flowering dates and the differences among flower species narrowed, taking a toll on the economic and shared communal value of the seasonal landscape. This study was carried out to determine whether the 2014 event is an outlier or part of a major trend in spring phenology. Flowering dates of forsythias and cherry blossoms, two typical spring flower species, observed over the last 60 years at 6 weather stations of the Korea Meteorological Administration (KMA) indicate that the gap between the flowering date of forsythias, which bloom earlier in spring, and that of cherry blossoms, which bloom later, was 30 days at the longest and 14 days on average in the 1951-1980 climatological normal period, whereas it narrowed to 21 days at the longest and 11 days on average in 1981-2010. In 2014 the gap narrowed further to 7 days, making it possible to see forsythias and cherry blossoms blooming at the same time in the same location. The 'cherry blossom front' took 20 days to travel from Busan, the earliest flowering station, to Incheon, the latest flowering station, in the 1951-1980 normal period, compared with 16 days in 1981-2010 and 6 days in 2014. The corresponding delay in the flowering date of forsythias was 20, 17, and 12 days, respectively. It is presumed that the recent climate change pattern on the Korean Peninsula, in which temperatures rise rapidly in late spring but slowly in early spring immediately after dormancy release, has brought forward the flowering date of cherry blossoms, which bloom later than forsythias.
Thermal time-based heating requirements for flowering of the 2 species were estimated by analyzing the 60-year data at the 6 locations and were used to predict flowering dates in 2014. The root mean square error of the prediction was within 2 days of the observed flowering dates for both species at all 6 locations, showing the feasibility of thermal time as a prognostic tool.
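
The thermal time approach can be sketched as a running sum of daily heat units above a base temperature, with flowering predicted on the first day the sum reaches a species-specific requirement. The base temperature, the requirement, and the daily temperatures below are hypothetical; the abstract does not report the values or the exact accumulation formula the authors used.

```python
def predict_flowering_day(daily_mean_temps, base_temp, heat_requirement):
    """Return the 1-based day on which accumulated thermal time first meets
    the requirement, or None if it is never met within the series."""
    accumulated = 0.0
    for day, temp in enumerate(daily_mean_temps, start=1):
        # Only temperatures above the base contribute heat units.
        accumulated += max(0.0, temp - base_temp)
        if accumulated >= heat_requirement:
            return day
    return None

# Hypothetical daily mean temperatures (deg C) counted from dormancy release.
temps = [3.0, 6.0, 8.0, 10.0, 12.0]
print(predict_flowering_day(temps, base_temp=5.0, heat_requirement=10.0))
```

Under this scheme a rapid late-spring warming shortens the time a late-flowering species needs to meet its requirement, which is consistent with the narrowing gap described in the abstract.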

Estimation of Moisture Content in Cucumber and Watermelon Seedlings Using Hyperspectral Imagery (초분광영상 이용 오이 및 수박 묘의 수분함량 추정)

  • Kim, Seong-Heon;Kang, Jeong-Gyun;Ryu, Chan-Seok;Kang, Ye-Seong;Sarkar, Tapash Kumar;Kang, Dong Hyeon;Ku, Yang-Gyu;Kim, Dong-Eok
    • Journal of Bio-Environment Control / v.27 no.1 / pp.34-39 / 2018
  • This research was conducted to estimate the moisture content of Cucurbitaceae seedlings, namely cucumber and watermelon, using hyperspectral imagery. Using a hyperspectral image acquisition system, the reflectance of the leaf area of cucumber and watermelon seedlings was calculated after water stress was applied. The moisture content of each seedling was then measured using a dry oven. Finally, moisture content estimation models were developed from the reflectance and moisture content by PLSR analysis. The cucumber model showed an $R^2$ of 0.73, an RMSE of 1.45%, and an RE of 1.58%. The watermelon model showed an $R^2$ of 0.66, an RMSE of 1.06%, and an RE of 1.14%. The cucumber model performed slightly better after one cucumber seedling sample was removed as an outlier: the new model showed an $R^2$ of 0.79, an RMSE of 1.10%, and an RE of 1.20%. The model built from all samples combined showed an $R^2$ of 0.67, an RMSE of 1.26%, and an RE of 1.36%. The cucumber model performed better than the watermelon model because the cucumber variables were more widely distributed, which affected the performance. In addition, the accuracy and precision of the cucumber model increased when the insignificant sample was eliminated from the dataset. Both models are considered useful for estimating moisture content, because the slopes of their trend lines are almost the same and the lines intersect. The accuracy and precision of the estimation models could likely be improved if the models were built from variables with a wider distribution. The improved models will be used as the basis for developing low-priced sensors.
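
For reference, the two fit statistics reported above can be computed as in this sketch. It assumes $R^2$ is the coefficient of determination (one minus the ratio of residual to total sum of squares) and RMSE is in the same units as moisture content; the abstract does not define RE, so it is not reproduced here, and the sample values are made up.

```python
import math

def rmse(observed, predicted):
    """Root mean square error between measured and estimated values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Hypothetical oven-dried moisture contents (%) and model estimates (%).
measured = [90.0, 91.0, 92.0, 93.0]
estimated = [90.5, 90.5, 92.5, 92.5]
print(r_squared(measured, estimated), rmse(measured, estimated))
```

Since the total sum of squares grows with the spread of the measured values, a dataset with a wider moisture range tends to yield a higher $R^2$ for the same residual error, in line with the abstract's remark about widely distributed variables.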

Software Reliability Growth Modeling in the Testing Phase with an Outlier Stage (하나의 이상구간을 가지는 테스팅 단계에서의 소프트웨어 신뢰도 성장 모형화)

  • Park, Man-Gon;Jung, Eun-Yi
    • The Transactions of the Korea Information Processing Society / v.5 no.10 / pp.2575-2583 / 1998
  • The production of highly reliable software systems and their performance evaluation have become important interests in the software industry. Software evaluation has mainly been carried out in terms of both the reliability and the performance of the software system. Software reliability is the probability that no software error occurs during a fixed time interval in the software testing phase. Theoretical software reliability models are sometimes unsuitable for the practical testing phase, in which a software error at a certain testing stage is caused by imperfect debugging, abnormal software correction, and so on. Such a testing stage needs to be considered an outlying stage, and we can assume that software reliability does not improve in this outlying testing stage because of a nuisance factor. In this paper, we discuss Bayesian software reliability growth modeling and an estimation procedure in the presence of an unidentified outlying software testing stage, based on a modification of the Jelinski-Moranda model. We also derive the Bayes estimators of the software reliability parameters under the squared error loss function, assuming prior information. In addition, we evaluate the proposed software reliability growth model with an unidentified outlying stage in an exchangeable model according to the values of the nuisance parameter, using accuracy, bias, trend, and noise metrics as quantitative evaluation criteria through computer simulation.
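
As background, the standard Jelinski-Moranda model and the form of a Bayes estimator under squared error loss can be written as below. The symbols $N$ (initial number of faults), $\phi$ (per-fault hazard), $t_i$ (time between failures), and $\theta$ (a generic model parameter) are notation chosen here. The abstract does not give the modified form for the outlying stage, so only the unmodified model is shown.

```latex
\begin{align}
  \lambda_i &= \phi\,(N - i + 1), \qquad i = 1, \dots, N
    && \text{hazard rate in the $i$-th testing stage} \\
  f(t_i) &= \lambda_i \exp(-\lambda_i t_i)
    && \text{density of the $i$-th inter-failure time} \\
  R_i(t) &= \exp(-\lambda_i t)
    && \text{reliability over an interval of length $t$} \\
  \hat{\theta}_{\mathrm{Bayes}} &= \mathrm{E}\left[\theta \mid t_1, \dots, t_n\right]
    && \text{posterior mean, optimal under squared error loss}
\end{align}
```

In the standard model each corrected fault lowers the hazard by $\phi$, so reliability grows at every stage; an outlying stage in the sense of the abstract is one where this improvement does not occur.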

Assessment and Prediction of Stand Yield in Cryptomeria japonica Stands (삼나무 임분수확량 평가 및 예측)

  • Son, Yeong Mo;Kang, Jin Taek;Hwang, Jeong Sun;Park, Hyun;Lee, Kang Su
    • Journal of Korean Society of Forest Science / v.104 no.3 / pp.421-426 / 2015
  • The objective of this paper is to examine the growth of Cryptomeria japonica stands in South Korea and to evaluate their yields, carbon stocks, and carbon removals. A total of 106 sample plots were selected from Jeonnam, Gyeongnam, and Jeju, where representative stands are grown. After excluding outliers, data from 92 plots were used. The Weibull diameter distribution was applied in the analysis. To estimate the diameter distribution, growth estimation equations were derived for each growth factor, including height, diameter at breast height, and basal area, and each equation was verified. A site index for assessing the forest productivity of Cryptomeria japonica stands in each district was also developed using a Schumacher model, with 30 years as the reference age. The site index for Cryptomeria japonica stands in South Korea was found to range from 10 to 16, and this result was used as the basis for developing the stand yield table. According to the stand yield table for site index 14, the mean annual increment (MAI) of Cryptomeria japonica reaches $7.6m^3/ha$ at age 25, and its growing stock is estimated at $190.1m^3/ha$. This volume is about $20m^3$ higher than that of Chamaecyparis obtusa. Furthermore, the annual carbon absorption of a Cryptomeria japonica stand peaks at age 25 at 2.14 tC/ha/yr, or $7.83tCO_2/ha/yr$. Compared with other conifers, this rate is slightly higher than that of Chamaecyparis obtusa ($7.5tCO_2/ha/yr$) but lower than those of Pinus koraiensis ($10.4tCO_2/ha/yr$) and Larix kaempferi ($11.2tCO_2/ha/yr$). Based on these results, ways to enhance the utilization of Cryptomeria japonica as timber need to be developed, in addition to making use of the growth data.
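
Two of the calculations behind a stand yield table can be sketched briefly: the share of trees in a diameter class under a Weibull distribution, and the mean annual increment as growing stock divided by stand age. The Weibull parameter values are hypothetical and the three-parameter form is an assumption, since the abstract reports neither; the MAI check uses the figures quoted above.

```python
import math

def weibull_cdf(diameter, shape, scale, location=0.0):
    """Cumulative Weibull distribution for tree diameter (cm)."""
    if diameter <= location:
        return 0.0
    return 1.0 - math.exp(-(((diameter - location) / scale) ** shape))

def class_proportion(lower, upper, shape, scale, location=0.0):
    """Expected share of trees with diameter between lower and upper."""
    return weibull_cdf(upper, shape, scale, location) - weibull_cdf(lower, shape, scale, location)

def mean_annual_increment(growing_stock, age):
    """Growing stock (m^3/ha) divided by stand age (yr)."""
    return growing_stock / age

# Hypothetical Weibull parameters; growing stock and age from the abstract.
print(class_proportion(20.0, 22.0, shape=3.0, scale=24.0, location=6.0))
print(mean_annual_increment(190.1, 25))
```

Dividing the reported growing stock of 190.1 by the age of 25 gives about 7.6, which agrees with the MAI stated in the abstract.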