• Title/Summary/Keyword: regression outlier

Determination of Calibration Curve for Total Nitrogen Contents Analysis in Fresh Rice Leaves Using Visible and Near Infrared Spectroscopy (벼 생체엽신 질소함량 측정을 위한 근적외선분광분석의 검량식 작성)

  • Kwon Young-Rip;Baek Mi-Hwa;Choi Dong-Chil;Choi Joung-Sik;Choi Yeong-Geun
    • KOREAN JOURNAL OF CROP SCIENCE / v.50 no.6 / pp.394-399 / 2005
  • Near Infrared Spectroscopy (NIRS) has been used as a tool for rapid, accurate, and nondestructive assay of nitrogen content in fresh rice leaves. The instrument used in this study was a visible and near-infrared spectrometer (Foss model 6500). To obtain a useful calibration equation, a standard regression was performed between data obtained by chemical analysis and by the NIRS method. The accuracies of the calibration equations for nitrogen content in fresh rice leaves were 0.879, 0.858, and 0.819, respectively. After outlier treatment, accuracy improved by 0.017, 0.02, and 0.061, to 0.896, 0.878, and 0.880, respectively. Combining the calibration equations with the merge function further increased accuracy to 0.911. The difference between values predicted by the calibration equation and laboratory values determined by the Kjeldahl method was $0.001\%$. These results suggest that the nitrogen content of fresh rice leaves can be measured by NIRS: because the drying and grinding steps are omitted, sample deterioration can be avoided and analysis time and cost can be reduced.
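The abstract does not say how outliers were detected or which statistic "accuracy" denotes. The sketch below assumes accuracy is the coefficient of determination ($r^2$) of a straight-line fit between Kjeldahl reference values and NIRS estimates, and that outlier treatment means dropping samples whose residual exceeds k residual standard deviations. The function names and the cut-off k = 2.5 are hypothetical, not taken from the paper or from the Foss/WinISI software.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def r_squared(xs, ys, slope, intercept):
    """Coefficient of determination of the fitted line."""
    my = sum(ys) / len(ys)
    sse = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - my) ** 2 for y in ys)
    return 1.0 - sse / sst


def calibrate_with_outlier_removal(lab, nirs, k=2.5):
    """Fit NIRS values against lab values, drop samples whose absolute
    residual exceeds k residual standard deviations (k is an assumed
    cut-off), then refit. Returns (r2_before, r2_after, removed_indices)."""
    slope, intercept = linear_fit(lab, nirs)
    r2_before = r_squared(lab, nirs, slope, intercept)
    residuals = [y - (slope * x + intercept) for x, y in zip(lab, nirs)]
    s = math.sqrt(sum(e * e for e in residuals) / (len(lab) - 2))
    keep = [i for i, e in enumerate(residuals) if abs(e) <= k * s]
    removed = [i for i in range(len(lab)) if i not in keep]
    lab2 = [lab[i] for i in keep]
    nirs2 = [nirs[i] for i in keep]
    slope2, intercept2 = linear_fit(lab2, nirs2)
    return r2_before, r_squared(lab2, nirs2, slope2, intercept2), removed
```

A single pass is used here for simplicity; calibration software often repeats the outlier pass, which would be a loop around the same steps.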

Prediction from Linear Regression Equation for Nitrogen Content Measurement in Bentgrasses leaves Using Near Infrared Reflectance Spectroscopy (근적외선 분광분석기를 이용한 잔디 생체잎의 질소 함량 측정을 위한 검량식 개발)

  • Cha, Jung-Hoon;Kim, Kyung-Duck;Park, Dae-Sup
    • Asian Journal of Turfgrass Science / v.23 no.1 / pp.77-90 / 2009
  • Near Infrared Reflectance Spectroscopy (NIRS) is a quick, accurate, and non-destructive method for measuring multiple nutrient components in plant leaves. This study aimed to derive a linear regression equation for evaluating the nutrient contents of 'CY2' creeping bentgrass rapidly and accurately using NIRS. Nitrogen fertility in particular is a primary factor in maintaining good turfgrass quality. Nitrogen, moisture, carbohydrate, and starch were analyzed in 'CY2' creeping bentgrass clippings. The linear regression equation was obtained by acquiring spectra with an NIR spectrophotometer (NIR system, Model XDS, XM-1100 series, FOSS, Sweden) running WinISI III project manager v1.50e and ISIscan(R) (Infrasoft International), and calibrating them against laboratory values from chemical analysis by an authorized institute. The equation was formulated by modified partial least squares (MPLS) regression of the laboratory values on mathematically pretreated spectra. Its accuracy was evaluated with the coefficient of determination ($r^2$) and the standard error of prediction (SEP) for unknown samples, followed by verification of the model equation against measured values and monitoring results. In monitoring, $r^2$ for nitrogen, moisture, and carbohydrate in 'CY2' creeping bentgrass was 0.840, 0.904, and 0.944, and SEP was 0.066, 1.868, and 0.601, respectively. After outlier treatment, $r^2$ rose to 0.892, 0.925, and 0.971 and SEP fell to 0.052, 1.577, and 0.394, respectively, all indicating high correlation. For starch, however, $r^2$ was only 0.464, indicating low correlation. The verified equations for nitrogen, moisture, and carbohydrate, with their higher $r^2$, therefore showed higher predictive accuracy and could be put into practical use in turf management systems.
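As a minimal sketch of how validation figures like those above are typically computed, the function below returns bias, SEP, and $r^2$ for paired reference and predicted values. It assumes the common bias-corrected definition of SEP; the paper does not state its formula, and the MPLS calibration step itself is not reproduced. The function name is hypothetical.

```python
import math


def prediction_statistics(reference, predicted):
    """Validation statistics commonly reported for NIRS equations.

    bias : mean of (predicted - reference)
    sep  : bias-corrected standard error of prediction,
           sqrt(sum((e_i - bias)^2) / (n - 1))
    r2   : squared Pearson correlation between reference and predicted
    """
    n = len(reference)
    errors = [p - r for r, p in zip(reference, predicted)]
    bias = sum(errors) / n
    sep = math.sqrt(sum((e - bias) ** 2 for e in errors) / (n - 1))
    mr, mp = sum(reference) / n, sum(predicted) / n
    cov = sum((r - mr) * (p - mp) for r, p in zip(reference, predicted))
    var_r = sum((r - mr) ** 2 for r in reference)
    var_p = sum((p - mp) ** 2 for p in predicted)
    r2 = cov * cov / (var_r * var_p)
    return {"bias": bias, "sep": sep, "r2": r2}
```

Because SEP removes the mean error, a constant offset between predictions and reference values shows up in `bias` rather than in `sep`.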

Robust Parameter Estimation using Fuzzy RANSAC (퍼지 RANSAC을 이용한 강건한 인수 예측)

  • Lee Joong-Jae;Jang Hyo-Jong;Kim Gye-Young;Choi Hyung-il
    • Journal of KIISE:Software and Applications / v.33 no.2 / pp.252-266 / 2006
  • Many problems in computer vision are based on mathematical models, and their optimal solutions can be found by estimating the parameters of each model. However, if the input data set contains outliers, whose deviations are relatively large compared with normal noise, they lead to incorrect results. RANSAC is a representative robust algorithm used to resolve this problem. One major problem with RANSAC is that it needs a priori knowledge of the data distribution (i.e., the percentage of outliers). To solve this problem, we propose the FRANSAC algorithm, which improves the outlier rejection rate and the accuracy of solutions. At each iteration, it uses fuzzy classification to categorize all data into a good sample set, a bad sample set, and a vague sample set, and then samples only from the good sample set. Experimental results show the performance of the proposed algorithm when applied to linear regression and to the calculation of a homography.
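For orientation, the sketch below is plain RANSAC for the linear-regression case, not the proposed FRANSAC (which would replace uniform sampling with sampling from a fuzzy-classified good set). It shows where the prior knowledge enters: the iteration count is derived from an assumed outlier ratio, and the inlier threshold must be chosen by hand. All names and parameter values are illustrative assumptions.

```python
import math
import random


def fit_line(points):
    """Least-squares line y = m*x + b through the given (x, y) points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    m = sxy / sxx
    return m, my - m * mx


def required_iterations(outlier_ratio, sample_size=2, confidence=0.99):
    """Iterations needed so that, with the given confidence, at least one
    minimal sample is outlier-free. This is where RANSAC needs the
    outlier percentage in advance."""
    p_good = (1.0 - outlier_ratio) ** sample_size
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_good))


def ransac_line(points, threshold, iterations, seed=0):
    """Classic RANSAC: hypothesise a line from 2 random points, count the
    points within `threshold`, keep the largest consensus set, refit."""
    rng = random.Random(seed)
    best = []
    for _ in range(iterations):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue  # vertical line cannot be written as y = m*x + b
        m = (y2 - y1) / (x2 - x1)
        b = y1 - m * x1
        inliers = [(x, y) for x, y in points if abs(y - (m * x + b)) < threshold]
        if len(inliers) > len(best):
            best = inliers
    if len(best) < 2:
        raise ValueError("no consensus set found")
    m, b = fit_line(best)
    return m, b, best
```

For example, with 20 points on y = 2x + 1 plus a few gross outliers, `ransac_line(points, 0.5, 200)` recovers slope 2 and intercept 1 from the 20-point consensus set. With an assumed 50% outlier ratio, `required_iterations(0.5)` gives 17.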

X11ARIMA Procedure (한국형 X11ARIMA 프로시져에 관한 연구)

  • 박유성;최현희
    • The Korean Journal of Applied Statistics / v.11 no.2 / pp.335-350 / 1998
  • X11ARIMA builds on X11, a smoothing approach to time series introduced by the U.S. Bureau of the Census, and was developed by Dagum (1975). Dagum (1988) later updated and adjusted the procedure using 174 North American economic indices, and it is still in use today. More recently, the X12ARIMA procedure has been studied by William Bell et al. (1995) and Chen & Findly (1995), whose approaches adjust for outliers, trend-change effects, seasonal effects, and calendar effects. However, both procedures were designed to adjust North American economic indices. This article first provides appropriate and effective ARIMA models for 102 indices produced by the National Statistical Office of Korea, consisting of production (21), shipping (27), stock (27), and operating rate (21) indices. A smoothing method based on several moving-average models is then proposed to reflect the specific features of the Korean economy. In addition, Sulnal (Lunar New Year) and Chusuk effects are extracted from these indices; both reflect the contribution of the lunar calendar. Finally, we discuss an alternative way to estimate holiday effects that, like the X12ARIMA procedure, combines an ARIMA model with a regression model for the best fit.
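X11-type procedures start from a centred moving average to separate trend from seasonality. The sketch below shows only that first stage, the standard 2x12 centred moving average for monthly data, followed by an additive seasonal-irregular residual (additive decomposition is an assumption made for simplicity). It is not the Korean smoothing method proposed in the article and does not model Sulnal or Chusuk effects; the function names are hypothetical.

```python
def centered_ma_2x12(series):
    """2x12 centred moving average, a first trend estimate for monthly data.
    Weights are 1/24 on the two end months and 1/12 on the 11 middle months.
    The first and last six positions lack a symmetric window and are None."""
    n = len(series)
    trend = [None] * n
    for t in range(6, n - 6):
        window = 0.5 * series[t - 6] + sum(series[t - 5:t + 6]) + 0.5 * series[t + 6]
        trend[t] = window / 12.0
    return trend


def seasonal_irregular(series, trend):
    """Additive seasonal-irregular component: series minus trend."""
    return [None if tr is None else x - tr for x, tr in zip(series, trend)]
```

The filter passes a linear trend unchanged and removes any fixed monthly pattern that sums to zero over 12 months, which is why it is used to isolate the trend before seasonal factors are estimated.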

Background Concentration and Contamination Assessment of Heavy Metals in Korean Coastal Sediments (한반도 연안 퇴적물의 중금속 배경농도 및 오염도 평가)

  • WOO, JUNSIK;LEE, HYOJIN;PARK, JONGKYU;PARK, KYOUNGKYU;CHO, DONGJIN;JANG, DONGJUN;PARK, SOJUNG;CHOI, MANSIK;YOO, JEONGKYU
    • The Sea:JOURNAL OF THE KOREAN SOCIETY OF OCEANOGRAPHY / v.24 no.1 / pp.64-78 / 2019
  • The background concentrations of heavy metals in Korean coastal sediments were estimated using heavy metal data for 495 sediments obtained from the 'National Marine Ecosystem Survey (Coastal ecosystem) in 2016-2017', and the extent of contamination was assessed. Al, Cs, and Li were chosen as appropriate indicators of sediment grain size. For the relationship between each heavy metal and indicator concentration, the data with the lowest slope were selected through outlier removal and residual analysis, and the background concentrations were expressed as a linear regression line between metal and indicator. Compared with previous studies of background concentrations of heavy metals in Korean coastal sediments, the concentration levels were generally consistent; however, background concentrations for As and Cd, and background concentrations using Li as the indicator, are presented here for the first time.
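The abstract does not give the exact selection rule, so the following is a hedged sketch under stated assumptions: samples lying well above the metal-versus-indicator regression line are treated as enriched and removed one at a time (positive residual greater than k = 2 residual standard deviations, an assumed cut-off), and the line fitted to the remaining samples is taken as the background line. The ratio of measured to predicted background concentration is shown as one possible contamination index, not necessarily the one used by the authors. All names are hypothetical.

```python
import math


def ols(points):
    """Least-squares slope and intercept for (x, y) pairs."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    slope = sum((x - mx) * (y - my) for x, y in points) / sxx
    return slope, my - slope * mx


def background_line(points, k=2.0, min_points=3):
    """Iteratively drop the sample lying furthest ABOVE the line (positive
    residual > k * residual SD) and refit, so the final line follows the
    lowest-slope, unenriched samples. `points` are (indicator, metal) pairs."""
    kept = list(points)
    removed = []
    while len(kept) > min_points:
        slope, intercept = ols(kept)
        resid = [y - (slope * x + intercept) for x, y in kept]
        s = math.sqrt(sum(e * e for e in resid) / (len(kept) - 2))
        scale = max(abs(y) for _, y in kept)
        worst = max(range(len(kept)), key=lambda i: resid[i])
        # Stop when the fit is exact (up to rounding) or nothing is enriched enough.
        if s <= 1e-9 * scale or resid[worst] <= k * s:
            break
        removed.append(kept.pop(worst))
    slope, intercept = ols(kept)
    return slope, intercept, removed


def enrichment_ratio(x, y, slope, intercept):
    """Measured metal divided by the background predicted from the indicator."""
    return y / (slope * x + intercept)
```

Removing only positive residuals reflects the idea that contamination adds metal above the natural, grain-size-controlled level, so the background line should sit along the lower edge of the scatter.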

A Study on the Response Plan by Station Area Cluster through Time Series Analysis of Urban Rail Riders Before and After COVID-19 (COVID-19 전후 도시철도 승차인원 시계열 군집분석을 통한 역세권 군집별 대응방안 고찰)

  • Li, Cheng Xi;Jung, Hun Young
    • KSCE Journal of Civil and Environmental Engineering Research / v.43 no.3 / pp.363-370 / 2023
  • Due to the spread of COVID-19, the use of public transportation such as urban railways has changed significantly since the beginning of 2020. In this study, daily time series data for each urban railway station were collected for three years before and after the spread of COVID-19. The similarity of the time series was evaluated with the DTW (Dynamic Time Warping) distance method to derive regression centers for each cluster, and the effect of various external events such as COVID-19 on changes in the number of users was diagnosed with a time series impact detection function. In addition, the usage characteristics of each cluster of urban railway stations were analyzed, and changes in passenger volume due to external shocks were identified. The purpose was to review measures for maintaining and recovering ridership in the event of a COVID-19 resurgence.
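DTW distance, used above to compare station ridership series, can be computed with a short dynamic program. This is a minimal sketch with absolute difference as the local cost and no warping-window constraint; the study's normalization, window settings, and clustering step are not described in the abstract and are not shown. The function name is hypothetical.

```python
def dtw_distance(a, b):
    """Dynamic Time Warping distance between two numeric sequences using
    absolute difference as the local cost (no window constraint)."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of: insertion, deletion, or match
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]
```

Unlike Euclidean distance, DTW lets a series with a stretched or shifted pattern still match (for example, [1, 2, 3] and [1, 2, 2, 3] are at distance 0). A matrix of pairwise `dtw_distance` values between stations could then be passed to a clustering step.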

A Study of Anomaly Detection for ICT Infrastructure using Conditional Multimodal Autoencoder (ICT 인프라 이상탐지를 위한 조건부 멀티모달 오토인코더에 관한 연구)

  • Shin, Byungjin;Lee, Jonghoon;Han, Sangjin;Park, Choong-Shik
    • Journal of Intelligence and Information Systems / v.27 no.3 / pp.57-73 / 2021
  • Maintaining ICT infrastructure and preventing failures through anomaly detection is becoming increasingly important. System monitoring data are multidimensional time series, and handling them requires considering the characteristics of both multidimensional data and time series data. For multidimensional data, correlations between variables must be considered, and existing methods such as probability-based, linear, and distance-based approaches degrade because of the curse of dimensionality. In addition, time series data are preprocessed with sliding windows and time series decomposition for autocorrelation analysis; these techniques increase the dimension of the data, so they need to be supplemented. Anomaly detection is an old research field in which statistical methods and regression analysis were used early on, and there is currently active research on applying machine learning and artificial neural networks to it. Statistical methods are difficult to apply when data are non-homogeneous and do not detect local outliers well. Regression-based methods learn a regression formula based on parametric statistics and detect anomalies by comparing predicted and actual values. Their performance drops when the model is not robust or when the data contain noise or outliers, so they are restricted to training data free of noise and outliers. An autoencoder built with artificial neural networks is trained to reproduce its input as closely as possible. It has many advantages over existing probability and linear models, cluster analysis, and supervised learning: it can be applied to data that do not satisfy distributional or linearity assumptions.
In addition, it can be trained in an unsupervised manner, without labeled training data. However, it is limited in identifying local outliers in multidimensional data, and the characteristics of time series data greatly increase the data dimension. In this study, we propose a CMAE (Conditional Multimodal Autoencoder) that improves anomaly detection performance by accounting for local outliers and time series characteristics. First, we applied a Multimodal Autoencoder (MAE) to overcome the limitation in identifying local outliers in multidimensional data. Multimodal models are commonly used to learn from different types of input, such as voice and images; the different modalities share the autoencoder's bottleneck, through which the model learns their correlations. In addition, a CAE (Conditional Autoencoder) was used to learn the characteristics of time series data effectively without increasing the data dimension. Conditional inputs are usually categorical variables, but in this study time was used as the condition so that the model could learn periodicity. The proposed CMAE model was verified by comparison with a Unimodal Autoencoder (UAE) and a Multimodal Autoencoder (MAE). The reconstruction performance for 41 variables was examined in the proposed and comparison models. Reconstruction performance differed by variable: for the Memory, Disk, and Network modalities the loss was small in all three autoencoder models, indicating good reconstruction. The Process modality showed no significant difference among the three models, and the CPU modality performed best in CMAE. ROC curves were drawn to evaluate anomaly detection performance in the proposed and comparison models, and AUC, accuracy, precision, recall, and F1-score were compared. On all indicators, performance ranked CMAE first, then MAE, then UAE.
In particular, CMAE achieved a recall of 0.9828, confirming that it detects almost all anomalies. Its accuracy also improved, to 87.12%, and its F1-score was 0.8883, which is considered suitable for anomaly detection. In practical terms, the proposed model has an additional advantage beyond performance: techniques such as time series decomposition and sliding windows add procedures that must be managed, and the dimensional increase they cause can slow down inference. The proposed model is easy to apply to practical tasks in terms of inference speed and model management.
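The abstract evaluates the detectors with AUC, accuracy, precision, recall, and F1-score computed from reconstruction error. The sketch below shows only that scoring stage, assuming per-sample mean squared reconstruction error as the anomaly score and a hand-chosen threshold; the CMAE network itself is not reproduced, and the function names are hypothetical.

```python
def reconstruction_errors(inputs, outputs):
    """Mean squared reconstruction error per sample (the anomaly score)."""
    return [sum((a - b) ** 2 for a, b in zip(x, y)) / len(x) for x, y in zip(inputs, outputs)]


def detection_metrics(scores, labels, threshold):
    """Flag samples whose score exceeds `threshold` as anomalies (label 1)."""
    tp = fp = fn = tn = 0
    for s, y in zip(scores, labels):
        pred = 1 if s > threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(labels), "precision": precision, "recall": recall, "f1": f1}


def roc_auc(scores, labels):
    """AUC as the probability that a random anomaly scores higher than a
    random normal sample (ties count one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

AUC does not depend on the threshold, while accuracy, precision, recall, and F1 do, which is why papers usually report both kinds of figure.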

A Study on the Methodology of Extracting the vulnerable districts of the Aged Welfare Using Artificial Intelligence and Geospatial Information (인공지능과 국토정보를 활용한 노인복지 취약지구 추출방법에 관한 연구)

  • Park, Jiman;Cho, Duyeong;Lee, Sangseon;Lee, Minseob;Nam, Hansik;Yang, Hyerim
    • Journal of Cadastre & Land InformatiX / v.48 no.1 / pp.169-186 / 2018
  • The social influence of the elderly population will accelerate in a rapidly aging society. The purpose of this study is to establish a methodology for extracting districts that are vulnerable in terms of welfare for the aged, using machine learning (ML), an artificial neural network (ANN), and geospatial analysis. To set the direction of the analysis, interviews were first conducted with volunteers aged 65 and over, public officials, and managers of welfare facilities for the aged. The indicators were geographic distance capacity, enjoyment of elderly welfare services, officially assessed land price, and mobile-communication data on the activities of older people, computed for 500 m vector areal units within 15 minutes in Yongin-city, Gyeonggi-do. In simulation, a support vector machine (SVM) with the RBF kernel achieved a prediction accuracy of 83.2%, and an ANN trained with the backpropagation algorithm yielded a correlation of 0.63. A geographically weighted regression (GWR) was also performed to analyze spatial autocorrelation within the variables; its coefficient of determination was 70.1%, indicating good explanatory power. Moran's I and Getis-Ord Gi statistics were analyzed to investigate spatial outliers as well as distribution patterns. This study can be used to address imbalances in welfare for the aged while considering local conditions, a recent concern of the government.
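Global Moran's I, mentioned above as a measure of spatial autocorrelation, can be computed directly from its definition, $I = \frac{N}{W}\,\frac{\sum_i\sum_j w_{ij} z_i z_j}{\sum_i z_i^2}$, where $z_i$ is the deviation of cell $i$ from the mean and $W$ is the sum of all weights. The sketch below assumes a dense weight matrix supplied by the caller (for example, contiguity between 500 m cells); it does not compute the local Getis-Ord statistic or a significance test, and the function name is hypothetical.

```python
def morans_i(values, weights):
    """Global Moran's I for `values` and an N x N spatial weight matrix."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]  # deviations from the mean
    w_total = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / w_total) * cross / sum(zi * zi for zi in z)
```

For four cells in a row with binary adjacency weights, values [1, 2, 3, 4] give I = 1/3 (similar values cluster), and alternating values [1, 2, 1, 2] give I = -1 (neighbours differ).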

Estimating design floods for ungauged basins in the geum-river basin through regional flood frequency analysis using L-moments method (L-모멘트법을 이용한 지역홍수빈도분석을 통한 금강유역 미계측 유역의 설계홍수량 산정)

  • Lee, Jin-Young;Park, Dong-Hyeok;Shin, Ji-Yae;Kim, Tae-Woong
    • Journal of Korea Water Resources Association / v.49 no.8 / pp.645-656 / 2016
  • This study performed a regional flood frequency analysis and proposed a regression equation for estimating design floods corresponding to given return periods in ungauged basins of the Geum-river basin. Five preliminary tests were used to examine the hydrological independence and homogeneity of the streamflow data: the lag-one autocorrelation test, time homogeneity test, Grubbs-Beck outlier test, discordancy measure test ($D_i$), and regional homogeneity measure (H). The test results showed that the streamflow data were time-independent, discordant, and homogeneous within the basin. Comparative regional flood frequency analyses were carried out for the region using five probability distributions: generalized extreme value (GEV), three-parameter log-normal (LN-III), Pearson type 3 (P-III), generalized logistic (GLO), and generalized Pareto (GPA). Based on the L-moment ratio diagram, average weighted distance (AWD), and goodness-of-fit statistic ($Z^{DIST}$), the GLO distribution was selected as the best-fit model for the Geum-river basin. Using the GLO distribution, a regression equation was developed for estimating regional design floods and validated by comparing estimated and observed streamflows at the Ganggyeong station.
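The GLO fit selected above is usually obtained from sample L-moments. The sketch below computes unbiased sample L-moments via probability-weighted moments, converts them to GLO parameters in Hosking's parameterization ($k = -\tau_3$), and evaluates the design flood for a return period $T$ as the quantile $x(1 - 1/T)$. It is an at-site illustration under those assumptions; the regional index-flood scaling and the regression equation for ungauged basins are not reproduced, and the function names are hypothetical.

```python
import math


def sample_l_moments(data):
    """Unbiased sample L-moments via probability-weighted moments b0..b3.
    Returns (l1, l2, t3, t4): mean, L-scale, L-skewness, L-kurtosis."""
    x = sorted(data)
    n = len(x)
    if n < 4:
        raise ValueError("need at least 4 observations")
    # j is the 0-based rank in ascending order
    b0 = sum(x) / n
    b1 = sum(j * v for j, v in enumerate(x)) / (n * (n - 1))
    b2 = sum(j * (j - 1) * v for j, v in enumerate(x)) / (n * (n - 1) * (n - 2))
    b3 = sum(j * (j - 1) * (j - 2) * v for j, v in enumerate(x)) / (n * (n - 1) * (n - 2) * (n - 3))
    l1 = b0
    l2 = 2 * b1 - b0
    l3 = 6 * b2 - 6 * b1 + b0
    l4 = 20 * b3 - 30 * b2 + 12 * b1 - b0
    return l1, l2, l3 / l2, l4 / l2


def fit_glo(l1, l2, t3):
    """Generalized logistic parameters (xi, alpha, k) from L-moments, k = -t3."""
    k = -t3
    if abs(k) < 1e-6:  # logistic limit; avoids cancellation in 1/k - pi/sin(k*pi)
        return l1, l2, 0.0
    kpi = k * math.pi
    alpha = l2 * math.sin(kpi) / kpi
    xi = l1 - alpha * (1.0 / k - math.pi / math.sin(kpi))
    return xi, alpha, k


def glo_quantile(F, xi, alpha, k):
    """Quantile x(F) of the GLO distribution; for return period T use F = 1 - 1/T."""
    r = (1.0 - F) / F
    if k == 0.0:
        return xi - alpha * math.log(r)
    return xi + alpha * (1.0 - r ** k) / k
```

Because $\xi$ is the GLO median, `glo_quantile(0.5, ...)` returns $\xi$ exactly; a 100-year flood would be `glo_quantile(1 - 1 / 100, xi, alpha, k)`.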

The Estimation of Domestic Construction Technology Full-Text Services using Tobit Model (Tobit 모형을 이용한 국내 건설기술 원문서비스 가치 추정)

  • Jeong, Seong-Yun
    • Journal of the Korea Academia-Industrial cooperation Society / v.17 no.6 / pp.656-662 / 2016
  • We have provided a variety of domestic construction technology full-text services through the Construction Technology Digital Library (CODIL) system since 2001. CODIL is a system that serves databases of construction technology data. However, while demand for the database grows every year, the available budget is shrinking. This study therefore investigated user satisfaction in order to provide construction technology full-text services effectively with a limited budget. To express satisfaction as a quantified value, the monetary value of the full-text services was estimated using the Tobit model. The Tobit model is used in the contingent valuation method to estimate the value of non-market goods; it is a limited-dependent-variable regression model in which observations are censored at a lower or upper limit so that biased outliers are not reflected in willingness to pay. A survey of 312 sampled respondents was conducted. Using the Tobit model, the mean, median, and truncated mean willingness to pay were calculated for six types of full-text services, and the statistically significant variables affecting willingness to pay were identified. The mean value per full-text service was estimated at 46,530 won. This study is significant as the first in Korea to use the Tobit model to estimate the value of construction technology-related full-text services.
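For a willingness-to-pay variable censored at zero, the Tobit log-likelihood and the implied unconditional mean can be written in a few lines. The sketch assumes left-censoring at a lower limit (0 by default) and a linear predictor `xb` that has already been computed from estimated coefficients; the survey covariates and the optimizer used in the study are not given in the abstract, and the function names are hypothetical.

```python
import math


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def tobit_log_likelihood(y, xb, sigma, lower=0.0):
    """Log-likelihood of a left-censored Tobit model.
    Censored observations (y <= lower) contribute log P(y* <= lower);
    uncensored ones contribute the normal log-density of y."""
    ll = 0.0
    for yi, m in zip(y, xb):
        if yi <= lower:
            ll += math.log(norm_cdf((lower - m) / sigma))
        else:
            ll += math.log(norm_pdf((yi - m) / sigma)) - math.log(sigma)
    return ll


def expected_wtp(xb, sigma):
    """Unconditional mean E[y] = Phi(xb/s) * xb + s * phi(xb/s),
    valid for censoring at zero only."""
    z = xb / sigma
    return norm_cdf(z) * xb + sigma * norm_pdf(z)
```

Maximizing `tobit_log_likelihood` over the coefficients and sigma (for example, with a numerical optimizer) gives the Tobit estimates; `expected_wtp` then turns a fitted linear predictor into an expected willingness to pay that is never negative.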