http://dx.doi.org/10.9765/KSCOE.2014.26.4.207

Outlier Detection and Treatment for the Conversion of Chemical Oxygen Demand to Total Organic Carbon  

Cho, Beom Jun (Marine Environments & Conservation Research Division, Korea Institute of Ocean Science & Technology)
Cho, Hong Yeon (Marine Environments & Conservation Research Division, Korea Institute of Ocean Science & Technology)
Kim, Sung (Marine Ecosystem Research Division, Korea Institute of Ocean Science & Technology)
Publication Information
Journal of Korean Society of Coastal and Ocean Engineers, v.26, no.4, 2014, pp. 207-216
Abstract
Total organic carbon (TOC) is an important indicator used as a direct biological index in marine carbon cycle research. Because available TOC data are scarcer than Chemical Oxygen Demand (COD) data, a sufficient set of TOC estimates can be produced from COD data. Outlier detection and treatment (removal) must be carried out reasonably and objectively, because the COD-TOC conversion equation directly affects the TOC estimates. This study aims to suggest the optimal regression model using the available salinity, COD, and TOC data observed in the Korean coastal zone. The optimal regression model is selected by comparing, across diverse detection methods for outliers and influential observations, the change in the number of data points before and after removal, the coefficients of variation, and the root mean square (RMS) error. The results show that a diagnostic case combining the SIQR (Semi-Interquartile Range) boxplot and Cook's distance method is the most suitable for outlier detection. The optimal regression function is estimated as $\mathrm{TOC} = 0.44{\cdot}\mathrm{COD} + 1.53$ (TOC and COD in mg/L), with a coefficient of determination of 0.47 and an RMS error of 0.85 mg/L. The RMS error and the coefficient of variation of the leverage values are reduced substantially, to 31% and 80%, respectively, of their values before outlier removal. The suggested method can provide a more appropriate regression curve because it removes the excessive influence of the outliers frequently included in COD and TOC monitoring data.
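The two-stage screening named in the abstract (an SIQR boxplot followed by Cook's distance) can be illustrated with a minimal sketch. Several choices below are assumptions, not details taken from the paper: the fences use one common SIQR formulation, Q1 - 3(Q2 - Q1) and Q3 + 3(Q3 - Q2); the Cook's distance cutoff is the 4/n rule of thumb; the function names are hypothetical; and the data are synthetic, generated around the reported line only to exercise the code.

```python
import numpy as np

def siqr_fences(x, k=3.0):
    """Asymmetric boxplot fences built from the lower and upper
    semi-interquartile ranges (assumed formulation, k = 3)."""
    q1, q2, q3 = np.percentile(x, [25, 50, 75])
    return q1 - k * (q2 - q1), q3 + k * (q3 - q2)

def cooks_distance(x, y):
    """Cook's distance of every point for the simple fit y = a*x + b."""
    X = np.column_stack([x, np.ones_like(x)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n, p = X.shape
    s2 = resid @ resid / (n - p)  # residual variance estimate
    # Leverage h_ii: diagonal of the hat matrix X (X'X)^-1 X'
    h = np.einsum("ij,jk,ik->i", X, np.linalg.inv(X.T @ X), X)
    return resid**2 / (p * s2) * h / (1.0 - h) ** 2

def fit_after_screening(cod, toc, cook_cut=None):
    """Drop SIQR-boxplot outliers, then high-influence points, then refit."""
    cod = np.asarray(cod, dtype=float)
    toc = np.asarray(toc, dtype=float)
    lo_c, hi_c = siqr_fences(cod)
    lo_t, hi_t = siqr_fences(toc)
    keep = (cod >= lo_c) & (cod <= hi_c) & (toc >= lo_t) & (toc <= hi_t)
    cod, toc = cod[keep], toc[keep]
    d = cooks_distance(cod, toc)
    # 4/n is a rule-of-thumb cutoff (assumption, not the paper's value)
    cut = 4.0 / len(cod) if cook_cut is None else cook_cut
    keep = d <= cut
    slope, intercept = np.polyfit(cod[keep], toc[keep], 1)
    return slope, intercept, int(keep.sum())

def toc_from_cod(cod):
    """Conversion reported in the abstract: TOC = 0.44*COD + 1.53 (mg/L)."""
    return 0.44 * np.asarray(cod, dtype=float) + 1.53

# Synthetic demonstration data (made up for illustration only)
rng = np.random.default_rng(0)
cod_demo = rng.uniform(0.5, 5.0, 200)
toc_demo = toc_from_cod(cod_demo) + rng.normal(0.0, 0.3, 200)
toc_demo[:3] = [15.0, 20.0, 18.0]  # three gross outliers

slope, intercept, n_kept = fit_after_screening(cod_demo, toc_demo)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, kept={n_kept}")
```

Screening with the boxplot first keeps gross outliers from inflating the residual variance that Cook's distance is scaled by, so the second stage can pick out the remaining influential observations.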
Keywords
outlier; optimal regression model; RMS error; determination coefficient; SIQR boxplot and Cook's distance;