• Title/Summary/Keyword: Outlier test

Search Results: 109

Robust Most Significant Periods of Developments In Time Dominated Data

  • Aboukalam, F.
    • International Journal of Reliability and Applications, v.7 no.2, pp.101-110, 2006
  • Let E be a set of n quantitative observations indexed by time. The time interval is to be split into several subintervals such that the observations within each subinterval are similar, whereas observations in different subintervals are very dissimilar; the resulting time subintervals become the periods, or phases of development, present in the underlying phenomenon. Aboukalam (2005) proposed a robust solution that starts from a set of initial subintervals and combines any two successive groups in that starter using a t-test at a fixed significance level ($\alpha$). The inconvenience is that the reliability of the technique depends on the level $\alpha$, which cannot be chosen independently of the number of periods, which is itself unknown. To avoid this, we propose the so-called most significant periods solution. The new technique constructs its own initial subintervals and combines the groups in a different way; however, the way outliers are identified and treated is unchanged. This paper conducts extensive empirical simulations on a variety of possible time-dominated data to illustrate the reliability of the proposed technique. Finally, we apply both techniques to real time-dominated data to demonstrate the advantage of the proposal.
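The fixed-$\alpha$ merging step criticized in this abstract can be sketched as follows; this is a minimal illustration only, not Aboukalam's actual algorithm, and the initial subintervals and data are hypothetical.

```python
# Sketch of fixed-alpha merging: two successive subintervals are pooled
# when a two-sample t-test fails to separate their means at level alpha.
from scipy.stats import ttest_ind

def merge_successive(groups, alpha=0.05):
    """Repeatedly merge adjacent groups whose means are not
    significantly different at level alpha."""
    groups = [list(g) for g in groups]
    merged = True
    while merged and len(groups) > 1:
        merged = False
        for i in range(len(groups) - 1):
            _, p = ttest_ind(groups[i], groups[i + 1])
            if p > alpha:  # cannot separate the two candidate periods
                groups[i] = groups[i] + groups[i + 1]
                del groups[i + 1]
                merged = True
                break
    return groups

# hypothetical observations: two similar subintervals and one distant one
periods = merge_successive([[1.0, 1.1, 0.9], [1.0, 1.2, 1.1], [5.0, 5.1, 4.9]])
```

Here the first two subintervals merge into one period while the dissimilar third survives, which also shows the paper's point: the result depends on the chosen $\alpha$.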


Method of Processing the Outliers and Missing Values of Field Data to Improve RAM Analysis Accuracy (RAM 분석 정확도 향상을 위한 야전운용 데이터의 이상값과 결측값 처리 방안)

  • Kim, In Seok; Jung, Won
    • Journal of Applied Reliability, v.17 no.3, pp.264-271, 2017
  • Purpose: Field operation data contain missing values and outliers arising from various causes in the data collection process, so caution is required when using RAM analysis results based on field operation data. The purpose of this study is to present a method that minimizes the RAM analysis error of field data and thereby improves accuracy. Methods: Statistical methods are presented for processing the outliers and missing values in field operating data, and after the RAM analysis, the differences before and after applying the techniques are discussed. Results: The estimated availability is 6.8 to 23.5% lower than before processing, indicating that the treatment of missing values and outliers greatly affects the RAM analysis result. Conclusion: A RAM analysis of the OO weapon system was performed, and suggestions for improving RAM analysis were presented through a comparison of the new and current methods. Analyzing data without appropriate treatment of erroneous values may yield incorrect conclusions, leading to inappropriate decisions and actions.
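The abstract does not name the specific statistical methods used; purely as a generic illustration, the sketch below applies Tukey (IQR) fencing for outliers and median imputation for missing values, a common pre-processing choice for field data.

```python
# Generic field-data cleaning sketch (illustrative, not the paper's method):
# impute missing values with the median, then drop points outside the
# 1.5*IQR Tukey fences.
import numpy as np

def clean_field_data(x):
    x = np.asarray(x, dtype=float)
    med = np.nanmedian(x)
    x = np.where(np.isnan(x), med, x)          # impute missing values
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr    # Tukey fences
    return x[(x >= lo) & (x <= hi)]            # drop outliers

# hypothetical failure-interval record with one gap and one gross error
data = [12.0, 11.5, np.nan, 12.3, 95.0, 11.8]
cleaned = clean_field_data(data)
```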

Relevancy contemplation in medical data analytics and ranking of feature selection algorithms

  • P. Antony Seba; J. V. Bibal Benifa
    • ETRI Journal, v.45 no.3, pp.448-461, 2023
  • This article performs a detailed data scrutiny on a chronic kidney disease (CKD) dataset to select efficient instances and relevant features. Data relevancy is investigated using feature extraction, hybrid outlier detection, and handling of missing values. Data instances that do not influence the target are removed using data envelopment analysis to enable reduction of rows. Column reduction is achieved by ranking the attributes through feature selection methodologies, namely, extra-trees classifier (ETC), recursive feature elimination, chi-squared test, analysis of variance, and mutual information. These methodologies are ranked via Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS) using weight optimization to identify the optimal features for model building from the CKD dataset to facilitate better prediction while diagnosing the severity of the disease. An efficient hybrid ensemble and novel similarity-based classifiers are built using the pruned dataset, and the results are thereafter compared with random forest, AdaBoost, naive Bayes, k-nearest neighbors, and support vector machines. The hybrid ensemble classifier yields a better prediction accuracy of 98.31% for the features selected by the extra-trees classifier, which is ranked as the best by TOPSIS.
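A minimal sketch of the TOPSIS ranking step described above; the alternatives, scores, weights, and the benefit-only criteria are illustrative assumptions, not values from the paper.

```python
# Minimal TOPSIS: rank alternatives (rows) against an ideal and an
# anti-ideal solution. Rows stand for hypothetical feature-selection
# methods scored on two accuracy-like (benefit) criteria.
import numpy as np

def topsis(matrix, weights):
    """Return closeness-to-ideal scores; higher means better ranked."""
    m = np.asarray(matrix, dtype=float)
    m = m / np.linalg.norm(m, axis=0)          # vector-normalize columns
    v = m * np.asarray(weights, dtype=float)   # weighted normalized matrix
    ideal, anti = v.max(axis=0), v.min(axis=0) # benefit criteria only
    d_pos = np.linalg.norm(v - ideal, axis=1)  # distance to ideal
    d_neg = np.linalg.norm(v - anti, axis=1)   # distance to anti-ideal
    return d_neg / (d_pos + d_neg)

scores = [[0.98, 0.95],   # hypothetical ETC scores
          [0.96, 0.93],   # hypothetical RFE scores
          [0.94, 0.90]]   # hypothetical chi-squared scores
closeness = topsis(scores, [0.6, 0.4])
best = int(np.argmax(closeness))   # index of the top-ranked method
```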

Automated Satellite Image Co-Registration using Pre-Qualified Area Matching and Studentized Outlier Detection (사전검수영역기반정합법과 't-분포 과대오차검출법'을 이용한 위성영상의 '자동 영상좌표 상호등록')

  • Kim, Jong Hong; Heo, Joon; Sohn, Hong Gyoo
    • KSCE Journal of Civil and Environmental Engineering Research, v.26 no.4D, pp.687-693, 2006
  • Image co-registration is the process of overlaying two images of the same scene, where one serves as the reference image and the other is geometrically transformed to match it. To improve the efficiency and effectiveness of co-registration, the authors propose a pre-qualified area matching algorithm composed of feature extraction with the Canny operator and area matching with the cross-correlation coefficient. To refine the matching points, outlier detection based on studentized residuals is applied, iteratively removing outliers beyond three standard deviations. Through the pre-qualification and refinement processes, the computation time was significantly improved and the registration accuracy enhanced. A prototype of the proposed algorithm was implemented, and a performance test on 3 Landsat images of Korea showed that: (1) the average RMSE of the approach was 0.435 pixel; (2) the average number of matching points was over 25,573; (3) the average processing time was 4.2 min per image on a regular workstation equipped with a 3 GHz Intel Pentium 4 CPU and 1 GB of RAM. The proposed approach achieved robustness, full automation, and time efficiency.
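The refinement step can be sketched as follows, with a 1-D linear fit standing in for the paper's geometric transformation and a crude constant-leverage studentization; the data are hypothetical.

```python
# Iterative outlier trimming: fit, standardize residuals, drop points
# beyond three standard deviations, refit until the fit is stable.
import numpy as np

def trim_outliers(x, y, threshold=3.0):
    x, y = np.asarray(x, float), np.asarray(y, float)
    while len(x) > 3:
        a, b = np.polyfit(x, y, 1)          # least-squares line
        r = y - (a * x + b)                 # raw residuals
        sd = r.std(ddof=2)
        if sd < 1e-9:                       # near-perfect fit: stop trimming
            break
        keep = np.abs(r / sd) <= threshold  # crude studentization
        if keep.all():
            break
        x, y = x[keep], y[keep]             # drop flagged points and refit
    return x, y

xs = list(range(20))
ys = [2.0 * i for i in range(20)]
ys[10] += 50.0                              # one gross matching error
x_ref, y_ref = trim_outliers(xs, ys)
```

The gross error is flagged on the first pass and removed, after which the remaining points fit the line almost exactly and iteration stops.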

A Robust Vector Quantization Method against Distortion Outlier and Source Mismatch (이상 신호왜곡과 소스 불일치에 강인한 벡터 양자화 방법)

  • Noh, Myung-Hoon; Kim, Moo-Young
    • Journal of the Institute of Electronics Engineers of Korea SP, v.49 no.3, pp.74-80, 2012
  • In resolution-constrained quantization, the size of each Voronoi cell varies with the probability density function of the input data, which causes a large number of distortion outliers. We propose a vector quantization method that reduces distortion outliers by combining the generalized Lloyd algorithm (GLA) and the cell-size-constrained vector quantization (CCVQ) scheme. The training data are divided into inside and outside regions according to Voronoi cell size, and CCVQ and GLA are applied to the respective regions. Because CCVQ, rather than GLA, is applied to the densely populated region of the source, the number of centroids for the outside region can be increased, so that distortion outliers decrease. In real-world environments, source mismatch between training and test data is inevitable; in the source-mismatch case, the proposed algorithm improves performance in terms of both average distortion and distortion outliers.
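For reference, the GLA component mentioned above is the classic Lloyd iteration; this toy 1-D sketch alternates nearest-centroid assignment and centroid update, and does not model the CCVQ cell-size constraint.

```python
# Generalized Lloyd algorithm (GLA) sketch on toy 1-D data: alternate
# Voronoi (nearest-centroid) assignment and centroid update.
import numpy as np

def gla(data, codebook, iters=50):
    data = np.asarray(data, float)
    cb = np.asarray(codebook, float).copy()
    for _ in range(iters):
        # nearest-neighbour (Voronoi) assignment
        idx = np.argmin(np.abs(data[:, None] - cb[None, :]), axis=1)
        # centroid update for each non-empty cell
        for k in range(len(cb)):
            if np.any(idx == k):
                cb[k] = data[idx == k].mean()
    return cb

# two well-separated clusters; centroids converge to the cluster means
cb = gla([0.0, 0.1, 0.2, 10.0, 10.1], [0.0, 5.0])
```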

Automatic generation of reliable DEM using DTED level 2 data from high resolution satellite images (고해상도 위성영상과 기존 수치표고모델을 이용하여 신뢰성이 향상된 수치표고모델의 자동 생성)

  • Lee, Tae-Yoon; Jung, Jae-Hoon; Kim, Tae-Jung
    • Spatial Information Research, v.16 no.2, pp.193-206, 2008
  • When stereo images are used for Digital Elevation Model (DEM) generation, a DEM is generally made by matching the left image against the right image of the stereo pair. In stereo matching, tie-points are used as initial match candidate points, and the number and distribution of tie-points influence the matching result. A DEM made from the matching result contains errors such as holes, peaks, etc., which are usually interpolated from neighboring pixel values. In this paper, we propose a DEM generation method that combines automatic tie-point extraction using an existing DEM and an image pyramid with interpolation of the new DEM from the existing DEM, to produce a more reliable DEM. For testing, we used IKONOS, QuickBird, and SPOT5 stereo images and DTED level 2 data. The test results show that the proposed method automatically produces reliable DEMs. For DEM validation, we compared the heights of the DEM generated by the proposed method with the heights of the existing DTED level 2 data; the RMSE was under 15 m.


Probabilistic Distribution and Variability of Geotechnical Properties with Randomness Characteristic (무작위성을 보이는 지반정수의 확률분포 및 변동성)

  • Kim, Dong-Hee; Lee, Ju-Hyoung; Lee, Woo-Jin
    • Journal of the Korean Geotechnical Society, v.25 no.11, pp.87-103, 2009
  • To determine a reliable probabilistic distribution model of geotechnical properties, an outlier and randomness test of the analysis data, parameter estimation for the probabilistic distribution model, and goodness-of-fit tests for the model parameters and the probabilistic distribution model have to be performed in sequence. In this paper, the probabilistic distribution models of the geotechnical properties of the Songdo area in Incheon are estimated by the procedure proposed above. In addition, the coefficient of variation (COV), representing the variability of geotechnical properties, is determined for several geotechnical properties. A reliable probabilistic distribution model and the COV of geotechnical properties can be used in probability-based design procedures and for the reasonable choice of design values in deterministic design methods.
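The fit-then-test sequence can be sketched with a normal model and a Kolmogorov-Smirnov goodness-of-fit check; the data below are synthetic, and the normal distribution is only one of the candidate models such a study would screen.

```python
# Sketch of parameter estimation + goodness-of-fit testing + COV:
# fit a normal model to synthetic "geotechnical" data and test the fit.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
samples = rng.normal(loc=30.0, scale=5.0, size=200)   # e.g. friction angles

mu, sigma = samples.mean(), samples.std(ddof=1)       # parameter estimation
stat, p = stats.kstest(samples, 'norm', args=(mu, sigma))
normal_ok = p > 0.05        # fail to reject the fitted normal model
cov = sigma / mu            # coefficient of variation (COV)
```

Note that using the same data both to fit and to test makes the plain KS p-value optimistic; a Lilliefors-corrected test would be stricter.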

Estimating design floods for ungauged basins in the geum-river basin through regional flood frequency analysis using L-moments method (L-모멘트법을 이용한 지역홍수빈도분석을 통한 금강유역 미계측 유역의 설계홍수량 산정)

  • Lee, Jin-Young; Park, Dong-Hyeok; Shin, Ji-Yae; Kim, Tae-Woong
    • Journal of Korea Water Resources Association, v.49 no.8, pp.645-656, 2016
  • This study performed a regional flood frequency analysis and proposed a regression equation to estimate design floods corresponding to return periods for ungauged basins in the Geum-river basin. Five preliminary tests were employed to investigate the hydrological independence and homogeneity of the streamflow data: the lag-one autocorrelation test, the time homogeneity test, the Grubbs-Beck outlier test, the discordancy measure test ($D_i$), and the regional homogeneity measure (H). The test results showed that the streamflow data were time-independent, free of discordant sites, and homogeneous within the basin. Using five probability distributions (generalized extreme value (GEV), three-parameter log-normal (LN-III), Pearson type 3 (P-III), generalized logistic (GLO), and generalized Pareto (GPA)), comparative regional flood frequency analyses were carried out for the region. Based on the L-moment ratio diagram, the average weighted distance (AWD), and the goodness-of-fit statistic ($Z^{DIST}$), the GLO distribution was selected as the best-fit model for the Geum-river basin. Using the GLO, a regression equation was developed for estimating regional design floods and validated by comparing the estimated and observed streamflows at the Ganggyeong station.
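The Grubbs-Beck screening mentioned above is, in essence, a Grubbs outlier test; a minimal two-sided version can be sketched as below (the hydrological variant is usually one-sided on log-transformed flows, and the data here are hypothetical).

```python
# Two-sided Grubbs test: compare the largest standardized deviation
# against a critical value derived from the Student t distribution.
import numpy as np
from scipy import stats

def grubbs_outlier(x, alpha=0.05):
    """Return the most extreme value and whether it is a Grubbs outlier."""
    x = np.asarray(x, float)
    n = len(x)
    z = np.abs(x - x.mean()) / x.std(ddof=1)
    g = z.max()                                   # Grubbs statistic
    t = stats.t.ppf(1 - alpha / (2 * n), n - 2)   # t critical value
    g_crit = (n - 1) / np.sqrt(n) * np.sqrt(t**2 / (n - 2 + t**2))
    return x[np.argmax(z)], g > g_crit

# hypothetical annual peak flows with one suspicious value
value, is_outlier = grubbs_outlier([120, 135, 128, 131, 620, 125, 133])
```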

Distribution and Trend Analysis of the Significant Wave Heights Using KMA and ECMWF Data Sets in the Coastal Seas, Korea (KMA와 ECMWF 자료를 이용한 연안 유의파고의 분포 및 추세분석)

  • Ko, Dong Hui; Jeong, Shin Taek; Cho, Hong Yeon; Seo, Kyoung Sik
    • Journal of Korean Society of Coastal and Ocean Engineers, v.29 no.3, pp.129-138, 2017
  • The coastal wave environment is a very important factor that directly affects changes in coastal topography, the habitat of marine life, and the design of offshore structures. In recent years, changes in the wave environment due to climate change are expected, and a trend analysis of the wave environment using available data sets is required. In this paper, significant wave heights measured at six ocean buoys (Deokjeokdo, Oeyeondo, Chibaldo, Marado, Pohang, Ulleungdo) were used to analyze the long-term trend of normal waves. First, outliers in the data measured by the Korea Meteorological Administration were removed using the Rosner test, and a Pearson correlation analysis between the measured data and the ECMWF reanalysis data was conducted; the correlation coefficients between the two data sets were 0.849~0.938. The Mann-Kendall test was then used to analyze the long-term trend of normal waves. The results show no trend at Deokjeokdo, Oeyeondo, and Chibaldo, whereas Marado, Pohang, and Ulleungdo show an increasing tendency.
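The Mann-Kendall test applied above can be sketched in its no-ties form; the wave-height series below is hypothetical.

```python
# Mann-Kendall trend test (no-ties form): count concordant minus
# discordant pairs (S), normalize by its variance, and compare the
# resulting z score against a normal quantile.
import numpy as np
from scipy import stats

def mann_kendall(x, alpha=0.05):
    x = np.asarray(x, float)
    n = len(x)
    s = sum(np.sign(x[j] - x[i])
            for i in range(n - 1) for j in range(i + 1, n))
    var = n * (n - 1) * (2 * n + 5) / 18.0        # variance with no ties
    z = (s - np.sign(s)) / np.sqrt(var)           # continuity correction
    p = 2 * (1 - stats.norm.cdf(abs(z)))
    return ("increasing" if s > 0 else "decreasing") if p < alpha else "no trend"

# a steadily rising hypothetical series of annual mean wave heights
trend = mann_kendall([1.2, 1.3, 1.1, 1.5, 1.6, 1.8, 1.7, 2.0, 2.1, 2.3])
# → "increasing"
```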

Image Fusion of High Resolution SAR and Optical Image Using High Frequency Information (고해상도 SAR와 광학영상의 고주파 정보를 이용한 다중센서 융합)

  • Byun, Young-Gi; Chae, Tae-Byeong
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography, v.30 no.1, pp.75-86, 2012
  • A Synthetic Aperture Radar (SAR) imaging system is independent of solar illumination and weather conditions; however, SAR images are difficult to interpret compared with optical images. There has been increasing interest in multi-sensor fusion techniques that can improve the interpretability of SAR images by fusing in the spectral information of a multispectral (MS) image. In this paper, a multi-sensor fusion method is proposed based on a high-frequency extraction process using the Fast Fourier Transform (FFT) and an outlier elimination process; it maintains the spectral content of the original MS image while retaining the spatial detail of the high-resolution SAR image. As the test data set, we used a TerraSAR-X image, acquired with the same X-band SAR system as KOMPSAT-5, and a KOMPSAT-2 MS image. To evaluate the efficiency of the proposed method, the fusion result was compared visually and quantitatively with results obtained using existing fusion algorithms. The evaluation showed that the proposed image fusion method achieved successful results in fusing SAR and MS images compared with the existing algorithms.
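The high-frequency extraction step can be sketched as a Fourier-domain high-pass filter; the cutoff value and the random array standing in for a SAR band are illustrative assumptions, not the paper's parameters.

```python
# FFT high-pass sketch: zero out low spatial frequencies so only the
# high-frequency detail of the (stand-in) SAR band remains.
import numpy as np

def highpass_fft(img, cutoff=0.1):
    """Keep only frequencies beyond `cutoff` (fraction of Nyquist)."""
    f = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    yy, xx = np.mgrid[-h // 2:h - h // 2, -w // 2:w - w // 2]
    radius = np.sqrt((yy / (h / 2)) ** 2 + (xx / (w / 2)) ** 2)
    f[radius < cutoff] = 0                       # zero out low frequencies
    return np.real(np.fft.ifft2(np.fft.ifftshift(f)))

sar = np.random.default_rng(1).random((64, 64))  # stand-in for a SAR band
detail = highpass_fft(sar)
# the DC term is removed, so the detail image has (near-)zero mean
```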