• Title/Summary/Keyword: Outlier Data

An outlier weight adjustment using generalized ratio-cum-product method for two phase sampling (이중추출법에서 일반화 ratio-cum-product 방법을 이용한 이상점 가중치 보정법)

  • Oh, Jung-Taek;Shin, Key-Il
    • The Korean Journal of Applied Statistics / v.29 no.7 / pp.1185-1199 / 2016
  • Two-phase sampling (double sampling) is often used when there is inadequate population information for proper stratification. Many recent papers have been devoted to estimation methods that improve the precision of the estimator using first-phase information. In this study, we propose outlier weight adjustment methods, based on the weights of the generalized ratio-cum-product estimator, to improve estimation precision. A small simulation study compares the proposed methods with the usual method, and a real data analysis is also performed.

The Use of Local Outlier Factor (LOF) for Improving Performance of Independent Component Analysis (ICA) based Statistical Process Control (SPC) (LOF를 이용한 ICA 기반 통계적 공정관리의 성능 개선 방법론)

  • Lee, Jae-Shin;Kang, Bok-Young;Kang, Suk-Ho
    • Journal of the Korean Operations Research and Management Science Society / v.36 no.1 / pp.39-55 / 2011
  • Process monitoring has been emphasized for complex systems, such as those in the chemical processing industries, in order to improve efficiency, quality management and safety. Recently, MSPC (Multivariate Statistical Process Control) based on ICA (Independent Component Analysis) has been widely used in process monitoring, and DICA (Dynamic ICA) has been introduced to account for system dynamics. However, the performance of these existing approaches depends strongly on the statistical distributions of the control variables. To overcome this limitation, we propose a novel process monitoring approach that integrates DICA and LOF (Local Outlier Factor), with the aim of improving the fault detection rate. LOF detects local outliers from the density of the surrounding space, so its performance does not depend on the data distribution. The proposed method can therefore account for system dynamics while remaining robust to the statistical distributions of the control variables. Comparison experiments on the widely used Tennessee Eastman process (TE process) benchmark dataset showed improved performance over existing approaches.
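
The LOF idea the abstract relies on (comparing a point's local density with the density around its neighbours) can be sketched in a few lines of standard-library Python. This is a minimal, simplified illustration assuming distinct points, Euclidean distance and exactly k neighbours per point; lof_scores is a hypothetical helper name, and the paper's combination of LOF with DICA is not reproduced here.

```python
import math


def lof_scores(points, k):
    """Local Outlier Factor for each point (simplified: exactly k neighbours).

    Assumes distinct points and 1 <= k < len(points); duplicate points would
    give a zero reachability sum and a division by zero.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # k nearest neighbours of each point (itself excluded) and its k-distance.
    neighbours, k_distance = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        neighbours.append(order[:k])
        k_distance.append(dist[i][order[k - 1]])

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_distance[o], dist[i][o]) for o in neighbours[i]]
        lrd.append(k / sum(reach))

    # LOF: average neighbour density relative to the point's own density.
    return [sum(lrd[o] for o in neighbours[i]) / (k * lrd[i]) for i in range(n)]


points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
scores = lof_scores(points, k=2)
print([round(s, 2) for s in scores])  # the isolated point (10, 10) scores far above 1
```

Scores near 1 mean a point is about as dense as its neighbours, while scores well above 1 mark a local outlier, whatever the global distribution of the data. The original LOF definition includes every point tied at the k-distance in the neighbourhood; this sketch keeps exactly k, which only matters when distances tie.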

Outlier detection and time series modelling in the stationary time series (정상 시계열에서의 이상치 발견과 시계열 모형구축)

  • 이종협;최기헌
    • The Korean Journal of Applied Statistics / v.5 no.2 / pp.139-156 / 1992
  • Several authors have recently introduced iterative methods for detecting time series outliers. Most of these methods assume that the underlying outlier-free model is known or can be identified. Because outliers can distort model identification or even make it impossible, we propose a procedure that begins with a descriptive analysis of the time series using distance measures between two observations. Properties of the proposed test statistic are presented, and transfer function models are used to distinguish the type of each outlier. An empirical example illustrates the time series modeling procedure.

The Mean Reverting Behavior of Inflation in the Philippines

  • CAMBA, Abraham C. Jr.;CAMBA, Aileen L.
    • The Journal of Asian Finance, Economics and Business / v.8 no.10 / pp.239-247 / 2021
  • Central bank authorities should carefully manage inflation rate uncertainty to achieve economic growth and development in both the short run and the long run. Since inflation is a key macroeconomic variable, a better understanding of its behavior is important. This paper therefore employs unit root tests with breakpoints to examine the mean-reverting behavior of the inflation rate in the Philippines using monthly data from 2002 to 2020. Empirically, the breakpoint unit root tests under both the innovational outlier and additive outlier specifications favor stationarity, that is, mean-reverting behavior, of inflation in the Philippines. The standard unit root tests (ADF, PP, GLS-Dickey-Fuller, KPSS and NP) also provide strong evidence of mean-reverting processes. The mean-reverting behavior of the inflation rate indicates that monetary policy under the inflation targeting framework has succeeded in reducing chronic inflation persistence in the Philippines. This research thus supports an inflation targeting policy that aims to maintain general price level stability for the Philippine economy's long-term growth and development prospects. The findings give central bankers a better understanding of inflation rate behavior and can help them formulate and implement policy reforms related to money, credit and banking.

Creating Subnetworks from Transcriptomic Data on Central Nervous System Diseases Informed by a Massive Transcriptomic Network

  • Feng, Yaping;Syrkin-Nikolau, Judith A.;Wurtele, Eve S.
    • Interdisciplinary Bio Central / v.5 no.1 / pp.1.1-1.8 / 2013
  • High-quality, publicly available transcriptomic data representing relationships in gene expression across a diverse set of biological conditions are used as a context network to explore transcriptomics of the central nervous system (CNS). The context network, 18367Hu-matrix, contains pairwise Pearson correlations for 22,215 human genes across 18,637 human tissue samples [1]. To do this, we compute a network derived from biological samples of CNS cells and tissues, calculate clusters of co-expressed genes from this network, and compare the significance of these clusters with that of clusters derived from the larger 18367Hu-matrix network. Sorting and visualization use the publicly available software MetaOmGraph (http://www.metnetdb.org/MetNet_MetaOm-Graph.htm). This approach identifies genes that characterize particular disease conditions. Specifically, differences in gene expression within and between two designations of glial cancer, astrocytoma and glioblastoma, are evaluated in the context of the broader network. Such gene groups, which we term outlier-networks, tease out abnormally expressed genes and the samples in which this expression occurs. The approach distinguishes 48 subnetworks of outlier genes associated with astrocytoma and glioblastoma. As a case study, we investigate the relationships among the genes of a small astrocytoma-only subnetwork, which consists of SVEP1, IGF1, CHRNA3 and SPAG6. All four genes are highly co-expressed in a single sample of anaplastic astrocytoma (grade III) and in a sample of juvenile pilocytic astrocytoma. Three of these genes are also associated with nicotine. These data lead us to formulate a testable hypothesis: that this astrocytoma outlier-network provides a link between some gliomas/astrocytomas and nicotine.
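
The context network is described as a matrix of pairwise Pearson correlations between genes across samples. A minimal standard-library sketch of building such a correlation network, keeping an edge when |r| reaches a cut-off, is shown below; the threshold, the gene names and the expression values are hypothetical, and this is neither MetaOmGraph's implementation nor the paper's clustering step.

```python
import itertools
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def coexpression_edges(expression, threshold):
    """Keep gene pairs whose |r| across samples reaches the threshold."""
    edges = {}
    for g1, g2 in itertools.combinations(sorted(expression), 2):
        r = pearson(expression[g1], expression[g2])
        if abs(r) >= threshold:
            edges[(g1, g2)] = r
    return edges


# Hypothetical expression values for three genes over the same five samples.
expression = {
    "gene_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "gene_b": [2.0, 4.0, 6.0, 8.0, 10.0],
    "gene_c": [5.0, 3.0, 4.0, 1.0, 2.0],
}
print(coexpression_edges(expression, threshold=0.9))
```

Here gene_a and gene_b rise together across samples and are linked, while gene_c correlates with both at only r = -0.8 and stays unconnected at this cut-off.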

A Study on the Applicability of Machine Learning Algorithms for Detecting Hydraulic Outliers in a Borehole (시추공 수리 이상점 탐지를 위한 기계학습 알고리즘의 적용성 연구)

  • Seungbeom Choi;Kyung-Woo Park;Changsoo Lee
    • Tunnel and Underground Space / v.33 no.6 / pp.561-573 / 2023
  • The Korea Atomic Energy Research Institute (KAERI) constructed the KURT (KAERI Underground Research Tunnel) to analyze the hydrogeological and geochemical characteristics of deep rock mass. Numerous boreholes have been drilled to conduct various field tests. Selecting suitable investigation intervals within a borehole is of great importance: when the objectives center on hydraulic flow and groundwater sampling, intervals with sufficient groundwater flow are the most suitable. This study defines such points as hydraulic outliers and aims to detect them using borehole geophysical logging data (temperature and electrical conductivity, EC) from a 1 km deep borehole. For systematic and efficient outlier detection, four machine learning algorithms (DBSCAN, OCSVM, kNN and isolation forest) were applied and their applicability was assessed. After data preprocessing and algorithm optimization, the four algorithms detected 55, 12, 52 and 68 outliers, respectively. Although this study confirms the applicability of the machine learning algorithms, further verification and supplementation are desirable because the input data were relatively limited.
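
Of the four algorithms, the kNN detector is the simplest to illustrate. The sketch below assumes the common variant in which a sample's outlier score is its distance to the k-th nearest other sample, and it standardizes the features first because temperature and EC are on different scales. The function names and the readings in logs are hypothetical, not KURT data, and the paper's preprocessing and parameter optimization are not reproduced.

```python
import math
import statistics


def standardize(samples):
    """Z-standardise each feature column (assumes no constant column)."""
    cols = list(zip(*samples))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.pstdev(c) for c in cols]
    return [tuple((v - m) / s for v, m, s in zip(row, means, sds)) for row in samples]


def knn_scores(samples, k):
    """Outlier score = Euclidean distance to the k-th nearest other sample."""
    scores = []
    for i, p in enumerate(samples):
        d = sorted(math.dist(p, q) for j, q in enumerate(samples) if j != i)
        scores.append(d[k - 1])
    return scores


def top_outliers(samples, k, n_outliers):
    """Indices of the n_outliers samples with the largest k-NN distance."""
    scores = knn_scores(samples, k)
    ranked = sorted(range(len(samples)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_outliers])


# Hypothetical (temperature, EC) readings at successive depths; units arbitrary.
logs = [(20.0, 300.0), (20.1, 305.0), (20.2, 310.0), (20.3, 312.0),
        (20.4, 318.0), (22.5, 520.0), (20.5, 322.0)]
print(top_outliers(standardize(logs), k=2, n_outliers=1))  # → [5]
```

Ranking by score and keeping the top n is one way to turn scores into detections; a fixed distance threshold is another, and either choice changes how many outliers a run reports.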

Pollution priority control algorithm and monitoring system (오염도 우선순위 방제 알고리즘과 모니터링 시스템)

  • Jin-Seok Lee;Young-Gon Kim;Jung-Min Park
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.24 no.5 / pp.97-104 / 2024
  • As indoor air pollution has emerged as a social issue since the COVID-19 pandemic, pollution management in large-scale facilities has been recognized as an important task. To address it, this study proposes two key technologies: real-time pollution level detection using sensors, and efficient control path setting using Dijkstra's algorithm. In addition, we introduce an outlier determination algorithm and a priority algorithm to increase data reliability and make control work efficient. The outlier determination algorithm identifies and handles outliers in the sensor data of the environmental monitoring system: it averages the 10 most recent sensor readings, calculates a Z-score to detect outliers, and removes and replaces the readings judged to be outliers. The priority algorithm establishes an efficient control path that takes the pollution level of each region into account: the most polluted area is selected first and used as the starting point of the control path. The system also repeatedly detects and responds to pollution levels in real time, so that it is continuously optimized and can respond to environmental pollution. Together, the outlier determination and priority algorithms are expected to make the environmental monitoring system more reliable and efficient, so that pollution situations can be identified and handled quickly.
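
The outlier step described above (average of the 10 most recent readings, Z-score test, remove and replace) can be sketched as a small streaming filter. The window of 10 follows the abstract; the threshold of 3.0, the choice to replace an outlier with the window mean, and the class name ZScoreFilter are assumptions made for illustration, since the abstract does not state them.

```python
import statistics
from collections import deque


class ZScoreFilter:
    """Windowed Z-score outlier filter (hypothetical sketch).

    window=10 follows the abstract; threshold=3.0 and replacing an outlier
    with the window mean are assumptions, not the paper's stated values.
    """

    def __init__(self, window=10, threshold=3.0):
        self.history = deque(maxlen=window)
        self.threshold = threshold

    def process(self, value):
        """Return (value_to_use, is_outlier) for one new sensor reading."""
        if len(self.history) < self.history.maxlen:
            # Not enough history yet: accept readings while the window fills.
            self.history.append(value)
            return value, False
        recent = list(self.history)
        mean = statistics.fmean(recent)
        sd = statistics.pstdev(recent)
        # A constant window cannot be standardised; accept the reading.
        z = 0.0 if sd == 0 else (value - mean) / sd
        if abs(z) > self.threshold:
            # Remove the outlier and substitute the mean of the last readings;
            # storing the substitute keeps the raw outlier out of the window.
            self.history.append(mean)
            return mean, True
        self.history.append(value)
        return value, False


readings = [50, 51, 49, 50, 52, 48, 50, 51, 49, 50, 120, 50]
f = ZScoreFilter()
print([f.process(r) for r in readings][-2:])  # → [(50.0, True), (50, False)]
```

Storing the substitute rather than the raw spike means one bad reading cannot inflate the window's standard deviation and mask the next outlier.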

The Development of Biodegradable Fiber Tensile Tenacity and Elongation Prediction Model Considering Data Imbalance and Measurement Error (데이터 불균형과 측정 오차를 고려한 생분해성 섬유 인장 강신도 예측 모델 개발)

  • Se-Chan, Park;Deok-Yeop, Kim;Kang-Bok, Seo;Woo-Jin, Lee
    • KIPS Transactions on Software and Data Engineering / v.11 no.12 / pp.489-498 / 2022
  • The labor-intensive textile industry has recently been attempting to reduce process costs and optimize quality through artificial intelligence. However, data collection in the fiber spinning process is costly and lacks a systematic collection and processing system, so little data has accumulated. In addition, data imbalance arises because data are preferentially collected only for changes in specific variables, depending on the purpose of the spinning run, and measurement errors occur even between samples collected under the same spinning conditions because the environments in which physical properties are measured differ. If such data are used in AI models without accounting for these characteristics, problems such as overfitting and performance degradation may occur. In this paper, we therefore propose an outlier handling technique and a data augmentation technique that take the characteristics of spinning process data into account. Comparison with existing outlier handling and data augmentation techniques shows that the proposed techniques are better suited to spinning process data. Furthermore, applying various models to the original data and to data processed with the proposed methods shows that the tensile tenacity and elongation prediction models perform better with the proposed methods than without them.

Application of Statistical Geo-Spatial Information Technology to Soil Stratification (통계적 지반 공간 정보 기법을 이용한 지층구조 분석)

  • Kim, Han-Saem;Kim, Hyun-Ki;Shin, Si-Yeol;Chung, Choong-Ki
    • Journal of the Korean Geotechnical Society / v.27 no.7 / pp.59-68 / 2011
  • Subsurface investigation results always reflect some level of soil uncertainty, which sometimes requires statistical correction of the data before an appropriate engineering decision can be made. This study proposes a closed-form framework that extracts outlying data points from test results through statistical geo-spatial information analysis, combining outlier analysis with kriging-based cross-validation. The proposed method is applied to soil stratification using borehole data from Yeouido.

Study on the applicability of the principal component analysis for detecting leaks in water pipe networks (상수관망의 누수감지를 위한 주성분 분석의 적용 가능성에 대한 연구)

  • Kim, Kimin;Park, Suwan
    • Journal of Korean Society of Water and Wastewater / v.33 no.2 / pp.159-167 / 2019
  • This paper evaluates the potential of principal component analysis (PCA) for detecting leaks in water pipe networks. To this end, PCA was conducted on the recorded pipe flows of a case-study water distribution system, and the relevance of the outliers calculated by the PCA model to the recorded pipe leak incidents was analyzed. The PCA technique was enhanced with computational algorithms developed in this study, which extract a partial set of flow data from the original 24-hour flow data so as to maximize the effective outlier detection rate. The developed algorithm may be used to decide on further leak detection field work for water distribution blocks whose effective outlier detection rate exceeds 70%. However, the analysis suggests that the algorithm needs further development, such as accounting for a series of leak reports occurring within a relatively short period, to enhance the applicability of PCA to leak detection.
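
The abstract does not give the paper's outlier statistic or its partial-window extraction algorithm. As a hedged illustration of how PCA flags outliers in flow data in general, the sketch below fits PCA to synthetic 24-hour flow profiles and scores new days by their squared prediction error (SPE, also called the Q statistic). The one-component model, the 99th-percentile control limit, the helper names and all data are assumptions.

```python
import numpy as np


def fit_pca(train, n_components):
    """Fit PCA on standardised training rows (one row per observation)."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    z = (train - mean) / std
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    return mean, std, vt[:n_components].T  # loadings, shape (features, k)


def spe(rows, mean, std, loadings):
    """Squared prediction error (Q statistic) of each row w.r.t. the PCA model."""
    z = (rows - mean) / std
    residual = z - z @ loadings @ loadings.T
    return (residual ** 2).sum(axis=1)


# Synthetic 24-hour flow profiles: a daily pattern scaled by demand plus noise.
rng = np.random.default_rng(0)
hours = np.arange(24)
profile = 10 + 5 * np.sin(2 * np.pi * (hours - 6) / 24)
train = np.outer(rng.uniform(0.8, 1.2, size=60), profile)
train += rng.normal(0.0, 0.1, size=train.shape)

mean, std, loadings = fit_pca(train, n_components=1)
limit = np.percentile(spe(train, mean, std, loadings), 99)  # assumed control limit

normal_day = 1.05 * profile
leak_day = profile.copy()
leak_day[:6] += 3.0  # extra night-time flow, as a leak might add
scores = spe(np.vstack([normal_day, leak_day]), mean, std, loadings)
print(scores > limit)  # expected: normal day below the limit, leak day above it
```

A uniformly busier day stays inside the principal subspace and keeps a small SPE, while extra flow concentrated in a few night hours breaks the usual daily shape and lands in the residual.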