• Title/Summary/Keyword: Outlier test

A Study on the Statistical Predictability of Drinking Water Qualities for Contamination Warning System (수질오염 감시체계 구축을 위한 수질 데이터의 통계적 예측 가능성 검토)

  • Park, No-Suk;Lee, Young-Joo;Chae, Seonha;Yoon, Sukmin
    • Journal of Korean Society of Water and Wastewater / v.29 no.4 / pp.469-479 / 2015
  • This study was conducted to analyze the feasibility of establishing a Contamination Warning System (CWS) that is capable of early monitoring of natural or intentional water quality accidents and of providing active and quick responses for the domestic C_water supply system. To evaluate the water quality data set, pH, turbidity, and free residual chlorine concentration data were collected, summary statistics (mean, variation, range) were calculated for each, and their seasonal variability was analyzed using the independent t-test. Analysis of the distribution of outliers in the measurement data using a high-pass filter confirmed that many lower outliers were caused by missing data. In addition, a linear filter model based on autoregressive models (AR(1) and AR(2)) was applied for the state estimation of each water quality data set. Analysis of how the autocorrelation coefficient structure varies with window size (6 to 48 hours) showed that a window longer than 12 hours is necessary to estimate the state of the water quality data satisfactorily.
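
The windowed AR(1) state estimation mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 3-sigma residual threshold, and the window size expressed in samples rather than hours are all assumptions made for illustration. Each window is fitted by least squares, the next value is predicted one step ahead, and a point is flagged when its prediction error is large relative to the in-window residual spread.

```python
from statistics import mean, pstdev


def fit_ar1(window):
    """Return (window mean, least-squares AR(1) coefficient) for one window."""
    m = mean(window)
    centered = [x - m for x in window]
    num = sum(cur * prev for cur, prev in zip(centered[1:], centered[:-1]))
    den = sum(prev * prev for prev in centered[:-1])
    phi = num / den if den else 0.0
    return m, phi


def flag_outliers(series, window_size=12, k=3.0):
    """Flag indices whose one-step AR(1) prediction error exceeds k residual SDs.

    window_size and k are illustrative assumptions, not values from the paper.
    """
    flags = []
    for t in range(window_size, len(series)):
        window = series[t - window_size:t]
        m, phi = fit_ar1(window)
        centered = [x - m for x in window]
        # In-window residuals of the fitted AR(1) model.
        residuals = [cur - phi * prev
                     for cur, prev in zip(centered[1:], centered[:-1])]
        scale = pstdev(residuals)
        predicted = m + phi * (window[-1] - m)
        if scale > 0 and abs(series[t] - predicted) > k * scale:
            flags.append(t)
    return flags


if __name__ == "__main__":
    # Synthetic pH-like series with one reading recorded as 0.0 (a "lower outlier").
    series = [7.0 + 0.1 * ((i % 4) - 1.5) for i in range(40)]
    series[30] = 0.0
    print(flag_outliers(series))
```

A zero recorded in place of a missing reading is the kind of lower outlier the abstract attributes to missing data. Windows that still contain the zero have inflated residual spread, which is one reason a real pipeline would likely exclude flagged points from later windows.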

An LSTM Neural Network Model for Forecasting Daily Peak Electric Load of EV Charging Stations (EV 충전소의 일별 최대전력부하 예측을 위한 LSTM 신경망 모델)

  • Lee, Haesung;Lee, Byungsung;Ahn, Hyun
    • Journal of Internet Computing and Services / v.21 no.5 / pp.119-127 / 2020
  • As the electric vehicle (EV) market in South Korea grows, charging facilities must be expanded to respond to rapidly increasing EV charging demand. Comprehensive facility planning requires forecasting future electricity demand and systematically analyzing its impact on the load capacity of facilities. In this paper, we design and develop a Long Short-Term Memory (LSTM) neural network model that predicts the daily peak electric load at each charging station using the EV charging data of KEPCO. First, we obtain refined data through data preprocessing and outlier removal. Next, our model is trained by extracting daily features per charging station and constructing a training set. Finally, our model is verified through performance analysis using a test set for each charging station type, and the limitations of our model are discussed.

SHM data anomaly classification using machine learning strategies: A comparative study

  • Chou, Jau-Yu;Fu, Yuguang;Huang, Shieh-Kung;Chang, Chia-Ming
    • Smart Structures and Systems / v.29 no.1 / pp.77-91 / 2022
  • Various monitoring systems have been implemented in civil infrastructure to ensure structural safety and integrity. In long-term monitoring, these systems generate a large amount of data, in which anomalies are not unusual and can pose unique challenges for structural health monitoring applications such as system identification and damage detection. Developing efficient techniques to recognize anomalies in monitoring data is therefore essential. In this study, several machine learning techniques are explored and implemented to detect and classify various types of data anomalies. A field dataset, consisting of one month of acceleration data obtained from a long-span cable-stayed bridge in China, is employed to examine the machine learning techniques for automated data anomaly detection. These techniques include the statistic-based pattern recognition network, spectrogram-based convolutional neural network, image-based time history convolutional neural network, image-based time-frequency hybrid convolutional neural network (GoogLeNet), and the proposed ensemble neural network model. The ensemble model deliberately combines different machine learning models to enhance anomaly classification performance. The results show that all these techniques can successfully detect and classify six types of data anomalies (i.e., missing, minor, outlier, square, trend, drift). Moreover, both the image-based time history convolutional neural network and GoogLeNet are further investigated for their capability of autonomous online anomaly classification and are found to classify anomalies effectively with decent performance. In terms of accuracy, the proposed ensemble neural network model outperforms the other three machine learning techniques. This study also evaluates the proposed ensemble neural network model on a blind test dataset. The results show that this ensemble model is effective for data anomaly detection and applicable to signal characteristics that change over time.

Evaluation of Major Taper Equation Models for Developing a Stem Volume Table of Cryptomeria japonica in Jeju Island (제주도 삼나무 수간재적표 개발을 위한 주요 수간곡선식 비교)

  • Kim, Hyun-Soo;Jung, Su-Young;Lee, Kwang-Soo
    • Journal of Environmental Science International / v.31 no.11 / pp.941-950 / 2022
  • This study was conducted to provide data and stem information to establish a local volume table of Cryptomeria japonica in Jeju Island. Stem analysis was performed on 26 trees by selecting two average trees from each of the 13 plots of C. japonica stands in 2021 and 2022. During the analysis stage, one outlier tree was rejected, and a total of 260 observations of the specific stem heights of 25 trees were used. Of the seven major taper equation models applied for parameter estimation and statistical verification, the Muhairwe 1999 model was found to be the best fit and was selected as the optimal model. Stem shape-related estimates were acquired through the selected model, and sectional measurements according to the Smalian formula, applied at intervals of 10 cm along the height of the stem, were used to develop a volume table. A paired t-test comparison between the C. japonica volume obtained in the present study and that taken from the current yield table by NIFoS (2020) revealed significant differences (p<0.05), highlighting the necessity of a local volume table for C. japonica in Jeju Island.
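
The sectional volume step in this abstract can be sketched as follows, under stated assumptions. Smalian's formula estimates the volume of a stem section as the mean of its two end cross-sectional areas times its length. The function names are hypothetical, and the diameters would in practice come from a fitted taper equation, which is not reproduced here.

```python
import math


def cross_section_area(diameter_m):
    """Area of a circular cross-section, in square metres."""
    return math.pi * (diameter_m / 2.0) ** 2


def smalian_volume(diameters_m, section_length_m):
    """Total stem volume from diameters measured at equally spaced heights.

    Each section contributes (A_lower + A_upper) / 2 * length (Smalian's formula).
    """
    areas = [cross_section_area(d) for d in diameters_m]
    return sum((lower + upper) / 2.0 * section_length_m
               for lower, upper in zip(areas, areas[1:]))


if __name__ == "__main__":
    # Hypothetical diameters (m) every 0.1 m along a 0.5 m stem segment.
    diameters = [0.30, 0.29, 0.28, 0.27, 0.26, 0.25]
    print(smalian_volume(diameters, section_length_m=0.1))
```

With a 10 cm section length, as in the abstract, the taper between adjacent measurements is small, so averaging the end areas should introduce little error.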

Performance Comparison of Machine Learning Algorithms for Network Traffic Security in Medical Equipment (의료기기 네트워크 트래픽 보안 관련 머신러닝 알고리즘 성능 비교)

  • Seung Hyoung Ko;Joon Ho Park;Da Woon Wang;Eun Seok Kang;Hyun Wook Han
    • Journal of Information Technology Services / v.22 no.5 / pp.99-108 / 2023
  • As the computerization of hospitals advances, security issues regarding data generated by the various medical devices within hospitals are gradually increasing. For example, because hospital data contains a variety of personal information, attempts to attack it are made continuously. To protect data from external attacks, each hospital has formed an internal team that continuously monitors whether the computer network is safely protected. However, there are limits to how well humans can monitor, in real time, attacks that occur on networks within hospitals. Recently, artificial intelligence models have shown excellent performance in detecting outliers. In this paper, an experiment was conducted to verify how well an artificial intelligence model classifies normal and abnormal data in network traffic generated by medical devices. Several models are used for outlier detection; among them, Random Forest and Tabnet were used. Tabnet is a deep learning algorithm that takes structured data as input and classifies it. The two algorithms were trained using open network traffic data, and the classification accuracy of each model was measured using test data. As a result, the Random Forest algorithm showed a classification accuracy of 93%, and Tabnet showed a classification accuracy of 99%. Therefore, it is expected that most outliers that may occur in a hospital network can be detected using a high-performing algorithm such as Tabnet.

Analysis on Characteristics of Variation in Flood Flow by Changing Order of Probability Weighted Moments (확률가중모멘트의 차수 변화에 따른 홍수량 변동 특성 분석)

  • Maeng, Seung-Jin;Hwang, Ju-Ha
    • Journal of the Korea Academia-Industrial cooperation Society / v.10 no.5 / pp.1009-1019 / 2009
  • In this research, various characteristics of South Korea's design floods were examined by deriving appropriate design floods from data on actual floods observed in selected main watersheds of the nation. Nineteen watersheds in Korea were selected. The characteristics of annual rainfall were analyzed using a moving average method. The frequency analysis was decided to be performed on the annual maximum flood of succeeding one year as a reference year. For the 19 watersheds, basic statistics were calculated and tests of independence, homogeneity, and outliers were performed for each period of the annual maximum flood series. Based on the LH-moment ratio diagram and the Kolmogorov-Smirnov (K-S) test, among the applied Gumbel (GUM), Generalized Extreme Value (GEV), Generalized Logistic (GLO), and Generalized Pareto (GPA) distributions, the GEV distribution was found to be adequate compared with the other probability distributions. Parameters of the GEV distribution were estimated by the L, L1, L2, L3, and L4-moment methods based on the change in the order of probability weighted moments. Design floods for each watershed and period of the annual maximum flood series were derived from the GEV distribution. From the analysis of the variation rate used in this research, it was concluded that the time for changing the design conditions, so that hydraulic structures properly account for the nation's recent climate changes brought about by global warming, should be around the year 2002.
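
The relationship between probability weighted moments (PWMs) and L-moments that this abstract relies on can be sketched as below. This covers only the ordinary L-moment case. The L1 to L4 variants named in the abstract appear to be higher-order (LH-moment) generalizations and are not reproduced here. The function names are hypothetical, and the unbiased sample PWM estimator is an assumed choice.

```python
def sample_pwm(sorted_x, r):
    """Unbiased sample probability weighted moment b_r of an ascending sample."""
    n = len(sorted_x)
    total = 0.0
    for j, x in enumerate(sorted_x, start=1):
        weight = 1.0
        for k in range(1, r + 1):
            weight *= (j - k) / (n - k)
        total += weight * x
    return total / n


def sample_l_moments(data):
    """First four sample L-moments as linear combinations of b_0..b_3.

    Requires at least four observations.
    """
    x = sorted(data)
    b0, b1, b2, b3 = (sample_pwm(x, r) for r in range(4))
    l1 = b0
    l2 = 2 * b1 - b0
    l3 = 6 * b2 - 6 * b1 + b0
    l4 = 20 * b3 - 30 * b2 + 12 * b1 - b0
    return l1, l2, l3, l4


if __name__ == "__main__":
    # Hypothetical annual maximum flood series (m^3/s), for illustration only.
    floods = [820.0, 1150.0, 640.0, 2300.0, 980.0, 1420.0, 760.0, 1890.0]
    l1, l2, l3, l4 = sample_l_moments(floods)
    print("L-CV:", l2 / l1, "L-skewness:", l3 / l2, "L-kurtosis:", l4 / l2)
```

The ratios printed at the end are the quantities plotted on a moment ratio diagram to compare candidate distributions such as GEV, GLO, and GPA.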

A Comparative Study on Similarity Measure Techniques for Cross-Project Defect Prediction (교차 프로젝트 결함 예측을 위한 유사도 측정 기법 비교 연구)

  • Ryu, Duksan;Baik, Jongmoon
    • KIPS Transactions on Software and Data Engineering / v.7 no.6 / pp.205-220 / 2018
  • Software defect prediction helps allocate valuable project resources effectively for software quality assurance activities by focusing on the identified fault-prone modules. If the historical data collected within a company is sufficient, Within-Project Defect Prediction (WPDP) can be used for accurate fault-prone module prediction. If a company does not maintain historical data, it may be helpful to build a classifier for comprehensible fault prediction based on Cross-Project Defect Prediction (CPDP). Since CPDP employs project data collected from other organizations to build a classifier, the main obstacle to building an accurate classifier is that the distributions of the source and target projects are not similar. To address this problem, it is crucial to identify effective similarity measure techniques that yield high CPDP performance, and this paper aims to identify them. We compare various similarity measure techniques. The effectiveness of the similarity weights calculated by those techniques is evaluated. The results are verified using a statistical significance test and an effect size test. The results show that the k-Nearest Neighbor (k-NN), LOcal Correlation Integral (LOCI), and Range methods are the top three performers. The experimental results show that the predictive performance of the three methods is comparable to that of WPDP.

Pupil Data Measurement and Social Emotion Inference Technology by using Smart Glasses (스마트 글래스를 활용한 동공 데이터 수집과 사회 감성 추정 기술)

  • Lee, Dong Won;Mun, Sungchul;Park, Sangin;Kim, Hwan-jin;Whang, Mincheol
    • Journal of Broadcast Engineering / v.25 no.6 / pp.973-979 / 2020
  • This study aims to determine the social emotion of empathy objectively and quantitatively by collecting pupillary responses. Fifty-two subjects (26 men and 26 women) voluntarily participated in the experiment. After a 30-second reference measurement, the experiment was divided into an imitation task and a spontaneous self-expression task. Two subjects interacted through facial expressions, and the pupil images were recorded. The pupil data was processed through binarization and a circular edge detection algorithm, and an outlier detection and removal technique was used to reject eye blinks. The statistical significance of the pupil size according to empathy was confirmed with a test of normality and an independent sample t-test. The statistical analysis showed that pupil size differed significantly between the empathy (M ± SD = 0.050 ± 1.817) and non-empathy (M ± SD = 1.659 ± 1.514) conditions (t(92) = -4.629, p = 0.000). The rule of empathy according to pupil size was defined through discriminant analysis, and the rule was verified on 12 new subjects (estimation accuracy: 75%; 6 men and 6 women, mean age ± SD = 22.84 ± 1.57 years). The method proposed in this study is a non-contact camera technology and is expected to be used in various virtual reality applications with smart glasses.
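
The independent sample t-test reported in this abstract can be sketched from summary statistics, assuming the pooled-variance (Student) form; the abstract does not say which variant was used. The function name and the example inputs are hypothetical. The group sizes behind the reported t(92) are not given, so the paper's figures are not recomputed here.

```python
import math


def pooled_t_statistic(mean1, sd1, n1, mean2, sd2, n2):
    """Student's two-sample t statistic and degrees of freedom.

    Assumes equal population variances; sd1 and sd2 are sample SDs (n - 1).
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean1 - mean2) / standard_error, df


if __name__ == "__main__":
    # Hypothetical group summaries, not taken from the paper.
    t, df = pooled_t_statistic(0.0, 1.0, 10, 1.0, 1.0, 10)
    print(f"t({df}) = {t:.3f}")
```

A negative statistic, as in the abstract, simply reflects that the first group's mean is the smaller one.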

A Review of Statistical Methods in the Korean Journal of Orthodontics and the American Journal of Orthodontics and Dentofacial Orthopedics (대한치과교정학회지(KJO)와 미국교정학회지(AJODO)에서 사용된 통계기법의 비교분석 및 고찰(1999-2003))

  • Lim, Hoi-Jeong
    • The Korean Journal of Orthodontics / v.34 no.5 s.106 / pp.371-379 / 2004
  • The purpose of this study was to investigate the changes in and types of statistical methods used in the Korean Journal of Orthodontics (KJO) and the American Journal of Orthodontics and Dentofacial Orthopedics (AJODO) from 1999 to 2003. The frequency of use, transitions, assumption checks of statistical methods, and types of advanced statistical methods were examined for each journal. The study consisted of 247 articles published in the KJO and 50 randomly chosen articles per year, all of which were original articles that used statistical methods. In order of frequency, the statistical methods most used in the KJO were the t-test, analysis of variance (ANOVA), correlation analysis, nonparametric analysis, regression analysis, chi-square test, and factor analysis, while those used in the AJODO were the t-test, ANOVA, nonparametric analysis, correlation analysis, regression analysis, chi-square test, and factor analysis. The changes in statistical methods observed in the KJO were not significant $(\chi^2=17.4,\;p=0.5881)$, but the changes observed in the AJODO were significant $(\chi^2=42.4,\;p=0.0397)$. Some of the studies examined had overlooked the assumptions of the statistical methods employed. Data investigation, such as checking for outliers, should be performed before analysis, and alternative statistical approaches should be applied for small sample sizes. The advanced statistical methods were factor analysis and discriminant analysis in the KJO, and Intention-To-Treat (ITT) analysis in multi-center clinical trials, survival analysis, and Generalized Estimating Equations (GEE) in the AJODO. Appropriate analysis approaches and interpretations should be applied to the correlated and repeated measurements of orthodontic data sets.

Regional Frequency Analysis for Future Precipitation from RCP Scenarios (대표농도경로 시나리오에 의한 미래 강수량의 지역빈도해석)

  • Kim, Duck Hwan;Hong, Seung Jin;Choi, Chang Hyun;Han, Dae Gun;Lee, So Jong;Kim, Hung Soo
    • Journal of Wetlands Research / v.17 no.1 / pp.80-90 / 2015
  • The variability of precipitation patterns and intensity is increasing due to climate change and to urbanization and industrialization, which increase impervious area. Therefore, more severe urban inundation and flood damage will be caused by localized heavy precipitation events in the future. In this study, we analyze future frequency-based precipitation under climate change using regional frequency analysis. Observed precipitation data from 58 stations provided by the Korea Meteorological Administration (KMA) are collected, with a data period of more than 30 years. The frequency-based precipitation for the observed data is then estimated by regional frequency analysis. To remove the bias from the precipitation simulated under the RCP scenarios, the quantile mapping method and an outlier test are used. Regional frequency analysis using the L-moment method (Hosking and Wallis, 1997) is performed, and the future frequency-based precipitation for return periods of 80, 100, and 200 years is estimated. As a result, future frequency-based precipitation in South Korea will increase by 25 to 27 percent. In particular, the result for Jeju Island shows that the rate of increase will be higher than in other areas. Severe heavy precipitation could occur more and more frequently in the future due to climate change, and the runoff characteristics will also be changed by urbanization, industrialization, and climate change. Therefore, we need to prepare flood prevention measures for our flood safety in the future.
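
The quantile mapping bias correction named in this abstract can be illustrated with a minimal empirical sketch. A simulated value is located within the distribution of the model's historical output, and the observed value at the same quantile is returned. The function names, the linear interpolation, and the clipping outside the historical range are assumptions, since the abstract does not describe the exact variant used.

```python
from bisect import bisect_right


def empirical_cdf(sorted_values, x):
    """Quantile position of x within an ascending sample, clipped to [0, 1]."""
    n = len(sorted_values)
    if x <= sorted_values[0]:
        return 0.0
    if x >= sorted_values[-1]:
        return 1.0
    i = bisect_right(sorted_values, x)  # sorted_values[i - 1] <= x < sorted_values[i]
    left, right = sorted_values[i - 1], sorted_values[i]
    return (i - 1 + (x - left) / (right - left)) / (n - 1)


def empirical_quantile(sorted_values, p):
    """Linearly interpolated quantile of an ascending sample for 0 <= p <= 1."""
    pos = p * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def quantile_map(x, model_hist, observed):
    """Bias-correct a simulated value x via empirical quantile mapping."""
    p = empirical_cdf(sorted(model_hist), x)
    return empirical_quantile(sorted(observed), p)


if __name__ == "__main__":
    # Hypothetical daily precipitation samples (mm); the model runs twice as wet.
    observed = [10.0, 20.0, 30.0, 40.0, 50.0]
    model_hist = [20.0, 40.0, 60.0, 80.0, 100.0]
    print(quantile_map(50.0, model_hist, observed))
```

Because values beyond the historical model range are clipped to the observed extremes in this sketch, a study of future extremes would need some form of tail extrapolation, which is outside the scope of this example.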