• Title/Summary/Keyword: Outlier Detection and Prediction

Search Result 20, Processing Time 0.026 seconds

Online condition assessment of high-speed trains based on Bayesian forecasting approach and time series analysis

  • Zhang, Lin-Hao;Wang, You-Wu;Ni, Yi-Qing;Lai, Siu-Kai
    • Smart Structures and Systems
    • /
    • v.21 no.5
    • /
    • pp.705-713
    • /
    • 2018
  • High-speed rail (HSR) has been in operation and development in many countries worldwide. The explosive growth of HSR has posed great challenges for operation safety and ride comfort. Among various technological demands on high-speed trains, vibration is an inevitable problem caused by rail/wheel imperfections, vehicle dynamics, and aerodynamic instability. Ride comfort is a key factor in evaluating the operational performance of high-speed trains. In this study, online monitoring data have been acquired from an in-service high-speed train for condition assessment. The measured dynamic response signals at the floor level of a train cabin are processed by the Sperling operator, in which the ride comfort index sequence is used to identify the train's operation condition. In addition, a novel technique that incorporates salient features of Bayesian inference and time series analysis is proposed for outlier detection and change detection. The Bayesian forecasting approach enables the prediction of conditional probabilities. By integrating the Bayesian forecasting approach with time series analysis, one-step forecasting probability density functions (PDFs) can be obtained before proceeding to the next observation. The change detection is conducted by comparing the current model and the alternative model (whose mean value is shifted by a prescribed offset) to determine which one can well fit the actual observation. When the comparison results indicate that the alternative model performs better, then a potential change is detected. If the current observation is a potential outlier or change, Bayes factor and cumulative Bayes factor are derived for further identification. A significant change, if identified, implies that there is a great alteration in the train operation performance due to defects. In this study, two illustrative cases are provided to demonstrate the performance of the proposed method for condition assessment of high-speed trains.

Design and Analysis of Lightweight Trust Mechanism for Accessing Data in MANETs

  • Kumar, Adarsh;Gopal, Krishna;Aggarwal, Alok
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.8 no.3
    • /
    • pp.1119-1143
    • /
    • 2014
  • Lightweight trust mechanism with lightweight cryptographic primitives has emerged as an important mechanism in resource constraint wireless sensor based mobile devices. In this work, outlier detection in lightweight Mobile Ad-hoc NETworks (MANETs) is extended to create the space of reliable trust cycle with anomaly detection mechanism and minimum energy losses [1]. Further, system is tested against outliers through detection ratios and anomaly scores before incorporating virtual programmable nodes to increase the efficiency. Security in proposed system is verified through ProVerif automated toolkit and mathematical analysis shows that it is strong against bad mouthing and on-off attacks. Performance of proposed technique is analyzed over different MANET routing protocols with variations in number of nodes and it is observed that system provide good amount of throughput with maximum of 20% increase in delay on increase of maximum of 100 nodes. System is reflecting good amount of scalability, optimization of resources and security. Lightweight modeling and policy analysis with lightweight cryptographic primitives shows that the intruders can be detection in few milliseconds without any conflicts in access rights.

Relevancy contemplation in medical data analytics and ranking of feature selection algorithms

  • P. Antony Seba;J. V. Bibal Benifa
    • ETRI Journal
    • /
    • v.45 no.3
    • /
    • pp.448-461
    • /
    • 2023
  • This article performs a detailed data scrutiny on a chronic kidney disease (CKD) dataset to select efficient instances and relevant features. Data relevancy is investigated using feature extraction, hybrid outlier detection, and handling of missing values. Data instances that do not influence the target are removed using data envelopment analysis to enable reduction of rows. Column reduction is achieved by ranking the attributes through feature selection methodologies, namely, extra-trees classifier, recursive feature elimination, chi-squared test, analysis of variance, and mutual information. These methodologies are ranked via Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS) using weight optimization to identify the optimal features for model building from the CKD dataset to facilitate better prediction while diagnosing the severity of the disease. An efficient hybrid ensemble and novel similarity-based classifiers are built using the pruned dataset, and the results are thereafter compared with random forest, AdaBoost, naive Bayes, k-nearest neighbors, and support vector machines. The hybrid ensemble classifier yields a better prediction accuracy of 98.31% for the features selected by extra tree classifier (ETC), which is ranked as the best by TOPSIS.

Statistical Analysis and Prediction for Behaviors of Tracked Vehicle Traveling on Soft Soil Using Response Surface Methodology (반응표면법에 의한 연약지반 차량 거동의 통계적 분석 및 예측)

  • Lee Tae-Hee;Jung Jae-Jun;Hong Sup;Km Hyung-Woo;Choi Jong-Su
    • Journal of Ocean Engineering and Technology
    • /
    • v.20 no.3 s.70
    • /
    • pp.54-60
    • /
    • 2006
  • For optimal design of a deep-sea ocean mining collector system, based on self-propelled mining vehicle, it is imperative to develop and validate the dynamic model of a tracked vehicle traveling on soft deep seabed. The purpose of this paper is to evaluate the fidelity of the dynamic simulation model by means of response surface methodology. Various statistical techniques related to response surface methodology, such as outlier analysis, detection of interaction effect, analysis of variance, inference of the significance of design variables, and global sensitivity analysis, are examined. To obtain a plausible response surface model, maximum entropy sampling is adopted. From statistical analysis and prediction for dynamic responses of the tracked vehicle, conclusions will be drawn about the accuracy of the dynamic model and the performance of the response surface model.

Time Series Modeling Pipeline for Urban Behavioral Demand Prediction under Uncertainty (COVID-19 사례를 통한 도시 내 비정상적 수요 예측을 위한 시계열 모형 파이프라인 개발 연구)

  • Minsoo Jin;Dongwoo Lee;Youngrok Kim;Hyunsoo Lee
    • The Journal of The Korea Institute of Intelligent Transport Systems
    • /
    • v.22 no.2
    • /
    • pp.80-92
    • /
    • 2023
  • As cities are becoming densely populated, previously unexpected events such as crimes, accidents, and infectious diseases are bound to affect user demands. With a time-series prediction of demand using information with uncertainty, it is impossible to derive reliable results. In particular, the COVID-19 outbreak in early 2020 caused changes in abnormal travel patterns and made it difficult to predict demand for time series. A methodology that accurately predicts demand by detecting and reflecting these changes is, therefore, required. The current study suggests a time series modeling pipeline that automatically detects and predicts abnormal events caused by COVID-19. We expect its wide application in various situations where there is a change in demand due to irregular and abnormal events.

Design of Anomaly Detection System Based on Big Data in Internet of Things (빅데이터 기반의 IoT 이상 장애 탐지 시스템 설계)

  • Na, Sung Il;Kim, Hyoung Joong
    • Journal of Digital Contents Society
    • /
    • v.19 no.2
    • /
    • pp.377-383
    • /
    • 2018
  • Internet of Things (IoT) is producing various data as the smart environment comes. The IoT data collection is used as important data to judge systems's status. Therefore, it is important to monitor the anomaly state of the sensor in real-time and to detect anomaly data. However, it is necessary to convert the IoT data into a normalized data structure for anomaly detection because of the variety of data structures and protocols. Thus, we can expect a good quality effect such as accurate analysis data quality and service quality. In this paper, we propose an anomaly detection system based on big data from collected sensor data. The proposed system is applied to ensure anomaly detection and keep data quality. In addition, we applied the machine learning model of support vector machine using anomaly detection based on time-series data. As a result, machine learning using preprocessed data was able to accurately detect and predict anomaly.

Prediction of Chemical Composition in Distillers Dried Grain with Solubles and Corn Using Real-Time Near-Infrared Reflectance Spectroscopy

  • Choi, Sung Won;Park, Chang Hee;Lee, Chang Sug;Kim, Dong Hee;Park, Sung Kwon;Kim, Beob Gyun;Moon, Sang Ho
    • Journal of The Korean Society of Grassland and Forage Science
    • /
    • v.33 no.3
    • /
    • pp.177-184
    • /
    • 2013
  • This work was conducted to assess the use of Near-infrared reflectance spectroscopy (NIRS) as a technique to analyze nutritional constituents of Distillers dried grain with solubles (DDGS) and corn quickly and accurately, and to apply an NIRS-based indium gallium arsenide array detector, rather than a NIRS-based scanning system, to collect spectra and induce and analyze calibration equations using equipment which is better suited to field application. As a technique to induce calibration equations, Partial Least Squares (PLS) was used, and for better accuracy, various mathematical transformations were applied. A multivariate outlier detection method was applied to induce calibration equations, and, as a result, the way of structuring a calibration set significantly affected prediction accuracy. The prediction of nutritional constituents of distillers dried grains with solubles resulted in the following: moisture ($R^2$=0.80), crude protein ($R^2$=0.71), crude fat ($R^2$=0.80), crude fiber ($R^2$=0.32), and crude ash ($R^2$=0.72). All constituents except crude fiber showed good results. The prediction of nutritional constituents of corn resulted in the following: moisture ($R^2$=0.79), crude protein ($R^2$=0.61), crude fat ($R^2$=0.79), crude fiber ($R^2$=0.63), and crude ash ($R^2$=0.75). Therefore, all constituents except for crude fat and crude fiber were predicted for their chemical composition of DDGS and corn through Near-infrared reflectance spectroscopy.

A Comparative Study on Similarity Measure Techniques for Cross-Project Defect Prediction (교차 프로젝트 결함 예측을 위한 유사도 측정 기법 비교 연구)

  • Ryu, Duksan;Baik, Jongmoon
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.7 no.6
    • /
    • pp.205-220
    • /
    • 2018
  • Software defect prediction is helpful for allocating valuable project resources effectively for software quality assurance activities thanks to focusing on the identified fault-prone modules. If historical data collected within a company is sufficient, a Within-Project Defect Prediction (WPDP) can be utilized for accurate fault-prone module prediction. In case a company does not maintain historical data, it may be helpful to build a classifier towards predicting comprehensible fault prediction based on Cross-Project Defect Prediction (CPDP). Since CPDP employs different project data collected from other organization to build a classifier, the main obstacle to build an accurate classifier is that distributions between source and target projects are not similar. To address the problem, because it is crucial to identify effective similarity measure techniques to obtain high performance for CPDP, In this paper, we aim to identify them. We compare various similarity measure techniques. The effectiveness of similarity weights calculated by those similarity measure techniques are evaluated. The results are verified using the statistical significance test and the effect size test. The results show k-Nearest Neighbor (k-NN), LOcal Correlation Integral (LOCI), and Range methods are the top three performers. The experimental results show that predictive performances using the three methods are comparable to those of WPDP.

A Study of the Application of Machine Learning Methods in the Low-GloSea6 Weather Prediction Solution (Low-GloSea6 기상 예측 소프트웨어의 머신러닝 기법 적용 연구)

  • Hye-Sung Park;Ye-Rin, Cho;Dae-Yeong Shin;Eun-Ok Yun;Sung-Wook Chung
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology
    • /
    • v.16 no.5
    • /
    • pp.307-314
    • /
    • 2023
  • As supercomputing and hardware technology advances, climate prediction models are improving. The Korean Meteorological Administration adopted GloSea5 from the UK Met Office and now operates an updated GloSea6 tailored to Korean weather. Universities and research institutions use Low-GloSea6 on smaller servers, improving accessibility and research efficiency. In this paper, profiling Low-GloSea6 on smaller servers identified the tri_sor_dp_dp subroutine in the tri_sor.F90 atmospheric model as a CPU-intensive hotspot. Applying linear regression, a type of machine learning, to this function showed promise. After removing outliers, the linear regression model achieved an RMSE of 2.7665e-08 and an MAE of 1.4958e-08, outperforming Lasso and ElasticNet regression methods. This suggests the potential for machine learning in optimizing identified hotspots during Low-GloSea6 execution.

The Proactive Threat Protection Method from Predicting Resignation Throughout DRM Log Analysis and Monitor (DRM 로그분석을 통한 퇴직 징후 탐지와 보안위협 사전 대응 방법)

  • Hyun, Miboon;Lee, Sangjin
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.26 no.2
    • /
    • pp.369-375
    • /
    • 2016
  • Most companies are willing to spend money on security systems such as DRM, Mail filtering, DLP, USB blocking, etc., for data leakage prevention. However, in many cases, it is difficult that legal team take action for data case because usually the company recognized that after the employee had left. Therefore perceiving one's resignation before the action and building up adequate response process are very important. Throughout analyzing DRM log which records every single file's changes related with user's behavior, the company can predict one's resignation and prevent data leakage before those happen. This study suggests how to prevent for the damage from leaked confidential information throughout building the DRM monitoring process which can predict employee's resignation.