• Title/Summary/Keyword: Outlier Detection

The Assessing Comparative Study for Statistical Process Control of Software Reliability Model Based on polynomial hazard function (다항 위험함수에 근거한 NHPP 소프트웨어 신뢰모형에 관한 통계적 공정관리 접근방법 비교연구)

  • Kim, Hee-Cheul;Shin, Hyun-Cheul
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.8 no.5 / pp.345-353 / 2015
  • Many software reliability models are based on the times at which errors occur during software debugging. Parameter inference is shown to be possible for software reliability models based on the finite failure model and non-homogeneous Poisson processes (NHPP). For someone deciding whether to market software, the conditional failure rate is an important variable. Finite failure models are used in a wide variety of practical situations; many studies document their use in characterization problems, outlier detection, linear estimation, system reliability studies, life testing, survival analysis, data compression, and many other fields. Statistical process control (SPC) can monitor the forecasting of software failures and thereby contribute significantly to improving software reliability. Control charts are widely used for software process control in the software industry. This paper proposes a control mechanism based on NHPP that uses the mean value function of a polynomial hazard function.

CUSUM Chart Applied to Monitoring Areal Population Mobility (누적합 관리도를 활용한 생활인구 이상치 탐색)

  • Kim, Hyoung Jun;Sohn, So Young
    • Journal of Korean Society for Quality Management / v.48 no.2 / pp.241-256 / 2020
  • Purpose: Certain places in Seoul, such as Shinchon, Hongdae, and Gangnam, often suffer from sudden surges in the mobile population, which can cause serious safety problems. This study suggests applying a spatial CUSUM control chart to monitor the areal population mobility data recently made available by the Seoul metropolitan government. Methods: Monitoring a series of standardized local Moran's I values makes it possible to detect spatio-temporal out-of-control states based on the accumulation of past patterns. We also visualize these patterns on a map for more intuitive comprehension of the phenomenon. As a case study, we analyzed the mobile female population aged 25 to 29 observed in 51 Jipgyegu near Hongik University on Fridays from January 2017 to June 2018. The detected patterns were validated against related articles and through on-site investigation. Results: The analysis helps determine whether a change in the mobile population is a short-term change caused by a particular incident or a long-term change caused by spatial alteration, which allows a strategic approach to building a response system. A specific case in the popular downtown area near Hongik University showed that newly opened hotels, global sports brand shops, and franchise bookstores attracted the young female population. Conclusion: We expect the results of this study to contribute to planning an effective distribution of administrative resources in preparation for drastic increases in the floating population. The approach can also be useful for commercial area analysis and for companies' age- and gender-specific marketing strategies.
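The accumulation idea behind a CUSUM chart can be illustrated with a minimal sketch. It assumes the monitored statistic has already been standardized (as the abstract says of local Moran's I); the function name `tabular_cusum`, the reference value `k`, and the decision interval `h` are illustrative textbook choices, not values taken from the study, and the spatial part of the method is not reproduced here.

```python
import numpy as np

def tabular_cusum(z, k=0.5, h=5.0):
    """Two-sided tabular CUSUM on an already standardized series z.

    k (reference value) and h (decision interval) are conventional
    defaults chosen for illustration, not values from the study.
    """
    z = np.asarray(z, dtype=float)
    c_plus = np.zeros(len(z))   # accumulated upward deviations
    c_minus = np.zeros(len(z))  # accumulated downward deviations
    hi = lo = 0.0
    for i, value in enumerate(z):
        hi = max(0.0, hi + value - k)
        lo = max(0.0, lo - value - k)
        c_plus[i], c_minus[i] = hi, lo
    # Indices where either cumulative sum exceeds the decision interval
    signals = np.flatnonzero((c_plus > h) | (c_minus > h))
    return c_plus, c_minus, signals

# Hypothetical series: 20 in-control periods, then a sustained upward shift
z = np.concatenate([np.zeros(20), np.full(10, 2.0)])
c_plus, c_minus, signals = tabular_cusum(z)
print(signals[0])  # first period flagged as out of control
```

Because the chart accumulates small deviations over time, it tends to react to sustained shifts that a single-period threshold would miss, which matches the short-term versus long-term distinction drawn in the abstract.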

Prediction of Chemical Composition in Distillers Dried Grain with Solubles and Corn Using Real-Time Near-Infrared Reflectance Spectroscopy

  • Choi, Sung Won;Park, Chang Hee;Lee, Chang Sug;Kim, Dong Hee;Park, Sung Kwon;Kim, Beob Gyun;Moon, Sang Ho
    • Journal of The Korean Society of Grassland and Forage Science / v.33 no.3 / pp.177-184 / 2013
  • This work assessed near-infrared reflectance spectroscopy (NIRS) as a technique for analyzing the nutritional constituents of distillers dried grains with solubles (DDGS) and corn quickly and accurately. Spectra were collected, and calibration equations derived and analyzed, with an NIRS instrument based on an indium gallium arsenide array detector rather than a scanning system, because such equipment is better suited to field application. Partial Least Squares (PLS) was used to derive the calibration equations, and various mathematical transformations were applied to improve accuracy. A multivariate outlier detection method was applied when deriving the calibration equations, and the way the calibration set was structured significantly affected prediction accuracy. Prediction of the nutritional constituents of DDGS gave the following results: moisture ($R^2$=0.80), crude protein ($R^2$=0.71), crude fat ($R^2$=0.80), crude fiber ($R^2$=0.32), and crude ash ($R^2$=0.72). All constituents except crude fiber showed good results. Prediction of the nutritional constituents of corn gave the following results: moisture ($R^2$=0.79), crude protein ($R^2$=0.61), crude fat ($R^2$=0.79), crude fiber ($R^2$=0.63), and crude ash ($R^2$=0.75). Therefore, the chemical composition of DDGS and corn could be predicted by NIRS for all constituents except crude fat and crude fiber.

The Assessing Comparative Study for Statistical Process Control of Software Reliability Model Based on Musa-Okumo and Power-law Type (Musa-Okumoto와 Power-law형 NHPP 소프트웨어 신뢰모형에 관한 통계적 공정관리 접근방법 비교연구)

  • Kim, Hee-Cheul
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.8 no.6 / pp.483-490 / 2015
  • Many software reliability models are based on the times at which errors occur during software debugging. Likelihood inference is shown to be possible for software reliability models based on the finite failure model and non-homogeneous Poisson processes (NHPP). For someone deciding when to market software, the conditional failure rate is an important variable. Infinite failure models are used in a wide variety of practical situations; many studies document their use in characterization problems, outlier detection, linear estimation, system reliability studies, life testing, survival analysis, data compression, and many other fields. Statistical process control (SPC) can monitor the forecasting of software failures and thereby contribute significantly to improving software reliability. Control charts are widely used for software process control in the software industry. This paper proposes a control mechanism based on NHPP that uses the mean value functions of the Musa-Okumoto and power-law type models.
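For orientation, the two mean value functions named in this abstract are usually written as below. This is the common textbook parameterization, given here as an assumption: the symbols $\lambda_0$, $\theta$, $\alpha$, and $\beta$ are not taken from the paper, whose notation may differ.

```latex
% Musa-Okumoto (logarithmic Poisson) model: mean value function and intensity
m_{\mathrm{MO}}(t) = \frac{1}{\theta}\,\ln\!\left(1 + \lambda_0 \theta t\right),
\qquad
\lambda_{\mathrm{MO}}(t) = \frac{d m_{\mathrm{MO}}(t)}{dt} = \frac{\lambda_0}{1 + \lambda_0 \theta t}

% Power-law model: mean value function and intensity
m_{\mathrm{PL}}(t) = \alpha\, t^{\beta},
\qquad
\lambda_{\mathrm{PL}}(t) = \frac{d m_{\mathrm{PL}}(t)}{dt} = \alpha \beta\, t^{\beta - 1}
```

Both mean value functions grow without bound as $t \to \infty$, which is why these models are classed as infinite failure models.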

Time Series Modeling Pipeline for Urban Behavioral Demand Prediction under Uncertainty (COVID-19 사례를 통한 도시 내 비정상적 수요 예측을 위한 시계열 모형 파이프라인 개발 연구)

  • Minsoo Jin;Dongwoo Lee;Youngrok Kim;Hyunsoo Lee
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.22 no.2 / pp.80-92 / 2023
  • As cities become more densely populated, previously unexpected events such as crimes, accidents, and infectious diseases are bound to affect user demand. Time-series demand prediction based on uncertain information cannot yield reliable results. In particular, the COVID-19 outbreak in early 2020 caused abnormal changes in travel patterns and made time-series demand prediction difficult. A methodology that predicts demand accurately by detecting and reflecting these changes is therefore required. The current study suggests a time series modeling pipeline that automatically detects abnormal events caused by COVID-19 and predicts demand accordingly. We expect it to be widely applicable in situations where demand changes because of irregular and abnormal events.

Development of a Framework for Improvement of Sensor Data Quality from Weather Buoys (해양기상부표의 센서 데이터 품질 향상을 위한 프레임워크 개발)

  • Ju-Yong Lee;Jae-Young Lee;Jiwoo Lee;Sangmun Shin;Jun-hyuk Jang;Jun-Hee Han
    • Journal of Korean Society of Industrial and Systems Engineering / v.46 no.3 / pp.186-197 / 2023
  • In this study, we focus on improving the quality of data transmitted from a weather buoy that guides ship routes. The buoy carries Internet of Things (IoT) equipment, including sensors that collect meteorological data and the buoy's status, and a wireless communication device that sends the data to the central database in a ground control center and to nearby ships. The time interval of the data collected by the sensors is irregular, and fault data is often detected. This study therefore provides a framework for improving data quality using machine learning models. The models are trained on the normal data pattern, and the trained models detect fault data in the data set collected from the sensors and adjust it. To determine fault data, the interquartile range (IQR) rule removes values outside the outlier bounds, and an NGBoost algorithm removes data above the upper bound and below the lower bound. The removed data is interpolated using the NGBoost or long short-term memory (LSTM) algorithm. The suggested process is evaluated on actual weather buoy data from Korea, improving the quality of the 'AIR_TEMPERATURE' data by using other data from the same buoy. Computational experiments on this real-world data validate the proposed framework and confirm its suitability for practical applications.
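The IQR step described above can be sketched as follows. This is a minimal illustration under stated assumptions: the multiplier `k=1.5` is the conventional default rather than a value from the study, the sample readings are hypothetical, and plain linear interpolation stands in for the NGBoost or LSTM interpolation that the study actually uses.

```python
import numpy as np

def iqr_bounds(values, k=1.5):
    """Lower and upper outlier bounds from the interquartile range."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def mask_and_interpolate(values, k=1.5):
    """Flag values outside the IQR bounds and fill them in.

    Linear interpolation is a stand-in (an assumption of this sketch)
    for the model-based interpolation described in the abstract.
    """
    values = np.asarray(values, dtype=float)
    lower, upper = iqr_bounds(values, k)
    is_fault = (values < lower) | (values > upper)
    idx = np.arange(len(values))
    cleaned = values.copy()
    cleaned[is_fault] = np.interp(idx[is_fault], idx[~is_fault], values[~is_fault])
    return cleaned, is_fault

# Hypothetical air temperature readings with two faulty values
temps = [20.1, 20.3, 20.2, 55.0, 20.4, 20.6, 20.5, -9.9, 20.7, 20.8]
cleaned, is_fault = mask_and_interpolate(temps)
print(np.flatnonzero(is_fault), cleaned)
```

The IQR rule needs no training and is robust to the faulty values themselves, which makes it a reasonable first filter before a learned model refines the bounds.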

A Study of the Application of Machine Learning Methods in the Low-GloSea6 Weather Prediction Solution (Low-GloSea6 기상 예측 소프트웨어의 머신러닝 기법 적용 연구)

  • Hye-Sung Park;Ye-Rin, Cho;Dae-Yeong Shin;Eun-Ok Yun;Sung-Wook Chung
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.16 no.5 / pp.307-314 / 2023
  • As supercomputing and hardware technology advance, climate prediction models are improving. The Korean Meteorological Administration adopted GloSea5 from the UK Met Office and now operates an updated GloSea6 tailored to Korean weather. Universities and research institutions use Low-GloSea6 on smaller servers, improving accessibility and research efficiency. In this paper, profiling Low-GloSea6 on smaller servers identified the tri_sor_dp_dp subroutine, in tri_sor.F90 of the atmospheric model, as a CPU-intensive hotspot. Applying linear regression, a type of machine learning, to this function showed promise. After outliers were removed, the linear regression model achieved an RMSE of 2.7665e-08 and an MAE of 1.4958e-08, outperforming the Lasso and ElasticNet regression methods. This suggests the potential of machine learning for optimizing hotspots identified during Low-GloSea6 execution.

Reduced Order Modeling of Marine Engine Status by Principal Component Analysis (주성분 분석을 통한 선박 기관 상태의 차수 축소 모델링)

  • Seungbeom Lee;Jeonghwa Seo;Dong-Hwan Kim;Sangmin Han;Kwanwoo Kim;Sungwook Chung;Byeongwoo Yoo
    • Journal of the Society of Naval Architects of Korea / v.61 no.1 / pp.8-18 / 2024
  • The present study concerns reduced order modeling of a marine diesel engine, which can be used for outlier detection in status monitoring and for carbon intensity index calculation. Principal Component Analysis (PCA) is introduced for the reduced order modeling, with a focus on the feasibility of detecting and treating non-linear variables. Cross-correlation shows that seven of the 23 data channels are non-linear, i.e., fuel mode, exhaust gas temperature after the turbocharger, and cylinder coolant temperatures. The dataset is handled so that the mean is located at the nominal continuous rating. A polynomial representation of the dataset is also applied to reflect the linearity between the engine speed and other channels. The first principal mode captures the strong linearity of most data channels, reflecting the linearity of the system. The non-linear variables are effectively explained by the other modes. The second mode concerns the temperature of the cylinder cooling water, which shows little correlation with other variables. The third and fourth modes correlate the fuel mode and the turbocharger exhaust gas temperature, which are less linear than the other channels. PCA is shown to be applicable to the binary data of fuel mode selection as well as to numerical data.
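One common way to use a PCA reduced order model for outlier detection is to flag samples that the retained modes reconstruct poorly. The sketch below illustrates that general idea only: the reconstruction-error criterion, the function name `pca_reconstruction_error`, and the synthetic three-channel data are assumptions, and the sketch centers the data on the sample mean rather than on the nominal continuous rating used in the study.

```python
import numpy as np

def pca_reconstruction_error(X, n_modes):
    """Squared reconstruction error of each sample under a truncated PCA.

    Channels are standardized first; every channel is assumed to have
    non-zero variance.
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Rows of vt are the principal modes, ordered by explained variance
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    modes = vt[:n_modes]
    scores = Z @ modes.T           # coordinates in the reduced space
    residual = Z - scores @ modes  # part not explained by retained modes
    return np.sum(residual**2, axis=1)

# Synthetic data: two channels that follow a hypothetical engine speed linearly
rng = np.random.default_rng(0)
speed = rng.uniform(0.4, 1.0, size=200)
X = np.column_stack([
    speed,
    2.0 * speed + rng.normal(0.0, 0.01, size=200),
    -1.5 * speed + rng.normal(0.0, 0.01, size=200),
])
X[50] = [0.7, 0.2, -1.05]  # one sample that breaks the linear relation

err = pca_reconstruction_error(X, n_modes=1)
print(int(np.argmax(err)))  # index of the worst-reconstructed sample
```

Samples that are consistent with the dominant linear mode have a near-zero error, so a large error marks a reading that the reduced order model cannot explain.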

Pupil Data Measurement and Social Emotion Inference Technology by using Smart Glasses (스마트 글래스를 활용한 동공 데이터 수집과 사회 감성 추정 기술)

  • Lee, Dong Won;Mun, Sungchul;Park, Sangin;Kim, Hwan-jin;Whang, Mincheol
    • Journal of Broadcast Engineering / v.25 no.6 / pp.973-979 / 2020
  • This study aims to determine the social emotion of empathy objectively and quantitatively by collecting pupillary responses. Fifty-two subjects (26 men and 26 women) voluntarily participated in the experiment. After a 30-second reference measurement, the experiment was divided into an imitation task and a spontaneous self-expression task. Pairs of subjects interacted through facial expressions while their pupil images were recorded. The pupil data was processed with binarization and a circular edge detection algorithm, and an outlier detection and removal technique was used to reject eye blinks. The statistical significance of pupil size by empathy condition was assessed with a test of normality and an independent-samples t-test. The analysis showed that pupil size differed significantly between the empathy (M ± SD = 0.050 ± 1.817) and non-empathy (M ± SD = 1.659 ± 1.514) conditions (t(92) = -4.629, p < 0.001). A rule for inferring empathy from pupil size was defined through discriminant analysis and verified on 12 new subjects (6 men and 6 women, mean age ± SD = 22.84 ± 1.57 years; estimation accuracy: 75%). The method proposed in this study is a non-contact camera technology and is expected to be used in various virtual reality applications with smart glasses.

Body Temperature Monitoring Using Subcutaneously Implanted Thermo-loggers from Holstein Steers

  • Lee, Y.;Bok, J.D.;Lee, H.J.;Lee, H.G.;Kim, D.;Lee, I.;Kang, S.K.;Choi, Y.J.
    • Asian-Australasian Journal of Animal Sciences / v.29 no.2 / pp.299-306 / 2016
  • Body temperature (BT) monitoring in cattle could be used for early detection of fever caused by infectious disease or physiological events. BT has been measured in various ways at different locations on cattle, including the rectum, reticulum, milk, subcutis, and ear canal. In order to evaluate the stability and reliability of subcutaneous temperature (ST) for continuous BT monitoring in highly fluctuating field conditions, long-term ST profiles were collected from cattle with surgically implanted thermo-logger devices in the autumn/winter and summer seasons and analyzed. The purposes of this study were to assess ST under field conditions as a reference BT and to determine any effect of implantation location on the ST profile. The ST profile in cattle showed a clear circadian rhythm, with the daily minimum between 05:00 and 07:00 and the maximum around midnight, and rather stable temperature readings (mean ± standard deviation [SD], 37.1°C to 37.36°C ± 0.91°C to 1.02°C). STs were 1.39°C to 1.65°C lower than the rectal temperature and sometimes showed an irregular drop below the normal physiological range: 19.4% or 36.4% of 54,192 readings were below 36.5°C or 37°C, respectively. Thus, for BT monitoring in a fever alarm system, a correction algorithm is necessary to remove the influences of ambient temperature and animal resting behavior, especially in winter. One way to do this is simply to discard outlier readings below 36.5°C or 37°C, which results in a much improved mean ± SD of 37.6°C ± 0.64°C or 37.8°C ± 0.55°C, respectively. As for location, the upper scapula region seems the most reliable and convenient site for implanting a thermo-sensor tag, given its relatively low sensitivity to ambient temperature and its ease of insertion compared with the lower scapula or lateral neck.
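The simple correction mentioned above, discarding readings below a fixed cutoff and recomputing the summary statistics, can be sketched as follows. The function name `summarize_after_cutoff` and the sample readings are hypothetical; only the 36.5°C cutoff comes from the abstract.

```python
import numpy as np

def summarize_after_cutoff(readings, cutoff):
    """Drop readings below `cutoff` and summarize what remains.

    Returns the mean, the sample standard deviation, and the share of
    readings that were discarded.
    """
    readings = np.asarray(readings, dtype=float)
    kept = readings[readings >= cutoff]
    discarded_share = 1.0 - kept.size / readings.size
    return kept.mean(), kept.std(ddof=1), discarded_share

# Hypothetical subcutaneous temperature readings in degrees Celsius
readings = [37.4, 37.6, 35.1, 37.8, 36.2, 37.5, 38.0, 37.7]
mean, sd, share = summarize_after_cutoff(readings, cutoff=36.5)
print(f"mean={mean:.2f}, sd={sd:.2f}, discarded={share:.0%}")
```

A fixed cutoff is easy to apply in a field device, but it removes every low reading regardless of cause, so it suits fever alarms better than analyses that need the low-temperature episodes themselves.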