• Title/Summary/Keyword: AIC(Akaike Information Criterion)

Threshold Estimation of Generalized Pareto Distribution Based on Akaike Information Criterion for Accurate Reliability Analysis (정확한 신뢰성 해석을 위한 아카이케 정보척도 기반 일반화파레토 분포의 임계점 추정)

  • Kang, Seunghoon; Lim, Woochul; Cho, Su-Gil; Park, Sanghyun; Lee, Minuk; Choi, Jong-Su; Hong, Sup; Lee, Tae Hee
    • Transactions of the Korean Society of Mechanical Engineers A / v.39 no.2 / pp.163-168 / 2015
  • In order to perform estimations with high reliability, it is necessary to deal with the tail part of the cumulative distribution function (CDF) in greater detail compared to an overall CDF. The use of a generalized Pareto distribution (GPD) to model the tail part of a CDF is receiving more research attention with the goal of performing estimations with high reliability. Current studies on GPDs focus on ways to determine the appropriate number of sample points and their parameters. However, even if a proper estimation is made, it can be inaccurate as a result of an incorrect threshold value. Therefore, in this paper, a GPD based on the Akaike information criterion (AIC) is proposed to improve the accuracy of the tail model. The proposed method determines an accurate threshold value using the AIC with the overall samples before estimating the GPD over the threshold. To validate the accuracy of the method, its reliability is compared with that obtained using a general GPD model with an empirical CDF.

Minimum Message Length and Classical Methods for Model Selection in Univariate Polynomial Regression

  • Viswanathan, Murlikrishna; Yang, Young-Kyu; WhangBo, Taeg-Keun
    • ETRI Journal / v.27 no.6 / pp.747-758 / 2005
  • The problem of selection among competing models has been a fundamental issue in statistical data analysis. Good fits to data can be misleading since they can result from properties of the model that have nothing to do with it being a close approximation to the source distribution of interest (for example, overfitting). In this study we focus on the preference among models from a family of polynomial regressors. Three decades of research have spawned a number of plausible techniques for the selection of models, namely, Akaike's Final Prediction Error (FPE) and Information Criterion (AIC), Schwarz's criterion (SCH), Generalized Cross Validation (GCV), Wallace's Minimum Message Length (MML), Minimum Description Length (MDL), and Vapnik's Structural Risk Minimization (SRM). The fundamental similarity between all these principles is their attempt to define an appropriate balance between the complexity of models and their ability to explain the data. This paper presents an empirical study of the above principles in the context of model selection, where the models under consideration are univariate polynomials. The paper includes a detailed empirical evaluation of the model selection methods on six target functions, with varying sample sizes and added Gaussian noise. The results from the study appear to provide strong evidence in support of the MML- and SRM-based methods over the other standard approaches (FPE, AIC, SCH and GCV).
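As a rough illustration of one of the criteria compared in this study, the sketch below ranks polynomial degrees by AIC under a Gaussian-error assumption, using AIC = n ln(RSS/n) + 2k (correct up to an additive constant that is the same for every degree). The helper names (`solve`, `fit_poly`, `gaussian_aic`, `aic_by_degree`), the synthetic quadratic target, and the noise level are assumptions made for this example and are not taken from the paper.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_poly(xs, ys, degree):
    """Least-squares coefficients (lowest order first) from the normal equations."""
    k = degree + 1
    ata = [[sum(x ** (i + j) for x in xs) for j in range(k)] for i in range(k)]
    aty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(k)]
    return solve(ata, aty)


def gaussian_aic(xs, ys, coeffs):
    """AIC for a least-squares fit with Gaussian errors, up to a constant."""
    n = len(xs)
    rss = sum((y - sum(c * x ** i for i, c in enumerate(coeffs))) ** 2
              for x, y in zip(xs, ys))
    k = len(coeffs) + 1  # polynomial coefficients plus the noise variance
    return n * math.log(rss / n) + 2 * k


def aic_by_degree(xs, ys, max_degree):
    """AIC value for each polynomial degree 0..max_degree."""
    return [gaussian_aic(xs, ys, fit_poly(xs, ys, d)) for d in range(max_degree + 1)]


# Hypothetical data: a quadratic target with added Gaussian noise.
rng = random.Random(0)
xs = [-1 + 2 * i / 59 for i in range(60)]
ys = [1 - 2 * x + 3 * x * x + rng.gauss(0, 0.1) for x in xs]
aics = aic_by_degree(xs, ys, 5)
best_degree = min(range(len(aics)), key=aics.__getitem__)
print(best_degree, [round(a, 1) for a in aics])
```

The normal equations are adequate for low degrees on a small interval; for higher degrees an orthogonal basis or a QR-based solver would be the safer choice.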

A Machine Learning Univariate Time series Model for Forecasting COVID-19 Confirmed Cases: A Pilot Study in Botswana

  • Mphale, Ofaletse; Okike, Ezekiel U; Rafifing, Neo
    • International Journal of Computer Science & Network Security / v.22 no.1 / pp.225-233 / 2022
  • The recent outbreak of the coronavirus (COVID-19) infectious disease has made forecasting a critical cornerstone of most scientific studies. This study adopts a machine-learning-based time series model, the Auto Regressive Integrated Moving Average (ARIMA) model, to forecast COVID-19 confirmed cases in Botswana over a 60-day period. The findings show that COVID-19 confirmed cases in Botswana are rising steadily in a steep upward trend with random fluctuations. This trend can also be described effectively by an additive model when examined with the Seasonal Trend Decomposition method by Loess. To select the best-fit ARIMA model, a Grid Search Algorithm was developed in Python and used to optimize the Akaike Information Criterion (AIC) metric. The best-fit model was ARIMA (5, 1, 1), which had the lowest AIC score of 3885.091. The results showed that the ARIMA model can be useful in generating reliable and volatile forecasts that can be used to guide understanding of the future spread of infectious diseases or pandemics. Most significantly, the findings are expected to raise awareness among disease monitoring institutions and government regulatory bodies, where they can be used to support strategic health decisions and initiate policy improvement for better management of the COVID-19 pandemic.

A Study on Extraction of Useful Information from Big dataset of Multi-attributes - Focus on Single Household in Seoul - (다속성 빅데이터로부터 유용한 정보 추출에 관한 연구 - 서울시 1인 가구를 중심으로 -)

  • Choi, Jung-Min; Kim, Kun-Woo
    • Journal of the Korean housing association / v.25 no.4 / pp.59-72 / 2014
  • This study proposes a data-mining method for examining multi-attribute big data that applies Correspondence Analysis to variables obtained by AIC model selection, an approach considered more applicable to social science. The proposed method was applied to the Seoul Survey from 2005 to 2010 in order to extract interesting rules or patterns in the characteristics of single households. The results are as follows. First, the proposed method can be applied efficiently to a big dataset with a very large number of categorical multi-attribute variables. Second, the Seoul Survey analysis found that the more dissatisfied single households are with their residential environment, the higher their tendency toward residential mobility. Third, single households fall into three types based on their demographic characteristics, and the three types differ in their perception of home and in their counselling partners. Fourth, using a spatially aggregated dataset, the paper extracted eight significant variables that are highly correlated with the share of single households in the 25 municipalities of Seoul. Finally, it investigated the relation between the spatial distribution of single households and their demographic statistics based on the six groups obtained by Cluster Analysis.

Reliability-based Design Optimization on Mobility of Deep-seabed Test Miner Using Censored Data of Current Speed (중도절단 해류속도자료를 이용한 심해저 시험집광기의 주행성능에 관한 신뢰성 기반 최적설계)

  • Park, Sanghyun; Cho, Su-Gil; Lim, Woochul; Kim, Saekyeol; Choi, Sung Sik; Lee, Minuk; Choi, Jong-Su; Kim, Hyung-Woo; Lee, Chang-Ho; Hong, Sup; Lee, Tae Hee
    • Ocean and Polar Research / v.36 no.4 / pp.487-494 / 2014
  • A deep-seabed test miner, operated by a self-propelled mining system moving on soft soil, is an essential device for securing floating and towing performance. The performance of the tracked vehicle is seriously influenced by noise factors such as the shear strength of the seafloor, bottom current, seafloor slope, speed of the tracked vehicle, reaction forces of the flexible hose, steering ratio, etc. Because of the uncertainties related to these noise factors, it is difficult to design a deep-sea manganese nodule test miner that satisfies target reliabilities. Therefore, reliability-based design optimization (RBDO) is required to guarantee system reliability where uncertainties related to noise factors prevail. Among the noise factors, the bottom current, which follows a bimodal distribution, is censored because of the observation limit of the measurement devices. The estimated distribution of the bottom current is therefore inaccurate unless these characteristics are considered, and the result of RBDO cannot be guaranteed. In this paper, we define censored data as unknown values over the limit of observation. If the distribution of such data is estimated using the Akaike information criterion (AIC), which cannot account for the characteristics of censored data, the estimated distribution cannot guarantee accurate reliability. Therefore, a censored AIC that can account for these characteristics is used to estimate an accurate distribution of the bottom current. Finally, RBDO is performed on the mobility of a deep-sea manganese nodule test miner where uncertainties related to noise factors, combined with censored data, are present.
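One way to read an AIC that accounts for values above an observation limit is to compute AIC from a right-censored likelihood, in which each censored point contributes the probability of exceeding the limit rather than a density value. The abstract does not give the paper's formulation, so the sketch below is only an illustration under stated assumptions: it uses a single exponential distribution (not the bimodal distribution mentioned in the abstract) because its censored maximum-likelihood estimate has a closed form. The function names, the limit, and the synthetic data are all hypothetical.

```python
import math


def exp_censored_fit(values, limit):
    """Fit an exponential distribution to data right-censored at `limit`.

    Any recorded value >= limit is treated as censored at the limit.
    Log-likelihood: d * ln(rate) - rate * total, where d is the number of
    uncensored points and total sums every recorded value.
    """
    observed = [v for v in values if v < limit]
    d = len(observed)
    total = sum(observed) + limit * (len(values) - d)
    rate = d / total  # closed-form censored MLE
    loglik = d * math.log(rate) - rate * total
    return {"mean": 1 / rate, "loglik": loglik, "aic": 2 * 1 - 2 * loglik}


def naive_mean(values):
    """Mean estimate that ignores censoring (treats limit values as exact)."""
    return sum(values) / len(values)


# Hypothetical data: exponential quantiles with true mean 2.0, recorded with a limit of 3.0.
n, true_mean, limit = 200, 2.0, 3.0
exact = [-true_mean * math.log(1 - (i + 0.5) / n) for i in range(n)]
recorded = [min(x, limit) for x in exact]

fit = exp_censored_fit(recorded, limit)
print(round(fit["mean"], 3), round(naive_mean(recorded), 3), round(fit["aic"], 1))
```

An AIC computed this way is meant for comparing candidate distributions that are all fitted with the same censored likelihood; it is not comparable with an AIC computed from an uncensored likelihood on the same records.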

Reliability-based Design Optimization for Lower Control Arm using Limited Discrete Information (제한된 이산정보를 이용한 로어컨트롤암의 신뢰성 기반 최적설계)

  • Jang, Junyong; Na, Jongho; Lim, Woochul; Park, Sanghyun; Choi, Sungsik; Kim, Jungho; Kim, Yongsuk; Lee, Tae Hee
    • Transactions of the Korean Society of Automotive Engineers / v.22 no.2 / pp.100-106 / 2014
  • The lower control arm (LCA) is a chassis component in automobiles. Performance measures of the LCA such as stiffness, durability and permanent displacement must be considered in design optimization. However, it is hard to consider different performance measures at once in optimization because they are evaluated by different commercial tools such as Radioss, Abaqus, etc. In this paper, we first construct an integrated design automation system for the LCA based on Matlab, including Hypermesh, Radioss and Abaqus. Second, the Akaike information criterion (AIC) is used to assess the reliability of the LCA. It finds the best estimated distribution of performance from limited and discrete stochastic information and then obtains the reliability from that distribution. Finally, we consider the tolerances of the design variables and the variation of the elastic modulus, and achieve the target reliability by carrying out reliability-based design optimization (RBDO) with the integrated system.

Determination of the number of sinusoidal frequencies by a new singular value approach (특이값 접근방법에 의한 정현파의 수의 결정에 관한 연구)

  • Ahn, Tae-Chon; Ryu, Chang-Sun; Lee, Dong-Yoon; Whang, Keun-Chan
    • Proceedings of the KIEE Conference / 1989.11a / pp.467-469 / 1989
  • A new singular value approach is presented and analyzed in order to determine the number of multiple sinusoidal frequencies from finite noisy data. Simulations are conducted for Akaike's information criterion (AIC), Rissanen's shortest data description (MDL) and the new singular value approach, all within covariance-matrix-based methods, and their performances are then compared.
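For orientation, covariance-based AIC and MDL order detection is commonly written in terms of the eigenvalues of the sample covariance matrix, in the form usually attributed to Wax and Kailath. The sketch below assumes that form (derived for complex-valued snapshots); the abstract does not say whether the paper uses exactly these expressions. The function names and the example eigenvalues are hypothetical. Note that the selected order is the signal-subspace dimension, and a real sinusoid typically occupies two signal eigenvalues.

```python
import math


def order_criteria(eigenvalues, n_snapshots):
    """Return (aic, mdl) lists indexed by candidate order k = 0..p-1.

    For each k, the p - k smallest eigenvalues are treated as noise and the
    log ratio of their geometric mean to their arithmetic mean is penalized.
    """
    lam = sorted(eigenvalues, reverse=True)
    p = len(lam)
    aic, mdl = [], []
    for k in range(p):
        noise = lam[k:]
        m = p - k
        log_geo = sum(math.log(v) for v in noise) / m
        log_ratio = log_geo - math.log(sum(noise) / m)  # always <= 0
        free = k * (2 * p - k)  # free parameters for order k
        aic.append(-2 * n_snapshots * m * log_ratio + 2 * free)
        mdl.append(-n_snapshots * m * log_ratio + 0.5 * free * math.log(n_snapshots))
    return aic, mdl


def argmin(values):
    """Index of the smallest entry."""
    return min(range(len(values)), key=values.__getitem__)


# Hypothetical eigenvalues: two dominant values above a nearly flat noise floor.
eigs = [10.0, 8.0, 1.1, 1.0, 0.95, 0.9]
aic, mdl = order_criteria(eigs, n_snapshots=100)
print(argmin(aic), argmin(mdl))
```

Both criteria share the same likelihood term and differ only in the penalty, which is why MDL tends to pick smaller orders than AIC as the number of snapshots grows.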

Differences by Selection Method for Exposure Factor Input Distribution for Use in Probabilistic Consumer Exposure Assessment

  • Kang, Sohyun; Kim, Jinho; Lim, Miyoung; Lee, Kiyoung
    • Journal of Environmental Health Sciences / v.48 no.5 / pp.266-271 / 2022
  • Background: The selection of distributions of input parameters is an important component in probabilistic exposure assessment. Goodness-of-fit (GOF) methods are used to determine the distribution of exposure factors. However, there are no clear guidelines for choosing an appropriate GOF method. Objectives: The outcomes of probabilistic consumer exposure assessment were compared by using five different GOF methods for the selection of input distributions: chi-squared test, Kolmogorov-Smirnov test (K-S), Anderson-Darling test (A-D), Akaike information criterion (AIC) and Bayesian information criterion (BIC). Methods: Individual exposures were estimated based on product usage factor combinations from 10,000 respondents. The distribution of individual exposure was considered as the true value of population exposures. Results: Among the five GOF methods, probabilistic exposure distributions using the A-D and K-S methods were similar to individual exposure estimations. When the 95th percentiles of the probabilistic distributions were compared with the individual estimations for 10 CPs, the differences ranged from 0.73 to 1.92 times for the A-D method and from 0.73 to 1.60 times (excluding tire-shine spray) for the K-S method. Conclusions: Exposure assessment results differed significantly depending on the GOF method selected. Therefore, the GOF methods for probabilistic consumer exposure assessment should be carefully selected.
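To make the AIC and BIC selection step concrete, the sketch below scores three candidate input distributions by maximum likelihood and picks the one with the lowest criterion value, using AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L. The candidate set (normal, lognormal, exponential), the function names, and the synthetic sample are assumptions for illustration only; the abstract does not list the candidate distributions or the fitting software used in the study.

```python
import math
from statistics import NormalDist


def loglik_normal(xs):
    """Maximized normal log-likelihood and number of parameters."""
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n  # MLE variance (divides by n)
    return -0.5 * n * (math.log(2 * math.pi * var) + 1), 2


def loglik_lognormal(xs):
    """Normal log-likelihood of ln(x) plus the change-of-variable term."""
    logs = [math.log(x) for x in xs]
    ll, k = loglik_normal(logs)
    return ll - sum(logs), k


def loglik_exponential(xs):
    """Maximized exponential log-likelihood (rate = 1 / sample mean)."""
    n = len(xs)
    return -n * (math.log(sum(xs) / n) + 1), 1


def score_candidates(xs):
    """AIC and BIC for each candidate distribution."""
    n = len(xs)
    fits = {
        "normal": loglik_normal(xs),
        "lognormal": loglik_lognormal(xs),
        "exponential": loglik_exponential(xs),
    }
    return {
        name: {"aic": 2 * k - 2 * ll, "bic": k * math.log(n) - 2 * ll}
        for name, (ll, k) in fits.items()
    }


def best(scores, criterion):
    """Name of the candidate with the lowest value of the given criterion."""
    return min(scores, key=lambda name: scores[name][criterion])


# Hypothetical exposure-factor sample: evenly spaced lognormal quantiles.
n = 200
sample = [math.exp(1.0 + 0.5 * NormalDist().inv_cdf((i + 0.5) / n)) for i in range(n)]
scores = score_candidates(sample)
print(best(scores, "aic"), best(scores, "bic"))
```

Unlike the chi-squared, K-S and A-D tests, AIC and BIC rank candidates relative to one another and say nothing about whether even the best candidate fits well in absolute terms.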

Multiphasic Analysis of Growth Curve of Body Weight in Mice

  • Kurnianto, E.; Shinjo, A.; Suga, D.
    • Asian-Australasian Journal of Animal Sciences / v.12 no.3 / pp.331-335 / 1999
  • The present study describes the analysis of the multiphasic growth function (MGF) applied to body weight in laboratory and wild mice. Three genetic groups of laboratory mice (Mus musculus domesticus), designated CF#1, C3H/HeNCrj and C57BL/6NCrj, and a genetic group of Yonakuni wild mice (Mus musculus molossinus yonakuni, Yk) were used. Mean body weights of each genetic group-sex subclass from birth to 69 days of age, taken at 3-day intervals, were analyzed using monophasic, diphasic and triphasic functions to describe growth patterns. The three functions of the MGF were compared on the goodness-of-fit criteria: residual standard deviation (RSD), adjusted R-square (Adj $R^2$) and Akaike's information criterion (AIC). The results indicated that males were on average heavier than females. Among the four genetic groups within both sexes, CF#1 showed the highest body weight, followed by C3H/HeNCrj, C57BL/6NCrj and Yk. Comparison among the three functions revealed that the triphasic function was the best fit to the growth data, with the lowest RSD, the highest Adj $R^2$ and the lowest AIC, for the four genetic groups. For the triphasic function, RSD within each genetic group-sex subclass was similar for males and females. Adj $R^2$ was 0.999 for all genetic group-sex subclasses. AIC for laboratory mice males and females ranged from -70.48 to 66.50 and from -92.81 to -68.64, respectively, whereas for Yk wild mice it was -74.29 for males and -78.42 for females.

Comparison of Temperature Indexes for the Impact Assessment of Heat Stress on Heat-Related Mortality

  • Kim, Young-Min; Kim, So-Yeon; Cheong, Hae-Kwan; Kim, Eun-Hye
    • Environmental Analysis Health and Toxicology / v.26 / pp.9.1-9.9 / 2011
  • Objectives: In order to evaluate which temperature index is the best predictor for the health impact assessment of heat stress in Korea, several indexes were compared. Methods: We adopted temperature, perceived temperature (PT), and apparent temperature (AT) as heat stress indexes, and estimated changes in the risk of death for Seoul and Daegu per $1^{\circ}C$ increase in those indexes using a generalized additive model (GAM) adjusted for non-temperature-related factors: time trends, seasonality, and air pollution. The estimated excess mortality and Akaike's Information Criterion (AIC) due to the increased temperature indexes for the $75^{th}$ percentile in the summers from 2001 to 2008 were compared and analyzed to define the best predictor. Results: For Seoul, all-cause mortality presented the highest percent increase (2.99% [95% CI, 2.43 to 3.54%]) in maximum temperature, while AIC showed the lowest value when the all-cause daily death counts were fitted with the maximum PT for the $75^{th}$ percentile of summer. For Daegu, all-cause mortality presented the greatest percent increase (3.52% [95% CI, 2.23 to 4.80%]) in minimum temperature, and AIC showed the lowest value in maximum temperature. No lag effect was found in the association between temperature and mortality for Seoul, whereas a one-day lag effect was noted for Daegu. Conclusions: No single temperature measure was superior to the others in summer. To adopt an appropriate temperature index, regional meteorological characteristics and the disease status of the population should be considered.