• Title/Abstract/Keywords: Generalized Cross-Validation


Minimum Message Length and Classical Methods for Model Selection in Univariate Polynomial Regression

  • Viswanathan, Murlikrishna;Yang, Young-Kyu;WhangBo, Taeg-Keun
    • ETRI Journal, Vol. 27, No. 6, pp. 747-758, 2005
  • The problem of selection among competing models has been a fundamental issue in statistical data analysis. Good fits to data can be misleading since they can result from properties of the model that have nothing to do with it being a close approximation to the source distribution of interest (for example, overfitting). In this study we focus on the preference among models from a family of polynomial regressors. Three decades of research have spawned a number of plausible techniques for the selection of models, namely, Akaike's Final Prediction Error (FPE) and Information Criterion (AIC), Schwarz's criterion (SCH), Generalized Cross Validation (GCV), Wallace's Minimum Message Length (MML), Minimum Description Length (MDL), and Vapnik's Structural Risk Minimization (SRM). The fundamental similarity between all these principles is their attempt to define an appropriate balance between the complexity of models and their ability to explain the data. This paper presents an empirical study of the above principles in the context of model selection, where the models under consideration are univariate polynomials. The paper includes a detailed empirical evaluation of the model selection methods on six target functions, with varying sample sizes and added Gaussian noise. The results from the study appear to provide strong evidence in support of the MML- and SRM-based methods over the other standard approaches (FPE, AIC, SCH and GCV).
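As a rough illustration of how one of the criteria named in this abstract can rank polynomial models, the sketch below scores least-squares polynomial fits with the textbook GCV formula, GCV = (RSS/n) / (1 - p/n)^2, where p is the number of fitted coefficients. The function names `gcv_score` and `select_degree` are hypothetical, and this is not the evaluation code used in the paper.

```python
import numpy as np
from numpy.polynomial import polynomial as P


def gcv_score(x, y, degree):
    """GCV score of a least-squares polynomial fit (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    p = degree + 1  # fitted coefficients; equals the trace of the hat matrix
    if p >= n:
        # The fit would interpolate the data, so GCV is undefined; treat as worst.
        return float("inf")
    coeffs = P.polyfit(x, y, degree)
    residuals = y - P.polyval(x, coeffs)
    rss = float(np.sum(residuals ** 2))
    return (rss / n) / (1.0 - p / n) ** 2


def select_degree(x, y, max_degree):
    """Return the degree in 0..max_degree with the lowest GCV score, plus all scores."""
    scores = [gcv_score(x, y, d) for d in range(max_degree + 1)]
    return int(np.argmin(scores)), scores
```

Calling `select_degree(x, y, 8)` on noisy samples of a target function returns the chosen degree together with the per-degree scores, which makes it easy to see how the penalty term grows as p approaches n.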


Acoustic holography for an engine radiation noise using equivalent sources

  • 전인열;이정권
    • Korean Society for Noise and Vibration Engineering: Conference Proceedings, Proceedings of the KSNVE 2004 Autumn Conference, pp. 1101-1106, 2004
  • This study presents the reconstruction of the sound field radiated from an automotive engine using equivalent sources. The basic concept of the presented method is to replace the engine noise source with elementary multipole sources, e.g., monopoles and dipoles. The so-called Helmholtz equation least-squares (HELS) method can reconstruct the sound radiation fields from spherical geometries in a series expansion of spherical Hankel functions and spherical harmonics. In this paper, multi-point, multipole equivalent sources are employed to reconstruct the sound field radiated from an automotive engine with a fixed rotation speed. To ensure and improve the accuracy of reconstruction, spatial filters of multipole coefficients and wave-vectors are adopted to suppress the adverse effect of high-order multipoles. Optimal filter shapes are designed with regularization parameters that minimize the generalized cross validation (GCV) function between the actual and reproduced models. After the field pressures are regenerated with the proposed method as many times as necessary, the vibro-acoustic field of an engine can be reconstructed cost-effectively using the BEM-based near-field acoustic holography (NAH) technique.
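To make the idea of choosing a regularization parameter by minimizing a GCV function concrete, here is a minimal sketch for plain Tikhonov regularization of a linear system A x ≈ b, computed through the SVD. It assumes the standard form GCV(λ) = m·||(I - H_λ) b||² / tr(I - H_λ)², with strictly positive candidate values of λ. The function name `gcv_tikhonov` is hypothetical, and the multipole and wave-vector filters described in the abstract are not reproduced here.

```python
import numpy as np


def gcv_tikhonov(A, b, lambdas):
    """Pick a Tikhonov parameter by GCV (hypothetical helper; lambdas must be > 0)."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    m = A.shape[0]
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    beta = U.T @ b  # coordinates of b along the left singular vectors
    # Energy of b outside the span of U; no choice of lambda can reduce it.
    outside = max(float(b @ b - beta @ beta), 0.0)

    scores = []
    for lam in lambdas:
        f = s**2 / (s**2 + lam**2)  # Tikhonov filter factors
        resid_sq = float(np.sum(((1.0 - f) * beta) ** 2)) + outside
        trace_term = m - float(np.sum(f))  # trace of (I - H_lambda)
        scores.append(m * resid_sq / trace_term**2)
    scores = np.array(scores)

    best = int(np.argmin(scores))
    lam = lambdas[best]
    # Regularized solution x = V diag(s / (s^2 + lam^2)) U^T b
    x = Vt.T @ (s / (s**2 + lam**2) * beta)
    return lam, x, scores
```

Working in the SVD basis means each candidate λ costs only a few vector operations, so a dense grid of values can be scanned cheaply once the decomposition is available.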


Machine learning approaches for wind speed forecasting using long-term monitoring data: a comparative study

  • Ye, X.W.;Ding, Y.;Wan, H.P.
    • Smart Structures and Systems, Vol. 24, No. 6, pp. 733-744, 2019
  • Wind speed forecasting is critical for a variety of engineering tasks, such as wind energy harvesting, scheduling of a wind power system, and dynamic control of structures (e.g., wind turbines, bridges, and buildings). Wind speed, which is random, nonlinear, and uncertain, is difficult to forecast. Nowadays, machine learning approaches (generalized regression neural network (GRNN), back propagation neural network (BPNN), and extreme learning machine (ELM)) are widely used for wind speed forecasting. In this study, two schemes are proposed to improve the forecasting performance of machine learning approaches. The first uses optimization algorithms, i.e., cross validation (CV), genetic algorithm (GA), and particle swarm optimization (PSO), to automatically find the optimal model parameters. The second combines different machine learning methods through the finite mixture (FM) method. Specifically, CV-GRNN, GA-BPNN, and PSO-ELM are optimization algorithm-assisted machine learning approaches, and FM is a hybrid machine learning approach consisting of GRNN, BPNN, and ELM. The effectiveness of these machine learning methods in wind speed forecasting is investigated using one-year field monitoring data, and their performance is comprehensively compared.

Functional clustering for electricity demand data: A case study

  • 윤상후;최영진
    • Journal of the Korean Data and Information Science Society, Vol. 26, No. 4, pp. 885-894, 2015
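  • Forecasting electricity demand is necessary for the stable and effective operation of a power system. In this study, we cluster daily electricity demand patterns, treated as curves over time. Hourly electricity demand data for each day from January 1, 2009 to December 31, 2011 were transformed, by removing the trend component and applying a log transformation, into time series consisting of seasonal and error components. The transformed data were analyzed with the functional clustering model proposed by Ma et al. (2006), and the parameters were estimated using the EM algorithm and generalized cross-validation. The number of clusters was set to 10, which separates holidays and weekdays well. The analysis shows that the mean electricity demand curves are explained by Monday, weekdays (Tuesday to Friday), Saturday, Sunday or public holidays, and seasonal factors. The classification of electricity demand patterns obtained through functional clustering can be used in future short-term electricity demand forecasting.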

Utilization Evaluation of Numerical Forest Soil Map to Predict the Weather in Upland Crops

  • 강다영;황영은;윤상후
    • Korean Journal of Agricultural and Forest Meteorology, Vol. 23, No. 1, pp. 34-45, 2021
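  • Weather is the factor most often considered in the agricultural industry because it affects the pricing, yield, and quality of upland crops. Upland crops in particular are heavily exposed to the external environment, since they are grown in mountainous areas rather than on flat land. This study used the numerical forest soil map to examine the association between the characteristics of the 12 soil types that make up mountainous areas and weather information. GAM, kriging, and RF, each accounting for spatial correlation, were used, and the study data were daily mean temperature, maximum temperature, minimum temperature, and rainfall collected by the Korea Meteorological Administration and the Rural Development Administration from January 2009 to December 2018. The analysis showed that the GAM reflecting only geographic effects had relatively better estimation performance, and that the forest soil map was of little help in estimating weather information for upland crop cultivation areas. Statistical hypothesis tests at the 5% significance level were therefore carried out to select important factors. The climate zone code (CLZN_CD) and the soil woody-plant code B (SIBFLR_LAR) of the forest soil map were selected as relatively significant factors for estimating weather information. For the model with the climate zone code added, spatial interpolation performance improved for daily mean temperature and daily minimum temperature. Seventy percent of the land of the Korean Peninsula is mountainous, and upland crops are grown mainly in mountainous areas. Collecting additional weather information for mountainous areas and conducting further research is therefore expected to help manage upland crops by growth stage.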

Calpain-10 SNP43 and SNP19 Polymorphisms and Colorectal Cancer: a Matched Case-control Study

  • Hu, Xiao-Qin;Yuan, Ping;Luan, Rong-Sheng;Li, Xiao-Ling;Liu, Wen-Hui;Feng, Fei;Yan, Jin;Yang, Yan-Fang
    • Asian Pacific Journal of Cancer Prevention, Vol. 14, No. 11, pp. 6673-6680, 2013
  • Objective: Insulin resistance (IR) is an established risk factor for colorectal cancer (CRC). Given that CRC and IR physiologically overlap and the calpain-10 gene (CAPN10) is a candidate for IR, we explored the association between CAPN10 and CRC risk. Methods: Blood samples of 400 case-control pairs were genotyped, and the lifestyle and dietary habits of these pairs were recorded. Unconditional logistic regression (LR) was used to assess the effects of CAPN10 SNP43 and SNP19, and environmental factors. Both generalized multifactor dimensionality reduction (GMDR) and the classification and regression tree (CART) were used to test gene-environment interactions for CRC risk. Results: The GA+AA genotype of SNP43 and the Del/Ins+Ins/Ins genotype of SNP19 were marginally related to CRC risk (GA+AA: OR = 1.35, 95% CI = 0.92-1.99; Del/Ins+Ins/Ins: OR = 1.31, 95% CI = 0.84-2.04). Notably, a high-order interaction was consistently identified by GMDR and CART analyses. In GMDR, the four-factor interaction model of SNP43, SNP19, red meat consumption, and smoked meat consumption was the best model, with a maximum cross-validation consistency of 10/10 and testing balanced accuracy of 0.61 (P < 0.01). In LR, subjects with high red and smoked meat consumption and two risk genotypes had a 6.17-fold CRC risk (95% CI = 2.44-15.6) relative to that of subjects with low red and smoked meat consumption and no risk genotypes. In CART, individuals with high smoked and red meat consumption, SNP19 Del/Ins+Ins/Ins, and SNP43 GA+AA had higher CRC risk (OR = 4.56, 95% CI = 1.94-10.75) than those with low smoked and red meat consumption. Conclusions: Though the single loci of CAPN10 SNP43 and SNP19 are not enough to significantly increase CRC susceptibility, the combination of SNP43, SNP19, red meat consumption, and smoked meat consumption is associated with elevated risk.