• Title/Abstract/Keywords: kolmogorov-smirnov test

207 results, processed in 0.029 seconds

Improvement of generalization of linear model through data augmentation based on Central Limit Theorem

  • 황두환
    • 지능정보연구 / Vol. 28, No. 2 / pp. 19-31 / 2022
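  • Machine learning models are built on training data, and their accuracy and generalization performance are judged on test data that was not used during training. A model with poor generalization shows markedly lower prediction accuracy on newly encountered data, a phenomenon referred to as overfitting. This study proposes a method that generates data based on the central limit theorem and combines it with the existing training data to form a new training set, increasing the normality of the data and thereby improving the model's generalization performance. To this end, data were generated from the sample mean and standard deviation of each feature, using the properties of the central limit theorem. A Kolmogorov-Smirnov normality test, conducted to gauge the increase in normality, confirmed that the new training data were more normal than the original data. Generalization performance was measured as the difference between prediction accuracy on the training data and on the test data. When K-Nearest Neighbors (KNN), Logistic Regression, and Linear Discriminant Analysis (LDA) were trained on the newly generated data, generalization performance improved for KNN, a non-parametric method, and for LDA, which assumes normality in its construction.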
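
The normality check described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's procedure: the function name `ks_statistic_normal`, the exponential toy feature, and the augmentation step (appending normal draws that share the feature's sample mean and standard deviation) are all hypothetical. Because the normal parameters are estimated from the same sample, the statistic is best read as a relative measure of normality rather than compared against standard Kolmogorov-Smirnov tables.

```python
import random
from statistics import NormalDist, mean, stdev

def ks_statistic_normal(sample):
    """KS distance between the empirical CDF of `sample` and a normal CDF
    fitted with the sample mean and sample standard deviation."""
    xs = sorted(sample)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    d = 0.0
    for i, x in enumerate(xs):
        cdf = fitted.cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d

rng = random.Random(0)
# Toy, strongly skewed feature (assumption: stands in for one data column).
skewed = [rng.expovariate(1.0) for _ in range(500)]

# Hypothetical augmentation: add normal draws with the same mean and std.
base = NormalDist(mean(skewed), stdev(skewed))
augmented = skewed + [rng.gauss(base.mean, base.stdev) for _ in range(500)]

d_before = ks_statistic_normal(skewed)
d_after = ks_statistic_normal(augmented)
print(f"KS distance before: {d_before:.3f}, after: {d_after:.3f}")
```

A smaller distance after augmentation indicates that the combined feature is closer to a normal distribution, which is the direction of change the abstract reports.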

The Effects of Aerobic Exercise Therapy on Physical Functions in the Elderly

  • 정숙희;정경희
    • 지역사회간호학회지 / Vol. 21, No. 2 / pp. 252-262 / 2010
  • Purpose: This study was designed to examine the effects of aerobic exercise therapy on elders' physical functions. Methods: Participants were selected from an elderly welfare center in an agricultural district located in N City. Thirty-seven elders were assigned to the experimental group and 38 to the control group; all subjects were aged over 65. Collected data were statistically analyzed with SPSS/PC 12.0 Win. The analysis methods were the Chi-square test, Fisher's exact test, Kolmogorov-Smirnov test, t-test, Mann-Whitney U test, paired t-test, and Wilcoxon's rank sum test. Results: The first hypothesis, "The experimental group who had the aerobic exercise therapy will have greater development in lower leg muscular strength compared to the control group," was supported (t=8.95, p<.001). The second hypothesis, "Aerobic exercise therapy participants will show greater development in lower leg endurance," was supported (t=6.12, p<.001). The third hypothesis, "Aerobic exercise therapy participants will show greater development in flexibility," was supported (U=49.00, p<.001). The fourth hypothesis, "Aerobic exercise therapy participants will show greater development in balance," was supported (U=322.00, p<.001). Conclusion: Aerobic exercise therapy showed positive effects on the physical functions of the elderly in a rural area.

A Study on Failure Mode Analysis of Machining Center

  • 김봉석;김종수;이수훈;송준엽;박화영
    • 한국정밀공학회지 / Vol. 18, No. 6 / pp. 74-79 / 2001
  • This study describes a failure mode analysis of a CNC machining center. First, the system is broken down through subsystems into components using part lists and drawings. Component failure rate and failure mode analyses are then performed with a reliability database to identify the weak components of the machining center. The failure probability function of the mechanical parts is modeled with the Weibull distribution, and the Kolmogorov-Smirnov test is used to verify the goodness of fit.
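
A Weibull fit followed by a Kolmogorov-Smirnov goodness-of-fit check can be sketched as below. The abstract does not say how the Weibull parameters were estimated, so this example assumes median-rank regression (least squares on the linearized CDF with Bernard's approximation), a common choice in reliability work. The failure times are simulated, and every function name is hypothetical.

```python
import math
import random

def fit_weibull_rank_regression(times):
    """Estimate Weibull (shape, scale) by least squares on
    ln(-ln(1 - F)) = shape * ln(t) - shape * ln(scale)."""
    xs = sorted(times)
    n = len(xs)
    u = [math.log(x) for x in xs]
    # Bernard's median-rank approximation for the i-th ordered failure.
    v = [math.log(-math.log(1.0 - (i + 0.7) / (n + 0.4))) for i in range(n)]
    u_bar, v_bar = sum(u) / n, sum(v) / n
    sxy = sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))
    sxx = sum((a - u_bar) ** 2 for a in u)
    slope = sxy / sxx
    intercept = v_bar - slope * u_bar
    return slope, math.exp(-intercept / slope)

def weibull_cdf(x, shape, scale):
    return 1.0 - math.exp(-((x / scale) ** shape))

def ks_statistic(sample, cdf):
    """Largest gap between the empirical CDF and the model CDF."""
    xs = sorted(sample)
    n = len(xs)
    return max(max((i + 1) / n - cdf(x), cdf(x) - i / n)
               for i, x in enumerate(xs))

rng = random.Random(1)
# Simulated failure times in hours (assumption: scale 1000, shape 1.5).
failure_times = [rng.weibullvariate(1000.0, 1.5) for _ in range(200)]

shape, scale = fit_weibull_rank_regression(failure_times)
d = ks_statistic(failure_times, lambda t: weibull_cdf(t, shape, scale))
# Asymptotic 5% critical value for a fully specified distribution.
critical = 1.36 / math.sqrt(len(failure_times))
print(f"shape={shape:.2f} scale={scale:.0f} D={d:.3f} critical={critical:.3f}")
```

The fit is not rejected at the 5% level when `d` is below `critical`. Because the parameters are estimated from the same data, this threshold is conservative, so a rejection is stronger evidence than a non-rejection.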

Classical and Bayesian methods of estimation for power Lindley distribution with application to waiting time data

  • Sharma, Vikas Kumar;Singh, Sanjay Kumar;Singh, Umesh
    • Communications for Statistical Applications and Methods / Vol. 24, No. 3 / pp. 193-209 / 2017
  • This article considers the power Lindley distribution and some of its properties. Maximum likelihood, least squares, maximum product spacings, and Bayes estimators are proposed to estimate all the unknown parameters of the power Lindley distribution. Lindley's approximation and Markov chain Monte Carlo techniques are used for the Bayesian calculations because the posterior distribution cannot be reduced to a standard distribution. The performances of the proposed estimators are compared on simulated samples. The waiting times of research articles to be accepted in statistical journals are fitted to the power Lindley distribution and to other competing distributions. The Chi-square statistic, Kolmogorov-Smirnov statistic, Akaike information criterion, and Bayesian information criterion are used to assess goodness-of-fit. The power Lindley distribution was found to give a better fit for the data than the other distributions.
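
For reference, the standard textbook definitions of three of the criteria named in this abstract are given below. The symbols are introduced here and do not appear in the abstract: $F_n$ is the empirical distribution function of the $n$ observations, $F$ is the fitted distribution function, $\hat{L}$ is the maximized likelihood, and $k$ is the number of estimated parameters.

$$
D_n = \sup_x \left| F_n(x) - F(x) \right|, \qquad
\mathrm{AIC} = 2k - 2\ln\hat{L}, \qquad
\mathrm{BIC} = k\ln n - 2\ln\hat{L}
$$

For each of the three, a smaller value indicates a better fit among the competing distributions.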

Derivation of Probable Rainfall Intensity Formula at Masan District

  • 김지홍;배덕효
    • 한국습지학회지 / Vol. 2, No. 1 / pp. 49-58 / 2000
  • This study performs a frequency analysis of annual maximum rainfall data and derives a probable rainfall intensity formula for Masan station. Based on data for eight rainfall durations from 10 minutes to 24 hours, eight types of probability distribution (Gamma, Lognormal, Log-Pearson type III, GEV, Gumbel, Log-Gumbel, Weibull, and Wakeby), three types of parameter estimation scheme (moment, maximum likelihood, and probability weighted methods), and three types of goodness-of-fit test ($\chi^2$, Kolmogorov-Smirnov, and Cramér-von Mises) were considered to find an appropriate probability distribution for Masan station. The Lognormal-2 distribution was selected, and the probable rainfall intensity formula was derived by regression analysis. The derived formula can be used in practice to estimate rainfall quantiles for the Masan vicinity conveniently and reliably.

Racial and Social Economic Factors Impact on the Cause Specific Survival of Pancreatic Cancer: A SEER Survey

  • Cheung, Rex
    • Asian Pacific Journal of Cancer Prevention / Vol. 14, No. 1 / pp. 159-163 / 2013
  • Background: This study used Surveillance, Epidemiology and End Results (SEER) pancreatic cancer data to identify predictive models and potential socio-economic disparities in pancreatic cancer outcome. Materials and Methods: For risk modeling, the Kaplan-Meier method was used for cause specific survival analysis. The Kolmogorov-Smirnov test was used to compare survival curves. The Cox proportional hazard method was applied for multivariate analysis. The area under the ROC curve was computed for predictors of absolute risk of death, optimized to improve efficiency. Results: This study included 58,747 patients. The mean follow up time (S.D.) was 7.6 (10.6) months. SEER stage and grade were strongly predictive univariates. Sex, race, and three socio-economic factors (county level family income, rural-urban residence status, and county level education attainment) were independent multivariate predictors. Racial and socio-economic factors were associated with about a 2% difference in absolute cause specific survival. Conclusions: This study found significant effects of socio-economic factors on pancreatic cancer outcome. These data may generate hypotheses for trials to eliminate these outcome disparities.

A Study on Failure Mode Analysis for Reliability Assessment of Machining Center

  • 이수훈;김종수;김봉석;송준엽;이승우;박화영;박종권
    • 한국정밀공학회:학술대회논문집 / Proceedings of the 한국정밀공학회 2000 Autumn Conference / pp. 1010-1013 / 2000
  • This study describes a failure mode analysis of a CNC machining center. First, the system is broken down through subsystems into components using part lists and drawings. Component failure rate and failure mode analyses are then performed with a reliability database to identify the weak components of the machining center. The failure probability function of the mechanical parts is modeled with the Weibull distribution, and the Kolmogorov-Smirnov test is used to verify the goodness of fit.

Prediction of Stand Structure Dynamics for Unthinned Slash Pine Plantations

  • Lee, Young-Jin;Cho, Hyun-Je;Hong, Sung-Cheon
    • The Korean Journal of Ecology / Vol. 23, No. 6 / pp. 435-438 / 2000
  • Diameter distributions describe forest stand structure. Prediction equations for percentiles of the diameter distribution and parameter recovery procedures for the Weibull distribution function, based on four percentile equations, were applied to develop a prediction system for the structural development of even-aged slash pine stands in terms of changes in the number of stems per diameter class. Four percentiles of the cumulative diameter distribution were predicted as a function of stand characteristics. The predicted diameter distributions were tested against the observed diameter distributions using the Kolmogorov-Smirnov two-sample test at the $\alpha = 0.05$ level. No statistically significant differences were detected in the 236 evaluation data sets. This stand-level diameter distribution prediction system will be useful in slash pine stand structure modeling and in updating forest inventories for long-term forest management planning.
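
The two-sample comparison at $\alpha = 0.05$ can be sketched as follows. This is an illustration under assumptions, not the authors' code: the diameters are simulated, the function names are hypothetical, and the critical value uses the usual large-sample approximation $c(\alpha)\sqrt{(n+m)/(nm)}$ with $c(\alpha) = \sqrt{-\ln(\alpha/2)/2}$.

```python
import math
import random

def ks_two_sample(a, b):
    """Largest gap between the empirical CDFs of two samples."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        # Advance past every value equal to x in both samples (handles ties).
        while i < n and a[i] == x:
            i += 1
        while j < m and b[j] == x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d

def ks_critical(n, m, alpha=0.05):
    """Large-sample critical value for the two-sample KS statistic."""
    c = math.sqrt(-0.5 * math.log(alpha / 2.0))
    return c * math.sqrt((n + m) / (n * m))

rng = random.Random(2)
# Simulated stem diameters in cm (assumption: Weibull with scale 20, shape 3).
observed = [rng.weibullvariate(20.0, 3.0) for _ in range(100)]
predicted = [rng.weibullvariate(20.0, 3.0) for _ in range(100)]

d = ks_two_sample(observed, predicted)
crit = ks_critical(len(observed), len(predicted))
print(f"D={d:.3f}, critical={crit:.3f}, significant={d > crit}")
```

A statistic below the critical value corresponds to the "no significant difference" outcome reported for the evaluation data sets.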

A Statistical Analysis on Fatigue Life Distribution in Spheroidal Graphite Cast Iron

  • 장성수;김상태
    • 대한기계학회논문집A / Vol. 24, No. 9 / pp. 2353-2360 / 2000
  • Statistical fatigue properties of metallic materials are increasingly required for reliability design. In this study, static and fatigue tests were conducted, and the normal, log-normal, and two-parameter Weibull distributions were compared at the 5% significance level using the Kolmogorov-Smirnov goodness-of-fit test. Parameter estimates obtained with the maximum likelihood method and the least squares method were compared with experimental results. The two-parameter Weibull distribution with maximum likelihood estimation was found to provide a good fit for the static and fatigue life data and is therefore applicable to static and fatigue life analysis of spheroidal graphite cast iron. The P-S-N curves were evaluated using the log-normal distribution, which described the fatigue life behavior very well.

Probability Distribution Characteristics of Water Supply Demand

  • 목동우;현인환
    • 상하수도학회지 / Vol. 8, No. 2 / pp. 35-42 / 1994
  • This study analyzes the probability distribution characteristics of water supply demand. Two cities located near Seoul were selected as study areas. Two probability distribution types were tested using the K-S (Kolmogorov-Smirnov) method, which was used to verify the goodness of fit of the selected distribution type. The goodness of the ratio of maximum day demand to average day demand, obtained from field data, was also tested. The conclusions are as follows. 1. Both the normal and lognormal distribution types are appropriate as the probability distribution type for water supply demand. 2. The probability distribution characteristics can be used to test the goodness of the maximum day to average day demand ratio.
